Automatic and human level Graf's type identification for detecting developmental dysplasia of the hip

Yueh-Peng Chen; Tzuo-Yau Fan; Cheng-CJ Chu; Jainn-Jim Lin; Chin-Yi Ji; Chang-Fu Kuo; Hsuan-Kai Kao

doi:10.1016/j.bj.2023.100614

2023 Jun 10;47(2):100614.



PMCID: PMC10955653 PMID: 37308078

Abstract

Background

Developmental dysplasia of the hip (DDH) is a common congenital disorder that may lead to hip dislocation and requires surgical intervention if left untreated. Ultrasonography is the preferred method for DDH screening; however, the lack of experienced operators impedes its application in universal neonatal screening.

Methods

We developed a deep neural network tool that automatically registers five keypoints marking important anatomical structures of the hip and provides a reference for measuring the alpha and beta angles according to Graf's guidelines, an ultrasound classification system for DDH in infants. Two-dimensional (2D) ultrasonography images were obtained from 986 neonates aged 0–6 months. A total of 2406 images from 921 patients were labeled with ground-truth keypoints by senior orthopedists.

Results

Our model localized keypoints precisely: the mean absolute error was approximately 1 mm, and the alpha angles derived from the predicted keypoints correlated with the ground truth with a correlation coefficient of R = 0.89. The model achieved areas under the receiver operating characteristic curve of 0.937 and 0.974 for classifying alpha <60° (abnormal hip) and alpha <50° (dysplastic hip), respectively. On average, the experts agreed with 97.6% of the model-labeled images and 96% of the ground-truth-labeled images, and the model generalized to newly collected images, with alpha-angle correlation coefficients of at least 0.85 against expert labels.

Conclusions

Precise keypoint localization and high correlation with expert measurements suggest that the model can be an efficient tool for assisting DDH diagnosis in clinical settings.

Keywords: Developmental dysplasia, Ultrasonography, Hips, Graf's method, Keypoint, Deep learning

Introduction

Developmental dysplasia of the hip (DDH) is a common congenital disorder. With early diagnosis and proper treatment, affected children can regain normal function [[1], [2], [3]]. However, delayed diagnosis, particularly at two years of age or after walking age, may lead to early-onset pain and osteoarthritis, leg length discrepancy, or compromised mobility [[4], [5], [6], [7], [8]]. Several methods have been developed for screening DDH. Physical assessment with the Palmén and Barlow tests [9,10], although fast and straightforward, lacks consistent diagnostic criteria, and its results are subject to inter-observer disagreement [11] and misdiagnosis [11,12]. Radiography (X-ray imaging) can achieve high accuracy (>90%) [[13], [14], [15], [16]] and is noninvasive; however, parents have raised concerns about exposing infants to ionizing radiation [17,18]. Additionally, unlike ultrasound examination, X-ray imaging is not applicable to patients under 6 months of age. For patients diagnosed with DDH after 6 months of age, the late diagnosis may necessitate surgical intervention because the femoral bone is almost fully formed. Consequently, many ultrasonography-based methods have been proposed and tested [19,20]. Graf's method [21] and its modifications are the currently recommended practices for evaluating the hip morphology of neonates [[21], [22], [23], [24]] because ultrasonography provides high-resolution images of the early hip while cartilaginous material is still present [24] and avoids radiation exposure [24].

Graf's method classifies hips into four categories (types I to IV) based on two angles (the alpha and beta angles) defined by three reference lines drawn around the osseous convexity. This method has been widely adopted in Europe for the mass screening of neonates [25]. Despite its popularity, several factors can reduce the consistency of the assessment, including inconsistent interpretation [26], variations in operator technique, image resolution and quality, intra- and inter-observer variation [27,28], inter-scan variation [29], imaging system specifications, and transducer orientation [30]. Therefore, deep learning algorithms are being developed as diagnostic tools for ultrasonography to improve consistency and efficiency [31].

These tools encompass a wide range of algorithm designs that parametrically parse the anatomical structures of the hip in ultrasound images and use them as references for determining the angles defined by Graf's standard [32]. Most algorithms rely on figure–ground separation techniques to delineate the relevant hip structures, such as contour detection [33] (93% accuracy for detecting dysplasia by alpha angle) and segmentation [34,35] (86% and 84% accuracy, respectively, for detecting dysplasia by alpha angle). However, obtaining clear boundaries of anatomical structures in ultrasound images is challenging, and boundary errors can reduce the precision of subsequent labeling. An alternative is to define landmark keypoints that refer to anatomical structures. An earlier deep learning model also derived the alpha and beta angles by registering four keypoints on the contour of the segmented hip [35], but it required a segmentation step before keypoint detection. Direct keypoint detection may be a more economical way to mark the reference keypoints than such a multi-step algorithm. Moreover, it maps directly onto Graf's guidelines, in which the angles are obtained by drawing lines between reference keypoints.

To our knowledge, no previous study has directly applied a deep learning model to DDH ultrasound examination in accordance with Graf's guidelines using five keypoints. In this study, we developed a landmark-based measurement model for two-dimensional (2D) ultrasound images. We determined the Graf type of each hip from the alpha and beta angles by registering five keypoints associated with well-established anatomical structures. Moreover, to assess whether the deep learning model achieves human-level performance, we conducted a generalization test and an agreement test that compared its keypoint positioning and angle measurements with those of human experts. Thus, we offer an efficient and robust method for applying deep learning to assist DDH diagnosis.

Material and methods

Subjects and ultrasonography datasets

This study was approved by the Institutional Review Board of Chang Gung Medical Hospital (IRB approval number: 201902257B0C501). Data were de-identified and encrypted to preserve patient confidentiality. Hip ultrasonography data were collected from 986 neonates aged 0–6 months who underwent ultrasonographic examination of at least one hip at Chang Gung Memorial Hospital, Linkou Branch, between September 2017 and February 2021. All images were captured during outpatient visits using a Siemens ACUSON X600™ ultrasound system with a linear probe (5–8 MHz) and saved as portable network graphics (PNG) images. All collected images were reviewed by a review board comprising three senior pediatric orthopedists with 7, 10, and 15 years of experience. The board reviewed the database and selected suitable images, excluding those with poor hip positioning, unidentifiable hip features, or incorrect acquisition protocols. When available, patients' de-identified information was also collected, including their clinical diagnoses, surgical reports, and examination results.

An alpha angle of 60° or greater, representing a normal hip, is classified as type I. An alpha angle between 50° and 59°, representing an immature hip, is classified as type IIa/IIb. Alpha angles less than 50°, representing advanced dysplasia, are classified as type IIc/D; these patients require urgent treatment. The beta angle is an auxiliary angle used to distinguish between types IIc and D. Alpha angles less than 43°, representing a dislocated hip, are classified as type III/IV. For simplicity, we evaluated the model's ability to classify hip development at three thresholds: alpha <60° (abnormal or not), alpha <50° (dysplastic or not), and beta >77° (separating Graf's type D from type IIc) [Table 1].
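The thresholds above map onto a short rule set. The following Python sketch is illustrative and is not the authors' code; the function names are hypothetical, and splitting the 43–49° band into IIc and D by beta > 77° reflects our reading of the standard Graf scheme rather than a rule spelled out in this study.

```python
def graf_type(alpha: float, beta: float) -> str:
    """Simplified Graf type from alpha and beta angles in degrees (illustrative)."""
    if alpha >= 60:
        return "I"          # normal
    if alpha >= 50:
        return "IIa/IIb"    # immature
    if alpha >= 43:
        # Assumption: beta separates IIc (beta <= 77) from D (beta > 77).
        return "D" if beta > 77 else "IIc"
    return "III/IV"         # subluxated or dislocated


def study_thresholds(alpha: float, beta: float) -> dict:
    """The three binary questions used to evaluate the model."""
    return {
        "abnormal": alpha < 60,    # alpha < 60 degrees
        "dysplastic": alpha < 50,  # alpha < 50 degrees
        "beta_gt_77": beta > 77,   # beta > 77 degrees
    }
```

Keeping the full typing and the three binary questions separate mirrors the text: the evaluation only needs the binary calls, while the typing function documents how they relate to Table 1.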

Table 1.

Graf's type classification system.

Type	Alpha angle (°)	Beta angle (°)	Description of hip
I	≥ 60	≤ 55	normal
IIa/IIb	50–59	> 55	immature
IIc/D	< 50	> 77	abnormal
III/IV	< 43	> 77	subluxated or dislocated


We retrieved 2640 ultrasonographic images of 986 patients who underwent hip examination. The images were acquired by pediatric orthopedic physicians at the level of attending physician or above who were employed by Chang Gung Memorial Hospital at Linkou, were certified by the Taiwan Pediatric Orthopedic Society, and had at least two years of experience conducting examinations. Images were excluded based on the following criteria: (1) age >6 months, (2) duplicated images, (3) a window size other than 5 × 5 cm (width × depth), (4) acquisition without the standard machine settings, and (5) unclear hip features or unrecognizable landmarks. After the exclusion of 65 patients (234 images, 8.9% of all enrolled images), 921 patients with 2406 images (91.1% of all enrolled images) were included in the study. Fig. 2 depicts the image selection process. Finally, 434 images from 166 patients were selected as the testing dataset, the remaining 1754 images from 660 patients in the development set were used as the training dataset, and 218 images from 95 patients formed the hold-out set. No patient was allocated to more than one group, to avoid data leakage.
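Keeping every image of a patient in the same subset is what prevents leakage here. A minimal sketch of such a patient-level split, assuming only parallel lists of image and patient identifiers (the function name, seed, and shuffling procedure are hypothetical, not the study's code), could look like this:

```python
import random
from collections import defaultdict


def split_by_patient(image_ids, patient_ids, test_fraction=0.2, seed=0):
    """Assign whole patients (never single images) to train or test.

    image_ids and patient_ids are parallel sequences; all images of one
    patient end up in the same subset, so no patient appears in both.
    """
    images_per_patient = defaultdict(list)
    for img, pid in zip(image_ids, patient_ids):
        images_per_patient[pid].append(img)

    # Shuffle patients (not images) reproducibly, then take the test share.
    patients = sorted(images_per_patient)
    random.Random(seed).shuffle(patients)
    n_test = round(len(patients) * test_fraction)
    test_patients = set(patients[:n_test])

    train, test = [], []
    for pid in patients:
        (test if pid in test_patients else train).extend(images_per_patient[pid])
    return train, test
```

Because the fraction is applied to patients, the image counts only approximate an 80/20 split. scikit-learn's `GroupShuffleSplit` provides the same grouping behavior; the standard-library version simply keeps the logic visible.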

Fig. 2 — Flow diagram showing the process of selecting data and ultrasound images. We initially collected 2640 images from 986 patients from September 2017 to February 2021. After selecting eligible patients and images, we retained 2188 images from 826 patients as the model development set and 218 images from 95 patients as the hold-out set for comparing the performance of the model and the experts. The development set was split into an 80% training set and a 20% testing set, the latter used for evaluating the model and for the human agreement test. The hold-out set was used to test the model's generalizability because its ultrasonographic images were collected after the model was developed.

Hip landmarks, line drawing, and Graf's angles

From the bony and cartilaginous structures displayed in the ultrasound image [Fig. 1A], hip stability can be evaluated using Graf's method [21]. Five keypoints are predefined based on the hip's anatomical structures to derive the alpha and beta angles [Fig. 1B]. We defined keypoints #1 and #2 as the start and end points, respectively, of the straight outer border of the ilium. Keypoint #3 is the lower limb of the ilium. Keypoint #4 is the turning point at which the concavity of the bony acetabular roof changes to the convexity of the ilium. Keypoint #5 is the center of the labrum. Three lines were drawn through the five keypoints to represent the anatomical structure of the hip [Fig. 1E]. Line #1 is the baseline passing through keypoints #1 and #2, which represent the ilium. Line #2, passing through keypoints #3 and #4, represents the bony roof of the acetabulum, whereas line #3, passing through keypoints #4 and #5, represents the location of the labrum. The alpha angle, measured between lines #1 and #2, represents the coverage of the femoral head by the iliac bone. The beta angle, formed by lines #1 and #3, represents the coverage of the femoral head by the labrum. Of the 2188 development-set images, the physician assessment identified 1271 (58.1%) with alpha ≥60° (normal), 745 (34%) with 50° ≤ alpha <60° (mild dysplasia), and 172 (7.9%) with alpha <50° (severe dysplasia or dislocation).
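The three lines reduce angle measurement to plane geometry. The sketch below computes both angles from five (x, y) keypoints ordered #1 to #5. It assumes each angle is the acute angle between the two lines, which matches Graf's alpha and beta whenever they are below 90° but ignores which side of the baseline the angle opens on (an angle above 90° would be folded back below it). The function names are hypothetical.

```python
import math

Point = tuple[float, float]


def line_angle_deg(p1: Point, p2: Point, q1: Point, q2: Point) -> float:
    """Acute angle (0-90 degrees) between line p1-p2 and line q1-q2."""
    ux, uy = p2[0] - p1[0], p2[1] - p1[1]
    vx, vy = q2[0] - q1[0], q2[1] - q1[1]
    cross = ux * vy - uy * vx
    dot = ux * vx + uy * vy
    # atan2(|cross|, |dot|) is numerically stable, and taking |dot| folds the
    # result into [0, 90] so the direction each line is traced does not matter.
    return math.degrees(math.atan2(abs(cross), abs(dot)))


def graf_angles(kp):
    """kp: five (x, y) keypoints ordered #1..#5 as defined in the text."""
    alpha = line_angle_deg(kp[0], kp[1], kp[2], kp[3])  # line #1 vs line #2
    beta = line_angle_deg(kp[0], kp[1], kp[3], kp[4])   # line #1 vs line #3
    return alpha, beta
```

Working from keypoints rather than segmentation contours means the angle computation is exact once the points are placed, so all measurement error is attributable to keypoint localization.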

Ground-truth labeling was performed by a pediatric orthopedist, who annotated each image with the five keypoints. The keypoint locations were reviewed by the other two orthopedists on the review board, and disagreements were resolved by discussion until consensus was reached. Repeated reading has been shown to yield a high degree of agreement in assessing hip morphology [36].

Keypoint labeling software was designed and developed using MATLAB's App Designer toolbox, Release 2019b (MathWorks, Inc., Natick, Massachusetts, United States). The ground-truth heatmaps were generated from the keypoint coordinates [37,38] using a Gaussian kernel (σ = 5 pixels) [Fig. 1C]. The pixel values and channels of the heatmap correspond to the probability and identity of the keypoint, respectively [Fig. 1D]. Therefore, a 5-channel Gaussian heatmap was constructed to represent the ground truth of each image.
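A ground-truth heatmap of this kind can be rendered with a few lines of NumPy. This is a sketch under assumptions the text does not settle: an unnormalized Gaussian with peak value 1 at each keypoint, σ measured in pixels of whichever grid the heatmap is drawn on, and one channel per keypoint in the order #1 to #5. The function name is hypothetical.

```python
import numpy as np


def keypoints_to_heatmaps(keypoints, height, width, sigma=5.0):
    """Render one Gaussian channel per keypoint (value 1 at the keypoint).

    keypoints: sequence of (x, y) pixel coordinates, ordered #1..#5.
    Returns a float32 array of shape (len(keypoints), height, width).
    """
    ys, xs = np.mgrid[0:height, 0:width]  # row (y) and column (x) grids
    heatmaps = np.empty((len(keypoints), height, width), dtype=np.float32)
    for c, (x, y) in enumerate(keypoints):
        d2 = (xs - x) ** 2 + (ys - y) ** 2  # squared distance to the keypoint
        heatmaps[c] = np.exp(-d2 / (2.0 * sigma ** 2))
    return heatmaps
```

Encoding identity in the channel index means the decoder never has to decide which keypoint a peak belongs to; each channel is read independently.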

Image preprocessing and augmentation

The ultrasonography field of view was fixed at 5 × 5 cm, and images were stored at 320 × 320 pixels; therefore, the pixel spacing was 0.156 × 0.156 mm. Before training, the images were resized to 512 × 512 pixels. The images were 8-bit grayscale (0–255). During training, images were randomly augmented by rotation between −20° and 20°, translation between −20 and 20 pixels with black padding at the edges, and horizontal flipping with a probability of 0.5.
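The pixel spacing follows from 50 mm / 320 px = 0.15625 mm; after resizing to 512 px, one input pixel spans 50 / 512 ≈ 0.098 mm, so errors measured on the network grid must be rescaled before being reported in millimetres. Geometric augmentation must also move the keypoints together with the image. The sketch below illustrates both points. The sampling ranges come from the text; the rotation centre, sign convention, and order of operations are assumptions that must match the image library actually used (Pillow's `Image.rotate`, for example, rotates counterclockwise about the image centre by default). All names are hypothetical.

```python
import math
import random

FIELD_MM = 50.0   # 5 x 5 cm field of view (from the text)
NATIVE_PX = 320   # stored image size in pixels (from the text)
MODEL_PX = 512    # network input size in pixels (from the text)

NATIVE_MM_PER_PX = FIELD_MM / NATIVE_PX  # 0.15625 mm, quoted as 0.156 mm
MODEL_MM_PER_PX = FIELD_MM / MODEL_PX    # about 0.098 mm after resizing


def model_px_to_mm(dist_px: float) -> float:
    """Convert a distance measured on the 512 x 512 input into millimetres."""
    return dist_px * MODEL_MM_PER_PX


def sample_augmentation(rng: random.Random):
    """Draw (angle in degrees, tx, ty, flip) from the ranges given in the text."""
    angle = rng.uniform(-20.0, 20.0)
    tx = rng.uniform(-20.0, 20.0)
    ty = rng.uniform(-20.0, 20.0)
    flip = rng.random() < 0.5
    return angle, tx, ty, flip


def augment_keypoint(x, y, angle_deg, tx, ty, flip, size=MODEL_PX):
    """Move one keypoint the way the image is moved: flip, rotate, translate.

    Assumed conventions: y points down, a positive angle rotates
    counterclockwise as displayed, and the flip is applied first.
    """
    if flip:
        x = (size - 1) - x
    c = (size - 1) / 2.0  # rotation centre in pixel coordinates
    dx, dy = x - c, y - c
    t = math.radians(angle_deg)
    xr = c + dx * math.cos(t) + dy * math.sin(t)
    yr = c - dx * math.sin(t) + dy * math.cos(t)
    return xr + tx, yr + ty
```

If the keypoints were not transformed with the image, the augmented heatmaps would point at the wrong anatomy, which would teach the network label noise rather than invariance.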

Model development

We adopted and modified the UNet++ [39] architecture for landmark detection [Fig. 1F]. The architecture was originally designed for high-resolution segmentation of medical images such as X-ray and CT scans. Here, the model was trained to predict multichannel heatmaps from the images. The architecture is an encoder–decoder network: the encoder comprises multiple subnetworks that follow the framework of VGG16 blocks [40] and are connected to the decoder through dense skip pathways. The underlying hypothesis is that the encoder captures fine feature details and gradually enriches their semantic content before they are fused with the corresponding semantically rich feature maps from the decoder. We replaced the loss at the last stage of UNet++, originally a combination of binary cross-entropy and Dice coefficient losses, with a mean squared error (MSE) loss computed across all channels against the ground-truth heatmaps:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^{2}$$

where the sum runs over the $n$ heatmap values in a training batch, and $y_i$ and $\hat{y}_i$ are the corresponding pixel values of the ground-truth and predicted heatmaps, respectively.
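With a mean reduction, this loss is the average squared difference over every heatmap value in the batch. A NumPy sketch, assuming predictions and targets are stacked as (batch, 5, H, W) arrays (the function name is hypothetical); in PyTorch, `torch.nn.MSELoss()` with its default `reduction='mean'` computes the same element-wise average.

```python
import numpy as np


def heatmap_mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error over every element of a (batch, C, H, W) heatmap stack."""
    if pred.shape != target.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {target.shape}")
    return float(np.mean((pred - target) ** 2))
```

Regressing heatmaps with MSE, rather than classifying pixels with cross-entropy, rewards predictions that place a smooth peak near the keypoint even when it is off by a few pixels.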

Training procedure

The development data were randomly divided into an 80% training set and a 20% testing set [Table 2]. The training set was used to develop the model, and the testing set was used to evaluate its performance. Image augmentation was applied only to the training set, to reduce overfitting. The model weights were trained from scratch with Kaiming initialization [41] and updated at each iteration by backpropagation using the Adam optimizer [42]. The batch size was five, and the initial learning rate was 1e−4, decayed by cosine annealing during training. Training stopped automatically when no significant improvement in performance was observed for five consecutive epochs. The model was trained on Ubuntu 18.04 LTS with Python 3.6 and an NVIDIA GeForce GTX 1080 Ti GPU. The model was implemented in PyTorch v1.7.1, and image preprocessing was performed using the Python Imaging Library (PIL).
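The learning-rate schedule and stopping rule can be sketched in plain Python. The text gives only the initial learning rate (1e−4) and the five-epoch patience; the schedule length (`total_steps`), the floor (`lr_min`), the improvement margin (`min_delta`), and the monitored loss are hypothetical choices here. PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR` (parameters `T_max` and `eta_min`) implements the same cosine curve.

```python
import math


def cosine_lr(step: int, total_steps: int, lr_max: float = 1e-4, lr_min: float = 0.0) -> float:
    """Cosine-annealed learning rate: lr_max at step 0, lr_min at total_steps."""
    progress = min(step, total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


class EarlyStopping:
    """Signal a stop when the loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta  # hypothetical threshold for "significant"
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, loss: float) -> bool:
        """Record one epoch's loss; return True when training should stop."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

The two mechanisms are complementary: the cosine schedule shrinks the step size smoothly, and early stopping ends training once the smaller steps no longer help.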

Table 2.

Demographic data of the training and testing sets.

	Development set				Hold-out set
Variables	All (N = 826)	Training set (N = 660)	Testing set (N = 166)	p value	All (N = 95)
Age, days, mean ± SD	47.7 ± 29.3	46.7 ± 29	51.6 ± 30	0.063	52.3 ± 27.1
Gender
Male, N (%)	291 (35.2)	238 (36)	53 (31.9)	0.44	28 (29.5)
Female, N (%)	535 (64.8)	422 (64)	113 (68.1)	0.57	67 (70.5)
No. of Examinations
1, N (%)	623 (75.4)	497 (75.3)	126 (75.9)	0.92	84 (88.4)
>1, N (%)	203 (24.6)	163 (24.7)	40 (24.1)	0.86	11 (11.6)
Hips
Total, n	2188	1754	434		218
Left, n (%)	1090 (49.8)	875 (49.8)	215 (49.5)	0.92	109 (50.0)
Right, n (%)	1098 (50.2)	879 (50.2)	219 (50.5)	0.89	109 (50.0)
Alpha angle
Alpha ≥ 60°, n (%)	1271 (58.1)	1017 (58)	254 (58.7)	0.89	–
50° ≤ Alpha < 60°, n (%)	745 (34)	597 (34)	148 (33.8)	0.97	–
Alpha < 50°, n (%)	172 (7.9)	140 (8)	32 (7.5)	0.70	–
Angle, (°, mean ± SD)	61.1 ± 7.5	61 ± 7.6	61.3 ± 7.3	0.46	–
Beta angle
Beta < 77°, n (%)	2093 (95.7)	1676 (95.5)	417 (96.1)	0.90	–
Beta ≥ 77°, n (%)	95 (4.3)	78 (4.5)	17 (3.9)	0.57	–
Angle, (°, mean ± SD)	60 ± 9.8	60.1 ± 9.9	59.3 ± 9.4	0.15	–


Abbreviations: N: number of patients; n: number of images; SD: standard deviation; p value: two-tailed analysis of variance or χ² tests for difference in the distribution of values between training and testing sets.

Model metrics

After training, the model was evaluated on the testing dataset. For each testing image, the model outputs a multichannel heatmap in which each channel represents the probability map of one keypoint, and the location of the highest pixel value in a channel is taken as the predicted keypoint location. We calculated the absolute horizontal distance error ($\Delta X = |x - x'|$), the absolute vertical distance error ($\Delta Y = |y - y'|$), and the Euclidean distance error ($\Delta D = \sqrt{\Delta X^{2} + \Delta Y^{2}}$) between the predicted $(x, y)$ and ground-truth $(x', y')$ locations. The distance percentiles are the 50th, 75th, and 95th percentiles of the distance error, and the precision of a keypoint is the proportion of distance errors within 0.5, 1, and 1.5 mm.
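The evaluation steps above can be sketched with NumPy: decode each channel by its argmax, convert pixel differences to millimetres, and summarize with the mean, percentiles, and threshold precision. The sketch assumes coordinates are expressed on the native 320 × 320 grid (0.156 mm per pixel; predictions made on the 512 × 512 input would first be rescaled by 320/512) and counts an error exactly at a threshold as within it. Function names are hypothetical.

```python
import numpy as np

PIXEL_MM = 50.0 / 320  # 0.156 mm per pixel at the native 320 x 320 resolution


def decode_keypoints(heatmaps: np.ndarray) -> np.ndarray:
    """Return the (x, y) location of the maximum of each channel of a (C, H, W) array."""
    c, h, w = heatmaps.shape
    flat_idx = heatmaps.reshape(c, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (h, w))
    return np.stack([xs, ys], axis=1).astype(float)


def keypoint_errors(pred_xy, true_xy, pixel_mm=PIXEL_MM):
    """Absolute horizontal, vertical, and Euclidean errors in millimetres."""
    diff = (np.asarray(pred_xy, float) - np.asarray(true_xy, float)) * pixel_mm
    dx, dy = np.abs(diff[..., 0]), np.abs(diff[..., 1])
    return dx, dy, np.hypot(dx, dy)


def summarize(err_mm):
    """MAE, SD, 50/75/95th percentiles, and share of errors within 0.5/1/1.5 mm."""
    err_mm = np.asarray(err_mm, float)
    mae = float(err_mm.mean())
    sd = float(err_mm.std(ddof=1))
    pct = np.percentile(err_mm, [50, 75, 95])
    precision = {t: float(np.mean(err_mm <= t)) for t in (0.5, 1.0, 1.5)}
    return mae, sd, pct, precision
```

Reporting ΔX and ΔY separately exposes direction-dependent error: a keypoint on a straight edge can be placed precisely across the edge yet loosely along it.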

Agreement test

We performed the agreement test with three experts who were not members of the review board and who had years of experience performing ultrasonographic examinations of neonatal hips (participant 1: 10 years; participant 2: 7 years; participant 3: 15 years). Of the 434 testing-set images, 217 (50%) were randomly selected to be shown with their ground-truth labels, and the remaining 217 were shown with model-predicted labels [Fig. 5A]. Each image was overlaid with the five keypoints and three lines [Fig. 5B]; the alpha and beta angle values were not displayed. The distributions of the alpha and beta angles did not differ between the two groups (Kolmogorov–Smirnov test, all p > 0.05). The experts were then asked to agree or disagree with the labels on the randomly ordered images [Fig. 5A]. For each participant, “agreement” denotes the proportion of accepted images (the number of accepted images divided by the total number of images), and “disagreement” denotes the proportion of rejected images, calculated as 1 minus the agreement proportion.
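The agreement proportion and the two-group comparison reduce to a 2 × 2 contingency table (accepted or rejected, ground-truth or model labels). A standard-library sketch, assuming Pearson's chi-squared test without Yates' continuity correction, which the text does not specify; `scipy.stats.chi2_contingency(table, correction=False)` would give the same statistic. Function names are hypothetical.

```python
import math


def agreement(accepted: int, total: int) -> float:
    """Proportion of images a participant accepted; disagreement is 1 minus this."""
    return accepted / total


def chi2_two_proportions(a1: int, n1: int, a2: int, n2: int):
    """Pearson chi-squared test (no continuity correction) on a 2 x 2 table.

    Rows: ground-truth-labeled vs model-labeled images (n1, n2 images).
    Columns: accepted vs rejected (a1, a2 accepted). Returns (chi2, p), df = 1.
    """
    table = [[a1, n1 - a1], [a2, n2 - a2]]
    total = n1 + n2
    row = [n1, n2]
    col = [a1 + a2, total - a1 - a2]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            # An empty column (e.g. no rejections at all) contributes nothing.
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected
    # For df = 1 the chi-squared survival function equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p
```

Testing each participant separately, rather than pooling, keeps one lenient rater from masking a stricter rater's preference for one group.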

Fig. 5 — Agreement test to validate that the automatic structural method is comparable to the ground truth. (A) From the 434 test images, we randomly labeled 217 images with the ground truth (green) and 217 images with model inference (blue). Participants were asked to agree or disagree with the images labeled by the keypoints and lines without knowing the labeler (ground truth or model), and the images were presented in random order. (B) Demonstration of the labeled images: the five black dots are the five keypoints, connected by the three white lines based on Graf's method. (C) Distributions of the alpha and beta angles in the two datasets. The angle distributions from the ground truth did not differ significantly from those from model inference (KS test; alpha angle, p = 0.82; beta angle, p = 0.36). (D) Participants' percentage of agreement.

Generalization test

We performed a generalization test to compare the performance of the model with that of the experts on the hold-out set, which was collected from October 2020 to February 2021 and was not labeled by the review board. The three participants were individually asked to label the five keypoints on the ultrasonography images without hints, and the alpha and beta angles were calculated from their keypoints. The model then labeled the hold-out set, and its inferred angles were compared with those of the experts. The correlation coefficient measures the similarity between the angles labeled by each participant and by the model. Classification agreement is the percentage of images for which the participant's and the model's angles fall in the same class at a given Graf threshold.
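One way to compute classification agreement, assuming it means the share of images on which the participant's and the model's angles fall on the same side of a Graf threshold (our interpretation; the function name is hypothetical):

```python
def classification_agreement(angles_a, angles_b, threshold, below=True):
    """Share of images where two raters fall on the same side of a threshold.

    below=True tests `angle < threshold` (e.g. alpha < 60 or alpha < 50);
    below=False tests `angle > threshold` (e.g. beta > 77).
    """
    if len(angles_a) != len(angles_b) or len(angles_a) == 0:
        raise ValueError("need two non-empty sequences of equal length")

    def positive(angle):
        return angle < threshold if below else angle > threshold

    same = sum(positive(a) == positive(b) for a, b in zip(angles_a, angles_b))
    return same / len(angles_a)
```

Unlike a correlation coefficient, this measure ignores disagreements that stay on one side of the threshold, which is why it can be high for the beta angle even when R is modest.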

Statistics

Statistical analyses were performed using MATLAB (MathWorks Inc., Natick, Massachusetts, United States of America). The linear correlation coefficient (R) quantified the similarity of Graf's alpha and beta angles between physician annotations and model predictions, and between model predictions and human participants in the generalization test. Bland–Altman plots were used to assess agreement in the alpha and beta angles between physician annotations and model predictions. A Kolmogorov–Smirnov test (KS test) compared the distributions of the alpha and beta angles between the ground-truth-labeled and model-labeled groups, and a chi-squared test compared the agreement proportions of the two groups.
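The correlation and Bland–Altman quantities can be sketched with the standard library. Assumptions: the limits of agreement are bias ± 1.96 SD of the differences; the repeatability coefficient is taken as 1.96 SD of the differences (the results report RMSE values of 3.4 and 7.09 alongside repeatability coefficients of 6.68 and 13.9, which fits this definition when the bias is small); and proportional bias is checked by correlating the differences with the pair means. Function names are hypothetical. Python 3.10+ also offers `statistics.correlation`, and `scipy.stats.ks_2samp` is the usual choice for the KS test, which is not reproduced here.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson linear correlation coefficient between two equal-length samples."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bland_altman(pred, truth):
    """Bias, 95% limits of agreement, and a proportional-bias check for pred - truth."""
    diffs = [p - t for p, t in zip(pred, truth)]
    means = [(p + t) / 2 for p, t in zip(pred, truth)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences
    return {
        "bias": bias,
        "loa": (bias - 1.96 * sd, bias + 1.96 * sd),
        "repeatability_coefficient": 1.96 * sd,
        # Differences that grow or shrink with the mean indicate proportional bias.
        "proportional_bias_r": pearson_r(means, diffs),
    }
```

A high correlation alone does not show agreement (a constant offset keeps R at 1), which is why the Bland–Altman bias and limits are reported alongside R.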

Results

Training and testing datasets

Initially, 2640 ultrasonographic images were retrieved from 986 patients who underwent hip examination between September 2017 and February 2021 at Chang Gung Memorial Hospital, Linkou Branch. The data were split into a development set (September 2017 to September 2020), used to develop the model, and a hold-out set (October 2020 to February 2021), used to compare the performance of the model with that of humans. The development set was further split into a training set for model training and a testing set for validating the model's performance and performing the agreement test. Images were excluded according to the exclusion criteria [Fig. 2], and the demographic data are shown in [Table 2]. The baseline characteristics did not differ between the randomly split training and testing sets (all p > 0.05). The hold-out set was collected after the development set; because it was used to test the model's generalizability, we did not measure ground-truth angles for it.

Accuracy and precision of predicted keypoints

An automatic five-keypoint labeling model was developed that learned to annotate five anatomical features of the neonatal hip in ultrasonographic images following Graf's method [21] ([Fig. 1]; details in Material and methods). To validate its accuracy, we measured the error between the annotated and predicted keypoint locations using the mean absolute error (MAE) [Table 3]; [Fig. 3] shows the distributions of the horizontal (ΔX), vertical (ΔY), and distance (ΔD) errors. For keypoints #1 and #2, which lie on the straight border of the ilium, horizontal errors were considerably larger than vertical errors, whereas no obvious differences between vertical and horizontal errors were observed for keypoints #3, #4, and #5. For these three keypoints, the 95th percentile of the distance error was below 2 mm, and at least 91% of the predicted keypoints fell within 1.5 mm of the ground truth. These results indicate that the model can accurately and precisely capture fine-grained bony and cartilaginous features in hip ultrasonographic images.

Table 3.

Performances of the model to predict the locations of the five keypoints.

Keypoint	Error	MAE (SD), mm	50th percentile (mm)	75th percentile (mm)	95th percentile (mm)	Precision within 0.5 mm (%)	Precision within 1 mm (%)	Precision within 1.5 mm (%)
Keypoint #1	ΔX	1.17 (1.05)	0.84	1.51	3.01	21.7	48.2	70.3
	ΔY	0.2 (0.18)	0.13	0.26	0.51	90.1	99.5	99.8
	ΔD	1.22 (1.03)	0.98	1.59	3.01	17.3	47.9	70.3
Keypoint #2	ΔX	0.86 (0.89)	0.51	1.17	2.17	33.6	64.7	82
	ΔY	0.19 (0.17)	0.13	0.26	0.38	93.1	99.5	99.8
	ΔD	0.91 (0.88)	0.68	1.2	2.18	30.6	64.7	81.8
Keypoint #3	ΔX	0.33 (0.4)	0.17	0.34	1.01	77.6	93.1	97.2
	ΔY	0.43 (0.47)	0.26	0.51	1.26	66.6	89.2	95.6
	ΔD	0.59 (0.57)	0.38	0.71	1.89	56.5	83.9	92.4
Keypoint #4	ΔX	0.37 (0.36)	0.17	0.51	1.01	66.8	91.9	98.6
	ΔY	0.48 (0.45)	0.26	0.63	1.38	58.3	86.6	94.9
	ΔD	0.64 (0.54)	0.42	0.84	1.91	49.1	79.3	91
Keypoint #5	ΔX	0.37 (0.4)	0.17	0.34	1.01	69.1	93.1	97.7
	ΔY	0.45 (0.41)	0.26	0.63	1.26	57.8	89.2	97.2
	ΔD	0.63 (0.52)	0.51	0.81	1.63	46.1	83.9	93.1


Abbreviations: MAE: mean absolute error (mm); SD: standard deviation (mm); ΔX: horizontal distance error (mm); ΔY: vertical distance error (mm); ΔD: distance error (Euclidean distance, mm).

Fig. 3 — Evaluating the performance of predicting the locations of the five keypoints. (A) Distributions of the horizontal errors for the five keypoints. (B) Distributions of the vertical errors for the five keypoints. (C) Distributions of the distance errors for the five keypoints. Crosses represent the median of the error distribution, and circles represent the mean.

Performance of Graf's angle measurement

We next analyzed the model's composite performance in measuring Graf's alpha and beta angles. The predicted alpha and beta angles correlated significantly with the ground truth, with linear correlation coefficients of 0.89 (p < 0.001, RMSE = 3.4° [Fig. 4A]) and 0.68 (p < 0.001, RMSE = 7.09° [Fig. 4C]), respectively. The Bland–Altman plots showed no proportional bias between the predicted and ground-truth angles for alpha ([Fig. 4B], R = 0.06, p = 0.19, repeatability coefficient = 6.68) or beta ([Fig. 4D], R = 0.19, p = 0.09, repeatability coefficient = 13.9).

Fig. 4 — Evaluating the performance of predicting Graf's alpha and beta angles. (A) Pearson's correlation between the predicted and ground-truth alpha angles is significant. (B) The Bland–Altman plot of the predicted and ground-truth alpha angles shows no proportional bias. (C) Pearson's correlation between the predicted and ground-truth beta angles is significant. (D) The Bland–Altman plot of the predicted and ground-truth beta angles shows no proportional bias. LoA: 95% limits of agreement.

Performance of Graf's DDH classification between model and ground truth

Graf's classification system is primarily determined by the alpha angle. The model achieved areas under the receiver operating characteristic curve (AUROC) of 0.937, 0.974, and 0.904 for detecting alpha <60°, alpha <50°, and beta >77°, respectively [Table 4]. Thus, the predicted angles classify hips at Graf's alpha thresholds with high accuracy.
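Because the model outputs an angle rather than a probability, the AUROC can be computed by ranking the predicted angles against the ground-truth labels. A standard-library sketch using the Mann–Whitney formulation (the function name is hypothetical); for the alpha thresholds a lower predicted angle indicates a more abnormal hip, so the negated angle serves as the score.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    scores: higher means "more likely positive"; labels: 1 positive, 0 negative.
    A tie between a positive and a negative case counts as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# Hypothetical usage for the alpha < 60 task (true_alpha, pred_alpha are lists):
# labels = [int(a < 60) for a in true_alpha]
# auc = auroc([-a for a in pred_alpha], labels)
```

The pairwise loop is quadratic, which is harmless for a few hundred test images; a rank-based implementation would be preferable for much larger sets.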

Table 4.

Performance of the model by classifying alpha and beta angles with different clinical thresholds.

	Alpha <60°	Alpha <50°	Beta >77°
AUPRC (95% CI)	0.948 (0.923–0.963)	0.992 (0.990–0.994)	0.450 (0.246–0.688)
AUROC (95% CI)	0.937 (0.913–0.956)	0.974 (0.953–0.986)	0.904 (0.748–0.957)
Accuracy (95% CI)	0.848 (0.811–0.880)	0.949 (0.924–0.968)	0.972 (0.952–0.986)
Sensitivity (95% CI)	0.882 (0.835–0.920)	0.975 (0.949–0.985)	0.353 (0.142–0.617)
Specificity (95% CI)	0.803 (0.739–0.858)	0.625 (0.472–0.827)	0.998 (0.987–0.999)
PPV (95% CI)	0.854 (0.814–0.887)	0.970 (0.959–0.985)	0.857 (0.433–0.979)
NPV (95% CI)	0.839 (0.786–0.881)	0.667 (0.475–0.755)	0.974 (0.964–0.982)


Abbreviations: PPV: positive predictive value; NPV: negative predictive value; AUROC: area under receiver operating characteristic curve; AUPRC: area under precision–recall curve.
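The threshold metrics in Table 4 follow from a confusion matrix once each predicted and ground-truth angle is dichotomized at the same threshold, for example `pred = [a < 60 for a in pred_alpha]`. A standard-library sketch (the function name is hypothetical) that returns NaN when a denominator is zero:

```python
def binary_metrics(pred, truth):
    """Accuracy, sensitivity, specificity, PPV, and NPV for boolean predictions."""
    tp = sum(p and t for p, t in zip(pred, truth))
    tn = sum((not p) and (not t) for p, t in zip(pred, truth))
    fp = sum(p and (not t) for p, t in zip(pred, truth))
    fn = sum((not p) and t for p, t in zip(pred, truth))

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),   # true-positive rate
        "specificity": ratio(tn, tn + fp),   # true-negative rate
        "ppv": ratio(tp, tp + fp),           # positive predictive value
        "npv": ratio(tn, tn + fn),           # negative predictive value
    }
```

PPV and NPV depend on prevalence, so the same sensitivity and specificity give very different predictive values for the common alpha <60° class and the rare beta >77° class.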

Agreement test to images with Graf's structural labels

Because the predicted keypoints could be correctly parsed into Graf's classification, we further examined, from a clinician's perspective, the model's ability to capture the fine-grained structure of the hip from which the final diagnostic decision is made. Participant 1 agreed with 99.1% of both the ground-truth and the inference images. Participants 2 and 3 agreed with 96.3% and 92.6% of the ground-truth images and with 97.7% and 95.9% of the inference images, respectively [Fig. 5D]. The proportions did not differ significantly between the two groups for any of the three participants (χ² = [0.252, 0.574, 1.511], p = [0.616, 0.315, 0.219], df = [1, 1, 1]). On average, the participants agreed with 96% of the ground-truth images and 97.6% of the inference images. These results indicate that participants accepted model-labeled images as often as ground-truth images, suggesting that our automatic method identifies hip structures at a human level.

Generalization test for the model

A generalization test was designed to assess the model's ability to generalize to the hold-out set. The hold-out images (218 images) were collected after the development set and were unlabeled. Three participants labeled the hold-out images, and the alpha and beta angles derived from their labels were compared with those of the model. As shown in [Table 5], the correlation coefficients for the alpha angle were at least 0.85, indicating that the model produced alpha angles comparable to those of the participants. Although the correlation coefficients for the beta angle were around 0.6, the classification agreement for beta >77° remained high (mean across participants, 92%).

Table 5.

Generalization test for the hold-out set.

Participant	Keypoint	ΔD MAE (SD), mm	Graf's angle	Correlation coefficient (R)	Bland–Altman bias, ° (SD)	Classification agreement <60°	Classification agreement <50°	Classification agreement >77°
Participant 1	#1	2.32 (1.24)	Alpha	0.89	0.94° (0.244)	91.70%	95.90%	–
	#2	0.81 (0.52)
	#3	0.37 (0.24)
	#4	0.72 (0.67)	Beta	0.60	−3.49° (0.550)	–	–	88.50%
	#5	0.61 (0.64)
Participant 2	#1	1.07 (0.75)	Alpha	0.85	0.30° (0.297)	88.10%	95.00%	–
	#2	1.09 (0.69)
	#3	0.84 (0.60)
	#4	1.18 (0.73)	Beta	0.57	0.13° (0.585)	–	–	91.30%
	#5	1.92 (0.79)
Participant 3	#1	1.43 (0.99)	Alpha	0.86	−3.47° (0.240)	77.50%	98.20%	–
	#2	2.07 (0.60)
	#3	0.42 (0.29)
	#4	1.89 (0.88)	Beta	0.61	12.43° (0.548)	–	–	96.30%
	#5	3.59 (0.93)


Abbreviations: MAE: mean absolute error (mm); SD: standard error (mm); ΔD: distance error (Euclidean distance, mm).

Discussion

We developed a novel model for registering five keypoints on 2D ultrasonograms. These keypoints correspond to well-known anatomical structures of the hip and can be used to evaluate DDH based on Graf's method. We showed that the predicted keypoint locations closely matched the ground truth provided by orthopedists. Furthermore, clinicians familiar with the ultrasonographic examination of DDH agreed with most of the predictions. Previous algorithms have assisted DDH diagnosis in various ways, including determining the optimal frame by training additional classifiers [32], using segmentation-based methods to compute average probability maps [33,35,43], choosing the frame with maximal coverage of the region of interest [34], and redefining the apex of the acetabular convexity and the alpha and beta angles to link them with Graf's classification [33,44]. Rather than characterizing the holistic structure of the hip, our study showed that five keypoints suffice to localize its key anatomical structures and achieve excellent classification performance.

The diagnostic procedure has two steps: (1) detection of the five keypoint landmarks and (2) measurement of the alpha and beta angles from these landmarks and classification of the hip as dysplastic or normal. Our landmark detection model is similar to models applied to landmark detection on pelvic X-rays [45,46]. To the best of our knowledge, this is the first study to adopt a landmarking strategy for ultrasonograms, which are less standardized and more variable than X-ray images. A similar study that used ultrasonography to develop a multi-detection model achieved 84% accuracy in detecting alpha <60°, with correlation coefficients of 0.764 and 0.743 for the alpha and beta angles, respectively [35]. Another study developed a deep convolutional neural network (DCNN) for detecting alpha <60° on ultrasonography, achieving 86% accuracy and a correlation coefficient of 0.76 for the alpha angle [34]. A conventional feature extraction method showed significant discrepancies in the alpha and beta angles between the manual and automatic methods [32]. Compared with these studies, our method achieved a comparable accuracy of 84.8% for detecting alpha <60° and a higher correlation coefficient for the alpha angle (0.89).

Training novices usually requires guidance from a senior physician, and 2D ultrasound performed by inexperienced users is likely to yield a higher false-positive rate [47]. We showed that our model labels images correctly, so it could provide useful guidance for learning ultrasound-based DDH diagnosis. Regarding inter-rater reliability among experts, [Fig. 4] shows that the differences between model and ground-truth angle measurements (95% CI) were approximately 7° and 14° for the alpha and beta angles, respectively. These findings are consistent with previous studies of interobserver variability among experts. Dias et al. (1993) reported interobserver differences (95% CI) of approximately 12.6° for the alpha angle and 19° for the beta angle [27]. Simon et al. (2004) reported interobserver differences (2SD) of approximately 6.3° and 12.2° for the alpha and beta angles, respectively [48]. Jacobino et al. (2012) reported interobserver differences (2SD) of approximately 9° and 12° for the alpha and beta angles, respectively [49]. These reports suggest that the variability of angle measurements between the model and human experts is comparable to the interobserver variability among human experts using Graf's method. In [Fig. 5], three medical professionals evaluated images labeled with the ground truth and images labeled by the model. They found no difference between the two, suggesting that the model's labeling was as accurate as human labeling. Such labeling could also support ultrasound-assisted DDH diagnosis training by giving users instant feedback. Furthermore, comparing the model's performance with the experts' performance in the generalization test indicates that the model provides consistent measurements that remain comparable to those of experts over time, making it a potentially feasible and efficient screening tool to assist human experts.
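The interobserver figures cited above are summarized as 95% intervals or 2SD ranges of paired differences. One common way to compute such a range for model-versus-expert angles is the Bland–Altman limits of agreement (mean difference ± 1.96 SD). The sketch below is illustrative only: it is not necessarily how [Fig. 4] was produced, and the angles are made-up values rather than study data.

```python
import statistics
from typing import Sequence, Tuple


def limits_of_agreement(model: Sequence[float], expert: Sequence[float],
                        z: float = 1.96) -> Tuple[float, float, float]:
    """Return (bias, lower, upper) Bland-Altman limits for paired angles."""
    if len(model) != len(expert) or len(model) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [m - e for m, e in zip(model, expert)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of differences
    return bias, bias - z * sd, bias + z * sd


# Made-up alpha angles (degrees) for four hips.
model_alpha = [60.0, 62.0, 58.0, 65.0]
expert_alpha = [59.0, 63.0, 58.0, 63.0]
bias, lower, upper = limits_of_agreement(model_alpha, expert_alpha)
print(f"bias={bias:.2f}, 95% limits=({lower:.2f}, {upper:.2f})")
# bias=0.50, 95% limits=(-2.03, 3.03)
```

Using `z=2.0` instead gives a ±2SD range, the form in which two of the cited studies report interobserver differences.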

Several limitations of our model should be considered. First, strictly speaking, there is no “actual” ground truth, because the labels were provided by experts who relied on their subjective experience to identify an “optimal frame” for analysis. Furthermore, a fair comparison between our model and studies using segmentation methods is difficult because it would require a common dataset and labeler. Second, we did not extensively test the model's generalization to real-world data. Its inference results could be compromised by images from different acquisition equipment (with different specifications) or from different clinical pipelines and scanning methods (e.g., probe orientation). Third, our model showed high sensitivity (0.975) in detecting alpha angles <50° but lower sensitivity (0.885) in detecting angles <60°, indicating better performance in detecting patients with severe dysplasia (IIc/D/III/IV). The model's ability to detect patients with mild to moderate dysplasia (IIa/IIb), particularly those with alpha angles between 50° and 60°, is limited, which is consistent with the human observers' performance [Table 4]. Fourth, acquiring images that yield acceptable results in the subsequent analysis, including keypoint identification and alpha and beta angle calculation, may be challenging for less experienced users. Thus, our model may not be applicable for users who lack the skills to acquire proper ultrasound images. Additionally, our analysis did not account for the variability introduced by less experienced operators.
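The third limitation concerns how sensitivity depends on the alpha cut-off. The sketch below computes sensitivity and specificity for a given threshold by treating a ground-truth alpha below the threshold as a positive case and comparing it with the model's alpha. The angle lists are invented, not the study's data; they are chosen so that a hip whose true alpha lies in the 50–60° band is missed at the 60° cut-off.

```python
from typing import Sequence, Tuple


def sens_spec(truth_alpha: Sequence[float], pred_alpha: Sequence[float],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity for detecting alpha < threshold."""
    if len(truth_alpha) != len(pred_alpha):
        raise ValueError("angle lists must be paired")
    tp = fn = tn = fp = 0
    for t, p in zip(truth_alpha, pred_alpha):
        actual, predicted = t < threshold, p < threshold
        if actual and predicted:
            tp += 1
        elif actual:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Invented alpha angles (degrees); the hip at 58 deg is predicted as 62 deg.
truth = [45.0, 48.0, 55.0, 58.0, 65.0, 70.0]
pred = [46.0, 49.0, 57.0, 62.0, 63.0, 72.0]
for cut in (50.0, 60.0):
    print(cut, sens_spec(truth, pred, cut))
# 50.0 (1.0, 1.0)
# 60.0 (0.75, 1.0)
```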

We observed larger horizontal errors for keypoints #1 and #2 (ΔX approximately 1 mm for both). This is because the ground-truth labeling instructions asked annotators to mark these two keypoints at the vertical positions of the ends of the ilium, regardless of their horizontal positions. These horizontal errors (ΔX) do not affect the measurement of the alpha and beta angles, which rely on the vertical positions of both keypoints. We also observed larger errors in the predicted beta angles, resulting in a lower correlation coefficient with the ground truth (R = 0.68). Because the beta angle depicts the labrum, this variability may reflect the difficulty of identifying the labrum. In addition, because the labrum is relatively short, a small variation in keypoint location causes a large change in the angle. Clinically, the beta angle serves as an auxiliary measure of dysplasia severity and does not affect the DDH diagnosis.
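The effect of segment length follows from simple geometry: if one endpoint of a segment of length L shifts by d perpendicular to it, the segment rotates by arctan(d/L), so the same localization error rotates a short segment more than a long one. The lengths below are hypothetical and only contrast a short, labrum-like segment with a longer one.

```python
import math


def rotation_from_endpoint_error(length_mm: float, error_mm: float) -> float:
    """Rotation (degrees) of a segment when one endpoint shifts perpendicular to it."""
    if length_mm <= 0:
        raise ValueError("segment length must be positive")
    return math.degrees(math.atan2(error_mm, length_mm))


# Hypothetical lengths: the same 1 mm error on a short and a long segment.
for length in (5.0, 20.0):
    print(length, round(rotation_from_endpoint_error(length, 1.0), 1))
# 5.0 11.3
# 20.0 2.9
```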

In practice, ultrasound-assisted DDH diagnosis should include long-term follow-up, because immature hips can become normal by three months of age [50]. This tool offers the possibility of continuously monitoring hip development. Given the demand for widespread neonatal screening, computational efficiency is necessary for practical use. Our model is easy to train, and labeling five keypoints is likely much simpler than the annotation required by other methods.

The proposed model can assist in locating the anatomical structures needed for ultrasound-based DDH diagnosis, with alpha-angle agreement of approximately ±7°. The angles derived from the registered keypoints were highly consistent with those measured by clinicians. In addition, our model is robust and economically feasible, and could be readily integrated into a portable device for use in local clinics or remote areas where medical resources are scarce.

Funding

The authors acknowledge the statistical assistance and support of the Maintenance Project of the Center for Artificial Intelligence in Medicine at Chang Gung Memorial Hospital (Grant CLRPG3H0013) for study design, data analysis, and interpretation; the Chang Gung Medical Foundation (Grants CMRPG3K1731 and CMRPG3L0441) for manpower; and the National Science and Technology Council, Taiwan (MOST 111-2221-E-182A-007-MY3) for data collection and analysis.

Conflicts of interest

The authors declare no conflicts of interest.

Acknowledgment

None.

Footnotes

Peer review under responsibility of Chang Gung University.

Contributor Information

Chang-Fu Kuo, Email: zandis@gmail.com.

Hsuan-Kai Kao, Email: samiyadondon@gmail.com.

References

1. Sharpe P, Mulpuri K, Chan A, Cundy PJ. Differences in risk factors between early and late diagnosed developmental dysplasia of the hip. Arch Dis Child Fetal Neonatal Ed. 2006;91(3):F158–62. doi: 10.1136/adc.2004.070870.
2. Barlow T. Early diagnosis and treatment of congenital dislocation of the hip. J Bone Joint Surg. 1962;44(2):292–301.
3. Azzopardi T, Van Essen P, Cundy PJ, Tucker G, Chan A. Late diagnosis of developmental dysplasia of the hip: an analysis of risk factors. J Pediatr Orthop B. 2011;20(1):1–7. doi: 10.1097/BPB.0b013e3283415927.
4. Malvitz TA, Weinstein SL. Closed reduction for congenital dysplasia of the hip. Functional and radiographic results after an average of thirty years. J Bone Joint Surg Am. 1994;76(12):1777–1792. doi: 10.2106/00004623-199412000-00004.
5. Vitale MG, Skaggs DL. Developmental dysplasia of the hip from six months to four years of age. J Am Acad Orthop Surg. 2001;9(6):401–411. doi: 10.5435/00124635-200111000-00005.
6. Mardam-Bey TH, MacEwen GD. Congenital hip dislocation after walking age. J Pediatr Orthop. 1982;2(5):478–486. doi: 10.1097/01241398-198212000-00003.
7. Thomas SR. A review of long-term outcomes for late presenting developmental hip dysplasia. Bone Joint J. 2015;97(6):729–733. doi: 10.1302/0301-620X.97B6.35395.
8. Ge Y, Cai H, Wang Z. Quality of reduction and prognosis of developmental dysplasia of the hip: a retrospective study. Hip Int. 2016;26(4):355–359. doi: 10.5301/hipint.5000348.
9. Palmén K. Preluxation of the hip joint. Diagnosis and treatment in the newborn and the diagnosis of congenital dislocation of the hip joint in Sweden during the years 1948–1960. Acta Paediatr. 1961;50(6):655–657. doi: 10.1111/j.1651-2227.1961.tb07129.x.
10. Barlow T. Early diagnosis and treatment of congenital dislocation of the hip. J Bone Joint Surg. 1962;44(2):292–301.
11. El-Shazly M, Trainor B, Kernohan WG, Turner I, Haugh PE, Johnston AF, et al. Reliability of the Barlow and Ortolani tests for neonatal hip instability. J Med Screen. 1994;1(3):165–168. doi: 10.1177/096914139400100306.
12. Mahan ST, Katz JN, Kim YJ. To screen or not to screen? A decision analysis of the utility of screening for developmental dysplasia of the hip. J Bone Joint Surg Am. 2009;91(7):1705–1719. doi: 10.2106/JBJS.H.00122.
13. Xu W, Shu L, Gong P, Huang C, Xu J, Zhao J, et al. A deep-learning aided diagnostic system in assessing developmental dysplasia of the hip on pediatric pelvic radiographs. Front Pediatr. 2022;9:785480. doi: 10.3389/fped.2021.785480.
14. Den H, Ito J, Kokaze A. Diagnostic accuracy of a deep learning model using YOLOv5 for detecting developmental dysplasia of the hip on radiography images. Sci Rep. 2023;13(1):6693. doi: 10.1038/s41598-023-33860-2.
15. Fraiwan M, Al-Kofahi N, Ibnian A, Hanatleh O. Detection of developmental dysplasia of the hip in X-ray images using deep transfer learning. BMC Med Inform Decis Mak. 2022;22(1):216. doi: 10.1186/s12911-022-01957-9.
16. Zhang SC, Sun J, Liu CB, Fang JH, Xie HT, Ning B. Clinical application of artificial intelligence-assisted diagnosis using anteroposterior pelvic radiographs in children with developmental dysplasia of the hip. Bone Joint J. 2020;102(11):1574–1581. doi: 10.1302/0301-620X.102B11.BJJ-2020-0712.R2.
17. Kitay A, Widmann RF, Doyle SM, Do HT, Green DW. Ultrasound is an alternative to X-ray for diagnosing developmental dysplasia of the hips in 6-month-old children. HSS J. 2019;15(2):153–158. doi: 10.1007/s11420-018-09657-9.
18. Aramesh M, Zanganeh KA, Dehdashtian M, Malekian A, Fatahiasl J. Evaluation of radiation dose received by premature neonates admitted to neonatal intensive care unit. J Clin Med Res. 2017;9(2):124. doi: 10.14740/jocmr2796w.
19. Õmeroğlu H. Use of ultrasonography in developmental dysplasia of the hip. J Child Orthop. 2014;8(2):105–113. doi: 10.1007/s11832-014-0561-8.
20. Falliner A, Schwinzer D, Hahne HJ, Hedderich J, Hassenpflug J. Comparing ultrasound measurements of neonatal hips using the methods of Graf and Terjesen. J Bone Joint Surg Br. 2006;88(1):104–106. doi: 10.1302/0301-620X.88B1.16419.
21. Graf R. The diagnosis of congenital hip-joint dislocation by the ultrasonic Combound treatment. Arch Orthop Trauma Surg. 1980;97(2):117–133. doi: 10.1007/BF00450934.
22. Rosendahl K, Markestad T, Lie RT, Sudmann E, Geitung JT. Cost-effectiveness of alternative screening strategies for developmental dysplasia of the hip. Arch Pediatr Adolesc Med. 1995;149(6):643–648.
23. Rosendahl K, Markestad T, Lie RT. Ultrasound screening for developmental dysplasia of the hip in the neonate: the effect on treatment rate and prevalence of late cases. Pediatrics. 1994;94(1):47–52.
24. Dezateux C, Rosendahl K. Developmental dysplasia of the hip. Lancet. 2007;369(9572):1541–1552. doi: 10.1016/S0140-6736(07)60710-7.
25. Loeber JG. Neonatal screening in Europe; the situation in 2004. J Inherit Metab Dis. 2007;30(4):430–438. doi: 10.1007/s10545-007-0644-5.
26. Bar-On E, Meyer S, Harati G, Porat S. Ultrasonography of the hip in developmental hip dysplasia. J Bone Joint Surg Br. 1998;80(2):321–324. doi: 10.1302/0301-620x.80b2.8381.
27. Dias JJ, Thomas IH, Lamont AC, Mody BS, Thompson JR. The reliability of ultrasonographic assessment of neonatal hips. J Bone Joint Surg Br. 1993;75(3):479–482. doi: 10.1302/0301-620X.75B3.8496227.
28. Roovers EA, Boere-Boonekamp MM, Geertsma TS, Zielhuis GA, Kerkhoff AH. Ultrasonographic screening for developmental dysplasia of the hip in infants. Reproducibility of assessments made by radiographers. J Bone Joint Surg Br. 2003;85(5):726–730.
29. Orak MM, Onay T, Çağırmaz T, Elibol C, Elibol FD, Centel T. The reliability of ultrasonography in developmental dysplasia of the hip: how reliable is it in different hands? Indian J Orthop. 2015;49(6):610–614. doi: 10.4103/0019-5413.168753.
30. Jaremko JL, Mabee M, Swami VG, Jamieson L, Chow K, Thompson RB. Potential for change in US diagnosis of hip dysplasia solely caused by changes in probe orientation: patterns of alpha-angle variation revealed by using three-dimensional US. Radiology. 2014;273(3):870–878. doi: 10.1148/radiol.14140451.
31. Liu S, Wang Y, Yang X, Lei B, Liu L, Li SX, et al. Deep learning in medical ultrasound analysis: a review. Engineering. 2019;5(2):261–275.
32. Quader N, Hodgson AJ, Mulpuri K, Schaeffer E, Abugharbieh R. Automatic evaluation of scan adequacy and dysplasia metrics in 2-D ultrasound images of the neonatal hip. Ultrasound Med Biol. 2017;43(6):1252–1262. doi: 10.1016/j.ultrasmedbio.2017.01.012.
33. Hareendranathan AR, Mabee M, Punithakumar K, Noga M, Jaremko JL. Toward automated classification of acetabular shape in ultrasound for diagnosis of DDH: contour alpha angle and the rounding index. Comput Methods Programs Biomed. 2016;129:89–98. doi: 10.1016/j.cmpb.2016.03.013.
34. Golan D, Donner Y, Mansi C, Jaremko J, Ramachandran M. Fully automating Graf's method for DDH diagnosis using deep convolutional neural networks. In: Deep Learning and Data Labeling for Medical Applications. Springer; 2016. pp. 130–141.
35. Lee SW, Ye HU, Lee KJ, Jang WY, Lee JH, Hwang SM, et al. Accuracy of new deep learning model-based segmentation and key-point multi-detection method for ultrasonographic developmental dysplasia of the hip (DDH) screening. Diagnostics (Basel). 2021;11(7):1174. doi: 10.3390/diagnostics11071174.
36. Rosendahl K, Aslaksen A, Lie RT, Markestad T. Reliability of ultrasound in the early diagnosis of developmental dysplasia of the hip. Pediatr Radiol. 1995;25(3):219–224. doi: 10.1007/BF02021541.
37. Simonyan K, Vedaldi A, Zisserman A. Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034. 2013. doi: 10.48550/arXiv.1312.6034.
38. Tompson J, Jain A, LeCun Y, Bregler C. Joint training of a convolutional network and a graphical model for human pose estimation. arXiv preprint arXiv:1406.2984. 2014. doi: 10.48550/arXiv.1406.2984.
39. Zhou Z, Siddiquee MMR, Tajbakhsh N, Liang J. UNet++: a nested U-Net architecture for medical image segmentation. Deep Learn Med Image Anal Multimodal Learn Clin Decis Support. 2018;11045:3–11.
40. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. 2014. doi: 10.48550/arXiv.1409.1556.
41. He K, Zhang X, Ren S, Sun J. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. Proceedings of the IEEE International Conference on Computer Vision. 2015:1026–1034.
42. Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. 2014. doi: 10.48550/arXiv.1412.6980.
43. El-Hariri H, Mulpuri K, Hodgson A, Garbi R. Comparative evaluation of hand-engineered and deep-learned features for neonatal hip bone segmentation in ultrasound. In: Medical Image Computing and Computer Assisted Intervention – MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part II. Springer; 2019. pp. 12–20.
44. Hareendranathan AR, Zonoobi D, Mabee M, Cobzas D, Punithakumar K, Noga M, et al. Toward automatic diagnosis of hip dysplasia from 2D ultrasound. In: 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017). IEEE; 2017. pp. 982–985. doi: 10.1109/ISBI.2017.7950680.
45. Bier B, Goldmann F, Zaech JN, Fotouhi J, Hegeman R, Grupp R, et al. Learning to detect anatomical landmarks of the pelvis in X-rays from arbitrary views. Int J Comput Assist Radiol Surg. 2019;14(9):1463–1473. doi: 10.1007/s11548-019-01975-5.
46. Yang W, Ye Q, Ming S, Hu X, Jiang Z, Shen Q, et al. Feasibility of automatic measurements of hip joints based on pelvic radiography and a deep learning algorithm. Eur J Radiol. 2020;132:109303. doi: 10.1016/j.ejrad.2020.109303.
47. Mostofi E, Chahal B, Zonoobi D, Hareendranathan A, Roshandeh KP, Dulai SK, et al. Reliability of 2D and 3D ultrasound for infant hip dysplasia in the hands of novice users. Eur Radiol. 2019;29(3):1489–1495. doi: 10.1007/s00330-018-5699-1.
48. Simon EA, Saur F, Buerge M, Glaab R, Roos M, Kohler G. Inter-observer agreement of ultrasonographic measurement of alpha and beta angles and the final type classification based on the Graf method. Swiss Med Wkly. 2004;134(45–46):671–677. doi: 10.4414/smw.2004.10764.
49. Jacobino BCP, Galvão MD, da Silva AF, de Castro CC. Using the Graf method of ultrasound examination to classify hip dysplasia in neonates. Autops Case Rep. 2012;2(2):5–10. doi: 10.4322/acr.2012.018.
50. Woolacott NF, Puhan MA, Steurer J, Kleijnen J. Ultrasonography in screening for developmental dysplasia of the hip in newborns: systematic review. BMJ. 2005;330(7505):1413. doi: 10.1136/bmj.38450.646088.E0.

