Abstract
Objective
The primary aim of this study was to present a newly developed automated method for determining radiographic measurements of hip morphology on dual-energy x-ray absorptiometry (DXA) images. The secondary aim was to compare the performance of the automated and manual measurements.
Design
Thirty DXA scans from 13-year-olds in the prospective population-based cohort study Generation R were randomly selected. The hip shape was outlined automatically using radiographic landmarks, from which the acetabular depth-width ratio (ADR), acetabular index (AI), alpha angle (AA), center edge angle of Wiberg (WCEA), lateral center edge angle (LCEA), extrusion index (EI), neck-shaft angle (NSA), and triangular index (TI) were determined. Manual assessments were performed twice by two orthopedic surgeons. The agreement within and between observers and methods was visualized using Bland-Altman plots, and the reliability was studied using the intraclass correlation coefficient (ICC) with 95 % confidence intervals (CI).
Results
The automated method was able to perform all radiographic hip morphology measurements. The intermethod reliability between the automated and manual measurements ranged from 0.57 to 0.96 and was comparable to or better than the manual interobserver reliability, except for the AI.
Conclusion
This open-access, automated method allows fast and reproducible calculation of radiographic measurements of hip morphology on right hip DXA images. It is a promising tool for performing automated radiographic measurements of hip morphology in large population studies and clinical practice.
Keywords: DXA, Radiography, Radiology, Hip, Morphology
Introduction
Osteoarthritis (OA) of the hip leads to pain, disability, and poor quality of life [1]. Risk factors for developing hip OA include age, genetics, trauma, physical workload, and hip morphology [2–8]. Hip morphology can be quantified using radiographic measurements, which are most often performed manually on anterior-posterior (AP) pelvic radiographs [8]. However, manual measurements are time-consuming, especially when multiple measurements are performed, and the accuracy is highly observer dependent. Moreover, the definitions used for most radiographic measurements of hip morphology are inconsistent, and there is often no clear description of how the measurements are performed, making it challenging to compare between studies [9–11]. A recent international consensus statement on hip pain in young and middle-aged adults therefore recommended detailed and consistent definitions, measurements, and statistical reporting of radiographic measurements of hip morphology [10].
Automation of radiographic measurements of hip morphology has the potential to increase their reproducibility. It would also allow rapid calculation of multiple measurements for each patient. Automated calculation of radiographic measurements would not only aid clinical practice but also enable these measurements to be taken in large population studies. However, automated methods are scarce, poorly described and generally not published as open access.
Hip or pelvic radiographs have traditionally been used to define and classify hip OA and hip morphology. Alternatively, modern dual-energy X-ray absorptiometry (DXA) scanners are increasingly used to assess hip morphology [13–17] and are also suitable for hip OA grading [12]. DXA images of the hip expose study participants to a much lower radiation burden (0.36–70 µSv) than hip or pelvic radiographs (600–700 µSv) [18, 19]. The reduced radiation exposure makes DXA images suitable to study hip morphology, growth plates and hip joint development in a pediatric population.
This study provides a detailed description of a newly developed and open-access, automated method of determining radiographic measurements of hip morphology on DXA images of 13-year-olds from the general population. The secondary aims were to compare the performance of the automated method with manual determination of the radiographic measurements and to assess the reliability of the automated measurements, using intraclass correlation coefficients (ICC).
Methods
Participants
Generation R is a population-based prospective cohort study that follows participants from fetal life until young adulthood in the multi-ethnic urban population of Rotterdam, the Netherlands [20]. Generation R is designed to identify early environmental and genetic factors and causal pathways causing normal and abnormal growth, development and health. The Medical Ethical Committee of the Erasmus MC approved the study (MEC-2015–749). All participants provided written informed consent. The 4625 participants who underwent a DXA scan at around age 13 years formed the population of interest. We randomly selected two training sets of 500 participants each and a different test set of 30 participants around age 13 for validation of the automated measurements.
Image acquisition
DXA scans were obtained by the GE-Lunar iDXA densitometer (GE Healthcare, Madison, WI, USA) and enCORE software (enCORE 2010; GE Healthcare). The participants were scanned in supine position with legs slightly apart and big toes touching, with their feet secured in this position. A unilateral anteroposterior (AP) right hip DXA image and an AP full-body DXA image were acquired consecutively for each patient. We extracted the full-body and right hip images from the enCORE software in BMP format.
Definition and calculation of radiographic measurements of hip morphology
We developed methods in-house to automatically calculate radiographic measurements of hip morphology based on radiographic landmarks. The proximal femur and acetabulum were outlined with 80 radiographic landmarks using the BoneFinder® software (www.bone-finder.com; The University of Manchester, UK). The protocol for radiographic landmark definition can be found in Supplementary material 1. To automate the radiographic landmark placement, an automatic search model (ASM), a random forest-based machine learning algorithm, was trained on the first training set of 500 right hip DXA images [21]. Of these 500 images, 20 images were incomplete, with one or more landmarks missing from the image. This resulted in a training set of 480 images suitable for training of the ASM. This ASM was used to annotate the images of the other training set for development of the automated methods and the 30 images of the separate validation set.
To assess the influence of radiographic landmark placement, a second set of radiographic landmarks was created using the ASM. The radiographic landmark placement for all 30 hips from the validation set was manually assessed, and minor corrections were performed by the same researcher who developed the protocol for radiographic landmark placement. This manual correction was performed to remove the influence of incorrect placement of the radiographic landmarks on the performance of automated measurements. The landmarks that most often needed adjustments were the most lateral bony point of the acetabulum, the most lateral and most medial point of the sourcil, the triradiate cartilage points, the most caudal point of the teardrop, and the landmarks at the start and end of the best fitting circle. Any points equally spaced between these landmarks were also influenced by the changed landmark position. Adjustments were needed in 20–50 % of images. The radiographic measurements were performed twice: once based on the uncorrected landmarks and again based on the manually adjusted landmarks.
The automated method for radiographic measurements was created in Python v3.9.13 [22]. We implemented the following radiographic measurements: the acetabular depth-width ratio (ADR), the acetabular index (AI), the alpha angle (AA), the center edge angle of Wiberg (WCEA), the lateral center edge angle (LCEA), the extrusion index (EI), the neck-shaft angle (NSA), and the triangular index (TI), see Fig. 1. Some of the measurements are based on similar features, such as the femoral head center, the femoral neck axis, and the horizontal reference line of the pelvis. Additionally, similar mathematical concepts are used in different measurements, such as the angle between two vectors. These general concepts will be reported before describing each radiographic measurement in detail. The output of the algorithm is the performed measurement and a visualization of the measurement. This visualization is created upon request by the user to provide insight into how the measurement was calculated. The workflow to calculate the radiographic measurements can be found in Fig. 2.
Fig. 1. Definition of radiographic measurements of hip morphology implemented in this study.
A: The acetabular depth-width ratio (ADR), ADR = (A/B)*1000, is the ratio between the acetabular width (B) measured from the most lateral bony part of the acetabulum to the most inferior point of the teardrop and the acetabular depth (A) measured from the most medial point of the sourcil perpendicular to line B. B: The acetabular index (AI) describes the angle between the horizontal reference line of the pelvis (line 1) and line 2 through the most lateral bony part of the acetabulum and the most lateral point of the triradiate cartilage. C: The alpha angle (AA) is the angle between line 1 through the alpha point and the femoral head center, and the femoral neck axis (line 2). D: The center edge angle of Wiberg (WCEA) is the angle between line 1, a line through the femoral head center perpendicular to the horizontal reference line of the pelvis, and a line from the most lateral part of the sourcil to the femoral head center (line 2). E: The lateral center edge angle (LCEA) is the angle between line 1, a line through the femoral head center perpendicular to the horizontal reference line of the pelvis, and line 2 from the most lateral bony part of the acetabulum to the femoral head center. F: The extrusion index (EI), EI = A/(A + B) * 100 %, is the percentage of the part of the femoral head not covered by the acetabulum (A) compared to the entire width of the femoral head (A + B). G: The neck-shaft angle (NSA) is the angle between the femoral neck axis (line 1) and the femoral shaft axis (line 2). H: The triangular index (TI) is the length of line 3, the line between point S, a point at the intersection of the cortex of the femoral head and line 2, and the femoral head center (C).
Fig. 2.
Overview of the workflow to obtain the automated radiographic measurements of hip morphology on the AP hip DXA, as well as the horizontal reference line of the pelvis on the AP full-body DXA. AP: anteroposterior. DXA: dual-energy x-ray absorptiometry. ADR: acetabular depth-width ratio. AI: acetabular index. AA: alpha angle. WCEA: center edge angle of Wiberg. LCEA: lateral center edge angle. EI: extrusion index. NSA: neck shaft angle. TI: triangular index.
General concepts
The angle between two vectors
We used Walker’s [23] method for calculating the angle between two vectors to determine the angle between landmarks. Each vector was defined by two landmarks or originated in a landmark and pointed in the direction of one of the image’s axes. The angle between the vectors can be calculated using Eq. (1).
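$$\theta = \arccos\left(\frac{\vec{u}\cdot\vec{v}}{\lVert\vec{u}\rVert\,\lVert\vec{v}\rVert}\right) \tag{1}$$
where $\vec{u}$ and $\vec{v}$ are the two vectors and $\theta$ is the angle between them.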
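As an illustration, the angle between two landmark-defined vectors could be computed as sketched below. This assumes the standard normalized dot-product form of the angle formula; the function names `vector` and `angle_between` are hypothetical and are not taken from the published code.

```python
import math

def vector(p, q):
    """Vector pointing from landmark p to landmark q (2D image coordinates)."""
    return (q[0] - p[0], q[1] - p[1])

def angle_between(u, v):
    """Unsigned angle in degrees between 2D vectors u and v (assumed arccos form)."""
    dot = u[0] * v[0] + u[1] * v[1]
    norm_u = math.hypot(u[0], u[1])
    norm_v = math.hypot(v[0], v[1])
    if norm_u == 0 or norm_v == 0:
        raise ValueError("zero-length vector")
    # Clamp to [-1, 1] so rounding error cannot push acos outside its domain
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))

# Example: a vector defined by two landmarks against the image x-axis
line = vector((10.0, 10.0), (20.0, 20.0))
print(angle_between(line, (1.0, 0.0)))
```

A vector "in the direction of one of the image's axes" is simply `(1, 0)` or `(0, 1)` in this representation.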
The horizontal reference line of the pelvis
The horizontal reference line of the pelvis is determined automatically on the full-body DXA image to correct for potential pelvic obliquity. The correction of potential pelvic obliquity is applied to the AI, the WCEA, and the LCEA. Two landmarks on both hips were used: the most inferior point of the ischial tuberosity and the most superior point of the obturator foramen.
The slope of the line through each set of landmarks on both hips relative to the horizontal axis of the image was determined as the angle between two vectors, Eq. (1). The horizontal reference line of the pelvis was then determined to be the mean of these measurements. A negative slope indicates that the right hip is positioned more cranial than the left hip.
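A minimal sketch of this averaging step is given below. It assumes that each landmark pair is passed in a fixed order (image-left point first) and that the slope is expressed as a signed angle relative to the image x-axis; the function names and the sign convention are illustrative assumptions, since image coordinate systems differ in the direction of the y-axis.

```python
import math

def line_angle_deg(p_first, p_second):
    """Signed angle (degrees) of the line p_first -> p_second relative to the image x-axis."""
    dx = p_second[0] - p_first[0]
    dy = p_second[1] - p_first[1]
    return math.degrees(math.atan2(dy, dx))

def pelvic_reference_angle(ischial_pair, obturator_pair):
    """Mean of the two bilateral landmark-line angles (hypothetical helper).

    ischial_pair:   (point on one hip, matching point on the other hip)
    obturator_pair: (point on one hip, matching point on the other hip)
    """
    a1 = line_angle_deg(*ischial_pair)
    a2 = line_angle_deg(*obturator_pair)
    return (a1 + a2) / 2.0

# Example with made-up coordinates: one line is level, the other slightly tilted
print(pelvic_reference_angle(((0.0, 0.0), (100.0, 0.0)),
                             ((0.0, 0.0), (100.0, 10.0))))
```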
Femoral head center
The femoral head center is defined as the center of the best-fitting circle around the femoral head. We selected the hyper fit to determine the best-fitting circle since it offers fast calculation owing to its non-iterative nature [24]. Additionally, it has no essential bias and outperforms the geometric circle fit [24], which is viewed as the gold standard.
The circle fit was optimized for each hip by performing the calculations with nine different combinations of points (Fig. 3), to obtain the circle fit with both the smallest root mean square error of the distances between the points and the circle and the smallest radius. A trade-off was made between these two features if they were not achieved by the same circle. This optimization was performed to prevent the best-fitting circle from becoming too large and from being influenced by possible erroneous points.
Fig. 3. All possible combinations of radiographic landmark points used to define the best-fitting circle.
A: All femoral head points. B: One point less on the lateral side of the femoral head. C: One point less on the medial side of the femoral head. D: One point less on both the lateral and the medial side of the femoral head. E: Two points less on the lateral side of the femoral head. F: Two points less on the medial side of the femoral head. G: One point less on the lateral and two points less on the medial side of the femoral head. H: Two points less on the lateral and one point less on the medial side of the femoral head. I: Two points less on both the lateral and the medial side of the femoral head.
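The subset search over the nine point combinations can be sketched as follows. Note the substitutions: this sketch uses a simple algebraic (Kåsa) least-squares circle fit rather than the hyper fit used in the study, and it selects the candidate with the smallest root mean square error, using the radius only as a tie-breaker, which is an assumed simplification of the trade-off described above. All function names are hypothetical.

```python
import math
from itertools import product

def fit_circle_kasa(points):
    """Algebraic least-squares circle fit (Kasa), NOT the hyper fit from the paper."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    u = [p[0] - mx for p in points]  # centered coordinates
    v = [p[1] - my for p in points]
    suu = sum(a * a for a in u)
    svv = sum(b * b for b in v)
    suv = sum(a * b for a, b in zip(u, v))
    rhs1 = 0.5 * (sum(a ** 3 for a in u) + sum(a * b * b for a, b in zip(u, v)))
    rhs2 = 0.5 * (sum(b ** 3 for b in v) + sum(b * a * a for a, b in zip(u, v)))
    det = suu * svv - suv * suv
    if abs(det) < 1e-12:
        raise ValueError("points are (nearly) collinear")
    uc = (rhs1 * svv - rhs2 * suv) / det  # Cramer's rule on the 2x2 system
    vc = (suu * rhs2 - suv * rhs1) / det
    radius = math.sqrt(uc * uc + vc * vc + (suu + svv) / n)
    return (uc + mx, vc + my), radius

def rmse(points, center, radius):
    """Root mean square of the radial distances between points and circle."""
    sq = [(math.hypot(x - center[0], y - center[1]) - radius) ** 2 for x, y in points]
    return math.sqrt(sum(sq) / len(points))

def best_circle(head_points, max_drop=2):
    """Try dropping 0..max_drop points at each end (3 x 3 = 9 combinations).

    head_points must be ordered along the head contour (lateral to medial).
    """
    candidates = []
    for drop_lat, drop_med in product(range(max_drop + 1), repeat=2):
        subset = head_points[drop_lat:len(head_points) - drop_med]
        if len(subset) < 3:
            continue
        center, radius = fit_circle_kasa(subset)
        candidates.append((rmse(subset, center, radius), radius, center))
    err, radius, center = min(candidates, key=lambda c: (c[0], c[1]))
    return center, radius, err
```

With `max_drop=2` the loop enumerates exactly the nine combinations A to I listed in Fig. 3.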
Femoral neck axis
The femoral neck axis is the axis through the femoral head center and the femoral neck center. To determine the femoral neck center, the distance between all femoral neck landmarks was calculated. The femoral neck center was then determined as the center of all distances.
Radiographic measurements of hip morphology
Acetabular depth-width ratio
The acetabular depth-width ratio (ADR) measures the acetabular depth. The ADR is the ratio of the length of the acetabular depth (A) to the entire length of the acetabular opening (B); see Eqs. (2-4). The length of the acetabular opening was measured as the distance between the most lateral point of the bony acetabulum (LA) and the most inferior point of the teardrop (TD). The acetabular depth was measured from the most medial point of the sourcil (MS), the weight-bearing part of the acetabulum, to line B (see Fig. 1A).
| (2) |
| (3) |
| (4) |
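The ADR computation can be illustrated with the sketch below, which follows the definition ADR = (A/B)*1000 from Fig. 1A. The perpendicular distance from MS to line B is obtained here with a 2D cross product, which is an assumed (standard) way to compute a point-to-line distance and not necessarily the formulation used in the published code; the function name is hypothetical.

```python
import math

def acetabular_depth_width_ratio(la, td, ms):
    """ADR = (A / B) * 1000 from three landmarks given as (x, y) tuples.

    la: most lateral bony point of the acetabulum
    td: most inferior point of the teardrop
    ms: most medial point of the sourcil
    """
    bx, by = td[0] - la[0], td[1] - la[1]
    width = math.hypot(bx, by)  # B: length of the acetabular opening
    if width == 0:
        raise ValueError("LA and TD coincide")
    # A: perpendicular distance from MS to the line through LA and TD
    depth = abs(bx * (ms[1] - la[1]) - by * (ms[0] - la[0])) / width
    return depth / width * 1000.0

# Made-up coordinates: opening of 100 px, depth of 30 px
print(acetabular_depth_width_ratio((0.0, 0.0), (100.0, 0.0), (40.0, 30.0)))
```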
Acetabular index
The acetabular index (AI), also known as the Tönnis angle, describes the acetabular roof inclination. The AI is the angle between the horizontal reference line of the pelvis, line 1, and the line through the most lateral point of the triradiate cartilage and the most lateral bony point of the acetabulum, line 2 (Fig. 1B) [25]. The AI is determined by calculating the angle between the vector representing line 2 and a vector originating in the most lateral point of the triradiate cartilage parallel to the horizontal axis of the image, Eq. (1). The resulting AI is corrected for any potential pelvic obliquity with the horizontal reference line of the pelvis (HRLP), using Eqs. (5) and (6) for the left and right hip, respectively.
| (5) |
| (6) |
Alpha angle
The alpha angle (AA) is used to detect asphericity (cam morphology) of the femoral head. The AA is the angle between the line through the alpha point and the femoral head center and the femoral neck axis (see Fig. 1C). The alpha point is defined as the point where the femoral head or femoral neck leaves the best-fitting circle. To simulate clinical practice, if only a small bony protrusion leaves the best-fitting circle but returns inside the best-fitting circle around the head-neck junction, the alpha point is indicated at the first point on the femoral neck that leaves the best-fitting circle (see Fig. 4). First, the best-fitting circle is calculated as described previously. To find the alpha point, all radiographic landmarks in the femoral head-neck area were checked to see whether any of them were outside of the best-fitting circle. Previous studies have reported a more oval-like shape of the femoral head in children aged 13 years, which might result in higher alpha angles while the femoral head seems spherical [26]. Therefore, an error margin of 4 % of the radius of the best-fitting circle is used to avoid false positives. An error margin in relation to the radius was chosen so that the error margin was not affected by the size of the hip joint. Error margins of 2–7 % were evaluated in the second training set of 500 images (containing 472 complete images), to optimize the detection of cam deformity without creating too many false positives. The AA measurement was assessed visually by an orthopedic surgeon (RA). The error margin of 4 % was chosen since it prevented under-detection of a cam deformity, with 1 false negative and 7 false positives among the 472 images. To refine the alpha point detection, an interpolating B-spline is fitted through the lateral femoral head and neck points using the interpolate module of the Python SciPy package [27]. This allows for identification of the alpha point around the radiographic landmark that is the first point outside of the best-fitting circle. Lastly, the intersection of the best-fitting circle and the B-spline around this radiographic landmark is defined as the alpha point. Once the alpha point is identified, the AA is calculated using Eq. (1).
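The landmark-level part of this search (before the B-spline refinement) might look like the sketch below. It assumes the landmarks are ordered from the femoral head towards the femoral neck, and it handles the "small protrusion" rule by walking back from the neck end to the last landmark that is still inside the circle. The function name, the ordering, and the backward-walk strategy are illustrative assumptions; the spline intersection step is not shown.

```python
import math

def alpha_landmark_index(landmarks, center, radius, margin=0.04):
    """Index of the first landmark of the final, sustained departure from the circle.

    landmarks: (x, y) tuples ordered from femoral head towards femoral neck (assumed)
    margin:    tolerance as a fraction of the radius (4 % in the study)
    Returns None when no landmark lies outside the tolerated circle at the neck end.
    """
    limit = radius * (1.0 + margin)
    outside = [math.hypot(x - center[0], y - center[1]) > limit for x, y in landmarks]
    # Walk back from the neck end while landmarks are still outside the circle;
    # an isolated protrusion that returns inside is skipped this way.
    idx = len(outside)
    while idx > 0 and outside[idx - 1]:
        idx -= 1
    return idx if idx < len(outside) else None

# Made-up radial profile: the third point is a small protrusion that returns inside
profile = [(10.0, 0.0), (10.2, 0.0), (10.8, 0.0), (10.1, 0.0),
           (10.3, 0.0), (11.0, 0.0), (12.0, 0.0), (14.0, 0.0)]
print(alpha_landmark_index(profile, (0.0, 0.0), 10.0))
```

Expressing the margin as a fraction of the radius keeps the tolerance independent of hip size, as described above.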
Fig. 4.
Example of small bony protrusion outside of the best-fitting circle (indicated by the arrow and highlighted in yellow), which returns inside the best-fitting circle around the head-neck junction. The alpha angle is indicated using the light green lines.
Center edge angle
The center edge angle (CEA) is the angle between the vertical line through the femoral head center, perpendicular to the horizontal reference line of the pelvis, and the line tangential to the lateral margin of the acetabulum. On an AP hip DXA image, two types of CEA can be measured: the CEA of Wiberg (WCEA) and the lateral CEA (LCEA). The WCEA is measured from the most lateral point of the sourcil (weight-bearing part of the acetabulum) and represents the anterosuperior coverage of the femoral head (see Fig. 1D). The LCEA is measured from the acetabulum’s most lateral bony point and represents the femoral head’s superolateral coverage (see Fig. 1E).
To calculate the CEA, the center of the femoral head is determined as described previously. Next, we calculate the CEA as the angle between the vector from the center of the femoral head parallel to the vertical axis of the image and the vector from the center of the femoral head to the most lateral point of the bony acetabulum or sourcil (Eq. (1)). The found CEA is corrected for any potential pelvic obliquity using the horizontal reference line of the pelvis (HRLP) using Eqs. (7) and (8) for the left and right hip respectively.
| (7) |
| (8) |
Extrusion index
The extrusion index (EI) describes the femoral head coverage by the acetabulum. The EI is the percentage of the total femoral head width (A + B) not covered by the bony acetabulum (A) (see Fig. 1F). The width of the femoral head is calculated as the difference in x-coordinate between the most lateral and most medial points of the femoral head. The uncovered part of the femoral head is calculated as the difference in x-coordinate between the most lateral point of the femoral head and the most lateral bony point of the acetabulum. Lastly, the EI can be determined using Eq. (9).
$$\mathrm{EI} = \frac{A}{A + B} \times 100\,\% \tag{9}$$
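A short sketch of the EI calculation from x-coordinates is shown below. The function name is hypothetical, and the handling of the lateral direction (so that the result does not depend on whether lateral corresponds to smaller or larger x in the image) is an added assumption.

```python
def extrusion_index(head_lateral_x, head_medial_x, acetabulum_lateral_x):
    """EI = A / (A + B) * 100 %, using x-coordinates only.

    A is the uncovered part of the femoral head, A + B the full head width.
    """
    width = abs(head_lateral_x - head_medial_x)  # A + B
    if width == 0:
        raise ValueError("femoral head width is zero")
    # Direction from medial to lateral along the x-axis (+1 or -1)
    lateral_sign = 1.0 if head_lateral_x > head_medial_x else -1.0
    uncovered = (head_lateral_x - acetabulum_lateral_x) * lateral_sign  # A
    return 100.0 * uncovered / width

# Made-up coordinates for a hip where lateral lies at smaller x values
print(extrusion_index(20.0, 70.0, 30.0))
```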
Neck shaft angle
The neck shaft angle (NSA) is a measure of coxa vara and valga and is the angle between the neck axis and the shaft axis (see Fig. 1G). To determine the shaft axis, the image is first cropped to below the minor trochanter to increase the accuracy of the shaft axis determination and to speed up the calculation. Next, the femoral shaft is segmented in the cropped image using multi-Otsu thresholding with three levels from the Python scikit-image package [28]. The segmentation is then cleaned up using morphological closing to remove noise and small artifacts and to smooth the edges of the segmentation result [29]. The shaft axis is defined as the line midway between the lateral and medial cortices. The shaft axis is only determined if the shaft is visible for a length of at least half of the radius of the best-fitting circle below the minor trochanter. The NSA is calculated as the angle between two lines using the slope of the shaft axis (m1) and the slope of the neck axis (m2) with Eq. (10).
| (10) |
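For illustration, the angle between two lines given their slopes can be computed with the textbook relation tan θ = |(m1 − m2) / (1 + m1·m2)|. Whether Eq. (10) has exactly this form is an assumption. The relation yields the acute angle between the lines, so the sketch also assumes that the NSA is reported as the obtuse supplementary angle; both function names are hypothetical.

```python
import math

def acute_angle_between_lines(m1, m2):
    """Acute angle in degrees between two lines with slopes m1 and m2."""
    denom = 1.0 + m1 * m2
    if denom == 0:
        return 90.0  # perpendicular lines
    return math.degrees(math.atan(abs((m1 - m2) / denom)))

def neck_shaft_angle(m_shaft, m_neck):
    """Assumed convention: NSA is the obtuse angle between shaft and neck axes."""
    return 180.0 - acute_angle_between_lines(m_shaft, m_neck)

# Made-up slopes: steep shaft axis, oblique neck axis
print(neck_shaft_angle(10.0, -1.0))
```

A slope-based formulation cannot represent a perfectly vertical shaft axis (infinite slope); a vector-based angle would avoid that limitation.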
Triangular index
The triangular index (TI) measures the sphericity of the femoral head, see Fig. 1H. Several structures need to be identified to calculate the TI:
The best-fitting circle of the femoral head, the femoral head center (point C), and the radius of the circle (r),
The neck axis,
Point H at a distance of half the head circle radius from the femoral head center, along the neck axis,
Point S at the cortex of the femoral head, on the line through point H perpendicular to the neck axis,
The TI is the distance between point S and the femoral head center.
A unit vector $\hat{u}$ in the direction of the neck axis was used to find the coordinates of point H (Eq. (11)). The unit vector was calculated using Eq. (12), where $\vec{v}$ is the vector from the femoral head center to the femoral neck center. Next, the line perpendicular to the neck axis through point H was determined. This line could then be used to identify point S, using methods similar to those for finding the alpha point, namely finding the intersection between this perpendicular line and the B-spline through all femoral head-neck points. Lastly, the TI was calculated as the distance between the femoral head center and point S.
$$H = C + \frac{r}{2}\,\hat{u} \tag{11}$$
$$\hat{u} = \frac{\vec{v}}{\lVert\vec{v}\rVert} \tag{12}$$
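The construction of point H and of the perpendicular search direction can be sketched as follows. The function names are hypothetical, and the step that locates point S on the cortex (the B-spline intersection) is not shown; point S is simply passed in.

```python
import math

def point_h_and_normal(head_center, neck_center, radius):
    """Point H at r/2 from the head center along the neck axis, plus a unit
    vector perpendicular to the neck axis (the direction in which S is sought)."""
    vx = neck_center[0] - head_center[0]
    vy = neck_center[1] - head_center[1]
    norm = math.hypot(vx, vy)
    if norm == 0:
        raise ValueError("head center and neck center coincide")
    ux, uy = vx / norm, vy / norm  # unit vector along the neck axis
    h = (head_center[0] + 0.5 * radius * ux,
         head_center[1] + 0.5 * radius * uy)
    normal = (-uy, ux)  # neck-axis direction rotated by 90 degrees
    return h, normal

def triangular_index(head_center, point_s):
    """TI: distance between the femoral head center C and point S."""
    return math.hypot(point_s[0] - head_center[0], point_s[1] - head_center[1])

h, normal = point_h_and_normal((0.0, 0.0), (10.0, 0.0), 8.0)
print(h, normal)
```

As a sanity check, for a perfectly circular head point S lies on the best-fitting circle, so the TI equals the radius r; larger values indicate bone outside the circle at that location.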
Manual determination of the radiographic measurements of hip morphology
On 30 right hip DXA scans, two experienced orthopedic surgeons performed a manual assessment of all the above-described radiographic measurements of hip morphology at two different time points, at least one month apart. Both surgeons had access to a protocol providing precise descriptions of the measurements (see Supplementary material 2). The measurements were performed using a DICOM viewer (Synedra View, Version 21.0.0, Synedra Information Technologies). The mean of all four measurements was used as the reference standard to which the automated method was compared.
Statistical analysis
The agreement within and between observers, between automated measurements on the automated landmarks and the manually adjusted landmarks, and between methods was investigated using Bland-Altman plots with limits of agreement. The Bland-Altman plot shows the differences between the reference standard and the automated measurements plotted against the average of the two measurements for each individual hip. The limits of agreement were estimated as the mean difference ± 1.96 standard deviations of the differences between the reference standard and the automated measurements, that is, the interval expected to contain 95 % of the differences. Reliability was tested through intraclass correlation coefficients (ICCs) and reported with 95 % confidence intervals. Intraobserver reliability was tested with a 2-way mixed-effects model, single rater, absolute agreement ICC. Interobserver reliability between both manual observers and between the automated and manually adjusted landmarks was tested with a 2-way random-effects model, single rater, absolute agreement ICC. Lastly, intermethod reliability was tested with a 2-way mixed-effects model, single rater, absolute agreement ICC. ICCs were rated as poor (<0.50), moderate (0.50–0.75), good (0.76–0.90), or excellent (>0.90). All statistical analyses were performed using R Statistical Software (v4.1.0; R Core Team 2021). The ICCs were calculated using the irr-package [30] and the Bland-Altman plots were created using the ggplot2-package [31].
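The bias and limits of agreement can be illustrated with the short sketch below. It assumes the conventional Bland-Altman definition (mean difference ± 1.96 sample standard deviations), which reproduces the limits reported in Table 1; for example, an interobserver ADR bias of –27.1 ± 18.5 gives limits of about –63.4 to 9.2. The study itself used R; this Python function and its name are illustrative only.

```python
import statistics

def bland_altman(reference, automated):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(reference) != len(automated):
        raise ValueError("inputs must have the same length")
    diffs = [r - a for r, a in zip(reference, automated)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Made-up paired measurements for four hips
bias, lower, upper = bland_altman([10.0, 12.0, 14.0, 16.0], [9.0, 12.0, 13.0, 18.0])
print(bias, lower, upper)
```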
Results
The mean age of the participants was 13.6 ± 0.2 years, and 18 (60 %) were female. The automated method was able to perform all radiographic hip measurements for all 30 hips, except for the AI, which could only be measured in 8 of the 30 hips because only those 8 still had an open triradiate cartilage.
Agreement
The Bland-Altman plots for agreement between the manual and automated measurements can be found in Fig. 5. Bland-Altman interobserver and intermethod agreement are presented in Table 1. The manual measurements were consistently higher than the automated measurements for the NSA based on the automated landmarks, and for the AI, the AA, the EI, the NSA, and the TI based on the manually adjusted landmarks. For the ADR, the WCEA, and the LCEA, there was almost no overall difference between the measurements. However, most measurements showed proportional errors, where the difference between the automated and manual measurements depended on the size of the measurement. The limits of agreement were mostly similar or smaller in the comparison of the manual and automated method, as well as in the comparison of the automated measurements on automated and manually adjusted landmarks, than in the comparison of the manual measurements of both observers. Only for the ADR and the AI were the intermethod limits of agreement larger than the interobserver limits of agreement.
Fig. 5. Bland-Altman plots of the mean manual vs. automated hip morphology measurements and the manual measurement as performed by observer 1 vs. observer 2.
A: The acetabular depth-width ratio (ADR) – manual vs automated. B: ADR – observer 1 vs observer 2. C: The acetabular index (AI) – manual vs automated. The AI was only measured in hips with an open triradiate cartilage (n = 8). D: AI – observer 1 vs observer 2. The AI was only measured in hips with an open triradiate cartilage (n = 8). E: The alpha angle (AA) – manual vs automated. F: AA – observer 1 vs observer 2. G: The center edge angle of Wiberg (WCEA) – manual vs automated. H: WCEA – observer 1 vs observer 2. I: The lateral center edge angle (LCEA) – manual vs automated. J: LCEA – observer 1 vs observer 2. K: The extrusion index (EI) – manual vs automated. L: EI – observer 1 vs observer 2. M: The neck-shaft angle (NSA) – manual vs automated. N: NSA – observer 1 vs observer 2. O: The triangular index (TI) – manual vs automated. P: TI – observer 1 vs observer 2.
Table 1. Interobserver and intermethod agreement.
| Measurement | Manual (observer 1 vs observer 2): interobserver bias (mean ± SD) | Manual (observer 1 vs observer 2): interobserver limits of agreement | Automated (automated vs manually adjusted landmarks): interobserver bias (mean ± SD) | Automated (automated vs manually adjusted landmarks): interobserver limits of agreement | Manual vs automated (automated landmarks): intermethod bias (mean ± SD) | Manual vs automated (automated landmarks): intermethod limits of agreement | Manual vs automated (manually adjusted landmarks): intermethod bias (mean ± SD) | Manual vs automated (manually adjusted landmarks): intermethod limits of agreement |
|---|---|---|---|---|---|---|---|---|
| Acetabular depth-width ratio | –27.1 ± 18.5 | –63.4 to 9.2 | 1.2 ± 11.9 | –22.0 to 24.5 | –0.1 ± 21.8 | –42.9 to 42.8 | –1.3 ± 17.4 | –35.5 to 32.9 |
| Acetabular index* | –0.4° ± 2.4° | –5.2° to 4.4° | –3.3° ± 3.3° | –9.8° to 3.1° | –1.1° ± 4.1° | –9.1° to 6.9° | 2.3° ± 2.5° | –2.5° to 7.1° |
| Alpha angle | 0.4° ± 4.9° | –9.3° to 10.0° | –2.2° ± 6.0° | –14.0° to 9.6° | 0.3° ± 5.5° | –10.4° to 11.0° | 2.5° ± 3.5° | –4.3° to 9.3° |
| Center edge angle of Wiberg | –2.5° ± 2.6° | –7.6° to 2.6° | –0.9° ± 2.3° | –5.4° to 3.5° | –0.6° ± 2.1° | –4.7° to 3.5° | 0.3° ± 2.4° | –4.4° to 5.1° |
| Lateral center edge angle | –3.5° ± 2.3° | –8.0° to 1.0° | –0.2° ± 0.8° | –1.7° to 1.4° | –0.7° ± 1.6° | –3.8° to 2.4° | –0.6° ± 1.3° | –3.1° to 1.9° |
| Extrusion index | 3.0 ± 2.5 | –1.8 to 7.8 | –0.6 ± 1.5 | –3.5 to 2.3 | 0.5 ± 1.8 | –3.1 to 4.0 | 1.1 ± 1.7 | –2.1 to 4.3 |
| Neck shaft angle | 4.4° ± 3.0° | –1.4° to 10.3° | 0.3° ± 1.8° | –3.3° to 3.9° | 5.2° ± 2.3° | 0.6° to 9.7° | 4.9° ± 2.4° | 0.2° to 9.5° |
| Triangular index | 0.2 ± 3.3 | –6.4 to 6.8 | –1.0 ± 1.3 | –3.6 to 1.6 | 0.1 ± 2.7 | –5.2 to 5.3 | 1.0 ± 2.5 | –4.0 to 6.0 |
Bland-Altman interobserver and intermethod bias (mean and standard deviation) and limits of agreement, n = 30.
* The acetabular index was only measured in hips with an open triradiate cartilage (n = 8).
Fig. 5: Bland-Altman plots to visualize agreement between the measurements.
Reliability
The intra- and interobserver and intermethod reliability for all measurements is shown in Table 2, as well as the mean absolute difference between the reference standard and the automated measurements. The intermethod reliability is comparable to or better than the interobserver reliability. Only for the acetabular index are manual measurements more reliable than the automated measurements. The interobserver ICCs from the automated measurements using the manually adjusted landmarks and the fully automated landmarks showed that the AI and AA were mostly influenced by the manual corrections. For all the other measurements, the ICCs ranged from good to excellent.
Table 2. Comparing the performance of two manual raters and the automated measurements.
| Measurement | Manual, observer 1: intraobserver ICC | Manual, observer 2: intraobserver ICC | Manual, observer 1 vs observer 2: interobserver ICC | Automated, automated vs manually adjusted landmarks: interobserver ICC | Manual vs automated (automated landmarks): intermethod ICC | Manual vs automated (automated landmarks): mean absolute difference [range] | Manual vs automated (manually adjusted landmarks): intermethod ICC | Manual vs automated (manually adjusted landmarks): mean absolute difference [range] |
|---|---|---|---|---|---|---|---|---|
| Acetabular depth-width ratio | 0.72 (0.50 – 0.86) | 0.57 (0.26 – 0.77) | 0.58 (0 – 0.85) | 0.90 (0.80 – 0.95) | 0.69 (0.44 – 0.84) | 17.2 [1.7 – 47.9] | 0.80 (0.62 – 0.90) | 12.6 [0.44 – 50.9] |
| Acetabular index* | 0.60 (0 – 0.92) | 0.99 (0.66 – 0.998) | 0.83 (0.36 – 0.96) | 0.58 (0 – 0.90) | 0.62 (0 – 0.91) | 2.9° [0.08° – 8.0°] | 0.65 (0 – 0.92) | 2.4° [0.1° – 6.0°] |
| Alpha angle | 0.89 (0.78 – 0.95) | 0.74 (0.52 – 0.87) | 0.77 (0.58 – 0.89) | 0.68 (0.43 – 0.84) | 0.77 (0.56 – 0.88) | 3.7° [0.05° – 18.2°] | 0.81 (0.46 – 0.92) | 3.4° [0.08° – 11.1°] |
| Center edge angle of Wiberg | 0.85 (0.70 – 0.92) | 0.83 (0.66 – 0.91) | 0.80 (0.25 – 0.93) | 0.91 (0.81 – 0.96) | 0.92 (0.84 – 0.96) | 1.6° [0.05° – 6.2°] | 0.91 (0.81 – 0.95) | 2.0° [0.1° – 5.8°] |
| Lateral center edge angle | 0.79 (0.60 – 0.89) | 0.86 (0.73 – 0.93) | 0.65 (0 – 0.89) | 0.98 (0.96 – 0.99) | 0.92 (0.81 – 0.96) | 1.4° [0.03° – 4.5°] | 0.95 (0.87 – 0.98) | 1.1° [0.05° – 3.8°] |
| Extrusion index | 0.71 (0.47 – 0.85) | 0.86 (0.72 – 0.93) | 0.66 (0 – 0.88) | 0.93 (0.85 – 0.97) | 0.91 (0.81 – 0.95) | 1.6 [0.29 – 3.4] | 0.89 (0.66 – 0.95) | 1.6 [0.1 – 4.7] |
| Neck shaft angle | 0.69 (0.19 – 0.88) | 0.87 (0.75 – 0.94) | 0.58 (0 – 0.85) | 0.94 (0.87 – 0.97) | 0.60 (0 – 0.88) | 5.3° [0.20° – 9.5°] | 0.57 (0 – 0.86) | 5.0° [1.6° – 11.6°] |
| Triangular index | 0.81 (0.61 – 0.91) | 0.95 (0.90 – 0.98) | 0.94 (0.89 – 0.97) | 0.98 (0.93 – 0.99) | 0.96 (0.92 – 0.98) | 2.2 [0.15 – 5.2] | 0.96 (0.91 – 0.98) | 2.2 [0.13 – 7.7] |
Intraclass correlation coefficients (ICC) of intra- and interobserver, and intermethod reliability of the radiographic measurements of hip morphology, n = 30. ICCs are presented with 95 % CI. Intraobserver reliability was tested with a 2-way mixed-effects model, single rater, absolute agreement ICC. Interobserver reliability was tested with a 2-way random-effects model, single rater, absolute agreement ICC. Intermethod reliability was tested with a 2-way mixed-effects model, single rater, absolute agreement ICC. Interpretation: poor (<0.50), moderate (0.50–0.75), good (0.76–0.90), or excellent (>0.90).
* The acetabular index was only measured in hips with an open triradiate cartilage (n = 8).
Discussion
We presented an open-access, automated method for determining radiographic measurements of hip morphology on hip DXA images, and evaluated the agreement and reliability compared to manual assessments. We automatically calculated the ADR, AA, WCEA, LCEA, EI, NSA, and TI for all 30 images assessed, and the AI for the 8 hips with an open triradiate cartilage. The automated measurements were calculated based on a set of radiographic landmarks describing the shape of the acetabulum and proximal femur. The agreement between the automated and manual measurements was similar to or better than the agreement between two manual observers, with respect to the width of the 95 % CI of the intermethod differences. For most measurements, there was a bias towards smaller values for the automated method compared to the manual measurements. The intermethod reliability was comparable to or better than the manual interobserver reliability.
The AA showed both a wider 95 % CI for the agreement and erratic behavior at higher values. This likely reflects the difficulty of correctly identifying the alpha point, especially when some asphericity is present and a higher alpha angle is therefore found. In clinical practice this could lead to an under- or overestimation of the cam deformity. The ADR was the measurement with the largest 95 % CI around the difference for both the intermethod and the interobserver analyses. This is likely because the ADR is a ratio that is tenfold larger than the other ratio measurements. Further, the AI was only measured in a limited number of hips, since only 8 hips in our study population had an open triradiate cartilage. This makes it difficult to judge the performance of the automated AI measurement. The modified AI, which is measured from the most medial part of the weight-bearing part of the acetabulum instead of the most lateral part of the triradiate cartilage, would allow measurement of the acetabular roof inclination in hips with a closed triradiate cartilage. Lastly, interpreting the measurement performance of the TI can be difficult because it is a length measurement that also depends on the size of the individual's hip. An alternative is the TI ratio, also known as the Gosvig ratio [32], which is the ratio of the TI to the radius of the best-fitting circle around the femoral head.
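The TI ratio described above can be sketched as follows. The example fits a circle to femoral head landmarks with a simple algebraic least-squares fit and divides a given TI by the fitted radius. This is an illustrative sketch under stated assumptions: the circle-fitting approach here is a basic algebraic fit and may differ from the one used in the released software, the function names `fit_circle` and `ti_ratio` are hypothetical, and the TI is taken as an already measured length in the same units as the landmark coordinates.

```python
import numpy as np


def fit_circle(points):
    """Algebraic least-squares circle fit (assumed method, for illustration).

    Solves x^2 + y^2 = 2*cx*x + 2*cy*y + c for (cx, cy, c),
    with radius r = sqrt(c + cx^2 + cy^2).
    """
    p = np.asarray(points, dtype=float)
    design = np.column_stack([2 * p[:, 0], 2 * p[:, 1], np.ones(len(p))])
    target = (p ** 2).sum(axis=1)
    (cx, cy, c), *_ = np.linalg.lstsq(design, target, rcond=None)
    radius = np.sqrt(c + cx ** 2 + cy ** 2)
    return (cx, cy), radius


def ti_ratio(ti, radius):
    """TI divided by the femoral head radius, removing the size dependence."""
    return ti / radius


if __name__ == "__main__":
    # Hypothetical landmarks on an arc of a circle centred at (3, 4), radius 5
    angles = np.linspace(0.2, 2.5, 12)
    landmarks = np.column_stack([3 + 5 * np.cos(angles), 4 + 5 * np.sin(angles)])
    center, radius = fit_circle(landmarks)
    print(center, radius, ti_ratio(5.5, radius))
```

Because the ratio is dimensionless, values from hips of different sizes can be compared directly, which a raw TI length does not allow.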
Some methods are presently available for the automatic determination of radiographic morphology measurements of the hip [33–37]. However, most of these algorithms are not open-source and do not have clear descriptions of how the measurements are calculated. This makes reproducibility and implementation in clinical research and practice impossible. Faber et al. [34] developed an open-access method determining the AA on DXA images based on radiographic landmarks and reported a concordance correlation coefficient between the automated method and the manual measurement of 0.88 (95 % CI 0.84–0.92). Three articles used the AI software HIPPO and validated the software for the WCEA, NSA, EI, and AI against manual measurements [33,35,36]. The ICCs were comparable to those reported in the present study. Lastly, Jensen et al. [37] investigated the reliability and agreement of the RBhip software, which was able to measure the WCEA and the AI, also showing a bias in the WCEA measurement.
The proposed automated methods have some limitations. There is no gold standard for determination of radiographic measurements of hip morphology. Therefore, we needed to create a reference standard to assess the software’s performance. We believe that the created reference standard reflects current clinical practice since it was created by two orthopedic surgeons, who are the specialists who ultimately work with these measurements and use them to decide on diagnoses and treatment. Moreover, the proposed automated calculations need radiographic landmarks describing the shape of the hip as the input. The creation of these landmark sets can be a very time-consuming process. However, the radiographic landmark set for this study was created automatically using the BoneFinder® software. Additionally, the algorithm’s performance depends on the quality of the landmark set provided as input. It should be taken into account that correct landmark placement influences the performance of the ADR, AI and AA the most. Before implementation of the fully automated pipeline in clinical practice, the ASM needs to be improved by additional training using images of different databases. Another limitation for the AA needs to be noted. The selection of the error margin needed for the determination of the alpha point was based on subjective visual inspection and should be confirmed in populations with a higher prevalence of cam morphology. Lastly, the presented method was created and validated on only 30 right hip DXA images in 13-year-old subjects. The generalizability of the automated method in other populations is therefore limited. Nevertheless, we believe that following validation this automated method could also be applied to radiographs.
We think that this algorithm is a promising tool for performing automated radiographic measurements of hip morphology. Because it is automated yet remains transparent through its use of radiographic landmarks, it is feasible to analyze vast amounts of data. This makes the software highly applicable for large population studies. Additionally, it can be used in clinical practice, where the user can, if desired, inspect the output image to see how each measurement was performed. Another application of these automated measurements could be more standardized recruitment for trials pertaining to certain hip morphologies, where automated measurements could reduce selection bias. DXA images allow for x-ray-like imaging with a lower radiation burden. However, when hip DXA images are used, a full-body DXA is also needed to correct for potential obliquity of the pelvis.
In conclusion, the proposed algorithms allow for fast and reproducible calculation of radiographic measurements of hip morphology on right hip DXA images. Furthermore, by providing open access to the algorithms, we aim for transparency and enable better comparison between studies. This can help advance insights into the morphology and development of the hip, as well as provide information on the development and risk of diseases such as hip OA.
Software availability
The in-house developed methods can be accessed here: https://github.com/FleurBoel/Automated-Hip-Morphology-Measurements/tree/main. License: Apache 2.0 license. BoneFinder® and the model for automatic point placement are freely available from the website (www.bone-finder.com; The University of Manchester, UK) [21].
Acknowledgements
This research is supported by the Dutch Arthritis Association [grant number 19-2-202, 2019]. The Generation R Study is conducted by the Erasmus MC, University Medical Center Rotterdam in close collaboration with the Erasmus University Rotterdam and the city of Rotterdam. We gratefully acknowledge the contribution of children and parents. The general design of the Generation R Study is made possible by long-term financial support from Erasmus MC, University Medical Center Rotterdam, the Netherlands Organization for Health Research and Development (ZonMw), and the Ministry of Health, Welfare and Sport. CL is funded by a Sir Henry Dale Fellowship jointly funded by the Wellcome Trust and the Royal Society (223267/Z/21/Z). This research was funded in whole, or in part, by the Wellcome Trust [Grant number 223267/Z/21/Z]. For the purpose of open access, the authors have applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission.
References
- [1]. Felson DT, Lawrence RC, Dieppe PA, Hirsch R, Helmick CG, Jordan JM, et al. Osteoarthritis: new insights. Part 1: the disease and its risk factors. Ann Intern Med. 2000;133(8):635–646. doi: 10.7326/0003-4819-133-8-200010170-00016.
- [2]. Agricola R, Heijboer MP, Roze RH, Reijman M, Bierma-Zeinstra SM, Verhaar JA, et al. Pincer deformity does not lead to osteoarthritis of the hip whereas acetabular dysplasia does: acetabular coverage and development of osteoarthritis in a nationwide prospective cohort study (CHECK). Osteoarthrit Cartil. 2013;21(10):1514–1521. doi: 10.1016/j.joca.2013.07.004.
- [3]. Guilak F. Biomechanical factors in osteoarthritis. Best Pract Res Clin Rheumatol. 2011;25(6):815–823. doi: 10.1016/j.berh.2011.11.013.
- [4]. Reijman M, Hazes JM, Pols HA, Koes BW, Bierma-Zeinstra SM. Acetabular dysplasia predicts incident osteoarthritis of the hip: the Rotterdam study. Arthritis Rheum. 2005;52(3):787–793. doi: 10.1002/art.20886.
- [5]. Saberi Hosnijeh F, Kavousi M, Boer CG, Uitterlinden AG, Hofman A, Reijman M, et al. Development of a prediction model for future risk of radiographic hip osteoarthritis. Osteoarthrit Cartil. 2018;26(4):540–546. doi: 10.1016/j.joca.2018.01.015.
- [6]. Saberi Hosnijeh F, Zuiderwijk ME, Versteeg M, Smeele HT, Hofman A, Uitterlinden AG, et al. Cam Deformity and Acetabular Dysplasia as Risk Factors for Hip Osteoarthritis. Arthrit Rheumatol. 2017;69(1):86–93. doi: 10.1002/art.39929.
- [7]. Thomas GE, Palmer AJ, Batra RN, Kiran A, Hart D, Spector T, et al. Subclinical deformities of the hip are significant predictors of radiographic osteoarthritis and joint replacement in women. A 20 year longitudinal cohort study. Osteoarthrit Cartil. 2014;22(10):1504–1510. doi: 10.1016/j.joca.2014.06.038.
- [8]. Casartelli NC, Maffiuletti NA, Valenzuela PL, Grassi A, Ferrari E, van Buuren MMA, et al. Is hip morphology a risk factor for developing hip osteoarthritis? A systematic review with meta-analysis. Osteoarthrit Cartil. 2021;29(9):1252–1264. doi: 10.1016/j.joca.2021.06.007.
- [9]. Mascarenhas VV, Castro MO, Afonso PD, Rego P, Dienst M, Sutter R, et al. The Lisbon Agreement on femoroacetabular impingement imaging-part 2: general issues, parameters, and reporting. Eur Radiol. 2021;31(7):4634–4651. doi: 10.1007/s00330-020-07432-1.
- [10]. Reiman MP, Agricola R, Kemp JL, Heerey JJ, Weir A, van Klij P, et al. Consensus recommendations on the classification, definition and diagnostic criteria of hip-related pain in young and middle-aged active adults from the International Hip-related Pain Research Network, Zurich 2018. Br J Sport Med. 2020;54(11):631–641. doi: 10.1136/bjsports-2019-101453.
- [11]. Hanson JA, Kapron AL, Swenson KM, Maak TG, Peters CL, Aoki SK. Discrepancies in measuring acetabular coverage: revisiting the anterior and lateral center edge angles. J Hip Preserv Surg. 2015;2(3):280–286. doi: 10.1093/jhps/hnv041.
- [12]. Yoshida K, Barr RJ, Galea-Soler S, Aspden RM, Reid DM, Gregory JS. Reproducibility and diagnostic accuracy of Kellgren-Lawrence grading for osteoarthritis using radiographs and dual-energy X-ray absorptiometry images. J Clin Densitom. 2015;18(2):239–244. doi: 10.1016/j.jocd.2014.08.003.
- [13]. Aldieri A, Terzini M, Audenino AL, Bignardi C, Morbiducci U. Combining shape and intensity dxa-based statistical approaches for osteoporotic HIP fracture risk assessment. Comput Biol Med. 2020;127:104093. doi: 10.1016/j.compbiomed.2020.104093.
- [14]. Faber BG, Baird D, Gregson CL, Gregory JS, Barr RJ, Aspden RM, et al. DXA-derived hip shape is related to osteoarthritis: findings from in the MrOS cohort. Osteoarthrit Cartil. 2017;25(12):2031–2038. doi: 10.1016/j.joca.2017.09.006.
- [15]. Faber BG, Ebsim R, Saunders FR, Frysz M, Gregory JS, Aspden RM, et al. Cam morphology but neither acetabular dysplasia nor pincer morphology is associated with osteophytosis throughout the hip: findings from a cross-sectional study in UK Biobank. Osteoarthrit Cartil. 2021;29(11):1521–1529. doi: 10.1016/j.joca.2021.08.002.
- [16]. Pavlova AV, Saunders FR, Muthuri SG, Gregory JS, Barr RJ, Martin KR, et al. Statistical shape modelling of hip and lumbar spine morphology and their relationship in the MRC National Survey of Health and Development. J Anat. 2017;231(2):248–259. doi: 10.1111/joa.12631.
- [17]. Waarsing JH, Rozendaal RM, Verhaar JA, Bierma-Zeinstra SM, Weinans H. A statistical model of shape and density of the proximal femur in relation to radiological and clinical OA of the hip. Osteoarthrit Cartil. 2010;18(6):787–794. doi: 10.1016/j.joca.2010.02.003.
- [18]. Mettler FA, Jr, Huda W, Yoshizumi TT, Mahesh M. Effective doses in radiology and diagnostic nuclear medicine: a catalog. Radiology. 2008;248(1):254–263. doi: 10.1148/radiol.2481071451.
- [19]. Wall BF, Hart D. Revised radiation doses for typical X-ray examinations. Report on a recent review of doses to patients from medical X-ray examinations in the UK by NRPB. National Radiological Protection Board. Br J Radiol. 1997;70(833):437–439. doi: 10.1259/bjr.70.833.9227222.
- [20]. Kooijman MN, Kruithof CJ, van Duijn CM, Duijts L, Franco OH, van IMH, et al. The Generation R Study: design and cohort update 2017. Eur J Epidemiol. 2016;31(12):1243–1264. doi: 10.1007/s10654-016-0224-9.
- [21]. Lindner C, Thiagarajah S, Wilkinson JM, arc OC, Wallis GA, Cootes TF. Fully automatic segmentation of the proximal femur using random forest regression voting. IEEE Trans Med Imaging. 2013;32(8):1462–1472. doi: 10.1109/TMI.2013.2258030.
- [22]. Van Rossum G, Drake FL. Python 3 Reference Manual. CreateSpace; Scotts Valley, CA: 2009.
- [23]. Walker JW. Computing Angle Between Vectors. 2014. Updated 5 June 2016. Available from: https://www.jwwalker.com/pages/angle-between-vectors.html.
- [24]. Al-Sharadqah A, Chernov N. Error analysis for circle fitting algorithms. Electron J Stat. 2009;3:886–911.
- [25]. Tönnis D. Congenital Dysplasia and Dislocation of the Hip in Children and Adults. Springer-Verlag Berlin and Heidelberg GmbH & Co. KG; 1987.
- [26]. Agricola R, Heijboer MP, Ginai AZ, Roels P, Zadpoor AA, Verhaar JA, et al. A cam deformity is gradually acquired during skeletal maturation in adolescent and young male soccer players: a prospective study with minimum 2-year follow-up. Am J Sport Med. 2014;42(4):798–806. doi: 10.1177/0363546514524364.
- [27]. Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, et al. SciPy 1.0: fundamental algorithms for scientific computing in python. Nat Method. 2020;17:261–272. doi: 10.1038/s41592-019-0686-2.
- [28]. Van der Walt S, Schönberger JL, Nunez-Iglesias J, Boulogne F, Warner JD, Yager N, et al. scikit-image: image processing in Python. PeerJ. 2014;2:e453. doi: 10.7717/peerj.453.
- [29]. Gonzalez RC, Woods RE. Digital Image Processing. 4th ed. Pearson Education Limited; 2018. Opening and Closing; pp. 644–648.
- [30]. Gamer M, Lemon J. irr: Various Coefficients of Interrater Reliability and Agreement. 2019.
- [31]. Wickham H. ggplot2: Elegant Graphics For Data Analysis. Springer-Verlag; New York: 2016.
- [32]. Nelson AE, Stiller JL, Shi XA, Leyland KM, Renner JB, Schwartz TA, et al. Measures of hip morphology are related to development of worsening radiographic hip osteoarthritis over 6 to 13 year follow-up: the Johnston County Osteoarthritis Project. Osteoarthrit Cartil. 2016;24(3):443–450. doi: 10.1016/j.joca.2015.10.007.
- [33]. Archer H, Reine S, Alshaikhsalama A, Wells J, Kohli A, Vazquez L, et al. Artificial intelligence-generated hip radiological measurements are fast and adequate for reliable assessment of hip dysplasia: an external validation study. Bone Jt Open. 2022;3(11):877–884. doi: 10.1302/2633-1462.311.BJO-2022-0125.R1.
- [34]. Faber BG, Ebsim R, Saunders FR, Frysz M, Smith GD, Cootes T, et al. Deriving alpha angle from anterior-posterior dual-energy x-ray absorptiometry scans: an automated and validated approach. Wellcome Open Res. 2021;6:60. doi: 10.12688/wellcomeopenres.16656.1.
- [35]. Schwarz GM, Simon S, Mitterer JA, Huber S, Frank BJ, Aichmair A, et al. Can an artificial intelligence powered software reliably assess pelvic radiographs? Int Orthop. 2023;47(4):945–953. doi: 10.1007/s00264-023-05722-z.
- [36]. Stotter C, Klestil T, Roder C, Reuter P, Chen K, Emprechtinger R, et al. Deep Learning for Fully Automated Radiographic Measurements of the Pelvis and Hip. Diagnost (Basel). 2023;13(3). doi: 10.3390/diagnostics13030497.
- [37]. Jensen J, Graumann O, Overgaard S, Gerke O, Lundemann M, Haubro MH, et al. A deep learning algorithm for radiographic measurements of the hip in adults-a reliability and agreement study. Diagnost (Basel). 2022;12(11). doi: 10.3390/diagnostics12112597.