Journal of Medical Imaging. 2020 May 13;7(3):035001. doi: 10.1117/1.JMI.7.3.035001

Automatic analysis of global spinal alignment from simple annotation of vertebral bodies

Sophia A Doerr a, Tharindu De Silva a, Rohan Vijayan a, Runze Han a, Ali Uneri a, Michael D Ketcha a, Xiaoxuan Zhang a, Nishanth Khanna b, Erick Westbroek c, Bowen Jiang c, Corinna Zygourakis c, Nafi Aygun b, Nicholas Theodore c, Jeffrey H Siewerdsen a,b,c,*
PMCID: PMC7218103  PMID: 32411814

Abstract.

Purpose: Measurement of global spinal alignment (GSA) is an important aspect of diagnosis and treatment evaluation for spinal deformity but is subject to a high level of inter-reader variability.

Approach: Two methods for automatic GSA measurement are proposed to mitigate such variability and reduce the burden of manual measurements. Both approaches use vertebral labels in spine computed tomography (CT) as input: the first (EndSeg) segments vertebral endplates using input labels as seed points; and the second (SpNorm) computes a two-dimensional curvilinear fit to the input labels. Studies were performed to characterize the performance of EndSeg and SpNorm in comparison to manual GSA measurement by five clinicians, including measurements of proximal thoracic kyphosis, main thoracic kyphosis, and lumbar lordosis.

Results: For the automatic methods, 93.8% of endplate angle estimates were within the inter-reader 95% confidence interval (CI95). All GSA measurements for the automatic methods were within the inter-reader CI95, and there was no statistically significant difference between automatic and manual methods. The SpNorm method appears particularly robust as it operates without segmentation.

Conclusions: Such methods could improve the reproducibility and reliability of GSA measurements and are potentially suitable to applications in large datasets—e.g., for outcome assessment in surgical data science.

Keywords: spine surgery, global spinal alignment, image analysis, image-guided surgery, surgical outcomes, surgical data science

1. Introduction

Measurements of global spinal alignment (GSA) are frequently used to evaluate spinal deformity and characterize the progression of spinal disability.1–3 GSA metrics are conventionally measured by manual annotation of anatomical landmarks (e.g., vertebral endplates) in radiographic images and have been shown to correlate with spine surgery outcomes in treatment of both deformity and degenerative disease.1–5 For example, courses of treatment for scoliosis and kyphosis are frequently chosen based on progression of GSA measures beyond predetermined thresholds (e.g., normal thoracic kyphosis falls within a range of 20 deg to 40 deg).6 Retrospective analysis7 of patients undergoing degenerative lumbar spinal fusion suggests that GSA assessment is pertinent to lumbar fusion as well, with abnormal ranges of spinopelvic GSA parameters (pelvic incidence–lumbar lordosis mismatch, PI–LL > 11 deg) indicating a 75% positive predictive value for adjacent segment failure.

The standard approach to measurement of GSA is manual identification of pertinent anatomical landmarks in radiography. However, the reliability and reproducibility of manual annotation vary widely. As summarized in Table 1, intrareader intraclass correlation coefficient (ICC) has been reported to range from 0.40 to 0.96 (poor to excellent).8–13 Inter-reader ICC shows similar findings (ICC ranging from 0.20 to 0.86) and confirms expectations that variability between readers is worse than variability for one reader. Such wide variability implies important dependence on study controls and conditions and suggests inherent variability in GSA measurement based on visual assessment of endplate angles. As reported by Carman et al.,14 change in GSA must exceed 11 deg to be of clinical utility and not be solely attributable to measurement error. Challenges to manual measurement include radiographic image noise,10 angulated projection of the endplate plateau,10,11,13 differences in plateau visualization due to kyphosis and lordosis, patient obesity, and degeneration of bone quality.11 The lack of reproducibility in manual measurement has led some to question the usefulness of measurements of sagittal alignment from radiographs.10

Table 1.

Summary of reports on inter- and intrareader variability (ICC) in measurement of spinal alignment.

Inter-reader Intrareader Reference
Thoracic Lumbar Pelvic/global Average Thoracic Lumbar Pelvic/global Average
0.50 to 0.57 0.76 to 0.84 0.04 to 0.44 0.29 to 0.95 0.36 to 0.46 0.62 to 0.74 8
0.83 0.89 0.79 to 0.99 0.84 to 0.99 9
>0.80 <0.80 >0.80 >0.80 >0.80 <0.20 to >0.80 0.40 to 0.76 10
0.82 to 0.85 0.82 to 0.93 0.78 to 0.86 0.87 to 0.96 0.78 to 0.86 0.82 to 0.96 11
0.88 0.90 0.83 0.81 0.86 0.87 12
0.50 0.71 0.83 0.70 0.77 0.71 13

Computer-assisted technologies, for example, Surgimap (Globus Medical, Audubon, PA) and iGA (NuVasive, San Diego, CA), have shown similar or better performance than a manual approach in analyzing GSA.15 However, such approaches still require significant manual input that is subject to variability. Recently, a number of algorithms for automated vertebral labeling and segmentation have emerged that could improve the identification of image features pertinent to GSA to increase efficiency and reduce variability. Such automatic labeling methods include model-based approaches (prior shape modeling,16 mathematical morphology,17 and regression forests18) and deep learning approaches,19–21 some of which have been translated to clinical use, for example, FAST Spine (Siemens Healthineers, Malvern, PA).

Automatic determination of GSA metrics would be of benefit not only in diagnostic radiology (e.g., assessment of scoliosis), but also in intraoperative measurement of GSA as a tool for quantitatively evaluating the change in curvature imparted during spinal reduction. As discussed by Ailon et al.,22 the development of an intraoperative tool for GSA computation requires efficient computation time and a low impact on workflow, further motivating an automatic method. The evolution of minimally invasive, image-guided spine surgery in recent decades raises the opportunity for more quantitative assessment of GSA in the perioperative context. A straightforward method involves use of a surgical tracking system to localize pertinent anatomical features on the patient and assess GSA accordingly. Image-guided spine surgery has also prompted more frequent acquisition of preoperative and postoperative computed tomography (CT), from which changes in GSA may be determined, and steady adoption of intraoperative CT from which assessment of deformity correction could be performed during the case.23–25

Automatic assessment of GSA from preoperative, intraoperative, and postoperative spine CT or radiographs could provide a valuable tool for data-intensive methods that aim to analyze large-scale image datasets for correlation with patient outcomes. For example, recent work by De Silva et al.26 shows that GSA (among other features derived from perioperative images) could provide insight on surgical outcomes beyond that of conventional patient demographic data. It also prompts further analysis of the relationship between GSA assessed in supine, prone, and standing orientations.

The work reported below details automatic measurement of radiographic GSA metrics from vertebral labels identified in spine CT using two distinct approaches. The first approach (referred to as EndSeg) operates by automatic segmentation of vertebral endplates, analogous to the conventional manual approach based on measurement of endplate angles. The second approach (referred to as SpNorm) uses the vertebral labels themselves as a surrogate for spinal curvature determined by spline fit. Both operate by projection of CT-derived inputs to two-dimensional (2-D) radiographic planes for GSA analysis in terms that are consistent with conventionally defined 2-D GSA measures. Each method is tested in retrospective analysis of spine CT images and compared to manual definition by expert radiologists.

2. Methods

Metrics of GSA are defined by various Cobb angles as illustrated in Fig. 1. On a radiograph, a Cobb angle is the angle between lines parallel to the superior endplate of a particular (superior) vertebra and the inferior endplate of a second (inferior) vertebra as presented in a 2-D radiographic view. Three methods for GSA measurement are described below and evaluated in terms of reproducibility and accuracy—conventional manual annotation and two innovative automatic methods.
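The Cobb angle definition above can be illustrated with a minimal Python sketch. It assumes each endplate is represented by two 2-D points on the radiograph; the function names and the choice to report an unsigned angle are illustrative assumptions, not part of the reported methods.

```python
import math


def endplate_tilt_deg(p0, p1):
    """Tilt (deg) of the line through p0 and p1 relative to the image horizontal."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    if dx < 0:
        # Flip the direction so the result does not depend on endpoint order.
        dx, dy = -dx, -dy
    return math.degrees(math.atan2(dy, dx))


def cobb_angle_deg(superior_endplate, inferior_endplate):
    """Unsigned angle (deg) between two endplate lines, each given as (p0, p1)."""
    tilt_sup = endplate_tilt_deg(*superior_endplate)
    tilt_inf = endplate_tilt_deg(*inferior_endplate)
    return abs(tilt_sup - tilt_inf)


# Hypothetical endplates tilted +10 deg and -20 deg from horizontal.
superior = ((0.0, 0.0), (10.0, 10.0 * math.tan(math.radians(10.0))))
inferior = ((0.0, 0.0), (10.0, -10.0 * math.tan(math.radians(20.0))))
cobb = cobb_angle_deg(superior, inferior)
```

For the hypothetical endplates in the example, the two tilts differ by 30 deg, which is the resulting Cobb angle.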

Fig. 1.

Definitions and illustration of various GSA metrics. Sagittal GSA is determined from LAT radiographs (top) and coronal GSA from PA or AP radiographs (bottom).

2.1. Manual Annotation

A reader study was performed to assess inter- and intrareader variability in manual annotation of vertebral endplate angles by five expert clinicians (two radiologists and three spinal neurosurgeons). Seven patient CT images were used from SpineWeb Dataset 1427 from which lateral (LAT) and anterior–posterior (AP) digitally reconstructed radiographs (DRRs) were computed spanning spinal vertebrae from C7 to S1. Readers annotated a single endplate digitally on each of 128 DRRs (49 LAT + 15 LAT repeats + 49 AP + 15 AP repeats).

The DRRs were generated with 0.6-mm pixel spacing and system geometry characterized by source-to-detector distance = 120 cm and source-to-axis distance = 60 cm using a prism projector (divergent fan beams stacked in rows to form a full length DRR—thus simulating a line scanner or CT “scout” view). This projection model reduces distortion of endplates compared to a divergent cone-beam projection and represents an optimistic lower bound on variability in manual endplate delineation.

Reproducibility of manual annotation was quantified in terms of ICC, ranging from −1 to 1. Rubrics for ICC performance cited by Koo and Li28 are 0.00 to 0.49 (poor), 0.50 to 0.74 (fair), 0.75 to 0.90 (good), and 0.90 to 1.00 (excellent). Inter-reader ICC was calculated as

$$\mathrm{ICC}_{\mathrm{inter}} = \frac{\sigma_{WR}^2}{\sigma_{BR}^2 + \sigma_{WR}^2}, \tag{1}$$

where $\sigma_{BR}^2$ is the variance between five expert readers and $\sigma_{WR}^2$ is the mean variance within readers. Inter-reader ICC was calculated from 98 single endplate angle measurements (49 LAT + 49 AP). Intrareader ICC was calculated for each reader as

$$\mathrm{ICC}_{\mathrm{intra}} = \frac{\sigma_{WT}^2}{\sigma_{BT}^2 + (k-1)\,\sigma_{WT}^2}, \tag{2}$$

where $\sigma_{BT}^2$ is the variance between repeated trials of measurements, $\sigma_{WT}^2$ is the variance between trials within a particular reader, and $k$ is the number of repeated measurements. Intrareader ICC was calculated from 60 single endplate angle measurements (15 LAT + 15 LAT repeats + 15 AP + 15 AP repeats). Inter-reader root mean square was also computed as a measure of the dispersion in endplate angle distribution within and across readers and was used to compute the inter-reader 95% confidence interval (CI95).
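As a rough illustration of how the two ratios behave, the following Python sketch evaluates Eqs. (1) and (2) from variance components that are assumed to have been estimated already. The function and argument names are hypothetical, the numerical values are invented for illustration, and the (k − 1) factor in the intrareader form is the reading assumed here.

```python
def icc_inter(var_between_readers, var_within_readers):
    """Eq. (1): within-reader variance over the sum of both variance components."""
    return var_within_readers / (var_between_readers + var_within_readers)


def icc_intra(var_between_trials, var_within_trials, k):
    """Eq. (2), assuming a (k - 1) weight on the within-trial variance."""
    return var_within_trials / (var_between_trials + (k - 1) * var_within_trials)


# Invented example: endplate angles spread over ~20 deg (variance 400 deg^2),
# while readers disagree by ~3 deg (variance 9 deg^2).
example_inter = icc_inter(var_between_readers=9.0, var_within_readers=400.0)
example_intra = icc_intra(var_between_trials=9.0, var_within_trials=400.0, k=2)
```

With these invented numbers both ratios are about 0.98, which shows how a wide inherent range of endplate angles pushes ICC toward 1 even when reader disagreement is a few degrees.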

2.2. Automatic Method 1: Endplate Segmentation (EndSeg)

An automatic endplate segmentation method (denoted EndSeg) was developed to delineate endplate angles analogous to the manual method described in Sec. 2.1. With a CT image and vertebral labels (defined at the approximate centroid of the vertebral body) as input, the EndSeg algorithm identifies endplate angles as illustrated in Fig. 2. The input vertebral labels could be defined by any of the various automatic labeling methods described in Sec. 1.16–21 In the work reported here, the vertebral point labels were defined manually by an expert radiologist at the approximate centroid of each vertebra. Sources of variation in the location of labels include intrauser variability (for manual labeling, as in this work) and image quality limitations (for manual or automatic labeling methods).

Fig. 2.

The EndSeg algorithm for calculation of GSA. (a) Input CT image with vertebral labels. (b) Computation of principal axis and intersection with endplate. (c) Region growing along the endplate. (d) Projection of the segmented endplate from three-dimensional (3-D) to the 2-D LAT or AP radiographic plane. (e) Linear fit to the projected point cloud. (f) GSA metrics computed according to definitions in Fig. 1.

The EndSeg method starts with a segmentation of the vertebral body using continuous max-flow optimization,29 which has demonstrated robust performance in a variety of applications, including spine imaging.30 The max-flow objective function uses a maximum a posteriori estimate of voxel labels, a weighting function based on the vertebral centroid seed point, and a smoothness-preserving regularization term. Image erosion was applied to separate the vertebral body from the spinous process. Following erosion, principal component analysis of the larger component (vertebral body) was used to determine the direction of the principal axis of the vertebral body (roughly superior–inferior direction).
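The principal-axis step can be sketched in a few lines of NumPy. The sketch assumes a binary vertebral-body mask indexed as (x, y, z) with z running superior–inferior, and it selects the principal component with the largest z component, in the spirit of the `max.z` step in Table 2. The function name and the `spacing` argument are hypothetical.

```python
import numpy as np


def principal_axis(mask, spacing=(1.0, 1.0, 1.0)):
    """Unit principal axis of a binary mask that is most aligned with z."""
    # Physical coordinates of all voxels inside the segmentation.
    coords = np.argwhere(mask) * np.asarray(spacing, dtype=float)
    centered = coords - coords.mean(axis=0)
    # Principal components are the eigenvectors of the coordinate covariance.
    cov = np.cov(centered, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)
    # Pick the component with the largest absolute z entry (columns are vectors).
    axis = eigvecs[:, np.argmax(np.abs(eigvecs[2, :]))]
    # Fix the sign so the axis points toward +z.
    return axis if axis[2] >= 0 else -axis


# Hypothetical box-shaped "vertebral body" elongated along z.
box = np.ones((5, 5, 20), dtype=bool)
axis = principal_axis(box)
```

For the box in the example the returned axis is the unit vector along z. In a real segmentation the axis would be tilted according to the orientation of the vertebral body.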

The EndSeg method then delineates the vertebral endplate starting with a seed point defined by the intersection of the principal axis with the vertebral body segmentation. A region-growing method expands about the seed point on the vertebral endplate, stopping near the edge of the endplate, where the gradient elevation direction (i.e., the angle of the surface out of the xy plane) changes sharply. The region growing is constrained by the distance from the seed point and a gradient elevation threshold parameter τ. A voxel at location $x$ is added to the endplate segmentation if the gradient elevation at that location, $G(x)$, differs from the mean gradient elevation in a small patch around the seed point, $\bar{G}_e$, by less than the threshold τ:

$$|\bar{G}_e - G(x)| < \tau, \tag{3}$$

thus growing along the “flat” surface of the endplate. Region growing was constrained to the anterior portion of the vertebra (i.e., did not include the spinous or transverse processes) by a distance cutoff beyond the posterior surface of the vertebral body. The posterior constraint was defined using the principal component of the segmentation as a proxy for anterior–posterior axis. A sensitivity analysis was performed to investigate the dependence of EndSeg endplate angle estimation on the selection of the gradient threshold parameter τ.
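The region-growing rule can be sketched as a breadth-first search over a precomputed map of gradient elevation angles. This is a minimal sketch under stated assumptions rather than the implementation used in this work: the elevation map is taken as given, neighbors are face-connected, the distance constraint is a simple Euclidean cutoff in voxel units, and the names `grow_endplate`, `max_dist`, and `patch` are hypothetical.

```python
from collections import deque

import numpy as np


def grow_endplate(elevation, seed, tau, max_dist, patch=1):
    """Grow a region from `seed` over voxels whose elevation is within tau of the seed patch mean."""
    elevation = np.asarray(elevation, dtype=float)
    seed = tuple(int(s) for s in seed)
    # Mean elevation in a small patch around the seed (clipped at the borders).
    window = tuple(slice(max(s - patch, 0), s + patch + 1) for s in seed)
    g_seed = elevation[window].mean()

    # Face-connected neighbor offsets (4 in 2-D, 6 in 3-D).
    offsets = []
    for ax in range(elevation.ndim):
        for step in (-1, 1):
            off = [0] * elevation.ndim
            off[ax] = step
            offsets.append(tuple(off))

    mask = np.zeros(elevation.shape, dtype=bool)
    mask[seed] = True  # the seed itself is always part of the region
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        for off in offsets:
            nb = tuple(c + o for c, o in zip(current, off))
            if any(n < 0 or n >= dim for n, dim in zip(nb, elevation.shape)):
                continue  # outside the image
            if mask[nb]:
                continue  # already accepted
            if np.linalg.norm(np.subtract(nb, seed)) > max_dist:
                continue  # distance constraint
            if abs(g_seed - elevation[nb]) < tau:  # threshold test of Eq. (3)
                mask[nb] = True
                queue.append(nb)
    return mask


# Toy 2-D elevation map: a flat plateau (0 deg) next to a steep edge (40 deg).
toy = np.zeros((5, 5))
toy[:, 3:] = 40.0
region = grow_endplate(toy, seed=(2, 1), tau=5.0, max_dist=10.0)
```

In the toy example the region fills the flat plateau and stops at the steep edge, which mirrors the intended behavior of stopping where the gradient elevation changes sharply.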

As illustrated in Fig. 2, the EndSeg method then projects the locus of region-grown endplate voxels onto a DRR as a point-cloud distribution with forward projection as in Sec. 2.1. A linear fit to the point-cloud is then performed, and the slope is taken as a surrogate for the endplate angle. The resulting endplate angles are used as the basis for computing the various GSA metrics according to the definitions in Fig. 1. Pseudocode for EndSeg can be found in Table 2.
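The final step (linear fit to the projected point cloud, with the slope taken as a surrogate for the endplate angle) can be sketched as follows. The sketch assumes the projected endplate voxels are available as (u, v) detector coordinates with u roughly along the endplate; the function name and the conversion of slope to degrees via the arctangent are illustrative choices.

```python
import numpy as np


def endplate_angle_deg(points_uv):
    """Angle (deg) of a least-squares line fit to projected endplate points."""
    pts = np.asarray(points_uv, dtype=float)
    # First-order polynomial fit: v = slope * u + intercept.
    slope, _intercept = np.polyfit(pts[:, 0], pts[:, 1], 1)
    return float(np.degrees(np.arctan(slope)))


# Hypothetical point cloud lying on a line tilted by 12 deg.
u = np.arange(10, dtype=float)
v = 0.5 + np.tan(np.radians(12.0)) * u
angle = endplate_angle_deg(np.column_stack([u, v]))
```

A least-squares fit of v on u becomes ill-conditioned for nearly vertical point clouds. That case is not expected for endplates in LAT or AP views but is worth keeping in mind if the sketch is reused elsewhere.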

Table 2.

Pseudocode for EndSeg. Functional blocks correspond to the Fig. 2 flowchart. Calculations and parameters are denoted in the second column as manual, automatic, or adjustable.

Pseudocode [User interaction]
Get CT image, vertebral labels, and vertebra segmentation
Get vertebral labels c_i [Manual or Auto, Refs. 16–18]
  c_i = (x_i, y_i, z_i): 3-D “centroid” position for vertebra i
for each vertebra i:
  vertebra.seg = maxflow(CT, c_i) [Auto, Ref. 30]
  vertebra.seg = gaussian.filter{erosion[closing(vertebra.seg)]} [Adjustable]
Find endplate edge
for each vertebra i:
  if not vertebra S1:
    principal.axes = PCA(vertebra.seg) [Auto, Ref. 31]
    principal.axis = max.z(principal.axes) [Auto]
  else:
    principal.axis = direction[c_i (S1) to c_{i+1} (L5)] [Auto]
  endplate.edge = peak.gradient(principal.axis) [Auto]
Segment the endplate (region growing)
specify τ: angular threshold of region growing [Adjustable, Ref. 32]
for each vertebra: [Auto]
  compute G (local 3-D image gradient)
  while G < τ:
    endplate.seg = endplate.seg + current voxel [Auto]
  endplate.seg = posterior.cutoff(endplate.seg) [Auto]
Project endplate to DRR and compute endplate angle
for each endplate.seg:
  endplate.pointcloud = forward.projection(endplate.seg) [Auto]
  linear.fit = fit(endplate.pointcloud, 'linear') [Auto]
  endplate.angle = slope(linear.fit) [Auto]

2.3. Automatic Method 2: Spline-Fit Normals (SpNorm)

The Spline-Fit Normals (SpNorm) method computes spinal curvature without segmentation. Given vertebral labels as input (as in Sec. 2.2), SpNorm forward projects vertebral labels to a DRR and computes a 2-D curvilinear fit as illustrated in Fig. 3. Various fit models were investigated, including a smoothing-spline fit and polynomial models (third, fifth, sixth, and seventh order). The smoothing-spline fit minimized the following objective for vertebral label coordinates $(x_i, y_i)$ in the DRR:

$$s = \underset{s_j}{\arg\min}\; p \sum_i \left[y_i - s_j(x_i)\right]^2 + (1-p)\int \left(\frac{d^2 s_j}{dx^2}\right)^2 dx, \tag{4}$$

where $s$ is the smoothing spline function, $x_i$ is the distance along the cranial–caudal axis determined by orientation of the CT, $y_i$ is distance perpendicular to the cranial–caudal axis determined by projection type (i.e., LAT and AP), and $p$ is approximately $1/(1 + h^3/6)$, where $h$ is the average data point spacing.
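To make the role of the smoothing parameter concrete, the sketch below computes p from the average label spacing h using the approximation p ≈ 1/(1 + h³/6), together with the equivalent single penalty weight λ = (1 − p)/p obtained by dividing the objective by p. The function name is hypothetical, and the example spacing is invented for illustration.

```python
import numpy as np


def smoothing_parameter(x):
    """Return (p, lam) for label positions x along the cranial-caudal axis."""
    x = np.sort(np.asarray(x, dtype=float))
    h = float(np.mean(np.diff(x)))  # average data point spacing
    p = 1.0 / (1.0 + h**3 / 6.0)    # weight on the data-fidelity term
    lam = (1.0 - p) / p             # equivalent weight on the curvature penalty
    return p, lam


# Invented example: labels spaced 1 unit apart.
p_unit, lam_unit = smoothing_parameter([0.0, 1.0, 2.0, 3.0])
```

With this choice λ reduces to h³/6, so the balance between data fidelity and curvature penalty scales with the cube of the label spacing and therefore depends on the units of x. For labels spaced about 30 mm apart (an assumed, typical intervertebral distance) p would be on the order of 2e-4.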

Fig. 3.

SpNorm algorithm for calculation of GSA. (a) Input CT image with vertebral labels. (b) Projection of vertebral labels from 3-D to the 2-D LAT or AP radiographic plane. (c) Spline fit to the projected labels. (d) Normal rays computed between vertebral labels, with slope taken as a proxy for endplate angle. (e) GSA metrics computed according to definitions in Fig. 1.

The SpNorm method then identifies rays normal to the spline as proxies for endplate angles. The normal ray intersection was nominally computed midway between vertebral levels, but locations varied somewhat depending on spinal region: halfway between the superior and inferior vertebral labels for levels C7 to L5, and at a fraction 0.65 of the distance from the L5 label to the S1 label (slightly closer to S1 to more accurately capture the sharp curvature at the lumbosacral junction). The S2 level provided an additional inferior control point for curve fitting. Metrics of GSA were then computed as defined in Fig. 1. Pseudocode for SpNorm can be found in Table 3.
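The idea of taking curve normals between labels as endplate proxies can be sketched with NumPy. To keep the sketch dependency-free, it substitutes a third-order polynomial (one of the alternative models investigated) for the smoothing spline. It assumes x runs along the cranial–caudal axis and y perpendicular to it, and it reports the tilt of the normal as the arctangent of the fitted slope. The function name and sign convention are assumptions.

```python
import numpy as np


def spnorm_tilts_deg(x, y, frac=0.5, order=3):
    """Tilt (deg) of curve normals evaluated between consecutive labels.

    `frac` is the fractional position between label i and label i+1
    (scalar, or one value per interval).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, order)           # polynomial stand-in for the spline
    dcoeffs = np.polyder(coeffs)               # derivative dy/dx of the fit
    x_eval = x[:-1] + np.asarray(frac) * np.diff(x)
    slope = np.polyval(dcoeffs, x_eval)
    # The normal is tilted from the y axis by the same angle as the
    # tangent is tilted from the x axis.
    return np.degrees(np.arctan(slope))


# Hypothetical labels on a gentle parabola y = 0.01 x^2.
x_lab = np.arange(0.0, 60.0, 10.0)
y_lab = 0.01 * x_lab**2
tilts = spnorm_tilts_deg(x_lab, y_lab)
# A Cobb-type angle between two levels is the difference of their tilts.
cobb_like = abs(tilts[-1] - tilts[0])
```

Passing an array for `frac` (for example, 0.5 everywhere and 0.65 for the L5 to S1 interval) reproduces the level-dependent placement described above.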

Table 3.

Pseudocode for SpNorm. Functional blocks correspond to the Fig. 3 flowchart. Calculations and parameters are denoted in the second column as manual, automatic, or adjustable.

Pseudocode [User interaction]
Get CT image and vertebral labels
Get vertebral labels c_i [Manual or Auto, Ref. 28]
  c_i = (x_i, y_i, z_i): 3-D “centroid” position for vertebra i
Project labels to DRR
for each vertebra i:
  c_{R,i} = (u_i, v_i): project label c_i = (x_i, y_i, z_i) to DRR [Auto]
Fit smoothing spline
spline.fit = fit(c_{R,i}, 'smoothing_spline') [Auto]
Compute spline normal
specify f_v: fraction of distance between vertebra to approximate … endplate location [Adjustable]
specify f_S1: fraction of distance between L5 and S1 to … approximate S1 endplate location [Adjustable]
for each vertebra i:
  if not vertebra L5:
    spline.point = spline.fit(c_{R,i} + f_v c_{R,i+1}) [Auto]
  else:
    spline.point = spline.fit(c_{R,L5} + f_S1 c_{R,S1}) [Auto]
  spline.slope = derivative[spline.fit(spline.point)] [Auto]
Compute endplate angle
  endplate.angle = 1/spline.slope [Auto]

The sensitivity of SpNorm to variations in the location of the vertebral label (e.g., due to intrauser variability or image noise) was investigated by varying the label coordinates and analyzing the effect on GSA metrics. Perturbations from the “true” (manually defined) locations were realized according to a Gaussian distribution with σ = 5.7 mm and a cutoff at Δx_j = 9 mm to confine the labels within the vertebral body. A total of 100 perturbed locations were simulated for each level, and the resulting variation in GSA metrics was analyzed.
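The perturbation experiment can be sketched as sampling from a truncated Gaussian. The sketch assumes the cutoff applies independently to each coordinate and that out-of-range draws are resampled rather than clipped; neither detail is specified above. The function name, the random seed, and the label coordinates are hypothetical.

```python
import numpy as np


def perturb_labels(labels, sigma=5.7, cutoff=9.0, n_trials=100, seed=0):
    """Return an array (n_trials, n_labels, n_dims) of perturbed label positions (mm)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=float)
    offsets = rng.normal(0.0, sigma, size=(n_trials,) + labels.shape)
    # Resample any coordinate offset beyond the cutoff (truncated Gaussian).
    bad = np.abs(offsets) > cutoff
    while bad.any():
        offsets[bad] = rng.normal(0.0, sigma, size=int(bad.sum()))
        bad = np.abs(offsets) > cutoff
    return labels + offsets


# Hypothetical 3-D label positions (mm) for three vertebral levels.
true_labels = np.array([[0.0, 10.0, 300.0],
                        [0.0, 14.0, 270.0],
                        [0.0, 17.0, 240.0]])
perturbed = perturb_labels(true_labels)
```

Each of the 100 perturbed label sets can then be passed through the curve fit and GSA computation, and the spread of the resulting metrics compared to the inter-reader CI95.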

2.4. Performance of Manual and Automatic Methods

The EndSeg and SpNorm methods were compared to manual annotation, which represents the conventional clinical means of GSA analysis and is, therefore, a reasonable basis of comparison (recognizing that it may or may not represent an actual “truth” definition). Endplate angles computed by EndSeg and SpNorm were compared to manual annotation using Passing–Bablok (PB) regression tests, a nonparametric test of similarity between measurements.33 Deviations from the regression line were computed and compared to the CI95 of manual measurements. Such nonparametric measures provided insight for small sample sizes and spread about the median. Student t-tests were used to assess differences between automatic and manual methods.
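For readers unfamiliar with PB regression, the point estimates can be sketched as a shifted median of pairwise slopes followed by a median intercept. This is a simplified sketch, not the analysis code used in this work: it omits the confidence intervals and the linearity test, and it assumes positively correlated measurements so that the shifted index stays in range. The function name is hypothetical.

```python
import numpy as np


def passing_bablok(x, y):
    """Return (slope, intercept) of a basic Passing-Bablok fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(x), k=1)      # all pairs with i < j
    dx, dy = x[j] - x[i], y[j] - y[i]
    keep = ~((dx == 0) & (dy == 0))          # drop undefined 0/0 pairs
    dx, dy = dx[keep], dy[keep]
    with np.errstate(divide="ignore"):
        slopes = dy / dx                     # vertical pairs give +/- infinity
    slopes = np.sort(slopes[slopes != -1.0]) # slopes of exactly -1 are excluded
    n = slopes.size
    k = int(np.sum(slopes < -1.0))           # offset for the shifted median
    if n % 2:
        slope = slopes[(n - 1) // 2 + k]
    else:
        slope = 0.5 * (slopes[n // 2 - 1 + k] + slopes[n // 2 + k])
    intercept = float(np.median(y - slope * x))
    return float(slope), intercept


# Invented example: "automatic" angles equal to 2 x "manual" + 1.
manual = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
automatic = 2.0 * manual + 1.0
pb_slope, pb_intercept = passing_bablok(manual, automatic)
```

Agreement between two methods corresponds to a slope near 1 and an intercept near 0. The invented data above instead return the slope and intercept used to generate them, which serves as a quick check of the estimator.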

The performance of GSA measurement (from endplate angle estimates as described above) was similarly analyzed using PB regression tests to evaluate the similarity of manual and automatic methods. Metrics of GSA included lumbar lordosis (LL), main thoracic kyphosis (MThK), proximal thoracic kyphosis (PThK), lumbar Cobb (LC) angle, main thoracic Cobb (MThC) angle, and proximal thoracic Cobb (PThC) angle—each defined in Fig. 1. Adherence to the regression line within CI95 was tested, and parametric Student t-tests were performed to compare the distributions in manually and automatically determined GSA metrics.

3. Results

3.1. Manual Annotation

Inter-reader and intrareader ICC (ICCinter and ICCintra, respectively) are summarized in Table 4. The ICCinter was similar to ICCintra for both LAT and AP cases. Noting that ICCinter is within two standard deviations of ICCintra, it appears that visual interpretation of the endplate is no more consistent for a single reader than between different readers.

Table 4.

Inter- and intrareader agreement (ICCintra and ICCinter) in manual definition of endplate angle. Parenthetical values denote the CI95 (or ± standard deviation for average ICCintra).

View | ICCintra R1 | ICCintra R2 | ICCintra R3 | ICCintra R4 | ICCintra R5 | ICCintra average | ICCinter
AP | 0.75 (0.39 to 0.91) | 0.86 (0.73 to 0.93) | 0.76 (0.60 to 0.86) | 0.76 (0.62 to 0.85) | 0.78 (0.67 to 0.86) | 0.78 (±0.05) | 0.80 (0.69 to 0.87)
LAT | 0.99 (0.96 to 1.00) | 0.99 (0.98 to 1.00) | 0.98 (0.95 to 0.99) | 0.97 (0.94 to 0.98) | 0.97 (0.95 to 0.98) | 0.98 (±0.01) | 1.0 (0.99 to 1.00)

According to Table 4, both ICCinter and ICCintra were greater in LAT views, suggesting that measures of sagittal curvature have higher reproducibility than coronal measures. This observation may be associated with better visualization (e.g., reduced overlap) of endplates for cases included in this study, which exhibited varying levels of normal or abnormal lordosis and kyphosis, but did not include pathologically significant scoliosis. However, the distributions in Fig. 4 suggest that LAT views have slightly higher median and standard deviation in endplate angle definition than AP views. The apparent discrepancy between standard deviation and ICC arises because the former solely describes inter/intrareader variability. ICC additionally describes the inherent range of variability within a single type of measurement (e.g., LAT views have a typical endplate angle range of −40 to 40 deg), as it is a ratio between the inherent variability in a particular type of measurement (σWR and σWT) and the sum of the inherent variability and inter- or intrareader variability (σBR or σBT, respectively). In the extreme case in which inherent variability is much greater than the inter/intrareader variability, as with LAT views, ICC approaches 1. From the standard deviations shown in Fig. 4, inter-reader CI95 in endplate angle definition was 5.8 deg.

Fig. 4.

Standard deviation in (a) inter-reader and (b) intrareader variability of endplate angle definition in AP and LAT views. The violin plots show individual sample points, an envelope fit to the sample distribution, the median (open circle with horizontal bar), and interquartile range (upper and lower horizontal range bars).

The inter-reader CI95 for sagittal GSA metrics was 8.2 deg for PThK, 6.0 deg for MThK, and 7.4 deg for LL, with a similar range for coronal GSA metrics: 11.0 deg for PThC, 8.2 deg for MThC, and 4.8 deg for LC. The mean inter-reader error for GSA metrics across all manual annotations was 7.6 deg. Because GSA metrics aggregate the errors from two endplate angle measurements, they are subject to a higher rate of variability than the error for individual endplate measurements. The variability in GSA measurements is consistent with such propagation of error.
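The propagation-of-error argument can be checked with a short calculation. Assuming the two endplate angle errors that enter a Cobb angle are independent with equal variance (an assumption made here for illustration), the interval width for their difference grows by a factor of √2 relative to a single endplate. The values below are those reported in this section.

```python
import math

# Inter-reader CI95 for a single endplate angle (deg), from Sec. 3.1.
endplate_ci95 = 5.8

# Difference of two independent, equal-variance measurements: scale by sqrt(2).
propagated_ci95 = math.sqrt(2.0) * endplate_ci95

# Inter-reader CI95 reported for each GSA metric (deg).
reported_ci95 = {"PThK": 8.2, "MThK": 6.0, "LL": 7.4,
                 "PThC": 11.0, "MThC": 8.2, "LC": 4.8}
mean_reported = sum(reported_ci95.values()) / len(reported_ci95)
```

The propagated value is about 8.2 deg, close to the 7.6 deg mean of the six reported values, which supports the interpretation that the larger variability of GSA metrics stems mainly from combining two endplate measurements.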

3.2. Automatic Method 1: Endplate Segmentation (EndSeg)

The sensitivity of EndSeg to selection of the gradient angle threshold parameter (τ) was analyzed to determine the operating range and a suitable, nominal parameter setting for the region growing operation. As shown in Fig. 5, the error in endplate angle (calculated as the difference from manual annotation) was greater for low values of the threshold, τ < 5 deg. Above this level, endplate region growing appeared stable, with error at the level of 0.5 deg, which is within the range of errors associated with manual (inter- or intrareader) delineation.

Fig. 5.

Sensitivity of endplate angle measurement in the EndSeg method to the gradient elevation threshold parameter τ.

3.3. Automatic Method 2: Spline-Fit Normals (SpNorm)

Performance of various curvilinear fits is shown in Fig. 6. As shown in Fig. 6(a), the smoothing-spline model demonstrated the best adherence to manually defined endplate angles. The goodness of fit was quantified via the coefficient of determination (r2) as shown in Fig. 6(b), where the smoothing-spline model is seen to outperform other model types.

Fig. 6.

Selection of model fit for the SpNorm method. (a) Endplate angle evaluated for various model fits (solid curves) and by manual definition (open circle) shown as a function of position along the spine. (b) Coefficient of determination (r2) between model-based and manual endplate angle definition for various models in SpNorm.

The sensitivity of SpNorm to variations in vertebral label location is shown in Fig. 7(a), illustrating the impact of variation in the label coordinate on the spline fit. Relatively small deviations in the fit are observed within a reasonable range of vertebral coordinate variations (e.g., within ±9 mm). Overall, 95% of the angle deviations are within the inter-reader CI95 as shown in Fig. 7(b), with the exception of T1 and S1, for which the 95% bounds exceed the inter-reader CI95 [although the interquartile range (IQR) was within the CI95]. The effect of label coordinate perturbations on GSA measurement is shown in Fig. 7(c), with similar findings as for endplate angles (i.e., 95% of deviations within the inter-reader CI95), with the exception of PThK and ThK.

Fig. 7.

SpNorm sensitivity analysis. (a) LAT DRR with representative unperturbed (yellow) and perturbed (cyan) SpNorm vertebral labels and fits. (b) Distribution in endplate angle estimation for SpNorm over the range of perturbation in vertebral label location. (c) Distribution in GSA metrics computed over the range of perturbed vertebral label location. The violin plots in (b) and (c) show the sample points, median (open circle), and interquartile range (vertical bar). The gray range in the background of (b) and (c) shows inter-reader CI95 for endplate angle and GSA metric, respectively.

3.4. Comparative Analysis of Manual and Automatic Methods

Figure 8 illustrates EndSeg and SpNorm GSA metric calculations. Most GSA metrics agreed within 3.0 deg (LL, PI, and pelvic tilt) or 10 mm [C7–S1 sagittal vertical axis (SVA)]. The most notable difference was in ThK, with angle differences up to 6 deg. The SpNorm estimate of ThK was unaffected by the variations in endplate angle estimation and vertebral body shape that may affect EndSeg.

Fig. 8.

Example LAT DRR with GSA metrics as measured by (a) EndSeg and (b) SpNorm.

3.4.1. Endplate angles

Endplate angle estimation for manual, EndSeg, and SpNorm methods suggests similar underlying distributions as seen in Fig. 9. The alternative hypothesis (HA) that the methods sample from distinct distributions (p<0.05) was rejected for PB regression tests (manual to EndSeg and manual to SpNorm). Deviations in endplate angles from the regression line show that 93.8% of the endplate angle estimates for both EndSeg and SpNorm are within the inter-reader CI95. With the exception of three outliers (>2 standard deviations) for each method, endplate angle estimates suggested strong correspondence with manual definition. Comparison of example endplate angle measurements in Figs. 9(c) and 9(d) showed comparable method performance, with no visible outliers.

Fig. 9.

Comparison of automatic and manual endplate angle estimation. (a) Endplate angle PB regression between manual and EndSeg and (b) between manual and SpNorm. The narrow transparent range marked about the fit shows the CI95 in PB regression slope, and the dashed lines mark the CI95 for inter-reader variability. Example endplate angle measurements for Manual, EndSeg, and SpNorm are shown in (c) for the T2 inferior endplate and in (d) for the L3 superior endplate. (Other endplates showed similar trends.) The violin plots show sample points, envelope of the distribution, median (open circle), and IQR.

Table 5 summarizes the average difference in endplate angle for EndSeg and SpNorm relative to manual definition. EndSeg exhibited stricter adherence to manual definition than SpNorm in lower thoracic/lumbar regions, potentially due to the interplay between endplate plateaus and the curvature line along the spine, whereas endplate plateaus are more normal to the direction of curvature in the upper spine. Although EndSeg was observed to have a smaller average absolute difference in endplate angle measurement than SpNorm, both methods were within inter-reader CI95. Student t-tests failed to detect significant differences between manual and automatic methods.

Table 5.

Difference in average endplate angle for EndSeg and SpNorm compared to manual definition for various vertebral levels. Paired t-tests did not detect a significant difference between automatic and manual measurements for either EndSeg or SpNorm.

Method | T2 (deg) | T3 (deg) | T4 (deg) | T8 (deg) | L1 (deg) | L3 (deg) | S1 (deg) | Avg. diff. (abs. val.) (deg) | T-test (p-value)
EndSeg | −3.6 | −4.0 | −2.2 | 0.1 | 0.4 | 1.9 | 0.8 | 1.9 | p = 0.33
SpNorm | −2.1 | −1.6 | −0.4 | 1.4 | 2.6 | 5.6 | −5.3 | 2.7 | p = 0.83

3.4.2. GSA metric computation

Pairwise Student t-tests and PB regression tests (Fig. 10) between manual and automatic GSA measurement methods (EndSeg and SpNorm) rejected the alternative hypothesis (HA), thus failing to detect statistically significant differences between methods. All deviations of GSA metrics from the regression line are within inter-reader CI95. The most notable differences in GSA metrics were observed in the sagittal upper thoracic measures [PThK in Fig. 10(a) and MThK in Fig. 10(b)], but strong correspondence was seen between automatic and manual GSA metric estimates. Violin plots of GSA metric examples in Fig. 10 suggest similar GSA metric estimates for the three methods, although significant outliers (>2 standard deviations) were observed in manual definition for MThC and PThC, which are not present in either automatic method.

Fig. 10.

GSA metric comparison for automatic and manual methods for (a)–(c) sagittal and (d)–(f) coronal metrics of spinal alignment. In each case, the PB regression shows correspondence between automatic and manual methods as in Fig. 9. Violin plots show sample points, distribution, median, and IQR in GSA metrics for each method.

Figure 11 shows PB regression between EndSeg and SpNorm for GSA metrics that require hip axis annotation. All tests rejected the alternative hypothesis (HA) and failed to indicate significant differences between GSA metric estimates. Correspondence between the two methods suggests extension of EndSeg and SpNorm to pelvic measures of alignment, although manual definitions of hip axis were not collected in the current work.

Fig. 11.

Comparison of GSA metric estimated by EndSeg and SpNorm for (a) PI, (b) PI-LL, and (c) SVA. The PB regression plots show correspondence in GSA metrics between the two methods, and violin plots show the sample points, distribution, median, and IQR in GSA estimate for each case.

4. Discussion

Two methods for automatic analysis of GSA from spinal CT were presented in this work—the first (EndSeg) based on endplate angle definition analogous to conventional endplate visualization and the second (SpNorm) based on angles normal to a robust spline fit of vertebral body labels. Both take vertebral body annotations defined in CT as basic input, and each projects to a 2-D radiographic context to analyze GSA in terms common to conventional analysis (as defined in Fig. 1). EndSeg and SpNorm measurements demonstrated that 93.8% of endplate angles and all GSA metrics were within the inter-reader CI95 of manual measurements. There was no significant difference between GSA metrics determined by the manual and automatic methods.

Such automatic methods could help to avoid the high level of variability observed in manual definition (Table 1 and Refs. 8–10 and 13) and could improve efficiency compared to fairly time-consuming manual analysis. Such capability could benefit more widespread analysis of GSA (e.g., retrospectively and intraoperatively) in large datasets and improve understanding of the correlation between GSA and surgical outcomes. Such methods could be applied to intraoperative CT at the end of a case for validation of the surgical construct, with automatic analysis of the change in GSA imparted by surgery. In contrast, manual annotation disrupts surgical workflow and is not commonly practiced in intraoperative evaluation. With the advent of image analytic tools for correlative analyses of patient outcomes and risk factors, metrics computed by such automated methods could help to glean important correlates and drive improvement in spinal surgery outcomes.

An important consideration for future work is the extension to more patient cases. This work describes application of both automatic algorithms to seven SpineWeb patient cases with similar scan protocols and patient pathologies, which limits the scope of the results. In future work, broader variation in CT scan protocol and patient pathology would permit validation of the algorithms for potentially broader application. The increased statistical power that accompanies a larger number of cases could reveal important differences or similarities between methods that are not evident from this work. Although SpNorm and EndSeg demonstrate similar performance within the current dataset, SpNorm is potentially more robust to vertebral shape abnormalities, owing to its shape-agnostic description of the curvature of the spine. Investigation of GSA analysis in pathologic spinal cases (e.g., ankylosing spondylitis and butterfly vertebra) could demonstrate further utility of SpNorm compared to endplate-based methods, in which pathologic vertebral shapes can alter representations of global spinal curvature.

Although EndSeg operates within a vertebral segmentation and both EndSeg and SpNorm require vertebral labels as input, existing methods for providing such inputs are numerous (Refs. 16–21). In the future, EndSeg will be investigated in unsegmented CT (i.e., operating on image gradients rather than segmentation boundaries), thus eliminating the need for segmentation prior to GSA measurements. Variations in CT scan protocol are particularly relevant, since poor image quality (e.g., large slice thickness) can degrade the accuracy of vertebral segmentation.

Inter- and intrareader variability reported in Sec. 3.1 quantifies reliability and reproducibility in DRRs alone. Because DRRs lack pertinent forms of image noise and artifacts that can challenge anatomical landmark identification in true radiographs, these measurements are likely conservative lower bounds on manual variability and do not capture all sources of variability present in radiographic measures. Comparison of automatically derived measures to manual ranges of variability serves as a preliminary baseline for validation in the current work.
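A comparison of automatic measurements against a manual range of variability can be sketched in Python as follows. The definition used here is an assumption for illustration: the inter-reader CI95 for each case is taken as the reader mean ± 1.96 times the sample standard deviation across readers, which may differ from the definition used in Sec. 3.1. The function name `fraction_within_reader_ci95` and the array layout are likewise hypothetical.

```python
import numpy as np


def fraction_within_reader_ci95(manual, automatic):
    """Fraction of automatic measurements inside the inter-reader CI95.

    manual    : array of shape (n_cases, n_readers), manual measurements
    automatic : array of shape (n_cases,), automatic measurements

    ASSUMPTION: CI95 per case = reader mean +/- 1.96 * sample SD.
    """
    manual = np.asarray(manual, dtype=float)
    automatic = np.asarray(automatic, dtype=float)

    reader_mean = manual.mean(axis=1)
    half_width = 1.96 * manual.std(axis=1, ddof=1)

    inside = np.abs(automatic - reader_mean) <= half_width
    return float(inside.mean())
```

Since the manual variability measured in DRRs is likely a lower bound, an interval built this way would tend to be narrower than one derived from true radiographs, making the check conservative.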

The conventional approach to GSA analysis involves manual annotation of radiographs directly, whereas the current work applied EndSeg and SpNorm to radiographic GSA analysis from a CT image. Preoperative assessment of the patient commonly includes a CT image, whereas intraoperative and postoperative assessment is frequently limited to radiographs. The influence of patient positioning during imaging on spinal curvature also warrants consideration: radiographs are typically taken under load-bearing conditions, whereas CT is typically acquired with supine or prone positioning. Although findings have demonstrated a correlation between metrics of GSA in standing, supine, and prone positioning, direct comparison may be difficult. Direct analysis of GSA from radiographs in future work could more readily allow computation of perioperative changes in GSA.

Acknowledgments

This work was supported by the U.S. National Institutes of Health (NIH) (Grant No. R01-EB-017226). Preliminary results of this work were previously reported in SPIE Medical Imaging Proceedings 2019 by Doerr et al. (Ref. 34).

Biography

Biographies of the authors are not available.

Disclosures

No conflicts of interest, financial or otherwise, are declared by the authors.

Contributor Information

Sophia A. Doerr, Email: sdoerr1@jhu.edu.

Tharindu De Silva, Email: tsameera1@gmail.com.

Rohan Vijayan, Email: rvijaya3@jhmi.edu.

Runze Han, Email: rhan5@jhmi.edu.

Ali Uneri, Email: ali.uneri@jhu.edu.

Michael D. Ketcha, Email: mketcha3@jhmi.edu.

Xiaoxuan Zhang, Email: xzhan110@jhu.edu.

Nishanth Khanna, Email: nkhanna2@jhmi.edu.

Erick Westbroek, Email: erickw@jhmi.edu.

Bowen Jiang, Email: bjiang5@jhmi.edu.

Corinna Zygourakis, Email: czygour1@jhmi.edu.

Nafi Aygun, Email: naygun1@jhmi.edu.

Nicholas Theodore, Email: Theodore@jhmi.edu.

Jeffrey H. Siewerdsen, Email: jeff.siewerdsen@jhu.edu.

References

  • 1. Diebo B., et al., “Global sagittal axis: a step toward full-body assessment of sagittal plane deformity in the human body,” J. Neurosurg. Spine 25(4), 494–499 (2016). doi: 10.3171/2016.2.SPINE151311
  • 2. Schwab F., Patel A., Ungar B., “Adult spinal deformity postoperative standing imbalance: how much can you tolerate? An overview of key parameters in assessing alignment and planning corrective surgery,” Spine (Phila Pa 1976) 35, 2224–2231 (2010). doi: 10.1097/BRS.0b013e3181ee6bd4
  • 3. Glassman S., Bridwell K., Dimar J., “The impact of positive sagittal balance in adult spinal deformity,” Spine (Phila Pa 1976) 30, 2024–2029 (2005). doi: 10.1097/01.brs.0000179086.30449.96
  • 4. Leveque J.-C. A., et al., “A multicenter radiographic evaluation of the rates of preoperative and postoperative malalignment in degenerative spinal fusions,” Spine (Phila Pa 1976) 43(13), E782–E789 (2018). doi: 10.1097/BRS.0000000000002500
  • 5. Nakai S., Yoshizawa H., Kobayashi S., “Long-term follow-up study of posterior lumbar interbody fusion,” J. Spinal Disord. 12(4), 293–299 (1999). doi: 10.1097/00002517-199908000-00004
  • 6. Fon G. T., Pitt M. J., Thies A. C., “Thoracic kyphosis: range in normal subjects,” Am. J. Roentgenol. 134(5), 979–983 (1980). doi: 10.2214/ajr.134.5.979
  • 7. Tempel Z., et al., “The influence of pelvic incidence and lumbar lordosis mismatch on development of symptomatic adjacent level disease following single-level transforaminal lumbar interbody fusion,” Neurosurgery 80(6), 880–886 (2017). doi: 10.1093/neuros/nyw073
  • 8. Dimar J., et al., “Intra- and inter-observer reliability of determining radiographic sagittal parameters of the spine and pelvis using a manual and a computer-assisted method,” Eur. Spine J. 17(10), 1373–1379 (2008). doi: 10.1007/s00586-008-0755-1
  • 9. Yamada K., Aota Y., Higashi T., “Accuracies in measuring spinopelvic parameters in full-spine lateral standing radiograph,” Spine 40, E640–E646 (2015). doi: 10.1097/BRS.0000000000000904
  • 10. Dang N. R., et al., “Intra-observer reproducibility and interobserver reliability of the radiographic parameters in the spinal deformity study group’s AIS radiographic measurement manual,” Spine 30(9), 1064–1069 (2005). doi: 10.1097/01.brs.0000160840.51621.6b
  • 11. Kyrola K., et al., “Intra- and interrater reliability of sagittal spinopelvic parameters on full-spine radiographs in adults with symptomatic spinal disorders,” J. Neurospine 15(2), 175–181 (2018). doi: 10.14245/ns.1836054.02
  • 12. Orht-Nissen S., et al., “Reproducibility of thoracic kyphosis measurements in patients with adolescent idiopathic scoliosis,” Scoliosis Spinal Disord. 12, 4 (2017). doi: 10.1186/s13013-017-0112-4
  • 13. Vaynrub M., et al., “Validation of prone intraoperative measurements of global spinal alignment,” J. Neurosurg. Spine 29, 187–192 (2018). doi: 10.3171/2018.1.SPINE17808
  • 14. Carman D., Browne R., Birch J. G., “Measurement of scoliosis and kyphosis radiographs. Intraobserver and interobserver variation,” J. Bone Joint Surg. Am. 72, 328–333 (1990). doi: 10.2106/00004623-199072030-00003
  • 15. Wu W., et al., “Reliability and reproducibility analysis of the Cobb angle and assessing sagittal plane by computer-assisted and manual measurement tools,” BMC Musculoskelet. Disord. 15, 33 (2014). doi: 10.1186/1471-2474-15-33
  • 16. Klinder T., et al., “Automated model-based vertebra detection, identification, and segmentation in CT images,” Med. Image Anal. 13, 471–482 (2009). doi: 10.1016/j.media.2009.02.004
  • 17. Naegel B., “Using mathematical morphology for the anatomical labeling of vertebrae from 3D CT-scan images,” Comput. Med. Imaging Graph. 31, 141–156 (2007). doi: 10.1016/j.compmedimag.2006.12.001
  • 18. Glocker B., et al., “Automatic localization and identification of vertebrae in arbitrary field-of-view CT scans,” Lect. Notes Comput. Sci. 7512, 590–598 (2012). doi: 10.1007/978-3-642-33454-2_73
  • 19. Qadri S. F., et al., “Automatic deep feature learning via patch-based deep belief network for vertebrae segmentation in CT images,” Appl. Sci. 9(1), 69 (2018). doi: 10.3390/app9010069
  • 20. Yang D., et al., “Automatic vertebra labeling in large-scale 3D CT using deep image-to-image network to message passing and sparsity regularization,” Lect. Notes Comput. Sci. 10265, 633–644 (2017).
  • 21. Levine M., et al., “Automatic vertebrae localization in spine CT: a deep-learning approach for image guidance and surgical data science,” Proc. SPIE 10951, 109510S (2019). doi: 10.1117/12.2513915
  • 22. Ailon T., et al., “Adult spinal deformity surgeons are unable to accurately predict postoperative spinal alignment using clinical judgment alone,” Spine Deform. 4(4), 323–329 (2016). doi: 10.1016/j.jspd.2016.02.003
  • 23. Bourgeois A. C., et al., “The evolution of image-guided lumbosacral spine surgery,” Ann. Transl. Med. 3(5), 69 (2015). doi: 10.3978/j.issn.2305-5839.2015.02.01
  • 24. Holly L., Foley K., “Image guidance in spine surgery,” Orthop. Clin. North Am. 38(3), 451–461 (2007). doi: 10.1016/j.ocl.2007.04.001
  • 25. Tjardes T., et al., “Image-guided spine surgery: state of the art and future directions,” Eur. Spine J. 19(1), 25–45 (2010). doi: 10.1007/s00586-009-1091-9
  • 26. De Silva T., et al., “SpineCloud: image analytics for predictive modeling of spine surgery outcomes,” J. Med. Imaging 7(3), 031502 (2020). doi: 10.1117/1.JMI.7.3.031502
  • 27. Muñoz H. E., et al., “Detection of vertebral degenerative disc disease based on cortical shell unwrapping,” Proc. SPIE 8670, 86700C (2013). doi: 10.1117/12.2008063
  • 28. Koo T., Li M., “A guideline of selecting and reporting intraclass correlation coefficients for reliability research,” J. Chiropr. Med. 15(2), 155–163 (2016). doi: 10.1016/j.jcm.2016.02.012
  • 29. Iquebal A. S., Bukkapatnam S., “Unsupervised image segmentation via maximum a posteriori estimation of continuous max-flow,” in csCV, arXiv:1811.00220 (2018).
  • 30. Pezold S., et al., “Automatic segmentation of the spinal cord using continuous max flow with cross-sectional similarity prior and tubularity features,” in Recent Advances in Computational Methods and Clinical Applications for Spine Imaging, Yao J., et al., Eds., pp. 107–118, Springer International Publishing, Cham (2015).
  • 31. Hotelling H., Analysis of a Complex of Statistical Variables into Principal Components, Warwick & York, Inc., Baltimore, Maryland (1933).
  • 32. Adams R., Bischof L., “Seeded region growing,” IEEE Trans. Pattern Anal. Mach. Intell. 16(6), 641–647 (1994). doi: 10.1109/34.295913
  • 33. Passing H., Bablok W., “A new biometrical procedure for testing the equality of measurements from two different analytical methods,” J. Clin. Chem. Clin. Biochem. 21, 709–720 (1983).
  • 34. Doerr S., et al., “Automatic analysis of global spinal alignment from spine CT images,” Proc. SPIE 10951, 1095104 (2019). doi: 10.1117/12.2513975

Articles from Journal of Medical Imaging are provided here courtesy of Society of Photo-Optical Instrumentation Engineers
