Automatic Localization and Brand Detection of Cervical Spine Hardware on Radiographs Using Weakly Supervised Machine Learning

Raman Dutt; Dylan Mendonca; Huai Ming Phen; Samuel Broida; Marzyeh Ghassemi; Judy Gichoya; Imon Banerjee; Tim Yoon; Hari Trivedi

2022 Jan 19;4(2):e210099. doi: 10.1148/ryai.210099

PMCID: PMC8980883 PMID: 35391772

Abstract

Purpose

To develop an end-to-end pipeline to localize and identify cervical spine hardware brands on routine cervical spine radiographs.

Materials and Methods

In this single-center retrospective study, patients who received cervical spine implants between 2014 and 2018 were identified. Information on the implant model was retrieved from the surgical notes. The dataset was filtered for implants present in at least three patients, which yielded five anterior and five posterior hardware models for classification. Images for training were manually annotated with bounding boxes for anterior and posterior hardware. An object detection model was trained and implemented to localize hardware on the remaining images. An image classification model was then trained to differentiate between five anterior and five posterior hardware models. Model performance was evaluated on a holdout test set with 1000 iterations of bootstrapping.

Results

A total of 984 patients (mean age, 62 years ± 12 [standard deviation]; 525 women) were included for model training, validation, and testing. The hardware localization model achieved an intersection over union of 86.8% and an F1 score of 94.9%. For brand classification, an F1 score, sensitivity, and specificity of 98.7% ± 0.5, 98.7% ± 0.5, and 99.2% ± 0.3, respectively, were attained for anterior hardware, with values of 93.5% ± 2.0, 92.6% ± 2.0, and 96.1% ± 2.0, respectively, attained for posterior hardware.

Conclusion

The developed pipeline was able to accurately localize and classify brands of hardware implants using a weakly supervised learning framework.

Keywords: Spine, Convolutional Neural Network, Deep Learning Algorithms, Machine Learning Algorithms, Prostheses, Semisupervised Learning

Supplemental material is available for this article.

See also commentary by Huisman and Lessmann in this issue.

Summary

Localization and identification of cervical spine hardware brands with high accuracy was feasible using deep learning and image processing.

Key Points

■ Anterior and posterior cervical spine hardware can be localized on routine radiographs; the current study yielded an F1 score of 94.9% for detection and an intersection over union score of 86.8% for localization.
■ Convolutional neural networks can help identify brands of anterior and posterior hardware, as demonstrated herein by F1 scores of 98.7% and 93.5%, respectively.
■ Model performance for brand prediction was robust across anteroposterior and lateral images within an examination, with less than 3% disagreement between images for anterior hardware brands and 10% disagreement between images for posterior hardware brands.

Introduction

Cervical spine disorders such as spondylosis, disk herniation, and fracture can lead to severe disability and impairment in activities of daily living or can cause debilitating pain (1,2). Surgical fixation is often performed to alleviate these symptoms (3–6) and may involve implanting hardware in either the anterior or the posterior cervical spine, or both. An estimated 350 000 cervical procedures are performed each year in the United States, and a variety of implants from more than 150 cervical hardware manufacturers worldwide are used (7).

Unfortunately, repeat surgery in the cervical spine occurs frequently, most commonly related to adjacent segment disease, which may manifest as radiculomyelopathy involving the unfused levels above or below the previously operated segments (8,9). The incidence of symptomatic adjacent segment disease has historically been approximately 3% per year within the first 10 years of the index operation, with up to 25% of patients meeting criteria for revision surgery by 10 years after surgery (10,11). Typically, the repeat surgery is performed years after the index surgery, often at a different institution. Most hardware brands require their own set of surgical instruments to manipulate or remove; thus, accurate identification of the implanted hardware during preoperative evaluation is imperative. Although these hardware constructs may appear similar on radiographs, seemingly minor differences in screw head shape or depth can make removal impossible without the correct tool. Unfortunately, patient records are often unavailable when repeat surgery is being planned, necessitating a manual review of radiographs by surgeons or implant vendors to try to identify hardware brand and model. Even so, hardware often cannot be identified, or in some cases is misidentified. Failure to properly identify the hardware can result in delays in care or difficulties during the surgical procedure owing to unavailable or inappropriate equipment. For instance, the absence of a proper screwdriver could necessitate the use of metal cutting burs, resulting in metal debris in the soft tissues that would interfere with future imaging. In more extreme cases, it may be impossible to remove a screw at the desired level, requiring the surgeon to skip that level altogether, resulting in a nonideal spine construct. An accurate and automated method of cervical implant identification would help alleviate these problems.

The rapid development of deep learning applications for medical imaging over the past decade has resulted in exciting advancements in automated diagnosis and detection of abnormality. For example, in orthopedic applications, deep learning models have been developed to predict the development of cervical spondylotic myelopathy and Japanese Orthopedic Association scores based on MRI scans (12). In the area of hardware detection, deep learning has been applied for the identification of hardware brands for total joint arthroplasties (13–16). Model performance in these studies was good; however, the hardware in those studies differs structurally from cervical spine hardware in important ways. Joint arthroplasties usually have larger and more apparent differences between brands (eg, varying dimensions of stems, flanges, trunnions, and/or collars). In contrast, cervical spine hardware is much more similar across brands, with all hardware essentially consisting of screws and a plate or rod. Often the only identifiable difference between brands is an indentation on the plate or the thread pitch of the screws. Therefore, cervical spine hardware identification poses a substantial challenge for device identification, whether by radiologists or algorithms.

The goal of this study was to produce an end-to-end pipeline to localize and classify anterior and posterior cervical spine hardware brands on plain radiographs. Development of such a model would enable rapid, automated detection of cervical spine hardware brands and expedite revision surgery.

Materials and Methods

Dataset Preparation

Patient selection.—An overview of the entire pipeline is shown in Figure 1. This retrospective study was approved by the institutional review board at our institution, with Health Insurance Portability and Accountability Act approval; individual patient consent was waived. The records of 1031 patients who underwent cervical fixation surgery between 2014 and 2018 at our institution (a tertiary referral center) were identified. There was no patient overlap with prior studies. The surgical notes for these patients were extracted from the electronic health record and manually reviewed for the presence and brand of anterior and posterior spinal hardware. The date of surgery was recorded for each patient. All cervical spine radiographs acquired after the date of surgery for these patients were extracted and de-identified. A total of 16 patients were excluded because there were no radiographic examinations available after the date of surgery; examinations prior to the date of surgery were presumed to be without spinal hardware, and examinations on the date of surgery contained incomplete hardware with fluoroscopic views and therefore were not included. This resulted in 1015 patients (4223 examinations, 11 001 images) with 13 unique anterior brands and 21 posterior brands or brand combinations. An additional 12 patients (58 examinations, 179 images) with combinations of two posterior brands were excluded because there was no automated way to identify the brand of each implant based on the images. There were no patients with combinations of anterior brands. Finally, any hardware brand associated with fewer than three unique patients was discarded to ensure inclusion of at least one patient per hardware brand in the training, validation, and holdout test sets. Patients with both anterior and posterior hardware were retained. An additional 19 patients (110 examinations, 233 images) were excluded owing to insufficient numbers of cases for specific hardware brands. 
In total, 984 patients (4055 examinations, 10 589 images) with implants across five anterior and five posterior brands (Fig E1 [supplement]) were included. In total, 31 patients (3.2%) with available imaging and 412 images (3.7%) were excluded. Demographic information, including self-reported race and ethnicity, was collected from the electronic health record to better understand the patient distribution and potential implications for model generalizability.
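The brand-frequency filter described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `filter_rare_brands`, the `(patient_id, brand)` record layout, and the toy records are hypothetical and are not taken from the study's code.

```python
from collections import defaultdict

def filter_rare_brands(patient_brand_pairs, min_patients=3):
    """Keep only records whose brand appears in at least `min_patients` unique patients."""
    patients_per_brand = defaultdict(set)
    for patient_id, brand in patient_brand_pairs:
        patients_per_brand[brand].add(patient_id)
    kept = {b for b, pts in patients_per_brand.items() if len(pts) >= min_patients}
    return [(p, b) for p, b in patient_brand_pairs if b in kept]

# Hypothetical records: one (patient_id, brand) pair per image.
records = [("p1", "A"), ("p2", "A"), ("p3", "A"), ("p3", "A"),
           ("p4", "B"), ("p5", "B")]
filtered = filter_rare_brands(records)
```

Counting unique patients rather than images matters here, because a single patient with many follow-up radiographs would otherwise make a rare brand look common.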

Figure 1:

Overview of patient and image selection for development of the object localization model. c-spine = cervical spine, pts = patients.

Implant detection and localization.—For model training, a total of 1044 de-identified images were uploaded to a commercial online annotation platform (MD.ai), and bounding boxes were drawn around anterior and posterior hardware by three radiology residents, resulting in 1559 bounding boxes (837 [53.7%] anterior, 722 [46.3%] posterior), as shown in Figure 2. All annotations were reviewed by H.T. (musculoskeletal radiologist with 3 years of experience) to ensure accuracy. Annotations were downloaded in .json (JavaScript Object Notation) format to train an object detection model, as described below. The trained object detection model was then used to localize and classify hardware as anterior or posterior on all 9545 remaining images.

Figure 2:

Object detection model samples. (A) Sample ground truth annotation of anterior and posterior hardware annotated on the MD.ai platform (green boxes). (B) Predicted anterior (yellow) and posterior (blue) bounding boxes for the same image. (C) Sample prediction of anterior and posterior hardware on an anteroposterior radiograph. (D) Difficult lateral radiograph for object detection in which the more inferior anterior hardware was missed owing to overlying soft tissues. (E) Difficult anteroposterior radiograph for object detection in which the more inferior posterior hardware was missed owing to overlap by the anterior hardware.

Brand classification.—To split training, validation, and test data for brand classifications, all images were stratified initially by patient and then by examination. This stratification ensured that neither patients nor multiple images from the same examination would overlap between training, validation, and test sets and that overfitting of the model would be prevented. Hardware brands were divided among the sets according to their original prevalence, and to ensure the presence of at least one patient per brand for the training, validation, and test sets. A test set of 1900 randomly selected images was manually reviewed by H.T., and 53 images containing erroneous bounding boxes were discarded, yielding 1847 remaining images. The total number of patients, images, and hardware devices per brand in the training, validation, and test sets is presented in Table 1.
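A patient-level split of the kind described above might look like the following sketch. The split fractions, the record fields (`patient_id`, `exam_id`), and the function name are assumptions made for illustration; the study additionally balanced brands across sets by prevalence, which this sketch omits.

```python
import random

def split_by_patient(records, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole patients to train/val/test so that no patient spans two sets."""
    patients = sorted({rec["patient_id"] for rec in records})
    random.Random(seed).shuffle(patients)
    n_train = int(fractions[0] * len(patients))
    n_val = int(fractions[1] * len(patients))
    assignment = {}
    for i, pid in enumerate(patients):
        if i < n_train:
            assignment[pid] = "train"
        elif i < n_train + n_val:
            assignment[pid] = "val"
        else:
            assignment[pid] = "test"
    splits = {"train": [], "val": [], "test": []}
    for rec in records:
        splits[assignment[rec["patient_id"]]].append(rec)
    return splits

# Hypothetical dataset: 20 patients, 2 examinations each, 2 images per examination.
records = [{"patient_id": f"p{p}", "exam_id": f"p{p}-e{e}", "image": i}
           for p in range(20) for e in range(2) for i in range(2)]
splits = split_by_patient(records)
```

Because every examination belongs to exactly one patient, separating patients also keeps all images from one examination in the same set.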

Table 1:

Summary of Patient, Examination, and Object Numbers for the Five Anterior and Five Posterior Hardware Brands

Implant Localization Model

For implant detection and localization, we divided the 1044 annotated images into 842 training, 99 validation, and 103 test images. We chose to use the EfficientDet architecture proposed by Tan et al (17); it is an object detection model derived from the EfficientNet family (18). EfficientNets have been widely adopted for several medical imaging tasks because of their ability to generalize well to small image datasets (19–21). We chose the EfficientDet D0 variant initialized from a checkpoint pretrained on the Common Objects in Context (COCO; Microsoft) dataset (22). This architecture was implemented as is without any substantial changes using the TensorFlow Object Detection API based on the TensorFlow (23) deep learning framework. We reduced the final number of classes to be detected to only two according to our dataset and problem statement. The model took input images with a matrix size of 512 × 512 and output a dictionary containing detected classes, coordinates, and confidence scores of each detection. Training was performed using the Adam optimizer (24), with a cosine decay learning rate initialized at a value of 0.08 and a batch size of 16, for a total of 25 000 steps. All images were histogram normalized. We noted that some pixel intensities on images were inverted; to correct this, all images with a value of MONOCHROME1 in the Digital Imaging and Communications in Medicine (DICOM) tag PhotometricInterpretation were inverted. This inversion corrected all images to the standard radiographic appearance. Finally, standard augmentations, including horizontal flipping, as well as hue, brightness, and saturation adjustments, were applied to images during training to increase generalizability.
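To make the learning rate schedule concrete, the following sketch evaluates a plain cosine decay using the initial value (0.08) and step count (25 000) reported above. It assumes no warm-up phase and a final learning rate of zero; the actual TensorFlow Object Detection API configuration may differ, and the function name is hypothetical.

```python
import math

def cosine_decay_lr(step, initial_lr=0.08, total_steps=25_000):
    """Cosine decay from `initial_lr` at step 0 to zero at `total_steps` (no warm-up assumed)."""
    step = min(max(step, 0), total_steps)
    return initial_lr * 0.5 * (1.0 + math.cos(math.pi * step / total_steps))

# Learning rate at the start, midpoint, and end of training.
schedule = [cosine_decay_lr(s) for s in (0, 12_500, 25_000)]
```

Under these assumptions the rate falls slowly at first, passes half its initial value at the midpoint, and flattens out again near the end of training.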

Brand Label Classification

Previous work identified the importance of incorporating radiographic views when making predictions based on radiographs (25). Based on this work, we created the simpler Basic Append View (BAV) model (Fig 3), implemented using the PyTorch deep learning framework (version 1.6.0) (26). The BAV model uses a pretrained network backbone and incorporates the radiographic view as a feature to the fully connected head. The radiographic view (anteroposterior [AP] vs lateral) is important for model prediction because these views represent two orthogonal views of each implant. These views were identified using DICOM header ViewPosition. We chose a DenseNet121 (27) backbone based on the similar earlier work of Hashir et al (25).

Figure 3:

Architecture of the Basic Append View (BAV) model. Bounding boxes are input into the DenseNet121 network, which yields a 1000 logit output. The radiographic view is appended as a one-hot encoded vector to the 1000 logit output, which is then propagated through two more fully connected layers with leaky rectified linear unit (ReLU) and dropout in between. A softmax function is implemented after the second linear layer (FC3) to predict the probability that a bounding box corresponds to a specific brand. AP = anteroposterior, L = lateral, RGB = red green blue.

Predictions using hardware localization.—The bounding boxes produced by the object localization model were input into the DenseNet backbone, which yielded a 1000 logit output. A one-hot encoded vector representing the radiographic view (AP or lateral) was concatenated to this output and then fed through two linear layers (FC2 and FC3), with leaky rectified linear unit activation functions in between. A softmax function was applied to the final output to predict probabilities of a bounding box corresponding to each brand class. Dropout (28) layers were also implemented between the fully connected layers to prevent overfitting.
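The forward pass through the classification head can be sketched without any deep learning framework. In the sketch below, random numbers stand in for the 1000 DenseNet121 logits, the weights are random rather than trained, dropout is omitted (as it would be at inference), and the hidden width of 256 follows the dimensionality reported for the t-SNE analysis. All function names are hypothetical; the study itself used PyTorch.

```python
import math
import random

def linear(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def leaky_relu(x, slope=0.01):
    return [v if v > 0 else slope * v for v in x]

def softmax(x):
    m = max(x)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def bav_head(backbone_logits, view, fc2, fc3):
    """Append a one-hot view vector (AP or lateral) to the logits, then apply FC2 and FC3."""
    view_one_hot = [1.0, 0.0] if view == "AP" else [0.0, 1.0]
    features = list(backbone_logits) + view_one_hot
    hidden = leaky_relu(linear(features, *fc2))
    return softmax(linear(hidden, *fc3))

def random_layer(n_out, n_in, rng):
    """Untrained placeholder weights and zero biases."""
    return ([[rng.gauss(0, 0.05) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out)

rng = random.Random(0)
logits = [rng.gauss(0, 1) for _ in range(1000)]  # stand-in for the backbone output
fc2 = random_layer(256, 1002, rng)               # 1000 logits + 2 view entries
fc3 = random_layer(5, 256, rng)                  # five brand classes
probs = bav_head(logits, "AP", fc2, fc3)
```

Appending the view after the backbone, rather than training one network per view, lets a single model share image features while still conditioning the final decision on whether the radiograph is AP or lateral.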

Separate BAV models were trained for anterior and posterior hardware using cross-entropy loss. Brand imbalance in each dataset was accounted for in the loss function by weighting each training label according to its prevalence in the training set. The model weights were optimized using the Adam optimizer, with separate learning rates for the pretrained backbone and fully connected head. Both learning rates were decayed by a factor of 0.5 every three epochs to reduce fluctuations toward the end of model training.
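The two training details above, prevalence-based label weights and step decay of the learning rate, can be illustrated as follows. The exact weighting formula is not given in the text, so the inverse-frequency scheme below is an assumption, as are the function names and the example initial learning rate.

```python
from collections import Counter

def inverse_prevalence_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count), so rare brands count more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {brand: n / (k * c) for brand, c in counts.items()}

def step_decay_lr(initial_lr, epoch, factor=0.5, every=3):
    """Multiply the learning rate by `factor` once every `every` epochs."""
    return initial_lr * factor ** (epoch // every)

# Hypothetical training labels: brand "A" is three times as common as brand "B".
weights = inverse_prevalence_weights(["A"] * 6 + ["B"] * 2)
lrs = [step_decay_lr(1e-3, epoch) for epoch in range(7)]
```

With this scheme the total weight contributed by each brand is equal, so a frequent brand cannot dominate the loss simply by having more training examples.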

Standard augmentations were applied to the training dataset to increase model generalizability, including horizontal flipping, contrast and hue adjustments, and limited rotations and shearing. The augmented images were then normalized according to the mean and standard deviation (SD) of intensity of the training set and resized to 256 × 256 before being fed into the BAV model. For the validation and test sets, the images were only resized and normalized according to the training set mean and SD. Ten-fold cross-validation was used for hyperparameter tuning to optimize model performance.

Without using hardware localization.—To demonstrate the importance of object localization for brand classification, we also implemented the same brand classification pipeline described above using whole input images instead of bounding boxes per device. A separate BAV model was trained for images with anterior and posterior hardware using 10-fold cross-validation and was evaluated in the same fashion.

Statistical Analysis

All statistical metrics were calculated in Python using scikit-learn (version 0.23.2) (29). Weighted specificity was computed based on its theoretical definition, using NumPy (version 1.19.2) (30) and scikit-learn (version 0.23.2).
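For readers unfamiliar with multiclass specificity, one way to compute a weighted value from a confusion matrix is sketched below. It assumes that "weighted" means averaging per-class specificity (TN / [TN + FP]) by class support, mirroring scikit-learn's weighted averaging for other metrics; the text does not spell out the formula, and the function name is hypothetical.

```python
def weighted_specificity(confusion):
    """Support-weighted mean of per-class specificity.

    confusion[i][j] is the number of samples with true class i predicted as class j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    score = 0.0
    for c in range(k):
        support = sum(confusion[c])                      # samples whose true class is c
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(k)) - tp
        tn = total - support - fp                        # neither truly c nor predicted c
        specificity = tn / (tn + fp) if (tn + fp) else 0.0
        score += specificity * support / total
    return score

# Hypothetical two-brand confusion matrix.
example = weighted_specificity([[8, 2], [1, 9]])
```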

Hardware localization.—Performance of the object detection model is reported in two ways. Method 1 evaluates raw model performance by considering each bounding box independently, regardless of the presence of multiple anterior or posterior bounding boxes in a single image. If an image has two anterior hardware devices, detection of one device would yield only 50% accuracy.

Method 2 considers detection of at least one anterior or posterior bounding box to be a successful detection. In the same example with two anterior hardware devices, detection of at least one anterior device would still enable brand prediction in the next stage of the model and would thus register as a true positive. However, anterior and posterior hardware elements are still considered separately; thus, in an image with two anterior elements and one posterior element, detection of only a single anterior device counts as a true positive for anterior hardware and as a false negative for posterior hardware (Fig 4).

Figure 4:

Examples of evaluation methods 1 and 2 for object detection. (A) Two instances of anterior hardware and one instance of posterior hardware (blue box), with detection of only half the anterior implants (yellow box). Under evaluation method 1, this would be counted as one true positive (TP) and one false negative (FN) for anterior, and one TP for posterior. Under evaluation method 2, this finding would be counted as one TP for both anterior and posterior classes. (B) Two instances of anterior hardware and one instance of posterior hardware (blue box), with both anterior hardware devices going undetected. This would count as two FN for anterior and one TP for posterior in method 1, and one FN for anterior and one TP for posterior in method 2.
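The difference between the two evaluation methods can be expressed as a small counting rule. The sketch below handles one hardware class (anterior or posterior) in one image and ignores false-positive detections for simplicity; the function name and argument layout are hypothetical.

```python
def count_detections(n_true, n_detected, method):
    """Return (true positives, false negatives) for one hardware class in one image.

    n_true:     number of ground truth devices of this class in the image
    n_detected: number of those devices matched by a predicted bounding box
    """
    if method == 1:
        # Every device is scored independently.
        return n_detected, n_true - n_detected
    # Method 2: one detection of the class is enough for the whole image.
    if n_true == 0:
        return 0, 0
    return (1, 0) if n_detected >= 1 else (0, 1)

# Anterior hardware in the two scenarios described in the caption above.
panel_a = {m: count_detections(2, 1, m) for m in (1, 2)}
panel_b = {m: count_detections(2, 0, m) for m in (1, 2)}
```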

Brand prediction.—The brand prediction model was evaluated on the holdout set using 1000 iterations of bootstrapping. For each iteration, a randomly sized subset, ranging from 10% to the full size of the test set, was sampled from the test set with replacement. Weighted F1, precision, recall (sensitivity), specificity, and area under the receiver operating characteristic curve scores were calculated for each iteration. We report results as the average and error as the SD calculated across 1000 iterations of bootstrapping.
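The bootstrapping procedure can be sketched as follows. Plain accuracy stands in for the weighted metrics listed above, and drawing the subset size uniformly between 10% and 100% of the test set is an assumption, because the text does not state how the size was chosen. The function names are hypothetical.

```python
import random
import statistics

def bootstrap_metric(y_true, y_pred, metric, n_iter=1000, min_frac=0.1, seed=0):
    """Mean and SD of `metric` over resamples (with replacement) of random size."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_iter):
        size = rng.randint(max(1, int(min_frac * n)), n)  # subset size for this iteration
        idx = [rng.randrange(n) for _ in range(size)]     # sample indices with replacement
        scores.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    return statistics.mean(scores), statistics.stdev(scores)

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical labels: 100 test objects, 90 of them predicted correctly.
truth = ["A"] * 50 + ["B"] * 50
preds = ["A"] * 45 + ["B"] * 5 + ["B"] * 45 + ["A"] * 5
mean_acc, sd_acc = bootstrap_metric(truth, preds, accuracy)
```

Sampling indices rather than values keeps each true label paired with its prediction, which any paired metric requires.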

Evaluating performance only per bounding box could be overly optimistic because disagreement between brand predictions within a single image or examination would lead to confusion during clinical deployment. In an image with two anterior hardware devices, one correct brand prediction and one incorrect brand prediction would yield no benefit because the clinician would not know which to trust. Therefore, we also report performance in which a true positive was registered only if all bounding boxes for anterior or posterior hardware were correct. Any case in which the predictions within an examination disagreed was registered as unknown and reported separately.
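The examination-level rule described above amounts to requiring unanimity among the per-box predictions for a given hardware class. A minimal sketch follows, with hypothetical function names and brand labels chosen only for illustration.

```python
def exam_level_prediction(box_predictions):
    """Return the single agreed brand, or "unknown" if the predictions disagree or are absent."""
    brands = set(box_predictions)
    return brands.pop() if len(brands) == 1 else "unknown"

def score_exam(box_predictions, true_brand):
    """Classify an examination as correct, incorrect, or unknown."""
    predicted = exam_level_prediction(box_predictions)
    if predicted == "unknown":
        return "unknown"
    return "correct" if predicted == true_brand else "incorrect"

# Hypothetical per-box predictions for three examinations.
outcomes = [
    score_exam(["Zevo", "Zevo", "Zevo"], "Zevo"),
    score_exam(["Zevo", "Archon"], "Zevo"),
    score_exam(["Archon", "Archon"], "Zevo"),
]
```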

A t-distributed stochastic neighbor embedding (t-SNE) analysis was conducted to provide insight into the ability of the model to separate brands. The final fully connected layer was removed from the best anterior and posterior models. The anterior and posterior test datasets were run through the respective modified models, yielding 256-dimensional outputs. t-SNE was implemented using scikit-learn (version 0.23.2) to compress these output matrices into two-dimensional space (Figs E2 and E3 [supplement]).

Model Availability

The final hardware localization and brand classification models are released for public use at www.github.com/emory-hiti. We also plan to develop a website to enable physicians and researchers to upload their images to identify hardware brands in their patients.

Results

Patient and hardware overview.—A total of 984 patients were included (mean age, 62 years ± 12 [SD]; 525 women [53.4%]). Distribution by self-reported race was as follows: 240 African American (24.4%), two American Indian or Alaska Native (0.2%), 20 Asian (2.0%), 709 White (72.1%), one Native Hawaiian or Other Pacific Islander (0.1%), one multiple (0.1%), and 11 unavailable (1.1%). Distribution by self-reported ethnicity was as follows: 12 (1.2%) Hispanic or Latino, 894 non-Hispanic or -Latino (90.9%), and 78 unavailable (7.9%) (Table 2). Mean body mass index was 27.9 kg/m² ± 8.3. A total of 932 of 984 (94.7%) International Statistical Classification of Diseases, Tenth Revision, clinical indications for surgery were spondylosis (M47), spinal stenosis (M48), or cervical disk disorder (M50) based on direct extraction from the electronic health record; this method is sometimes unreliable, however. Time between implantation and imaging ranged from 1 day to 18.1 years (mean, 7 months ± 12). Radiography equipment vendors were GE Healthcare, Carestream Health, Fujifilm, and Kodak. Radiographic views (DICOM tag ViewPosition) included were as follows: AP, AP open mouth, posteroanterior, lateral, lateral flexion, lateral extension, left lateral, swimmers, right posterior oblique, left posterior oblique, left anterior oblique, right anterior oblique, and right lateral.

Table 2:

Patient Demographics

Hardware localization.—Hardware localization attained an intersection over union of 86.8% for localization of anterior and posterior hardware and an overall F1 score (threshold = 50%) of 94.9% for detection of individual anterior and posterior hardware devices (method 1). Performance improved slightly when evaluating for detection of at least one anterior or posterior device (method 2), with an F1 score of 97.5%. Performance was slightly higher for anterior devices (F1 score, 96.2%) than posterior devices (F1 score, 93.6%) (Table 3). The performance increase between evaluation methods 1 and 2 was greater for posterior hardware than for anterior hardware; frequently, there were three to four individual discontinuous devices posteriorly (eg, laminoplasty plates), which can be small and less radiopaque. Missed detection of one of these devices would decrease performance in method 1 but would not affect the performance of method 2 because at least one object would still be detected.
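For reference, intersection over union and the F1 score reported above follow from simple definitions, sketched here. The corner-coordinate box format `(x_min, y_min, x_max, y_max)` and the function names are assumptions; the example numbers are invented and are not study data.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in terms of raw counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Two 10 x 10 boxes that overlap across half their width.
overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
example_f1 = f1_score(tp=8, fp=2, fn=2)
```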

Table 3:

Results of Object Localization for Anterior and Posterior Hardware

Examples of good anterior and posterior predictions, as well as difficult AP and lateral images for object detection, are shown in Figure 2. Cases of missed objects were common for anterior hardware in the lower cervical spine owing to obscuration by overlying soft tissue. Missed posterior rod and screw implants were rare, whereas missed laminoplasty plates were more common. There was a similar performance for detection on both AP and lateral images, with more frequent misses noted for posterior hardware on AP views as a result of obscuration by anterior hardware.

Hardware classification with implant localization.—Results for brand classification with implant localization are summarized in Table 4. Overall performance was excellent, with an F1 score, sensitivity, and specificity of 98.7% ± 0.5, 98.7% ± 0.5, and 99.2% ± 0.3, respectively, for anterior hardware classification, and of 93.5% ± 2.0, 92.6% ± 2.0, and 96.1% ± 2.0, respectively, for posterior hardware classification on the manually verified test set. Brand prediction performance on AP and lateral views was similar for anterior hardware, but it was slightly higher on lateral views for posterior hardware. Prediction performance was also consistent between multiple AP and lateral images in the same examination, with less than 3% disagreement (15 of 564) for anterior brands and 10% disagreement (22 of 219) for posterior brands.

Table 4:

Brand Prediction of Anterior and Posterior Hardware for Anteroposterior Radiographs, Lateral Radiographs, and Overall

Hardware classification without implant localization.—The BAV model, which used entire images as input without implant localization, demonstrated lower overall performance, with an F1 score, sensitivity, and specificity of 92.8% ± 2.0, 93.1% ± 2.0, and 96.3% ± 1.0, respectively, for anterior hardware and of 83.7% ± 5.0, 83.9% ± 4.0, and 85.9% ± 6.0, respectively, for posterior hardware (Table 5).

Table 5:

Results for Brand Prediction of Anterior and Posterior Hardware without Object Localization

Classification and mismatches.—Confusion matrices for each brand of anterior and posterior hardware are presented in Figure 5. For anterior hardware, there were only 16 test examples of NuVasive Archon, and only three misclassifications occurred. For posterior hardware, classification performance was lowest between NuVasive Vuepoint and NuVasive Vuepoint II, presumably owing to similarities in appearance. When considering consistency of model predictions between multiple images in the same examination, there were few mismatches between predicted brands. These results indicate that the model accurately identifies brand-identifying features on both AP and lateral images. Instances of mismatch are denoted in the unknown column of Figure 5, labeled as such because conflicting brand predictions in a single examination effectively yield an unusable result.

Figure 5:

Model performance for anterior hardware by brand on a (top left) per object and (top right) per accession basis. There is strong performance for all brands, with lower performance for NuVasive Archon, for which there was the lowest number of samples. Evaluation by accession shows consistent results, which indicates that brands were predicted consistently for all images within a given examination. Performance was lower in this case for Medtronic Zevo, which indicates that two different brands were predicted across images from the same examination. Model performance for posterior hardware by brand on a per object (bottom left) and per accession (bottom right) basis. Performance overall was less consistent than with anterior hardware, with relatively frequent errors between NuVasive Vuepoint and NuVasive Vuepoint II. Even so, evaluation by accession showed good results for most brands; interestingly, it showed fewer discrepancies between multiple images in the same examination. BM = Biomet MaxAn, DSC = DePuy Synthes CSLP-Cervical Spine Locking Plate, DPM = DePuy Synthes Mountaineer, MAVE = Medtronic Atlantis Vision Elite, MC = Medtronic Centerpiece, MI = Medtronic Infinity, MZ = Medtronic Zevo, NA = NuVasive Archon, NV = NuVasive Vuepoint, NVii = NuVasive Vuepoint II, UNK = unknown.

Discussion

The number of repeat surgeries required to manage cervical spine disorders is expected to increase over time in the aging population, with revision rates as high as 20% (31,32). With the large number of implants available worldwide, it is challenging for any individual surgeon or radiologist, or even implant vendor, to easily recognize the type of implant on radiographs, and few patients arrive with medical records from prior treatment. The inability to accurately identify an implant causes substantial morbidity before and after surgery.

We demonstrate a model that is able to accurately localize and classify brands of cervical spine hardware on routine spine radiographs at our institution. Hardware localization performance was excellent and tolerant to variations in patient positioning, laterality, and view type. Brand prediction metrics were high for both anterior and posterior hardware, even in cases in which the number of samples available was low. We also showed strong internal consistency between brand predictions on multiple AP and lateral images from the same examination. The largest number of errors occurred between posterior hardware brands NuVasive Vuepoint and NuVasive Vuepoint II, presumably owing to similar features between two devices in the same product line. Performance was higher for anterior implants than for posterior implants (overall F1 score, 98.7% and 93.5%, respectively), possibly because some posterior hardware is multisegmented, whereas anterior hardware is usually contiguous. With anterior implants, differences between plates are more easily recognizable on AP radiographs. We also show the importance of implant localization for model performance; the F1 score from the whole-image BAV model was lower than brand classification with implant localization for anterior hardware (92.8% vs 98.7%, respectively) and posterior hardware (83.7% vs 93.5%, respectively).

A strength of this work is that data preprocessing and filtration were largely automated and could be reproduced at other institutions. Intraoperative radiographic views were excluded using only knowledge of the date of surgery, and hardware brands were excluded only if they were present in fewer than three patients (one patient each for the training, validation, and test sets). Otherwise, no manual data processing or curation was required to run the model. Training annotations consisted only of brand names obtained from surgical reports and bounding box annotations of hardware position on a subset of cases. We believe that manual hardware localization may not be necessary at other institutions unless localization performance decreases, in which case the model could be fine-tuned using a smaller number of cases. If manual localization is not required, it may be possible to fine-tune the model to recognize additional hardware brands with less effort, because only brand names would need to be identified from surgical records.
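The brand-filtering rule lends itself to a brief sketch. The threshold of three patients comes from the text; the record layout of (patient_id, brand) pairs, the function name `filter_rare_brands`, and the example identifiers are assumptions made for illustration only.

```python
from collections import defaultdict

MIN_PATIENTS = 3  # from the text: one patient each for training, validation, and test


def filter_rare_brands(records, min_patients=MIN_PATIENTS):
    """Drop records whose brand appears in fewer than `min_patients` distinct patients.

    `records` is a list of (patient_id, brand) pairs, one per image or
    examination. This layout is a hypothetical simplification.
    """
    patients_per_brand = defaultdict(set)
    for patient_id, brand in records:
        patients_per_brand[brand].add(patient_id)

    kept_brands = {
        brand
        for brand, patients in patients_per_brand.items()
        if len(patients) >= min_patients
    }
    return [(pid, brand) for pid, brand in records if brand in kept_brands]


# Hypothetical records: "MZ" has three distinct patients, "XX" only two.
records = [
    ("p1", "MZ"), ("p2", "MZ"), ("p3", "MZ"), ("p3", "MZ"),
    ("p4", "XX"), ("p5", "XX"),
]
kept = filter_rare_brands(records)
```

Counting distinct patients rather than images matters here: a brand with many radiographs from a single patient would still be excluded, since it could not be split across the three sets without patient overlap.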

This work had limitations. Despite coverage of all common brands at our institution, only 10 total hardware models were included, which is relatively few compared with the total number in use in the United States. It is possible that our model is overfit to radiographs from our institution and that performance could be reduced in different patient populations. It is also possible that model performance could be reduced as more brands are added; testing both possibilities would require external collaboration and additional samples. Performance of brand prediction is predicated on accuracy of hardware localization, and localization performance was imperfect, particularly in cases in which hardware devices were partially obscured. This could lead to systematic hardware nondetection or brand misclassification in patients with multiple overlapping devices, immobility, or obesity, for whom imaging may be difficult. Finally, the nature of our dataset collection resulted in radiographs acquired at a mean of 7 months ± 12 after hardware implantation. It is unclear if performance would decrease on radiographs acquired much later than the time of surgery; however, we hypothesize that performance may be similar because hardware appearance is not expected to change over time in situ.

We developed a model for automatic cervical spine hardware localization and brand identification using weakly supervised machine learning. The model is released, and we plan to develop a website that allows physicians to test the model on their own images and allows us to more easily collect additional hardware brands to expand the system. Hardware manufacturers could participate directly by contributing radiographic images of their hardware. The pipeline is flexible and could be adapted for use with additional implants elsewhere in the body, and even for other devices, such as pacemakers.

R.D. and D.M. contributed equally to this work.

Authors declared no funding for this work.

Disclosures of Conflicts of Interest: R.D. No relevant relationships. D.M. No relevant relationships. H.M.P. No relevant relationships. S.B. No relevant relationships. M.G. No relevant relationships. J.G. Institution receives grant from NSF (NSF Future of Work grant) and NIH MIDRC grant; associate editor of Radiology: Artificial Intelligence. I.B. No relevant relationships. T.Y. Travel expense from AOSpine and ISSLS. H.T. No relevant relationships.

Abbreviations:

AP: anteroposterior
BAV: Basic Append View
DICOM: Digital Imaging and Communications in Medicine
SD: standard deviation
t-SNE: t-distributed stochastic neighbor embedding

References

1. Burneikiene S, Nelson EL, Mason A, Rajpal S, Villavicencio AT. The duration of symptoms and clinical outcomes in patients undergoing anterior cervical discectomy and fusion for degenerative disc disease and radiculopathy. Spine J 2015;15(3):427–432.
2. Sampath P, Bendebba M, Davis JD, Ducker TB. Outcome of patients treated for cervical myelopathy. A prospective, multicenter study with independent clinical review. Spine (Phila Pa 1976) 2000;25(6):670–676.
3. Veeravagu A, Connolly ID, Lamsam L, et al. Surgical outcomes of cervical spondylotic myelopathy: an analysis of a national, administrative, longitudinal database. Neurosurg Focus 2016;40(6):E11.
4. Wang LN, Wang L, Song YM, Yang X, Liu LM, Li T. Clinical and radiographic outcome of unilateral open-door laminoplasty with alternative levels centerpiece mini-plate fixation for cervical compressive myelopathy: a five-year follow-up study. Int Orthop 2016;40(6):1267–1274.
5. Galivanche AR, Gala R, Bagi PS, et al. Perioperative Outcomes in 17,947 Patients Undergoing 2-Level Anterior Cervical Discectomy and Fusion Versus 1-Level Anterior Cervical Corpectomy for Treatment of Cervical Degenerative Conditions: A Propensity Score Matched National Surgical Quality Improvement Program Analysis. Neurospine 2020;17(4):871–878.
6. Sampath P, Bendebba M, Davis JD, Ducker T. Outcome in patients with cervical radiculopathy. Prospective, multicenter study with independent clinical review. Spine (Phila Pa 1976) 1999;24(6):591–597.
7. Spinal Implants Market Analysis, Size, Share & Trends | Global | 2019-2025 | MedSuite. iData Research. https://idataresearch.com/product/spinal-implants-market/. Published June 3, 2019. Accessed March 31, 2021.
8. Hilibrand AS, Carlson GD, Palumbo MA, Jones PK, Bohlman HH. Radiculopathy and myelopathy at segments adjacent to the site of a previous anterior cervical arthrodesis. J Bone Joint Surg Am 1999;81(4):519–528.
9. Ishihara H, Kanamori M, Kawaguchi Y, Nakamura H, Kimura T. Adjacent segment disease after anterior cervical interbody fusion. Spine J 2004;4(6):624–628.
10. Chung JY, Kim SK, Jung ST, Lee KB. Clinical adjacent-segment pathology after anterior cervical discectomy and fusion: results after a minimum of 10-year follow-up. Spine J 2014;14(10):2290–2298.
11. Bydon M, Xu R, De la Garza-Ramos R, et al. Adjacent segment disease after anterior cervical discectomy and fusion: Incidence and clinical outcomes of patients requiring anterior versus posterior repeat cervical fusion. Surg Neurol Int 2014;5(Suppl 3):S74–S78.
12. Hopkins BS, Weber KA, Kesavabhotla K, Paliwal M, Cantrell DR, Smith ZA. Machine learning for the prediction of cervical spondylotic myelopathy: a post hoc pilot study of 28 participants. World Neurosurg 2019;127:e436–e442.
13. Yi PH, Wei J, Kim TK, et al. Automated detection & classification of knee arthroplasty using deep learning. Knee 2020;27(2):535–542.
14. Kang YJ, Yoo JI, Cha YH, Park CH, Kim JT. Machine learning-based identification of hip arthroplasty designs. J Orthop Translat 2019;21:13–17.
15. Yi PH, Kim TK, Wei J, et al. Automated detection and classification of shoulder arthroplasty models using deep learning. Skeletal Radiol 2020;49(10):1623–1632.
16. Borjali A, Chen AF, Bedair HS, et al. Comparing the performance of a deep convolutional neural network with orthopedic surgeons on the identification of total hip prosthesis design from plain radiographs. Med Phys 2021;48(5):2327–2336.
17. Tan M, Pang R, Le QV. EfficientDet: Scalable and Efficient Object Detection. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, Wash, June 13–19, 2020. Piscataway, NJ: IEEE, 2020;10778–10787.
18. Tan M, Le QV. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 1905.11946 [preprint] https://arxiv.org/abs/1905.11946. Posted May 28, 2019. Accessed March 31, 2021.
19. Marques G, Agarwal D, de la Torre Díez I. Automated medical diagnosis of COVID-19 through EfficientNet convolutional neural network. Appl Soft Comput 2020;96:106691.
20. Miglani V, Bhatia MPS. Skin Lesion Classification: A Transfer Learning Approach Using EfficientNets. In: Hassanien AE, Bhatnagar R, Darwish A, eds. Advanced Machine Learning Technologies and Applications. AMLTA 2020. Advances in Intelligent Systems and Computing, vol 1141. Singapore: Springer, 2021;315–324.
21. Gessert N, Nielsen M, Shaikh M, Werner R, Schlaefer A. Skin lesion classification using ensembles of multi-resolution EfficientNets with meta data. MethodsX 2020;7:100864.
22. Lin TY, Maire M, Belongie S, et al. Microsoft COCO: Common Objects in Context. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T, eds. Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8693. Cham, Switzerland: Springer, 2014;740–755.
23. Abadi M, Barham P, Chen J, et al. TensorFlow: a system for large-scale machine learning. In: OSDI’16: Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation, 2016;265–283.
24. Kingma DP, Ba J. Adam: A Method for Stochastic Optimization. arXiv 1412.6980 [preprint] https://arxiv.org/abs/1412.6980. Posted December 22, 2014. Accessed March 31, 2021.
25. Hashir M, Bertrand H, Cohen JP. Quantifying the Value of Lateral Views in Deep Learning for Chest X-rays. arXiv 2002.02582 [preprint] https://arxiv.org/abs/2002.02582. Posted February 7, 2020. Accessed March 31, 2021.
26. Paszke A, Gross S, Massa F, et al. Pytorch: An imperative style, high-performance deep learning library. arXiv 1912.01703 [preprint] https://arxiv.org/abs/1912.01703. Posted December 3, 2019. Accessed March 31, 2021.
27. Huang G, Liu Z, van der Maaten L, Weinberger KQ. Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, July 21–26, 2017. Piscataway, NJ: IEEE, 2017;2261–2269.
28. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J Mach Learn Res 2014;15(56):1929–1958.
29. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res 2011;12(85):2825–2830.
30. Harris CR, Millman KJ, van der Walt SJ, et al. Array programming with NumPy. Nature 2020;585(7825):357–362.
31. Rajaee SS, Kanim LEA, Bae HW. National trends in revision spinal fusion in the USA: patient characteristics and complications. Bone Joint J 2014;96-B(6):807–816.
32. Saifi C, Fein AW, Cazzulino A, et al. Trends in resource utilization and rate of cervical disc arthroplasty and anterior cervical discectomy and fusion throughout the United States from 2006 to 2013. Spine J 2018;18(6):1022–1029.

