Operative Neurosurgery. 2022 Aug 8;23(4):279–286. doi: 10.1227/ons.0000000000000322

Image Segmentation of Operative Neuroanatomy Into Tissue Categories Using a Machine Learning Construct and Its Role in Neurosurgical Training

Andrew J. Witten, Neal Patel, Aaron Cohen-Gadol
PMCID: PMC10637405  PMID: 36103318

Abstract

BACKGROUND:

The complexity of the relationships among the structures within the brain makes efficient mastery of neuroanatomy difficult for medical students and neurosurgical residents. Therefore, there is a need to provide real-time segmentation of neuroanatomic images taken from various perspectives to assist with training.

OBJECTIVE:

To develop the initial foundation of a neuroanatomic image segmentation algorithm using artificial intelligence for education.

METHODS:

A pyramidal scene-parsing network with a convolutional residual neural network backbone was assessed for its ability to accurately segment neuroanatomy images. A data set of 879 images derived from The Neurosurgical Atlas was used to train, validate, and test the network. Quantitative assessment of the segmentation was performed using pixel accuracy, intersection-over-union, the Dice similarity coefficient, precision, recall, and the boundary F1 score.

RESULTS:

The network was trained, and performance was assessed class-wise. Compared with the ground truth annotations, the ensembled results of our artificial intelligence framework for the pyramidal scene-parsing network generated a total pixel accuracy of 91.8% during testing.

CONCLUSION:

Using the presented methods, we show that a convolutional neural network can accurately segment gross neuroanatomy images, which represents an initial foundation in artificial intelligence gross neuroanatomy that will aid future neurosurgical training. These results also suggest that our network is sufficiently robust to perform anatomic category recognition in a clinical setting.

KEY WORDS: Artificial intelligence, Convolutional neural network, Image segmentation, Neurosurgical education, Operative neuroanatomy


ABBREVIATIONS:

AI, artificial intelligence; BF, boundary F1; CNN, convolutional neural network; DSC, Dice similarity coefficient; IoU, intersection-over-union; PSPNet, pyramidal scene-parsing network; ResNet, residual neural network; SVG, Scalable Vector Graphics.

Cadaver dissection has been a centerpiece of modern medical education. It is also considered a rite of passage in which students develop a grasp of bodily parts.1 More recently, in medical schools, there has been a steady decline in the time allocated for teaching the anatomy curriculum. However, the time dedicated to teaching neuroanatomy has been protected, which suggests that anatomy educators recognize cerebral and spinal anatomy, with its structural and functional complexities, as a difficult topic.2 Delivering a rigorous, connected, and integrated neuroanatomy curriculum at any level is a challenge. Considering this situation, it is important to know where the weaknesses in student knowledge exist so that time and resources can be allocated accordingly.

Many neurosurgeons enter the field with the desire to master the complexity of neurosurgical anatomy. A complete and thorough understanding of neuroanatomy is essential to becoming a skillful neurosurgeon.3 As we adjust to factors such as the duty-hour restrictions imposed on neurosurgical trainees, it is necessary to create innovative models to enhance learning. We should expect these novel training methods to draw from the history and evolution of cadaveric dissection and to enhance rather than replace it.

One of the novel training methods currently being pursued is the use of artificial intelligence (AI), which is a key factor in the future of medicine and is expected to improve the diagnosis, prognosis, and management of disease. As in other medical fields, AI in neurosurgery is only beginning to blossom. Although AI in medicine is focused primarily on improving patient care, it can also be an essential tool in medical education.4 Although medical knowledge is growing exponentially, estimated to have doubled every 73 days in 2020,5 medical education is still largely based on a traditional curriculum. A recent example of AI in medical education is SmartPath, a machine learning program developed to support teaching the identification of glomerulopathies in nephropathology. The program is dynamic and allows the inclusion of new pathology slides not previously seen by the program.6 Bissonnette et al7 recently created an AI virtual environment that significantly distinguished surgical training levels, from neurosurgical residents to attending neurosurgeons, in a virtual reality spinal task. Teaching with AI has the potential to advance the field of neurosurgery and empower teachers and learners of all abilities.

We expect to see new AI-based training methods in neuroanatomy for trainees at all levels. However, an effective AI machine learning algorithm requires an extensively labeled neuroanatomy database. Fortunately, the neuroanatomy section of The Neurosurgical Atlas provides an ideal resource for developing algorithms for educational endeavors such as these. As neurosurgical anatomy resources such as The Neurosurgical Atlas continue to grow, there is a need for a more efficient way to create them. Two-dimensional gross anatomy images of real tissue taken from different perspectives are invaluable in creating 3-dimensional maps and organizing neuroanatomic structures, especially for surgical anatomy. In this study, we aimed to develop a novel AI machine learning framework to identify unique tissue types in gross neuroanatomy images. The goal of this investigation was to establish an initial foundation in AI gross neuroanatomy that can be built on in the future to aid neurosurgical training.

METHODS

Manual Segmentations

The neuroanatomy collection of The Neurosurgical Atlas, with more than 1800 images, provides one of the most comprehensive collections of neuroanatomic knowledge in the world. Through the efforts of many surgeons, anatomists, fellows, residents, and medical students, a dynamic library of highlighted neuroanatomy images was created by manual segmentation. For each image, the tissue types were manually segmented using an online Scalable Vector Graphics (SVG) platform (Vector [Inmagine]). Each SVG path was colored uniquely according to tissue type. For this study, SVG paths were converted to trainable Portable Network Graphics (.png) files with tissue classifications including background, arteries, veins, nerves, gross brain tissue, and bone. Data set cleaning was performed to remove anomalous labels, classification mismatch errors, image rotation issues, and unlabeled images. After cleaning, 879 uniquely labeled neuroanatomy images remained and were used as the data set; these were separated into training, validation, and testing data sets.
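As an illustration of this preprocessing step, the sketch below maps a color-coded mask (an SVG path rendering exported to PNG) to an integer label map suitable for training; the specific RGB values and the use of 255 as an ignore index are assumptions for the example, not details reported by the authors.

```python
# Sketch: convert a color-coded mask (SVG rendering exported to PNG) into an
# integer label map for training. The RGB values below are illustrative
# placeholders, not the colors used in The Neurosurgical Atlas exports.
import numpy as np
from PIL import Image

CLASS_COLORS = {
    (0, 0, 0): 0,        # background (hypothetical color)
    (255, 0, 0): 1,      # artery
    (0, 0, 255): 2,      # vein
    (255, 255, 0): 3,    # nerve
    (128, 128, 128): 4,  # gross brain tissue
    (255, 255, 255): 5,  # bone
}

def mask_to_labels(png_path: str) -> np.ndarray:
    """Map each pixel's RGB color to a class index; unknown colors become 255 (ignored)."""
    rgb = np.array(Image.open(png_path).convert("RGB"))
    labels = np.full(rgb.shape[:2], 255, dtype=np.uint8)  # 255 = unlabeled/ignore
    for color, class_idx in CLASS_COLORS.items():
        labels[np.all(rgb == np.array(color), axis=-1)] = class_idx
    return labels
```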

Automated Segmentation Using AI

Choosing the network structure is a key decision when designing convolutional neural network (CNN)–based segmentation models (Figure 1). A residual neural network (ResNet) backbone was used in the initial portion of the network. ResNet is a CNN composed of many individual blocks that enable the computation of a deeper feature set.8 This feature set is then passed to a pyramidal scene-parsing network (PSPNet), a segmentation network designed to allow a network to learn its given environment and objects in a larger general context.9 Zhao et al9 showed that traditional fully convolutional networks can incorrectly segment a pillow on a bed because their individual features can be similar; a PSPNet, however, enables the network to learn the association between beds and pillows and correctly label the pillow. Learning such global associations is valuable in neuroanatomy images, for example, where the superior cerebellar artery touches the superior surface of the trigeminal nerve at its entry into the brainstem during microvascular dissection for trigeminal neuralgia.
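For readers unfamiliar with this architecture, the following is a minimal, simplified sketch of a PSPNet-style head on a torchvision ResNet-152 backbone. It is an illustrative reimplementation under assumed defaults (6 classes; pyramid bins of 1, 2, 3, and 6) and omits the dilated backbone and auxiliary branch of the full PSPNet; it is not the authors' code, which was adapted from https://github.com/yassouali/pytorch_segmentation.

```python
# Simplified PSPNet-style model: ResNet-152 features + pyramid pooling + classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class PyramidPooling(nn.Module):
    def __init__(self, in_channels, bins=(1, 2, 3, 6)):
        super().__init__()
        branch_ch = in_channels // len(bins)
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(b),                              # pool to b x b grid
                nn.Conv2d(in_channels, branch_ch, kernel_size=1, bias=False),
                nn.BatchNorm2d(branch_ch),
                nn.ReLU(inplace=True),
            )
            for b in bins
        ])

    def forward(self, x):
        h, w = x.shape[2:]
        pooled = [F.interpolate(branch(x), size=(h, w), mode="bilinear",
                                align_corners=False) for branch in self.branches]
        return torch.cat([x] + pooled, dim=1)  # local features + global context

class PSPNetSketch(nn.Module):
    def __init__(self, num_classes=6):
        super().__init__()
        resnet = torchvision.models.resnet152()
        # Everything up to the final residual stage serves as the feature extractor (2048 channels).
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])
        self.ppm = PyramidPooling(2048)
        self.head = nn.Sequential(
            nn.Conv2d(2048 * 2, 512, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, num_classes, kernel_size=1),
        )

    def forward(self, x):
        h, w = x.shape[2:]
        features = self.backbone(x)
        logits = self.head(self.ppm(features))
        # Per-pixel class scores at the original image resolution.
        return F.interpolate(logits, size=(h, w), mode="bilinear", align_corners=False)
```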

FIGURE 1.

(Top) The 2 primary building blocks of our neural network are convolution operations (blue) and max pooling (green). After the convolutions, rectified linear activation function (activation) layers were used to allow learning of nonlinear relationships. (Center) The network architecture is based on a PSPNet with a ResNet-152 backbone. It accepts a 3-channel color image and predicts the probability that each pixel lies within a given class. The PSPNet is composed of an auxiliary branch and a main branch; each branch generates a class prediction and a corresponding loss during training. The auxiliary branch helps to prevent vanishing gradients in the ResNet backbone and is used only during training; during inference, only the main branch generates the classifications. (Bottom) The ResNet in our final model comprises 4 layers built from 152 residual blocks. The PSPNet architecture uses a pyramidal pooling layer to learn more global associations, and its auxiliary branch, shown below the main branch, works to prevent vanishing gradients. Used with permission from The Neurosurgical Atlas by Aaron Cohen-Gadol, MD.

Network Training

The network was implemented in PyTorch, an open-source machine learning library used for applications such as computer vision and natural language processing, on the basis of the open-source PyTorch segmentation tools (available at https://github.com/yassouali/pytorch_segmentation). Modifications were made to the provided code for our specific application.

For training, the Adam optimizer was used with differential learning rates of 10⁻⁵ for the ResNet backbone and 10⁻⁴ for the PSPNet head, both with a weight decay of 10⁻⁴. The differential learning rate was used so that ResNet would focus on more conserved features while the PSPNet developed the more diverse features used for the ultimate classification. A batch size of 4 was used, and the network was trained as a multiclass segmenter. The network was evaluated on the validation data set every 5 epochs of training. The network was trained for over 300 epochs, which took 48 hours on an NVIDIA V100 graphics processing unit.
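A minimal sketch of this differential learning-rate setup is shown below, assuming the model exposes separate backbone and head modules (as in the architectural sketch above); the attribute names are illustrative and not taken from the authors' implementation.

```python
# Sketch: Adam with per-group learning rates (backbone vs segmentation head)
# and a shared weight decay, mirroring the values reported above.
import itertools
import torch

model = PSPNetSketch(num_classes=6)  # from the earlier sketch; any equivalent model works
head_params = itertools.chain(model.ppm.parameters(), model.head.parameters())
optimizer = torch.optim.Adam(
    [
        {"params": model.backbone.parameters(), "lr": 1e-5},  # conserved backbone features
        {"params": head_params, "lr": 1e-4},                  # more plastic segmentation head
    ],
    weight_decay=1e-4,
)
```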

Data augmentation, consisting of rotation, random cropping, blurring, and flipping, was performed iteratively to increase the generalizability of the network and prevent overfitting during training. The total data set was randomly allocated into 679 training images, 100 validation images, and 100 test images.
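The sketch below illustrates one way to express this augmentation and the random 679/100/100 split with torchvision and PyTorch utilities; the specific augmentation parameters (rotation range, crop size, blur kernel) are assumptions, because they are not reported in the text.

```python
# Sketch: augmentation pipeline and random data set split (parameter values assumed).
import torch
from torchvision import transforms

train_augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),    # rotation (range assumed)
    transforms.RandomResizedCrop(size=473),   # random cropping (size assumed)
    transforms.GaussianBlur(kernel_size=5),   # blurring (kernel assumed)
    transforms.RandomHorizontalFlip(),        # flipping
])
# Note: for segmentation, the same geometric transform must also be applied to
# the label mask (e.g., via torchvision's functional API), not only to the image.

# Random 679 / 100 / 100 allocation of the 879 image indices.
generator = torch.Generator().manual_seed(0)
train_set, val_set, test_set = torch.utils.data.random_split(
    range(879), [679, 100, 100], generator=generator
)
```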

Learning curves for both the training and validation phases of the proposed model are presented in Supplementary Figures S1 and S2, http://links.lww.com/ONS/A759 and http://links.lww.com/ONS/A760. Convolutional layer visualization shows the unique feature maps extracted from the model during inference and helps identify regions of class-specific activation. We chose specific convolutional layers to show how the image representations change as they pass through the network (Supplementary Figure S3, http://links.lww.com/ONS/A761).
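A minimal sketch of such layer visualization is given below, using PyTorch forward hooks to capture intermediate feature maps during inference; the choice of layers and the dummy input are illustrative only.

```python
# Sketch: capture intermediate feature maps with forward hooks for visualization.
import torch

feature_maps = {}

def save_activation(name):
    def hook(module, inputs, output):
        feature_maps[name] = output.detach()
    return hook

model = PSPNetSketch(num_classes=6).eval()  # from the earlier sketch
# Capture the output of the last backbone stage and of the pyramid pooling module.
model.backbone.register_forward_hook(save_activation("backbone"))
model.ppm.register_forward_hook(save_activation("pyramid_pooling"))

with torch.no_grad():
    _ = model(torch.rand(1, 3, 473, 473))  # dummy RGB image

for name, fmap in feature_maps.items():
    print(name, tuple(fmap.shape))  # individual channels can be plotted as images
```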

Testing and Statistical Analysis

For quantitative assessment of segmentation performance, several metrics were used, including pixel accuracy, intersection-over-union (IoU), the Dice similarity coefficient (DSC), precision, recall, and the boundary F1 (BF) score. All assessments were performed using scripts written in Python. Pixel accuracy is the percentage of pixels that are classified correctly and can be calculated using the following formula:

\text{pixel accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

TP and TN indicate true-positive and true-negative instances, respectively, and FP and FN indicate false-positive and false-negative instances, respectively.

The second metric analyzed is IoU, which represents the area of overlap between the predicted segmentation and the labeled image divided by the area of their union. IoU is calculated using the following formula:

\text{IoU} = \frac{\text{intersection}}{\text{union}} = \frac{TP}{TP + FP + FN}

The next metric is the DSC (also known as the Sørensen–Dice coefficient), which measures how similar the predicted labels are to the actual labels. The DSC can be calculated as follows:

\text{DSC} = \frac{2\,TP}{2\,TP + FP + FN}

Precision represents the fraction of true-positive instances among all instances predicted as positive. The precision value can be calculated as follows:

\text{precision} = \frac{TP}{TP + FP}

Recall represents the fraction of true-positive instances among all relevant (actual positive) instances. The recall value can be calculated as follows:

\text{recall} = \frac{TP}{TP + FN}

A perfect classifier would have precision and recall values equal to 1. Finally, the last metric used to evaluate our model was the mean BF score (boundary contour-matching score), which characterizes the harmonic mean (F1 measure) of the precision and recall values computed along class boundaries. The BF score equation is as follows:

\text{BF score} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} = \frac{2\,TP}{2\,TP + FP + FN}
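The sketch below computes these metrics from per-class pixel counts as defined above. Note that the proper BF score applies the same F1 formula to boundary pixels within a distance tolerance; the simplified version here, like the equations above, is written in terms of raw pixel counts.

```python
# Sketch: per-class segmentation metrics from pixel counts of one class.
def segmentation_metrics(tp, tn, fp, fn):
    pixel_accuracy = (tp + tn) / (tp + tn + fp + fn)
    iou = tp / (tp + fp + fn)
    dsc = 2 * tp / (2 * tp + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 (harmonic mean of precision and recall); equals the DSC on pixel counts.
    # The BF score uses this same formula restricted to boundary pixels.
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": pixel_accuracy, "IoU": iou, "DSC": dsc,
            "precision": precision, "recall": recall, "F1": f1}

# Example with hypothetical counts: 900 TP, 8000 TN, 60 FP, 40 FN pixels.
print(segmentation_metrics(tp=900, tn=8000, fp=60, fn=40))
```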

Institutional review board/ethics committee approval and patient consent were neither required nor sought for this study.

RESULTS

The data set was first assessed to characterize the distribution of labels. Figure 2 shows the total percentage of labeled pixels for each image; the remaining portion of each image is background. The distribution of labels across tissue classes is also apparent from the areas of the different colors, which represent their respective tissues.

FIGURE 2.

Data set image class percentage distributions, representing the ground truth classes, plotted in descending order of background proportion. The background (unlabeled) portion of each image is indicated by the uncolored region below 1.0. Each class is identified by its respective color.

Once the data set was analyzed, the network was trained, and performance was assessed. The Table shows the class-wise and total performance according to the metrics described earlier. Compared with the ground truth annotations, the ensembled results for our AI framework for the PSPNet generated a total pixel accuracy of 97.1% during training and 91.8% during testing. For the test data set, the highest accuracy by class was for labeled arteries (94.5%), and the lowest accuracy was for nerve tissue (90.1%). The vein label had the highest DSC. Finally, arterial tissue led the other metrics with the highest IoU, precision, recall, and BF score.

TABLE.

Results for Global Classification and for Each Class Separately From the Clinical Validation Data Set

Class     Accuracy   IoU     DSC     Precision   Recall   BF score
Artery    0.945      0.887   0.937   0.909       0.842    0.909
Tissue    0.912      0.844   0.847   0.864       0.783    0.864
Nerve     0.901      0.674   0.862   0.745       0.630    0.745
Vein      0.902      0.861   0.970   0.871       0.813    0.871
Bone      0.931      0.863   0.906   0.874       0.811    0.874
Total     0.918      0.826   0.904   0.853       0.776    0.853

BF, boundary F1; DSC, Dice similarity coefficient; IoU, intersection-over-union.

Accuracy, IoU, DSC, precision, recall, and mean BF score (boundary contour-matching score) are reported for each class. Data in bold are the highest in each category.

In addition, the confusion matrix, which further shows class-wise prediction performance, is shown in Figure 3. The network inputs, targets, and predictions for the test data set are shown in Figure 4 for gross anatomic images. Although surgical anatomy images were not represented in the current data set, inference was performed on 2 images taken from videos from The Neurosurgical Atlas; the network inputs and outputs for these cases are shown in Figure 5.
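A minimal sketch of how such a pixel-wise confusion matrix can be accumulated is shown below, assuming integer label maps with 255 reserved for unlabeled pixels (an assumption carried over from the earlier sketches) and row-normalization to percentages as displayed in Figure 3.

```python
# Sketch: pixel-wise confusion matrix over predicted vs true label maps.
import numpy as np

def confusion_matrix(pred: np.ndarray, target: np.ndarray, num_classes: int = 6) -> np.ndarray:
    mask = target != 255                                            # drop unlabeled pixels
    idx = target[mask].astype(np.int64) * num_classes + pred[mask]  # joint (true, predicted) index
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)                 # rows: true class, cols: predicted

def row_percentages(cm: np.ndarray) -> np.ndarray:
    # Percentage of each true class assigned to each predicted class.
    return 100.0 * cm / cm.sum(axis=1, keepdims=True)
```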

FIGURE 3.

Confusion matrix of the segmentation result of the best-fit PSPNet using the test data set. The columns represent the predicted class, and the rows represent the true class. Data are presented as percentages of classified pixels.

FIGURE 4.

Segmentation examples of image verification data (from left to right: gross anatomy image, true class label, and PSPNet image prediction). These images show a caudal view of the arterial blood supply of the brain (first row); a left lateral exposure of the brain, infratemporal fossa, and neck (second row); a superolateral view of the right orbit and middle fossa floor featuring the trigeminal nerve and internal carotid artery (third row); a superior view of the posterior fossa after tentorium cerebelli resection (fourth row); and venous anatomy of the anterior brainstem and cerebellum (fifth row). Images in the first, second, and fourth rows used with permission from The Neurosurgical Atlas by Aaron Cohen-Gadol, MD; images in the third and fifth rows courtesy of A. L. Rhoton, Jr.

FIGURE 5.

Segmentation examples using actual surgical neuroanatomy images: surgical anatomy images (left column) followed by the PSPNet image predictions (right column). The structures depicted are the vein of Trolard (top row) and the surface of the Sylvian fissure (bottom row). Used with permission from The Neurosurgical Atlas by Aaron Cohen-Gadol, MD.

DISCUSSION

In this study, we describe the first fully automated image segmentation of gross neuroanatomy images. Our CNN was designed with the PSPNet framework on a ResNet-152 backbone, and the image segmentation model demonstrated excellent accuracy in validation. The network returned a total testing accuracy of 91.8% on 100 images previously unseen by the network; the training accuracy was 97.1%, and the validation accuracy was 94.5%. These results suggest that our network is sufficiently robust to perform anatomic category recognition in a clinical setting. This robustness is especially evident in the labeling of surgical neuroanatomy, as exemplified in Figure 5, which shows that our model correctly classified some tissues despite the lack of surgical images in our training data set. Note that for the top image in Figure 5, our model correctly identified the vein of Trolard, which runs superficially on the surface of the brain. However, limitations are also apparent: the cranioplasty clips were falsely labeled as veins because of their blue color, and blood products in the working field led to mislabeling of tissue as arterial. The results of testing our model on surgical images suggest that a network trained specifically with surgical images could perform accurate anatomic category recognition.

The confusion matrix in Figure 3 shows clear identification of most tissue types. Interestingly, our least accurate results occurred where tissues labeled as true nerve were classified as either nerve or brain tissue. There were concerns about the appropriate manual labeling of the boundary between nerve and brain tissue, because there is no clear transition point at which brain tissue ends and nerve tissue begins. Classification of structures that transition into one another, such as nerves into nervous tissue, remains difficult for models to predict.

Limitations

Although The Neurosurgical Atlas provides a comprehensive data set of neuroanatomic structures and their corresponding labels, there are several limitations to using this data set. The first is the class imbalance among the tissue classifications present in the data set. For example, it is apparent in Figure 2 that the brain tissue label was applied to a much larger portion of the data set than were the nerve or vein labels. Ligament classification was not included in this work because of its relatively sparse representation in the data set.

It is also evident from the right-skewed overall distribution in Figure 2 that the data set is not completely uniform, and many images are only partially labeled; a few images had a substantial amount of unlabeled space. Unlabeled pixels were ignored during training because including them significantly skewed the results. In an ideal data set, every image would be fully labeled, which would enable more comprehensive training and increased validation accuracy.
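A minimal sketch of this choice is shown below: unlabeled pixels are mapped to a reserved index and excluded from the cross-entropy loss. The specific index (255) is an assumption for illustration, not a detail reported by the authors.

```python
# Sketch: exclude unlabeled pixels (reserved index 255) from the training loss.
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(ignore_index=255)

logits = torch.randn(4, 6, 128, 128)          # batch of per-pixel class scores
targets = torch.randint(0, 6, (4, 128, 128))  # labeled pixels (classes 0-5)
targets[:, :16, :] = 255                      # pretend a band of pixels is unlabeled
loss = criterion(logits, targets)             # unlabeled pixels contribute nothing
```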

In addition, a larger, more diverse data set would likely yield better network performance. Our current network was trained on 679 images. Increasing the size of a training data set reduces the spurious correlations that cause overfitting and improves the performance of the learning model, creating a more generalizable network. Although there is no standard training data set size that guarantees model success, we recognize that our results could have been improved with a larger training data set.

Future Research

The data set used for training of this network was composed largely of gross neuroanatomy images. The next step is to adapt our current data set to have increased representation of neurosurgical procedural images. After identifying neurosurgical tissue classifications, future work should involve automating subclassification of this output to generate more specific labels and translating this work to segmentation of surgical case videos.

This study has potential applications in neurosurgical training and could enhance the operating experience during complex neurosurgical cases. As shown in Figure 5, the automated image segmentation network was able to segment surgical images, demonstrating its potential in operative scenes. Our ultimate goal is to provide real-time identification of neuroanatomic structures during surgery for neurosurgical residents and medical students. In addition, we hope to incorporate a recurrent neural network to segment the time-dependent course of a given neurosurgical procedure. We foresee implementation of our technology in an augmented reality display to enrich the neurosurgeon's experience in the operating room.
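As a rough sketch of the real-time direction described above, the following loop runs per-frame inference on a surgical video with OpenCV; the video path, input size, and overlay strategy are illustrative assumptions rather than part of the authors' system.

```python
# Sketch: per-frame segmentation of a surgical video (paths and sizes assumed).
import cv2
import numpy as np
import torch

model = PSPNetSketch(num_classes=6).eval()   # from the earlier sketch
# In practice, trained weights would be loaded here via model.load_state_dict(...).
capture = cv2.VideoCapture("procedure.mp4")  # hypothetical video path

with torch.no_grad():
    while True:
        ok, frame_bgr = capture.read()
        if not ok:
            break
        rgb = cv2.cvtColor(cv2.resize(frame_bgr, (473, 473)), cv2.COLOR_BGR2RGB)
        tensor = torch.from_numpy(rgb).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        prediction = model(tensor).argmax(dim=1)[0].numpy().astype(np.uint8)
        # `prediction` is a per-pixel class map that could be color-coded and
        # alpha-blended onto the frame for an augmented-reality-style overlay.
capture.release()
```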

CONCLUSION

Using the presented methods, we have shown that a CNN can accurately segment gross neuroanatomy images; on the test data set, the model generated a total pixel accuracy of 91.8%. This model represents an initial foundation in AI gross neuroanatomy that can aid future neurosurgical training, and the results suggest that our network is sufficiently robust to perform anatomic category recognition in a clinical setting. Building on this success, our ultimate goal is to provide real-time visualization of neuroanatomic structures during surgery to enhance learning for neurosurgical residents and medical students.

Acknowledgments

We thank Indiana University for allowing our team to use their GPU deep learning cluster (Carbonate). In addition, we thank Yassine Ouali and his team for creating the open-source PyTorch segmentation framework that was the backbone of this project. Their framework is available at https://github.com/yassouali/pytorch_segmentation. The authors sincerely appreciate the support of the Stead Family Endowed Chair in creation of this work. We also appreciate the help of Rohin Singh with reviewing the SVG images before analysis.

Footnotes

Supplemental digital content is available for this article at operativeneurosurgery-online.com.

Contributor Information

Andrew J. Witten, Email: andywitt@iu.edu.

Neal Patel, Email: nemipate@iu.edu.

Funding

This study did not receive any funding or financial support.

Disclosures

Dr Cohen-Gadol is the president and founder of The Neurosurgical Atlas. The authors have no personal, financial, or institutional interest in any of the drugs, materials, or devices described in this article.

Supplemental Digital Content

Figure S1. Pixel precision during model training and validation. Each epoch represents the number of passes through the entire training data set.

Figure S2. Cross-entropy loss during model training and validation. Each epoch represents the number of passes through the entire training data set.

Figure S3. Convolutional layer visualization during inference image prediction. The images become less visually interpretable in deeper layers but more informative to the network. These layers correspond to the layers shown in Figure 1 (bottom). Used with permission from The Neurosurgical Atlas by Aaron Cohen-Gadol, MD.

REFERENCES

1. Neuwirth LS, Dacius TF Jr, Mukherji BR. Teaching neuroanatomy through a historical context. J Undergrad Neurosci Educ. 2018;16(2):E26-E31.
2. Hall S, Stephens J, Parton W, et al. Identifying medical student perceptions on the difficulty of learning different topics of the undergraduate anatomy curriculum. Med Sci Educ. 2018;28(3):469-472.
3. Teton ZE, Freedman RS, Tomlinson SB, et al. The Neurosurgical Atlas: advancing neurosurgical education in the digital age. Neurosurg Focus. 2020;48(3):E17.
4. Davenport T, Kalakota R. The potential for artificial intelligence in healthcare. Future Healthc J. 2019;6(2):94-98.
5. Densen P. Challenges and opportunities facing medical education. Trans Am Clin Climatol Assoc. 2011;122:48-58.
6. Aldeman NLS, de Sá Urtiga Aita KM, Machado VP, et al. A platform for teaching glomerulopathies using machine learning. BMC Med Educ. 2021;21(1):248.
7. Bissonnette V, Mirchi N, Ledwos N, Alsidieri G, Winkler-Schwartz A, Del Maestro RF; Neurosurgical Simulation & Artificial Intelligence Learning Centre. Artificial intelligence distinguishes surgical training levels in a virtual reality spinal task. J Bone Joint Surg Am. 2019;101(23):e127.
8. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. Paper presented at: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016.
9. Zhao H, Shi J, Qi X, Wang X, Jia J. Pyramid scene parsing network. Paper presented at: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2017.
