Abstract
Categorization of radiological images according to characteristics such as modality, scanner parameters, and body part is important for quality control, clinical efficiency, and research. The metadata associated with images stored in the DICOM format reliably captures scanner settings such as tube current in CT or echo time (TE) in MRI. Other parameters, such as image orientation, body part examined, and presence of intravenous contrast, are not inherent to the scanner settings and therefore require user input, which is prone to human error. There is a general need for automated approaches that appropriately categorize images even by parameters that are not inherent to the scanner settings. These approaches should be able to process both planar 2D images and full 3D scans.
In this work, we present a deep learning based approach for automatically detecting one such parameter: the presence or absence of intravenous contrast in 3D MRI scans. Contrast is manually injected by radiology staff during the imaging examination, and its presence cannot be automatically recorded in the DICOM header by the scanner. Our classifier is a convolutional neural network (CNN) based on the ResNet architecture. Our data consisted of 1000 breast MRI scans (500 with and 500 without intravenous contrast), split 80%/20% for training and testing the CNN, respectively. The labels for the scans were obtained from the series descriptions created by certified radiological technologists. Preliminary results of our classifier are very promising, with an area under the ROC curve (AUC) of 0.988 and a sensitivity and specificity of 1.0 and 0.96, respectively (at the optimal ROC cut-off point), demonstrating potential usefulness in both clinical and research settings.
Keywords: Categorization of radiological images, intravenous contrast detection, deep learning, convolutional neural network
1. INTRODUCTION
Radiological images can be categorized in many ways, broadly based on acquisition characteristics and image content. Examples include imaging modality (e.g., CT or MRI), scanner parameters such as echo time (TE) in MRI, imaging orientation (axial, sagittal, or coronal acquisition), body part examined (e.g., brain or chest), and patient intervention such as the presence or absence of contrast. Categorization of images according to such criteria is important for multiple reasons: 1) better standardization, organization, and documentation of data, and therefore better quality control; 2) fewer errors during clinical reads, and hence better clinical efficiency; and 3) accurate identification of data for clinical trials and construction of ground truth labels for machine learning algorithms.
Medical images are stored using a standardized format known as DICOM1. DICOM files are rich in metadata, which include patient identifiers and acquisition parameters; some examples are shown in Table 1. Parameters that are inherent to the settings on a scanner, such as imaging modality, tube current in CT, and echo time (TE) in MRI, are reliably recorded in the DICOM metadata. Other parameters, such as image orientation, body part examined, and intravenous contrast, depend on human input and are therefore prone to error2. Hence, image categorization based on DICOM tags alone is insufficient, and there is a need for automated approaches based on image content.
Table 1.
Some image attributes, their DICOM tags and values.
| Image attribute | DICOM tag | Value in this example |
|---|---|---|
| Modality | (0008,0060) | MR |
| Tube current | (0018,1151) | 120 |
| Echo time | (0018,0081) | 30 |
| Image orientation | (0020,0037) | Sagittal* |
| Body part | (0018,0015) | Brain** |
| Contrast | (0018,0010) | No*** |
* Dependent on correct patient positioning; subject to human error
** Not reliable/missing in the majority of cases
*** Dependent on human input
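To make Table 1 concrete, the snippet below shows how such tags can be read programmatically. This is a minimal sketch using the pydicom library; the file name is hypothetical, and the tag keywords are standard DICOM attribute names.

```python
import pydicom

# Read one DICOM file and inspect the attributes from Table 1.
# "scan.dcm" is a hypothetical file name.
ds = pydicom.dcmread("scan.dcm")

print(ds.Modality)                   # (0008,0060), e.g. "MR"
print(ds.get("XRayTubeCurrent"))     # (0018,1151); CT only, None for MR
print(ds.get("EchoTime"))            # (0018,0081); MR only
print(ds.get("BodyPartExamined"))    # (0018,0015); often missing/unreliable
print(ds.get("ContrastBolusAgent"))  # (0018,0010); depends on human input
```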
In this work, we focus on the problem of detecting the presence or absence of intravenous contrast in breast MRI. Contrast material is injected into the patient at some point during an examination, helping to differentiate tissues and identify areas of pathology. Figure 1 shows example MRI slices of the breast before and after contrast injection. Arrows point to areas that are brighter due to the presence of contrast, such as the heart and blood vessels.
Figure 1.
A breast MRI slice before (left) and after (right) contrast injection. Areas in the heart and blood vessels appear brighter in the post contrast scan.
Recording the presence of contrast in the DICOM metadata requires manual input, which is prone to human error (reliable only about 80% of the time in our experience). Technologists document contrast presence in the series description of an image using words such as “T1 post”; this free-text entry is another source of variability. The resulting lack of consistency leads to variability in the way clinical systems present images for interpretation, and to difficulty in performing large scale research involving this type of imaging data. There is a need for automated approaches to identify contrast in medical images. Further, these approaches must be able to process both 2D planar images such as chest radiographs and full 3D image series such as CT and MRI.
There have been some previous studies addressing problems of this nature. Prince et al.3 present an automated approach to detect the arrival of contrast material in the aorta; however, their methodology requires manual selection of regions of interest. Sheiman et al.4 take a similar approach, where regions of interest are identified and the time-enhancement curves following intravenous contrast injection are analysed; again, their methodology requires manual input. The study most relevant to our work is Criminisi et al.5, where the authors present a fully automated approach to detect intravenous contrast in CT scans. The positions and extents of a pre-chosen set of 12 organs are first identified automatically using a discriminative machine learning model. Evidence for the presence of contrast from each of these organs is then combined via a generative machine learning model with intensity histograms as the image features. While the approach is promising on CT scans, its applicability does not carry over to MR scans in a straightforward manner because of non-standard MR voxel intensities. Further, the success of the methodology depends on accurate localization of the organs in the first step.
Motivated by the recent successes of deep learning in medical image analysis, we present a deep neural network based approach to identify the presence or absence of contrast in 3D breast MRI scans. Specifically, we design a classifier to differentiate pre contrast from post contrast MRI volumes. Our classification model is a convolutional neural network (CNN)6 based on the ResNet7 architecture. The ground truth for training the model consisted of 1000 T1 breast MRI scans (500 with and 500 without intravenous contrast). The novelty of our approach can be summarized as follows: 1) it is fully data driven and automated; 2) the identification of contrast is carried out in a single step, unlike previous approaches that split it into identifying relevant regions and then inferring contrast presence from those regions; and 3) our model can analyse a full 3D MRI series. We also compare our approach with two classical computer vision approaches: 1) classification of normalized intensity histograms using support vector machines (SVM); and 2) SVM classification of a bag of words (BoW) representation built from scale invariant feature transform (SIFT) descriptors of maximum intensity projection (MIP) images.
2. METHODS
We take a classification approach to automatically distinguish pre contrast from post contrast MRI volumes. Training a full 3D convolutional neural network that processes a whole 3D scan at its original resolution requires substantial GPU memory and processing power. Hence, we take a different approach in this work: we first train a convolutional neural network that detects the presence of contrast in groups of consecutive slices (with overlap) of the 3D scan. Evidence for the presence of contrast from all of these slice groups is then combined, using an approach similar to bagging8, to infer the presence of contrast in the whole 3D scan.
2.1. Basic data elements: “slabs”
We refer to the basic data elements on which we train the convolutional neural network as “slabs”. The slabs are all possible groups (with overlap) of consecutive slices from the 3D scan. For simplicity, we chose all slabs to have the same number of slices in this work. If n is the total number of slices in the 3D scan and m is the number of consecutive slices in a slab (n ≥ m), the total number of slabs in the 3D scan is n − m + 1. The slabs incorporate 3D/depth information (depending on how many slices are in a slab) and neighborhood constraints. We chose m = 3, i.e. each slab consists of 3 consecutive slices. If we treat the 3 consecutive slices as the red, green, and blue channels, each slab can be considered a pseudo color image, as shown in Figure 2 (a minimal extraction sketch follows the figure). This enables us to fine tune and build on popular convolutional neural network models trained on ImageNet9, such as ResNet.
Figure 2.
“Slab”: a group of 3 consecutive slices from the 3D scan. The slices are stacked as red, green and blue channels to obtain a pseudo color image. An example convolutional filter operating on the pseudo color image is shown as well.
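As a concrete illustration of the slab construction in Figure 2, the following is a minimal sketch (function and variable names are our own; the paper provides no code) that extracts all n − m + 1 overlapping slabs from a volume and stacks the slices along a channel axis so that, for m = 3, each slab can be treated as a pseudo color image:

```python
import numpy as np

def extract_slabs(volume: np.ndarray, m: int = 3) -> np.ndarray:
    """Return all overlapping slabs of m consecutive slices.

    volume: array of shape (n, H, W) with n >= m.
    Output shape: (n - m + 1, H, W, m); for m = 3, the slice axis
    plays the role of the RGB channels of a pseudo color image.
    """
    n = volume.shape[0]
    slabs = np.stack([volume[i:i + m] for i in range(n - m + 1)])
    # Move the slice axis to the last (channel) position.
    return np.moveaxis(slabs, 1, -1)
```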
2.2. Slab classifier
Our neural network model to classify the slabs as having contrast or no contrast is based on the widely used ResNet507 architecture. The ResNet architecture has been very successful on a wide variety of image classification tasks in computer vision and medical imaging. Its skip connections are a key feature that helps alleviate vanishing gradients and aids parameter optimization. In this work, we fine-tune the ImageNet pre-trained ResNet classifier on the slabs obtained from the 3D MR scans. This involved replacing the final fully connected layer of the network with a binary output passed through a sigmoid. Our motivation for this approach is the intuition that the lower layers of the neural network encode generic low level image features, and hence the ImageNet pretrained weights provide a good initialization when training the CNN on our data. The ground truth labels for all the slabs of a 3D scan were assumed to be the same as the label for the whole scan, which in turn was obtained from the series description using relevant words such as “pre” and “post”. The slabs were assumed to be independent even if they came from the same scan. The output of the slab classifier is the probability of the slab belonging to either class.
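The paper does not name a deep learning framework; the following PyTorch sketch shows, under our own assumptions (learning rate, single-logit head), the setup just described: an ImageNet-pretrained ResNet50 whose final fully connected layer is replaced for binary classification.

```python
import torch.nn as nn
import torch.optim as optim
from torchvision import models

# Load an ImageNet-pretrained ResNet50 and replace its final fully
# connected layer with a single-logit head for the binary
# contrast / no-contrast task. All other weights stay trainable,
# matching the fine-tuning described in the text.
model = models.resnet50(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 1)  # reinitialized head

# Binary cross entropy on the sigmoid of the logit, with Adam as the
# optimizer (both stated in Section 3; the learning rate is ours).
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-4)
```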
2.3. Whole scan classification at test time
At test time, all the slabs from the 3D scan are first extracted and classified individually using the slab classifier. The slab classification probabilities are then averaged to obtain the probability for the whole scan. This approach is similar to bagging8 (bootstrap aggregating) with sampling done without replacement. Finally, to compute scalar metrics of classification performance, an optimal ROC threshold is computed based on Youden’s index10.
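A minimal sketch of this test-time aggregation and threshold selection (function names are ours; Youden’s index is sensitivity + specificity − 1, as defined in the Footnotes):

```python
import numpy as np
from sklearn.metrics import roc_curve

def scan_probability(slab_probs: np.ndarray) -> float:
    """Average the per-slab probabilities into a single whole-scan
    probability (bagging-style aggregation)."""
    return float(np.mean(slab_probs))

def youden_threshold(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """ROC cut-off maximizing Youden's index,
    sensitivity + specificity - 1 = TPR - FPR."""
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    return float(thresholds[np.argmax(tpr - fpr)])
```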
2.4. Controls: two other approaches for comparison
We compared our approach with two simpler approaches that acted as controls and provided performance baselines. The first approach used the standardized intensity histograms of the volumes as features to train a support vector machine11 (SVM) for the contrast/no contrast classification task. This approach takes into account neither the spatial location of the image intensities nor their neighbourhood relationships. In the second approach, we first computed 2D maximum intensity projection12 (MIP) images from the 3D volumes. The scale invariant feature transform13 (SIFT) was then applied to these images to obtain 128-dimensional SIFT descriptors at the so-called “interest points” of the image. The SIFT vectors were further vector quantized using k-means clustering14 to obtain codewords representative of the whole image. Finally, an SVM was trained with the codewords as features to classify contrast from non contrast volumes. We note that, unlike the intensity histogram approach, the MIP image encodes some spatial information (a 2D projection of the 3D volume).
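For illustration, a rough sketch of the baseline feature extraction follows (the bin count and names are our assumptions; the SIFT descriptor extraction and k-means codebook steps are omitted for brevity):

```python
import numpy as np
from sklearn.svm import SVC

def histogram_features(volume: np.ndarray, bins: int = 64) -> np.ndarray:
    """Normalized intensity histogram over the whole volume. The paper
    sets the bin count empirically; 64 here is our assumption."""
    hist, _ = np.histogram(volume, bins=bins, density=True)
    return hist

def mip_image(volume: np.ndarray) -> np.ndarray:
    """2D maximum intensity projection along the slice axis,
    the input image for the SIFT/BoW pipeline."""
    return volume.max(axis=0)

# Histogram baseline: fit an SVM on the per-volume histograms.
# volumes: list of 3D numpy arrays; y: contrast labels (0/1).
# X = np.stack([histogram_features(v) for v in volumes])
# clf = SVC().fit(X, y)
```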
3. RESULTS
Our data consisted of 1000 T1 fat saturated breast MRI volumes (500 with and 500 without intravenous contrast). The volumes were acquired on 1.5T and 3.0T MR scanners, and the study was approved by our institutional review board (IRB). All DICOM data were de-identified and archived in a departmental imaging research repository. Slice thickness ranged from 1 to 5 mm, and the data included both axial and sagittal acquisitions. All volumes were resampled and intensity standardized while preserving the entropy of the intensity distribution of each scan. The labels for training were obtained from the series descriptions of the volumes, which contained pertinent words such as “pre” and “post”. We reserved 80% of the data for training the CNN and 20% for testing; the train-test split was randomized and stratified. The weights of all layers of the CNN were initialized to the ImageNet pretrained weights, while those of the final fully connected layer were re-initialized to nominal values sampled from a Gaussian distribution. No weights were frozen during training, allowing the network to fully adapt to our data. We used a binary cross entropy loss with Adam15 as the optimizer. For the intensity histogram and MIP based approaches, the number of histogram bins was set empirically to yield the best performance. All computation was performed using Nvidia Tesla P100 GPUs on Microsoft Azure cloud.
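As an illustration of the randomized, stratified split (a sketch under the assumption of scan-level labels; variable names are ours):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical scan-level labels: 500 pre and 500 post contrast scans.
scan_ids = np.arange(1000)
labels = np.array([0] * 500 + [1] * 500)

# Randomized, stratified 80%/20% train-test split, as described above.
train_ids, test_ids = train_test_split(
    scan_ids, test_size=0.2, stratify=labels, random_state=0)
```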
The results of classifying slabs as having contrast/no contrast are shown in Figure 3. The area under the ROC curve (AUC) was 0.928. At the optimal ROC threshold computed using Youden’s index10, the classifier achieved a sensitivity of 0.87 and a specificity of 0.88.
Figure 3.
Slab contrast classification. The algorithm performed well with an area under the ROC curve (AUC) of 0.928.
The whole volume classification results are shown in Figure 4. The ROC curve of our approach (deep CNN) is superimposed on those obtained using the MIP and intensity histogram based approaches. The deep CNN approach resulted in a near ideal AUC and significantly outperformed the other two approaches. This trend is reflected in the scalar performance metrics shown in Table 2: the deep CNN achieved the highest possible sensitivity together with very high AUC and specificity. The optimal ROC thresholds for the respective classifiers were again computed using Youden’s index. We also see that the MIP based approach generally performs better than the intensity histogram approach. This is consistent with our intuition that, in addition to pixel intensity values, pixel location and spatial arrangement also play a role in contrast classification: MIPs partially retain pixel location information, whereas intensity histograms carry only intensity information and no location.
Figure 4.
Whole volume contrast classification using our deep CNN approach (blue), maximum intensity projection based approach (green) and intensity histogram based approach (red). Note that the ROC curve of the deep CNN approach is very close to the ideal ROC point (top left).
Table 2.
Scalar performance measures for whole volume contrast classification.
| Method | AUC | Sensitivity | Specificity |
|---|---|---|---|
| Deep CNN | 0.988 | 1.00 | 0.96 |
| MIP | 0.84 | 0.71 | 0.80 |
| Intensity histogram | 0.76 | 0.76 | 0.71 |
4. CONCLUSION
Automated radiological image categorization is an important problem with applications in both routine clinical care and research. Though some previous studies have tackled problems of this nature, there is a need for generic and fully automated approaches to image characterization. In this work, we presented a fully automated deep CNN based approach for the specific problem of detecting the presence or absence of contrast in breast MRI scans. Preliminary results of our classifier are very encouraging, with an area under the ROC curve (AUC) of 0.988 and a sensitivity and specificity of 1.0 and 0.96, respectively (at the optimal ROC cut-off point), demonstrating a high degree of accuracy. In fact, one of the scans that was mis-classified by the deep CNN was later found to have been labeled erroneously. We believe our algorithm has the potential to be a useful tool in both clinical and research settings.
The algorithm developed here can be one among a suite of algorithms that perform radiographic image characterization in all its aspects. Downstream benefits of accurate and reliable image characterization include a streamlined image interpretation workflow and better-curated data sources for other machine learning algorithms. Our next steps include efficient clinical implementation and rigorous validation on data from other institutions to test our method’s generalizability.
Footnotes
Youden’s index is computed by maximizing, over all ROC cut-off points, the expression: sensitivity + specificity − 1.
REFERENCES
- [1] NEMA PS3 / ISO 12052, Digital Imaging and Communications in Medicine (DICOM) Standard, National Electrical Manufacturers Association, Rosslyn, VA, USA (http://medical.nema.org/).
- [2] Guld M, Kohnen M, Keysers D, Schubert H, Wein B, Bredno J, Lehmann T, “Quality of DICOM header information for image categorization,” Proc. SPIE 4685 (2002).
- [3] Prince M, Chenevert T, Foo T, Londy F, Ward J, Maki J, “Contrast enhanced abdominal MR angiography: optimization of imaging delay time by automating the detection of contrast material arrival in the aorta,” Radiology 203 (1997).
- [4] Sheiman R, Prassopoulos P, Raptopoulos V, “CT detection of layering of i.v. contrast material in the abdominal aorta,” American Journal of Roentgenology 171(5) (1998).
- [5] Criminisi A, Juluru K, Pathak S, “A Discriminative-Generative Model for Detecting Intravenous Contrast in CT Images,” Proc. Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2011, Lecture Notes in Computer Science, vol. 6893, Springer (2011).
- [6] LeCun Y, Bengio Y, “Convolutional networks for images, speech, and time-series,” The Handbook of Brain Theory and Neural Networks, MIT Press (1995).
- [7] He K, Zhang X, Ren S, Sun J, “Deep Residual Learning for Image Recognition,” Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770–778 (2016).
- [8] Breiman L, “Bagging predictors,” Machine Learning 24(2), 123–140 (1996).
- [9] Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L, “ImageNet: A Large-Scale Hierarchical Image Database,” Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2009).
- [10] Youden WJ, “Index for rating diagnostic tests,” Cancer 3, 32–35 (1950).
- [11] Cortes C, Vapnik VN, “Support-vector networks,” Machine Learning 20(3), 273–297 (1995).
- [12] “Maximum intensity projection,” Magnetic Resonance - Technology Information Portal, 17 July 2019, <https://www.mr-tip.com/serv1.php?type=db1&dbs=Maximum%20intensity%20projection>.
- [13] Lowe DG, “Distinctive Image Features from Scale-Invariant Keypoints,” International Journal of Computer Vision 60(2), 91–110 (2004).
- [14] Lloyd SP, “Least squares quantization in PCM,” IEEE Transactions on Information Theory 28(2), 129–137 (1982).
- [15] Kingma DP, Ba J, “Adam: A Method for Stochastic Optimization,” International Conference on Learning Representations (ICLR) (2015).




