Abstract
Purpose
To develop a convolutional neural network (CNN) to triage head CT (HCT) studies and investigate the effect of upstream medical image processing on the CNN's performance.
Materials and Methods
A total of 9776 HCT studies were retrospectively collected from 2001 through 2014, and a CNN was trained to triage them as normal or abnormal. CNN performance was evaluated on a held-out test set by measuring triage performance and sensitivity for 20 disorders to assess differential model performance; the training, validation, and test sets comprised 7856, 936, and 984 CT studies, respectively. This CNN was used to understand how the upstream imaging chain affects CNN performance by evaluating performance after altering three variables: image acquisition (reducing the number of x-ray projections), image reconstruction (inputting sinogram data into the CNN), and image preprocessing. To evaluate performance, the DeLong test was used to assess differences in the area under the receiver operating characteristic curve (AUROC), and the McNemar test was used to compare sensitivities.
Results
The CNN achieved a mean AUROC of 0.84 (95% CI: 0.83, 0.84) in discriminating normal and abnormal HCT studies. The number of x-ray projections could be reduced by 16 times and the raw sensor data could be input into the CNN with no statistically significant difference in classification performance. Additionally, CT windowing consistently improved CNN performance, increasing the mean triage AUROC by 0.07 points.
Conclusion
A CNN was developed to triage HCT studies, which may help streamline image evaluation, and the ways in which upstream image acquisition, reconstruction, and preprocessing affect downstream CNN performance were investigated, bringing focus to this important part of the imaging chain.
Keywords: Head CT, Automated Triage, Deep Learning, Sinogram, Dataset
Supplemental material is available for this article.
© RSNA, 2021
Summary
A convolutional neural network (CNN) was developed to triage head CT studies using a dataset of 9776 head CT examinations, after which the CNN was used to investigate how upstream image acquisition, reconstruction, and preprocessing affected downstream CNN performance.
Key Points
■ This model achieved a mean area under the receiver operating characteristic curve of 0.84 at triaging head CT studies as normal or abnormal, with fine-grained analyses performed over 20 abnormality subsets to assess differential model performance.
■ The number of x-ray projections was reduced by up to 16 times without a statistically significant effect on the performance of the triage convolutional neural network (CNN), which indicates that CNNs could be used to facilitate analyses of lower-dose or faster image acquisition.
■ Training a triage CNN directly on sinograms resulted in comparable performance to inputting reconstructed head CT studies, which suggests that models can be positioned further upstream.
Introduction
Convolutional neural networks (CNNs) have become a powerful tool for classifying medical images (1–3). The medical imaging chain begins well before the CNN, however, and includes many upstream operations such as image acquisition and reconstruction. Although several studies have focused on improving CNN performance via CNN architecture and training modifications (4–10), fewer have focused on how upstream image processing affects CNN performance (Fig 1).
Figure 1:
Pipeline that precedes image analysis by a convolutional neural network (CNN). This study focuses on the effect of upstream image processing operations (area encircled by the blue dotted line) on downstream CNN performance.
This study assesses how upstream image processing affects the downstream performance of a CNN trained to triage head CT (HCT) scans. HCT is a widely performed imaging procedure with broad diagnostic scope that is often used to detect cranial abnormalities (11,12). Automated HCT triage could assist in clinical workflows by quickly flagging abnormal cases for review (13). For this study, a CNN was trained to triage HCT studies, and the effect of upstream image acquisition, reconstruction, and preprocessing on model performance was studied. Further, model performance over 20 abnormalities was evaluated to assess the effect of upstream processing on clinically important data subsets.
Medical image acquisition was the earliest stage considered in the imaging pipeline. Acquisition protocols have been developed to balance the needs of the radiologist, patient, and health care system (eg, high image quality and low radiation dose [14–16]). This study sought to evaluate whether scanning protocols could be adjusted to meet these needs without sacrificing CNN performance. The next item of consideration was image reconstruction, which transforms the acquired CT sensor data (the sinogram) into an image amenable to human analysis. Because it is not yet clear whether the reconstructions required by humans are required by CNNs—or which representation is optimal for CNN performance—CNNs were trained to directly classify sinograms. Because sinogram data are typically both proprietary and discarded after reconstruction, we collected a large volume of HCTs, simulated high-fidelity sinograms, and reconstructed images with altered acquisition parameters. Last, the effect of preprocessing images with clinically meaningful CT windows on model performance was assessed.
It was posited that by evaluating upstream imaging operations, it would be possible to design better-performing CNNs or identify ways to meet health care system needs. To explore these ideas, a CNN was trained and the authors analyzed how changes in the upstream medical imaging chain affected CNN performance. Our contributions include a high-performing HCT triage CNN, detailed evaluation of CNN performance over abnormality subsets, comprehensive assessment of how upstream processing affected CNN performance, and release of our flexible codebase and unique dataset of nearly 10 000 image-sinogram pairs.
Materials and Methods
This study was supported by a General Electric (GE) grant. No authors are GE employees or consultants, and the authors had control of data collection and analysis.
Dataset
We retrospectively collected 9996 volumetric HCT studies from our institution's (Stanford University) picture archiving and communication system (mean patient age, 57 years [range, <1–99 years]; images acquired 2001–2014). This Health Insurance Portability and Accountability Act–compliant study was approved by our institutional review board, and a waiver of informed consent was obtained. Data were randomly split into 80% training, 10% validation, and 10% test. After filtering so that no scans from the same patient spanned multiple splits, the training, validation, and test sets comprised 7856, 936, and 984 CT studies, respectively, totaling 9776 studies. Additional details about the dataset are provided in the Appendix E1 (supplement). This dataset was used for a different study (17), but the dataset itself was not previously released.
Labeling
As part of standard procedure at our institution, each examination was prospectively labeled by the reading radiologist as normal or abnormal; we did not modify these labels.
Additionally, radiologists retrospectively labeled the test set for specific disorders. Each test image was labeled by either a board-eligible radiologist (D.M., 2 years of postgraduate experience) or a board-certified radiologist (B.N.P., 8 years of postgraduate experience) for 20 disorders using the clinical notes. Additional information on the labeling process and counts of all disorders are provided in the Appendix E1 (supplement). Of 9776 total scans, 5398 (55.21%) were abnormal. This dataset will be publicly released for future research.
Synthetic Dataset Generation
To study the influence of image reconstruction, the validated simulation software CatSim (GE Global Research) was used to simulate high-fidelity sinograms of all images in the dataset (18). To further validate the fidelity of the simulation, CatSim was used to reconstruct the simulated sinograms. Information lost in the simulation was identified by comparing all the images reconstructed from the simulated sinograms with the original images. Favorable agreement between the original and re-reconstructed images was observed, with a mean normalized root mean square error of 0.04 and a mean absolute error of 16.44 HU per image volume. Additional details and visualizations are provided in the Appendix E1 (supplement).
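As an illustration of this fidelity check, the two error metrics can be computed as in the following sketch, which assumes the original and re-reconstructed volumes are available as NumPy arrays in Hounsfield units; normalizing the root mean square error by the dynamic range of the original volume is an assumption here, not necessarily the study's exact convention.

```python
import numpy as np

def fidelity_metrics(original_hu: np.ndarray, rereconstructed_hu: np.ndarray):
    """Compare an original CT volume with its re-reconstruction (both in HU).

    Returns (nrmse, mae). NRMSE is normalized by the dynamic range of the
    original volume, which is one common convention (an assumption here).
    """
    diff = rereconstructed_hu - original_hu
    rmse = np.sqrt(np.mean(diff ** 2))
    nrmse = rmse / (original_hu.max() - original_hu.min())
    mae = np.mean(np.abs(diff))  # mean absolute error in HU per volume
    return nrmse, mae
```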
To study the effect of image acquisition on model performance, 4×, 8×, and 16× reduced-projection scans were simulated by taking the first of every 4, 8, and 16 projections of the sinogram, respectively, after which the reduced-projection sinograms were reconstructed using CatSim. The simulation workflow is shown in Figure 2, and example reduced-projection scans are shown in Figure E3 (supplement).
Figure 2:
Diagram shows the sinogram and reduced-projection simulation process. First, the original CT image was reprojected into sinogram space via CatSim (GE Global Research). Second, the first of every 4, 8, or 16 projections was chosen from the sinogram. Finally, the reduced-projection sinograms were reconstructed using CatSim.
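A minimal sketch of the projection-subsampling step in Figure 2, assuming the simulated sinogram is stored as a NumPy array with the projection (view) index along the first axis; the actual CatSim data layout may differ.

```python
import numpy as np

def subsample_projections(sinogram: np.ndarray, factor: int) -> np.ndarray:
    """Keep the first of every `factor` projections (factor = 4, 8, or 16).

    Assumes axis 0 indexes the x-ray projections (views); detector axes follow.
    The reduced sinogram is then reconstructed with CatSim in the actual pipeline.
    """
    return sinogram[::factor]
```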
Simulated sinograms were used for multiple reasons. First, the raw sinograms were not stored for the retrospective dataset, nor would they be readily accessible in a prospective study with current technology. Second, using synthetic datasets enabled systematic exploration of imaging parameters, with comparison of performance over the same test set with a single varying parameter. High-fidelity simulations are an advantage of the present study, as true sinograms and simulation software are often inaccessible (19). The simulated sinograms and reduced x-ray projection datasets will be released to facilitate future work.
Model Training
A three-dimensional 121-layer DenseNet (5) was trained to take the CT volume as input and output a label of “normal” or “abnormal.” This workflow is visualized in Figure E4 (supplement). The He initialization (10) was used, and batch size was set to the maximum that could fit onto the graphics processing unit. We accumulated gradients to achieve a virtual batch size of 16 before backpropagating. The Adam optimizer (7) was used to minimize the cross-entropy loss, with the learning rate set to decay after 10 epochs of validation loss plateau. Coarse hyperparameter search was used to select additional hyperparameters, which are listed in the Table. All other hyperparameters were left at defaults. Each model was trained for 50 epochs and checkpointed using the validation area under the receiver operating characteristic curve (AUROC).
Network Hyperparameters
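The training procedure can be sketched in PyTorch as follows. This is a simplified illustration of the gradient accumulation and learning-rate schedule described above; `model` (the three-dimensional DenseNet-121), `train_loader`, `validation_loss`, the physical batch size, and the initial learning rate are placeholders rather than the study's actual values.

```python
import torch
import torch.nn as nn

# Placeholders: `model`, `train_loader`, and `validation_loss` stand in for the study's objects.
physical_batch_size = 2                          # illustrative; largest size that fits in GPU memory
accumulation_steps = 16 // physical_batch_size   # accumulate gradients to a virtual batch size of 16

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # initial learning rate is illustrative
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", patience=10)

for epoch in range(50):
    model.train()
    optimizer.zero_grad()
    for step, (volume, label) in enumerate(train_loader):
        loss = criterion(model(volume), label) / accumulation_steps
        loss.backward()                          # accumulate gradients over several small batches
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()                     # update weights once per virtual batch of 16
            optimizer.zero_grad()
    scheduler.step(validation_loss(model))       # decay LR after 10 epochs of validation-loss plateau
```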
Infrastructure
CT simulations were performed using CatSim on MATLAB 2017b (MathWorks). CNN experiments were performed in Python (version 3.6.7; https://www.python.org/) using PyTorch (version 1.2; https://pytorch.org) on either an NVIDIA Tesla P100 or an NVIDIA TITAN RTX graphics processing unit. As part of this work, we developed a flexible codebase to rapidly build CNNs for medical image classification using PyTorch (20), automating data extraction from Digital Imaging and Communications in Medicine files and CNN training. This codebase will also be released as a contribution.
Performance and Statistical Analyses
Models were compared using triage AUROC and disease sensitivities. When comparing model subset sensitivities, the operating point was set such that model triage specificity was 0.70. The AUROC was calculated using the Scikit-learn package (version 0.21.3; Python) (21). AUROCs were compared using a DeLong test implemented in Python. Sensitivities were compared using a McNemar test implemented in the statsmodels library (version 0.11.1; Python) (22). A significance level of .05 with a Bonferroni correction was used for 21 total comparisons (performance on 20 subsets and binary triage). Each model was trained with five random seeds, and mean performance is reported with 95% CIs constructed over the five random seeds to describe performance variation resulting from randomness in the training process.
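A hedged sketch of the subset-sensitivity comparison is shown below, using the statsmodels McNemar implementation and scikit-learn AUROC cited above; the DeLong AUROC comparison relies on a separate custom implementation and is not reproduced here. Variable names (`y_true`, `pred_a`, `pred_b`, `scores`) are illustrative.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from statsmodels.stats.contingency_tables import mcnemar

ALPHA = 0.05 / 21  # Bonferroni correction over 21 comparisons (20 subsets + binary triage)

def triage_auroc(y_true, scores):
    """Aggregate triage AUROC (normal vs abnormal) over the test set."""
    return roc_auc_score(y_true, scores)

def compare_sensitivities(y_true, pred_a, pred_b):
    """McNemar test comparing two models' binarized predictions on the abnormal cases.

    pred_a and pred_b are 0/1 predictions at an operating point chosen for 0.70 triage specificity.
    """
    pos = np.asarray(y_true) == 1
    a_correct = np.asarray(pred_a)[pos] == 1
    b_correct = np.asarray(pred_b)[pos] == 1
    table = [[np.sum(a_correct & b_correct), np.sum(a_correct & ~b_correct)],
             [np.sum(~a_correct & b_correct), np.sum(~a_correct & ~b_correct)]]
    result = mcnemar(table, exact=True)
    return result.pvalue, result.pvalue < ALPHA
```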
Results
Image Acquisition
The effect of the number of projections per gantry rotation on CNN performance was evaluated by training four networks: one to classify the original images, and one each for the 4×, 8×, and 16× reduced-projection datasets. While the number of projections is one of many acquisition parameters, the method used here could be replicated to analyze other image acquisition parameters.
Model performances are shown in Figure 3. The AUROCs were 0.75–0.78, 0.78–0.79, 0.77–0.80, and 0.77–0.78 for the original, 4×, 8×, and 16× reduced-projection datasets, respectively. The DeLong test was used to compare the AUROCs of the seeds that achieved the median AUROCs. No statistically significant differences were found between the 4×, 8×, and 16× reduced-projection networks and the network trained with the original data (P = .29, P = .28, and P = .68, respectively).
Figure 3:
Bar graphs show image-space performance: (top row) as a function of number of projections; (middle row) for reconstructed images versus sinograms; and (bottom row) for different image preprocessing operations. The areas under the receiver operating characteristic curves (AUROCs) computed over the entire test set are plotted as well as the sensitivities to various pathologic conditions at a cutoff point that achieves 0.70 specificity on the triage task. The means are plotted, with error bars showing standard deviation over five runs with different random seeds. eq. = equalized, Org. = original.
To assess if each network was behaving similarly despite the differences in image quality, the Cohen κ coefficient and the Pearson correlation coefficient between predictions made by the original network and the reduced-projection networks were computed; the cutoff used to binarize predictions prior to computing κ was selected to achieve a specificity of 0.70. The predictions for the 4× images achieved κ of 0.65 (95% CI: 0.59, 0.70) and a correlation coefficient of 0.84 (95% CI: 0.81, 0.88); those for the 8× images achieved κ of 0.67 (95% CI: 0.63, 0.71) and a correlation coefficient of 0.85 (95% CI: 0.82, 0.88); and those for the 16× images achieved κ of 0.64 (95% CI: 0.60, 0.68) and a correlation coefficient of 0.83 (95% CI: 0.79, 0.86). These values show that the networks maintained good agreement despite reduced image quality in the downsampled datasets. In addition, the correlations between the class activation maps (CAMs) were computed to assess if the networks attended to similar regions of the images (CAMs visualized in Fig E5 [supplement]). The CAMs for the 4×, 8×, and 16× reduced projection networks achieved a mean Pearson correlation coefficient of 0.23 (95% CI: 0.00, 0.54), 0.30 (95% CI: 0.17, 0.43), and 0.23 (95% CI: 0.02, 0.43), respectively, compared with the CAMs of the original image network, showing a positive correlation.
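The agreement analysis can be sketched as follows, under the assumption that each network outputs a per-study probability of abnormality; the specificity-matched cutoff and both agreement statistics use standard scikit-learn and SciPy functions, and variable names are illustrative.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score, roc_curve

def cutoff_at_specificity(y_true, scores, target_specificity=0.70):
    """Pick the decision threshold whose specificity is closest to the target."""
    fpr, _, thresholds = roc_curve(y_true, scores)
    specificity = 1 - fpr
    return thresholds[np.argmin(np.abs(specificity - target_specificity))]

def agreement(y_true, scores_original, scores_reduced):
    """Cohen kappa on binarized predictions and Pearson r on the raw scores."""
    thr_o = cutoff_at_specificity(y_true, scores_original)
    thr_r = cutoff_at_specificity(y_true, scores_reduced)
    kappa = cohen_kappa_score(scores_original >= thr_o, scores_reduced >= thr_r)
    r, _ = pearsonr(scores_original, scores_reduced)
    return kappa, r
```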
The sensitivities of the reduced-projection models on abnormality subsets were evaluated; example subsets are shown in Figure 3. No subset showed a statistically significant change in sensitivity despite the reduced number of projections. The sensitivities and 95% CIs for each of the 20 labeled subsets are noted in Figure E6 and Tables E4 and E7 (supplement).
Image Reconstruction
Next, triage models were trained using the HCT sinograms. The aggregate performance of the sinogram-space models was similar to that of the image-space models. The mean AUROCs of the sinogram- and image-space networks were both 0.77 (95% CI: 0.76, 0.78 for images; 95% CI: 0.76, 0.77 for sinograms). DeLong testing using the image- and sinogram-space networks that achieved the median AUROCs from the five runs showed that the difference in AUROCs was not statistically significant. The aggregate specificity was set to 0.70 and the sensitivity of each model was compared across different pathologic conditions; a few differences emerged, and these are summarized in Figure 3. Some subsets were detected with higher sensitivity in sinogram space (eg, tumors) and some in image space (eg, intraparenchymal hematoma), while still others exhibited similar performance in sinogram and image space. In the case of tumor sensitivity, although the mean sensitivities differed by 0.10 points, a McNemar test over the tumor class failed to reject the null hypothesis that the two models have a similar proportion of errors; test datasets with more tumor-positive examples are needed. Interestingly, the sinogram subset sensitivity also exhibited lower standard deviation (for both the tumor class and the majority of subclasses). The sensitivities for each of the 20 labeled subsets in image space and sinogram space can be found in Figure E7 (supplement).
Image Preprocessing
Finally, the effect of image preprocessing on downstream performance was considered. Guided by common CT viewing practices in radiology, we investigated image normalization through CT windowing and histogram equalization. CT windowing is used by human interpreters to increase the contrast of relevant anatomy in CT scans (Fig 4). CT scans are typically represented in Hounsfield units, the value of which can be used to identify different substances in the scan. CT windowing involves selecting a range of Hounsfield units in which to increase contrast while clipping information from Hounsfield units outside that range. For example, CT windows can increase the contrast of blood (Fig 4B) or stroke (Fig 4C) in HCT.
Figure 4:
The same axial section of a noncontrast head CT image with different preprocessing operations: (A) original image, (B) image with a blood CT window applied, (C) image with a stroke window applied, and (D) histogram-equalized image.
To mimic this viewing procedure with our network, each CT scan was windowed using four common HCT windows: blood (window level [WL], 40 HU; window width [WW], 80 HU), subdural (WL, 25 HU; WW, 300 HU), stroke (WL, 32 HU; WW, 8 HU), and bone (WL, 600 HU; WW, 3000 HU). Each windowed CT was stacked as a different channel before being input into the CNN. The results of the CT-windowed network are shown in Figure 3. The mean AUROC of the windowed network was 0.84 (95% CI: 0.83, 0.84), an increase of 0.07 points over the nonwindowed network, with a DeLong test indicating that this was a statistically significant difference (P < .001). Moreover, the CNN trained using windowed CT scans either matched or exceeded the performance of the original network that classified nonwindowed CT scans in every subset, showing that this simple preprocessing operation provided consistent performance enhancement across pathologic conditions. Using the networks that achieved the median performance over five random seeds, we conducted a McNemar test to compare subset performance and found statistically significant differences for chronic disease, postoperative changes, and encephalomalacia (P < .001, P = .002, and P < .001, respectively).
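A minimal sketch of the windowing and channel-stacking preprocessing described above, assuming the CT volume is a NumPy array in Hounsfield units; the rescaling of each window to [0, 1] before input to the CNN is an assumption.

```python
import numpy as np

# (window level, window width) in HU for the four head CT windows listed above.
WINDOWS = {"blood": (40, 80), "subdural": (25, 300), "stroke": (32, 8), "bone": (600, 3000)}

def apply_window(volume_hu: np.ndarray, level: float, width: float) -> np.ndarray:
    """Clip to [level - width/2, level + width/2] and rescale to [0, 1]."""
    lo, hi = level - width / 2, level + width / 2
    return (np.clip(volume_hu, lo, hi) - lo) / (hi - lo)

def windowed_channels(volume_hu: np.ndarray) -> np.ndarray:
    """Stack the four windowed copies as CNN input channels: shape (4, D, H, W)."""
    return np.stack([apply_window(volume_hu, wl, ww) for wl, ww in WINDOWS.values()])
```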
Histogram equalization was also evaluated. Histogram equalization increases contrast in CT, similar to CT windowing, but does not focus on a narrow range of Hounsfield units. Instead, histogram equalization spreads out frequent intensity values. While histogram equalization did improve performance relative to the original images, it did not perform as well as CT windowing, achieving a mean AUROC of 0.80 (95% CI: 0.79, 0.81). The sensitivity of each of the 20 labeled subsets with different preprocessing operations is reported in Figure E7 (supplement).
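For comparison, histogram equalization can be applied to a whole CT volume as in the sketch below; this uses scikit-image's `equalize_hist`, which is one reasonable choice but not necessarily the implementation used in this study.

```python
import numpy as np
from skimage import exposure

def equalize_ct(volume_hu: np.ndarray) -> np.ndarray:
    """Spread out frequent intensity values across the volume; returns values in [0, 1]."""
    return exposure.equalize_hist(volume_hu)
```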
A summary of all results can be found in Figure 5.
Figure 5:
Summary of results. Top: The mean area under the receiver operating characteristic curve (AUROC) computed over the entire test set for each experiment, with standard deviation bars computed over five runs with different random seeds. Bottom left: Receiver operating characteristic (ROC) curves for convolutional neural networks (CNNs) trained with images, sinograms, CT windows, and histogram equalized (eq) images. Bottom right: ROC curves for CNNs trained with different numbers of projections (proj). ROC curves were generated from the model that achieved the median performance after training with five random seeds. AUC = area under the curve.
Discussion
This study analyzed how upstream image acquisition, reconstruction, and preprocessing affected the performance of an HCT triage CNN, bringing focus to this important part of the imaging pipeline. Image preprocessing was found to have the greatest effect on AUROC, with CT windowing improving the mean AUROC from 0.77 to 0.84. Conversely, reducing the number of projections and skipping reconstruction had a statistically insignificant effect on triage AUROC.
Image Acquisition
The study found a surprising robustness to the number of projections per rotation, indicating that the scanning protocols needed by human interpreters may not be optimal for or required by CNNs. Thus, CNNs may be suited for use cases that require low radiation dose or robustness to artifacts.
Image Reconstruction
Training a model to classify sinograms resulted in performance similar to that of classifying images. On one hand, this result fits intuition: both representations contain nearly the same information, so performance should be similar. However, sinograms are not interpretable by humans, making the CNN's ability to classify sinograms perhaps surprising. Some subtle differences in subset performances were noted, suggesting that different representations may be more fitting for different pathologic conditions. The sinogram model could have clinical import by allowing models to be positioned further upstream, which could enable rapid delivery of results, optimize the radiologist's workflow, or highlight the need for additional imaging.
Image Preprocessing
We observed a clear performance boost from CT windowing, which came without providing any additional information to the CNN. Instead, CT windowing restricted the input information, serving to inject domain expertise into training by forcing the network to evaluate only pixel values likely to display abnormalities. For imaging modalities with no equivalent to CT windowing, histogram equalization may be a more natural normalization step.
Evaluation Using Fine-grained Labels
Themes were observed throughout these experiments that motivate model development using fine-grained labels, including that pathologic condition sensitivity varied greatly across seeds and subsets and that the lowest-performing subsets were often the least frequent. It must be emphasized that in evaluating CNNs for clinical applications, measuring aggregate performance is insufficient. Detailed subset analysis is required. Similarly, when choosing optimal preprocessing operations for a network, decisions should be informed by performance both in aggregate and in critical subsets.
Related Work
Previous studies have explored automated HCT study classification via CNNs to identify a single pathologic condition (23–28) and to detect multiple pathologic conditions (29,30). Pathologic condition–level classification provides useful detail but is often targeted for different use cases than general anomaly detection. In a study involving a general anomaly detection network similar to ours, a CNN trained using triage labels generated from a natural language processing model achieved a 0.73 HCT triage AUROC (13).
Other studies have explored the upstream operations examined herein. For example, CNNs have been trained to reconstruct low-noise medical images from high-noise images (31,32). A different study showed that hemorrhage could be detected and body regions identified directly from sinograms, using a parallel-beam Radon transform to simulate the sinograms (33). Further, a range of studies have used CT windows to preprocess images (26,33–35).
The present study builds on previous work by comprehensively analyzing many of these upstream operations using the same dataset and high-fidelity sinograms, and assessing how each affected triage performance and performance over many pathologic condition labels.
Limitations
Triage labels were prospectively assigned by the original reading radiologist. Although these labels are clinically valuable, labels assigned by a panel and a more nuanced characterization of normal versus abnormal are important for future work focused on clinical implementation. The labels were also created in the context of the patient history, to which the CNNs did not have access. Additionally, the images used in this study were collected from a single site on a GE scanner to enable modeling using CatSim. Future studies involving data collected from multiple centers and scanner types would be useful in validating and extending our results. A final limitation is that this study leverages a standard CNN architecture, which makes assumptions about the relationships between neighboring pixels that are not valid for a sinogram. While the highly nonlinear nature of these CNNs supports sinogram-based models that perform competitively with image-based ones, future work could explore the results of this study in the context of network architectures specifically designed for use with raw data.
Conclusion
In summary, this study presents an HCT triage CNN that achieved high performance compared with that in the existing literature. The study explored the effect of upstream image processing on downstream CNN performance, with certain factors (eg, number of projections) having a surprisingly small effect and others (eg, CT windowing) having a relatively large effect. Model performance was analyzed using both coarse triage labels and disease subset labels, the latter of which are seldom studied or reported but have important clinical implications. It is hoped that the insights provided by these results will inform both practitioners and developers as they work to build clinically meaningful machine learning models for volumetric imaging modalities.
S.M.H. and J.A.D. contributed equally to this work.
Supported by an industry-sponsored grant from General Electric. Also supported by Defense Advanced Research Projects Agency grants FA86501827865 (SDH) and FA86501827882 (ASED); National Institutes of Health grant U54EB020405 (Mobilize); National Science Foundation grants CCF1763315 (Beyond Sparsity), CCF1563078 (Volume to Velocity), and 1937301 (RTML); Office of Naval Research grant N000141712266 (Unifying Weak Supervision); and the Moore Foundation, NXP, Xilinx, LETI-CEA, Intel, IBM, Microsoft, NEC, Toshiba, TSMC, ARM, Hitachi, BASF, Accenture, Ericsson, Qualcomm, Analog Devices, the Okawa Foundation, American Family Insurance, Google Cloud, Swiss Re, the HAI-AWS Cloud Credits for Research program, and members of the Stanford DAWN project: Teradata, Facebook, Google, Ant Financial, NEC, VMWare, and Infosys. S.M.H. supported by the Fannie and John Hertz Foundation, a National Science Foundation Graduate Research Fellowship grant (DGE-1656518), and as a Texas Instruments Fellow under the Stanford Graduate Fellowship in Science and Engineering. J.A.D. supported by the Intelligence Community Postdoctoral Fellowship. D.M. supported in part by a National Institute of Biomedical Imaging and Bioengineering grant (5T32EB009035).
The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views, policies, or endorsements, either expressed or implied, of DARPA, NIH, ONR, or the U.S. Government. Research reported in this publication was also supported by the National Library of Medicine of the National Institutes of Health under Award Number R01LM012966. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Disclosures of Conflicts of Interest: S.M.H. Activities related to the present article: study supported by a grant from GE. Activities not related to the present article: disclosed no relevant relationships. Other relationships: disclosed no relevant relationships. J.A.D. Activities related to the present article: institution received grant from GE Healthcare (grant focused on investigating effects of upstream steps on downstream image analysis models). Activities not related to the present article: disclosed no relevant relationships. Other relationships: disclosed no relevant relationships. M.P.L. Activities related to the present article: institution received grants from GE Healthcare and Philips; author paid consulting fee/honorarium by Bayer and Microsoft. Activities not related to the present article: author is paid board member for Carestream, Nines Radiology, and Segmed; author has stock/stock options in Segmed, Nines Radiology, Bunker Hill, and Centaur. Other relationships: disclosed no relevant relationships. D.M. Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: author is consultant for Segmed; stock/stock option in Segmed. Other relationships: disclosed no relevant relationships. D.L.R. Activities related to the present article: institution has grant from the NIH; author is associate editor of Radiology: AI. Activities not related to the present article: disclosed no relevant relationships. Other relationships: disclosed no relevant relationships. C.R. disclosed no relevant relationships. A.W. Activities related to the present article: institution received grant from GE Healthcare (partially supported); author's institution receives research funding and support from GE Healthcare, Siemens Healthineers, Varex Imaging, and the NIH. Activities not related to the present article: disclosed no relevant relationships. Other relationships: disclosed no relevant relationships. B.N.P. Activities related to the present article: institution received grant from GE Healthcare. Activities not related to the present article: disclosed no relevant relationships. Other relationships: disclosed no relevant relationships.
Abbreviations:
AUROC = area under the receiver operating characteristic curve
CAM = class activation map
CNN = convolutional neural network
HCT = head CT
WL = window level
WW = window width
References
1. Shen D, Wu G, Suk HI. Deep learning in medical image analysis. Annu Rev Biomed Eng 2017;19(1):221–248.
2. Litjens G, Kooi T, Bejnordi BE, et al. A survey on deep learning in medical image analysis. Med Image Anal 2017;42:60–88.
3. Zaharchuk G, Gong E, Wintermark M, Rubin D, Langlotz CP. Deep learning in neuroradiology. AJNR Am J Neuroradiol 2018;39(10):1776–1784.
4. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Nev, June 27–30, 2016. Piscataway, NJ: IEEE, 2016; 770–778.
5. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ. Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Hawaii, July 21–26, 2017. Piscataway, NJ: IEEE, 2017; 2261–2269.
6. Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167 [preprint]. https://arxiv.org/abs/1502.03167. Posted February 11, 2015. Accessed August 13, 2020.
7. Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv:1412.6980 [preprint]. https://arxiv.org/abs/1412.6980. Posted December 22, 2014. Accessed August 13, 2020.
8. Zeiler MD. ADADELTA: an adaptive learning rate method. arXiv:1212.5701 [preprint]. https://arxiv.org/abs/1212.5701. Posted December 22, 2012. Accessed August 13, 2020.
9. Glorot X, Bengio Y. Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, May 13–15, 2010. Brookline, MA: Microtome Press, 2010; 249–256.
10. He K, Zhang X, Ren S, Sun J. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, December 7–13, 2015. Piscataway, NJ: IEEE, 2015; 1026–1034.
11. Coles JP. Imaging after brain injury. Br J Anaesth 2007;99(1):49–60.
12. Eisenberg HM, Gary HE Jr, Aldrich EF, et al. Initial CT findings in 753 patients with severe head injury: a report from the NIH Traumatic Coma Data Bank. J Neurosurg 1990;73(5):688–698.
13. Titano JJ, Badgeley M, Schefflein J, et al. Automated deep-neural-network surveillance of cranial images for acute neurologic events. Nat Med 2018;24(9):1337–1341.
14. Gies M, Kalender WA, Wolf H, Suess C. Dose reduction in CT by anatomically adapted tube current modulation. I. Simulation studies. Med Phys 1999;26(11):2235–2247.
15. Primak AN, McCollough CH, Bruesewitz MR, Zhang J, Fletcher JG. Relationship between noise, dose, and pitch in cardiac multi–detector row CT. RadioGraphics 2006;26(6):1785–1794.
16. Goldman LW. Principles of CT: radiation dose and image quality. J Nucl Med Technol 2007;35(4):213–225; quiz 226–228.
17. Hooper SM, Dunnmon JA, Lungren MP, et al. Assessing robustness to noise: low-cost head CT triage. arXiv:2003.07977 [preprint]. https://arxiv.org/abs/2003.07977. Posted March 17, 2020. Accessed August 13, 2020.
18. De Man B, Basu S, Chandra N, et al. CatSim: a new computer assisted tomography simulation environment. In: Hsieh J, Flynn MJ, eds. Proceedings of SPIE: medical imaging 2007—physics of medical imaging. Vol 6510. Bellingham, Wash: International Society for Optics and Photonics, 2007.
19. Yi X, Walia E, Babyn P. Generative adversarial network in medical imaging: a review. Med Image Anal 2019;58:101552.
20. Paszke A, Gross S, Massa F, et al. PyTorch: an imperative style, high-performance deep learning library. In: Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Advances in neural information processing systems 32, 2019. https://papers.nips.cc/paper/2019.
21. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res 2011;12:2825–2830.
22. Seabold S, Perktold J. Statsmodels: econometric and statistical modeling with Python. In: Proceedings of the 9th Python in Science Conference, Austin, TX, June 28–30, 2010.
23. Chang PD, Kuoy E, Grinband J, et al. Hybrid 3D/2D convolutional neural network for hemorrhage evaluation on head CT. AJNR Am J Neuroradiol 2018;39(9):1609–1616.
24. Lee H, Yune S, Mansouri M, et al. An explainable deep-learning algorithm for the detection of acute intracranial haemorrhage from small datasets. Nat Biomed Eng 2019;3(3):173–182.
25. Kuo W, Häne C, Mukherjee P, Malik J, Yuh EL. Expert-level detection of acute intracranial hemorrhage on head computed tomography using deep learning. Proc Natl Acad Sci U S A 2019;116(45):22737–22745.
26. Arbabshirani MR, Fornwalt BK, Mongelluzzo GJ, et al. Advanced machine learning in action: identification of intracranial hemorrhage on computed tomography scans of the head with clinical workflow integration. NPJ Digit Med 2018;1(1):9.
27. Lee EJ, Kim YH, Kim N, Kang DW. Deep into the brain: artificial intelligence in stroke imaging. J Stroke 2017;19(3):277–285.
28. Lisowska A, Beveridge E, Muir K, Poole I. Thrombus detection in CT brain scans using a convolutional neural network. In: Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies: Volume 2—Bioimaging, Porto, Portugal, February 21–23, 2017. Setúbal, Portugal: SciTePress, 2017; 24–33.
29. Chilamkurthy S, Ghosh R, Tanamala S, et al. Deep learning algorithms for detection of critical findings in head CT scans: a retrospective study. Lancet 2018;392(10162):2388–2396.
30. Gao XW, Hui R, Tian Z. Classification of CT brain images based on deep learning networks. Comput Methods Programs Biomed 2017;138:49–56.
31. Chen H, Zhang Y, Kalra MK, et al. Low-dose CT with a residual encoder-decoder convolutional neural network. IEEE Trans Med Imaging 2017;36(12):2524–2535.
32. Würfl T, Hoffmann M, Christlein V, et al. Deep learning computed tomography: learning projection-domain weights from image domain in limited angle problems. IEEE Trans Med Imaging 2018;37(6):1454–1463.
33. Lee H, Huang C, Yune S, Tajmir SH, Kim M, Do S. Machine friendly machine learning: interpretation of computed tomography without image reconstruction. Sci Rep 2019;9(1):15540.
34. Ker J, Singh SP, Bai Y, Rao J, Lim T, Wang L. Image thresholding improves 3-dimensional convolutional neural network diagnosis of different acute brain hemorrhages on computed tomography scans. Sensors (Basel) 2019;19(9):2167.
35. Lee H, Kim M, Do S. Practical window setting optimization for medical image deep learning. arXiv:1812.00572 [preprint]. https://arxiv.org/abs/1812.00572. Posted December 3, 2018. Accessed August 13, 2020.