Author manuscript; available in PMC: 2021 Sep 1.
Published in final edited form as: IEEE Trans Ultrason Ferroelectr Freq Control. 2020 Apr 15;67(9):1820–1829. doi: 10.1109/TUFFC.2020.2988164

Deep learning of spatiotemporal filtering for fast super-resolution ultrasound imaging

Katherine G Brown 1, Debabrata Ghosh 2, Kenneth Hoyt 3
PMCID: PMC7523282  NIHMSID: NIHMS1624145  PMID: 32305911

Abstract

Super-resolution ultrasound imaging (SR-US) is a new technique that breaks the diffraction limit and allows visualization of microvascular structures down to tens of microns. Image processing methods for the spatiotemporal filtering needed in SR-US, such as singular value decomposition (SVD), are computationally burdensome and performed off-line. Deep learning has been applied to many biomedical imaging problems, and trained neural networks have been shown to process an image in milliseconds. The goal of this study was to evaluate the effectiveness of deep learning to realize a spatiotemporal filter in the context of SR-US processing. A 3D convolutional neural network (3DCNN) was trained on in vitro and in vivo datasets using SVD as ground truth for tissue clutter reduction. In vitro data were obtained from a tissue-mimicking flow phantom, and in vivo data were collected from murine breast cancer tumors. Three training techniques were studied: training with in vitro datasets, training with in vivo datasets, and transfer learning with initial training on in vitro datasets followed by fine-tuning with in vivo datasets. The network trained on in vitro datasets and fine-tuned with in vivo datasets had the highest accuracy at 88.0%. The SR-US images produced with deep learning allowed visualization of vessels as small as 25 μm in diameter, which is below the diffraction limit (wavelength of 110 μm at 14 MHz). The performance of the 3DCNN was encouraging for real-time SR-US imaging, with an average processing frame rate for in vivo data of 51 Hz with GPU acceleration.

Keywords: contrast agents, contrast-enhanced ultrasound, deep learning, microbubbles, super-resolution ultrasound

I. INTRODUCTION

THE assessment of angiogenesis in cancerous tissue improves prediction of patient outcome and allows monitoring of response to treatment [1]–[3]. The small size (< 100 μm) and low flow (1 – 10 mm/sec) characteristics of the microvasculature require resolution unavailable with the present clinical tools, e.g. computed tomography (CT), magnetic resonance imaging (MRI), and ultrasound (US). Super-resolution US (SR-US) imaging holds promise in that it can achieve spatial resolutions below the diffraction limit, at a tenth of the US wavelength (λ/10) and down to tens of microns [4], [5]. This approach relies on precisely localizing the centers of the microbubble (MB) contrast agents as they flow intravascularly. Progressive detection of these MBs through a time series of contrast-enhanced US (CEUS) images allows for the visualization of the microvasculature morphology and tissue perfusion [6]–[8]. SR-US has also demonstrated potential in imaging renal and brain perfusion and tumor angiogenesis in small animals [9]–[11].

The more computationally intensive processing steps of SR-US include localization of the centroids of the detected MBs and tissue clutter reduction, commonly performed with spatiotemporal filtering. Spatiotemporal filtering of the images segments the MBs from the tissue signal [12], [13]. The precise center of each agent is typically determined by a center-of-mass algorithm or by convolution with the 2-dimensional (2D) point spread function (PSF) of the US system to obtain the centroid [8], [14]. The final step is the accumulation of detected events across multiple frames to produce an SR-US image that details microvascular features in a region-of-interest (ROI).

The real-time nature of US imaging is critical in cancer biopsy, needle aspiration of lymph nodes, and drug delivery [15]–[17]. For these procedures, a needle is precisely guided to a site within the tissue for biopsy or injection as a clinician observes the images. In prostate biopsy, up to 39% of patients have tumors that are isoechoic with normal tissue, so "systematic needle biopsies" are performed by sampling at several sites across the organ [18]. The cancer may be missed by this sampling, and multiple biopsy sessions may be required to reach a diagnosis. Recently, MRI has been used to direct US-guided biopsies to suspicious sites [19]. Alternatively, a real-time implementation of SR-US could reveal angiogenic networks, which are biomarkers of cancer, and has the potential to guide biopsies to suspicious tissue using a single imaging modality.

A spatiotemporal filter is effective at segmenting the tissue signal from that of MBs as tissue and MBs have quite distinct signatures. More specifically, tissue has a strong spatial correlation and relatively low motion, while MBs and blood exhibit a very low spatial correlation and considerably more motion, ranging from slow to fast. The different methods for spatiotemporal filtering include a finite impulse response (FIR) clutter filter [1], singular value filtering (SVF) [13], and the widely used singular value decomposition (SVD) [12].

Of importance, principal component analysis methods such as SVD and SVF transform the input data from two dimensions of space and one of time (2+1D) into a 2D Casorati matrix, with one dimension of space and the other of time. The effectiveness of SVD applied to an image sequence relies on the full ensemble of data (for SR-US, thousands of image frames) being captured before SVD matrix processing can begin, eliminating the possibility of executing the SR-US processing flow in real time while scanning. Furthermore, processing is slow, requiring minutes to hours to form the final SR-US image, depending on the number of frames to be processed [13], [20], [21]. Additionally, the selection of a threshold level for SVD spatiotemporal filtering requires empirical analysis and tuning, which limits the sensitivity of MB detection [9].

Deep learning has brought advances to speech and image recognition and medical imaging by discovering complex structures in data using trained neural network models. Deep learning is a type of representation learning based on neural networks, while representation learning is a type of machine learning [22]. Neural networks are considered deep networks when they contain more than one hidden layer. Multilayer feedforward neural networks, given sufficient hidden nodes and training data, are arbitrarily precise universal function approximators [23]. This capability makes deep neural networks suitable for modeling spatiotemporal filtering. Deep networks can take hours or days to train, but trained networks functioning as models have been shown to process an image in milliseconds. Neural networks have an additional advantage in that they do not require manual tuning [24], [25].

The challenge for a neural network in detecting MBs is distinguishing these objects, which are only a few pixels in size in the image, from noise. Distinguishing features of MBs are their appearance in successive frames within a localized area and their high spatiotemporal correlation, whereas noise has a more random distribution within frames and low spatiotemporal correlation. On the other hand, tissue remains mainly static and can be recognized by this characteristic. The problem of detecting moving MBs is similar to detecting and recognizing human actions. A 3-dimensional convolutional neural network (3DCNN) was shown to be optimal for human action recognition in airport surveillance video sequences [26]. That study used short video clips of 7 frames to train a 3DCNN, which performed better than standard algorithms as well as a 2DCNN in the recognition of action sequences. The authors found that the time sequence of 7 frames was the critical element in improving the segmentation of moving targets. The effectiveness of the 3DCNN architecture on human action recognition suggests that a 3DCNN, in effect working in 2+1D, could be trained to identify MBs.

Deep learning was first applied to the localization step of SR-US [27], [28], and more recently applied to the tissue decluttering step [29], [30], [31]. The purpose of this study is to evaluate the performance of a 3DCNN as a spatiotemporal filter for rapid MB segmentation and formation of in vivo SR-US images acquired using an animal model of breast cancer. Preliminary results of this study, limited to the network trained on in vitro data, were reported in a recent paper [31], whereas this study includes training with in vivo data, alone and as a fine-tuning step after in vitro training, as well as measurements of vessels at SR-US resolution.

II. MATERIALS AND METHODS

A. Ultrasound Imaging

Images were collected with a clinical US scanner (Acuson Sequoia 512, Siemens Healthcare, Mountain View, CA) equipped with a 15L8 linear array transducer (Siemens). A contrast imaging mode with a low mechanical index (MI) of 0.2 was used to minimize disruption of MBs. The images were acquired at 15 frames per second (fps) at a center frequency of 14 MHz. All image data were saved for offline processing.

B. In vitro Experiments

For in vitro studies and network training, a 10% gelatin and 1% scatterer mixture (w/v) was heated to 50 °C and then poured into a rigid mold threaded with a Teflon wire. After 8 h of refrigeration and phantom solidification, the wire was removed, leaving a hollow void representative of a small vessel. The channel had a diameter of 2.25 mm at a depth of 17 mm. MBs were made in our lab according to conventional methods [10]. Filtered water mixed with MBs at a concentration of 6 × 10⁶ MB/mL was pumped through the phantom at slow flow rates (Model 77200-60, Cole-Parmer) to mimic microvascular flow. Five datasets were captured at flow rates between 4 cm/sec and 5 cm/sec. Each dataset consisted of 1000 frames. To validate SR-US vessel measurements, an in vitro phantom consisting of a Silastic tube (Dow Corning, Midland, MI) with an internal diameter of 300 μm in a water bath was imaged with a MB solution of 2 μL contrast agent (Definity, Lantheus Medical Imaging, N Billerica, MA) in 400 μL of saline flowing from a syringe pump.

C. In vivo Experiments

Female athymic nude mice (N = 3; Charles River Laboratories, Wilmington, MA) were implanted with 2 million human breast cancer cells (MDA-MB-231, ATCC, Manassas, VA) in the mammary fat pad. After about 4 weeks of tumor growth, the animals were anesthetized with 2% isoflurane and injected with a bolus of Definity. The bolus consisted of 2.5 × 10⁷ MBs in 60 μL saline and was injected via a tail vein catheter. A 10-min sequence of dynamic CEUS images was acquired for each animal (15 frames/sec) for offline processing [11]. Images were cropped to contain only the ROI, which was 180 × 260 pixels on average. All animal experiments were approved by the Institutional Animal Care and Use Committee (IACUC) at the University of Texas at Dallas.

D. SR-US image processing

A flowchart of the SR-US image processing strategy for both in vitro and in vivo images is illustrated in Fig. 1. Briefly, a sequence of images, after an optional motion compensation step, is processed by a spatiotemporal filter to remove tissue clutter. Filtering is followed by upsampling of the image and localization of MBs detected in the filtering step. The final step is the accumulation of MB detections in a high-resolution SR-US image. For the in vivo datasets, respiration motion of up to 270 μm was eliminated with a curve fit based image filtering technique [11] prior to spatiotemporal filtering. The in vitro datasets displayed minimal motion (less than 50 μm) and did not require this step.

Fig. 1.

Flowchart detailing the image processing steps used to produce a super-resolution ultrasound (SR-US) image. In this study, it is proposed to implement the microbubble (MB) detection step (shaded) with deep learning.

The reference method for ground truth SR-US processing differed from the neural network approach only in the MB segmentation step. For the reference images, MB segmentation was performed using spatiotemporal filtering with SVD. For the neural network approach, MB segmentation was based on predictions by a trained 3DCNN of whether a MB was present or absent for each image patch in every frame of the image stack.

The details of the SVD processing were as described earlier by our group [11]. Briefly, the largest 5% of singular values, representing the tissue signal, were computed using an optimized algorithm for SVD. The threshold for determining the largest singular values was based on the highest value of the contrast-to-noise ratio (CNR) metric when applied to the SR-US images. The contribution of these largest singular values to the signal was then subtracted from the total signal [12].
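For illustration, this reference filter can be expressed in a few lines of NumPy. The following is a minimal sketch, not the optimized implementation used in this study; function and variable names, and the handling of the 5% cutoff, are illustrative.

```python
import numpy as np

def svd_clutter_filter(stack, tissue_fraction=0.05):
    """Suppress tissue clutter by removing the largest singular values.

    stack : ndarray of shape (n_frames, n_z, n_x), a CEUS image sequence.
    tissue_fraction : fraction of the largest singular values assumed to
        represent tissue (0.05 corresponds to the 5% described above).
    """
    n_frames, n_z, n_x = stack.shape
    # Casorati matrix: one spatial dimension (rows) by one temporal dimension (columns)
    casorati = stack.reshape(n_frames, n_z * n_x).T           # (pixels, frames)
    U, s, Vt = np.linalg.svd(casorati, full_matrices=False)
    n_tissue = max(1, int(np.ceil(tissue_fraction * s.size)))
    # zero out the largest singular values (tissue); keep the rest (MBs and noise)
    s_filtered = s.copy()
    s_filtered[:n_tissue] = 0.0
    filtered = (U * s_filtered) @ Vt                          # reconstruct the matrix
    return filtered.T.reshape(n_frames, n_z, n_x)
```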

Spatiotemporal filtering based on deep network prediction started with finding all potential MB locations by thresholding each frame. The threshold was selected as a compromise between noise rejection and sensitivity to weak MB echoes. For each potential MB location, a 9 × 9 × 9 pixel patch centered on that location was created, becoming the input to the 3DCNN. For each patch with a positive result from segmentation, the pixel values of the original frame at the location of the 9 × 9 pixel image patch were copied into a working frame. After segmentation, the working frame contained a filtered image frame with intensities matching the original input frame for each pixel patch region around segmented MBs. This method captured the PSF around the detected MB within the frame.
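A minimal sketch of this patch-based segmentation step is given below, assuming the trained 3DCNN is available as a `predict_mb` callable returning True when a MB is present at the patch center; the threshold value, border handling, and names are illustrative rather than the exact implementation used here.

```python
import numpy as np

def segment_frame(stack, frame_idx, predict_mb, threshold, half=4):
    """Build a clutter-filtered working frame from 3DCNN patch predictions.

    stack      : (n_frames, n_z, n_x) CEUS image sequence
    frame_idx  : index of the center frame (needs `half` frames on each side)
    predict_mb : callable taking a (9, 9, 9) patch, returning True if a MB
                 is present at its center (stand-in for the trained 3DCNN)
    threshold  : intensity threshold for candidate MB locations
    """
    frame = stack[frame_idx]
    working = np.zeros_like(frame)
    # candidate MB locations: pixels above the noise-rejection threshold
    candidates = np.argwhere(frame > threshold)
    for z, x in candidates:
        # skip candidates whose 9 x 9 x 9 patch would fall outside the stack
        if (z - half < 0 or z + half + 1 > frame.shape[0] or
                x - half < 0 or x + half + 1 > frame.shape[1] or
                frame_idx - half < 0 or frame_idx + half + 1 > stack.shape[0]):
            continue
        patch = stack[frame_idx - half:frame_idx + half + 1,
                      z - half:z + half + 1,
                      x - half:x + half + 1]
        if predict_mb(patch):
            # copy original intensities of the 9 x 9 region (preserves the PSF)
            working[z - half:z + half + 1, x - half:x + half + 1] = \
                frame[z - half:z + half + 1, x - half:x + half + 1]
    return working
```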

For both the reference method and neural network approach, the precise locations of the MBs in the filtered image stack were determined after 2D linear interpolation and centroid location by a center of mass method [32]. Image frames were upsampled by a factor of 8 resulting in a resolution of 7 μm per pixel. Finally, MB locations were accumulated to form a single SR-US image frame.
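The upsampling and centroid localization step could be sketched as follows. Only the 8x interpolation factor and the center-of-mass localization come from the text; the use of scipy.ndimage and the grouping of nonzero pixels into connected components are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def localize_centroids(filtered_frame, upsample=8):
    """Upsample a filtered frame and return MB centroid coordinates.

    filtered_frame : 2D array produced by the spatiotemporal filter
    upsample       : interpolation factor (8x here, i.e. roughly 7 um pixels)
    """
    # 2D linear interpolation onto the high-resolution grid
    hi_res = ndimage.zoom(filtered_frame, upsample, order=1)
    # group nonzero pixels into individual MB echoes
    labels, n_mb = ndimage.label(hi_res > 0)
    # intensity-weighted center of mass of each echo
    centroids = ndimage.center_of_mass(hi_res, labels, range(1, n_mb + 1))
    return np.array(centroids)          # (n_mb, 2) array of (z, x) positions

def accumulate(centroid_lists, shape):
    """Accumulate per-frame centroid detections into a single SR-US image."""
    sr_image = np.zeros(shape)
    for centroids in centroid_lists:
        for z, x in np.round(centroids).astype(int):
            sr_image[z, x] += 1
    return sr_image
```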

SR-US images formed by the neural network approach were compared to those formed with the ground truth method. Vessel diameters were measured at full width at half maximum (FWHM) from the SR-US images after spatial interpolation to a grid with 7 μm resolution. Maximum intensity projection (MIP) images were formed from the image stacks after MB segmentation and prior to centroid localization. Comparing the MIP images from the ground truth method and from the neural network approach allows assessment of the effectiveness of each spatiotemporal filtering method prior to MB localization.
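A simple discrete estimate of FWHM from a sampled vessel profile can be sketched as follows; this is illustrative only, with no sub-sample interpolation, and the 7 μm default spacing is simply the grid used in this study.

```python
import numpy as np

def fwhm_um(profile, pixel_um=7.0):
    """Full width at half maximum of a vessel intensity profile.

    profile  : 1D intensity samples along a line crossing the vessel
    pixel_um : sample spacing in microns (7 um grid in this study)
    """
    profile = np.asarray(profile, dtype=float)
    half_max = profile.max() / 2.0
    above = np.where(profile >= half_max)[0]
    if above.size == 0:
        return 0.0
    # width spanned by the first through last samples at or above half maximum
    return (above[-1] - above[0] + 1) * pixel_um
```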

E. Proposed 3DCNN Architecture

The proposed 3DCNN architecture has a 9 × 9 × 9-pixel patch input size (9 × 9 in 2D space, and 9 frames deep) and an output with 2 possible states, indicating the likelihood of the presence or absence of a MB in the center of the patch. The 3DCNN had three convolutional layers and two fully-connected layers. The convolutional kernels were 3 × 3 × 3 in the convolutional layers, the features per layer were 8, 16, and 16 for the three convolutional layers, respectively, and 128 and 2 for the fully-connected layers. A dropout rate of 0.8 and a rectified linear unit (ReLU) activation function were used in each layer. These layers were followed by a soft-max layer and a cross-entropy loss calculation. Table I summarizes the neural network architecture parameters for each layer. The network implementation was based on CNN code openly distributed from this source: https://github.com/hagaygarty/mdCNN, as well as custom MATLAB software (MathWorks Inc, Natick, MA). The batch size was 500 images, and batch validation was performed on 1000 images. For training, the initial learning rate was 0.05 and was halved after 50 batch iterations with no improvement in performance, to a minimum value of 0.00001. For the network fine-tuned with in vivo datasets after training on in vitro datasets, the initial learning rate when beginning the in vivo training was 0.005, decreasing to a minimum value of 0.00001. Training was stopped upon reaching the minimum learning rate.

TABLE I.

3-dimensional convolutional neural network (3DCNN) architecture with a 9 × 9 × 9 input patch size and 3 convolutional layers, each using a 3 × 3 × 3 kernel, followed by two fully connected layers with two output states.

Layer | Type | Input | Kernel | Features | Output
data | input | 9 × 9 × 9 | – | – | 9 × 9 × 9
conv1 | convolution | 9 × 9 × 9 | 3 × 3 × 3 | 8 | 7 × 7 × 7
conv2 | convolution | 7 × 7 × 7 | 3 × 3 × 3 | 16 | 5 × 5 × 5
conv3 | convolution | 5 × 5 × 5 | 3 × 3 × 3 | 16 | 3 × 3 × 3
fc1 | fully connected | – | – | 128 | –
fc2 | fully connected | – | – | 2 | –
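For readers who prefer code to tables, the Table I architecture can be expressed as the following PyTorch sketch. This is an illustrative re-expression, not the MATLAB/mdCNN implementation used in this study; the exact placement of dropout and the handling of the soft-max inside the loss are assumptions.

```python
import torch
import torch.nn as nn

class MB3DCNN(nn.Module):
    """Sketch of the Table I architecture. Input: 1 x 9 x 9 x 9 patch;
    output: 2 logits (MB present / absent at the patch center)."""
    def __init__(self, dropout=0.8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3),    # 9^3 -> 7^3, 8 features
            nn.ReLU(), nn.Dropout(dropout),
            nn.Conv3d(8, 16, kernel_size=3),   # 7^3 -> 5^3, 16 features
            nn.ReLU(), nn.Dropout(dropout),
            nn.Conv3d(16, 16, kernel_size=3),  # 5^3 -> 3^3, 16 features
            nn.ReLU(), nn.Dropout(dropout),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 3 * 3 * 3, 128), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(128, 2),  # soft-max is folded into the cross-entropy loss
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# usage: logits for a batch of 500 patches
# logits = MB3DCNN()(torch.randn(500, 1, 9, 9, 9))
```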

F. Training, Validation and Testing

Three training techniques were studied on the proposed 3DCNN architecture: training with in vitro datasets, training with in vivo datasets, and transfer learning with initial training on in vitro datasets followed by fine-tuning with in vivo datasets. The input patches for all training data for the neural network approach were labelled as to the presence or absence of a MB at the center of the patch based on the results of SVD processing. Experimental data were split into training/validation data and test data in a ratio of 4:1 by excluding one test trial, or one test subject for in vivo data, for use in testing. The training/validation data were further divided to reserve one tenth as validation data and the remaining 90% as training data. With this approach, test data was not used in training or validation.

For in vitro training, each fold of the five-fold cross-validation was trained on an average of 573,000 image patches in a 50/50 ratio to attain balanced classes. For in vivo training, each fold was trained on an average of 1,900,000 image patches in a 50/50 split ratio to attain balanced classes. Image patches were randomly selected to be flipped on horizontal or vertical axes prior to presentation to the network.

Five-fold cross-validation was used to validate each of the trained 3DCNNs. Briefly, for the first fold, the 3DCNN was trained on images from four experiments. After training, the performance of the model was assessed on a fifth dataset that was not used in training. This was repeated for the four other folds, each time selecting a different group of four experiments for training data and the fifth for test data. For the network fine-tuned with in vivo data, four folds of in vivo data were used for fine-tuning a network previously trained with in vitro data, and the fold not used in fine-tuning was used for validation. In addition, the network trained on in vitro data with the highest accuracy from cross-validation on in vitro data was tested for its performance on in vivo data consisting of 1,500,000 patches.
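The leave-one-experiment-out procedure can be summarized by the following sketch, where `train_fn` and `evaluate_fn` are hypothetical stand-ins for the actual training and evaluation routines.

```python
import numpy as np

def five_fold_cross_validation(datasets, train_fn, evaluate_fn):
    """Leave-one-experiment-out cross-validation (illustrative sketch).

    datasets    : list of 5 per-experiment (patches, labels) tuples
    train_fn    : callable(train_sets) -> trained model
    evaluate_fn : callable(model, test_set) -> accuracy
    """
    accuracies = []
    for fold in range(len(datasets)):
        test_set = datasets[fold]
        train_sets = [d for i, d in enumerate(datasets) if i != fold]
        model = train_fn(train_sets)                       # train on four experiments
        accuracies.append(evaluate_fn(model, test_set))    # test on the fifth
    return np.mean(accuracies), np.std(accuracies)
```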

The performance metrics for the neural network approach were accuracy, sensitivity, and specificity, averaged over the results from five-fold cross-validation. Accuracy was measured as the number of accurate predictions of the presence or absence of a MB at the center of each input patch divided by the total number of predictions. Sensitivity was measured as the number of positive identifications of MBs divided by the total number of patches labeled as MBs. Specificity was measured as the correct identifications of absence of a MB divided by the total number of patches labeled as no MB present.
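These three metrics follow directly from the per-patch confusion matrix, as in the following sketch (variable names are illustrative, not from this study).

```python
import numpy as np

def segmentation_metrics(predicted, labels):
    """Accuracy, sensitivity, specificity for per-patch MB predictions.

    predicted, labels : boolean arrays (True = MB present at patch center)
    """
    predicted = np.asarray(predicted, bool)
    labels = np.asarray(labels, bool)
    tp = np.sum(predicted & labels)      # MBs correctly detected
    tn = np.sum(~predicted & ~labels)    # non-MB patches correctly rejected
    fp = np.sum(predicted & ~labels)
    fn = np.sum(~predicted & labels)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity
```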

To test the trained neural networks in the SR-US context, SR-US images were formed with the MB segmentation output of each 3DCNN and compared with the reference image from ground truth signal processing for three tumors from three subjects. The performance of each network was assessed on in vivo data that the network had not seen in training. Additionally, the SR-US images from the three tumors processed with the ground truth method and the neural network approach were compared qualitatively as to the level of detail visible on resolved vessels in the image as a whole and in areas where super-resolved vessels were located. This comparison was made at both 55 μm (λ/2) and 7 μm (λ/16) resolution. Vessel diameter measurements were made and compared using the higher resolution 7 μm images.

III. Results

The cross-validation of the 3DCNN trained with in vitro CEUS data was found to have greater than 95% accuracy. This reflects the network's ability to detect MBs in a simple in vitro environment. For comparison, a cross-validation of a 3DCNN trained with in vitro B-mode US data was found to have similarly high accuracy, as shown in Table II, and confirms the methodology of using a neural network as a spatiotemporal filter to detect MBs. As a further validation step, both B-mode US and CEUS image stacks of an in vitro model with a 300 μm channel were processed to form SR-US images using an in vitro trained 3DCNN, shown in Fig. 2. The measured channel width in the SR-US image from CEUS data was 298.3 μm at a pixel resolution of 7 μm.

TABLE II.

MB segmentation performance results on five-fold cross validation

Training Method Accuracy Sensitivity Specificity
in vitro CEUS training 95.9 ± 1.7% 98.2 ± 1.8% 93.8 ± 3.0%
in vitro B-mode US training 97.8 ± 0.1% 96.7 ± 0.2% 99.0 ± 0.2%

Fig. 2.

Contrast-enhanced US (CEUS) maximum intensity projection (MIP) image (left) and SR-US images of a phantom with a 300 μm channel, based on detection of MBs with a 3DCNN trained on in vitro data, using B-mode US imaging (middle) and nonlinear CEUS imaging (right).

Testing results assessed on in vivo data for the three methods of 3DCNN training with CEUS data are summarized in Table III as an average and standard deviation of accuracy, sensitivity, and specificity. Nearly the same high accuracies, sensitivities and specificities were attained by the networks trained on in vivo data and those fine-tuned with in vivo data. The results differ by less than the standard deviation of the validation accuracies. The in vitro trained network has the lowest accuracy in segmenting MBs in in vivo data, as might be expected.

TABLE III.

MB segmentation performance results for three CEUS-trained 3DCNNs on in vivo CEUS data

Training Method Accuracy Sensitivity Specificity
in vitro training 66.7 ± 5.5% 76.5 ± 4.4% 59.6 ± 9.3%
in vivo training 84.3 ± 6.0% 84.7 ± 5.4% 83.8 ± 14.3%
in vitro training / in vivo tuning 88.0 ± 0.5% 82.9 ± 1.0% 93.0 ± 1.2%

From the CEUS-derived MIP images depicted in Fig. 3, it is clear that the larger and smaller vessels visible in the ground truth method are present in the MB segmentation results from each of the trained 3DCNNs. The resolution of MIP images is limited by the diffraction limit of the US system. Thus, smaller vessels are not expected to be clearly delineated in the MIP images. While the ground truth image is of higher intensity than the 3DCNN based images, it does not seem to have much additional detail. However, the ground truth image does have more regions of higher signal strength, representing a greater number of MBs detected.

Fig. 3.

MIP images of a murine tumor created with MB segmentation from a) ground truth processing with SVD, prediction of a 3-dimensional convolutional neural network (3DCNN) trained with b) in vitro datasets, c) in vivo datasets, and d) in vitro fine-tuned with in vivo datasets. Area in box on a) is enlarged in Fig. 5. The scale bar represents 1 mm.

The results from the full SR-US processing flow outlined in Fig. 1 (spatial resolution of 55 μm) for the ground truth method and the three trained neural networks are depicted in Fig. 4. All three SR-US images based on 3DCNN processing for MB segmentation show visible vessel detail, although they lack a large number of MB detections. The additional MB detections in the reference image may be responsible for obscuring vessels at the top of the tumor.

Fig. 4.

SR-US images at a resolution of 55 μm of a murine tumor created with MB segmentation from a) ground truth processing with SVD, prediction of a 3-dimensional convolutional neural network (3DCNN) trained with b) in vitro datasets, c) in vivo datasets, and d) in vitro datasets fine-tuned with in vivo datasets. Area in box on a) is enlarged in Fig. 5. The scale bar represents 1 mm.

A detailed view of the boxed region in Fig. 4 is shown in Fig. 5 at a resolution of 7 μm for each of the SR-US images. A high level of detail is visible in all four images. A SR-US resolvable vessel, with a profile line marked in the reference image, is quite clearly delineated in all the 3DCNN images. The in vivo trained 3DCNN image shows slightly more low-intensity pixels than the other two trained 3DCNN images. Further, as can be seen in Fig. 5, more MBs are detected in the reference image than in the 3DCNN images.

Fig. 5.

SR-US images of microvascularity detail of a murine tumor created with MB segmentation at a resolution of 7 μm for a) ground truth processing with SVD, prediction of a 3-dimensional convolutional neural network (3DCNN) trained with b) in vitro datasets, c) in vivo datasets, and d) in vitro datasets fine-tuned with in vivo datasets. The dashed line segment in a) marks the profile for measurement of the vessel diameter. The scale bar represents 1 mm.

Measurements were made on all four SR-US in vivo test images of a representative small vessel, marked with a dashed line in Fig. 5, at a resolution of 7 μm. The vessel measurements for all four techniques are shown in Fig. 6. The representative vessel in the reference image measured 25 μm in diameter at FWHM. The FWHM vessel measurements for the 3DCNN images were 23 μm for the 3DCNN fine-tuned with in vivo datasets, 31 μm for the 3DCNN trained with in vitro datasets, and 36 μm for the 3DCNN trained with in vivo datasets. The 3DCNN results are comparable to the reference method given a pixel resolution of 7 μm.

Fig. 6.

Microvessel profiles for SR-US images measured at full-width half-maximum (FWHM). With the ground truth method, the vessel measured 25 μm (black -). The vessel measured 31 μm based on a 3DCNN trained with in vitro datasets (blue …), 36 μm when trained with in vivo datasets (blue o), and 23 μm when trained with in vitro and fine-tuned with in vivo datasets (blue - -).

Overall, when assessed quantitatively, the 3DCNN fine-tuned on in vivo data performed the best, with the highest accuracy, sensitivity, and specificity, and greater microvascular detail in the SR-US images than the ground truth. This network also produced the closest measurement of vessel diameter to the reference image. To test the consistency and extensibility of these results, two additional tumors were processed with the ground truth method as well as the neural network approach for spatiotemporal filtering in SR-US. For each tumor, a network was trained with in vitro data and then fine-tuned on in vivo data that did not include the specific tumor. The SR-US images are shown in Fig. 7. The two additional tumors display different levels of vascularity. There are many more vessels visible in Fig. 7a compared to Fig. 7d, which reveals a highly necrotic core. The development of necrosis, starting at the center of the tumor and progressing outwards, is typical of this type of tumor at an advanced stage. The FWHM measurement of the ground truth method for the indicated microvessel in the first tumor (Fig. 7a) was 25 μm (Fig. 7b) and that of the 3DCNN method was 17 μm (Fig. 7c). The measurements for the second tumor (Fig. 7d) were 32 μm (Fig. 7e) and 39 μm (Fig. 7f), respectively. The profiles for the vessel measurements shown in Fig. 8 reveal that the differences in each case are approximately one pixel, which is the resolution of the measurement. In one case, the 3DCNN method had the smaller measurement, but in the other, the ground truth measurement was smaller.

Fig. 7.

SR-US images from two additional murine tumors a) and d) at a resolution of 55 μm, and details at a resolution of 7 μm for b) and e) the ground truth method and c) and f) based on prediction of a 3DCNN trained with in vitro datasets and fine-tuned with in vivo datasets. The microvessel FWHM profiles were measured as b) 25 μm and c) 17 μm, and as e) 32 μm and f) 39 μm. The scale bars represent 1 mm.

Fig. 8.

Vessel profile measurements from SR-US images of two additional murine tumors comparing the ground truth method with one based on prediction of a 3DCNN trained with in vitro datasets and fine-tuned with in vivo datasets. The microvessel FWHM profiles were measured as 25 μm and 17 μm (left) and as 32 μm and 39 μm (right) for the two methods, respectively.

The processing time for prediction of MB segmentation by the proposed 3DCNN was assessed on both in vitro and in vivo images on a single CPU and with GPU (Nvidia GeForce RTX 2080 Ti) acceleration. Each frame of image data is broken down into a group of image patches, each patch surrounding a potential MB. The group of patches from an image frame is then processed by the 3DCNN. The proposed architecture requires 9 image frames to process the center frame of the 9, given that the input patch is 9 × 9 × 9 pixels. The calculation can be pipelined, and processing can begin after a delay of 9 frames: the current frame sits at the center of the 9-frame window, with 4 frames of history and 4 frames into the future. Thus, there is a 4-frame latency in addition to the computation time. The average processing frame rate for in vivo tumor data for MB segmentation with a frame size of 180 × 260 pixels was 51 Hz with GPU acceleration, approximately threefold faster than with a single CPU.
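The pipelining described above can be sketched as a rolling 9-frame buffer; `segment_center_frame` is a hypothetical callable standing in for the patch-based 3DCNN segmentation of the window's center frame.

```python
from collections import deque

def stream_segmentation(frames, segment_center_frame, depth=9):
    """Pipelined MB segmentation over a live frame stream (illustrative sketch).

    frames               : iterable yielding 2D CEUS frames as they arrive
    segment_center_frame : callable taking a list of `depth` frames and
                           returning the filtered center frame
    Output lags the newest input frame by (depth - 1) // 2 = 4 frames.
    """
    buffer = deque(maxlen=depth)
    for frame in frames:
        buffer.append(frame)
        if len(buffer) == depth:
            # center frame of the 9-frame window: 4 past, current, 4 future
            yield segment_center_frame(list(buffer))
```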

IV. Discussion

The use of deep learning in segmentation of the MB signal from the tissue background was investigated in the context of SR-US processing. The results showed that a 3DCNN can discern the 2+1D features of MBs in in vivo CEUS datasets. Deep networks trained on spatiotemporal filtering with in vivo CEUS data displayed high accuracy, sensitivity and specificity and effectively allowed measurement of microvasculature in SR-US imaging that is comparable to current methods. The neural network approach has an additional advantage of reduced computation time.

Van Sloun et al. trained a deep network to realize the centroid localization processing step in SR-US [27], [28]. They trained a deep network for MB localization with data from a simulation, rather than in vitro or in vivo data, and reported that localization error averaged 0.0375 mm, close to the pixel size. Recently, the same group took a deep learning approach to tissue clutter filtering, implementing principal component analysis with an unrolled CNN architecture, which was applied in addition to a deep learning method for localization [33], [29]. The authors showed improvements in performance as well as better generalization and noise suppression when the spatiotemporal filter implemented with deep learning was applied prior to localization. Our study extends our earlier work applying deep learning to SVD spatiotemporal filtering for SR-US with a 3DCNN to training on both in vitro and in vivo data, and compares vessel measurements made on SR-US images formed with deep learning techniques to those made with conventional processing methods.

In this study, a 3DCNN was trained to realize a spatiotemporal filter as a model of the behavior observed in the training data. The accuracy of its output, 88.0% compared to the ground truth method, indicates that much of the operation of the SVD filter has been captured by the network through recognizing the pattern of MBs as distinct from tissue and noise. As the receptive field of the 3DCNN, 9 × 9 pixels, is much smaller than that of SVD, which uses the entire image, the network may be understood to have learned a simpler nonlinear filter with compact support that yields similar results. Deeper architectures that can capture more of the complexity of the SVD filter could improve accuracy, sensitivity, and specificity. A residual network architecture allows deeper networks to be trained than standard CNNs and is being considered for future work [34]. How each layer of the network achieves this result may be explored by examining the features at each layer with visualization tools [35].

A study of machine learning approaches was performed prior to selecting the proposed deep network. In unreported data, support vector machines and 2D convolutional neural networks were found to have accuracies below 60% when trained on in vitro datasets, leading to the proposal of a 3DCNN solution. The architectural details of the 3DCNN were then explored.

Of note, as the 3D input patch size was increased, the accuracy improved while the processing time lengthened. Architecture explorations of patch sizes of 3 × 3 × 5, 5 × 5 × 9, 7 × 7 × 9, and 9 × 9 × 9 pixels revealed that 90% accuracy was not achieved until a patch size of 9 × 9 × 9 pixels was reached. This is understood to be the result of including more of the PSF of the MB and its surrounding environment in the patch as the patch size increased, which was believed to aid feature extraction by the network. However, increasing patch size comes at a computational cost, as the number of multiply-accumulate calculations grows with each dimension of the patch. From this trade-off, a 9 × 9 × 9 patch was selected.

The number of layers and the number of features per layer were also investigated to arrive at an optimized architecture. Pruning the parameters of the three convolutional layers of the 9 × 9 × 9 network revealed decreasing accuracy as the number of features per layer was reduced. Decreasing hidden nodes in this way was understood to limit the complexity of the filter that could be realized. Increasing the number of features per layer also decreased accuracy, which was believed to be the result of insufficient training data.

In vitro data from a vascular flow phantom was chosen to begin training the network as large amounts of data could be easily acquired. In vitro images with bright MB echoes transiting through a dark background of water in a channel were expected to be simple to detect. Furthermore, such training data is more representative of MBs in vivo than simulated US data. Inherently, the variation of PSF with polydisperse MBs, US system noise, and variations in environmental parameters during collection of in vitro data are shared with in vivo experiments. These factors may enhance the accuracy of MB segmentation.

Transfer learning is a technique that deploys a pre-trained neural network on a new problem. With this technique, the weights of a network trained with non-representative images are used as a starting point in training a network with representative images. The first layers of a neural network trained on images from any source and type learn the same features, namely edge and texture features similar to those produced by Gabor filters [35]. This was demonstrated by Shin et al. who used popular networks such as GoogLeNet trained on non-medical images to operate on medical images [36]. It is useful when large amounts of annotated training data are not available, as may be the case in medical imaging [36], [37].

Transfer learning from the same US imaging modality may have advantages over transfer learning from general images or medical images from other modalities. The object to be detected, an US contrast agent, would not be represented in images other than those from US. The similarity of CEUS data from the in vitro environment to that from the in vivo environment is expected to help the transfer of learning. An additional advantage of training on in vitro data is that it is generally easier to obtain, and more training data improves network accuracy.

The network trained with only in vitro data, while not as accurate in segmentation of MBs as the networks trained with in vivo data, was able to perform effectively in creating SR-US images with only slight degradations compared to the reference images. The implication is that some amount of missing MB localizations may be tolerated in SR-US images. The network trained with both in vitro and in vivo data showed the greatest accuracy, indicating that transfer learning from in vitro trained networks is helpful. The constrained environment of the in vitro model, with bright MBs flowing through a dark background of water, results in a high-contrast image that is believed to aid in developing a network architecture and in transfer learning.

Spatiotemporal filtering using SVD requires the entire image stack, representing up to several minutes of acquisition, to perform the calculation that separates the MBs from the tissue. A full SVD is needed to optimally select a threshold for each dataset. For this study, it took an average of 9.7 minutes to compute a full SVD on the in vivo image stacks with a conventional method. Optimized SVD computation, which calculates only the needed singular values above the selected threshold, took an average of 2.7 minutes.

Recently, improvements have been proposed to SVD processing that achieve real-time rates of 10 to 24 Hz on short ensemble lengths of 50 and 16 frames, respectively [38], [39]. However, SVD complexity is O(MN²) floating-point operations (FLOPs) for an M × N matrix with M > N [40], so extending these methods to the ensemble lengths of thousands of frames typical for SR-US would preclude their use in a real-time application. As such, conventional SVD and other PCA methods are not amenable to a real-time frame-by-frame implementation for SR-US.
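A back-of-the-envelope illustration of this scaling, ignoring constant factors, shows why long ensembles are problematic:

```python
def svd_flops(n_pixels, ensemble_length):
    """Approximate SVD cost for an M x N Casorati matrix (M pixels, N frames, M > N)."""
    return n_pixels * ensemble_length ** 2   # O(M N^2)

pixels = 180 * 260                       # in vivo frame size in this study
print(svd_flops(pixels, 16) / 1e9)       # short ensemble: ~0.012 GFLOPs
print(svd_flops(pixels, 1000) / 1e9)     # SR-US-scale ensemble: ~46.8 GFLOPs
```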

Additionally, there is a tradeoff between imaging time and resolution of the SR-US image that depends on vessel size [41]. In cases in which real-time imaging is essential, such as needle-guided procedures, a compromise can be made in favor of shorter imaging time. The resolution will suffer; however, larger vessels will be well perfused and depicted accurately for needle guidance. The perfusion of smaller vessels will be incomplete, yet the MBs detected will be localized correctly and, over time, allow the buildup of an increasingly detailed image.

The computation time for the segmentation of an image frame with a patch-based 3DCNN depends on the number of MBs in a frame as well as the threshold level selected. In this study, a frame rate of 51 Hz was achieved in processing in vivo CEUS images (180 × 260 pixels) of tumors with tissue clutter removal implemented in deep learning, which is promising for this step of SR-US processing. The frame rate is expected to improve with a fully-convolutional network, which would be more efficient than a patch-based 3DCNN as it would process each frame once, independent of the number of potential MBs, and thus the number of patches, in each frame. Further, spatiotemporal filtering is only one of the computationally intensive steps in SR-US, and future studies will address the localization step, which took more than fifteen-fold longer in this study, in order to achieve the real-time performance goal for overall SR-US processing.

SVD processing collapses the two dimensions of space into a single dimension prior to the matrix decomposition step, potentially eliminating the row-to-row correlation of an image. Both horizontal and vertical spatial dimensions are preserved in the 3DCNN, allowing features to be extracted based on both row-to-row and column-to-column correlation, which may be an advantage of this approach. Deeper network architectures to capture more of the complexity of spatiotemporal filtering, other tissue clutter or background clutter removal techniques, and alternative approaches such as highly expressive variational autoencoders that could improve accuracy are areas for future study.

The problem of generalizability should be considered for both spatiotemporal filtering methods. Spatiotemporal filtering with SVD relies on thresholds and other parameters that require manual tuning to obtain optimal results for each CEUS dataset captured. A neural network does not require such parameters, which may be seen as an advantage. However, the question of whether tuning of the network would be required for a change of transducer or a different tissue type is relevant. In recent work, a deep network trained on in vitro data collected by a Verasonics Vantage 256 equipped with a L11–4v linear transducer at a frame rate of 300 Hz was used to successfully segment MBs from in vivo CEUS data collected with a Siemens Sequoia 512 clinical US machine equipped with a 15L8 linear transducer [31]. For the problem of MB segmentation, the deep network may be robust to changes in transducers and tissue types and quite generalizable.

Likewise, preliminary studies have shown that this deep learning technique applied to CEUS images can also be applied to B-mode images. B-mode US imaging has an inherently higher signal-to-noise ratio (SNR), whereas nonlinear CEUS may have a higher contrast-to-noise ratio (CNR); both CNR and SNR improvements may be advantageous in SR-US [5]. The amount of tissue clutter signal is reduced in CEUS compared to B-mode imaging, but it remains an obstacle to SR-US based on CEUS because tissue produces a substantial nonlinear signal (a signal that is, separately, helpful when imaging difficult patients [42]). The effective SVD thresholds for CEUS and B-mode will therefore differ, being lower for CEUS. In future work, the results from applying the deep learning SR-US spatiotemporal filtering technique to CEUS and B-mode imaging will be compared more extensively.

It is understood that trained neural networks will learn, and thus share, the deficiencies of the ground truth method used to train them. Improved spatiotemporal filtering or background suppression methods, such as robust matrix completion, might be used as ground truth for 3DCNN training to improve results [43]. Additionally, simulated data with perfect ground truth could be used in training to improve accuracy.

The tumors in this study are well vascularized at the periphery and become necrotic over time, starting from the core outwards. The additional MB detections in the reference image in such highly vascularized regions obscure the structure of the underlying vessels. The images from the neural network approach have fewer overall MB detections, which is thought to arise from a combination of factors: the network may not be sufficiently complex, or the neural network approach may be better at noise reduction, as observed by van Sloun et al. [29]. This is something that may be exploitable in future work and lead to superior results in spatiotemporal filtering from deep learning.

V. Conclusion

This study demonstrated the ability of deep learning to act as a spatiotemporal filter in SR-US and to visualize vessels as small as 25 μm, which is comparable to the use of SVD spatiotemporal filtering in SR-US. The high performance of a trained deep network shows promise in supporting a real-time SR-US application visualizing microvasculature below the diffraction limit. Additionally, the study shows that transfer learning based on a deep network trained on in vitro data is effective in the segmentation of MBs from in vivo data, accelerating the development of CEUS applications.

Acknowledgments

This research was supported in part by NIH grant R01EB025841 and Texas CPRIT award RP180670.

Biography

Debabrata Ghosh has been an Assistant Professor in the Department of Electronics & Communication Engineering at Thapar University since 2018. He received his B.E. degree in Electronics & Instrumentation Engineering in 2004, his M.Sc. degree in Satellite Communication & Space Systems from the University of Sussex in 2006, and his Ph.D. degree in Electrical Engineering from the University of North Dakota in 2015. From 2016 to 2018, he worked as a postdoctoral researcher in the Radiology Department at UT Southwestern Medical Center and in the Bioengineering Department at UT Dallas. His research interests include developing new imaging algorithms for ultrasound-based medical applications that impact patient care.

Katherine G. Brown received her BSEE from Stanford University, an MSEE from the University of Washington, Seattle, and an MBA from Southern Methodist University. She spent nearly 30 years in corporate America in engineering and management roles, including Boeing, Siemens Ultrasound, and nineteen years at Texas Instruments. She is a PhD candidate in Bioengineering at the University of Texas at Dallas working under Dr. Kenneth Hoyt. Her research focuses on super-resolution ultrasound imaging in cancer.

Kenneth Hoyt is an Associate Professor in the Department of Bioengineering at the University of Texas at Dallas and the Department of Radiology at the University of Texas Southwestern Medical Center. He has been an IEEE Member since 1999. He received a B.S. degree in Electrical Engineering from Drexel University (Philadelphia, PA) in 2001, followed by M.S. and Ph.D. degrees in Biomedical Engineering in 2004 and 2005, respectively, from the same institution. He did a postdoctoral fellowship in the Department of Electrical and Computer Engineering at the University of Rochester. Dr. Hoyt was faculty in the Department of Radiology at the University of Alabama at Birmingham (UAB) from 2008 to 2015, during which time he also received an M.B.A. degree from the School of Business (2011). Dr. Hoyt was elected fellow of the American Institute of Ultrasound in Medicine (AIUM) in 2014. His research focuses on the development of novel ultrasound imaging strategies for improved management of human disease (e.g., cancer and diabetes).

Contributor Information

Katherine G. Brown, Department of Bioengineering, Univ. of Texas at Dallas, 800 W. Campbell Rd., Richardson, TX 75080.

Debabrata Ghosh, Department of Electronics and Communication Engineering, Thapar Institute of Engineering and Technology, Patiala, Punjab.

Kenneth Hoyt, Department of Bioengineering, Univ. of Texas at Dallas, 800 W. Campbell Rd. BSB 13.929, Richardson, TX 75080.

References

[1] Ferrara KW, Merritt CR, Burns PN, Foster FS, Mattrey RF, and Wickline SA, "Evaluation of tumor angiogenesis with US: imaging, Doppler, and contrast agents," Academic Radiology, vol. 7, no. 10, pp. 824–839, 2000.
[2] Hoyt K, Umphrey H, Lockhart M, Robbin M, and Forero-Torres A, "Ultrasound imaging of breast tumor perfusion and neovascular morphology," Ultrasound in Medicine and Biology, vol. 41, no. 9, pp. 2292–2302, 2015.
[3] Saini R and Hoyt K, "Recent developments in dynamic contrast-enhanced ultrasound imaging of tumor angiogenesis," Imaging in Medicine, vol. 6, no. 1, p. 41, 2014.
[4] Errico C et al., "Ultrafast ultrasound localization microscopy for deep super-resolution vascular imaging," Nature, vol. 527, no. 7579, pp. 499–502, Nov. 2015, doi: 10.1038/nature16066.
[5] Couture O, Hingot V, Heiles B, Muleki-Seya P, and Tanter M, "Ultrasound localization microscopy and super-resolution: A state of the art," IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 65, no. 8, pp. 1304–1320, 2018.
[6] Tanigaki K et al., "Hyposialylated IgG activates endothelial IgG receptor FcγRIIB to promote obesity-induced insulin resistance," J Clin Invest, vol. 128, no. 1, pp. 309–322, Jan. 2018, doi: 10.1172/JCI89333.
[7] Ghosh D, Xiong F, Sirsi SR, Shaul PW, Mattrey RF, and Hoyt K, "Toward optimization of in vivo super-resolution ultrasound imaging using size-selected microbubble contrast agents," Medical Physics, 2017.
[8] Ghosh D et al., "Super-resolution ultrasound imaging of skeletal muscle microvascular dysfunction in an animal model of type 2 diabetes," Journal of Ultrasound in Medicine, vol. 38, no. 10, pp. 2589–2599, 2019, doi: 10.1002/jum.14956.
[9] Lin F, Shelton SE, Espíndola D, Rojas JD, Pinton G, and Dayton PA, "3-D ultrasound localization microscopy for identifying microvascular morphology features of tumor angiogenesis at a resolution beyond the diffraction limit of conventional ultrasound," Theranostics, vol. 7, no. 1, p. 196, 2017.
[10] Foiret J, Zhang H, Ilovitsh T, Mahakian L, Tam S, and Ferrara KW, "Ultrasound localization microscopy to image and assess microvasculature in a rat kidney," Scientific Reports, vol. 7, no. 1, p. 13662, 2017.
[11] Ghosh D et al., "Monitoring early tumor response to vascular targeted therapy using super-resolution ultrasound imaging," presented at the 2017 IEEE International Ultrasonics Symposium (IUS), 2017, pp. 1–4.
[12] Demené C et al., "Spatiotemporal clutter filtering of ultrafast ultrasound data highly increases Doppler and fUltrasound sensitivity," IEEE Transactions on Medical Imaging, vol. 34, no. 11, pp. 2271–2285, 2015.
[13] Mauldin FW, Lin D, and Hossack JA, "The singular value filter: a general filter design strategy for PCA-based signal separation in medical ultrasound imaging," IEEE Transactions on Medical Imaging, vol. 30, no. 11, pp. 1951–1964, 2011.
[14] Christensen-Jeffries K, Browning RJ, Tang M-X, Dunsby C, and Eckersley RJ, "In vivo acoustic super-resolution and super-resolved velocity mapping using microbubbles," IEEE Transactions on Medical Imaging, vol. 34, no. 2, pp. 433–440, 2015.
[15] Herth FJF, Eberhardt R, Vilmann P, Krasnik M, and Ernst A, "Real-time endobronchial ultrasound guided transbronchial needle aspiration for sampling mediastinal lymph nodes," Thorax, vol. 61, no. 9, pp. 795–798, Sep. 2006, doi: 10.1136/thx.2005.047829.
[16] Chin KJ, Perlas A, Chan VWS, and Brull R, "Needle visualization in ultrasound-guided regional anesthesia: challenges and solutions," Reg Anesth Pain Med, vol. 33, no. 6, pp. 532–544, Oct. 2008, doi: 10.1136/rapm-00115550-200811000-00005.
[17] König K, Scheipers U, Pesavento A, Lorenz A, Ermert H, and Senge T, "Initial experiences with real-time elastography guided biopsies of the prostate," Journal of Urology, vol. 174, no. 1, pp. 115–117, Jul. 2005, doi: 10.1097/01.ju.0000162043.72294.4a.
[18] Shinohara K, Wheeler TM, and Scardino PT, "The appearance of prostate cancer on transrectal ultrasonography: correlation of imaging and pathological examinations," Journal of Urology, vol. 142, no. 1, pp. 76–82, Jul. 1989, doi: 10.1016/S0022-5347(17)38666-4.
[19] Kaplan I, Oldenburg NE, Meskell P, Blake M, Church P, and Holupka EJ, "Real time MRI-ultrasound image guided stereotactic prostate biopsy," Magnetic Resonance Imaging, vol. 20, no. 3, pp. 295–299, Apr. 2002, doi: 10.1016/S0730-725X(02)00490-3.
[20] Song P et al., "Improved super-resolution ultrasound microvessel imaging with spatiotemporal nonlocal means filtering and bipartite graph-based microbubble tracking," IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 65, no. 2, pp. 149–167, 2018.
[21] Mauldin FW Jr, Dhanaliwala AH, Patil AV, and Hossack JA, "Real-time targeted molecular imaging using singular value spectra properties to isolate the adherent microbubble signal," Physics in Medicine & Biology, vol. 57, no. 16, p. 5275, 2012.
[22] Goodfellow I, Bengio Y, and Courville A, Deep Learning. The MIT Press, 2016.
[23] Hornik K, Stinchcombe M, and White H, "Multilayer feedforward networks are universal approximators," Neural Networks, vol. 2, no. 5, pp. 359–366, Jan. 1989, doi: 10.1016/0893-6080(89)90020-8.
[24] LeCun Y, Bengio Y, and Hinton G, "Deep learning," Nature, vol. 521, no. 7553, p. 436, 2015.
[25] Ravì D et al., "Deep learning for health informatics," IEEE Journal of Biomedical and Health Informatics, vol. 21, no. 1, pp. 4–21, 2017.
[26] Ji S, Xu W, Yang M, and Yu K, "3D convolutional neural networks for human action recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 221–231, 2013.
[27] van Sloun RJ et al., "Super-resolution ultrasound localization microscopy through deep learning," arXiv preprint arXiv:1804.07661, 2018.
[28] van Sloun RJ, Solomon O, Bruce M, Khaing ZZ, Eldar YC, and Mischi M, "Deep learning for super-resolution vascular ultrasound imaging," presented at the ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 1055–1059.
[29] van Sloun RJG, Cohen R, and Eldar YC, "Deep learning in ultrasound imaging," Proceedings of the IEEE, pp. 1–19, 2019, doi: 10.1109/JPROC.2019.2932116.
[30] Brown K and Hoyt K, "Deep learning in spatiotemporal filtering for super-resolution ultrasound imaging," in 2019 IEEE International Ultrasonics Symposium (IUS), 2019, pp. 1114–1117, doi: 10.1109/ULTSYM.2019.8926282.
[31] Brown K, Dormer J, Fei B, and Hoyt K, "Deep 3D convolutional neural networks for fast super-resolution ultrasound imaging," presented at Medical Imaging 2019: Ultrasonic Imaging and Tomography, 2019, vol. 10955, p. 1095502.
[32] Viessmann O, Eckersley R, Christensen-Jeffries K, Tang M, and Dunsby C, "Acoustic super-resolution with ultrasound and microbubbles," Physics in Medicine & Biology, vol. 58, no. 18, p. 6447, 2013.
[33] Cohen R et al., "Deep convolutional robust PCA with application to ultrasound imaging," presented at the ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 3212–3216.
[34] He K, Zhang X, Ren S, and Sun J, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[35] Yosinski J, Clune J, Bengio Y, and Lipson H, "How transferable are features in deep neural networks?," in Advances in Neural Information Processing Systems 27, Ghahramani Z, Welling M, Cortes C, Lawrence ND, and Weinberger KQ, Eds. Curran Associates, Inc., 2014, pp. 3320–3328.
[36] Shin H et al., "Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1285–1298, May 2016, doi: 10.1109/TMI.2016.2528162.
[37] Hoo-Chang S et al., "Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning," IEEE Transactions on Medical Imaging, vol. 35, no. 5, p. 1285, 2016.
[38] Song P et al., "Accelerated singular value-based ultrasound blood flow clutter filtering with randomized singular value decomposition and randomized spatial downsampling," IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 64, no. 4, pp. 706–716, Apr. 2017, doi: 10.1109/TUFFC.2017.2665342.
[39] Chee AJY, Yiu BYS, and Yu ACH, "A GPU-parallelized eigen-based clutter filter framework for ultrasound color flow imaging," IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 64, no. 1, pp. 150–163, Jan. 2017, doi: 10.1109/TUFFC.2016.2606598.
[40] Song P, Manduca A, Trzasko JD, and Chen S, "Ultrasound small vessel imaging with block-wise adaptive local clutter filtering," IEEE Transactions on Medical Imaging, vol. 36, no. 1, pp. 251–262, Jan. 2017, doi: 10.1109/TMI.2016.2605819.
[41] Hingot V, Errico C, Heiles B, Rahal L, Tanter M, and Couture O, "Microvascular flow dictates the compromise between spatial resolution and acquisition time in Ultrasound Localization Microscopy," Sci Rep, vol. 9, no. 1, pp. 1–10, Feb. 2019, doi: 10.1038/s41598-018-38349-x.
[42] Tranquart F, Grenier N, Eder V, and Pourcelot L, "Clinical use of ultrasound tissue harmonic imaging," Ultrasound in Medicine & Biology, vol. 25, no. 6, pp. 889–894, Jul. 1999, doi: 10.1016/S0301-5629(99)00060-5.
[43] Ashikuzzaman M, Belasso C, Gauthier CJ, and Rivaz H, "Suppressing clutter components in ultrasound color flow imaging using robust matrix completion algorithm: simulation and phantom study," in 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), 2019, pp. 745–749, doi: 10.1109/ISBI.2019.8759543.
