Abstract
Purpose
Transrectal ultrasound (TRUS) is a versatile and real‐time imaging modality that is commonly used in image‐guided prostate cancer interventions (e.g., biopsy and brachytherapy). Accurate segmentation of the prostate is key to biopsy needle placement, brachytherapy treatment planning, and motion management. Manual segmentation during these interventions is time‐consuming and subject to inter‐ and intraobserver variation. To address these drawbacks, we aimed to develop a deep learning‐based method which integrates deep supervision into a three‐dimensional (3D) patch‐based V‐Net for prostate segmentation.
Methods and materials
We developed a multidirectional deep-learning-based method to automatically segment the prostate for ultrasound-guided radiation therapy. A 3D supervision mechanism is integrated into the V-Net stages to deal with the optimization difficulties of training a deep network with limited training data. We combine a binary cross-entropy (BCE) loss and a batch-based Dice loss into a stage-wise hybrid loss function for deeply supervised training. During the segmentation stage, patches extracted from the newly acquired ultrasound image are fed into the well-trained network, which adaptively labels the prostate tissue. The final segmented prostate volume is reconstructed using patch fusion and further refined through a contour refinement process.
Results
Forty‐four patients' TRUS images were used to test our segmentation method. Our segmentation results were compared with the manually segmented contours (ground truth). The mean prostate volume Dice similarity coefficient (DSC), Hausdorff distance (HD), mean surface distance (MSD), and residual mean surface distance (RMSD) were 0.92 ± 0.03, 3.94 ± 1.55, 0.60 ± 0.23, and 0.90 ± 0.38 mm, respectively.
Conclusion
We developed a novel deeply supervised deep learning‐based approach with reliable contour refinement to automatically segment the TRUS prostate, demonstrated its clinical feasibility, and validated its accuracy compared to manual segmentation. The proposed technique could be a useful tool for diagnostic and therapeutic applications in prostate cancer.
Keywords: deeply supervised network, deep learning, prostate segmentation, transrectal ultrasound (TRUS)
1. Introduction
Prostate cancer is the second leading cause of cancer‐related death in men in the United States.1 Transrectal ultrasound (TRUS) is a standard imaging modality for image‐guided prostate‐cancer procedures (e.g., biopsy and brachytherapy).2 Accurate segmentation of the prostate plays a key role in biopsy needle placement, radiotherapy treatment planning, and motion monitoring.3 Manual segmentation during biopsy or radiation therapy planning can be time‐consuming and subject to inter‐ and intraobserver variation.4 As ultrasound images have a relatively low signal‐to‐noise ratio (SNR), automated segmentation of the prostate is challenging.
Recently, a number of techniques have been developed to segment the prostate from TRUS images. The current TRUS segmentation techniques can be briefly summarized as the following:
Non-machine-learning-based methods include contour- and shape-based methods and region-based methods.5 Contour- and shape-based methods segment the prostate based on boundary information,6 which can be affected by the ambiguous boundaries of the prostate apex and base in TRUS images; prior shape information has been applied to address this issue.7, 8 Region-based methods use the predominant intensity distributions of the prostate region to segment the prostate contour, but are affected by the speckle noise in TRUS images. In addition, there are many other non-machine-learning-based methods, including atlas-based,9, 10 graph-based,11, 12 and level set-based methods.13, 14 Ghose et al. reviewed these methods in detail.5
Machine learning-based methods cluster the TRUS voxels into prostate and non-prostate tissues based on different learning-based models. These methods can be classified into two types, using either unsupervised or supervised models. The unsupervised methods perform tissue classification based on TRUS contour information15, 16 and shape priors.17 The supervised methods train a classifier using a set of training data with associated labels (prostate or non-prostate), and the well-trained classifier then performs the segmentation for a newly acquired ultrasound image.18, 19 The supervised methods can be grouped into support vector machine (SVM)-based, random forest-based, and deep learning-based methods. The SVM-based and random forest-based methods use TRUS contour boundary information, such as texture features or shape statistics, to train an SVM or random forest classifier for subsequent segmentation.20, 21, 22, 23, 24, 25 To address the difficulty that traditional machine-learning-based methods have with the handcrafted, high-dimensional, and ill-posed mapping from TRUS image to binary segmentation, deep learning methods have been introduced into medical image segmentation.26, 27, 28, 29, 30 Yang et al. incorporated an auto-context model into recurrent neural networks to deal with severe boundary incompleteness and enhanced the performance of prostate boundary delineation.31 However, this method was based on two-dimensional (2D) patch inputs, which lack spatial information and thus result in ambiguous boundary segmentation in low-contrast regions such as the prostate apex and base. Ghavami et al. proposed a method based on improved convolutional neural networks (CNNs) for prostate segmentation in 2D and three-dimensional (3D) TRUS images.32 However, due to the lack of stage-wise deep supervision, training such a network with limited patient data is difficult when all the convolutional kernels of each stage are optimized based only on a loss function at the final stage. Zhu et al. incorporated a deep supervision strategy into a CNN for prostate segmentation.33 Zeng et al. utilized magnetic resonance imaging priors for TRUS prostate segmentation.34
In this work, we proposed a multidirectional and multiderivative deep learning‐based method to automatically segment the prostate. The contributions of the paper are as follows:
To cope with the optimization difficulties of training the deep learning‐based model with limited training data, a deep supervision strategy with a hybrid loss function (logistic and Dice loss) was introduced to the different stages. These mechanisms could make the residual information semantically meaningful for the early stages and final stage in the network, and thus reduce convergence time and improve the segmentation performance of the network when training with limited patient data.
To reduce possible segmentation errors at the prostate apex and base in TRUS images, we introduced a multidirectional‐based contour refinement model to fuse transverse, sagittal, and coronal plane‐based segmentation.
The multiderivative images including the 3D original TRUS and multiple filtered images were used as multichannel samples to train the proposed network, which could perform a cross‐modality feature learning to enhance the networks' capacity and performance.35
The paper is organized as follows: we first provide an overview of the proposed TRUS prostate segmentation framework in Materials and Methods, followed by a detailed description of the multiderivative preprocessing, the deeply supervised V-Net, and the multidirectional-based contour refinement post-processing. We then evaluate the proposed method through a comparison with the state-of-the-art segmentation methods U-Net36 and V-Net26 and verify its performance using clinical data. Finally, along with an extended discussion, we conclude the presentation of our novel TRUS prostate segmentation framework.
2. Materials and methods
2.A. Overview
The proposed prostate segmentation method consists of a training stage and a segmentation stage. For each TRUS image, the corresponding manual contour was used as the learning target. TRUS images from both the training and testing sets were preprocessed to remove noise by bias correction and a despeckling method.37 Multidirectional deep learning networks were trained using images in the transverse, sagittal, and coronal planes. For each plane, the TRUS images were filtered by 3D Gaussian, mean, and median filters. The filtered images, together with the original images, were used to constitute 4-channel image data, or multiderivative-based data. A 3D patch-based V-Net26 architecture was introduced to enable end-to-end learning. A deep supervision strategy38 combined with a hybrid loss was used to deeply supervise the network. During the segmentation stage, 3D patches were extracted from the multiderivative images of newly acquired images as the input of the well-trained networks, which performed a patch-based segmentation. The segmented prostate volume was obtained by patch fusion and was refined by the multidirectional-based contour refinement. Figure 1 outlines the workflow schematic of our segmentation method.
Figure 1.

The schematic flow diagram of the proposed method. [Color figure can be viewed at wileyonlinelibrary.com]
2.B. Multiderivative image
In the training of a segmentation network, features extracted from the entire image can enhance segmentation accuracy by providing structural, global, and spatial information. However, it is typically not feasible to use whole 3D TRUS images due to the computational cost. To address the challenge of large-scale TRUS data, serial 3D image patches were extracted for training. Moreover, for a successful segmentation task, it is important for a network to extract richer and deeper features of the prostate in TRUS images. Thus, handcrafted filters, that is, 3D mean, median, and Gaussian filters, were introduced to enhance the feature extraction of the prostate boundary. This was done by smoothing and denoising the original TRUS images to reduce their uninformative artifact texture, as shown in Fig. 2. Then, 3D patches of the original TRUS image and its multiderivative counterparts were used as the multichannel input samples to train our network.
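A minimal sketch (not the authors' code) of how such a 4-channel multiderivative input could be assembled from one 3D TRUS volume, assuming SciPy is available; the filter parameters `sigma` and `size` are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def multiderivative_channels(volume, sigma=1.0, size=3):
    """Stack the original volume with its 3D Gaussian-, mean-, and
    median-filtered counterparts along a channel axis.
    sigma and size are illustrative filter parameters, not from the paper."""
    gauss = ndimage.gaussian_filter(volume, sigma=sigma)
    mean = ndimage.uniform_filter(volume, size=size)
    median = ndimage.median_filter(volume, size=size)
    # Shape: (L, W, D, 4) -- one channel per derivative image.
    return np.stack([volume, gauss, mean, median], axis=-1)

def extract_patches(channels, patch_depth=4):
    """Cut the 4-channel volume into groups of consecutive slices
    along the depth axis (non-overlapping, for training)."""
    depth = channels.shape[2]
    return [channels[:, :, z:z + patch_depth, :]
            for z in range(0, depth - patch_depth + 1, patch_depth)]
```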
Figure 2.

The multiderivative images and the corresponding prostate. (a1–a3) TRUS images represented in transverse plane, sagittal plane, and coronal plane. (b1–b3), (c1–c3), (d1–d3) Corresponding images generated by 3D Gaussian filter, 3D mean filter, and 3D median filter. (e1–e3) Corresponding prostate masks based on physicians' manual contours. The display window for (a–d) is [0, 200].
2.C. Deeply supervised V‐Net
Our proposed network architecture (yellow part of Fig. 1) was inspired by the well-known end-to-end V-Net.26 The principal advantage of a V-Net is its ability to perform voxel-wise error back-propagation during the training stage and to generate a segmented patch with the same size as the input patch during testing.26
As shown in Fig. 1, the network consists of compression and decompression paths, and a bridge path that connects these two. The compression path is constructed by 1, 2, or 3 convolutional layers, which are called convolutional components, followed by a “down” convolutional layer to reduce the resolution. The decompression path is constructed by 1, 2, or 3 convolutional layers and followed by an “up” convolutional layer to enhance the resolution. The bridge path concatenates the feature maps from equal‐sized compression and decompression paths.
From top to bottom, the network is grouped into five stages with different resolutions. Each stage consists of a compression path, a bridge path, a decompression path, a soft-max operator, and a threshold to binarize the output (with 1 and 0 denoting prostate and non-prostate regions, respectively). Assume the input patch size of our network is L × W × D, where L and W denote the length and width of a slice and D denotes the number of slices. From top to bottom, the first stage generates feature maps with a size of L × W × D and is the final stage. The second stage generates feature maps with a size of L/2 × W/2 × D/2 and is the high-resolution stage. The third stage generates feature maps with a size of L/4 × W/4 × D/4 and is the modest-resolution stage. The fourth stage generates feature maps with a size of L/8 × W/8 × D/4 and is the low-resolution stage. The last stage's output size is L/16 × W/16 × D/4, and we denote this last stage as the bridge stage. The kernel size and stride of the "down" convolutional layer and the corresponding "up" convolutional layer in the final and high-resolution stages are both set to 2 × 2 × 2. The kernel size and stride of the convolutional components in the final and high-resolution stages are set to 3 × 3 × 3 and 1 × 1 × 1, respectively. Since the depth of the patch is reduced to one in the last three stages, the kernel size and stride of the "down" convolutional layer and the corresponding "up" convolutional layer in the modest-resolution, low-resolution, and bridge stages are set to 2 × 2 × 1 and 2 × 2 × 1, respectively. The kernel size and stride of the convolutional components in the modest-resolution, low-resolution, and bridge stages are set to 3 × 3 × 1 and 1 × 1 × 1, respectively.
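For concreteness, the following sketch (not the authors' implementation) shows how the convolutional components and the "down"/"up" convolutions described above could be written with tf.keras layers; the PReLU activation and filter counts are assumptions, since the paper does not specify them.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_component(x, filters, n_convs, kernel=(3, 3, 3)):
    """1-3 convolutional layers with kernel 3x3x3 (or 3x3x1) and stride 1."""
    for _ in range(n_convs):
        x = layers.Conv3D(filters, kernel, strides=(1, 1, 1), padding="same")(x)
        x = layers.PReLU(shared_axes=[1, 2, 3])(x)  # activation is an assumption
    return x

def down_conv(x, filters, stride=(2, 2, 2)):
    """'Down' convolution with equal kernel and stride (2x2x2 or 2x2x1)."""
    return layers.Conv3D(filters, stride, strides=stride, padding="same")(x)

def up_conv(x, filters, stride=(2, 2, 2)):
    """'Up' (transposed) convolution that restores the resolution."""
    return layers.Conv3DTranspose(filters, stride, strides=stride, padding="same")(x)
```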
2.C.1. Deep supervision
To cope with the optimization difficulties when training a deep network with limited data, we incorporated deep supervision33, 38 into a V-Net, as shown in the green part of Fig. 1. For the final and high-resolution stages, the output sizes already equal the original input size, so no additional "up" convolution operator is needed to restore the image size. Since the outputs of the modest-resolution and low-resolution stages are downsampled by factors of two and four, respectively, these two stages are followed by one or two additional "up" convolution operators, and then by a soft-max and threshold operator, to obtain equal-size segmentations.
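A hedged sketch of one such deep-supervision branch, assuming the tf.keras building blocks from the previous sketch: zero, one, or two extra "up" convolutions bring a stage's output to the input size before the soft-max. The layer choices and upsampling stride are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def supervision_branch(feature_map, n_upsamples, num_classes=2):
    """Extra 'up' convolutions (0 for final/high-res, 1 for modest, 2 for low)
    followed by a 1x1x1 convolution and soft-max over the class channel."""
    x = feature_map
    for _ in range(n_upsamples):
        x = layers.Conv3DTranspose(num_classes, (2, 2, 1), strides=(2, 2, 1),
                                   padding="same")(x)
    x = layers.Conv3D(num_classes, (1, 1, 1), padding="same")(x)
    return layers.Softmax(axis=-1)(x)  # thresholded later to a binary mask
```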
2.C.2. Hybrid loss function
Recent work has used either a logistic or a Dice loss as the loss function in their networks.26, 36 We proposed to combine a logistic loss, which measures dissimilarity, with a Dice loss, which measures similarity, into a hybrid loss function that supervises our network at four stages. Voxel-wise binary cross-entropy (BCE) loss is commonly used as a logistic loss, and since the segmentation task can be regarded as voxel-wise binary classification, we adopted it as our logistic loss function. The BCE loss at stage $s$ is defined as follows:
$$ L_{BCE}^{(s)} = -\frac{1}{N}\sum_{i=1}^{N}\left[\, y_i \log p_i^{(s)} + \left(1 - y_i\right)\log\!\left(1 - p_i^{(s)}\right)\right] \qquad (1) $$

where $y_i$ denotes the value of the $i$th voxel in the manual segmentation, $p_i^{(s)}$ denotes the value of the $i$th voxel in the automatic segmentation generated at stage $s$ (the final, high-, modest-, or low-resolution stage), and $N$ is the number of voxels in the patch. A value of $y_i$ (or of the binarized $p_i^{(s)}$) equal to 1 indicates that the voxel belongs to the segmented prostate, and 0 indicates otherwise.
A Dice similarity coefficient (DSC) loss26 is also introduced, which is defined as:
$$ L_{DSC}^{(s)} = \frac{2\left|Y \cap P^{(s)}\right|}{\left|Y\right| + \left|P^{(s)}\right|} \qquad (2) $$

where $Y$ denotes the prostate mask from the manual contour and $P^{(s)}$ denotes the prostate mask from the automatic segmentation at the final, high-, modest-, or low-resolution stage.
Combining the above two loss functions, the hybrid loss function for deep supervision at the different stages is defined as follows:
$$ L = \sum_{s} \rho_s \left( L_{BCE}^{(s)} - \mu\, L_{DSC}^{(s)} \right) \qquad (3) $$

where $s$ indexes the final, high-, modest-, and low-resolution stages of our network, $\rho_s$ denotes the regularization weight of each stage's loss, which was set empirically from the weighting parameter $\rho$ to account for the resolution difference of each stage, and $\mu$ is a balancing parameter that balances the BCE and DSC losses.
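A hedged sketch of the stage-wise hybrid loss under the reconstruction of Eq. (3) above: voxel-wise BCE minus a weighted Dice similarity term, summed over the supervised stages. The per-stage weights `stage_weights` are illustrative assumptions (the paper sets $\rho_s$ empirically); $\mu = 2$ follows Section 2.F.

```python
import tensorflow as tf

def dice_similarity(y_true, y_pred, eps=1e-6):
    """Soft Dice similarity between a binary mask and a probability map."""
    inter = tf.reduce_sum(y_true * y_pred)
    return 2.0 * inter / (tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + eps)

def hybrid_loss(y_true, stage_preds, stage_weights=(1.0, 0.8, 0.6, 0.4), mu=2.0):
    """stage_preds: prostate probability maps from the final, high-, modest-,
    and low-resolution supervision branches, all upsampled to the input size.
    stage_weights are illustrative, not the paper's values."""
    bce = tf.keras.losses.BinaryCrossentropy()
    total = 0.0
    for rho_s, p_s in zip(stage_weights, stage_preds):
        total += rho_s * (bce(y_true, p_s) - mu * dice_similarity(y_true, p_s))
    return total
```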
2.D. Multidirectional‐based contour refinement
As noted in the Introduction, if only input slices from the transverse plane are used, the prostate apex and base cannot be accurately identified on TRUS. To address this issue, we propose a post-processing method, the multidirectional-based contour refinement model, to automatically refine the segmentation results.
Suppose $S_{tra}$, $S_{sag}$, and $S_{cor}$ are the three segmentations obtained from the transverse, sagittal, and coronal planes. The refinement's range of view covers the apex and base of the prostate, that is, the region around the poles of the contour surface along the z-axis. We therefore cast rays from the prostate contour's central position $O$. These rays are generated by restricting the variation range of the polar angle along the z-axis, as shown in Fig. 3. The orange lines of Fig. 3(c) show the distribution range of the rays, where $\theta$ denotes a ray's polar angle, with $\theta \in [-30°, 30°]$, and $\varphi$ denotes a ray's azimuthal angle, with $\varphi \in [0°, 360°)$. The orange lines of Figs. 3(a) and 3(b) show the distribution of rays in the sagittal and coronal planes, respectively.
Figure 3.

The visual result of polar rays' distribution. (a) and (b) Polar rays' distribution by orange line in sagittal plane and coronal plane, where the red lines show the manual contour of prostate. (c) Distribution of polar rays in both polar coordinate and Cartesian coordinate. The display window for (a) and (b) is [0, 200]. [Color figure can be viewed at wileyonlinelibrary.com]
As shown in Figs. 3(a) and 3(b), these polar rays have intersection points with the surface of the prostate contour. We denote the intersection coordinates of the segmented contours $S_{tra}$, $S_{sag}$, and $S_{cor}$ with the $j$th ray as $x_{tra}^{\,j}$, $x_{sag}^{\,j}$, and $x_{cor}^{\,j}$, respectively. The multidirectional-based contour refinement is performed by calculating the distance among these three contours along each ray. The distance is defined as follows:
$$ d_j = \max\left\{\, \left\| x_{tra}^{\,j} - x_{sag}^{\,j} \right\|,\; \left\| x_{tra}^{\,j} - x_{cor}^{\,j} \right\|,\; \left\| x_{sag}^{\,j} - x_{cor}^{\,j} \right\| \right\} \qquad (4) $$
where $\left\| a - b \right\|$ denotes the Euclidean distance between the coordinates $a$ and $b$. Then, inspired by Ding's work,39 the multidirectional-based contour refinement is implemented differently depending on this distance. The refinement is given as follows:
$$ x_{final}^{\,j} = \begin{cases} \dfrac{1}{3}\left( x_{tra}^{\,j} + x_{sag}^{\,j} + x_{cor}^{\,j} \right), & d_j \le \varepsilon \\[2mm] \dfrac{1}{2}\left( x_{sag}^{\,j} + x_{cor}^{\,j} \right), & d_j > \varepsilon \end{cases} \qquad (5) $$
where $\varepsilon$ is a fixed threshold, as previously recommended by Ding et al.39 If $d_j \le \varepsilon$, we assume that the three contours are well-matched at this point, and the final segmentation is refined by the average of the three contours. Otherwise, we assume that the three contours are not well-matched at this point, and the final segmentation is refined by the average of the contours generated from the sagittal and coronal planes. Finally, the refined segmentation around the apex and base is obtained by mesh smoothing over these points.
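A minimal sketch of the per-ray refinement rule under the reconstruction of Eqs. (4) and (5) above; the maximum-of-pairwise-distances choice and the variable names are assumptions.

```python
import numpy as np

def refine_point(x_tra, x_sag, x_cor, eps):
    """x_tra, x_sag, x_cor: 3D intersection coordinates of one polar ray with
    the transverse-, sagittal-, and coronal-plane segmentations."""
    d = max(np.linalg.norm(x_tra - x_sag),
            np.linalg.norm(x_tra - x_cor),
            np.linalg.norm(x_sag - x_cor))
    if d <= eps:                          # contours well-matched: use all three
        return (x_tra + x_sag + x_cor) / 3.0
    return (x_sag + x_cor) / 2.0          # otherwise drop the transverse estimate
```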
2.E. Dataset and quantitative measurements
We tested the proposed method using 44 patients' TRUS data. All were acquired using a Hitachi ultrasound scanner with a 7.5-MHz biplane probe. Each 3D TRUS image is composed of 1024 × 768 × 216 voxels, with a voxel size of 0.12 × 0.12 × 1.0 mm3. We used leave-one-out cross-validation to evaluate the proposed segmentation algorithm. Specifically, we excluded one patient from the dataset before training our deep learning-based segmentation model. During the training stage, fivefold cross-validation was used to train the model; that is, a randomly selected 80% of the patch samples were used to train the model, and the remaining 20% of the patch samples were used for validation. After training, the excluded patient's TRUS image was used for the segmentation test. Our segmentation results were compared with the manually created contours. All manual prostate gland contours were created on TRUS images by an experienced physician. We calculated the DSC, precision score, recall score, Hausdorff distance (HD), mean surface distance (MSD), and residual mean surface distance (RMSD) between the two contours to evaluate the accuracy of our segmentation method. The DSC, precision, and recall scores quantify volume similarity between two contours, whereas the HD, MSD, and RMSD metrics quantify boundary similarity between two surfaces. Generally, more accurate segmentation results are associated with lower HD, MSD, and RMSD values and higher DSC, precision, and recall scores. Due to low contrast, the most challenging regions in TRUS prostate segmentation are the base and apex. We therefore also conducted a regional analysis to measure errors in the base and apex sections of the prostate.
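For reference, a minimal sketch (not the authors' evaluation code) of the volume-overlap metrics and a symmetric Hausdorff distance computed on surface point sets with SciPy; surface extraction and the MSD/RMSD variants are omitted.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dsc(gt, seg):
    """Dice similarity coefficient between two binary masks."""
    gt, seg = gt.astype(bool), seg.astype(bool)
    return 2.0 * np.logical_and(gt, seg).sum() / (gt.sum() + seg.sum())

def precision_recall(gt, seg):
    gt, seg = gt.astype(bool), seg.astype(bool)
    tp = np.logical_and(gt, seg).sum()
    return tp / seg.sum(), tp / gt.sum()

def hausdorff(gt_surface_pts, seg_surface_pts):
    """Symmetric Hausdorff distance between two surface point sets (N x 3, in mm)."""
    return max(directed_hausdorff(gt_surface_pts, seg_surface_pts)[0],
               directed_hausdorff(seg_surface_pts, gt_surface_pts)[0])
```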
In order to illustrate the significance of the improvement from each proposed step-by-step enhancement, a paired two-tailed t-test was used to compare the outcomes between two groups of numerical results calculated from all patients' data. To further evaluate the performance metrics (DSC, precision, recall, HD, MSD, and RMSD), we computed corrected P-values through the Holm-Bonferroni method, with a significance level of α = 0.05.40
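A hedged sketch of this statistical procedure, assuming SciPy and statsmodels are available: a paired two-tailed t-test per metric followed by Holm-Bonferroni correction at α = 0.05.

```python
from scipy import stats
from statsmodels.stats.multitest import multipletests

def compare_methods(results_a, results_b, alpha=0.05):
    """results_a, results_b: dicts mapping metric name -> per-patient values
    (paired, same patient order in both dicts)."""
    names, pvals = [], []
    for name in results_a:
        _, p = stats.ttest_rel(results_a[name], results_b[name])
        names.append(name)
        pvals.append(p)
    # Holm-Bonferroni correction across the six metrics.
    reject, p_corrected, _, _ = multipletests(pvals, alpha=alpha, method="holm")
    return dict(zip(names, zip(p_corrected, reject)))
```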
2.F. Parameter performance
In general, segmentation performance can be improved by several parameters: a larger patch size, a larger batch size, and more epochs. There is a tradeoff between segmentation performance and the computational complexity and memory requirements raised by these parameters. In order to balance these competing demands, we empirically set the initial patch size to 256 × 256 × 4, the number of epochs to 180, and the batch size to 40.
To test the influence of the weighting parameter ρ and the balancing parameter μ in our proposed hybrid loss function, we fixed the remaining parameter settings, detailed in Appendix Table S1, and varied these two parameters. Fourfold cross-validation was used for this evaluation. Appendix Fig. S1 plots the averaged DSC as a function of these two parameters, which illustrates that ρ = 0.8 and μ = 2 are adequate for our TRUS prostate segmentation. We set the number of rays to 360 to make the refined contours smooth and more detailed at the boundaries.
3. Results
3.A. Comparison between multiderivative‐based and single image input
To evaluate the influence of the multiderivative-based image input, we compared the convergence of our proposed algorithm using the original image and the multiderivative-based images as inputs. Appendix Fig. S2 shows the average DSC and loss convergence curves of these two methods. The mean DSC based on a batch converges faster with the multiderivative-based input, especially when the epoch number is <40. Figure S2 also shows the best DSC and loss, and the corresponding epoch numbers, for these two inputs. The epoch numbers at which the multiderivative-based input reaches its best DSC and loss are much smaller than those with a single input.
Appendix Fig. S3 compares the segmentations produced with the proposed method using a single (original image) or multiderivative-based image input. Because more informative and structural features can be more easily captured from a multiderivative-based input than from the original image alone, the final segmented prostate [Appendix Fig. S3(b2)] and the prostate probability maps at the high-, modest-, and low-resolution stages of the multiderivative-based image input are closer to the manual contour (ground truth) than those of the original image input alone. Table 1 quantitatively compares the 44 patients' data based on leave-one-out cross-validation, showing that the results of the multiderivative input are better than those of the single input in precision, recall, HD, and RMSD.
Table 1.
Quantitative comparison of the proposed deep supervised V‐Net with multiderivative‐based and single input
| Method | DSC | Precision | Recall | HD (mm) | MSD (mm) | RMSD (mm) |
|---|---|---|---|---|---|---|
| Original | 0.908 ± 0.030 | 0.893 ± 0.058 | 0.927 ± 0.046 | 3.911 ± 1.558 | 0.605 ± 0.228 | 0.904 ± 0.377 |
| Multiderivative‐based | 0.912 ± 0.026 | 0.897 ± 0.056 | 0.930 ± 0.043 | 3.996 ± 1.560 | 0.607 ± 0.228 | 0.907 ± 0.377 |
| P‐value | 0.078 | 0.012 | 0.041 | <0.001 | 0.076 | 0.038 |
DSC, Dice similarity coefficient; HD, Hausdorff distance; MSD, mean surface distance; RMSD, residual mean surface distance.
3.B. Contribution of deep supervision
To demonstrate the effectiveness of deep supervision, we compared the results of a V-Net and our proposed algorithm without contour refinement, that is, the deeply supervised V-Net (DS-V-Net). The aim was to show the contribution in three different aspects: (a) the segmented prostate probability maps of the four stages, (b) the 3D scatter plots of the first three principal components of randomly selected patches from the probability maps at each stage, and (c) the batch-based mean DSC convergence.
As shown in Appendix Fig. S4, the probability maps of each stage for the apex, middle, and base of the prostate generated by DS‐V‐Net can identify the prostate boundary with deep supervision. In contrast, the probability maps generated by the traditional V‐Net cannot do the same, especially for the high‐resolution stage.
Figure 4 shows 3D scatter plots of the first three principal components of patch samples in the probability map of each stage. We randomly selected 4000 samples from the prostate region as well as 4000 samples from a non-prostate region around the prostate boundary, as shown in Fig. 4(a2). In the scatter plots of a V-Net at the final, high-, modest-, and low-resolution stages, shown in Figs. 4(b1)–4(e1), a large overlap can be seen between the samples from the prostate and non-prostate regions; it is difficult to directly separate the prostate and non-prostate samples using a traditional V-Net. In contrast, in Figs. 4(b2)–4(e2), the prostate and non-prostate samples can be approximately separated by a plane, demonstrating the benefit of using a deep supervision strategy.
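A minimal sketch of the projection behind these scatter plots, assuming scikit-learn is available; the sampling and flattening of the probability-map patches are simplified.

```python
import numpy as np
from sklearn.decomposition import PCA

def first_three_components(prostate_patches, background_patches):
    """Each input: (n_samples, patch_voxels) array of flattened patches drawn
    from a stage's probability map."""
    X = np.vstack([prostate_patches, background_patches])
    scores = PCA(n_components=3).fit_transform(X)
    labels = np.r_[np.ones(len(prostate_patches)),
                   np.zeros(len(background_patches))]
    return scores, labels   # scatter-plot `scores` colored by `labels`
```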
Figure 4.

An illustrative example of the benefit of our deeply supervised V-Net compared with a V-Net without deep supervision. (a1) Transrectal ultrasound (TRUS) image in the transverse plane. (a2) Sample patches' central positions drawn from test TRUS images, where the samples belonging to the prostate are highlighted by green circles and the samples belonging to the non-prostate region are highlighted by red asterisks. (b1–e1) Scatter plots of the first three principal components of the corresponding patch samples in the probability maps at the final, high-resolution, modest-resolution, and low-resolution stages using a V-Net, respectively. (b2–e2) Scatter plots of the first three principal components of the corresponding patch samples in the probability maps at the final, high-resolution, modest-resolution, and low-resolution stages using our deeply supervised V-Net (DS-V-Net), respectively. The position of the viewer in (b1–e1) and (b2–e2) is azimuth = 20° and elevation = 20°. [Color figure can be viewed at wileyonlinelibrary.com]
The batch-based mean DSC was used as a metric to compare the convergence of a V-Net and our DS-V-Net (shown in Appendix Fig. S5). The epoch numbers at which our DS-V-Net reaches its best DSC are much smaller than the corresponding numbers at which the V-Net reaches its best DSC, for both the training and validation folds. This demonstrates that the deep supervision strategy can accelerate the training convergence of our deep-learning-based segmentation. We did not use the loss as a metric, because the loss function of a V-Net is computed only at the final stage, whereas the loss function of our DS-V-Net is calculated as the summation of all four stages' losses. Table 2 quantitatively compares our DS-V-Net vs the V-Net based on leave-one-out cross-validation of the 44 patients' data. As shown in Table 2, our DS-V-Net significantly improves the DSC, precision, HD, and RMSD over those of the V-Net.
Table 2.
Quantitative comparison of the proposed deep supervised V‐Net vs a V‐Net without deep supervision
| Method | DSC | Precision | Recall | HD (mm) | MSD (mm) | RMSD (mm) |
|---|---|---|---|---|---|---|
| V‐Net | 0.905 ± 0.030 | 0.881 ± 0.060 | 0.935 ± 0.043 | 4.643 ± 1.926 | 0.657 ± 0.270 | 0.977 ± 0.377 |
| DS‐V‐Net | 0.912 ± 0.026 | 0.897 ± 0.056 | 0.930 ± 0.043 | 3.996 ± 1.560 | 0.607 ± 0.228 | 0.907 ± 0.377 |
| P‐value | <0.001 | <0.001 | 0.561 | 0.001 | 0.007 | 0.021 |
DSC, Dice similarity coefficient; DS‐V‐Net, deeply supervised V‐Net; HD, Hausdorff distance; MSD, mean surface distance; RMSD, residual mean surface distance.
3.C. Comparison of the loss function
In order to compare the influence of different loss functions, we compared the proposed DS-V-Net trained with the Dice, the BCE, and the hybrid loss functions. Appendix Fig. S6 shows a segmentation comparison from our DS-V-Net based on the three different loss functions in the transverse, sagittal, and coronal planes. The segmented prostate using the hybrid loss function most closely resembles the manual segmentation. Moreover, since the manual segmentation is delineated on transverse slices, it can exhibit jagged and sharp regions when viewed in the sagittal and coronal planes [Appendix Figs. S6(a4) and S6(a6)]. The segmentation results of our DS-V-Net based on the hybrid loss function [Appendix Figs. S6(b6) and S6(b9)] also smooth the edge of the prostate contour, whereas the DS-V-Net based on a BCE or Dice loss function alone cannot, especially with a BCE loss function alone. Appendix Table S2 quantitatively compares the performance of the proposed DS-V-Net with the three different loss functions, showing that our DS-V-Net with the hybrid loss function performs slightly better than the DS-V-Net with the BCE or DSC loss function. Thus, we adopted the hybrid loss as the loss function for our method.
3.D. Contribution of multidirectional‐based contour refinement
The main challenge in TRUS prostate segmentation is to accurately delineate the prostate at the apex and base. We compared segmentation results from our proposed DS-V-Net with and without contour refinement to evaluate the influence of the multidirectional-based refinement, especially at the prostate apex and base. Figure 5 shows our segmentation results with and without the multidirectional-based refinement. The segmentation generated from only the transverse plane has discontinuous boundaries around the apex and base regions [Figs. 5(c1)–5(c2)], whereas the segmentations generated from the sagittal and coronal planes keep the continuity of the contour boundary [Figs. 5(d1)–5(d2), 5(e1)–5(e2)]. In addition, as shown in Figs. 5(a3)–5(f3) and 5(a4)–5(f4), the boundaries of the manual contour are not well-matched to the original TRUS images, and thus the segmentation results based on any one of the three planes introduce ambiguous boundaries. Even in these situations, the proposed DS-V-Net with contour refinement (DS-CR-V-Net) can still maintain a reasonable and smooth boundary. Table 3 quantitatively compares segmentation results with and without contour refinement based on leave-one-out cross-validation of the 44 patients' data.
Figure 5.

Comparison of segmented prostates with and without multidirectional‐based refinement. (a1–a2) Transrectal ultrasound (TRUS) images shown in sagittal plane, (b1–f1) and (b2–f2) corresponding manually contoured prostates, the segmented prostate from transverse, coronal, and sagittal planes, and the segmented prostate with the multidirectional‐based refinement, respectively. (a3–a4) TRUS images shown in coronal plane, (b3–f3) and (b4–f4) Corresponding manual contour, the segmented prostate from only transverse, coronal, and sagittal planes, and the segmented prostate with the refinement, respectively. The display window size of (a1–a4) is [0, 200].
Table 3.
Quantitative metrics comparison with and without contour refinement
| Method | DSC | Precision | Recall | HD (mm) | MSD (mm) | RMSD (mm) |
|---|---|---|---|---|---|---|
| Transverse | 0.912 ± 0.026 | 0.897 ± 0.056 | 0.930 ± 0.043 | 3.996 ± 1.560 | 0.607 ± 0.228 | 0.907 ± 0.377 |
| Sagittal | 0.914 ± 0.025 | 0.897 ± 0.057 | 0.931 ± 0.043 | 3.981 ± 1.573 | 0.607 ± 0.228 | 0.907 ± 0.378 |
| Coronal | 0.916 ± 0.028 | 0.899 ± 0.059 | 0.930 ± 0.044 | 3.999 ± 1.546 | 0.606 ± 0.228 | 0.907 ± 0.376 |
| Our DS‐CR‐V‐Net | 0.919 ± 0.028 | 0.906 ± 0.055 | 0.938 ± 0.043 | 3.938 ± 1.550 | 0.599 ± 0.225 | 0.900 ± 0.377 |
| P‐value (Our vs Trans) | <0.001 | <0.001 | <0.001 | <0.001 | 0.002 | <0.001 |
| P‐value (Our vs Sag) | 0.004 | <0.001 | <0.001 | 0.009 | 0.002 | <0.001 |
| P‐value (Our vs Cor) | 0.064 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 |
DSC, Dice similarity coefficient; DS‐CR‐V‐Net, deeply supervised contour refinement V‐Net; HD, Hausdorff distance; MSD, mean surface distance; RMSD, residual mean surface distance.
3.E. Comparison with state‐of‐art methods
In order to evaluate and verify the performance of our proposed method, we compared it against state-of-the-art prostate segmentation algorithms based on U-Net,36 V-Net,26 deeply supervised U-Net (DS-U-Net),33 and CNNs.32 Figure 6 compares the segmentation results between these methods and our proposed DS-CR-V-Net. In Figs. 6(a1)–6(g1), the five algorithms segment the prostate similarly in a high-contrast TRUS image. However, in a low-contrast TRUS image, the segmented prostate from our DS-CR-V-Net represents the real prostate much more closely than the comparison methods, as shown in Figs. 6(a2)–6(g2). Furthermore, as shown in Figs. 6(a3)–6(g3) and 6(a4)–6(g4), none of the comparison methods could accurately identify the prostate boundary at the low-contrast apex and base in the TRUS images, while our proposed DS-CR-V-Net with its contour refinement model could, demonstrating that our method outperforms these four state-of-the-art deep learning segmentation algorithms.
Figure 6.

Segmentation comparison between the proposed deeply supervised contour refinement V‐Net (DS‐CR‐V‐Net) and comparing methods. (a1–g1) High‐contrast transrectal ultrasound (TRUS) image of the mid‐prostate in the transverse plane, the corresponding manual contour, segmented prostates using U‐Net, V‐Net, convolutional neural networks (CNNs), deeply supervised U‐Net (DS‐U‐Net), and our DS‐CR‐V‐Net algorithms, respectively. (a2–g2) Low‐contrast TRUS image in mid‐prostate on the transverse plane, the corresponding manual contour, segmented prostates using U‐Net, V‐Net, CNNs, DS‐U‐Net, and our DS‐CR‐V‐Net algorithms, respectively. (a3–g3) TRUS image at the prostate apex, the corresponding manual contour, segmented prostates using U‐Net, V‐Net, CNNs, DS‐U‐Net, and our DS‐CR‐V‐Net algorithms, respectively. (a4–g4) TRUS image at the prostate base, the corresponding manual contour, segmented prostate generated by U‐Net, V‐Net, CNNs, DS‐U‐Net, and our DS‐CR‐V‐Net algorithms, respectively. The display windows for (a1–a4) are [0, 200].
Table 4 shows a quantitative metrics comparison of the prior methods and our DS-CR-V-Net based on leave-one-out cross-validation. All comparison algorithms were run with their best parameter settings. As shown in Table 5, there is a significant improvement on most metrics of our proposed DS-CR-V-Net method over the V-Net and DS-U-Net methods. Table 5 also shows the corrected P-values calculated through the Holm-Bonferroni method40 from the P-values obtained by comparing our method with the four other methods. Any corrected P-value less than α (0.05) is significant. A binary vector h has the same dimensionality as the corrected P-values; if the ith element of h is 1, then the ith P-value is significant. As shown in Table 5, all P-values are significant in DSC.
Table 4.
Quantitative metrics comparison of our proposed algorithm vs state‐of‐the‐art methods
| Method | DSC | Precision | Recall | HD (mm) | MSD (mm) | RMSD (mm) |
|---|---|---|---|---|---|---|
| U‐Net | 0.906 ± 0.028 | 0.905 ± 0.062 | 0.912 ± 0.049 | 4.437 ± 2.010 | 0.619 ± 0.220 | 0.915 ± 0.346 |
| V‐Net | 0.905 ± 0.030 | 0.881 ± 0.060 | 0.935 ± 0.035 | 4.643 ± 1.926 | 0.657 ± 0.270 | 0.977 ± 0.410 |
| CNNs | 0.901 ± 0.032 | 0.891 ± 0.069 | 0.910 ± 0.072 | 4.391 ± 1.788 | 0.711 ± 0.315 | 1.043 ± 0.479 |
| DS‐U‐Net | 0.911 ± 0.03 | 0.901 ± 0.05 | 0.926 ± 0.05 | 3.963 ± 1.51 | 0.599 ± 0.21 | 0.892 ± 0.33 |
| DS‐CR‐V‐Net | 0.919 ± 0.028 | 0.906 ± 0.055 | 0.938 ± 0.043 | 3.938 ± 1.550 | 0.599 ± 0.225 | 0.900 ± 0.377 |
CNNs, convolutional neural networks; DSC, Dice similarity coefficient; DS‐CR‐V‐Net, deeply supervised contour refinement V‐Net; HD, Hausdorff distance; MSD, mean surface distance; RMSD, residual mean surface distance.
Table 5.
P‐values, corrected P‐values, and h obtained by comparing our proposed algorithm with state‐of‐the‐art methods in whole prostate region
| | DSC | Precision | Recall | HD | MSD | RMSD |
|---|---|---|---|---|---|---|
| P‐value | ||||||
| U‐Net | <0.001 | 0.854 | <0.001 | 0.063 | 0.423 | 0.689 |
| V‐Net | <0.001 | <0.001 | 0.561 | 0.002 | 0.007 | 0.021 |
| CNNs | <0.001 | 0.171 | 0.004 | 0.854 | 0.991 | 0.771 |
| DS‐U‐Net | <0.001 | 0.009 | 0.005 | 0.013 | <0.001 | 0.001 |
| Corrected P‐value(h) | ||||||
| U‐Net | <0.001 ( 1 ) | 0.855 ( 0 ) | <0.001 ( 1 ) | 0.126 ( 0 ) | 0.846 ( 0 ) | 1.378 ( 0 ) |
| V‐Net | <0.001 ( 1 ) | <0.001 ( 1 ) | 0.561 ( 0 ) | 0.008 ( 1 ) | 0.022 ( 1 ) | 0.063 ( 0 ) |
| CNNs | <0.001 ( 1 ) | 0.341 ( 0 ) | 0.011 ( 1 ) | 0.854 ( 0 ) | 0.991 ( 0 ) | 1.378 ( 0 ) |
| DS‐U‐Net | <0.001 ( 1 ) | 0.028 ( 1 ) | 0.011 ( 1 ) | 0.039 ( 1 ) | 0.002 ( 1 ) | 0.005 ( 1 ) |
CNNs, convolutional neural networks; DSC, Dice similarity coefficient; DS-U-Net, deeply supervised U-Net; HD, Hausdorff distance; MSD, mean surface distance; RMSD, residual mean surface distance. Italics show the P-values from the paired two-tailed t-test and the corrected P-values from the Holm-Bonferroni method; the bold italic value in parentheses is the associated hypothesis h for the corresponding corrected P-value, where 1 denotes significant and 0 denotes not significant.
We also conducted a regional analysis to further demonstrate the improvement compared with the other state-of-the-art methods, namely U-Net,36 V-Net,26 CNNs,32 and DS-U-Net.33 We varied the number of cropped base (superior) and apex (inferior) slices from 1 to 10 and measured the segmented contour accuracy within these regions. DSC and HD were used to compare our proposed method with the comparator methods, as shown in Appendix Tables S3 and S4. With respect to the mean DSC of the base and apex regions, our method outperforms the other methods.
Table 6 shows the P-values of the comparison between our method and the other methods in the base and apex regions, where the number of cropped slices was four. Table 6 also shows the corrected P-values calculated through the Holm-Bonferroni method40 from the P-values obtained by comparing our method with the four other methods. As shown in Table 6, all P-values are significant in DSC of both the base and apex, and in HD of the base region.
Table 6.
P‐values, corrected P‐values, and h obtained by comparing our proposed algorithm with state‐of‐the‐art methods in low‐contrast prostate region
| | DSC base | DSC apex | HD base | HD apex |
|---|---|---|---|---|
| P‐value | ||||
| U‐Net | 0.014 | 0.001 | <0.001 | 0.025 |
| V‐Net | 0.009 | 0.042 | 0.010 | 0.031 |
| DS‐U‐Net | 0.018 | 0.002 | 0.003 | 0.031 |
| CNNs | 0.013 | 0.002 | <0.001 | 0.149 |
| Corrected P‐value (h) | ||||
| U‐Net | 0.040 ( 1 ) | 0.004 ( 1 ) | <0.001 ( 1 ) | 0.099 ( 0 ) |
| V‐Net | 0.037 ( 1 ) | 0.042 ( 1 ) | 0.010 ( 1 ) | 0.092 ( 0 ) |
| DS‐U‐Net | 0.029 ( 1 ) | 0.005 ( 1 ) | 0.006 ( 1 ) | 0.099 ( 0 ) |
| CNNs | 0.040 ( 1 ) | 0.005 ( 1 ) | <0.001 ( 1 ) | 0.149 ( 0 ) |
CNNs, convolutional neural networks; DSC, Dice similarity coefficient; DS-U-Net, deeply supervised U-Net; HD, Hausdorff distance. Italics show the P-values from the paired two-tailed t-test and the corrected P-values from the Holm-Bonferroni method; the bold italic value in parentheses is the associated hypothesis h for the corresponding corrected P-value, where 1 denotes significant and 0 denotes not significant.
3.F. Inter‐ and intraobserver reliability
In order to test the reliability of the manual contours, we conducted an inter- and intraobserver reliability study with nine patients. The interobserver study was performed by three physicians contouring the same patients separately. The intraobserver study was performed by one physician contouring the same patients twice, separated by an interval of 3 weeks. For interobserver reliability, the volume percentage differences among observer 1 (O1-1), observer 2 (O2), observer 3 (O3), and our segmented contour were measured. For intraobserver reliability, the volume percentage differences between the two contours of observer 1 (O1-1, O1-2) and our segmented contour were measured. The whole gland, base, and apex regions were evaluated. The metrics of these comparisons are shown in Appendix Fig. S7.
Comparing the manual segmentations by the three observers and our segmentation, the volume percentage difference of ours vs the three observers' manual contours (whole region 3.12% ± 3.82%, base 16.95% ± 19.67%, apex 19.51% ± 19.15%) is smaller than that among the three observers (whole region 4.46% ± 6.40%, base 27.18% ± 27.72%, apex 31.81% ± 25.31%), which indicates that our method partially reduces interobserver variation. Comparing the two manual segmentations by the same observer and our segmentation, the volume percentage difference of ours vs the intraobserver manual contours (whole region 3.46% ± 4.01%, base 11.75% ± 15.76%, apex 15.82% ± 15.35%) is smaller than that between the two intraobserver contours (whole region 5.14% ± 4.59%, base 20.83% ± 23.03%, apex 27.84% ± 23.26%), which indicates that our method partially reduces intraobserver variation. For the whole prostate region, the inter- and intraobserver reliability study showed consistency in the manual segmentations.
4. Discussion
We proposed a new prostate segmentation method that incorporates a deep supervision strategy and a multidirectional-based contour refinement into a V-Net architecture to automatically segment the prostate on TRUS images. Our proposed method was evaluated against state-of-the-art deep learning networks. As shown in Fig. 6 and Table 4, our proposed method outperformed these methods both qualitatively and quantitatively. Ghavami et al. reported their CNN-based TRUS prostate segmentation results with 10-fold patient-level cross-validation.32 The DSC and MSD of 4055 2D TRUS images were 0.91 ± 0.12 and 1.23 ± 1.46 mm, respectively, and the DSC of 110 3D TRUS images was 0.91 ± 0.04. Our final results show that the DSC and MSD of 44 3D TRUS images were 0.92 ± 0.03 and 0.60 ± 0.23 mm, respectively. Considering the different TRUS databases used in our work and Ghavami's paper, we applied Ghavami's CNN to our TRUS database with its best-performing parameter settings, as shown in Fig. 6 and Table 4. Our proposed method outperformed this method on the same TRUS database. In addition, the superior performance of our proposed method at the apex and base regions was further demonstrated by our regional analysis study, as shown in Table 6.
There are several limitations to our current method. First, our ground truth of the prostate volume is from manual contours drawn by physicians. These manual contours may contain systematic and random errors. Our proposed method may mitigate random errors; however, systematic errors (e.g., a physician's contouring style) will affect our final segmented results, although this would be expected to be a limitation of all learning-based methods. Second, the computational complexity is higher than that of the state-of-the-art algorithms due to three additional stages of deep supervision and the corresponding upsampling convolutional kernels. In our leave-one-out experiments, the training times for a U-Net and a V-Net were 1.35 and 1.70 h, whereas our proposed algorithm required 1.85 h. However, after training, our proposed algorithm has segmentation times similar to these two algorithms; a single prostate segmentation can be completed in 1–2 s. All the algorithms were implemented in Tensorflow with the Adam optimizer and were trained and tested on an NVIDIA TITAN XP GPU with 12 GB of memory. Third, an adaptive and nonlinear contour refinement model may be required when the prostate surfaces from the three different directions are not well matched. Improving the contour refinement, for example by incorporating a conditional random field, is a direction for our future segmentation work. We also plan to test the robustness and reliability of our TRUS prostate segmentation using more patients' data before we apply our method to our ultrasound-guided prostate-cancer radiotherapy procedure. In addition, the size of the patient population is relatively small; evaluating the clinical utility of the proposed method on more patients' data will be our future work. With limited patient datasets available as training samples, data augmentation is essential to train the network with the desired invariance and robustness properties. By artificially increasing the number of training examples, it helps reduce overfitting and improves generalization. In particular, for prostate segmentation, the prostates' size, shape, and position vary significantly among individuals, and data augmentation can help introduce more such diversity into the training data. In our implementation, data augmentation was applied during training by flipping the images left and right in the transverse plane, rotating images by 90°, 180°, and 270° in the transverse plane, and randomly elastically deforming the training TRUS images and their corresponding contour binary masks. For deforming images, as recommended in a previous deep learning-based study,36 we generated smooth deformations using random displacement vectors on a coarse 3 × 3 grid. The displacements were sampled from a Gaussian distribution with a standard deviation of 10 pixels, and per-pixel displacements were then computed using bicubic interpolation. In our experience, the random elastic deformations of the training data are the key concept for training a segmentation network with a limited number of annotated images.
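A hedged sketch of the elastic-deformation augmentation described above, assuming SciPy: random displacements on a coarse 3 × 3 grid with a 10-pixel standard deviation, upsampled with bicubic (order-3) interpolation and applied identically to the image and its mask.

```python
import numpy as np
from scipy import ndimage

def elastic_deform_slice(image, mask, sigma=10.0, grid=3, rng=np.random):
    """Apply the same smooth random warp to a 2D image slice and its mask."""
    h, w = image.shape
    # Coarse random displacement field, then bicubic upsampling to full size.
    coarse = rng.normal(0.0, sigma, size=(2, grid, grid))
    dy = ndimage.zoom(coarse[0], (h / grid, w / grid), order=3)
    dx = ndimage.zoom(coarse[1], (h / grid, w / grid), order=3)
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.array([yy + dy, xx + dx])
    warped_img = ndimage.map_coordinates(image, coords, order=3, mode="reflect")
    warped_mask = ndimage.map_coordinates(mask, coords, order=0, mode="nearest")
    return warped_img, warped_mask
```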
Our proposed method does not require any manual intervention in the segmentation stage. We set the input patch size to 512 × 512 × 4, which is a group of sequential slices within a TRUS image volume. The patches were automatically extracted by sliding over the volume with an overlap of 0 × 0 × 2. We used mean patch fusion to reconstruct the probability map and then applied a threshold to obtain the binary mask.
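A minimal sketch (not the authors' code) of this sliding-window inference with mean patch fusion; `predict_patch` stands in for the trained network.

```python
import numpy as np

def segment_volume(volume, predict_patch, depth=4, step=2, threshold=0.5):
    """predict_patch: callable mapping an (H, W, depth) patch to a probability map.
    step = depth - overlap, e.g., 2 for a 0 x 0 x 2 overlap with 4-slice patches."""
    h, w, d = volume.shape
    prob_sum = np.zeros_like(volume, dtype=np.float32)
    counts = np.zeros_like(volume, dtype=np.float32)
    for z in range(0, d - depth + 1, step):
        prob_sum[:, :, z:z + depth] += predict_patch(volume[:, :, z:z + depth])
        counts[:, :, z:z + depth] += 1.0
    prob = prob_sum / np.maximum(counts, 1.0)   # mean patch fusion
    return prob > threshold                     # binary prostate mask
```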
5. Conclusions
We developed a novel 3D deeply supervised deep learning-based approach to automatically segment the prostate on TRUS. A 3D deep supervision strategy was utilized to address the limitations of the training dataset size and the low-contrast challenges in TRUS prostate segmentation. A multidirectional-based contour refinement was used as a post-processing step to refine the segmentation results. Experimental validation was performed to demonstrate its clinical feasibility and segmentation accuracy. For the whole prostate, the improvement of the proposed method is mostly in terms of the Dice coefficient. In low-contrast regions, the comparison between our method and the other methods is significant in DSC of both the base and apex regions, and in HD of the base region. This segmentation technique could be a useful tool for image-guided interventions in prostate cancer diagnosis and treatment. The role of multidirectional-based refinement in the post-processing of TRUS contours is expected to continue to grow as increasingly complex technological challenges and associated robustness concerns need to be addressed.
Conflict of Interest
The authors declare no conflicts of interest.
Supporting information
Fig S1: DSC as a function of weighting parameter ρ and balancing parameter μ in our proposed hybrid loss function.
Fig S2: Convergence of proposed deep supervised V‐Net with multi‐derivative image and single image input.
Fig S3: Comparison of segmented prostate contours from the proposed deep supervised V‐Net with multi‐derivative image to single image input.
Fig S4: Comparison of the prostate probability map at each stage.
Fig S5: Batch‐based mean DSC convergence of the V‐Net and our DS‐V‐Net.
Fig S6: Segmentation comparison from our DS‐V‐Net based on three different loss functions.
Fig S7: Inter‐ and intra‐observer reliability of the prostate contours.
Table S1: Default parameter setting.
Table S2: Quantitative metrics comparison of the proposed DSN‐V‐Net with three different loss functions.
Table S3: DSC of base and apex regions of our proposed algorithm versus state‐of‐the‐art methods.
Table S4: HD of base and apex regions of our proposed algorithm versus state‐of‐the‐art methods.
Acknowledgments
This research is supported in part by the National Cancer Institute of the National Institutes of Health under Award Number R01CA215718 (XY), the Department of Defense (DoD) Prostate Cancer Research Program (PCRP) Award W81XWH‐13‐1‐0269 (XY), DoD W81XWH‐17‐1‐0438 (TL), and W81XWH‐17‐1‐0439 (AJ), and Dunwoody Golf Club Prostate Cancer Research Award, a philanthropic award provided by the Winship Cancer Institute of Emory University. We are also grateful for the GPU support from NVIDIA Corporation.
References
- 1. Martin S, Daanen V, Troccaz J. Atlas‐based prostate segmentation using an hybrid registration. Int J Comput Assist Radiol Surg. 2008;3:485–492.
- 2. Sarkar S, Das S. A review of imaging methods for prostate cancer detection. Biomed Eng Comput Biol. 2016;7:1–15.
- 3. Yu YY, Chen YM, Chiu B. Fully automatic prostate segmentation from transrectal ultrasound images based on radial bas‐relief initialization and slice‐based propagation. Comput Biol Med. 2016;74:74–90.
- 4. Mahdavi SS, Chng N, Spadinger I, Morris WJ, Salcudean SE. Semi‐automatic segmentation for prostate interventions. Med Image Anal. 2011;15:226–237.
- 5. Ghose S, Oliver A, Martí R, et al. A survey of prostate segmentation methodologies in ultrasound, magnetic resonance and computed tomography images. Comput Meth Prog Bio. 2012;108:262–287.
- 6. Yan P, Xu S, Turkbey B, Kruecker J. Discrete deformable model guided by partial active shape model for TRUS image segmentation. IEEE Trans Biomed Eng. 2010;57:1158–1166.
- 7. Betrouni N, Vermandel M, Pasquier D, Maouche S, Rousseau J. Segmentation of abdominal ultrasound images of the prostate using a priori information and an adapted noise filter. Comput Med Imag Grap. 2005;29:43–51.
- 8. Kachouie NN, Fieguth P, Rahnamayan S. An elliptical level set method for automatic TRUS prostate image segmentation. Proc of IEEE International Symposium on Signal Processing and Information Technology; 2006:191–196. https://doi.org/10.1109/ISSPIT.2006.270795
- 9. Nouranian S, Mahdavi SS, Spadinger I, Morris WJ, Salcudean SE, Abolmaesumi P. A multi‐atlas‐based segmentation framework for prostate brachytherapy. IEEE Trans Med Imaging. 2015;34:950–961.
- 10. Yang X, Wu N, Cheng G, et al. Automated segmentation of the parotid gland based on atlas registration and machine learning: a longitudinal MRI study in head‐and‐neck radiation therapy. Int J Radiat Oncol Biol Phys. 2014;90:1225–1233.
- 11. Zouqi M, Samarabandu J. Prostate segmentation from 2‐D ultrasound images using graph cuts and domain knowledge. Proc of Canadian Conference on Computer and Robot Vision; 2008:359–362.
- 12. Egger J. PCG‐Cut: graph driven segmentation of the prostate central gland. PLoS ONE. 2013;8:e76645.
- 13. Gong LX, Ng L, Pathak SD, et al. Prostate ultrasound image segmentation using level set‐based region flow with shape guidance. Proc SPIE. 2005;5747:1648–1657.
- 14. Yu YY, Cheng JY, Li JZ, Chen WF, Chiu B. Automatic prostate segmentation from transrectal ultrasound images. Proc of IEEE Biomedical Circuits and Systems Conference; 2014:117–120. https://doi.org/10.1109/BioCAS.2014.6981659
- 15. Richard WD, Keen CG. Automated texture‐based segmentation of ultrasound images of the prostate. Comput Med Imag Grap. 1996;20:131–140.
- 16. Tutar IB, Pathak SD, Gong L, Cho PS, Wallner K, Kim Y. Semiautomatic 3‐D prostate segmentation from TRUS images using spherical harmonics. IEEE Trans Med Imaging. 2007;25:1645–1654.
- 17. Qiu W, Yuan J, Ukwatta E, Fenster A. Rotationally resliced 3D prostate TRUS segmentation using convex optimization with shape priors. Med Phys. 2015;42:877–891.
- 18. Nouranian S, Ramezani M, Spadinger I, Morris WJ, Salcudean SE, Abolmaesumi P. Learning‐based multi‐label segmentation of transrectal ultrasound images for prostate brachytherapy. IEEE Trans Med Imaging. 2016;35:921–932.
- 19. Anas E, Mousavi P, Abolmaesumi P. A deep learning approach for real time prostate segmentation in freehand ultrasound guided biopsy. Med Image Anal. 2018;48:107–116.
- 20. Yang XF, Fei BW. 3D prostate segmentation of ultrasound images combining longitudinal image registration and machine learning. Proc SPIE. 2012;8316:83162O.
- 21. Yang XF, Schuster D, Master V, Nieh P, Fenster A, Fei BW. Automatic 3D segmentation of ultrasound images using atlas registration and statistical texture prior. Proc SPIE. 2011;7964:796432.
- 22. Yang X, Rossi P, Jani A, Mao H, Curran W, Liu T. 3D transrectal ultrasound (TRUS) prostate segmentation based on optimal feature learning framework. Proc SPIE. 2016;9784:97842F.
- 23. Akbari H, Yang X, Halig LV, Fei B. 3D segmentation of prostate ultrasound images using wavelet transform. Proc SPIE. 2011;7962:79622K.
- 24. Ghose S, Mitra J, Oliver A, et al. A supervised learning framework for automatic prostate segmentation in trans rectal ultrasound images. Advanced Concepts for Intelligent Vision Systems (ACIVS 2012); 2012;7517:190–200.
- 25. Khellaf F, Leclerc S, Voorneveld JD, Bandaru RS, Bosch JG, Bernard O. Left ventricle segmentation in 3D ultrasound by combining structured random forests with active shape models. Proc SPIE. 2018;10574:105740J.
- 26. Milletari F, Navab N, Ahmadi SA. V‐Net: fully convolutional neural networks for volumetric medical image segmentation. Proc of International Conference on 3D Vision; 2016:565–571. https://doi.org/10.1109/3dv.2016.79
- 27. Dong X, Lei Y, Wang T, et al. Automatic multiorgan segmentation in thorax CT images using U‐net‐GAN. Med Phys. 2019;46:2157–2168.
- 28. Wang B, Lei Y, Tian S, et al. Deeply supervised 3D fully convolutional networks with group dilated convolution for automatic MRI prostate segmentation. Med Phys. 2019;46:1707–1718.
- 29. Wang T, Lei Y, Tang H, et al. A learning‐based automatic segmentation and quantification method on left ventricle in gated myocardial perfusion SPECT imaging: a feasibility study. J Nucl Cardiol. 2019. https://doi.org/10.1007/s12350-019-01594-2
- 30. Wang T, Lei Y, Tian S, et al. Learning‐based automatic segmentation of arteriovenous malformations on contrast CT images in brain stereotactic radiosurgery. Med Phys. 2019. https://doi.org/10.1002/mp.13560
- 31. Yang X, Yu L, Wu L, et al. Recurrent neural networks for automatic prostate segmentation in ultrasound images. Proc of AAAI; 2017.
- 32. Ghavami N, Hu YP, Bonmati E, et al. Automatic slice segmentation of intraoperative transrectal ultrasound images using convolutional neural networks. Proc SPIE. 2018;10576:1057603.
- 33. Zhu QK, Du B, Turkbey B, Choyke PL, Yan PK. Deeply‐supervised CNN for prostate segmentation. Proc of International Joint Conference on Neural Networks; 2017:178–184. https://doi.org/10.1109/IJCNN.2017.7965852
- 34. Zeng Q, Samei G, Karimi D, et al. Prostate segmentation in transrectal ultrasound using magnetic resonance imaging priors. Int J Comput Assist Radiol Surg. 2018;13:749–757.
- 35. Ngiam J, Khosla A, Kim M, Nam J, Lee H, Ng AY. Multimodal deep learning. Proc of International Conference on Machine Learning; 2011:689–696.
- 36. Ronneberger O, Fischer P, Brox T. U‐Net: convolutional networks for biomedical image segmentation. Proc MICCAI. 2015;9351:234–241.
- 37. Postema A, Mischi M, de la Rosette J, Wijkstra H. Multiparametric ultrasound in the detection of prostate cancer: a systematic review. World J Urol. 2015;33:1651–1659.
- 38. Dou Q, Yu LQ, Chen H, et al. 3D deeply supervised network for automated segmentation of volumetric medical images. Med Image Anal. 2017;41:40–54.
- 39. Ding MY, Chiu B, Gyacskov I, et al. Fast prostate segmentation in 3D TRUS images based on continuity constraint using an autoregressive model. Med Phys. 2007;34:4109–4125.
- 40. Holm S. A simple sequentially rejective multiple test procedure. Scand J Stat. 1979;6:65–70.