Journal of Digital Imaging. 2020 Apr 22;33(4):946–957. doi: 10.1007/s10278-020-00339-9

SUD-GAN: Deep Convolution Generative Adversarial Network Combined with Short Connection and Dense Block for Retinal Vessel Segmentation

Tiejun Yang 1,2, Tingting Wu 3,, Lei Li 3, Chunhua Zhu 3
PMCID: PMC7522149  PMID: 32323089

Abstract

Since the morphology of retinal blood vessels plays a key role in the diagnosis of ophthalmological disease, retinal vessel segmentation is an indispensable step in the screening and diagnosis of retinal diseases with fundus images. In this paper, a deep convolutional generative adversarial network combined with short connections and dense blocks, named SUD-GAN, is proposed to separate blood vessels from fundus images. The generator adopts a U-shaped encoder-decoder structure and adds short connection blocks between convolution layers to prevent the gradient dispersion caused by deep convolutional networks. The discriminator is composed entirely of convolution blocks, and a dense connection structure is added to the middle part of the convolutional network to strengthen feature propagation and enhance the network's discriminative ability. The proposed method is evaluated on two publicly available databases, DRIVE and STARE. The results show that the proposed method outperforms state-of-the-art methods in sensitivity and specificity, reaching 0.8340 and 0.9820 on DRIVE and 0.8334 and 0.9897 on STARE, and can detect more tiny vessels and locate vessel edges more accurately.

Electronic supplementary material

The online version of this article (10.1007/s10278-020-00339-9) contains supplementary material, which is available to authorized users.

Keywords: Retinal vessel segmentation, Generative adversarial network, Short connection block, Dense block

Introduction

The eye is one of the most important sensory organs of the human body, yet many people worldwide suffer from blindness. Among the eye diseases that cause blindness, fundus diseases such as senile macular degeneration, diabetic retinopathy, and hypertensive retinopathy are the main causes. The study and analysis of retinal vessel geometric characteristics such as vessel diameter, branch angles, and branch lengths have become the basis of medical applications related to the early diagnosis and effective monitoring of retinal pathology [1].

In practical clinical diagnosis, ophthalmologists mainly segment retinal vascular images manually based on their professional knowledge and personal experience. However, the number of medical images has grown rapidly in recent years while the number of doctors has not kept pace, which makes manual segmentation time-consuming and laborious. Moreover, for novice doctors with insufficient clinical experience, segmentation accuracy is difficult to guarantee, and manual segmentation is not conducive to large-scale disease screening and diagnosis. Therefore, automatic retinal blood vessel segmentation has become a research hotspot in medical image processing in recent years.

At present, a large number of automatic retinal vessel segmentation methods have emerged. However, due to the complex structure of retinal blood vessels and the low contrast of capillary regions, general segmentation methods find it difficult to maintain the connectivity of blood vessels, so retinal vessel segmentation remains a considerable challenge. Segmentation methods can usually be divided into supervised and unsupervised methods according to whether manually annotated gold-standard labels are required.

Unsupervised Methods for Vessel Segmentation

Unsupervised segmentation methods design feature vectors manually by observing given samples. For retinal vessel segmentation, unsupervised methods can exploit the natural patterns of vessels to determine whether pixels belong to blood vessels. Common unsupervised methods are based on mathematical morphology, matched filtering, and vessel tracking. Azzopardi et al. [2] proposed the B-COSFIRE method, which achieves orientation selectivity by computing the weighted geometric mean of the outputs of a pool of difference-of-Gaussians filters. Zhao et al. [3] proposed a new infinite active contour model that uses hybrid region information of the image for automated detection of blood vessel structures. Jiang et al. [4] proposed a morphology-based global model used to extract the retinal venule structure and centerline. Liang [5] proposed an improved matched filtering algorithm based on line template filtering and B-COSFIRE, which fuses the segmentation results of the matched filter, the line template filter, and B-COSFIRE. Unsupervised methods are suitable for exploiting large amounts of unlabeled data, but they are very complicated. Most of these methods perform well on normal retinal images, while on pathological retinal images their segmentation accuracy is low.

Supervised Methods for Vessel Segmentation

Supervised methods use extracted feature vectors, or labeled training data (ground truth), to train a classifier that automatically classifies retinal vascular and non-vascular pixels in retinal images. A labeled dataset is very important in supervised methods. Because the prior knowledge for vessel segmentation is obtained directly from images manually segmented by ophthalmologists, the performance of supervised methods is usually better than that of unsupervised methods. Current supervised methods are mainly based on Bayesian classifiers, random forests (RF), support vector machines (SVM), multilayer neural networks, decision trees, Gaussian mixture models (GMM), and so on. Orlando et al. [6] proposed a model based on a discriminatively trained fully connected conditional random field, which uses an SVM to supervise the learning of the model parameters and is helpful for processing slender vascular structures. Fraz et al. [7] proposed an ensemble system that combines bagged and boosted decision trees and uses a feature vector based on orientation analysis of the gradient vector field, morphological transformations, line strength measures, and Gabor filter responses. The feature vector encodes information to handle healthy as well as pathological retinal images. In general, these methods design segmentation models according to existing prior knowledge and complex probabilistic and statistical methods.

Related Work

In recent years, deep learning algorithms have developed rapidly, and segmentation methods based on deep learning have surpassed traditional segmentation methods. Among them, convolutional neural networks (CNN) can automatically learn hierarchical features without prior knowledge or additional preprocessing and are widely used in image classification and detection. Long et al. [8] proposed fully convolutional networks (FCN), in which the fully connected layers of a CNN are replaced with convolution layers and a skip connection structure is added. FCN adopts a fully convolutional structure for pixel-by-pixel prediction, which avoids the high storage cost and low computing efficiency of traditional CNNs and achieves end-to-end semantic segmentation of images. Ronneberger et al. [9] proposed U-Net, which is based on FCN and adopts a symmetric down-sampling and up-sampling structure. Meanwhile, U-Net increases the reuse of shallow features through skip connections and achieves better results in biomedical image segmentation with small datasets. Badrinarayanan et al. [10] proposed SegNet, which has an encoding-decoding structure similar to U-Net but uses a different up-sampling method. Most semantic segmentation methods for medical images are based on the segmentation networks above, extending the depth or width of the network. Mo and Zhang [11] proposed a multi-level deeply supervised convolutional network, which exploits the multi-level and hierarchical characteristics of the deeply supervised convolutional network and uses knowledge transferred from other fields to alleviate the problem of insufficient medical training data. Hu et al. [12] proposed a method based on a CNN and a fully connected conditional random field (CRF). Oliveira et al. [13] proposed a multiscale analysis method that combines the stationary wavelet transform with a multiscale fully convolutional neural network to handle changes in the width and direction of retinal vessel structures. Guo et al. [14] proposed BTS-DSN, a multiscale deeply supervised network with short connections based on FCN, which uses short connections to transmit semantic information between side-output layers and improves network performance by learning multiscale features.

However, traditional deep learning methods usually assume that the training data and test data follow the same distribution, so the results on the training data are often better than those on the test data and over-fitting occurs. Moreover, because these methods focus on pixel-level classification and neglect the correlation between pixels, the segmentation result is fuzzy and requires binarization post-processing.

In order to generate new samples that conform to the true sample probability distribution, Goodfellow et al. [15] proposed the generative adversarial network (GAN). An adversarial network is composed of a generative model (generator) and a discriminative model (discriminator). In the training process, the two networks are optimized in turn until the two sides reach a dynamic balance. Through adversarial learning, the algorithm can learn the data distribution directly. However, for complex data such as high-resolution images, it is extremely difficult to learn the pixel distribution without supervision. To address this problem, Mirza and Osindero [16] proposed the conditional generative adversarial network (CGAN), which makes the generation of new samples controllable and the results more in line with expectations. Radford et al. [17] proposed the deep convolutional generative adversarial network (DCGAN), which introduced deep convolutional networks into GAN and not only accelerates the GAN training process but also makes training more stable. Generative adversarial networks, which combine neural networks with the adversarial idea, have been applied to medical image processing and have achieved good results in medical image segmentation. Moeskops et al. [18] used a GAN with dilated convolutions for automatic segmentation of brain MR images, replacing the pooling layers with dilated convolutions to reduce the loss of features during down-sampling, so that the segmentation results are better than those of a fully convolutional network. Xue et al. [19] proposed a new end-to-end adversarial network architecture, SegAN, to segment medical images, which introduces a new multiscale L1 loss function to measure the difference between the features of the predicted segmentation and the real segmentation. Shankaranarayana et al. [20] combined FCN and GAN for automatic segmentation of the optic disc of the fundus to assist in the diagnosis of glaucoma; the method is superior to existing methods on various evaluation indexes. Lahiri et al. [21] used a semi-supervised semantic segmentation method based on generative adversarial networks to segment blood vessels from fundus images, and training showed higher efficiency. Although the above GAN-based methods have achieved good segmentation results, the segmentation accuracy for low-contrast capillaries is still low. Because adversarial training is a competition between two models, an improvement in one side's performance suppresses the performance of the other side. If the discriminator lacks discriminative ability, it will be confused by the new samples produced by the generator and cannot correctly distinguish real samples from generated samples.

In order to segment retinal blood vessel images more effectively and accurately, this paper proposes a retinal blood vessel segmentation model based on a deep convolutional generative adversarial network (SUD-GAN). We conducted multiple comparative studies on two classic datasets, DRIVE and STARE; the objective function is optimized using a binary cross-entropy loss function. Experimental results show that SUD-GAN achieves competitive performance against most state-of-the-art methods. The main contributions of this paper are the following:

  1. A U-shaped generator structure with short connection blocks. We use U-Net as the backbone, combined with the idea of the residual network (ResNet) [22]. A short connection is added within each convolution block to make the network sensitive to changes in the output and in the network weights, which helps adjust the weights, avoids the gradient dispersion problem caused by deep convolutional neural networks, and improves the segmentation ability and robustness of the generator model.

  2. A deep convolutional discriminator structure with dense connection blocks. We construct a deep convolutional neural network with multiple hidden layers to extract abstract features of the fundus image. In addition, the idea of DenseNet [23] is introduced into the proposed model: a dense connection structure is added to the convolution blocks in the middle of the convolutional network. It enhances the transmission of shallow features to the deeper layers, improves the discriminative ability of the discriminator, and makes the adversarial training better guide the selection of features.

Through adversarial training between the generator and the discriminator, the network parameters of the two models are optimized, thus realizing an automatic retinal image segmentation method with better performance.

Method

Network Architecture

In this section, we describe the proposed SUD-GAN, whose framework is shown in Fig. 1. The SUD-GAN architecture is built on a GAN and consists of two sub-models, the generator and the discriminator. The generator is responsible for generating a new probability map that is as close as possible to the ground truth. The discriminator is responsible for distinguishing real input samples from generated samples. In the training phase, the real samples and generated samples are fed into the discriminator together; features are extracted through the deep convolutional network, the real samples are assigned a higher label and the generated samples a lower label, and the distinction between real and generated samples is thereby realized. The two models are trained alternately. First, the generator's network parameters are fixed and the discriminator is trained K times, updating the discriminator's parameters until it reaches a certain discrimination accuracy; then the generator is trained. Through this alternating training, the network parameters are continually optimized: the discriminator distinguishes real samples from generated samples more accurately, and the generator produces new samples that are closer to the real samples. Finally, the framework reaches a dynamic equilibrium, namely a Nash equilibrium [15]. At this point, the generator can produce new samples closest to the real samples; in other words, the generator can reproduce the real sample distribution. In this case, the discriminator assigns a probability of 50% to both the real sample and the generated sample, so it cannot distinguish between them, and the network training is complete.

Fig. 1 An overview of the retinal vessel segmentation process with generative adversarial networks
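The alternating scheme can be made concrete with a short training-step sketch. The following is a minimal illustration, assuming TensorFlow 2.x; the choice of K, the conditioning of the discriminator by concatenating the fundus image with the (real or generated) vessel map along the channel axis, and the optimizer settings are assumptions based on details given later in the paper, and `generator` and `discriminator` stand for the networks defined in the following subsections.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()
d_opt = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
g_opt = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)

K = 1  # discriminator updates per generator update (K is not specified in the paper)

def train_step(fundus, vessel_gt, generator, discriminator, lam=0.1):
    # Phase 1: train the discriminator K times with the generator frozen.
    for _ in range(K):
        fake = generator(fundus, training=False)
        with tf.GradientTape() as tape:
            d_real = discriminator(tf.concat([fundus, vessel_gt], axis=-1), training=True)
            d_fake = discriminator(tf.concat([fundus, fake], axis=-1), training=True)
            # Real pairs are pushed toward label 1, generated pairs toward label 0.
            d_loss = bce(tf.ones_like(d_real), d_real) + bce(tf.zeros_like(d_fake), d_fake)
        grads = tape.gradient(d_loss, discriminator.trainable_variables)
        d_opt.apply_gradients(zip(grads, discriminator.trainable_variables))

    # Phase 2: train the generator with the discriminator frozen.
    with tf.GradientTape() as tape:
        fake = generator(fundus, training=True)
        d_fake = discriminator(tf.concat([fundus, fake], axis=-1), training=False)
        # Pixel-level segmentation loss plus the adversarial term (lambda = 0.1).
        g_loss = bce(vessel_gt, fake) + lam * bce(tf.ones_like(d_fake), d_fake)
    grads = tape.gradient(g_loss, generator.trainable_variables)
    g_opt.apply_gradients(zip(grads, generator.trainable_variables))
    return d_loss, g_loss
```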

Generator

The goal of the generator is to produce a retinal vessel segmentation image that is as similar as possible to the ground truth of the fundus retinal image. In this paper, the generator model is designed based on the U-Net structure. As shown in Fig. 2, the network adopts a symmetric encoder-decoder structure to segment the fundus images end-to-end.

Fig. 2 Generator architecture

In the encoding part, four convolution blocks are used to extract abstract features of the input images. Each convolution block is composed of two convolution layers, and each convolution layer uses kernels of size 3 × 3. The numbers of convolution kernels in the four convolution blocks are 32, 64, 128, and 256: as the network deepens, the size of the feature maps gradually shrinks while higher-dimensional features are extracted, so later convolution layers need more feature maps, that is, more convolution kernels, to capture the features of the preceding convolution block more fully. After each convolution block, a max pooling layer of size 2 × 2 is added. Each convolution layer within a convolution block is followed by a batch normalization (BN) layer and a ReLU (rectified linear unit) layer. The normalization layer makes the sample feature distributions closer and speeds up training, and the nonlinear unit prevents gradients from vanishing. A short connection structure similar to that of the residual network is added between the batch normalization layers within a convolution block, as shown in Fig. 3b. Assuming the input of the block is x and the function mapping to be fitted (i.e., the output) is H(x), the short connection structure lets the block learn to fit a residual mapping F(x) = H(x) − x. Residual learning holds that fitting a residual mapping is easier than directly fitting an approximate identity mapping [24]. The residual structure makes the output sensitive to weight changes and prevents the gradient from vanishing or exploding as the number of network layers increases. We add the short connection after the BN layer: because the BN layer normalizes the input features, its output features are concentrated, whereas the original input features are scattered. Therefore, the raw input cannot be added directly to the output of a convolution followed by BN, as the distribution mismatch is not conducive to feature extraction; placing the short connection after the BN layer avoids this problem. The decoding part is similar to the encoding structure and consists of four convolution blocks, except that down-sampling is replaced by up-sampling.

Fig. 3 a Original residual network. b Our short connection
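As an illustration, a minimal sketch of one encoder convolution block with the short connection placed after the BN layer (Fig. 3b) might look as follows, assuming TensorFlow 2.x; the exact positions of the addition and the final ReLU follow our reading of the description above rather than a published reference implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers

def short_connection_block(x, filters):
    # First 3x3 convolution followed by BN; the BN output is kept as the shortcut.
    y = layers.Conv2D(filters, 3, padding='same')(x)
    y = layers.BatchNormalization()(y)
    shortcut = y
    y = layers.ReLU()(y)
    # Second 3x3 convolution followed by BN.
    y = layers.Conv2D(filters, 3, padding='same')(y)
    y = layers.BatchNormalization()(y)
    # Short connection between the two BN layers (Fig. 3b): the block then
    # fits the residual mapping F(x) = H(x) - x rather than H(x) directly.
    y = layers.Add()([y, shortcut])
    return layers.ReLU()(y)
```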

In the encoder, the receptive field is gradually expanded through max pooling to extract abstract information from the input fundus image. After a convolution block at the bottleneck layer, the abstract features are restored to images with the same resolution as the input through up-sampling operations in the decoder. Detailed information is recovered by skip connections, which combine the information of each down-sampling layer with the input of the corresponding up-sampling layer, so that image accuracy is gradually restored. Adding short connections helps the generator network produce samples closer to the real samples, improving the reliability and stability of the generator. Finally, the probability map of the segmentation result is produced through a sigmoid.
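Putting the pieces together, a compact sketch of the U-shaped generator could be assembled as below, reusing the short_connection_block helper from the previous sketch; the bottleneck width of 512 and the use of the default nearest-neighbor up-sampling are assumptions, not values stated in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_generator(input_shape=(None, None, 3)):
    # Spatial dimensions must be divisible by 16 (four 2x2 poolings).
    inp = layers.Input(shape=input_shape)
    x, skips = inp, []
    # Encoder: four short-connection blocks with 32, 64, 128, 256 kernels,
    # each followed by 2x2 max pooling.
    for filters in (32, 64, 128, 256):
        x = short_connection_block(x, filters)  # helper from the sketch above
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)
    x = short_connection_block(x, 512)  # bottleneck (width assumed)
    # Decoder: up-sample, concatenate the matching encoder feature (skip
    # connection), and convolve again to recover detail.
    for filters, skip in zip((256, 128, 64, 32), reversed(skips)):
        x = layers.UpSampling2D(2)(x)
        x = layers.Concatenate()([x, skip])
        x = short_connection_block(x, filters)
    # Sigmoid output: the vessel probability map.
    out = layers.Conv2D(1, 1, activation='sigmoid')(x)
    return Model(inp, out)
```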

Discriminator

To discriminate real retinal vessel segmentation images from generated samples, a deep convolutional neural network with dense connection blocks is constructed as the discriminator, as shown in Fig. 4. It consists of three convolution blocks, two dense connection blocks, and two compression layers. All convolution layers use small 3 × 3 kernels. The deep network structure and small convolution kernels together preserve the receptive field while reducing the number of parameters in the convolution layers.

Fig. 4 Discriminator architecture

First, the fundus image is fed into the first convolution block, and sample features are extracted through multilayer convolution followed by two dense connection blocks. Each dense connection block is composed of three BN-ReLU-Conv composite layers, as shown in Fig. 5, where each layer takes the concatenation of the outputs of all previous layers as its input. If the output of the ith network layer is xi, then the output of the ith layer of a dense connection block can be expressed as

$$y_i = H_i\big([x_0, x_1, \cdots, x_{i-1}]\big) \tag{1}$$

Fig. 5 Dense connection block

where Hi(·) denotes the nonlinear mapping of the ith layer, and [x0, x1, ⋯, xi − 1] denotes the concatenation of the feature maps output by layers 0 through i − 1. In this paper, two dense connection blocks are added after the first convolution block, which shortens the distance between the shallow and deep layers of the network and enhances the transmission of features [23]. A 1 × 1 convolution layer follows each dense block; it compresses the multilayer feature maps to avoid the growth in network width caused by the dense connections, reducing the feature dimension and improving computational efficiency. Finally, the compressed features pass through two more multilayer convolution blocks that extract abstract sample features, and the judgment on real versus generated samples is output through a sigmoid.
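A minimal sketch of a dense connection block with three BN-ReLU-Conv composite layers (Eq. 1) and the subsequent 1 × 1 compression layer follows, assuming TensorFlow 2.x; the growth rate of 32 is an assumed value, not one specified in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, growth_rate=32, n_layers=3):
    for _ in range(n_layers):
        # One BN-ReLU-Conv composite layer (Fig. 5).
        y = layers.BatchNormalization()(x)
        y = layers.ReLU()(y)
        y = layers.Conv2D(growth_rate, 3, padding='same')(y)
        # Eq. (1): the next layer sees the concatenation of all earlier outputs.
        x = layers.Concatenate()([x, y])
    return x

def compression_layer(x, out_channels):
    # 1x1 convolution that compresses the widened feature map, limiting the
    # growth in width caused by the dense connections.
    return layers.Conv2D(out_channels, 1)(x)
```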

By introducing dense connection blocks, the feature maps of each layer are merged. This not only reduces the number of feature parameters but also aids the effective propagation of shallow features and reduces the loss of middle-layer information, so the discriminator network can distinguish real samples from generated samples more reliably. All pooling layers in the discriminator use 2 × 2 max pooling with a stride of 3. A small pooling kernel preserves more detailed information, and max pooling performs better in image tasks, making it easier to capture image changes and yielding greater local information differences.

Objective Function

In the original GAN, the generator G generates an image G(z) from a noise vector z, and the discriminator D estimates the probability that the input image is real rather than generated. In this paper, G maps the input fundus image (instead of a noise vector) to the vessel segmentation result, and D determines whether its input is a generated image or the ground truth image. Therefore, the loss function of SUD-GAN is defined as:

$$\mathcal{L}(G, D) = \mathbb{E}_{x,y \sim p_{data}(x,y)}\big[\log D(x, y)\big] + \mathbb{E}_{x,y \sim p_{data}(x,y)}\big[\log\big(1 - D(x, G(x))\big)\big] \tag{2}$$

In this objective function, x denotes the input fundus image and y the corresponding ground truth image; log D(x, y) corresponds to the probability that the discriminator judges y to come from the real sample mapping, and log(1 − D(x, G(x))) corresponds to the probability that the discriminator judges G(x) to come from the generator. In the training phase, the discriminator wants D(x, y) to be maximized and D(x, G(x)) to be minimized. The generator, on the other hand, should prevent the discriminator from making a correct judgment by producing output that is indistinguishable from the real data. Thus the discriminator tries to maximize the objective function while the generator tries to minimize it, and the optimization objective of the GAN is:

$$\min_G \max_D \mathcal{L}(G, D) = \mathbb{E}_{x,y \sim p_{data}(x,y)}\big[\log D(x, y)\big] + \mathbb{E}_{x,y \sim p_{data}(x,y)}\big[\log\big(1 - D(x, G(x))\big)\big] \tag{3}$$

Since the binary cross-entropy loss is adopted in this paper, the optimization of the discriminator can be expressed as:

$$\min_{\theta_D} \; L_D\big(D(x, G(x)), 0\big) + L_D\big(D(x, y), 1\big) \tag{4}$$

where θD denotes the parameters of the discriminator to be optimized. Gradient descent is used to train the discriminator K times so that it reaches the required accuracy.

The loss of the generator includes the pixel-level loss between the generated probability map of the segmented vessel tree and the ground truth, and the adversarial loss against the discriminator. The optimization of the generator can therefore be expressed as:

$$\min_{\theta_G} \; L_G\big(G(x), y\big) + \lambda\, L\big(D(x, G(x)), 1\big) \tag{5}$$

Gradient descent is adopted to train the model, in which λ balances the two kinds of losses and avoids the gradient dispersion problem caused by adversarial training. In our experiments, λ = 0.1 was selected empirically.
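The two objectives in Eqs. (4) and (5) translate directly into binary cross-entropy calls. The following is a minimal sketch, assuming TensorFlow 2.x; the function names are ours, not the paper's.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

def discriminator_loss(d_real, d_fake):
    # Eq. (4): push D(x, y) toward label 1 and D(x, G(x)) toward label 0.
    return bce(tf.ones_like(d_real), d_real) + bce(tf.zeros_like(d_fake), d_fake)

def generator_loss(vessel_gt, g_out, d_fake, lam=0.1):
    # Eq. (5): pixel-level loss between the generated probability map and the
    # ground truth, plus the adversarial term weighted by lambda (0.1 here).
    return bce(vessel_gt, g_out) + lam * bce(tf.ones_like(d_fake), d_fake)
```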

Experiments

Datasets

We evaluated our method on two publicly available retinal vessel segmentation datasets: DRIVE and STARE. The DRIVE dataset consists of 40 color fundus photographs with resolution 565 × 584. The dataset is divided into a training set and a test set, each containing 20 images. For the training images, a single manual segmentation of the vasculature is available. For the test images, two manual segmentations are available: one is used as the ground truth, and the other can be used to compare computer-generated segmentations with those of an independent human observer. The STARE dataset consists of 20 fundus images with resolution 700 × 605; 10 images are used for training and the remaining 10 for testing. Each image has pixel-level vessel annotations provided by two experts, and the annotations of the first expert are used as the ground truth.

Implementation Details

During the training phase, data augmentation is applied first because of the small number of training images. Online augmentation is adopted, performing random rotations and random mirror flips. Each image is then normalized with the z-score (zero mean, unit standard deviation), and 10% of the augmented training set is set aside as the validation set. The labels of the first expert are used as the ground truth. The model is trained for 20,000 iterations with a mini-batch size of 1, a learning rate of 2e−4, the Adam optimizer, and a momentum of 0.5.
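A minimal sketch of this preprocessing, assuming NumPy, is shown below; restricting the random rotation to multiples of 90° is a simplifying assumption, as the paper does not state the rotation angles.

```python
import numpy as np

def zscore(image):
    # Normalize an image to zero mean and unit standard deviation.
    return (image - image.mean()) / (image.std() + 1e-8)

def augment(image, label, rng=np.random.default_rng()):
    # Random rotation (here restricted to multiples of 90 degrees) and random
    # mirror flip; the same transform is applied to image and ground truth.
    k = int(rng.integers(0, 4))
    image, label = np.rot90(image, k), np.rot90(label, k)
    if rng.random() < 0.5:
        image, label = np.fliplr(image), np.fliplr(label)
    return zscore(image), label
```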

The machine uses an Intel Core i7-8700 CPU with a GeForce GTX 1060 GPU. The experiments are implemented with Anaconda and TensorFlow. Training lasts about 10 h.

Evaluation Criteria

By comparing the segmentation results with the ground truth, we employ five evaluation criteria: accuracy (ACC), sensitivity (Se), specificity (Sp), the area under the ROC curve (AU-ROC), and the area under the PR curve (AU-PR).

$$Acc = \frac{TP + TN}{TP + TN + FP + FN}, \quad Se = \frac{TP}{TP + FN}, \quad Sp = \frac{TN}{TN + FP}$$

where true positives (TP) are vessel pixels classified correctly, true negatives (TN) are non-vessel pixels classified correctly, false negatives (FN) are vessel pixels misclassified as non-vessel pixels, and false positives (FP) are non-vessel pixels misclassified as vessel pixels. Sensitivity measures the ability to correctly detect vessel pixels; specificity measures the ability to recognize non-vessel pixels; accuracy is the proportion of correctly segmented pixels among all pixels. The ROC curve takes the false positive rate FP/(TN + FP) as the abscissa and the true positive rate TP/(TP + FN) as the ordinate. The PR curve takes the precision TP/(TP + FP) as the ordinate and the recall TP/(TP + FN) as the abscissa. The closer the area under the ROC or PR curve is to 1, the better the segmentation performance of the algorithm.
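These criteria can be computed as in the following sketch, assuming NumPy and scikit-learn; average_precision_score is used here as a standard approximation of the area under the PR curve.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

def evaluate(prob, gt, threshold=0.5):
    # `prob` is the predicted probability map, `gt` the binary ground truth;
    # both are flattened 1-D arrays over the evaluated pixels.
    pred = (prob >= threshold).astype(np.uint8)
    tp = np.sum((pred == 1) & (gt == 1))
    tn = np.sum((pred == 0) & (gt == 0))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    acc = (tp + tn) / (tp + tn + fp + fn)
    se = tp / (tp + fn)  # sensitivity: correctly detected vessel pixels
    sp = tn / (tn + fp)  # specificity: correctly detected background pixels
    au_roc = roc_auc_score(gt, prob)
    au_pr = average_precision_score(gt, prob)  # approximates AU-PR
    return acc, se, sp, au_roc, au_pr
```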

Experimental Results

Ablation Study

To demonstrate the effectiveness of the proposed model, two other networks were designed for comparative experiments. U-GAN: the generator is a classic U-Net, and the discriminator is SUD-GAN's discriminator with the dense connections removed. SU-GAN: the generator adds the short connection structure to the U-Net, and the discriminator is the same as in U-GAN. The three GAN architectures, U-GAN, SU-GAN, and SUD-GAN, are applied to the DRIVE and STARE datasets. The segmentation results are shown in Figs. 6 and 7, and the evaluation results in Tables 1 and 2.

Fig. 6 a–e DRIVE dataset segmentation result comparison

Fig. 7 a–e STARE dataset segmentation result comparison

Table 1. Performance comparison of three GAN architectures on the DRIVE dataset

Methods   ACC     Sensitivity  Specificity  ROC_AUC  PR_AUC
U-GAN     0.9514  0.7696       0.9780       0.8909   0.8309
SU-GAN    0.9520  0.7723       0.9773       0.9662   0.8686
SUD-GAN   0.9560  0.8340       0.9820       0.9786   0.8821

Table 3. Performance comparison with state-of-the-art methods on the DRIVE dataset (best results shown in italics)

Methods                        ACC     Sensitivity  Specificity  ROC_AUC  PR_AUC
Manual                         0.9473  0.7746       0.9725       0.9466   –
Unsupervised segmentation methods
  COSFIRE filters [2]          0.9614  0.7655       0.9704       0.9614   –
  Contour model [3]            0.9540  0.7420       0.9820       0.8620   –
  Morphology based [4]         0.9597  0.8375       0.9694       –        –
  Matched filtering [5]        0.9404  0.8248       0.9612       –        –
Supervised segmentation methods
  CRF [6]                      –       0.7897       0.9684       –        0.7854
  Decision trees [7]           0.9480  0.7406       0.9807       0.9747   –
  Multi-level CNN [11]         0.9521  0.7779       0.9780       0.9782   –
  FCN [13]                     0.9576  0.8039       0.9804       0.9821   –
  Multiscale CNN [12]          0.9632  0.7543       0.9814       0.9754   –
  BTS-DSN [14]                 0.9551  0.7800       0.9806       0.9806   –
  GAN-CNN [20]                 0.9450  –            –            –        –
Proposed method
  SUD-GAN                      0.9560  0.8340       0.9820       0.9786   0.8821

Figure 6 shows segmentation results on the DRIVE dataset. We choose an image (Fig. 6a) from the DRIVE database together with its corresponding ground truth (Fig. 6b), and compare the segmentation results of the three GAN architectures against it. In Fig. 6c, although vessel connectivity is good, U-GAN misses most low-contrast capillary structures. In Fig. 6d, more thin vessels are extracted by SU-GAN, but there are still errors compared with the ground truth. Figure 6e shows that the segmentation result of SUD-GAN is closest to the ground truth: it keeps the blood vessels connected and segments most of the capillaries. The SUD-GAN model avoids vanishing gradients, the generator fits the real samples better, and the deep convolutional discriminator with dense connection blocks better guides the generator's feature selection. Therefore, a relatively accurate segmentation of retinal capillaries is obtained. The experiment on the STARE dataset also shows the effectiveness of the proposed model; details are shown in Fig. 7.

In real applications, segmentation in the presence of pathologies is of particular concern. Figures 8 and 9 show the segmentation results for pathological images in the two databases. It can be seen that our algorithm also performs well on pathological images.

Fig. 8 a–c DRIVE dataset pathological case segmentation results

Fig. 9 a–c STARE dataset pathological case segmentation results

Figure 10 shows the ROC and PR curves for the STARE and DRIVE datasets. The closer the area under the ROC or PR curve is to 1, the better the segmentation performance. Compared with the other two architectures, SUD-GAN has the largest area under both the ROC and PR curves, which demonstrates that the SUD-GAN network has better generalization ability thanks to its short connection structure and dense connection blocks.

Fig. 10 ROC and PR curves. a ROC curves for the STARE dataset. b PR curves for the STARE dataset. c ROC curves for the DRIVE dataset. d PR curves for the DRIVE dataset

Tables 1 and 2 list the performance comparison of the three GAN architectures on the DRIVE and STARE datasets. It can be observed that SU-GAN achieved much higher scores than U-GAN in ACC, sensitivity, ROC_AUC, and PR_AUC on both datasets, which shows the effectiveness of the generator model with the short connection structure. Moreover, compared with SU-GAN, SUD-GAN also achieved much higher scores in ACC, sensitivity, ROC_AUC, and PR_AUC on both datasets, which demonstrates the effectiveness of the deep convolutional discriminator with dense connection blocks. The experimental results show that the performance of U-GAN is improved by the addition of the short connection structure and dense blocks.

Table 2. Performance comparison of three GAN architectures on the STARE dataset

Methods   ACC     Sensitivity  Specificity  ROC_AUC  PR_AUC
U-GAN     0.9481  0.7091       0.9759       0.8990   0.7900
SU-GAN    0.9649  0.7696       0.9884       0.9258   0.8577
SUD-GAN   0.9663  0.8334       0.9897       0.9734   0.8718

Comparison with the State-of-the-Art

In Tables 3 and 4, the performance of SUD-GAN is compared with other state-of-the-art methods in terms of ACC, sensitivity, specificity, ROC_AUC, and PR_AUC on the DRIVE and STARE datasets. The same training/testing split was used across all state-of-the-art comparisons.

Table 4. Performance comparison with state-of-the-art methods on the STARE dataset (best results shown in italics)

Methods                        ACC     Sensitivity  Specificity  ROC_AUC  PR_AUC
Manual                         0.9380  0.8945       0.9373       0.9686   –
Unsupervised segmentation methods
  COSFIRE filters [2]          0.9563  0.7716       0.9701       0.9563   –
  Contour model [3]            0.9560  0.7800       0.9780       0.8700   –
  Morphology based [4]         0.9579  0.8375       0.9694       –        –
  Matched filtering [5]        0.9248  0.8874       0.9288       –        –
Supervised segmentation methods
  CRF [6]                      –       0.7680       0.9738       –        0.7644
  Decision trees [7]           0.9534  0.7548       0.9763       0.9768   –
  Multi-level CNN [11]         0.9674  0.8147       0.9844       0.9885   –
  FCN [13]                     0.9694  0.8315       0.9858       0.9905   –
  Multiscale CNN [12]          0.9632  0.7543       0.9814       0.9751   –
  BTS-DSN [14]                 0.9660  0.8201       0.9828       0.9872   –
Proposed method
  SUD-GAN                      0.9663  0.8334       0.9897       0.9734   0.8718

It can be observed from Table 3 that SUD-GAN achieved the best sensitivity and specificity among the state-of-the-art methods on the DRIVE dataset, which shows that SUD-GAN has a high ability to detect tiny blood vessels. Compared with the CRF method of Orlando et al. [6], it is about 10% higher in PR_AUC; since the areas under the precision-recall and ROC curves should be as close to 1 as possible, this indicates strong retinal vessel segmentation results for our method. In addition, compared with the GAN-based method proposed in [21], our ACC score is also higher. On the STARE dataset, SUD-GAN also reached the highest sensitivity and specificity among the compared methods, 0.8334 and 0.9897 respectively, and its ACC score is higher than those of most comparison methods. This demonstrates that SUD-GAN has higher accuracy and robustness.

Conclusion

In this paper, a deep convolutional generative adversarial network combined with short connections and dense blocks, called SUD-GAN, is proposed for retinal vessel segmentation. In the generator, the short connection structure is added to the U-Net encoder-decoder network. In the discriminator, the middle convolutional layers of the deep convolutional neural network are replaced by dense connection blocks. The addition of the residual-style short connection structure alleviates network degradation and vanishing gradients, which makes the training of the generator more stable and produces segmentation probability maps that can confuse the discriminator. On the other hand, the addition of dense connection blocks strengthens the transfer of features, and the shallow details help the discriminator network distinguish real samples from generated samples; thus the discriminating ability of the adversarial network is enhanced. SUD-GAN optimizes segmentation performance to some extent. The proposed method needs neither slicing of the input image nor additional post-processing: the whole image is taken as the input, and through the adversarial training between generator and discriminator the network directly fits the input data to produce the retinal vessel semantic segmentation result. This simplifies the segmentation procedure and prevents over-fitting. The experimental results show that SUD-GAN obtains better results on the DRIVE and STARE datasets.

In our model, the proposed loss function adds the hyper-parameter λ to balance the two losses, which has a certain influence on the retinal vessel segmentation task. The selection of this parameter and the design of the loss function are worthy of further study. In future work, we will focus on exploring more effective loss functions to improve retinal vessel segmentation performance.

Electronic Supplementary Material

ESM 1 (12.9KB, docx)

(DOCX 12 kb)

Funding Information

This work was supported by the Key Specialized Research and Development Program of Henan Province (202102210170) and the Applied Research Plan of Key Scientific Research Projects in Henan Colleges and Universities (19A510011).

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Tiejun Yang, Email: tjyanghlyu@126.com.

Tingting Wu, Email: 201892348@stu.haut.edu.cn.

Lei Li, Email: leili@haut.edu.cn.

Chunhua Zhu, Email: zhuchunhua@haut.edu.cn.

References

1. Soomro TA, Afifi AJ, Zheng L, Soomro S, Gao J, Hellwich O, Paul M. Deep learning models for retinal blood vessels segmentation: a review. IEEE Access. 2019;7:71696–71717. doi: 10.1109/ACCESS.2019.2920616.
2. Azzopardi G, Strisciuglio N, Vento M, Petkov N. Trainable COSFIRE filters for vessel delineation with application to retinal images. Med Image Anal. 2015;19(1):46–57. doi: 10.1016/j.media.2014.08.002.
3. Zhao Y, Rada L, Chen K, Harding SP, Zheng Y. Automated vessel segmentation using infinite perimeter active contour model with hybrid region information with application to retinal images. IEEE Trans Med Imaging. 2015;34(9):1797–1807. doi: 10.1109/TMI.2015.2409024.
4. Jiang Z, Yepez J, An S, Ko S. Fast, accurate and robust retinal vessel segmentation system. Biocybern Biomed Eng. 2017;37:412–421. doi: 10.1016/j.bbe.2017.04.001.
5. Liang EH. Retinal vascular segmentation based on improved matched filtering. Inf Commun. 2018;08:6–9.
6. Orlando JI, Prokofyeva E, Blaschko MB. A discriminatively trained fully connected conditional random field model for blood vessel segmentation in fundus images. IEEE Trans Biomed Eng. 2017;64(1):16–27. doi: 10.1109/TBME.2016.2535311.
7. Fraz MM, Remagnino P, Hoppe A, Uyyanonvara B, Rudnicka AR, Owen CG, Barman SA. An ensemble classification-based approach applied to retinal blood vessel segmentation. IEEE Trans Biomed Eng. 2012;59(9):2538–2548. doi: 10.1109/TBME.2012.2205687.
8. Shelhamer E, Long J, Darrell T. Fully convolutional networks for semantic segmentation. arXiv preprint arXiv:1411.4038v3.
9. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention (MICCAI). 2015:234–241.
10. Badrinarayanan V, Kendall A, Cipolla R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. arXiv preprint arXiv:1511.00561v3.
11. Mo J, Zhang L. Multi-level deep supervised networks for retinal vessel segmentation. Int J Comput Assist Radiol Surg. 2017;12(12):2181–2193. doi: 10.1007/s11548-017-1619-0.
12. Hu K, Zhang Z, Niu X, Zhang Y, Cao C, Xiao F, Gao X. Retinal vessel segmentation of color fundus images using multiscale convolutional neural network with an improved cross-entropy loss function. Neurocomputing. 2018;309:179–191. doi: 10.1016/j.neucom.2018.05.011.
13. Oliveira A, Pereira S, Silva CA. Retinal vessel segmentation based on fully convolutional neural networks. Expert Syst Appl. 2018;112:229–243. doi: 10.1016/j.eswa.2018.06.034.
14. Guo S, Wang K, Kang H, Zhang Y, Gao Y, Li T. BTS-DSN: deeply supervised neural network with short connections for retinal vessel segmentation. Int J Med Inform. 2019;126:105–113. doi: 10.1016/j.ijmedinf.2019.03.015.
15. Goodfellow IJ, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. International Conference on Neural Information Processing Systems. MIT Press; 2014:2672–2680.
16. Mirza M, Osindero S. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
17. Radford A, Metz L, Chintala S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
18. Moeskops P, Veta M, Lafarge MW, et al. Adversarial training and dilated convolutions for brain MRI segmentation. Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. 2017:56–64. doi: 10.1007/978-3-319-67558-9_7.
19. Rezaei M, Harmuth K, Gierke W, et al. A conditional adversarial network for semantic segmentation of brain tumor. Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. 2018:241–253.
20. Shankaranarayana SM, Ram K, Mitra K, et al. Joint optic disc and cup segmentation using fully convolutional and adversarial networks. Fetal, Infant and Ophthalmic Medical Image Analysis. 2017:168–176. doi: 10.1007/978-3-319-67561-9_19.
21. Lahiri A, Ayush K, Biswas PK, et al. Generative adversarial learning for reducing manual annotation in semantic segmentation on large scale microscopy images: automated vessel segmentation in retinal fundus image as test case. 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). 2017:794–800. doi: 10.1109/CVPRW.2017.110.
22. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
23. Huang G, Liu Z, van der Maaten L, Weinberger KQ. Densely connected convolutional networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
24. Chenyue W, Benshun Y, Yungang Z, Song H, Yu F. Image segmentation of retinal vessels based on improved convolutional neural network. Acta Opt Sin. 2018;38(11):133–139. doi: 10.3788/AOS201838.1111004.



