Abstract
Detection, segmentation, and quantification of microvascular structures are the main steps towards studying microvascular remodeling. Combined with appropriate staining, confocal microscopy imaging enables exploration of the full 3D anatomical characteristics of microvascular systems. Segmentation of confocal microscopy images is a challenging task due to complexity of anatomical structures, staining and imaging issues, and lack of annotated training data. In this paper, we propose a deep learning system for robust segmentation of cranial vasculature of mice in confocal microscopy images. The proposed system is an ensemble of two deep-learning cascades consisting of two coarse-to-fine subnetworks with skip connections in between. One cascade aims to improve sensitivity, while the other aims to improve precision of the segmentation results. Our experiments on mice cranial vasculature showed promising results achieving segmentation accuracy of 92.02% and dice score of 81.45% despite being trained on very limited confocal microscopy data.
Index Terms: vessel segmentation, confocal microscopy images, deep learning, semantic segmentation
I. Introduction
Serious intracranial pathologic conditions, such as dural sinus thrombosis, dural arteriovenous fistulas, and aneurysms, involve the vessels not of the brain itself but of its outer fibrous membrane, the dura mater. These conditions, which result in significant neurologic morbidity and reduced cognitive abilities [1] [2], have been associated with vascular abnormalities. Meningeal vascular networks contribute to brain metabolic clearance and venous blood outflow. They constantly adjust to tissue metabolic demands through structural and functional remodeling. Defective vascular remodeling under certain pathological conditions leads to tissue damage and limits its repair. Acute damage to meningeal vasculature caused by traumatic brain injury, resulting in disruption of meningeal vascular integrity and a peripheral immune response, can lead to life-threatening situations [2]. Impaired vascular integrity, capillary rarefaction, and aberrant angio-architecture can also develop under chronic conditions, for example, those associated with sex hormone deprivation [3].
Detection, segmentation, and quantification of microvascular structures are the main steps towards studying microvascular remodeling. Confocal microscopy [4] allows 3D image capture using optical sectioning or depth discrimination by blocking light emitted from out-of-focus planes. Each single focus image captures the details of the specimen regions that lie close to its focal plane, while the remaining regions are imaged with poor contrast. Combined with appropriate staining, confocal microscopy imaging enables exploration of the full 3D anatomical characteristics of microvascular systems.
While segmentation of vessels in traditional angiogram-based or retinal imagery has advanced considerably [5]–[12], segmentation of microvasculature in confocal microscopy images remains a challenging task. The main challenges stem from the complexity of the anatomical structures, such as the irregular shape and varying scale of the vessels; staining issues, such as non-homogeneous staining within the lumen and excessive stain in the background due to leakage; and imaging issues, such as low contrast or background clutter due to out-of-focus structures. Figure 1, which shows sample slices and xy-plane projections for two sample confocal microscopy stacks, illustrates some of these challenges. In addition, unlike for angiogram-based or retinal vessel imagery, the manually annotated training data needed for supervised machine learning approaches is severely lacking for confocal microscopy images of microvasculature. This is due to the size of the data (hundreds of slices per stack), the complexity of the vascular structures, and the difficulty of 3D annotation.
Fig. 1: Sample blood microvascular structures imaged using confocal microscopy. The first three columns show sample single-focus slices. The last column shows the fused multi-focus image.
In this paper, we propose a system for confocal microscopy image segmentation that fuses two deep-learning cascades. The two cascades focus on improving the sensitivity (recall) and the precision of the segmentation results, respectively. To compensate for the limited confocal microscopy training data, the proposed network is first trained on an epifluorescence microscopy image dataset and then fine-tuned on a small set of fused confocal microscopy images of mouse cranial microvasculature.
II. Ensemble of Deep Learning Cascades for Vessel Segmentation
A. Image preprocessing
The proposed preprocessing scheme consists of two steps: (1) multi-focus image fusion, and (2) contrast enhancement. Each confocal microscopy image stack consists of hundreds of single-focus slices, each capturing only a small portion of the microvascular network. We use the multi-focus image fusion approach described in [13] to produce a single multi-focus image from the hundreds of single-focus slices within a confocal microscopy stack. The resulting multi-focus image (like the original set of slices) typically suffers from staining issues (non-homogeneous staining within the lumen and excessive stain in the background due to leakage), imaging issues (low contrast), and background clutter due to out-of-focus structures. To improve image contrast, we apply adaptive histogram equalization [14] to the fused multi-focus image. Sample fused multi-focus confocal microscopy images before and after adaptive histogram equalization are shown in Figure 2.
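To make the contrast enhancement step concrete, the sketch below implements the rank-based formulation of adaptive histogram equalization, in which each pixel is mapped to its rank within a local window. It is a minimal pure-Python illustration and not the implementation used in this study: the function name `adaptive_hist_eq`, the square window, and the `radius` parameter are assumptions, and faster tile-based or contrast-limited variations are omitted.

```python
def adaptive_hist_eq(image, radius):
    """Map each pixel to its rank inside a local window, scaled to (0, 1].

    `image` is a 2D list of intensities; `radius` (an assumed parameter)
    gives a square window of side 2 * radius + 1, clipped at the borders.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            y0, y1 = max(0, y - radius), min(height, y + radius + 1)
            x0, x1 = max(0, x - radius), min(width, x + radius + 1)
            center = image[y][x]
            not_brighter = 0  # window pixels with intensity <= center
            total = 0         # number of pixels in the clipped window
            for yy in range(y0, y1):
                for xx in range(x0, x1):
                    total += 1
                    if image[yy][xx] <= center:
                        not_brighter += 1
            out[y][x] = not_brighter / total
    return out


# A faint structure (12) on a background of 10 is pushed to the top of the range.
low_contrast = [[10, 10, 10],
                [10, 12, 10],
                [10, 10, 10]]
enhanced = adaptive_hist_eq(low_contrast, radius=1)
```

Because the output depends only on local ranks, a vessel that is barely brighter than its surroundings receives a high output value. The same property amplifies background noise, which is one reason contrast-limited variants are often preferred in practice.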
Fig. 2: Sample confocal microscopy fused multi-focus images before (first column) and after (second column) adaptive histogram equalization.
B. Network Architecture
For robust and precise segmentation of microvascular structures in confocal microscopy images, we have developed two deep learning network cascades: (1) the deep binary attention cascade (DBAC), and (2) the deep distance map attention cascade (DDMAC). The DBAC network is designed to improve sensitivity (recall), while the DDMAC network is designed to improve the precision of microvascular image segmentation results. Each cascade generates two outputs: an intermediate output from the first subnetwork, which acts as an attention map for the second subnetwork, and a final refined segmentation mask from the second subnetwork. The two final predictions from the DBAC and DDMAC networks are fused to produce one final segmentation mask.
1). Deep binary attention cascade (DBAC):
The cascade network involves two subnetworks: a classical semantic segmentation network, U-Net [15], followed by a deeper U-Net++ [16] designed to exploit the convolutional features at different scales. The architecture of the proposed DBAC is shown in Figure 3. In both U-Net and U-Net++, the encoder convolutional layers have 32, 64, 128, 256, and 512 channels, respectively, from shallow to deep. The decoder layers have a symmetric number of channels. Each convolutional layer is replaced with a residual block [17].
Fig. 3: Architecture of the proposed deep binary attention cascade (DBAC).
The first subnetwork takes the input image and learns to predict a soft attention map in which higher values indicate a higher likelihood of vessel presence. The second subnetwork takes the input image and is guided by the soft attention map to perform coarse-to-fine refinement of the vessel segmentation mask from the first subnetwork.
The proposed attention module connecting the two subnetworks is defined as:
| (1) |
| (2) |
where W is the convolutional feature map from the first subnetwork; P is the 1-channel attention map predicted by the first subnetwork, which is trained with a binary mask in which positive values represent foreground and negative values represent background; I is the input image; and M is the input to the second subnetwork.
We further extend this cascade by adding two shortcut connections that feed features from different decoder levels of the first subnetwork forward to the corresponding decoder levels of the second subnetwork. The forwarded feature map and the target feature map are concatenated, both to mitigate vanishing gradients and to pass feature maps directly from the first subnetwork to the second.
The proposed network is trained on 60 epifluorescence microscopy images with a resolution of 1360 × 1036 pixels [7] [18], using the binary cross-entropy loss function [19] and the stochastic gradient descent (SGD) optimizer for both the first and second subnetworks. Data augmentation is applied during training and fine-tuning, including random cropping (crop size 448 × 448), rotation, flipping, scaling, and brightness and contrast adjustment.
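The key constraint in this kind of augmentation is that every geometric transform must be applied identically to the image and to its mask. The sketch below illustrates this for random cropping and horizontal flipping only. It is an assumption-laden illustration rather than the training pipeline used here: the function name `random_crop_and_flip`, the flip probability of 0.5, and the use of plain nested lists are all hypothetical, and rotation, scaling, and brightness and contrast adjustment are omitted.

```python
import random


def random_crop_and_flip(image, mask, crop_size, rng):
    """Apply the same random crop and horizontal flip to an image and its mask."""
    height, width = len(image), len(image[0])
    if crop_size > height or crop_size > width:
        raise ValueError("crop_size must not exceed the image dimensions")
    top = rng.randint(0, height - crop_size)    # inclusive bounds
    left = rng.randint(0, width - crop_size)
    flip = rng.random() < 0.5                   # assumed flip probability

    def transform(array):
        rows = [row[left:left + crop_size] for row in array[top:top + crop_size]]
        return [row[::-1] for row in rows] if flip else rows

    return transform(image), transform(mask)


# Toy data: the mask is a function of the image, so alignment can be checked.
toy_image = [[y * 10 + x for x in range(10)] for y in range(8)]
toy_mask = [[value % 2 for value in row] for row in toy_image]
crop_image, crop_mask = random_crop_and_flip(toy_image, toy_mask, 4, random.Random(0))
```

For the images described above, `crop_size` would be 448, and a fresh random draw would be made for each training sample.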
2). Deep distance map attention cascade (DDMAC):
As can be observed in Figure 1, the diameter of cranial vessels varies widely even within a small field of view. To better capture the varying size and shape of the vessels and to detect thin vessels, we propose the deep distance map attention cascade (DDMAC) shown in Figure 4. The dilated probability regression module (first subnetwork) aims to improve the precision of the segmentation by applying multi-channel attention (the orange map in Figure 4) to the input image and then concatenating the result with the dilated probability map Q (the green map in Figure 4).
Fig. 4: Architecture of the proposed deep distance map attention cascade (DDMAC).
Convolutional feature map Q is a 1-channel prediction from the first subnetwork of DDMAC and is trained with dilated probability map D, which is defined as:
| (3) |
| (4) |
| (5) |
where V represents the binary mask (vessel is positive, background is negative) used for training the proposed system, and k is the upper bound of the pixel distance in D, which is set to 20 in this study. The dilated probability regression module is defined as:
| (6) |
| (7) |
| (8) |
where Wijk is the convolutional feature map of size i×j×k, representing height, width, and number of channels, respectively (here k denotes the channel count, not the distance bound used for D). In this study, i and j are the same as the height and width of the input image, and k = 3. The output M′ is the input to the second subnetwork. The second subnetwork takes M′ and performs coarse-to-fine refinement of the vessel segmentation mask Q from the first subnetwork.
The proposed DDMAC network is trained on 60 epifluorescence microscopy images with a resolution of 1360 × 1036 pixels [7] [18], using the same data augmentation strategies as the DBAC network described above. The mean squared error (MSE) [19] and binary cross-entropy loss functions are used to train the first and second subnetworks, respectively. SGD is used as the training optimizer.
C. Decision fusion (late classifier fusion)
For robust vessel segmentation performance, the outputs of the two proposed networks, DBAC and DDMAC, are fused. Two decision fusion (late classifier fusion) mechanisms, average and maximum, are considered:
$$S_{ij} = \frac{L_{ij} + N_{ij}}{2} \tag{9}$$

$$S_{ij} = \max\left(L_{ij}, N_{ij}\right) \tag{10}$$
where L and N denote the 1-channel segmentation probability maps generated by the proposed DBAC and DDMAC networks respectively, and ij refers to pixel coordinates. Binary segmentation masks are produced by performing hysteresis thresholding [20] on the fused probability maps S. Mathematical morphology operations are applied to the predicted binary mask to fill small gaps and to remove small fragments.
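The two fusion mechanisms named above operate pixel by pixel on the two probability maps, as the following minimal sketch illustrates. The function name `fuse_probability_maps` and the nested-list representation are assumptions made for illustration; the subsequent hysteresis thresholding and morphological clean-up are not included.

```python
def fuse_probability_maps(dbac_map, ddmac_map, mode="average"):
    """Fuse two same-sized 1-channel probability maps pixel by pixel."""
    if mode not in ("average", "maximum"):
        raise ValueError("mode must be 'average' or 'maximum'")
    fused = []
    for dbac_row, ddmac_row in zip(dbac_map, ddmac_map):
        if mode == "average":
            fused.append([(l + n) / 2 for l, n in zip(dbac_row, ddmac_row)])
        else:
            fused.append([max(l, n) for l, n in zip(dbac_row, ddmac_row)])
    return fused


# Toy 2x2 probability maps standing in for the DBAC (L) and DDMAC (N) outputs.
L_map = [[0.75, 0.25], [0.5, 0.0]]
N_map = [[0.25, 0.75], [1.0, 0.0]]
S_average = fuse_probability_maps(L_map, N_map, "average")
S_maximum = fuse_probability_maps(L_map, N_map, "maximum")
```

Maximum fusion keeps a pixel whenever either network is confident, which favors sensitivity. Average fusion requires partial agreement between the two networks, which tends to suppress detections supported by only one of them.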
III. Experimental results
A. Data collection
In this study, 7–8-week-old C57BL/6J female mice were used to generate 80 3D confocal microscopy image stacks. Following sacrifice, the entire body of the mouse was perfused through the heart with Kreb’s/albumin solution containing AlexaFluor 594-conjugated soybean agglutinin (SBA) lectin to stain and identify blood microvessels [21]. Skull caps with dura mater were isolated and fixed in 10% formaldehyde. Images were acquired at 20× magnification on a FluoView FV1000 inverted confocal microscope system (Olympus). The 160–270 μm thick Z-stacks were acquired with a step size of 1 μm. All animal experimental procedures were approved by the University of Missouri Institutional Animal Care and Use Committee.
The single-focus Z-stack microscopy images were fused using the method presented in [13] to produce eighty 512×512 multi-focus images. Of those multi-focus images, 62 were selected for training and 18 for testing. Silver truth (ST) segmentation masks were generated for all the training and test images by a combination of computer-generated segmentation masks and manual annotation. More precise ground truth (GT) segmentation masks were generated for the 18 test images through further inspection and manual correction by a domain expert.
B. Evaluation Metrics
Segmentation evaluations were carried out by sensitivity (recall), precision, specificity, accuracy, and dice score measures as defined below:
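$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{11}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{12}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{13}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{14}$$

$$\text{Dice} = \frac{2\,TP}{2\,TP + FP + FN} \tag{15}$$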
where TP, TN, FP, and FN represent the numbers of true positive, true negative, false positive, and false negative pixels, respectively.
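The five measures can be computed from pixel counts as sketched below, using their standard definitions. The helper names `confusion_counts`, `safe_ratio`, and `segmentation_metrics` are hypothetical, and returning 0.0 when a denominator is zero is an assumption made so that the example runs on degenerate masks.

```python
def confusion_counts(predicted, truth):
    """Count TP, TN, FP, FN pixels for two same-sized binary masks."""
    tp = tn = fp = fn = 0
    for predicted_row, truth_row in zip(predicted, truth):
        for p, t in zip(predicted_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 for an empty denominator (assumption)."""
    return numerator / denominator if denominator else 0.0


def segmentation_metrics(tp, tn, fp, fn):
    """Standard pixel-wise segmentation measures, as fractions in [0, 1]."""
    return {
        "sensitivity": safe_ratio(tp, tp + fn),
        "precision": safe_ratio(tp, tp + fp),
        "specificity": safe_ratio(tn, tn + fp),
        "accuracy": safe_ratio(tp + tn, tp + tn + fp + fn),
        "dice": safe_ratio(2 * tp, 2 * tp + fp + fn),
    }


predicted_mask = [[1, 1, 0, 0], [1, 0, 0, 0]]
truth_mask = [[1, 0, 0, 0], [1, 1, 0, 0]]
counts = confusion_counts(predicted_mask, truth_mask)
metrics = segmentation_metrics(*counts)
```

In this toy case there are 2 true positives, 4 true negatives, 1 false positive, and 1 false negative, so sensitivity, precision, and dice all equal 2/3, specificity is 0.8, and accuracy is 0.75.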
C. Network inference on 2D multi-focus images
Because of the limited amount of confocal microscopy training data, the proposed networks and their subnetworks were first trained on 60 epifluorescence microscopy images and their ground truth segmentation masks, described in [7], [18]. The same networks were then fine-tuned on the 62 multi-focus confocal microscopy images and their associated silver truth masks described above. Both sets of networks (with and without fine-tuning) were tested on the 18 multi-focus confocal microscopy test images and evaluated using the corresponding ground truth segmentation masks.
1). Single network segmentation performances:
First, we compare the segmentation performances of the proposed DBAC and DDMAC networks with the state-of-the-art segmentation network U-Net++ [16] and with a deeper version of it whose number of trainable parameters is comparable to that of the proposed networks. The total numbers of trainable parameters of these networks are listed in Table I.
TABLE I:
Number of trainable parameters of the compared networks.
| Methods | Number of trainable parameters |
|---|---|
| U-Net++ [16] | 71,106,794 |
| deeper U-Net++ | 80,171,050 |
| proposed DBAC | 80,190,922 |
| proposed DDMAC | 80,210,861 |
The 1-channel probability maps output by the proposed DBAC and DDMAC networks are binarized using hysteresis thresholding [20] (lower bound = 0.45, upper bound = 0.95). Segmentation performances of the compared networks with and without fine-tuning are listed in Table II. To better preserve very thin vessels, upsampled images (1024×1024) are fed to the proposed DBAC network. Original-size images (512×512) are used with the proposed DDMAC network, since regression to a distance map is more robust to scale variations.
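Hysteresis thresholding keeps every pixel above the upper bound, plus any pixel above the lower bound that is connected to such a pixel. The sketch below shows one common way to implement it with a breadth-first search. The function name `hysteresis_threshold`, the use of 8-connectivity, and the strict comparisons are assumptions, since the text specifies only the two bounds.

```python
from collections import deque


def hysteresis_threshold(prob, low=0.45, high=0.95):
    """Binarize a 2D probability map with two thresholds.

    Pixels above `high` are seeds; pixels above `low` are kept only if they
    are 8-connected (assumed) to a seed through other kept pixels.
    """
    height, width = len(prob), len(prob[0])
    mask = [[False] * width for _ in range(height)]
    queue = deque()
    for y in range(height):
        for x in range(width):
            if prob[y][x] > high:
                mask[y][x] = True
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < height and 0 <= nx < width
                if inside and not mask[ny][nx] and prob[ny][nx] > low:
                    mask[ny][nx] = True
                    queue.append((ny, nx))
    return mask


# The 0.5 pixel touches the 0.96 seed and is kept; the isolated 0.6 pixel is not.
toy_prob = [[0.96, 0.5, 0.1, 0.6],
            [0.1, 0.1, 0.1, 0.1]]
toy_mask = hysteresis_threshold(toy_prob)
```

Compared with a single threshold at 0.45, this rule discards moderately confident responses that are not attached to any highly confident vessel pixel, while still following faint vessel segments that branch off from confident ones.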
TABLE II:
Single network segmentation performances.
| Training | Methods | Image size | Sensitivity % | Precision % | Specificity % | Accuracy % | Dice % |
|---|---|---|---|---|---|---|---|
| without fine-tuning | U-Net++ [16] | 512 × 512 | 80.87 | 74.23 | 92.67 | 89.86 | 75.88 |
| without fine-tuning | deeper U-Net++ | 512 × 512 | 84.81 | 71.72 | 91.03 | 89.44 | 75.91 |
| without fine-tuning | proposed DBAC | 512 × 512 | 87.73 | 66.78 | 88.39 | 88.00 | 74.22 |
| without fine-tuning | proposed DBAC | 1024 × 1024 | 82.14 | 75.02 | 92.08 | 89.80 | 76.86 |
| without fine-tuning | proposed DDMAC | 512 × 512 | 81.90 | 75.75 | 93.05 | 90.38 | 77.23 |
| with fine-tuning | U-Net++ [16] | 512 × 512 | 77.23 | 79.87 | 95.00 | 90.92 | 77.40 |
| with fine-tuning | deeper U-Net++ | 512 × 512 | 83.04 | 75.76 | 93.05 | 90.61 | 77.81 |
| with fine-tuning | proposed DBAC | 512 × 512 | 84.41 | 76.02 | 92.72 | 90.79 | 78.66 |
| with fine-tuning | proposed DBAC | 1024 × 1024 | 81.86 | 78.47 | 93.84 | 91.08 | 78.80 |
| with fine-tuning | proposed DDMAC | 512 × 512 | 79.59 | 80.12 | 94.76 | 91.50 | 79.16 |
As shown in Table II, fine-tuning improves the dice score of all the deep learning networks by at least 1.4 percentage points. The proposed DBAC and DDMAC networks achieve the best sensitivity (recall) and precision scores, respectively, both with and without fine-tuning. The distance-based DDMAC network achieves the best dice scores both with and without fine-tuning.
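The per-network dice gains from fine-tuning can be recomputed directly from the dice column of Table II, as in the short check below. The dictionary keys are shorthand labels introduced here for the table rows.

```python
# Dice scores (%) copied from Table II, without and with fine-tuning.
dice_without = {
    "U-Net++": 75.88,
    "deeper U-Net++": 75.91,
    "DBAC 512": 74.22,
    "DBAC 1024": 76.86,
    "DDMAC 512": 77.23,
}
dice_with = {
    "U-Net++": 77.40,
    "deeper U-Net++": 77.81,
    "DBAC 512": 78.66,
    "DBAC 1024": 78.80,
    "DDMAC 512": 79.16,
}

# Gain in percentage points, rounded to the precision of the table.
dice_gain = {
    name: round(dice_with[name] - dice_without[name], 2) for name in dice_without
}
smallest_gain = min(dice_gain.values())

for name, gain in dice_gain.items():
    print(f"{name}: +{gain:.2f}")
print(f"smallest gain: +{smallest_gain:.2f}")
```

The gains are 1.52, 1.90, 4.44, 1.94, and 1.93 percentage points, respectively. The smallest gain is for U-Net++ and the largest is for the DBAC network with 512 × 512 inputs.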
2). Ensemble network segmentation performances:
Table III summarizes the segmentation performances of different configurations of the proposed ensemble network consisting of the DBAC and DDMAC networks. The table explores two different input image sizes and two different decision fusion (late classifier fusion) mechanisms, average and maximum. All configurations of the proposed ensemble network outperform the best single network. Upsampling the DBAC input images increases the precision score (from 79.87% to 82.32% with average fusion). The ensemble network using average fusion and upsampled inputs for the proposed DBAC network improves on the dice score of the best-performing single network (79.16%) by 2.29 percentage points, reaching the best dice score of 81.45%.
TABLE III:
Ensemble network segmentation performances.
| DBAC with input size | DDMAC with input size | Fusion Mechanism | Sensitivity % | Precision % | Specificity % | Accuracy % | Dice % |
|---|---|---|---|---|---|---|---|
| 512 × 512 | 512 × 512 | average | 81.95 | 79.87 | 94.41 | 91.67 | 80.90 |
| 1024 × 1024 | 512 × 512 | average | 80.60 | 82.32 | 95.26 | 92.02 | 81.45 |
Figure 5 shows single network and ensemble network segmentation results for two sample multi-focus images. Red and blue pixels represent false-positive and false-negative predictions, respectively. False detections (false positives) are typically caused by adverse effects of adaptive histogram equalization. Missed detections (false negatives) are caused by thin vessels and low contrast between the microvasculature and the background. Fusing the outputs of the DBAC and DDMAC networks leads to the recovery of some missed vessels and the removal of some spurious detections (Figure 5f).
Fig. 5: Intermediate and final segmentation results for two sample multi-focus input images. (a) input images, (b-c) ground truth maps for the binary attention module in DBAC and the dilated probability module in DDMAC, (d-f) predicted segmentation masks versus ground truth for the DBAC, DDMAC, and ensemble networks. The red, blue, and white regions represent false-positive, false-negative, and true-positive predictions, respectively.
D. Network inference on 3D image stacks
The 3D confocal microscopy image stacks in this study contain hundreds of slices. Each single slice captures the details of the specimen regions that lie close to its focal plane, while the remaining regions are imaged with poor contrast. Segmentation of the 3D image stack allows comprehensive visualization and quantification of the anatomical structure of the 3D microvasculature. 3D deep learning networks could be employed to capture and learn the full 3D morphological and anatomical characteristics of the microvasculature. However, their demanding computational requirements and, more importantly, the need for 3D annotation for training limit their usability. In this section, we use the proposed 2D image segmentation network to independently segment the single-focus slices of a 3D confocal microscopy volume.
Figure 6 shows 2D multi-focus images, the corresponding 2D segmentation masks, and 3D segmentation masks for two sample confocal microscopy volumes. The 2D multi-focus images were obtained using the multi-focus fusion method described in [13]. The 3D segmentation masks were obtained by applying the proposed 2D segmentation network to the individual single-focus images forming the confocal microscopy volume. Each input slice was preprocessed with linear contrast enhancement. Visualizations of the 3D segmentation results were generated using the Chimera software [22]. Figures 6c and 6d show promising 3D segmentation results. In the first row of Figure 6, missed detections in 2D (Figure 6b, blue pixels within green rectangles) caused by low contrast are recovered in 3D (Figures 6c and 6d) thanks to linear contrast enhancement. In the second row of Figure 6, missed detections in 3D (Figure 6c, green rectangle) caused by the lack of semantic context in each slice can be corrected using the segmentation result of the corresponding 2D multi-focus image (second row in Figure 6b).
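The slice-wise procedure described above can be sketched as follows: each slice is contrast-stretched on its own and then passed to a 2D segmenter, and the resulting masks are stacked back into a volume. This is only a schematic illustration. `stretch_contrast` assumes that linear contrast enhancement means min-max rescaling, and `threshold_segmenter` is a hypothetical placeholder standing in for the trained 2D network.

```python
def stretch_contrast(slice_2d):
    """Linearly rescale the intensities of one slice to [0, 1] (min-max, assumed)."""
    lowest = min(min(row) for row in slice_2d)
    highest = max(max(row) for row in slice_2d)
    if highest == lowest:
        # A constant slice carries no contrast; map it to zeros.
        return [[0.0 for _ in row] for row in slice_2d]
    span = highest - lowest
    return [[(value - lowest) / span for value in row] for row in slice_2d]


def segment_stack(stack, segment_2d):
    """Segment every slice of a Z-stack independently and restack the masks."""
    return [segment_2d(stretch_contrast(slice_2d)) for slice_2d in stack]


def threshold_segmenter(slice_2d, threshold=0.5):
    """Placeholder for the 2D segmentation network (hypothetical stand-in)."""
    return [[1 if value >= threshold else 0 for value in row] for row in slice_2d]


# Two toy 2x2 slices: one with structure, one uniformly bright and empty.
toy_stack = [
    [[10, 20], [30, 50]],
    [[100, 100], [100, 100]],
]
toy_volume_mask = segment_stack(toy_stack, threshold_segmenter)
```

Because every slice is processed in isolation, no information is shared along the Z axis. This matches the limitation noted above, where a slice can lack the semantic context available in the fused multi-focus image.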
Fig. 6: Sample outputs from the proposed system. (a) multi-focus confocal microscopy image enhanced with adaptive histogram equalization; (b) predicted segmentation mask versus ground truth, where the red, blue, and white regions represent false-positive, false-negative, and true-positive predictions, respectively; (c-d) 3D segmentation masks obtained by applying the proposed 2D segmentation network to the individual single-focus images forming the confocal microscopy volume. Visualizations of the 3D segmentation results were generated using the Chimera software [22]. Color fades with increasing depth.
IV. Conclusion
In this paper, we presented an ensemble of deep learning cascades for robust segmentation of blood vessels in confocal microscopy images. The proposed ensemble is composed of two complementary deep-learning cascades that aim to improve the sensitivity and the precision of the segmentation results, respectively. The proposed cascades first learn to predict two soft attention maps, one based on binary pixel classification and the other based on regression to a distance map. The attention maps guide the networks to predict an accurate vessel segmentation mask. Experiments demonstrated promising results towards segmentation of microvasculature in both 2D and 3D datasets. Segmentation is the key first step towards objective and quantitative analysis of microvascular systems. The proposed segmentation system will be used to study microvascular remodeling.
Acknowledgement
This work is partially supported by awards from the U.S. NIH National Institute of Neurological Disorders and Stroke (R01NS110915). Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the U.S. Government or any agency thereof.
References
- [1] Gupta A and Periakaruppan A, “Intracranial dural arteriovenous fistulas: a review,” The Indian Journal of Radiology & Imaging, vol. 19, no. 1, p. 43, 2009.
- [2] Aboian MS, Daniels DJ, Rammos SK, Pozzati E, and Lanzino G, “The putative role of the venous system in the genesis of vascular malformations,” Neurosurgical Focus, vol. 27, no. 5, p. E9, 2009.
- [3] Glinskii OV, Huxley VH, Glinskii VV, Rubin LJ, and Glinsky VV, “Pulsed estrogen therapy prevents post-ovx porcine dura mater microvascular network weakening via a pdgf-bb-dependent mechanism,” PloS One, vol. 8, no. 12, p. e82900, 2013.
- [4] Wilson T, “Resolution and optical sectioning in the confocal microscope,” Journal of Microscopy, vol. 244, no. 2, pp. 113–121, 2011.
- [5] Moccia S, De Momi E, El Hadji S, and Mattos LS, “Blood vessel segmentation algorithms—review of methods, datasets and evaluation metrics,” Computer Methods and Programs in Biomedicine, vol. 158, pp. 71–91, 2018.
- [6] Zhao F, Chen Y, Hou Y, and He X, “Segmentation of blood vessels using rule-based and machine-learning-based methods: a review,” Multimedia Systems, vol. 25, no. 2, pp. 109–118, 2019.
- [7] Kassim YM, Glinskii OV, Glinsky VV, Huxley VH, Guidoboni G, and Palaniappan K, “Deep u-net regression and hand-crafted feature fusion for accurate blood vessel segmentation,” in IEEE Intl. Conf. Image Processing (ICIP), 2019, pp. 1445–1449.
- [8] Palaniappan K, Bunyak F, and Chaurasia SS, “Image analysis for ophthalmology: Segmentation and quantification of retinal vascular systems,” in Ocular Fluid Dynamics. Springer, 2019, pp. 543–580.
- [9] Mittal K and Rajam VMA, “Computerized retinal image analysis-a survey,” Multimedia Tools and Applications, vol. 79, pp. 22389–22421, 2020.
- [10] Abdulsahib AA, Mahmoud MA, Mohammed MA, Rasheed HH, Mostafa SA, and Maashi MS, “Comprehensive review of retinal blood vessel segmentation and classification techniques: Intelligent solutions for green computing in medical images, current challenges, open issues, and knowledge gaps in fundus medical images,” Network Modeling Analysis in Health Informatics and Bioinformatics, vol. 10, no. 1, pp. 1–32, 2021.
- [11] Kassim YM, Maude RJ, and Palaniappan K, “Sensitivity of cross-trained deep cnns for retinal vessel extraction,” in Annual Intl. Conf. IEEE Engineering in Medicine and Biology Society (EMBC), 2018, pp. 2736–2739.
- [12] Kassim YM and Palaniappan K, “Extracting retinal vascular networks using deep learning architecture,” in IEEE Intl. Conf. Bioinformatics and Biomedicine (BIBM), 2017, pp. 1170–1174.
- [13] Shuvo MMH, Kassim YM, Bunyak F, Glinskii OV, Xie L, Glinsky VV, Huxley VH, Thakkar MM, and Palaniappan K, “Multi-focus image fusion for confocal microscopy using u-net regression map,” in IEEE Intl. Conf. Pattern Recognition (ICPR), 2021, pp. 4317–4323.
- [14] Pizer SM, Amburn EP, Austin JD, Cromartie R, Geselowitz A, Greer T, ter Haar Romeny B, Zimmerman JB, and Zuiderveld K, “Adaptive histogram equalization and its variations,” Computer Vision, Graphics, and Image Processing, vol. 39, no. 3, pp. 355–368, 1987.
- [15] Ronneberger O, Fischer P, and Brox T, “U-net: Convolutional networks for biomedical image segmentation,” in Intl. Conf. Medical Image Computing and Computer-assisted Intervention. Springer, 2015, pp. 234–241.
- [16] Zhou Z, Siddiquee MMR, Tajbakhsh N, and Liang J, “Unet++: A nested u-net architecture for medical image segmentation,” in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Springer, 2018, pp. 3–11.
- [17] He K, Zhang X, Ren S, and Sun J, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conf. Computer Vision and Pattern Recognition, 2016, pp. 770–778.
- [18] Kassim YM, Prasath VS, Pelapur R, Glinskii OV, Maude RJ, Glinsky VV, Huxley VH, and Palaniappan K, “Random forests for dura mater microvasculature segmentation using epifluorescence images,” in Annual Intl. Conf. IEEE Engineering in Medicine and Biology Society (EMBC), 2016, pp. 2901–2904.
- [19] Murphy KP, Machine learning: a probabilistic perspective. MIT Press, 2012.
- [20] Mayergoyz ID, Mathematical models of hysteresis and their applications. Academic Press, 2003.
- [21] Glinskii OV, Huxley VH, Xie L, Bunyak F, Palaniappan K, and Glinsky VV, “Complex non-sinus-associated pachymeningeal lymphatic structures: interrelationship with blood microvasculature,” Frontiers in Physiology, vol. 10, p. 1364, 2019.
- [22] Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, and Ferrin TE, “UCSF Chimera—a visualization system for exploratory research and analysis,” J. Computational Chemistry, vol. 25, no. 13, pp. 1605–1612, 2004.
