. Author manuscript; available in PMC: 2024 Sep 28.
Published in final edited form as: Proc SPIE Int Soc Opt Eng. 2023 Sep 28;12655:1265506. doi: 10.1117/12.2676893

Cascaded convolutional networks for unsupervised brain tissue segmentation and bias field estimation

Hongming Li a,b, Yong Fan a,b
PMCID: PMC10795010  NIHMSID: NIHMS1954001  PMID: 38250086

Abstract

Brain tissue segmentation from MR images is a critical step for quantifying the brain morphology in neuroimaging studies. While deep learning (DL) based brain tissue segmentation methods have achieved promising performance, most of them are built upon supervised learning and therefore their performance is bounded by the training data used and limited by the small size of datasets with manual segmentation labels. To leverage the large amount of unlabeled brain imaging data, we develop an unsupervised DL model for joint brain tissue segmentation and bias field estimation using cascaded convolutional networks. The proposed DL model consists of multiple cascaded bias field estimation modules and one segmentation module. The bias field estimation modules are applied to the input image for estimating the bias field and generating a bias-free image recursively, and the bias field corrected image is then fed into the segmentation module to obtain the brain tissue segmentation result. A Gaussian mixture model is adopted to characterize the bias-free image with tissue-specific intensity statistics and the model fitting error is adopted as the loss function to guide the optimization of the model parameters progressively in an unsupervised setting. We have evaluated the proposed method on the HCP-Aging and HCP-Development datasets. Quantitative results have demonstrated that our unsupervised DL model could obtain competitive bias field correction and segmentation performance, compared with state-of-the-art bias field correction methods and unsupervised segmentation methods.

Keywords: Brain tissue segmentation, bias field estimation, unsupervised learning, cascaded convolutional networks

1. INTRODUCTION

Segmentation of brain MRI scans is often the first and most critical step for quantifying brain morphology and for providing anatomical reference information for other types of imaging data in neuroimaging studies [1]. FSL, SPM, and FreeSurfer are among the most popular tools for segmenting brain MRI scans [2-4]. Since obtaining quantitative brain measures with these tools is computationally demanding, deep learning (DL) based image segmentation methods have been developed [5, 6]. These methods achieve promising performance in terms of speed, with accuracy comparable to or better than that of the popular conventional algorithms.

Most of the DL methods for brain tissue segmentation are built upon supervised learning [5-8]. These methods use fully convolutional networks (FCNs) to learn a mapping from structural brain MRI scans to their corresponding brain tissue segmentation maps. In particular, a supervised DL brain tissue segmentation model has been built by learning from training data, i.e., pairs of brain MRI scans and their brain tissue segmentation results generated by ANTs [7]. Such a model can also be built by learning from training data generated by multiple different brain segmentation tools, including FSL and SPM, so that the trained model is not biased toward any specific tool used to generate the training data [8]. Although these methods achieve promising computational efficiency, their accuracy is bounded by the training data used to train their brain tissue segmentation models. Furthermore, the potential of supervised DL methods in medical image analysis is often limited by the relatively small size of medical imaging datasets, compared with large-scale datasets in computer vision studies, e.g., millions of images in ImageNet. Although data augmentation helps increase the number of training examples and reduce overfitting by introducing random variations to the original data [9], unsupervised DL is a promising alternative for building brain tissue segmentation models: an unlimited number of training samples is available in theory, and most state-of-the-art brain tissue segmentation methods, including FSL, SPM, and FreeSurfer, are built upon unsupervised learning.

On the other hand, most DL methods for brain tissue segmentation are applied to bias field corrected brain images, generated by removing the bias field from the original scans. N4ITK [10] is one of the most popular tools for bias field removal, and the quality of the bias-free image it generates is largely determined by the convergence setting, i.e., the number of iterations, which may be computationally expensive for images with high spatial resolution. Several DL based methods have been developed to accelerate bias field correction [11-13]. In particular, an iterative bias field correction and image segmentation framework is proposed in [13], where a preliminary image segmentation is first generated and subsequently used to estimate the bias field at each iteration step. An end-to-end learning framework is proposed in [11] to generate the bias-free image and the bias field by integrating segmentation, adversarial, and reconstruction losses to optimize the DL model. A cycle-GAN [14] based method is proposed in [12] to directly learn a mapping from the raw image to a bias field corrected image, where the bias field corrected images used for training are generated using N4ITK. All these methods are built upon supervised learning and require some supervision information for model training, i.e., segmentation maps [11, 13] or bias-free images [12].

To overcome limitations of the existing brain tissue segmentation and bias field correction techniques, we develop an unsupervised DL model for joint brain tissue segmentation and bias field estimation using cascaded convolutional networks. The proposed model consists of multiple cascaded bias field estimation modules and one segmentation module. The bias field estimation modules are applied to the input image to estimate the bias field and generate a bias field corrected image recursively, and the bias field corrected image is fed into the segmentation module to obtain the brain tissue segmentation result. A Gaussian mixture model is adopted to characterize the bias-free image with tissue-specific intensity statistics and the model fitting error is adopted as the loss function to optimize the model parameters progressively in an unsupervised setting. We have evaluated the proposed method using the HCP-Aging and HCP-Development datasets, and experimental results have demonstrated that our method could obtain competitive performance for bias field correction and brain tissue segmentation, compared with state-of-the-art bias field correction and unsupervised segmentation methods.

2. METHODS

A schematic illustration of the proposed method for unsupervised brain tissue segmentation and bias field estimation using cascaded convolutional networks is shown in Fig. 1. The model consists of three cascaded bias field estimation modules (DNN-b) and one segmentation module (DNN-S). The bias field estimation modules estimate the bias field recursively, each taking the bias field corrected image output by the previous module as input (the raw image is fed into the first module). The bias field corrected image output by the last DNN-b module is fed into the segmentation module to obtain the brain tissue segmentation.

Figure 1.


Schematic illustration of the proposed cascaded convolutional networks for unsupervised brain tissue segmentation and bias field estimation, consisting of N=3 bias field estimation modules (DNN-b) and 1 segmentation module (DNN-S). DNN: deep neural network, I: input image (raw or bias field corrected), B: bias field, S: brain tissue segmentation, u and σ: intensity statistics for brain tissues (GM, WM, CSF), E: encoder path, D: decoder path, Conv: convolutional layer, F: bottle-neck feature maps, GM: gray matter, WM: white matter, and CSF: cerebrospinal fluid.

2.1. Bias field estimation module

Assuming that the bias field $B$ is a multiplicative and spatially smooth map, and that the intensity distributions of brain tissues (GM: gray matter, WM: white matter, and CSF: cerebrospinal fluid) in the bias-free image can be approximated by a Gaussian mixture model, the probability of observing the raw image $I$ can be formulated as

$$p(I) = \prod_{v=1}^{V} \sum_{j=1}^{J} \frac{\gamma_j}{\sqrt{2\pi}\,B_v\sigma_j} \exp\left(-\frac{(I_v - B_v\mu_j)^2}{2(B_v\sigma_j)^2}\right), \qquad (1)$$

where $I_v$ is the image intensity at voxel $v$ ($v = 1, 2, \ldots, V$), $B_v$ is the bias field value at voxel $v$, $\mu_j$ and $\sigma_j$ are the mean and standard deviation of the intensity distribution of tissue $j$ ($j = 1, 2, \ldots, J$; $J = 3$ for GM, WM, and CSF) in the bias-free image, and $\gamma_j$ is the prior probability of tissue $j$. The products $B_v\mu_j$ and $B_v\sigma_j$ can be regarded as the local intensity statistics of tissue $j$ at voxel $v$ [15] in the raw image. To approximate the image $I$, the bias field $B$ and the tissue intensity statistics $\mu_j$ and $\sigma_j$ should be optimized to maximize $p(I)$. It is worth noting that the bias field $B$ should be spatially smooth and contain no anatomical information. Instead of using the Expectation Maximization (EM) algorithm for this optimization, we adopt a deep neural network (the bias field estimation module) to estimate the bias field and tissue intensity statistics directly.

The network architecture of the bias field estimation module is illustrated in Figure 1 (top right). An encoder path, consisting of five convolutional layers with 8 to 128 filters followed by max pooling layers, is adopted to learn semantic feature maps $F$ from the input image $I$. The encoder is followed by two network branches that estimate the image intensity statistics ($\mu_j$ and $\sigma_j$) for the different brain tissues and the bias field $B$, respectively. The branch for intensity statistics estimation consists of two convolutional layers with 64 and 32 filters and two output (convolutional) layers with 3 filters followed by global average pooling, which estimate the mean and standard deviation of the intensities of GM, WM, and CSF, respectively. The branch for bias field estimation consists of one convolutional layer with 16 filters and one output (convolutional) layer with 1 filter to estimate the bias field at a coarse spatial resolution, which is then spatially upsampled using cubic spline interpolation to obtain a spatially smooth bias field at the original resolution. Instance normalization [16] and the Leaky ReLU [17] activation function are used for all convolutional layers, while the Sigmoid function and the exponential function are used for the output layers of the intensity statistics branch and the bias field branch, respectively. The kernel size in all layers is set to 3×3×3. An unsupervised loss function, which measures the negative log of the joint brain tissue probability of individual voxels computed from the estimated bias field and tissue-specific intensity statistics, is adopted to optimize the module parameters:
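The coarse-to-fine upsampling step can be sketched as follows. This is a minimal illustration, not the authors' code: SciPy's cubic spline `zoom` stands in for the interpolation, and the array shapes are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import zoom

def upsample_bias(coarse_bias, target_shape):
    """Cubic spline upsampling of a coarse, positive bias-field map
    to the original image resolution (illustrative sketch)."""
    factors = [t / c for t, c in zip(target_shape, coarse_bias.shape)]
    fine = zoom(coarse_bias, factors, order=3)  # order=3: cubic spline
    # keep the multiplicative field strictly positive despite spline overshoot
    return np.clip(fine, 1e-3, None)

# hypothetical shapes: coarse 8^3 estimate upsampled to a 32^3 volume
coarse = 1.0 + 0.1 * np.random.rand(8, 8, 8)
fine = upsample_bias(coarse, (32, 32, 32))
```

Because the coarse map has far fewer degrees of freedom than the image, the upsampled field is smooth by construction, matching the requirement that $B$ carry no anatomical detail.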

$$L_{bc}(I) = -\sum_{v=1}^{V} \log \sum_{j=1}^{J} \frac{\gamma_j}{\sqrt{2\pi}\,B_v\sigma_j} \exp\left(-\frac{(I_v - B_v\mu_j)^2}{2(B_v\sigma_j)^2}\right). \qquad (2)$$
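As a concrete illustration, the loss in Eq. (2) can be computed directly in NumPy. This is a sketch under stated assumptions: intensities and bias field are flattened to vectors, and a small constant is added inside the logarithm for numerical stability.

```python
import numpy as np

def gmm_bias_loss(I, B, mu, sigma, gamma):
    """L_bc of Eq. (2): negative log-likelihood of voxel intensities under
    a Gaussian mixture whose tissue statistics are modulated voxel-wise by
    the multiplicative bias field B."""
    I = np.asarray(I, float).reshape(-1, 1)   # (V, 1) intensities
    B = np.asarray(B, float).reshape(-1, 1)   # (V, 1) bias field values
    mu, sigma, gamma = map(np.asarray, (mu, sigma, gamma))  # (J,) each
    m, s = B * mu, B * sigma                  # local statistics B_v*mu_j, B_v*sigma_j
    p = gamma / (np.sqrt(2 * np.pi) * s) * np.exp(-(I - m) ** 2 / (2 * s ** 2))
    return -np.sum(np.log(p.sum(axis=1) + 1e-12))
```

Minimizing this quantity with respect to the network outputs ($B$, $\mu_j$, $\sigma_j$) is equivalent to maximizing $p(I)$ in Eq. (1), since Eq. (2) is its negative logarithm.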

Given a bias field $B$, the bias field corrected image can be obtained as $I_{bc} = I/B$. As shown in Figure 1, three cascaded bias field estimation modules are adopted to obtain the bias field corrected brain image recursively. Specifically, the three bias field estimation modules ($i = 1, 2, 3$) share the same network architecture and are trained recursively, each taking the bias field corrected image output by the previous module as input (the raw image is fed into the first module). The bias field corrected image output by the last module is regarded as the bias-free image $I^{bf}$ and fed into the segmentation module to obtain the brain tissue segmentation.
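The recursive correction amounts to a simple loop, which can be sketched as follows. The stand-in "modules" here are hypothetical callables (the real DNN-b networks return a smooth multiplicative field); note that the overall field removed is the product of the per-module fields.

```python
import numpy as np

def cascade_correct(image, bias_modules):
    """Apply each bias estimation module in turn; every module sees the
    output of the previous one and returns a multiplicative bias field."""
    corrected = np.asarray(image, float)
    total_bias = np.ones_like(corrected)
    for estimate_bias in bias_modules:   # DNN-b modules, i = 1, 2, 3
        B = estimate_bias(corrected)
        corrected = corrected / B        # I_bc = I / B
        total_bias = total_bias * B      # cumulative field over the cascade
    return corrected, total_bias

# toy stand-ins: each "module" reports a constant residual field of 2.0
img = np.full((4, 4), 8.0)
mods = [lambda x: np.full_like(x, 2.0), lambda x: np.full_like(x, 2.0)]
out, B_total = cascade_correct(img, mods)   # out == 2.0, B_total == 4.0
```

Each stage only needs to remove the residual inhomogeneity left by the stages before it, which is why a few shallow modules in cascade can match a single, more aggressive correction.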

2.2. Brain tissue segmentation module

The backbone of the brain tissue segmentation module (Figure 1, bottom right) is a U-Net with an encoder-decoder architecture [18]. The bias-free image $I^{bf}$ from the previous module is adopted as the input, and the output consists of three segmentation maps $S_j$ ($j = 1, 2, 3$) for GM, WM, and CSF, respectively. In particular, the encoder path consists of five convolutional layers with 8 to 128 filters, the first four of which are each followed by a max pooling layer. The decoder path consists of four deconvolutional layers with 128, 64, 32, and 16 filters and a stride of 2 for upsampling, each followed by one additional convolutional layer with 64, 32, 16, and 8 filters and a stride of 1. One output (convolutional) layer with 3 filters is used to predict the segmentation maps $S_j$. Instance normalization [16] is used for all convolutional layers. The Leaky ReLU [17] activation function is used for all convolutional and deconvolutional layers, and the SoftMax function is used for the output layer. The kernel size in all layers is set to 3×3×3. The tissue-specific intensity statistics are computed from the segmentation maps $S_j$ and the bias-free image $I^{bf}$ as $\mu_j = \frac{\sum_{v=1}^{V} S_{j,v} I_v^{bf}}{\sum_{v=1}^{V} S_{j,v}}$ and $\sigma_j^2 = \frac{\sum_{v=1}^{V} S_{j,v} (I_v^{bf} - \mu_j)^2}{\sum_{v=1}^{V} S_{j,v}}$, where $S_{j,v}$ is the probability of voxel $v$ belonging to tissue $j$ and $I_v^{bf}$ is the intensity of voxel $v$ in the bias-free image. The negative log of the joint brain tissue probability of individual voxels, computed using these intensity statistics, is adopted as the loss function to optimize the segmentation module in an unsupervised setting:

$$L_{seg}(I^{bf}) = -\sum_{v=1}^{V} \log \sum_{j=1}^{J} \frac{\gamma_j}{\sqrt{2\pi}\,\sigma_j} \exp\left(-\frac{(I_v^{bf} - \mu_j)^2}{2\sigma_j^2}\right). \qquad (3)$$

The final segmentation label is obtained by assigning each voxel to the brain tissue with the largest probability, $\mathrm{Label}_v = \arg\max_{j=1,2,3} S_{j,v}$.
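The probability-weighted statistics and the final argmax labeling can be sketched in NumPy; the one-hot segmentation maps below are an illustrative assumption (in practice $S$ contains soft probabilities from the SoftMax output).

```python
import numpy as np

def tissue_stats(S, I):
    """Probability-weighted mean and std per tissue from soft maps S of
    shape (J, V) and a flattened bias-free image I of shape (V,)."""
    w = S.sum(axis=1, keepdims=True)                       # sum_v S_{j,v}
    mu = (S * I).sum(axis=1, keepdims=True) / w            # weighted means
    var = (S * (I - mu) ** 2).sum(axis=1, keepdims=True) / w
    return mu.ravel(), np.sqrt(var).ravel()

# hard (one-hot) maps recover the per-tissue sample statistics exactly
I = np.array([1.0, 1.0, 5.0, 5.0, 9.0, 9.0])
S = np.array([[1, 1, 0, 0, 0, 0],
              [0, 0, 1, 1, 0, 0],
              [0, 0, 0, 0, 1, 1]], dtype=float)
mu, sigma = tissue_stats(S, I)       # mu == [1, 5, 9]
labels = S.argmax(axis=0)            # Label_v = argmax_j S_{j,v}
```

Because $\mu_j$ and $\sigma_j$ are differentiable functions of the soft maps $S_j$, the loss in Eq. (3) propagates gradients through both the statistics and the segmentation network.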

3. EXPERIMENTAL RESULTS

The proposed method has been evaluated using brain MR images from the HCP-Aging [19] and HCP-Development [20] datasets. In particular, brain MR images of 650 randomly selected subjects from the HCP-Aging dataset were used to train the proposed cascaded convolutional networks, and the trained model was evaluated on images of 50 randomly selected testing subjects from each of the HCP-Aging and HCP-Development datasets. The proposed method was compared with state-of-the-art unsupervised brain tissue segmentation methods, including FAST in FSL [3] and Atropos [21] in ANTs [22]. The Dice index was used to evaluate segmentation accuracy, with the FreeSurfer segmentation results as references. The Dice index for CSF was not included because only ventricular CSF was labeled in the FreeSurfer segmentation (CSF in the cranial subarachnoid space was not). In addition, the quality of the bias-free images obtained by the proposed method was compared with that of state-of-the-art bias field correction methods, including the bias field correction performed by FAST and the N4ITK correction used in the Atropos segmentation. Quantitative measures, including peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM), were used to evaluate bias field correction performance, with the restored images from the HCP-Aging and HCP-Development datasets as references. The default settings of FAST and Atropos were used in the experiments.

The proposed method was implemented using PyTorch [23]. The Adam optimizer [24] was adopted to optimize the networks; the learning rate was set to 1 × 10−4, the batch size was set to 1, and the number of iterations was set to 20,000 for each bias field estimation module and for the brain tissue segmentation module. One NVIDIA TITAN Xp GPU with 12 GB of memory was used for training and testing. The hyperparameters γj (j = 1, 2, 3) were set to 0.190, 0.486, and 0.324 for CSF, GM, and WM, respectively, equal to the ratio of the number of voxels of each tissue to the total number of voxels within the brain, based on the tissue prior maps available in FSL and SPM.
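The progressive, module-by-module schedule described above can be sketched as follows. This is a schematic, not the authors' training code: `train_step` is a hypothetical stand-in for one Adam update (lr = 1e-4, batch size 1), and the toy run at the end uses numbers in place of networks purely to exercise the control flow.

```python
def train_progressively(raw_images, bias_modules, seg_module,
                        iters_per_module, train_step):
    """Optimize each DNN-b module in turn on the output of the previous
    stage, then optimize DNN-S on the final bias-corrected image.
    train_step(module, image) is assumed to run one update and return
    the module's current output for that image."""
    current = raw_images
    for module in bias_modules:                  # DNN-b modules, i = 1, 2, 3
        for _ in range(iters_per_module):        # e.g., 20,000 iterations
            corrected = train_step(module, current)
        current = corrected                      # feed forward to next stage
    for _ in range(iters_per_module):
        seg = train_step(seg_module, current)    # DNN-S on the bias-free image
    return seg

# toy run: "modules" are numbers, one "update" just adds the module's value
result = train_progressively(0, [1, 2], 10, 1, lambda m, x: x + m)  # -> 13
```

Training the modules sequentially means each stage is optimized against a fixed input distribution, which keeps the unsupervised GMM losses in Eqs. (2) and (3) well conditioned at every step.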

The bias field corrected images and brain tissue segmentation maps obtained by the different methods for one randomly selected testing subject are shown in Figure 2. As shown in the last column, the proposed cascaded networks can successfully remove the bias field contained in the original image, facilitating accurate brain tissue segmentation. The quantitative evaluation measures for bias field correction and brain tissue segmentation are summarized in Tables 1 and 2 for testing images from the HCP-Aging and HCP-Development datasets, respectively. These results demonstrate that the quality of the bias-free images obtained by the proposed method was competitive with that obtained by the state-of-the-art methods. Moreover, they also demonstrate that our unsupervised DL based brain tissue segmentation model could obtain comparable or better segmentation accuracy than state-of-the-art unsupervised methods. It is worth noting that the proposed cascaded networks trained on the HCP-Aging dataset obtained similar performance on the HCP-Development dataset, indicating promising generalization.

Figure 2.


Bias field corrected images (top) and brain tissue segmentation maps (bottom) obtained by FAST, Atropos, and the proposed cascaded convolutional networks, for one randomly selected testing subject from the HCP-Aging dataset.

Table 1.

Quantitative evaluation of brain tissue segmentation and bias field correction on HCP-Aging dataset (mean±std).

Methods          Bias field correction         Segmentation
                 PSNR           SSIM           Dice (GM)      Dice (WM)
FAST (FSL)       24.973±0.934   0.990±0.002    0.715±0.017    0.799±0.015
Atropos (ANTs)   26.518±0.986   0.981±0.004    0.747±0.028    0.828±0.020
Proposed         26.597±0.670   0.988±0.002    0.781±0.018    0.844±0.014

Table 2.

Quantitative evaluation of brain tissue segmentation and bias field correction on HCP-Development dataset (mean±std).

Methods          Bias field correction         Segmentation
                 PSNR           SSIM           Dice (GM)      Dice (WM)
FAST (FSL)       23.765±0.726   0.986±0.003    0.734±0.017    0.775±0.022
Atropos (ANTs)   23.388±1.723   0.972±0.013    0.758±0.023    0.798±0.025
Proposed         24.768±0.967   0.982±0.004    0.814±0.009    0.833±0.014

4. CONCLUSION

Brain tissue segmentation from MR images is a critical step for quantifying the brain morphology in neuroimaging studies. To leverage the large amount of unlabeled brain imaging data, an unsupervised deep learning model is proposed to perform brain tissue segmentation and bias field estimation jointly using cascaded convolutional networks, consisting of multiple cascaded bias field estimation modules to estimate the bias field recursively and one segmentation module for brain tissue segmentation. Quantitative evaluation has demonstrated that the proposed method could accurately estimate the bias field and obtain competitive segmentation performance compared with state-of-the-art methods.

ACKNOWLEDGEMENTS

This work was partially supported by NIH grants of R01-AG066650 and R01-EB022573.

REFERENCES

  • [1]. Despotovic I, Goossens B, and Philips W, "MRI Segmentation of the Human Brain: Challenges, Methods, and Applications," Computational and Mathematical Methods in Medicine, 2015 (2015).
  • [2]. Fischl B, "FreeSurfer," Neuroimage, 62(2), 774–781 (2012).
  • [3]. Jenkinson M, Beckmann CF, Behrens TE et al., "FSL," Neuroimage, 62(2), 782–790 (2012).
  • [4]. Penny WD, Friston KJ, Ashburner JT et al., [Statistical Parametric Mapping: The Analysis of Functional Brain Images], Elsevier (2011).
  • [5]. Akkus Z, Galimzianova A, Hoogi A et al., "Deep Learning for Brain MRI Segmentation: State of the Art and Future Directions," Journal of Digital Imaging, 30(4), 449–459 (2017).
  • [6]. Litjens G, Kooi T, Bejnordi BE et al., "A survey on deep learning in medical image analysis," Medical Image Analysis, 42, 60–88 (2017).
  • [7]. Cullen NC, and Avants BB, "Convolutional neural networks for rapid and simultaneous brain extraction and tissue segmentation," Brain Morphometry, 13–34 (2018).
  • [8]. Rajchl M, Pawlowski N, Rueckert D et al., "NeuroNet: fast and robust reproduction of multiple brain image segmentation pipelines," arXiv preprint arXiv:1806.04224 (2018).
  • [9]. Huo YK, Xu ZB, Xiong YX et al., "3D whole brain segmentation using spatially localized atlas network tiles," Neuroimage, 194, 105–119 (2019).
  • [10]. Tustison NJ, Avants BB, Cook PA et al., "N4ITK: Improved N3 Bias Correction," IEEE Transactions on Medical Imaging, 29(6), 1310–1320 (2010).
  • [11]. Goldfryd T, Gordon S, and Raviv TR, "Deep semi-supervised bias field correction of MR images," 1836–1840.
  • [12]. Dai XJ, Lei Y, Liu YZ et al., "Intensity non-uniformity correction in MR imaging using residual cycle generative adversarial network," Physics in Medicine and Biology, 65(21) (2020).
  • [13]. Wan F, Smedby Ö, and Wang C, "Simultaneous MR knee image segmentation and bias field correction using deep learning and partial convolution," 10949, 61–67.
  • [14]. Zhu J-Y, Park T, Isola P et al., "Unpaired image-to-image translation using cycle-consistent adversarial networks," 2223–2232.
  • [15]. Li C, Xu C, Anderson AW et al., "MRI tissue classification and bias field estimation based on coherent local intensity clustering: A unified energy minimization framework," 288–299.
  • [16]. Ulyanov D, Vedaldi A, and Lempitsky V, "Instance normalization: The missing ingredient for fast stylization," arXiv preprint arXiv:1607.08022 (2016).
  • [17]. Maas AL, Hannun AY, and Ng AY, "Rectifier nonlinearities improve neural network acoustic models," 30, 3.
  • [18]. Ronneberger O, Fischer P, and Brox T, "U-Net: Convolutional networks for biomedical image segmentation," 234–241.
  • [19]. Bookheimer SY, Salat DH, Terpstra M et al., "The Lifespan Human Connectome Project in Aging: An overview," Neuroimage, 185, 335–348 (2019).
  • [20]. Somerville LH, Bookheimer SY, Buckner RL et al., "The Lifespan Human Connectome Project in Development: A large-scale study of brain connectivity development in 5–21 year olds," Neuroimage, 183, 456–468 (2018).
  • [21]. Avants BB, Tustison NJ, Wu J et al., "An Open Source Multivariate Framework for n-Tissue Segmentation with Evaluation on Public Data," Neuroinformatics, 9(4), 381–400 (2011).
  • [22]. Avants BB, Tustison NJ, Song G et al., "A reproducible evaluation of ANTs similarity metric performance in brain image registration," Neuroimage, 54(3), 2033–2044 (2011).
  • [23]. Paszke A, Gross S, Massa F et al., "PyTorch: An imperative style, high-performance deep learning library," Advances in Neural Information Processing Systems, 32 (2019).
  • [24]. Kingma DP, and Ba J, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980 (2014).
