. Author manuscript; available in PMC: 2021 Feb 27.
Published in final edited form as: Mach Learn Med Imaging. 2020 Sep 29;12436:384–393. doi: 10.1007/978-3-030-59861-7_39

Anatomy-Guided Convolutional Neural Network for Motion Correction in Fetal Brain MRI

Yuchen Pei 1,2, Lisheng Wang 1, Fenqiang Zhao 2, Tao Zhong 2, Lufan Liao 2, Dinggang Shen 2, Gang Li 2
PMCID: PMC7912521  NIHMSID: NIHMS1666981  PMID: 33644782

Abstract

Fetal Magnetic Resonance Imaging (MRI) is challenged by fetal movements and maternal breathing. Although fast MRI sequences allow artifact-free acquisition of individual 2D slices, motion commonly occurs between slice acquisitions. Motion correction for each slice is thus very important for the reconstruction of 3D fetal brain MRI, but is highly operator-dependent and time-consuming. Approaches based on convolutional neural networks (CNNs) have achieved encouraging performance in predicting the 3D motion parameters of arbitrarily oriented 2D slices, but they do not capitalize on important brain structural information. To address this problem, we propose a new multi-task learning framework that jointly learns the transformation parameters and tissue segmentation map of each slice, providing brain anatomical information to guide the mapping from 2D slices to 3D volumetric space in a coarse-to-fine manner. In the coarse stage, the first network learns features shared by the regression and segmentation tasks. In the refinement stage, to fully utilize the anatomical information, distance maps constructed from the coarse segmentation are introduced to the second network. Incorporating these signed distance maps to jointly guide regression and segmentation improves the performance of both tasks. Experimental results indicate that the proposed method simultaneously reduces the motion prediction error and obtains satisfactory tissue segmentation results, outperforming state-of-the-art methods.

Keywords: Fetal brain, Motion correction, Anatomical knowledge

1. Introduction

Reconstruction of 3D fetal brain images from multiple motion-corrupted stacks of 2D slices plays an important role in modeling and quantifying prenatal brain development [1,19]. However, high-quality volume reconstruction remains challenging due to motion commonly occurring between slice acquisitions. In particular, fetuses in mid-gestation have relatively large space to stretch and rotate. Arbitrary fetal motion can invalidate slice alignment, making manual intervention necessary. However, manual motion correction of each slice is often unfeasible in practice, due to the sheer amount of image data involved. Although methods such as slice-to-volume registration [3,12] have been successfully applied to fetal brain reconstruction, they require coarsely aligned slices to initialize the reconstruction process. Therefore, good initial alignment of 2D slices is critical for reconstructing brain volumetric images.

Recently, there has been increasing interest in applying deep learning techniques to medical image computing, inspired by the promising results achieved in computer vision. In an effort to speed up slice-to-volume rigid registration and improve its capture range, Miao et al. [9,10] first proposed a registration algorithm using CNN regressors on X-ray images. In fetal MRI, deep-learning-based methods [4,5,13] have also been used to improve the prediction of slice transformation parameters for motion correction. These approaches are similar to [7] in computer vision, which relied upon image retrieval [14], matching the intensity information of the testing image against retrieved images to predict the camera pose. While such methods are powerful in learning to predict the brain position from intensity information, the multi-layer CNNs only implicitly learn the contour and shape of the brain. Since brain tissue maps carry more explicit semantic information about anatomical boundaries, they can provide key information complementary to intensity for establishing matching across slices, and are thus critical to the robustness of motion correction methods for the fetal brain. Anatomical knowledge can also be applied to improve segmentation performance in MRI, as proved effective in [17].

Therefore, in this paper, for the first time, we present a multi-task learning framework to jointly predict the position and tissue segmentation map of each 2D slice in fetal MRI stacks. Instead of treating these two tasks as independent problems, we optimize the network by simultaneously learning features shared between the regression and segmentation tasks. We show that brain motion correction can be improved by exploiting the association between the two tasks. Moreover, complementary anatomical information from the tissue maps is incorporated into the refinement network to improve both the regression and segmentation results. We quantitatively evaluate the regression and segmentation performance using simulated 2D slices extracted from reconstructed 3D fetal brain MRI. Comparison with other CNN-based 3D motion correction methods indicates that our method is effective in simultaneously reducing the motion prediction error and obtaining superior tissue segmentation results.

2. Method

As illustrated in Fig. 1, our framework is composed of a coarse regression and segmentation network and a multi-stream refinement network. In the coarse stage, a multi-output network is designed to jointly predict the transformation parameters and segmentation results. In the refinement stage, signed distance maps of tissue boundaries are introduced to provide additional anatomical contour information. The multi-stream inputs are processed by encoders with the same structure, and a shared representation module is added to fuse multiple high-level features to refine the predictions and segmentations.

Fig. 1.

An illustration of our framework. (a) Coarse regression and segmentation network. (b) Refinement network, which uses multiple streams to extract anatomical features and a shared representation module to combine the multi-source features for accurate prediction and segmentation.

2.1. Coarse Regression and Segmentation Network

The coarse regression and segmentation network has two branches: one for regression of the 2D slice transformation parameters and the other for segmentation of brain tissues. The encoder learns common features shared by the two tasks, providing a valid deep representation for both position and segmentation. In the decoders, we design a regression module and a segmentation module separately, which further learn task-specific features. This architecture avoids training an individual network for each task, by exploiting the association between the two tasks and providing more supervision for learning shared features. During training, joint optimization is employed, which ensures learning of features common to the regression and segmentation tasks and also avoids over-fitting to one particular task.

Encoder Module.

As shown in Fig. 1 (a), all layers in the encoder module and segmentation module use a fixed kernel size of 3×3, while the last convolution layer in the regression module uses a kernel size of 1×1. Downsampling is achieved by setting the stride of the convolution kernels to 2. Batch normalization (BN) layers are used after each convolution layer to accelerate training, and rectified linear unit (ReLU) activations follow each BN layer. Finally, the network branches from the pooling layer to produce outputs for each specific task.

Regression Module.

The architecture of this part is derived from VGG [15], in which two fully connected layers transfer features from the encoder to regress the desired output. The movement of the fetal brain can be considered a rigid transformation, so the slices are transformed rigidly in 3D space, and the parameterization of each slice lies within the SE(3) Lie group, comprising a rotation component and a translation component. Similar to [5], we define three Cartesian anchor points within a plane (nine parameters) as the 3D transformation representation. Any three distinct, non-collinear points in 3D space define a plane, and their order defines its orientation. In this way, the rotation and translation components of the labels are combined. For an L × L 2D slice, we define P1 as the center of the slice (L/2, L/2, 0), P2 as the top-right corner (L, 0, 0), and P3 as the bottom-right corner (L, L, 0). The regression module estimates the transformed positions Q1 = TP1, Q2 = TP2, Q3 = TP3 of these three points, where T is the transformation matrix. We can then calculate T from the predicted points and transform the corresponding slice to its corrected position in 3D volumetric space.
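To make the anchor-point parameterization concrete, here is a minimal NumPy sketch (not the authors' code; the function name is illustrative) of how the rigid transform can be recovered from the three predicted anchor points, using the Kabsch algorithm for the rotation:

```python
import numpy as np

def recover_rigid_transform(P, Q):
    """Recover rotation R and translation t such that Q_i = R @ P_i + t,
    given corresponding non-collinear anchor points (Kabsch algorithm)."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    # cross-covariance of the centered point sets
    H = (P - cP).T @ (Q - cQ)
    U, _, Vt = np.linalg.svd(H)
    # sign correction guarantees a proper rotation (det(R) = +1)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t
```

Once R and t are known, the slice can be resampled at its corrected 3D position; the same routine applies to the simulated ground-truth points during training-data generation.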

Segmentation Module.

In the segmentation module, motivated by the outstanding performance of U-net [8,11,18] in segmentation, we also use skip connections to jointly extract holistic and local features from the intensity image. To obtain more anatomical knowledge, we segment the fetal brain into cerebrospinal fluid (CSF), gray matter (GM), and white matter (WM). The stride of the deconvolution kernels is set to 2 to achieve 2× upsampling of the feature maps.

2.2. Anatomy-Guided Refinement Network

Anatomical Knowledge.

In the coarse stage, the architecture takes only the intensity information as input and outputs the transformation parameters and tissue segmentation maps. In the refinement stage, we therefore add anatomical prior knowledge to guide the regression and segmentation tasks, as shown in Fig. 1 (b). Based on the coarse segmentation results, we directly construct a Signed Distance Map (SDM) [2] for each of the three tissue types (CSF, GM, and WM). Given a target tissue and a pixel x in the image, the SDM is defined as:

$$D_{SDM}(x,B)=\begin{cases}0, & \text{if } x\in B,\\ -\|x-y\|_2, & \text{if } x\in\Omega_{in},\\ +\|x-y\|_2, & \text{if } x\in\Omega_{out}\end{cases}\tag{1}$$

where B represents the boundary of the target tissue, Ω_in and Ω_out denote the regions inside and outside the target tissue, and y denotes the pixel on B closest to x. The absolute value of the SDM indicates the distance from a pixel to the closest pixel on the tissue boundary, while the sign indicates whether the pixel is inside (negative) or outside (positive) the tissue. Note that zero distance (the zero level set) means that the pixel lies on the boundary of the tissue. We normalize D_SDM(x, B) to the range [−1, 1] for each tissue.
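On a discrete pixel grid, the SDM of Eq. (1) can be approximated with two Euclidean distance transforms. The sketch below uses SciPy and is not the authors' implementation; the max-absolute-value normalization is one plausible reading of "normalize to [−1, 1]":

```python
import numpy as np
from scipy import ndimage

def signed_distance_map(mask):
    """Approximate SDM for a binary tissue mask (Eq. 1):
    negative inside the tissue, positive outside."""
    mask = mask.astype(bool)
    # distance from each outside pixel to the nearest tissue pixel
    dist_out = ndimage.distance_transform_edt(~mask)
    # distance from each inside pixel to the nearest background pixel
    dist_in = ndimage.distance_transform_edt(mask)
    sdm = dist_out - dist_in
    # normalize into [-1, 1] per tissue (assumed normalization scheme)
    peak = np.abs(sdm).max()
    return sdm / peak if peak > 0 else sdm
```

In practice one such map would be computed per tissue (CSF, GM, WM) from the coarse segmentation and fed to the corresponding refinement stream.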

Multi-stream Network Architecture.

The anatomy-guided refinement network takes the intensity image as well as the three types of structural knowledge as inputs. Since the distance maps contain information about different tissues, straightforward concatenation of the features extracted from them is not reasonable. To effectively leverage features from different brain tissues to guide the regression and segmentation tasks, we propose a multi-stream network architecture. By using an individual stream for each tissue, we can investigate when it is best to merge the streams. As [16] shows that cross-modality convolution can effectively aggregate information across modalities to produce better results, we design, after concatenating the features from the encoders, a shared representation module composed of 2 convolution layers and 1 pooling layer to better fuse the anatomical features of different tissues. It is worth noting that only the low-level features of the intensity image are passed to the upsampling stream by skip connections, while those of the anatomical streams are not, so that noise and irrelevant information are not introduced. Similar to the coarse stage, the regression and segmentation tasks split in the decoder and make predictions separately.

2.3. Loss Function Design

In essence, we aim to jointly perform the 3D position regression for mapping a 2D slice to 3D volumetric space and segmentation of the 2D slice into different types of tissues: CSF, GM, and WM. To train the multi-task model, we define the loss of the regression module and segmentation module as Lreg and Lseg, respectively.

For the regression task, the anchor points consist of 9 parameters: (Q1(x,y,z), Q2(x,y,z), Q3(x,y,z)). As presented in [6], we use the Euclidean distance between the ground-truth and predicted points as the loss function. This keeps the nature of the network loss consistent and avoids having to balance a rotation loss against a translation loss. As each point is Cartesian, the optimization is guaranteed to be balanced. The 3D transformation loss is defined as:

$$L_{reg}=\|\hat{Q}_1-Q_1\|_2+\|\hat{Q}_2-Q_2\|_2+\|\hat{Q}_3-Q_3\|_2\tag{2}$$

where Q̂1, Q̂2, and Q̂3 are the predicted anchor points.
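A minimal NumPy sketch of Eq. (2) (the function name is illustrative, not from the paper):

```python
import numpy as np

def anchor_point_loss(Q_pred, Q_true):
    """Eq. (2): sum of Euclidean distances over the three anchor points.
    Q_pred, Q_true: arrays of shape (3, 3), one anchor point per row."""
    Q_pred, Q_true = np.asarray(Q_pred, float), np.asarray(Q_true, float)
    return np.linalg.norm(Q_pred - Q_true, axis=1).sum()
```

Because every term is a point-to-point distance in the same units, no extra weighting between rotation and translation is needed.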

In the segmentation stream, we choose the cross-entropy loss for training stability. In addition, we apply a weight to each class to offset the imbalance of pixel frequency across classes.

$$L_{seg}=-\sum_i\sum_c w_c\,l_{ic}\log(p_{ic})\tag{3}$$

where w_c is the weight of class c ∈ {WM, GM, CSF}, derived from the class proportions in the input data; l_ic is 1 if pixel i belongs to class c and 0 otherwise; and p_ic is the predicted probability that pixel i belongs to class c.

Incorporating the multi-loss framework [20], the combined loss of the regression and segmentation networks can therefore be written as:

$$L=L_{reg}+\beta L_{seg}\tag{4}$$

where β controls the relative importance of the segmentation loss term. In our experiments, β is set to 20 in the coarse stage and 10 in the refinement stage.
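Eqs. (3) and (4) can be sketched as follows (NumPy; function names are illustrative, and the exact per-class weighting scheme is an assumption, since the paper only states that the weights reflect class proportions):

```python
import numpy as np

def weighted_cross_entropy(probs, labels, weights):
    """Eq. (3): class-weighted cross-entropy.
    probs:   (N, C) predicted class probabilities per pixel
    labels:  (N,)   ground-truth class indices
    weights: (C,)   per-class weights offsetting pixel-frequency imbalance"""
    idx = np.arange(len(labels))
    return -(weights[labels] * np.log(probs[idx, labels])).sum()

def combined_loss(l_reg, l_seg, beta):
    """Eq. (4): L = L_reg + beta * L_seg (beta = 20 coarse, 10 refinement)."""
    return l_reg + beta * l_seg
```

In a real training loop these would be computed per mini-batch on the network outputs before back-propagation.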

3. Experiments and Discussion

3.1. Dataset and Evaluation Metrics

Since there is no ground truth for motion correction, we simulate 2D slices with random motion from 3D fetal brain volumes, as in [5]. Specifically, experiments were conducted on a dataset of 48 fetal brain MRI volumes with manual tissue segmentations [1], reconstructed to an isotropic resolution of 0.75 mm × 0.75 mm × 0.75 mm. 2D slices were extracted from the high-resolution 3D volumes. A stack of 120×120 sampling planes was aligned with the brain. The entire stack was rotated randomly about the fetal brain's isocenter, with z-axis offsets also sampled randomly. 2,000 random rotations were sampled, with angles between −π/2 and +π/2 about the x, y, and z axes. Since the rotation matrices were known, the anchor points Q1, Q2, Q3 could be computed and used as the ground truth. Each volume generated 32,000 2D slices in total. This sampling covers half of all possible orientations and provides different views in the training set; therefore, separating different views (i.e., axial, coronal, and sagittal) was unnecessary for training the network. We did not span the whole orientation space, because some 2D brain slices do not carry enough information to distinguish whether they belong to the right or left hemisphere, due to the relatively symmetric shape of the brain.
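Under the simplifying assumptions of rotation about the origin and no translation (the paper rotates about the isocenter and additionally samples z-axis offsets), the ground-truth anchor-point simulation could be sketched as:

```python
import numpy as np

def random_rotation(rng, lo=-np.pi / 2, hi=np.pi / 2):
    """Compose rotations about x, y, z with angles drawn uniformly from
    [lo, hi], mirroring the half-space sampling described above."""
    ax, ay, az = rng.uniform(lo, hi, size=3)
    cx, sx = np.cos(ax), np.sin(ax)
    cy, sy = np.cos(ay), np.sin(ay)
    cz, sz = np.cos(az), np.sin(az)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def ground_truth_anchors(R, L=120):
    """Ground-truth anchor points Q1, Q2, Q3 for a known rotation R,
    using the anchor definition of Sect. 2.1 (slice side length L)."""
    P = np.array([[L / 2, L / 2, 0], [L, 0, 0], [L, L, 0]], dtype=float)
    return P @ R.T
```

Each sampled rotation yields one labeled training slice: the resampled 2D image paired with its three rotated anchor points.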

We performed 4-fold cross-validation. To evaluate the regression performance, we tested the network with a 2D image slice ωi extracted from a volume V. Using the parameters predicted by the network, we extract a new slice ωp from the same V and compare it with ωi. We chose several standard image similarity metrics: Cross Correlation (CC), Peak Signal-to-Noise Ratio (PSNR), Mean Squared Error (MSE), and Structural Similarity (SSIM).
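For reference, CC, MSE, and PSNR can be computed as below (a NumPy sketch; SSIM is more involved and is typically taken from a library such as scikit-image, and the 8-bit data range assumed for PSNR is an illustrative choice, not stated in the paper):

```python
import numpy as np

def cc(a, b):
    """Cross correlation (Pearson) between two image slices."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def mse(a, b):
    """Mean squared error between two image slices."""
    return np.mean((a - b) ** 2)

def psnr(a, b, data_range=255.0):
    """Peak signal-to-noise ratio in dB for the given intensity range."""
    return 10.0 * np.log10(data_range ** 2 / mse(a, b))
```

Each metric would be evaluated between ωi and ωp, then averaged over the test slices of each cross-validation fold.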

3.2. Motion Correction Results on Fetal Brain

We present the results of the single regression task, regression in the coarse stage, and regression in the anatomy-guided refinement stage with both single-stream and multi-stream networks, to validate the effectiveness of the different parts of our method. In addition, we applied the state-of-the-art method SVRnet [5], which uses anchor points as the regression loss, to our dataset for comparison. The CC, MSE, PSNR, and SSIM values of the different methods are shown in Table 1. Higher CC, PSNR, and SSIM values indicate higher accuracy, whereas lower MSE values indicate higher accuracy.

Table 1.

Regression results of different methods.

Method CC MSE PSNR SSIM
SVRnet [5] 0.822±0.029 1095.12±203.02 17.854±0.867 0.600±0.051
Single Regression 0.822±0.036 1114.33±234.41 17.821±0.953 0.598±0.044
Coarse Stage 0.828±0.024 1058.77±206.73 18.033±0.861 0.608±0.052
Refinement-single 0.803±0.043 1263.32±335.47 17.293±1.100 0.598±0.054
Refinement-multi 0.837±0.024 1013.09±199.14 18.253±0.804 0.618±0.050

As shown in Table 1, our multi-task learning approach achieves better performance than the single task. In the refinement stage, the single-stream network, which directly concatenates the anatomical and intensity information, achieves worse results than using intensity information alone. In the multi-stream network, anatomy-guided learning improves the accuracy significantly, indicating that the anatomical cues help the network predict the transformation parameters more accurately and that the multi-stream architecture facilitates the fusion of anatomical features. Our proposed method also outperforms the state-of-the-art method [5], which utilizes only intensity information. In Fig. 2(a), we present some experimental results, with distinct regions highlighted by yellow arrows. Compared with the other methods, the results of refinement-multi are more similar to the original images, further validating that our method predicts the transformation parameters more accurately.

Fig. 2.

(a) The original slices input to the network and the slices extracted from the respective fetal volumes using parameters predicted by different methods. (b) The segmentation results (red: CSF, green: GM, blue: WM).

3.3. Segmentation Results

In Table 2, we present the segmentation results of the coarse stage, the anatomy-guided refinement stage (single-stream and multi-stream networks), and U-net [11]. Compared with U-net and the coarse stage, the MHD of the anatomy-guided refinement-multi stage is significantly improved. The DSC does not improve significantly, as it is not sensitive to changes in details near contours. The results of refinement-multi are better than those of refinement-single, which demonstrates the contribution of the multi-stream network to extracting multi-source features. As shown in Fig. 2(b), the qualitative results of the anatomy-guided refinement stage are superior to those of the other methods.

Table 2.

Segmentation results of different methods.

Method U-net [11] Coarse stage Refinement-single Refinement-multi
DSC (CSF) 0.954±0.004 0.955±0.003 0.952±0.006 0.956±0.003
DSC (GM) 0.908±0.008 0.908±0.007 0.897±0.011 0.909±0.007
DSC (WM) 0.976±0.003 0.976±0.003 0.963±0.010 0.977±0.003
MHD (CSF) (mm) 0.676±0.073 0.674±0.070 0.685±0.082 0.661±0.069
MHD (GM) (mm) 0.608±0.079 0.605±0.079 0.616±0.098 0.600±0.079
MHD (WM) (mm) 0.421±0.069 0.412±0.069 0.424±0.078 0.403±0.062

3.4. Discussion

In our method, the segmentation task helps the regression task improve its prediction performance by providing boundary information. Meanwhile, since the regression task uses the boundary information for better slice matching in 3D space, it in turn improves the segmentation performance. Furthermore, the anatomical knowledge makes the regression of transformation parameters and the tissue segmentation more accurate in the refinement stage. Consequently, the proposed method can leverage the information shared by both tasks in one pass and achieve satisfactory regression and segmentation results simultaneously.

4. Conclusion

This paper proposes a coarse-to-fine framework for fetal brain motion correction and tissue segmentation. In the coarse stage, our multi-task model jointly predicts the transformation parameters and tissue maps. In the refinement stage, signed distance maps of the segmented tissues are introduced to provide additional tissue boundary features. Experiments demonstrate that motion correction can be improved by multi-task learning, and that the additional anatomical information further enhances the performance of the regression and segmentation tasks. In the future, we will evaluate our model on more datasets and further improve its prediction accuracy.

Acknowledgements.

This work was partially supported by NIH grants (MH117943).

References

1. Benkarim OM, et al.: A novel approach to multiple anatomical shape analysis: application to fetal ventriculomegaly. Med. Image Anal. 64, 101750 (2020)
2. Danielsson PE: Euclidean distance mapping. Comput. Graph. Image Process. 14(3), 227–248 (1980)
3. Gholipour A, Estroff JA, Barnewolt CE, Connolly SA, Warfield SK: Fetal brain volumetry through MRI volumetric reconstruction and segmentation. Int. J. Comput. Assist. Radiol. Surg. 6(3), 329–339 (2011)
4. Hou B, et al.: Predicting slice-to-volume transformation in presence of arbitrary subject motion. In: Descoteaux M, Maier-Hein L, Franz A, Jannin P, Collins DL, Duchesne S (eds.) MICCAI 2017. LNCS, vol. 10434, pp. 296–304. Springer, Cham (2017). doi: 10.1007/978-3-319-66185-8_34
5. Hou B: 3-D reconstruction in canonical co-ordinate space from arbitrarily oriented 2-D images. IEEE Trans. Med. Imaging 37(8), 1737–1750 (2018)
6. Hou B, et al.: Computing CNN loss and gradients for pose estimation with Riemannian geometry. In: Frangi AF, Schnabel JA, Davatzikos C, Alberola-López C, Fichtinger G (eds.) MICCAI 2018. LNCS, vol. 11070, pp. 756–764. Springer, Cham (2018). doi: 10.1007/978-3-030-00928-1_85
7. Kendall A, Grimes M, Cipolla R: PoseNet: a convolutional network for real-time 6-DOF camera relocalization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2938–2946 (2015)
8. Li G, et al.: Computational neuroanatomy of baby brains: a review. NeuroImage 185, 906–925 (2019)
9. Miao S, Wang ZJ, Liao R: A CNN regression approach for real-time 2D/3D registration. IEEE Trans. Med. Imaging 35(5), 1352–1363 (2016)
10. Miao S, Wang ZJ, Zheng Y, Liao R: Real-time 2D/3D registration via CNN regression. In: 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI), pp. 1430–1434. IEEE (2016)
11. Ronneberger O, Fischer P, Brox T: U-Net: convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells WM, Frangi AF (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). doi: 10.1007/978-3-319-24574-4_28
12. Rousseau F, Glenn OA, Iordanova B, Barkovich JA, Studholme C: Registration-based approach for reconstruction of high-resolution in utero fetal MR brain images. Acad. Radiol. 13(9), 1072–1081 (2006)
13. Salehi SSM, Khan S, Erdogmus D, Gholipour A: Real-time deep pose estimation with geodesic loss for image-to-template rigid registration. IEEE Trans. Med. Imaging 38(2), 470–481 (2018)
14. Sattler T, Zhou Q, Pollefeys M, Leal-Taixe L: Understanding the limitations of CNN-based absolute camera pose regression. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3302–3312 (2019)
15. Simonyan K, Zisserman A: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
16. Tseng KL, Lin YL, Hsu W, Huang CY: Joint sequence learning and cross-modality convolution for 3D biomedical segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6393–6400 (2017)
17. Wang L, et al.: Volume-based analysis of 6-month-old infant brain MRI for autism biomarker identification and early diagnosis. In: Frangi AF, Schnabel JA, Davatzikos C, Alberola-López C, Fichtinger G (eds.) MICCAI 2018. LNCS, vol. 11072, pp. 411–419. Springer, Cham (2018). doi: 10.1007/978-3-030-00931-1_47
18. Wang L, et al.: Benchmark on automatic six-month-old infant brain segmentation algorithms: the iSeg-2017 challenge. IEEE Trans. Med. Imaging 38(9), 2219–2230 (2019)
19. Xia J, et al.: Fetal cortical surface atlas parcellation based on growth patterns. Hum. Brain Mapp. 40(13), 3881–3899 (2019)
20. Xu C, et al.: Multi-loss regularized deep neural network. IEEE Trans. Circuits Syst. Video Technol. 26(12), 2273–2283 (2015)
