Abstract
Diffusion MRI affords great value for studying brain development, owing to its ability to assess brain microstructure in association with myelination. With longitudinally acquired pediatric diffusion MRI data, one can chart the temporal evolution of microstructure and white matter connectivity. However, due to subject dropouts and unsuccessful scans, longitudinal datasets are often incomplete. In this work, we introduce a graph-based deep learning approach to predict diffusion MRI data. The relationships between sampling points in the spatial domain (x-space) and the diffusion wave-vector domain (q-space) are jointly harnessed (x-q space) in the form of a graph. We then implement a residual learning architecture with graph convolutional filtering to learn longitudinal changes in diffusion MRI data over time. We evaluate the effectiveness of the spatial and angular components in data prediction. We also investigate longitudinal trajectories in terms of diffusion scalars computed from the predicted datasets.
Keywords: Brain development, Longitudinal prediction, Diffusion MRI, Graph representation, Graph convolution, Residual graph neural network
1. Introduction
Diffusion MRI is an attractive imaging modality for longitudinal studies of the developing infant brain, owing to its ability to assess brain microstructure even in the pre-myelinated brain [1]. To chart temporal brain changes, longitudinal diffusion MRI data need to be collected. However, due to subject dropouts and unsuccessful scans, longitudinal datasets are often incomplete. For T1-weighted and T2-weighted imaging, a number of methods have been developed for longitudinal image prediction, e.g., extrapolation using geodesic regression models [2] and patch-wise sparse representation of image metamorphosis paths [3]. Work on diffusion MRI prediction is, however, scarce.
Longitudinal prediction of diffusion MRI data is more challenging than for T1-weighted and T2-weighted data, partly due to the high variability in q-space sampling. The sampling domain is not necessarily Cartesian and can vary from shell-based to even random. It is therefore not straightforward to extend advances such as convolutional neural network (CNN) based image generation techniques [4] to diffusion MRI.
In this article, we will introduce a graph-based deep learning approach for longitudinal prediction of diffusion MRI data in the first year of life. This work is inspired by [5], where q-space matching with the help of graph representation was harnessed for denoising of diffusion MRI data. In our work, this graph representation is further extended to the x-q space by considering both spatial and angular domains. This allows us to adopt a graph CNN approach for prediction. Convolutions of functions on graphs are implemented as multiplications in the graph spectral domain [6]. In our approach, we introduce a new residual neural network with graph convolutional filtering to predict diffusion MRI data of target time points in a patch-wise manner. We train the residual learning model with a Huber loss for minimizing image differences and L1 regularization on the weights of the convolutional filters to avoid over-fitting.
2. Method
In the following sections, we will introduce (1) graph-based representation of the spatio-angular x-q space, (2) convolutional filtering of a function defined on a graph, and (3) a residual neural network with graph spectral filtering for DW image prediction.
2.1. Graph Representation of x-q Space
Consider an undirected graph G = (V, E), where V = {v0, v1, ···, vn−1} is a set of n vertices and E is a set of edges connecting the vertices. We denote by wi,j the non-negative weight of an undirected edge between two vertices, vi and vj. If vi and vj are not connected, wi,j is 0. The adjacency matrix A = {wi,j} is symmetric. Given A, the degree matrix D is diag{d0, d1, ···, dn−1}, where di = Σj wi,j. In this work, we utilized a symmetric normalized form of the graph Laplacian: L = I − D−1/2AD−1/2.
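As a minimal illustration, the normalized Laplacian can be computed from the adjacency matrix as follows (a NumPy sketch on a hypothetical 3-vertex toy graph, not the actual x-q graph used in our experiments):

```python
import numpy as np

def normalized_laplacian(A):
    """Symmetric normalized graph Laplacian: L = I - D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)                                   # degrees d_i = sum_j w_ij
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
    D_inv_sqrt = np.diag(d_inv_sqrt)
    return np.eye(A.shape[0]) - D_inv_sqrt @ A @ D_inv_sqrt

# Toy 3-vertex path graph with unit edge weights (v0-v1, v1-v2)
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
L = normalized_laplacian(A)
```

The eigenvalues of this normalized Laplacian always lie in [0, 2], which is what enables the spectrum rescaling used later for Chebyshev filtering.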
In diffusion MRI, a vertex vi can be thought of as corresponding to a sampling point at spatial location xi of a diffusion-weighted (DW) image with b-value bi and gradient direction qi. The geometric structure of the sampling domain is encoded using an adjacency matrix A, with the weight wi,j between two points vi and vj defined as
wi,j = exp(−‖xi − xj‖2 / 2σx2) · exp(−(1 − |qi · qj|)2 / 2σq2) · exp(−(bi − bj)2 / 2σb2),   (1)
where σx, σb, and σq are parameters for controlling the width of the Gaussian functions. The weight wi,j reflects the spatial distance, the dissimilarity of gradient directions, and the difference in diffusion weightings between the two points.
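A sketch of one plausible concrete form of this weight, with one Gaussian factor each for spatial distance, gradient-direction dissimilarity, and b-value difference (the function name `edge_weight` and the default σ values are illustrative, not from the paper):

```python
import numpy as np

def edge_weight(xi, xj, qi, qj, bi, bj,
                sigma_x=1.0, sigma_q=1.0, sigma_b=1.0):
    """Gaussian edge weight combining spatial distance, gradient-direction
    dissimilarity, and b-value difference between two x-q sampling points."""
    w_x = np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma_x ** 2))
    # |<qi, qj>| accounts for the antipodal symmetry of diffusion gradients
    w_q = np.exp(-(1.0 - abs(np.dot(qi, qj))) ** 2 / (2.0 * sigma_q ** 2))
    w_b = np.exp(-(bi - bj) ** 2 / (2.0 * sigma_b ** 2))
    return w_x * w_q * w_b

# Neighboring voxels, same gradient direction, same shell (b = 1000 s/mm^2)
w = edge_weight(np.zeros(3), np.ones(3),
                np.array([1.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]),
                1000.0, 1000.0, sigma_b=500.0)
```

Identical sampling points receive weight 1, and the weight decays smoothly as the points separate in any of the three respects.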
2.2. Convolutional Filtering on Graphs
Convolutions of functions defined on a graph can be implemented as linear operators in the graph spectral domain [6]. The Fourier basis of the graph representation is given by the eigenvectors of the graph Laplacian. The Laplacian L is diagonalized by the Fourier basis such that L = UΛU⊤, where U = [u0, ···, un−1] contains the orthonormal eigenvectors and Λ = diag{λ0, ···, λn−1} the nonnegative eigenvalues. The Fourier transform of a signal s defined on the graph is ŝ = U⊤s, where the i-th element of s (i.e., si) corresponds to vertex vi. The convolution of s can be implemented as
y = fθ ∗ s = U fθ U⊤s,   (2)
where fθ is a diagonal matrix parameterized by θ, representing the Fourier coefficients of the filter. The filter can be localized in space by representing fθ in the form of a polynomial [7]:
fθ = ∑_{k=0}^{K} θk Λ^k,   (3)
where the parameters θ = [θ0, ···, θK] are now in the form of polynomial coefficients. The spectral filter with polynomials of degree K is K-localized in the graph. For fast filtering, Chebyshev polynomials Tk are used, allowing the convolution to be realized directly using a rescaled version of L, i.e., L̃ = 2L/λmax − I, via the recurrence Tk(L̃) = 2L̃Tk−1(L̃) − Tk−2(L̃), without explicit eigendecomposition of L [6]. That is, based on (3), the filter can be implemented as y = ∑_{k=0}^{K} θk Tk(L̃)s, without the Fourier transform (U⊤) and the inverse Fourier transform (U) in (2).
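The recursive Chebyshev filtering described above can be sketched as follows (illustrative NumPy code; λmax is computed exactly here for clarity, whereas fast implementations typically only estimate it):

```python
import numpy as np

def chebyshev_filter(L, s, theta):
    """Apply y = sum_k theta_k T_k(L_tilde) s using the Chebyshev recurrence
    T_k(x) = 2 x T_{k-1}(x) - T_{k-2}(x); no eigendecomposition is needed in
    the filtering itself.  L: normalized Laplacian, s: graph signal,
    theta: K+1 polynomial coefficients."""
    n = L.shape[0]
    lam_max = np.linalg.eigvalsh(L).max()    # exact here; often just estimated
    L_tilde = 2.0 * L / lam_max - np.eye(n)  # rescale spectrum into [-1, 1]
    T_prev, T_curr = s, L_tilde @ s          # T_0(L~) s and T_1(L~) s
    out = theta[0] * T_prev
    if len(theta) > 1:
        out = out + theta[1] * T_curr
    for k in range(2, len(theta)):
        T_next = 2.0 * L_tilde @ T_curr - T_prev
        out = out + theta[k] * T_next
        T_prev, T_curr = T_curr, T_next
    return out

# Filtering a signal on a 3-vertex path graph
A = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
D_is = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
L = np.eye(3) - D_is @ A @ D_is
s = np.array([1.0, 2.0, 3.0])
y = chebyshev_filter(L, s, theta=[0.5, 0.3, 0.2])
```

With θ = [1, 0, ···, 0] the filter reduces to the identity, a convenient sanity check; the operation is also linear in s, as a spectral filter must be.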
A feature map can be obtained by combining the outcomes of spectral filtering:

tm = ∑_{l=1}^{Fin} fθl,m ∗ sl,  m = 1, ···, Fout,   (4)

where {sl} are the input feature maps, tm is an output feature map, and Fin and Fout are the numbers of input and output feature maps. Consequently, the number of trainable parameters of the convolutional layer is Fin × Fout × (K + 1).
2.3. Residual Neural Network for DW Image Prediction
We implemented a deep residual network with graph convolutions to capture longitudinal brain changes for patch-wise prediction of missing diffusion MRI data. Figure 1 shows the architecture of the proposed residual neural network. The basic building block is the graph convolutional block (GCB), consisting of a graph convolution followed by a randomly-translated leaky rectified linear unit (RT-ReLU) [8] and a group normalization layer [9]. The RT-ReLU improves robustness to image noise and jitter by adding small Gaussian noise before the non-linear activation function. Due to the large size of the Laplacian matrix, only small batch sizes can be used for training. To prevent the resulting degradation of learning accuracy, we normalize the feature maps within each group rather than across the batch. Note that the RT-ReLU and group normalization layers are not used in the first and last GCB layers. On top of the GCB, we build a residual graph convolution block (RGCB) that consists of two GCBs with an element-wise addition (skip connection). Five RGCBs are connected consecutively.
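A simplified forward-pass sketch of the block structure (NumPy, with a plain leaky ReLU standing in for the RT-ReLU, linear maps standing in for the graph convolutions, and illustrative function names):

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

def group_norm(x, num_groups=4, eps=1e-5):
    """Normalize features within channel groups; x: (n_vertices, n_features).
    Avoids the batch-size dependence of batch normalization."""
    n, f = x.shape
    g = x.reshape(n, num_groups, f // num_groups)
    mu = g.mean(axis=(0, 2), keepdims=True)
    var = g.var(axis=(0, 2), keepdims=True)
    return ((g - mu) / np.sqrt(var + eps)).reshape(n, f)

def gcb(x, conv):
    """Graph convolution block: graph convolution -> activation -> group norm."""
    return group_norm(leaky_relu(conv(x)))

def rgcb(x, conv1, conv2):
    """Residual graph convolution block: two GCBs plus a skip connection."""
    return x + gcb(gcb(x, conv1), conv2)

# Demo: 10 vertices, 8 features, random linear maps as stand-in convolutions
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 8))
W1, W2 = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
y = rgcb(x, lambda z: z @ W1, lambda z: z @ W2)
```

The skip connection means each RGCB only has to learn a residual correction to its input, which eases training of the five stacked blocks.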
Fig. 1.
Residual neural network with graph convolution blocks. GCB: graph convolution block, RGCB: residual graph convolution block; RT-ReLU: randomly-translated leaky ReLU; Fout: the number of output feature maps; and K: the kernel size of the graph convolutional filter
The goal of our model is to learn changes between two time points in order to predict DW volumes at a target time point. We train the model using Huber loss [10]:
Lδ(sout, star) = ½(sout − star)², if |sout − star| ≤ δ;  δ(|sout − star| − ½δ), otherwise,   (5)
where sout denotes the output diffusion signals of the model, star the target diffusion signals, and δ the parameter determining the point at which the loss function changes from quadratic to linear. The L1 loss is commonly employed in image-to-image translation; however, it converges with more difficulty at small errors than the L2 loss. The Huber loss combines the advantages of the two: the sensitivity of the L2 loss and the robustness of the L1 loss. We also employ L1 regularization on the filter coefficients to avoid overfitting.
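A minimal sketch of the element-wise Huber loss in (5), averaged over all signal elements (δ defaults to the value used in our experiments):

```python
import numpy as np

def huber_loss(s_out, s_tar, delta=0.5):
    """Mean element-wise Huber loss: quadratic for residuals up to delta,
    linear beyond."""
    r = np.abs(s_out - s_tar)
    quadratic = 0.5 * r ** 2
    linear = delta * (r - 0.5 * delta)
    return np.where(r <= delta, quadratic, linear).mean()
```

For small residuals the loss behaves like the L2 loss (0.5r²), while large residuals contribute only linearly, limiting the influence of outliers.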
3. Experiments
3.1. Materials
We evaluated the effectiveness of the proposed method by longitudinally predicting 3-month-old and 6-month-old diffusion MRI data from neonatal data. DW data collected from 28 infants, all born at full term, were used. For each subject, we acquired 42 DW volumes using a spin-echo echo-planar imaging sequence on a 3T Siemens Allegra scanner, with TR/TE = 7680/82 ms, resolution 2 × 2 × 2 mm3, and b = 1000 s/mm2. Seven non-DW (b = 0 s/mm2) reference scans were also acquired. The image dimensions are 128 × 96 × 60. On average, each subject was scanned 1.2 times.
3.2. Implementation Details
For training, we used the diffusion MRI datasets of 20 subjects, covering neonates, 3-month-olds, and 6-month-olds. We used the DW images of 1 randomly selected subject for validation. For model evaluation, we utilized the DW images of 6 subjects, acquired consecutively at three time-points: birth and 3 and 6 months of age. All DW images were processed using the FSL software package [11], involving correction of eddy-current distortion and brain extraction.
We aligned all source and target DW images to a longitudinal template [12] and reoriented the diffusion signals using the spatial warping method described in [13]. After alignment, we extracted patches of size 5 × 5 × 5 × 43 (42 DW volumes + 1 non-DW volume) for training and testing. In our network architecture, the kernel size K of the RGCBs was 3 and the number of output feature maps was 128 for each RGCB and GCB, except the last GCB (see Fig. 1). The parameters σx, σb, and σq in (1) for the spatio-angular distance were determined via a grid search in the range 0.0 to 2.0. The point parameter (δ) of the Huber loss was 0.5 and the weight parameter for the L1 regularization was 0.00001. The number of feature groups for group normalization was 4. The network was trained for up to 500 epochs using the ADAM optimizer with an initial learning rate of 0.0001 and a decay rate of 0.95. Training was terminated based on a validation loss threshold.
3.3. Quantitative Analysis
The accuracy of the trained model is assessed in terms of mean absolute error (MAE) and peak signal-to-noise ratio (PSNR) of the predicted images with respect to the target images. We compare the proposed model with spatial-only (SO) convolution. This is done by keeping the network architecture but limiting convolution to the spatial domain. We also compare the longitudinal trajectory of diffusion scalars, i.e., fractional anisotropy (FA) and mean diffusivity (MD), which are computed from the predicted DW images of neonates, 3-month-olds, and 6-month-olds.
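The two evaluation metrics can be sketched as follows (a NumPy sketch; the `data_range` argument is the maximum possible signal value, e.g., 1.0 for FA maps):

```python
import numpy as np

def mae(pred, target):
    """Mean absolute error between two images."""
    return np.mean(np.abs(pred - target))

def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB, relative to the maximum possible
    signal value data_range."""
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)
```

For example, a uniform prediction error of 0.1 on a unit-range image corresponds to an MAE of 0.1 and a PSNR of 20 dB.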
3.3.1. Comparison with Spatial-Only Convolution
Table 1 shows the accuracy of the proposed model in comparison with the SO model. The proposed model shows significantly improved performance in predicting 3-month-old DW images from neonatal images across all test subjects. The FA maps of the DW images predicted using the proposed model were more consistent with the target FA maps than those of the DW images generated by the SO model. In 6-month-old DW image prediction, there was no significant difference in DW image prediction accuracy between the two models. However, the FA maps of the proposed model were more similar to the target FA maps than those of the SO model. Figure 2 shows the FA maps of the two models and those of the target DW images. As can be observed in Fig. 2, the FA maps of the proposed model show patterns of FA values in white matter regions that closely match the targets.
Table 1.
Comparison between graph convolutional neural networks: spatio-angular versus spatial-only model
| | MAE (DW volumes) | MAE (FA maps) | PSNR (FA maps) |
|---|---|---|---|
| *3-month-old from neonates* | | | |
| Spatial only | 47.4 ± 17.9 | 0.075 ± 0.008 | 35.6 ± 1.3 |
| Spatio-angular | **44.4 ± 17.5** | **0.066 ± 0.008** | **36.5 ± 1.3** |
| *6-month-old from 3-month-olds* | | | |
| Spatial only | 39.5 ± 10.6 | 0.080 ± 0.004 | 35.1 ± 0.5 |
| Spatio-angular | 40.1 ± 10.6 | **0.064 ± 0.003** | **38.4 ± 0.4** |

MAE: mean absolute error; PSNR: peak signal-to-noise ratio; DW: diffusion-weighted; FA: fractional anisotropy. Bold: significant improvement of the spatio-angular over the spatial-only model.
Fig. 2.
FA images of the predicted DW images
3.3.2. Longitudinal Trajectory of Diffusion Scalars
Figure 3 shows the FA and MD values of the corpus callosum, measured at corresponding regions of interest in the predicted and target DW images of the test subjects at each time point (neonates to 6-month-olds). The predicted values follow a trajectory similar to that of the target values, with temporally increasing FA and decreasing MD. However, the predicted FA was slightly lower and the predicted MD slightly higher than the target values. This may be due to the smoothing effects associated with the averaging operation in patch-wise image generation.
Fig. 3.
Longitudinal trajectories of FA and MD of the corpus callosum
4. Conclusion
We have introduced a graph-based residual neural network for longitudinal prediction of diffusion MRI data, which takes into account signal measurements in the joint x-q space. This allows missing data to be imputed so that longitudinal analysis can be performed to study brain development.
Acknowledgements
This work was supported in part by NIH grants (NS093842, EB022880, EB006733, EB009634, AG041721, MH100217, and AA012388), an NSFC grant (11671022, China), and Institute for Information & communications Technology Promotion (IITP) grant (MSIT, 2018-2-00861, Intelligent SW Technology Development for Medical Data Analysis, South Korea).
Contributor Information
Jaeil Kim, School of Computer Science and Engineering, Kyungpook National University, Daegu, South Korea; Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, USA.
Yoonmi Hong, Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, USA.
Geng Chen, Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, USA.
Weili Lin, Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, USA.
Pew-Thian Yap, Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, USA.
Dinggang Shen, Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, USA.
References
1. Qiu A, Mori S, Miller MI: Diffusion tensor imaging for understanding brain development in early life. Annu. Rev. Psychol. 66(1), 853–876 (2015)
2. Niethammer M, Huang Y, Vialard FX: Geodesic regression for image time-series. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2011, pp. 655–662 (2011)
3. Rekik I, Li G, Wu G, Lin W, Shen D: Prediction of infant MRI appearance and anatomical structure evolution using sparse patch-based metamorphosis learning framework. In: Patch-Based Techniques in Medical Imaging, pp. 197–204 (2015)
4. Isola P, Zhu JY, Zhou T, Efros AA: Image-to-image translation with conditional adversarial networks (2016)
5. Chen G, Dong B, Zhang Y, Shen D, Yap PT: Neighborhood matching for curved domains with application to denoising in diffusion MRI. In: Medical Image Computing and Computer Assisted Intervention – MICCAI 2017, pp. 629–637. Springer, Cham (2017)
6. Defferrard M, Bresson X, Vandergheynst P: Convolutional neural networks on graphs with fast localized spectral filtering. In: Advances in Neural Information Processing Systems, pp. 3844–3852 (2016)
7. Bronstein MM, Bruna J, LeCun Y, Szlam A, Vandergheynst P: Geometric deep learning: going beyond Euclidean data. IEEE Signal Process. Mag. 34(4), 18–42 (2017)
8. Cao J, Pang Y, Li X, Liang J: Randomly translational activation inspired by the input distributions of ReLU. Neurocomputing 275, 859–868 (2018)
9. Wu Y, He K: Group normalization. In: European Conference on Computer Vision – ECCV 2018 (2018)
10. Huber PJ: Robust estimation of a location parameter. Ann. Math. Stat. 35(1), 73–101 (1964)
11. Jenkinson M, Beckmann CF, Behrens TEJ, Woolrich MW, Smith SM: FSL. NeuroImage 62(2), 782–790 (2012)
12. Kim J, Chen G, Lin W, Yap PT, Shen D: Graph-constrained sparse construction of longitudinal diffusion-weighted infant atlases. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2017, pp. 49–56. Springer, Cham (2017)
13. Chen G, Zhang P, Li K, Wee CY, Wu Y, Shen D, Yap PT: Improving estimation of fiber orientations in diffusion MRI using inter-subject information sharing. Sci. Rep. 6, 37847 (2016)