IEEE Trans Med Imaging. 2019;38(12):2717–2725. doi: 10.1109/TMI.2019.2911203

Longitudinal Prediction of Infant Diffusion MRI Data via Graph Convolutional Adversarial Networks

Yoonmi Hong, Jaeil Kim, Geng Chen, Weili Lin, Pew-Thian Yap*, and Dinggang Shen*

Abstract

Missing data is a common problem in longitudinal studies due to subject dropouts and failed scans. We present a graph-based convolutional neural network to predict missing diffusion MRI data. In particular, we consider the relationships between sampling points in the spatial domain and the diffusion wave-vector domain to construct a graph. We then use a graph convolutional network to learn the non-linear mapping from available data to missing data. Our method harnesses a multi-scale residual architecture with adversarial learning for prediction with greater accuracy and perceptual quality. Experimental results show that our method is accurate and robust in longitudinal prediction of infant brain diffusion MRI data.

Keywords: Graph CNN, Diffusion MRI, Adversarial Learning, Longitudinal Prediction, Early Brain Development

I. Introduction

An increasing body of evidence suggests that many neurodevelopmental and psychiatric disorders stem from developmental abnormalities that occur during infancy [1]–[3]. Diffusion MRI (DMRI) provides unique insights into the developing brain, owing to its sensitivity to tissue microstructural properties. However, in contrast to structural MRI, DMRI is more susceptible to low signal-to-noise ratio, motion artifacts, and partial volume effects. In addition, the relatively long acquisition times and the associated loud acoustic noise result in a higher number of failed and incomplete scans, especially when infant subjects are imaged.

There are several studies on longitudinal prediction of images or morphological attributes from structural MRI. In [4], a generative model based on geodesic regression is proposed for spatio-temporal estimation of image time-series. In [5], a sparse patch-based metamorphosis model is proposed for predicting the temporal evolution of anatomical structures. In [6], a dynamically-assembled regression forest is proposed to predict cortical development. However, prediction methods for DMRI data are scarce. In this work, we introduce an approach based on a graph convolutional neural network (GCNN) for longitudinal prediction of missing DMRI data.

Convolutional neural networks (CNNs) have been widely used for segmentation, classification, and synthesis of medical images [7], [8]. Graph CNNs (GCNNs) have recently been introduced [9], [10] to generalize CNNs to high-dimensional, irregular, and non-Euclidean domains. In GCNNs, convolutions are defined as multiplications in a graph Fourier domain determined by spectral decomposition of the graph Laplacian. Fast convolution can be achieved by circumventing explicit computation of the forward and inverse Fourier transforms [11].

The first GCNN architecture [9] performs spectral filtering by parameterizing the spectral multipliers with a cubic B-spline basis and learned interpolation coefficients. However, this approach requires successive applications of the forward and inverse Fourier transforms, making it computationally expensive, and it does not guarantee spatial localization. These limitations are overcome by using smooth spectral filters approximated by orthogonal polynomials of the Laplacian operator [11]. For example, spectral filters generated by Chebyshev polynomials of order K can be computed efficiently using recursive formulas and are exactly localized with K-hop support. A comprehensive overview of deep learning on non-Euclidean domains, represented as graphs, can be found in [12].

In addition to GCNNs, our method harnesses the power of generative adversarial networks (GANs) [13], which have demonstrated impressive results in natural image generation [14], [15] and in a variety of other applications [16], [17]. GANs have also been applied in medical imaging, for example, to synthesize CT images from MRI [8] and to reduce noise in low-dose CT [18]. Key to GANs is the adversarial loss, which forces the generated images to be indistinguishable from real images [17]. This is realized via a discriminator that acts as a trainable loss function.

In this paper, we formulate DMRI data prediction as a high-dimensional image synthesis problem, which is realized using a GCNN. There are existing methods for prediction of missing longitudinal data [19] and missing imaging modalities [7], [8]. However, these methods cannot be directly applied to DMRI data prediction since the diffusion wave-vectors can be irregularly sampled. We fully exploit the relationships of neighboring sampling points in the spatial domain and the diffusion wave-vector domain in the form of a graph, based on which convolutions can be performed in the GCNN.

The rest of the paper is organized as follows. In Section II, we introduce the fundamental background of graph Fourier analysis in the context of GCNNs and provide the details of the proposed methods. In Section III, we report the experimental results on longitudinal prediction of infant DMRI data. We provide additional discussions in Section IV and conclude in Section V.

II. Methods

In this section, we introduce a GCNN with graph representation of diffusion signals, acquired with multiple gradient directions and diffusion weightings (i.e., b-values), for longitudinal DMRI prediction. We first show how the graph representation can be constructed. Then, we present a GCNN architecture with pooling and unpooling realized respectively via graph coarsening and uncoarsening for multi-scale residual learning. Finally, we show how adversarial learning can be used for further performance improvement.

A. Graph Fourier Analysis

We represent the diffusion signal as a function defined on the nodes of a graph G = (V, E, W), where V is a set of n nodes, E is a set of edges, and W = (w_{i,j}) ∈ ℝ^{n×n} is a symmetric weighted adjacency matrix. A signal x : V → ℝ can then be regarded as a vector x ∈ ℝⁿ, where x = (x_1, x_2, ..., x_n) and x_i is the value of x at the i-th node. The graph Laplacian L plays a crucial role in graph signal processing. It is defined as L = D − W, where D is the diagonal degree matrix with D_{ii} = Σ_{j≠i} w_{i,j}, and can be normalized via L = I_n − D^{−1/2} W D^{−1/2}, where I_n is the identity matrix. Since L is real, symmetric, and positive semi-definite, it admits the eigendecomposition L = U Λ Uᵀ, where the columns of U are the eigenvectors of L and Λ is the diagonal matrix of its eigenvalues. This eigendecomposition defines the Fourier basis for analysis in the spectral domain and enables the formulation of spectral filtering [11].
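To make these definitions concrete, the following minimal NumPy sketch (illustrative only, not the authors' implementation) builds the normalized Laplacian of a toy graph, computes its eigendecomposition, and applies the graph Fourier transform and its inverse:

```python
import numpy as np

def normalized_laplacian(W):
    """L = I_n - D^{-1/2} W D^{-1/2} for a symmetric adjacency matrix W."""
    d = W.sum(axis=1)                                   # node degrees
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
    return np.eye(len(d)) - (d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :])

# Toy graph: random symmetric weights on n = 5 nodes, zero diagonal.
n = 5
A = np.random.rand(n, n)
W = (A + A.T) / 2.0
np.fill_diagonal(W, 0.0)

L = normalized_laplacian(W)
eigvals, U = np.linalg.eigh(L)   # L = U diag(eigvals) U^T
x = np.random.rand(n)            # a signal on the graph nodes
x_hat = U.T @ x                  # graph Fourier transform
x_rec = U @ x_hat                # inverse transform recovers x
```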

B. Spectral Filtering

As the translation operation is not defined for graphs, it is not obvious how convolution can be performed for functions defined on graphs. A solution is to perform graph convolution as element-wise multiplication in the graph Fourier domain. More specifically, for a signal x, filtering by a filter g_θ, parameterized by θ, can be represented in the Fourier domain as g_θ(Λ) Uᵀ x, where g_θ is understood as a function of the eigenvalues of L [11]. The filtered output y can then be expressed as

y = U g_θ(Λ) Uᵀ x = g_θ(U Λ Uᵀ) x = g_θ(L) x. (1)

Direct implementation of (1), as used in [9], incurs O(n²) computational complexity for the forward and inverse Fourier transforms, and it does not guarantee localization in the spatial domain.

By Parseval's theorem, localization in the spatial domain corresponds to smoothness in the spectral domain. Spatially localized filters must therefore be spectrally smooth and can hence be approximated and parameterized by polynomials [12]. Spectral filters approximated by K-th order polynomials of the Laplacian are exactly K-localized on the graph [20]. Following [11], we employ Chebyshev polynomials, which afford fast recursive implementation. Chebyshev polynomials form an orthogonal basis on [−1, 1] and satisfy the recurrence relation

T_k(x) = 2x T_{k−1}(x) − T_{k−2}(x),  T_0(x) = 1,  T_1(x) = x, (2)

where T_k(x) is the Chebyshev polynomial of order k. The graph convolution of an input x into an output y can then be expressed as

y = g_θ(L) x = Σ_{k=0}^{K} θ_k T_k(L̃) x, (3)

where L̃ = 2L/λ_max − I_n is the scaled Laplacian, with λ_max the maximal eigenvalue of L. Using the recurrence relation, the computational complexity is O(Kn), where K is the maximal polynomial order and n is the number of graph nodes. During network training, the K + 1 coefficients {θ_k}_{k=0}^{K} are learned, and multiple graph convolutional layers can be stacked as

f^{(l)} = ξ( Σ_{k=0}^{K} θ_k^{(l)} T_k(L̃) f^{(l−1)} ), (4)

where f^{(l)} denotes the feature map at the l-th layer, θ_k^{(l)} is the vector of Chebyshev polynomial coefficients learned at the l-th layer, and ξ is a non-linear activation function.
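The recursion in (2)–(3) translates directly into code. Below is a minimal NumPy sketch of Chebyshev spectral filtering with fixed coefficients θ; in the actual network these coefficients are learned, and for the normalized Laplacian λ_max may simply be upper-bounded by 2:

```python
import numpy as np

def chebyshev_filter(L, x, theta, lmax=2.0):
    """Compute y = sum_{k=0}^{K} theta_k T_k(L_tilde) x via the recurrence (2).

    theta holds the K + 1 filter coefficients (assumes K >= 1); each step
    costs one (sparse-friendly) matrix-vector product, giving O(Kn) overall.
    """
    n = L.shape[0]
    L_tilde = 2.0 * L / lmax - np.eye(n)   # scaled Laplacian, spectrum in [-1, 1]
    T_prev, T_curr = x, L_tilde @ x        # T_0(L~)x = x, T_1(L~)x = L~ x
    y = theta[0] * T_prev + theta[1] * T_curr
    for k in range(2, len(theta)):
        # Recurrence (2) applied to the filtered signal.
        T_prev, T_curr = T_curr, 2.0 * L_tilde @ T_curr - T_prev
        y = y + theta[k] * T_curr
    return y
```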

C. Laplacian Matrix

The DMRI signal sampling domain can be represented as a graph with each node corresponding to a spatial location x_i ∈ ℝ³ and a wave-vector q_j ∈ ℝ³, with associated gradient direction g_j = q̂_j = q_j/‖q_j‖ and diffusion weighting b_j ∝ ‖q_j‖². An adjacency matrix W can then be defined with weights {w_{i,j;i′,j′}}:

w_{i,j;i′,j′} := exp(−‖x_i − x_{i′}‖₂² / σ_x²) × exp(−(1 − ⟨g_j, g_{j′}⟩²) / σ_g²) × exp(−(b_j − b_{j′})² / σ_b²), (5)

where σ_x, σ_g, and σ_b are parameters controlling the contributions of the spatial, angular, and diffusion-weighting distances, respectively. Our definition of the adjacency matrix (5) is adapted from the matrix constructed for DMRI denoising in [21]. Note that, in (5), the numerators in the exponential functions are normalized to [0, 1].
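A sketch of the edge weight in (5) is given below. The default σ² values follow those used later in Section III, and the spatial and b-value differences are assumed to be pre-normalized to [0, 1] as noted above:

```python
import numpy as np

def edge_weight(x_i, x_ip, g_j, g_jp, b_j, b_jp,
                sigma_x2=1.0, sigma_g2=1.0, sigma_b2=2.0):
    """Weight between sampling points (x_i, q_j) and (x_i', q_j') as in (5)."""
    w_spatial = np.exp(-np.sum((x_i - x_ip) ** 2) / sigma_x2)
    # The squared inner product makes the angular term antipodally symmetric.
    w_angular = np.exp(-(1.0 - np.dot(g_j, g_jp) ** 2) / sigma_g2)
    w_bvalue = np.exp(-((b_j - b_jp) ** 2) / sigma_b2)
    return w_spatial * w_angular * w_bvalue
```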

D. Graph Coarsening and Uncoarsening

In deep CNNs, receptive fields should be large enough to capture the relevant neighborhood or global context [22]. Receptive fields can be enlarged by simply stacking several convolutional layers, or by using sub-sampling or dilated convolutions [23]. U-Net is widely used and combined with various architectures owing to its ability to enlarge receptive fields [24]. It consists of two main paths: a contracting path that encodes context and a symmetric expanding path that decodes, restoring the resolution of the output. We achieve the same effect in a GCNN framework with coarsening and uncoarsening layers, corresponding respectively to down-sampling and up-sampling by a factor of 2.

Graph coarsening and uncoarsening are not as straightforward as the usual pooling and unpooling operations in CNNs. For graph coarsening, we adopt the Graclus multi-scale clustering algorithm [25], as in [11]. This fast coarsening algorithm produces a coarse graph at each coarsening level after rearranging the vertices using a binary tree structure [11]. The graph signal is represented as a single 1D array together with permutation indices for rearrangement. Uncoarsening is then achieved via a one-dimensional upsampling operation followed by inverse permutation of the indices. In our implementation, we employ a transposed convolution filter [26] to determine the values of all elements in the upsampled array, rather than filling them with the simple average or maximum of neighboring elements. This allows the signal values on the graph to be restored more faithfully, with weights learned adaptively from neighboring elements.
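The following toy NumPy sketch illustrates the bookkeeping involved (the permutation, pairing, and upsampling weights are placeholders, not values produced by Graclus or learned by the network): factor-2 coarsening on the permuted 1D signal, factor-2 upsampling with element-wise weights standing in for the transposed convolution, and inverse permutation back to node order:

```python
import numpy as np

def coarsen(x_perm):
    """Pool pairs of consecutive nodes of a permuted graph signal (factor-2
    coarsening; Graclus pairing and any padding are assumed already done)."""
    return x_perm.reshape(-1, 2).max(axis=1)

def uncoarsen(x_coarse, w):
    """Factor-2 upsampling with per-element weights w, a stand-in for the
    learned transposed convolution described above."""
    return np.repeat(x_coarse, 2) * w

# Toy example: permute, coarsen, then uncoarsen and undo the permutation.
perm = np.array([2, 0, 3, 1])        # rearrangement from the binary tree
x = np.array([4.0, 1.0, 3.0, 2.0])   # signal on 4 graph nodes
x_perm = x[perm]
x_c = coarsen(x_perm)                # 2 coarse nodes
x_up = uncoarsen(x_c, np.ones(4))    # back to 4 (permuted) nodes
x_rec = np.empty_like(x_up)
x_rec[perm] = x_up                   # inverse permutation to node order
```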

E. Multi-Scale Residual Network

ResNet [27] was introduced to ease the training of very deep networks. By adding an identity mapping to each residual building block, the signal is propagated directly from one unit to any other unit [28]. Figure 1 shows a residual graph convolutional block with two consecutive graph convolutions, each followed by batch normalization and leaky rectified linear unit (LReLU) activation. Our framework built on this residual block is illustrated in Figure 2. Since the input and output dimensions of a residual block must be equal, simple graph convolutional layers are added to adapt the dimensionality before the input features are passed through the residual block. After each residual graph convolutional block, a graph pooling layer is applied in the encoding path and an unpooling layer in the decoding path. Each convolutional layer outputs 64 feature maps, except the last convolutional layer, which outputs a single feature map.

Fig. 1. A residual convolutional block. LReLU: Leaky ReLU.

Fig. 2. A schematic diagram of the proposed GCNN framework.

Multi-scale input graphs are added as new features via simple graph convolutions at each level of the contracting path. In this paper, the multi-scale inputs are generated by down-sampling with max pooling at strides 2 and 4. In the standard U-Net, skip connections pass features of equal level from the contracting path to the expanding path, combining low-level, high-resolution features with high-level ones. Instead of simply concatenating the low-level features, we apply residual graph convolutions before concatenation in order to narrow the gap between low- and high-level features. These additional layers can be interpreted as the gap-filling layers introduced in [29] to overcome the dissimilarity of feature maps; they are also called transformation modules in [30] for their role in boosting low-level features to complement high-level features.

We note that the input and output of the GCNN are patches represented as graphs. Each graph is represented as a 1D vector with permutation indices, as explained in Section II-D.

F. Adversarial Learning

We employ adversarial learning to better model the non-linear prediction mapping, inspired by the image synthesis method presented in [8]. In adversarial learning, the generator estimates the target image and the discriminator distinguishes the real target image from the estimated one. During training, the generator and the discriminator are trained in an alternating fashion. Here, the generator G is the proposed GCNN, and the discriminator D consists of consecutive graph convolutional layers followed by fully connected layers, as illustrated in Figure 3 (a shape-level sketch follows the figure). The numbers of filters are 64, 128, and 256 for the graph convolutional layers, and the numbers of output nodes in the fully connected layers are 512, 128, and 1. In both the generator and the discriminator, we use leaky ReLU activation with negative slope 0.2.

Fig. 3. Architecture of the discriminator model. FC: Fully connected layer.
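Below is a shape-level NumPy sketch of this discriminator, with random weights standing in for trained parameters and with any graph pooling between layers omitted for brevity; graph_conv is the Chebyshev convolution of Section II-B extended to multiple channels:

```python
import numpy as np

rng = np.random.default_rng(0)

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

def dense(x, n_out):
    # Random weights stand in for trained fully connected parameters.
    return x @ (rng.standard_normal((x.shape[-1], n_out)) * 0.01)

def graph_conv(L, x, n_out, K=6, lmax=2.0):
    """Multi-channel Chebyshev graph convolution with stand-in coefficients."""
    n = L.shape[0]
    L_t = 2.0 * L / lmax - np.eye(n)
    Tx = [x, L_t @ x]
    for _ in range(2, K + 1):
        Tx.append(2.0 * L_t @ Tx[-1] - Tx[-2])
    theta = rng.standard_normal((K + 1, x.shape[-1], n_out)) * 0.01
    return sum(Tx[k] @ theta[k] for k in range(K + 1))

def discriminator(L, x):
    """Sketch of Fig. 3: graph convolutions with 64/128/256 filters and
    LReLU, flattening, then FC layers of width 512/128/1."""
    h = leaky_relu(graph_conv(L, x, 64))
    h = leaky_relu(graph_conv(L, h, 128))
    h = leaky_relu(graph_conv(L, h, 256))
    h = h.reshape(1, -1)                        # flatten node/channel features
    h = leaky_relu(dense(h, 512))
    h = leaky_relu(dense(h, 128))
    return 1.0 / (1.0 + np.exp(-dense(h, 1)))   # sigmoid -> P(input is real)

# Example: a graph patch with 64 nodes and 1 input channel.
# p_real = discriminator(np.eye(64), np.random.rand(64, 1))
```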

We further employ cycle-consistent adversarial learning [17] to utilize unpaired datasets for training. CycleGAN learns a mapping G_XY from the source domain X to the target domain Y such that the distribution of G_XY(X) is indistinguishable from the distribution of Y. An inverse mapping G_YX is learned jointly, with a cycle consistency loss enforcing G_YX(G_XY(X)) ≈ X and vice versa.

G. Loss Function

We adopt the L1 loss to measure the distance between images, since the L1 distance results in less blurry images than the L2 distance [16]. The loss function is defined as

L_G(x, y) := λ_g ‖F_θ(x) − y‖₁, (6)

where x and y are the network input and the target, respectively, and F_θ(x) is the output predicted through the non-linear mapping F_θ.

For adversarial learning, we define the discriminator loss as

L_D(x, y) = L_BCE(D(y), 1) + L_BCE(D(G(x)), 0), (7)

where LBCE is the binary cross-entropy function defined as

L_BCE(p, q) := −Σ_i [ q_i log p_i + (1 − q_i) log(1 − p_i) ]. (8)

In (8), q is a vector of 1’s for real target images and 0’s for the generated ones, and p is the probability given by the discriminator. The generator loss is defined as

L_G^{ADV}(x, y) = L_G(x, y) + λ_ADV L_BCE(D(G(x)), 1), (9)

so that the generator G can produce more realistic output to fool the discriminator D.

For cycle-consistent training, we further extend (9) to form a forward and backward cycle as follows:

L_G^{ADV}(x, y) = λ_g L_{G_XY}(x, y) + λ_g L_{G_YX}(y, x) + λ_ADV L_BCE(D_Y(G_XY(x)), 1) + λ_ADV L_BCE(D_X(G_YX(y)), 1), (10)

where G_XY and G_YX are the generators from domain X to Y and from Y to X, respectively, and D_X and D_Y are the discriminators for domains X and Y, respectively. The parameter λ_g can be set to 1 for paired datasets and to 0 for unpaired datasets. The discriminator loss (7) can be extended as

L_{D_Y}(x, y) = L_BCE(D_Y(y), 1) + L_BCE(D_Y(G_XY(x)), 0),
L_{D_X}(y, x) = L_BCE(D_X(x), 1) + L_BCE(D_X(G_YX(y)), 0).

The cycle consistency loss is defined as

L_cyc(G_XY, G_YX) = λ_cyc ‖G_YX(G_XY(x)) − x‖₁ + λ_cyc ‖G_XY(G_YX(y)) − y‖₁.
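The losses (6)–(10) and the cycle term reduce to a few lines. The sketch below is illustrative, not the authors' code: it averages over elements rather than summing as in (8), and uses the λ values reported later in Section III-B as defaults:

```python
import numpy as np

def bce(p, q, eps=1e-8):
    """Binary cross-entropy as in (8), averaged over elements."""
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(q * np.log(p) + (1.0 - q) * np.log(1.0 - p))

def l1(a, b):
    return np.mean(np.abs(a - b))

def discriminator_loss(d_real, d_fake):
    """Eq. (7): real patches labeled 1, generated patches labeled 0."""
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def generator_loss(pred, target, d_fake, lam_g=1.0, lam_adv=0.01):
    """Eq. (9): L1 reconstruction plus adversarial term that labels fakes 1."""
    return lam_g * l1(pred, target) + lam_adv * bce(d_fake, np.ones_like(d_fake))

def cycle_loss(x, x_cycled, y, y_cycled, lam_cyc=0.1):
    """Cycle-consistency term: G_YX(G_XY(x)) ~ x and G_XY(G_YX(y)) ~ y."""
    return lam_cyc * l1(x_cycled, x) + lam_cyc * l1(y_cycled, y)
```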

III. Experimental Results

A. Materials

We demonstrate the effectiveness of the proposed methods using a longitudinal dataset of 25 full-term infant subjects. The diffusion-weighted images were acquired using a 3T Siemens Allegra scanner with a spin-echo echo-planar imaging sequence, TR/TE = 7680/82 ms, voxel size 2 × 2 × 2 mm³, and b = 1000 s/mm². Each dataset contained 7 non-diffusion-weighted reference scans and 42 diffusion-weighted scans, each image with dimensions 128 × 96 × 60. We predicted the 3-, 6-, 9-, and 12-month-old images from the 0-, 3-, 6-, and 9-month-old images, respectively, with 9, 9, 8, and 7 paired datasets for the respective age pairs. For cycle-consistent learning, we utilized an additional 4, 6, 6, and 8 unpaired datasets, respectively.

B. Implementation Details

We first aligned all source and target datasets to a longitudinal template space as described in [31]. The diffusion signals were then reoriented using the spatial warping method described in [32]. Since the intensity ranges differ significantly between the non-diffusion-weighted image (b0) and the diffusion-weighted images (DWIs), we normalized each image using the minimal and maximal values computed from the corresponding source image.
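A minimal sketch of this per-image min-max normalization (the epsilon guard is our addition):

```python
import numpy as np

def minmax_normalize(img, source):
    """Scale intensities using the min/max of the corresponding source image."""
    lo, hi = source.min(), source.max()
    return (img - lo) / (hi - lo + 1e-8)   # epsilon guards against hi == lo
```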

We created different adjacency matrices by varying the parameters σ_x², σ_g², and σ_b² in (5) for joint consideration of spatial and angular neighborhoods, resulting in the joint GCNN (JGCNN). Note that, instead of constructing two independent networks for b0 images and DWIs, we train a single network with a graph that jointly considers gradient direction and strength in addition to spatial location. Setting σ_x² = 1.0 and σ_g² = 0.0 results in a GCNN that considers only spatial distance, i.e., the spatial GCNN (SGCNN). In SGCNN, the edge weight w_{i,j;i′,j′} is set to 0 if the two nodes belong to different gradient directions, i.e., g_j ≠ g_{j′}. For the loss function, we set λ_g = 1.0, λ_ADV = 0.01, and λ_cyc = 0.1. Note that, for unpaired datasets, we set λ_g = 0 so that only the cycle consistency and adversarial losses are considered.

The proposed network was trained on an NVIDIA TITAN X with 12 GB of memory. It was implemented in TensorFlow 1.2.1 and trained with the Adam optimizer, using an initial learning rate of 0.00001 and an exponential decay rate of 0.95. The mini-batch size was set to 10. For the adversarial JGCNN (Adv. JGCNN), the initial learning rates for the generator and the discriminator were set to 0.0001 and 0.00001, respectively. We adopted an early stopping strategy to prevent over-fitting.
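For reference, a rough TensorFlow 2 equivalent of this optimizer setup is sketched below (the paper used TensorFlow 1.2.1, and the decay interval is not reported, so decay_steps is an assumed placeholder):

```python
import tensorflow as tf

# Adam with initial learning rate 1e-5 and exponential decay rate 0.95.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-5,
    decay_steps=10_000,   # assumed; not specified in the paper
    decay_rate=0.95)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
```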

C. Patch Size

The input patch size should be large enough to capture context information. However, a large patch size results in a large number of graph nodes, which in turn increases training and testing time. A larger patch size also implies fewer independent patches that can be extracted, reducing training efficiency. We tested a number of patch sizes and patch offsets, as summarized in Table I. Using leave-one-out cross-validation for the prediction of 3-month-old images from 0-month-old images, we evaluated the impact of the different patch sizes using the JGCNN with common hyper-parameters. We measured the prediction accuracy of generalized fractional anisotropy (GFA) by means of mean absolute error (MAE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM); a sketch of these metrics follows Table II. Based on the quantitative results summarized in Table II, we set the patch size to 4 × 4 × 4 × 43.

TABLE I.

Patch sizes, number of nodes, patch offsets, and extracted patches.

Patch size Nodes Patch offset Extracted patches
3 × 3 × 3 × 43 1164 2 18139
4 × 4 × 4 × 43 2752 2 18709
5 × 5 × 5 × 43 5376 3 5741

TABLE II.

GFA prediction accuracy in relation to patch size.

Patch sizes MAE PSNR SSIM
3 × 3 × 3 × 43 0.043 ± 0.004 35.416 ± 0.782 0.976 ± 0.003
4 × 4 × 4 × 43 0.042 ± 0.005 35.727 ± 1.317 0.977 ± 0.004
5 × 5 × 5 × 43 0.097 ± 0.073 28.061 ± 4.252 0.943 ± 0.020
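The three metrics can be computed, for instance, with scikit-image; the sketch below (brain masking and any per-volume conventions are omitted) shows one way to evaluate a predicted GFA map against its target:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def gfa_metrics(pred, target):
    """MAE, PSNR, and SSIM between predicted and target GFA volumes."""
    mae = np.mean(np.abs(pred - target))
    drange = float(target.max() - target.min())
    psnr = peak_signal_noise_ratio(target, pred, data_range=drange)
    ssim = structural_similarity(target, pred, data_range=drange)
    return mae, psnr, ssim
```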

D. Graph Construction

As pointed out in [11], the quality of the learned filters relies on the quality of the graph. Construction of the graph Laplacian according to (5) involves three controlling parameters: σ_x², σ_g², and σ_b². We fix two of them, σ_x² = 1.0 and σ_b² = 2.0, and demonstrate the importance of the angular dimension by varying σ_g². As summarized in Table III, performance increases with the angular weight up to σ_g² = 1.0. We therefore set σ_g² = 1.0.

TABLE III.

GFA prediction accuracy in relation to σ_g².

σ_g² MAE PSNR SSIM
0.01 0.044 ± 0.007 35.488 ± 1.187 0.975 ± 0.005
0.1 0.047 ± 0.008 34.742 ± 1.924 0.974 ± 0.005
1.0 0.042 ± 0.005 35.727 ± 1.317 0.977 ± 0.004
10.0 0.047 ± 0.008 35.140 ± 1.399 0.974 ± 0.006

E. Polynomial Order

Recall that K-th order polynomials yield K-hop localized filters. In general, higher order polynomials yield greater accuracy in filter approximation but at the cost of a linear increase in computation time. We compared the impact of K = 2, 4, 6, 9, 14. The results summarized in Table IV indicate that the gain is marginal when K is greater than 6. We therefore set K = 6.

TABLE IV.

GFA prediction accuracy in relation to polynomial order.

Polynomial order MAE PSNR SSIM
2 0.049 ± 0.016 34.070 ± 3.277 0.972 ± 0.009
4 0.060 ± 0.052 34.128 ± 4.732 0.970 ± 0.017
6 0.042 ± 0.005 35.727 ± 1.317 0.977 ± 0.004
9 0.045 ± 0.007 35.562 ± 1.280 0.976 ± 0.005
14 0.045 ± 0.008 35.566 ± 1.319 0.976 ± 0.005

F. Ablation Study

To understand the effect of each component of the proposed network, we conducted an ablation study in which the transformation module and the multi-scale input module were removed from the proposed architecture independently and the performance compared. All models use the same training hyper-parameters as JGCNN. The cross-validation results for the prediction of 3-month-old images from 0-month-old images are summarized in Table V.

TABLE V.

GFA prediction accuracy in relation to model components.

Model MAE PSNR SSIM
JGCNN-no-transf. 0.045 ± 0.007 34.979 ± 0.889 0.975 ± 0.005
JGCNN-no-multi-scale 0.044 ± 0.006 35.454 ± 1.101 0.975 ± 0.004
JGCNN 0.042 ± 0.005 35.727 ± 1.317 0.977 ± 0.004

G. Method Comparison

We evaluated the proposed method using leave-one-out cross-validation for all paired datasets of each age pair. As a baseline for comparison, we applied a 3D U-Net [33] to predict each DWI individually. The GFA prediction results are summarized in Figure 4, and visual comparisons are provided in Figure 6. 3D U-Net and SGCNN consider only local spatial neighborhoods and yield similar accuracy. In contrast, JGCNN fully exploits the spatial and angular neighborhood information and therefore provides more accurate predictions. Adv. JGCNN recovers greater detail, closer to the target, than SGCNN. We further conducted the Wilcoxon signed-rank test to determine whether the improvements of the proposed method over the other methods are statistically significant. The results in Figure 4 indicate that the improvements over 3D U-Net are statistically significant (p < 0.05) in all but one case.

Fig. 4. Quantitative comparison using (a) MAE, (b) PSNR, and (c) SSIM. * and † indicate p < 0.01 and p < 0.05, respectively, compared to Adv. JGCNN.

Fig. 6. Predicted GFA maps and the corresponding error maps.

We also compared the relative contrast (RC) [34] between white and gray matter:

RC = (S_GM − S_WM) / (S_GM + S_WM), (11)

where S_GM and S_WM are the average intensity values of gray matter and white matter, respectively. Figure 5 indicates that the RC of Adv. JGCNN is close to that of the target.
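Computing (11) from the mean tissue intensities is straightforward; a one-line sketch:

```python
def relative_contrast(s_gm, s_wm):
    """Relative contrast (11) between mean gray- and white-matter intensities."""
    return (s_gm - s_wm) / (s_gm + s_wm)
```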

Fig. 5. Relative contrast between white and gray matter.

Representative prediction results for DWIs, shown in Figure 7, indicate that JGCNN and Adv. JGCNN recover more structural details compared with SGCNN.

Fig. 7. Representative predicted diffusion-weighted images and their close-up views (3- to 6-month prediction).

For the regions of interest (ROIs) shown in Figure 8, i.e., the forceps major (FMajor), forceps minor (FMinor), left cingulum (L-Cingulum), left inferior fronto-occipital fasciculus (L-IFOF), and left uncinate fasciculus (L-UF), Figures 9 and 10 show that JGCNN with adversarial learning outperforms SGCNN and JGCNN.

Fig. 8. Regions of interest: FMajor (red), FMinor (green), Left Cingulum (blue), Left IFOF (cyan), and Left UF (yellow).

Fig. 9. ROI-specific MAE results. * and † indicate p < 0.01 and p < 0.05, respectively, compared to Adv. JGCNN.

Fig. 10. ROI-specific PSNR results. * and † indicate p < 0.01 and p < 0.05, respectively, compared to Adv. JGCNN.

To quantitatively compare fiber orientation distribution functions (ODFs) and tracts, we computed the normalized root-mean-square error (NRMSE) of the spherical harmonic (SH) coefficients of the fiber ODFs and the probability of false fiber detection [35]; the results are summarized in Table VI.

TABLE VI.

NRMSE of SH coefficients of various methods and probability of false fiber detection (Pd %).

Methods 0 to 3 (NRMSE, Pd %) 3 to 6 (NRMSE, Pd %) 6 to 9 (NRMSE, Pd %) 9 to 12 (NRMSE, Pd %)
3D U-Net 0.574 ± 0.044 7.084 ± 1.054 0.591 ± 0.038 8.128 ± 0.885 0.593 ± 0.033 8.080 ± 0.806 0.615 ± 0.028 8.453 ± 1.124
SGCNN 0.556 ± 0.041 8.727 ± 1.081 0.537 ± 0.041 9.485 ± 1.174 0.572 ± 0.050 8.208 ± 0.678 0.622 ± 0.036 8.797 ± 0.866
JGCNN 0.562 ± 0.037 8.322 ± 1.229 0.605 ± 0.096 8.531 ± 1.899 0.540 ± 0.036 11.411 ± 1.445 0.629 ± 0.043 8.303 ± 2.039
Adv. JGCNN 0.536 ± 0.044 7.068 ± 0.888 0.540 ± 0.032 7.380 ± 0.843 0.565 ± 0.034 7.884 ± 1.245 0.569 ± 0.017 7.345 ± 0.779

IV. Discussion

In this paper, we have proposed a GCNN based method for predicting missing infant brain DMRI data. This is a challenging task as the infant brain changes dynamically during the first postnatal year. Our method harnesses information from the spatial domain and diffusion wave-vector domain jointly for effective prediction.

The graph adjacency matrix, which defines the Laplacian, is closely associated with the spectral filters and plays a crucial role in GCNNs. In this work, we defined the adjacency matrix as the product of Gaussian functions of distances in physical space, gradient direction, and gradient strength. The optimal values of the tuning parameters (σ_x², σ_g², σ_b²) were obtained via greedy search, varying one parameter at a time. We observed that setting σ_x² too high results in blurred images. The experimental results confirmed that joint consideration of the information in the spatial and wave-vector domains improves prediction accuracy. The proposed method assumes a homogeneous graph structure across all samples, i.e., the source and the target share the same gradient tables. More general GCNN methods have been proposed for heterogeneous graphs [36].

We employed Chebyshev polynomial approximation for spatially localized filtering. An alternative is the Cayley polynomials, which have been shown in [37] to exhibit a spectral zoom property. The authors argue that Chebyshev polynomials make it hard to produce narrow-band filters, an issue addressed by the learnable spectral zoom factor in Cayley filters. However, as the authors point out, their current TensorFlow implementation is slow due to the Jacobi iterations used to approximate matrix inversion.

Prediction performance could be further improved with a more sophisticated network architecture and more efficient training strategies. We observed that some structures in the cortex are difficult to recover. More effective sampling strategies, such as hard example mining [38] and self-paced learning [39], could be applied to improve learning accuracy.

We note that the proposed method predicts missing DMRI data at fixed time points determined by the training image pairs. Prediction for arbitrary time points is more challenging and needs to be addressed in the future.

The proposed method can potentially be applied to other problems such as image-to-image translation, image synthesis, and super-resolution, with the data residing in non-Euclidean domains that can be represented with graphs.

V. Conclusion

We have proposed a novel GCNN-based method for the prediction of missing DMRI data. We jointly consider spatial and angular neighborhood information in constructing a graph, based on which convolutions are performed in a graph-based CNN framework for data prediction. We applied our method to predicting missing infant DMRI data and showed that it can predict DMRI data with greater structural detail.


Acknowledgments

This work was supported in part by NIH grants (NS093842, EB022880, MH117943, EB006733, EB009634, AG041721, and MH100217).

Footnotes

This paper has supplementary downloadable material available at http://ieeexplore.ieee.org, provided by the authors.

Contributor Information

Yoonmi Hong, Department of Radiology and Biomedical Research Imaging Center (BRIC), University of North Carolina at Chapel Hill, NC, U.S.A.

Jaeil Kim, School of Computer Science and Engineering, Kyungpook National University, Daegu, Korea.

Geng Chen, Department of Radiology and Biomedical Research Imaging Center (BRIC), University of North Carolina at Chapel Hill, NC, U.S.A.

Weili Lin, Department of Radiology and Biomedical Research Imaging Center (BRIC), University of North Carolina at Chapel Hill, NC, U.S.A.

Pew-Thian Yap, Department of Radiology and Biomedical Research Imaging Center (BRIC), University of North Carolina at Chapel Hill, NC, U.S.A.

Dinggang Shen, Department of Radiology and Biomedical Research Imaging Center (BRIC), University of North Carolina at Chapel Hill, NC, U.S.A.; Department of Brain and Cognitive Engineering, Korea University, Seoul, Korea.

References

[1] Gilmore JH, Kang C, Evans DD, Wolfe HM, Smith JK, Lieberman JA, Lin W, Hamer RM, Styner M, and Gerig G, "Prenatal and neonatal brain structure and white matter maturation in children at high risk for schizophrenia," American Journal of Psychiatry, vol. 167, no. 9, pp. 1083–1091, 2010.
[2] Li G, Nie J, Wang L, Shi F, Lin W, Gilmore JH, and Shen D, "Mapping region-specific longitudinal cortical surface expansion from birth to 2 years of age," Cerebral Cortex, vol. 23, no. 11, pp. 2724–2733, 2012.
[3] Li G, Wang L, Yap P-T, Wang F, Wu Z, Meng Y, Dong P, Kim J, Shi F, Rekik I et al., "Computational neuroanatomy of baby brains: A review," NeuroImage, 2018.
[4] Niethammer M, Huang Y, and Vialard F-X, "Geodesic regression for image time-series," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2011, pp. 655–662.
[5] Rekik I, Li G, Wu G, Lin W, and Shen D, "Prediction of infant MRI appearance and anatomical structure evolution using sparse patch-based metamorphosis learning framework," in International Workshop on Patch-based Techniques in Medical Imaging. Springer, 2015, pp. 197–204.
[6] Meng Y, Li G, Rekik I, Zhang H, Gao Y, Lin W, and Shen D, "Can we predict subject-specific dynamic cortical thickness maps during infancy from birth?" Human Brain Mapping, vol. 38, no. 6, pp. 2865–2874, 2017.
[7] Huang Y, Shao L, and Frangi AF, "Simultaneous super-resolution and cross-modality synthesis of 3D medical images using weakly-supervised joint convolutional sparse coding," arXiv preprint arXiv:1705.02596, 2017.
[8] Nie D, Trullo R, Lian J, Wang L, Petitjean C, Ruan S, Wang Q, and Shen D, "Medical image synthesis with deep convolutional adversarial networks," IEEE Transactions on Biomedical Engineering, 2018.
[9] Bruna J, Zaremba W, Szlam A, and LeCun Y, "Spectral networks and locally connected networks on graphs," arXiv preprint arXiv:1312.6203, 2013.
[10] Henaff M, Bruna J, and LeCun Y, "Deep convolutional networks on graph-structured data," arXiv preprint arXiv:1506.05163, 2015.
[11] Defferrard M, Bresson X, and Vandergheynst P, "Convolutional neural networks on graphs with fast localized spectral filtering," in Advances in Neural Information Processing Systems, 2016, pp. 3844–3852.
[12] Bronstein MM, Bruna J, LeCun Y, Szlam A, and Vandergheynst P, "Geometric deep learning: going beyond Euclidean data," IEEE Signal Processing Magazine, vol. 34, no. 4, pp. 18–42, 2017.
[13] Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, and Bengio Y, "Generative adversarial nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
[14] Radford A, Metz L, and Chintala S, "Unsupervised representation learning with deep convolutional generative adversarial networks," arXiv preprint arXiv:1511.06434, 2015.
[15] Denton EL, Chintala S, Fergus R et al., "Deep generative image models using a Laplacian pyramid of adversarial networks," in Advances in Neural Information Processing Systems, 2015, pp. 1486–1494.
[16] Isola P, Zhu J-Y, Zhou T, and Efros AA, "Image-to-image translation with conditional adversarial networks," arXiv preprint, 2017.
[17] Zhu J-Y, Park T, Isola P, and Efros AA, "Unpaired image-to-image translation using cycle-consistent adversarial networks," arXiv preprint, 2017.
[18] Wolterink JM, Leiner T, Viergever MA, and Išgum I, "Generative adversarial networks for noise reduction in low-dose CT," IEEE Transactions on Medical Imaging, vol. 36, no. 12, pp. 2536–2545, 2017.
[19] Ghazi MM, Nielsen M, Pai A, Cardoso MJ, Modat M, Ourselin S, and Sørensen L, "Robust training of recurrent neural networks to handle missing data for disease progression modeling," 2018.
[20] Hammond DK, Vandergheynst P, and Gribonval R, "Wavelets on graphs via spectral graph theory," Applied and Computational Harmonic Analysis, vol. 30, no. 2, pp. 129–150, 2011.
[21] Chen G, Dong B, Zhang Y, Shen D, and Yap P-T, "Neighborhood matching for curved domains with application to denoising in diffusion MRI," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2017, pp. 629–637.
[22] Luo W, Li Y, Urtasun R, and Zemel R, "Understanding the effective receptive field in deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2016, pp. 4898–4906.
[23] Yu F and Koltun V, "Multi-scale context aggregation by dilated convolutions," arXiv preprint arXiv:1511.07122, 2015.
[24] Ronneberger O, Fischer P, and Brox T, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.
[25] Dhillon IS, Guan Y, and Kulis B, "Weighted graph cuts without eigenvectors: a multilevel approach," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 11, 2007.
[26] Long J, Shelhamer E, and Darrell T, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.
[27] He K, Zhang X, Ren S, and Sun J, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[28] He K, Zhang X, Ren S, and Sun J, "Identity mappings in deep residual networks," in European Conference on Computer Vision. Springer, 2016, pp. 630–645.
[29] Fan J, Cao X, Yap P-T, and Shen D, "BIRNet: Brain image registration using dual-supervised fully convolutional networks," arXiv preprint arXiv:1802.04692, 2018.
[30] Nie D, Wang L, Adeli E, Lao C, Lin W, and Shen D, "3-D fully convolutional networks for multimodal isointense infant brain image segmentation," IEEE Transactions on Cybernetics, 2018.
[31] Kim J, Chen G, Lin W, Yap P-T, and Shen D, "Graph-constrained sparse construction of longitudinal diffusion-weighted infant atlases," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2017, pp. 49–56.
[32] Chen G, Zhang P, Li K, Wee C-Y, Wu Y, Shen D, and Yap P-T, "Improving estimation of fiber orientations in diffusion MRI using inter-subject information sharing," Scientific Reports, vol. 6, p. 37847, 2016.
[33] Çiçek Ö, Abdulkadir A, Lienkamp SS, Brox T, and Ronneberger O, "3D U-Net: learning dense volumetric segmentation from sparse annotation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2016, pp. 424–432.
[34] Jones RA, Palasis S, and Grattan-Smith JD, "MRI of the neonatal brain: optimization of spin-echo parameters," American Journal of Roentgenology, vol. 182, no. 2, pp. 367–372, 2004.
[35] Michailovich O, Rathi Y, and Dolui S, "Spatially regularized compressed sensing for high angular resolution diffusion imaging," IEEE Transactions on Medical Imaging, vol. 30, no. 5, pp. 1100–1115, 2011.
[36] Such FP, Sah S, Dominguez MA, Pillai S, Zhang C, Michael A, Cahill ND, and Ptucha R, "Robust spatial filtering with graph convolutional neural networks," IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 6, pp. 884–896, 2017.
[37] Levie R, Monti F, Bresson X, and Bronstein MM, "CayleyNets: Graph convolutional neural networks with complex rational spectral filters," arXiv preprint arXiv:1705.07664, 2017.
[38] Shrivastava A, Gupta A, and Girshick R, "Training region-based object detectors with online hard example mining," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 761–769.
[39] Kumar MP, Packer B, and Koller D, "Self-paced learning for latent variable models," in Advances in Neural Information Processing Systems, 2010, pp. 1189–1197.
