Abstract
Most deep learning models for temporal regression directly output the estimation based on single input images, ignoring the relationships between different images. In this paper, we propose deep relation learning for regression, aiming to learn different relations between a pair of input images. Four non-linear relations are considered: “cumulative relation”, “relative relation”, “maximal relation” and “minimal relation”. These four relations are learned simultaneously by one deep neural network that has two parts: feature extraction and relation regression. We use an efficient convolutional neural network to extract deep features from the pair of input images and apply a Transformer for relation learning. The proposed method is evaluated on a merged dataset of 6,049 subjects aged 0-97 years using 5-fold cross-validation for the task of brain age estimation. The experimental results show that the proposed method achieves a mean absolute error (MAE) of 2.38 years, which is lower than the MAEs of 8 other state-of-the-art algorithms with statistical significance (p<0.05) in a two-sided paired t-test.
Keywords: Deep relation learning, brain age estimation, deep learning
I. Introduction
Regression aims to estimate continuous values (or ordinal outcomes [1]) from the input data using machine learning models. It has many applications such as severity scores [2], [3], brain age estimation [4], [5] and fluid intelligence prediction [6]. Deep convolutional neural networks (CNNs) can transform the raw input image data into target variables by training on a large-scale dataset [7]. Therefore, many recent applications use deep learning to solve the regression problem [5], [8], [9].
Most deep regression methods estimate the ordinal output z based on a single input image x sampled from a set 𝒳, which can be denoted by z = 𝓜(x; θ) (where 𝓜 is the machine learning model and θ is the set of parameters). This requires the machine learning model to learn the output directly from the single input image x without any reference. Instead of learning from a single image, pairwise learning is also used in regression [1], [10], [11] and segmentation [12], aiming to learn the ordinal relationship r between the pair of input subjects (x, y), such as the ternary relationship [1]: “greater than”, “similar to” or “smaller than”. Order regression based on pairwise learning has two advantages: (1) it is easy to learn the relationship between two instances [13] and (2) y can be used as the reference to estimate an ordinal output of x [1]. However, the order learning with the ternary relationship proposed in [1] has two limitations: (1) it only learns one relationship between x and y, thus the estimation of the ordinal output of x needs a chain with many different references y and (2) it lacks reflexivity [12], that is, it cannot estimate the output from a pair of identical inputs (x, x). (Reflexivity here means that the model can provide the estimation based on a pair of identical input images.)
In this paper, we propose a novel deep relation learning framework to solve these limitations of ordinal learning. Given two sets 𝒳 and 𝒴, the Cartesian product 𝒳 × 𝒴 is defined as {(x, y)∣x ∈ 𝒳 and y ∈ 𝒴}, which contains the pairs of elements from the two sets. A relation r over sets 𝒳 and 𝒴 can be defined to capture the relationships between x and y. Given a pair of images (x, y) and the corresponding chronological ages (τx, τy), we train a deep learning model to learn relations r ∈ 𝓡 where 𝓡 is the set of relationships without linear correlation, including “cumulative relation” (r1 = τx + τy), “relative relation” (r2 = τx − τy), “maximal relation” (r3 = max(τx, τy)), and “minimal relation” (r4 = min(τx, τy)). The cumulative relation r1 = τx + τy aims to learn the sum of the ages of the two subjects, which can mitigate additive noise. If the model has errors on subjects x and y but the errors are in different directions (positive on one and negative on the other), the estimated cumulative relation r1 = τx + τy will be close to the ground-truth. Another potential application of the cumulative relation is bias correction [14]. Because of the regression to the mean problem [15], the brain ages of young subjects are usually over-estimated and the brain ages of old subjects are usually under-estimated. Recent studies aim at correcting this bias [16], [14], [17], but with controversies [18]. With a pair of young and old subjects, the cumulative relation we propose here may serve as a potential model for bias correction. On the other hand, if the errors on the two subjects are in the same direction, the relative relation r2 = τx − τy may get us closer to the ground-truth. In addition, the relative relation can also be converted into the order relationship [1] or relative attributes [19]. The maximal relation r3 = max(τx, τy) and minimal relation r4 = min(τx, τy) are useful for obtaining the upper and lower bounds of the estimated ages of the two subjects.
The advantages of the proposed deep relation learning are summarized as follows: (1) it is an extension of order learning. When only the “relative relation” r2 = τx − τy is applied, it becomes deep order learning [1]; (2) the values of x and y can be directly estimated from the relations ri, i ∈ {1, 2, 3, 4}; (3) mathematically, a relation is reflexive if each element is related to itself. The proposed four relations are reflexive [12] because the model can compute the relations for a pair of the same image (x, x), with r1 = 2τx, r2 = 0 and r3 = r4 = τx. All four relations can be used for prediction with an ensemble principle, similar to the multi-input multi-output (MIMO) method [20]; (4) y can be used as a reference instance with the known age τy to estimate τx based on ensemble learning, providing a robust prediction with different y.
We use the proposed deep relation learning for brain age estimation based on structural magnetic resonance imaging (MRI), which contains not only brain anatomy information, but also brain age information. MRI-based brain age estimation is a typical regression task that aims at estimating the (artificial) biological brain age from brain MRIs using machine learning techniques [21], [22], [23]. The estimated “brain age” is purely computed from brain MRIs and is a useful biomarker for brain health. Mathematically, we use ν and ν̂ to denote the chronological and estimated brain ages, respectively. ν̂ is estimated by the machine learning model 𝓜, given an input image x. The difference between the estimated and chronological brain ages is usually named the “brain age gap (BAG)” [4]. Many studies have shown that the BAG is correlated with brain diseases or disorders, such as Alzheimer’s Disease (AD) [9], Bipolar disorder [24], Autism Spectrum Disorder (ASD) [25], Psychopathology [26], Major Depressive Disorder (MDD) [27], Multiple sclerosis [28], Psychosis [29] and other common brain disorders [30].
Deep convolutional neural networks can extract task-discriminative features and learn the subtle patterns in the minimally pre-processed input MR images [31]. Many deep learning brain MRI age estimation models have roots in models widely used in computer vision. Examples include the Age-Net [22], which is a hybrid combination of Inception v1 [32] and SqueezeNet [33]; the DeepBrainNet [9], which is built on the Inception-ResNet-v2 [34] framework; the two-stage-age-network (TSAN) [23], which is inspired by the DenseNet [35]; the simple fully convolutional network (SFCN) [40], which is a lightweight version of the VGGNet [36]; and the fusion with attention network (FiA-Net) [37], which is a multi-channel fusion network based on the ResNet [38] and Hi-Net [39]. All of these methods use deep learning models to estimate the brain age directly from input brain MRIs. Our proposed method differs from these models in that it learns different relations on a pair of input subjects. It can also directly estimate the brain age from a pair of identical input images with an ensemble strategy.
Fig. 1 shows the proposed hybrid neural network of deep relation learning for brain age estimation with a pair of input images (x, y). To learn different relations between two input brain MRIs, we first use a simple and efficient neural network (SFCN) [40] to extract the deep features. The SFCN is a lightweight neural network that can estimate brain ages from 3D MRIs with few parameters and low memory consumption. A Transformer with attention [41] is then exploited to learn the different relations within a multi-task framework [42]. The Transformer in our proposed framework uses the self-attention mechanism to combine the information from the pair of input images for relation learning. Studies [43], [44], [45] have shown that the Transformer can capture the inherent relations between different inputs.
Fig. 1.
The framework of the proposed deep relation learning for brain age estimation has two parts: feature extraction and relation regression. It has the pair of inputs (x, y) and the CNN backbones ℱ1 and ℱ2 for feature extraction. The two backbones can be shared (Siamese) or be independent, which stack several Convolutional, Batch Normalization, Max-Pooling and ReLU layers. The outputs of the CNN backbones are split into tokens to the standard transformer for relation regression: learning the four relations between the pair of inputs x and y.
The main contributions of this paper are summarized as follows:
we propose a deep relation learning framework for regression with four different relations that are not linearly related: cumulative, relative, maximal and minimal relations between a pair of input images.
we evaluate the proposed deep relation learning for brain age estimation and show different ways to estimate the brain age from the pair of input images with an ensemble strategy.
we propose a hybrid neural network with a convolutional neural network for feature extraction and a Transformer for relation learning.
The rest of the paper is organized as follows: Section II describes the proposed four different relations and the structure of the neural network for deep relation learning. Section III presents the detailed experimental setting for brain age estimation using the proposed deep relation learning. The results are provided in Section IV and the discussion and conclusion are given in Section V.
II. Method
A. Relations for regression
As mentioned above, the Cartesian product 𝒳×𝒴 is defined as {(x, y)∣x ∈ 𝒳 and y ∈ 𝒴} which contains a pair of elements from two sets 𝒳 and 𝒴. In this paper, the sets 𝒳 = 𝒴 are datasets containing brain MRIs and the defined relation is called a homogeneous relation [46]. For brain age estimation, the brain age range of the subjects is: 0 ≤ τx, τy ≤ A, where A denotes the maximum age contained in the dataset. The typical relations can be defined as:
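$$
\begin{aligned}
r_1 &= \tau_x + \tau_y && \text{(cumulative relation)}\\
r_2 &= \tau_x - \tau_y && \text{(relative relation)}\\
r_3 &= \max(\tau_x, \tau_y) && \text{(maximal relation)}\\
r_4 &= \min(\tau_x, \tau_y) && \text{(minimal relation)}\\
r_5 &= \tau_x \times \tau_y && \text{(amplifying relation)}\\
r_6 &= \tau_x / \tau_y && \text{(divided relation)}
\end{aligned}
\tag{1}
$$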
where x and y denote different input images and τx and τy denote the corresponding chronological age of subjects x and y, respectively.
In practice, a relation estimated by a neural network should be bounded: ∣r∣ ≤ M, where M is a real number. Thus, we only use the first four relations ri, i ∈ {1, 2, 3, 4} and do not consider the “amplifying relation r5” and the “divided relation r6”, since their ranges are very large, reaching [0, 10,000] or even [0, +∞), which makes it hard to train the neural network on a lifespan dataset with brain ages of 0-100 years. We train a neural network to learn the four relations ri, i ∈ {1, 2, 3, 4}, and the brain age estimation is then obtained from the relations ri, i ∈ {1, 2, 3, 4} estimated by the trained neural network.
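As a minimal sketch of the four bounded relations used as training targets, the snippet below computes them for a pair of chronological ages and lists their value ranges for ages in [0, A]. The function names `relations` and `bounds` are illustrative and not taken from the paper.

```python
def relations(tau_x: float, tau_y: float) -> dict:
    """Ground-truth values of the four bounded relations for a pair of ages."""
    return {
        "r1": tau_x + tau_y,      # cumulative relation
        "r2": tau_x - tau_y,      # relative relation
        "r3": max(tau_x, tau_y),  # maximal relation
        "r4": min(tau_x, tau_y),  # minimal relation
    }


def bounds(max_age: float) -> dict:
    """Value range of each relation when 0 <= tau_x, tau_y <= max_age."""
    return {
        "r1": (0.0, 2 * max_age),
        "r2": (-max_age, max_age),
        "r3": (0.0, max_age),
        "r4": (0.0, max_age),
    }


pair = relations(30.0, 70.0)   # two different subjects
same = relations(45.0, 45.0)   # identical pair (x, x)
limits = bounds(100.0)
```

For the identical pair, the relations reduce to r1 = 2τx, r2 = 0 and r3 = r4 = τx, which is the reflexive case discussed in the introduction.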
B. Deep neural network
The framework of the proposed deep relation learning is shown in Fig. 1, which contains two parts: a convolutional neural network (CNN) backbone for deep feature extraction from the pair of input images and a Transformer for the fusion of the extracted deep features to learn the four relations.
1). CNN backbone to extract deep features:
For the CNN backbone, we use a structure similar to the Simple Fully Convolutional Neural Network (SFCN) [40]. The network contains 6 blocks. Each of the first 5 blocks contains a convolutional layer with a kernel size of 3 × 3 × 3, a batch normalization layer [47], a ReLU activation layer [48] and a max-pooling layer with a kernel size of 2 × 2 × 2 and stride 2. The last block contains a convolutional layer with a kernel size of 1 × 1 × 1, a batch normalization layer and a ReLU layer. We set the channel numbers of the convolutional layers to [32, 64, 128, 256, 256, 64], as in [40].
In practice we have found that applying a max-pooling layer with a kernel size of 2 × 2 × 2 and stride 2 at the beginning of the neural network, that is, directly on the input images (see Fig. 1), can reduce the errors in brain age estimation. It also reduces the image size at the beginning and thus the computational complexity and memory cost. Applying the max-pooling directly on input images can help reduce the redundant information in the input images. We name the neural network with the max-pooling at the first layer “mSFCN” in this paper. Using the CNN, the 3D input image x is converted into a 4D feature tensor Tx ∈ ℝ^(d×h×w×c), where d = 64 is the feature dimension and h, w, c are the height, the width, and the number of channels in each input image, respectively.
2). Transformer to learn relations:
Given 4D feature tensors Tx and Ty extracted from the pair of input images (x, y), we exploit a “patch” operation to convert the feature tensors into a sequence of feature vectors where each feature vector represents the deep feature from a patch receptive field of the input image. These two 4D feature tensors are reshaped into 2D sequences of tokens: tx ∈ ℝ^(d×L) and ty ∈ ℝ^(d×L) (where L = h × w × c). Each token is a 1D feature vector of size d and there are L = h × w × c tokens in total extracted from each 4D feature tensor. In practice, the size of the input image is 2 × 80 × 130 × 170 where 2 represents the 2 channels of input images (intensity image and RAVENS map, see details in Section III-A) and 80, 130, 170 are the sizes of the three dimensions of the brain MRIs. The size of the tensors Tx and Ty extracted from the mSFCN is (d =)64 × (h =)4 × (w =)5 × (c =)2 after five max-pooling layers (with kernel size of 2 and stride of 2) on the mSFCN backbone. Thus, the size of the corresponding token sequences tx and ty is 64 × 40. These two sequences of tokens are concatenated into a sequence of 2L tokens [49]: t = [tx, ty] ∈ ℝ^(d×2L). The concatenated sequence has a size of 64 × 80 and is fed into a standard Transformer [41] which contains several encoder blocks. Each encoder block contains two parts: an attention part and a multi-layer perceptron (MLP) part. The attention part consists of Layernorm and multi-headed self-attention (the head size is 8) layers:
| (2) |
The key idea of self-attention is to fuse the tokens with the attention mechanism. It first transforms the sequence of tokens td×2L into query Qd×2L, key Kd×2L and value Vd×2L by a linear transformation:
| (3) |
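To illustrate how self-attention fuses tokens from both images, here is a minimal single-head sketch of the scaled dot-product attention of [41] in plain Python. It is a sketch under assumptions: the weight matrices `W_q`, `W_k`, `W_v` are hypothetical placeholders for the linear transformations, tokens are stored as rows (the paper writes them as columns of a d × 2L matrix), and the paper's multi-head layout and LayerNorm are omitted.

```python
import math


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over a list of tokens."""
    Q = [matvec(W_q, t) for t in tokens]  # queries
    K = [matvec(W_k, t) for t in tokens]  # keys
    V = [matvec(W_v, t) for t in tokens]  # values
    d_k = len(K[0])
    fused, weights = [], []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Each output token is an attention-weighted average of all values.
        fused.append([sum(w_j * v[c] for w_j, v in zip(w, V))
                      for c in range(len(V[0]))])
    return fused, weights


identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy sequence of 3 tokens, d = 2
fused, attn = self_attention(tokens, identity, identity, identity)
```

Because every output token attends to all 2L input tokens, a token that originated from image x can draw on features from image y, which is what relation learning on the concatenated sequence relies on.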
The second part is multi-layer perceptron (MLP) layer. It is also named Feed-Forward Networks (FFN) [41] which contains two fully connected layers with a ReLU activation in between:
| (4) |
Finally, the outputs of the relation estimation are computed as:
$$
\hat{r}_i = \text{class\_head}(t_{d,i}), \quad i \in \{1, 2, 3, 4\} \tag{5}
$$
where td,i is the ith token sampled from the sequence of the tokens td×2L and class_head is a fully-connected layer to learn the relation ri, i ∈ {1, 2, 3, 4}.
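The token bookkeeping described in this section can be checked with a short calculation. The sketch below assumes that the convolutions preserve the spatial size (padding of 1 for the 3 × 3 × 3 kernels) and that max-pooling uses floor rounding; both are assumptions, since the paper does not state them. The helper name `pooled_shape` is illustrative.

```python
def pooled_shape(spatial, n_pool, kernel=2, stride=2):
    """Spatial size after n_pool max-pooling layers with floor rounding."""
    dims = list(spatial)
    for _ in range(n_pool):
        dims = [(s - kernel) // stride + 1 for s in dims]
    return tuple(dims)


d = 64                                         # feature dimension of each token
feat = pooled_shape((80, 130, 170), n_pool=5)  # spatial size of the feature map
L = feat[0] * feat[1] * feat[2]                # tokens per image
seq_shape = (d, 2 * L)                         # concatenated sequence for (x, y)
```

Under these assumptions the three spatial axes shrink to 2, 4 and 5, giving L = 40 tokens per image and a concatenated sequence of size 64 × 80, consistent with the sizes reported above (the axes are listed there in a different order).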
3). Neural network training:
We use the mean absolute error (MAE) as the loss function to train the neural network for relation learning:
$$
\mathcal{L}_i = \frac{1}{n} \sum_{j=1}^{n} \left| \hat{r}_i^{(j)} - r_i^{(j)} \right| \tag{6}
$$
where r̂i is the estimated relation from the network and ri is the ground-truth for the ith relation (1 ≤ i ≤ 4). n is the number of samples involved in the computation (i.e. the number of training samples in each batch). The mean absolute error loss is widely used to train neural networks for brain age estimation [8], [5], [22]. The neural network is trained with a total loss computed by:
| (7) |
where K is the number of relations estimated by the neural network. For joint relation learning, the neural network is used to estimate the four relations and K = 4. For pair relation learning, the neural network is trained to estimate a pair of relations ((r1, r2) and (r3, r4)) and K = 2. For single relation learning, the neural network is only used to estimate one relation and K = 1. The detailed information of the joint, pair, and single relation learning can be found in Section III-C3.
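The per-relation MAE loss and its combination over K relations can be sketched as follows. The unweighted sum used in `total_loss` is an assumption about how the K losses are combined; the function names and the toy batch are illustrative only.

```python
def mae_loss(pred, target):
    """Mean absolute error between predicted and ground-truth relation values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def total_loss(pred_by_relation, target_by_relation):
    """Combine per-relation MAE losses (assumed here to be an unweighted sum)."""
    return sum(mae_loss(pred_by_relation[k], target_by_relation[k])
               for k in target_by_relation)


# Toy batch of two pairs with ages (30, 70) and (10, 20).
targets = {"r1": [100.0, 30.0], "r2": [-40.0, -10.0],
           "r3": [70.0, 20.0], "r4": [30.0, 10.0]}
preds = {"r1": [98.0, 33.0], "r2": [-41.0, -8.0],
         "r3": [71.0, 20.0], "r4": [30.0, 12.0]}

joint = total_loss(preds, targets)                      # K = 4
pair_12 = total_loss({k: preds[k] for k in ("r1", "r2")},
                     {k: targets[k] for k in ("r1", "r2")})  # K = 2
single = total_loss({"r3": preds["r3"]}, {"r3": targets["r3"]})  # K = 1
```

The same function covers joint, pair and single relation learning; only the set of relations passed in changes.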
III. Experiments
We will first introduce our data (III-A) and network training (III-B). Two major experiment settings follow: optimizing various components in the framework to maximize the accuracy of relation learning (III-C), and evaluating the accuracy of the proposed relation-based brain age estimation (III-D).
A. Data sets
In this paper, we use the same lifespan dataset as in our previous work [37]. Lifespan datasets have recently been used in several studies [56], [57], [9], and a deep learning model trained on a lifespan dataset can be applied or transferred to any age group without introducing artificial boundaries on the predicted ages. This is especially important for quantifying premature aging or developmental delays in diseased cohorts, as MRI-manifested brain ages can range from 0 to 100 years even though the patients’ actual ages lie in a narrowly bounded range. The summary of the dataset is shown in Table I. The merged dataset consists of brain MRI scans from 8 datasets with 6,049 samples (0-97 years of age). Fig. 2 shows the age distribution of the dataset. Only healthy brains with T1-weighted MRIs are collected from each dataset.
TABLE I.
Demographics of datasets used in this paper, sorted by median age (years).
| Dataset | Nsamples | Age range [Median] | Male/Female |
|---|---|---|---|
| MGHBCH [50] | 428 | 0-6 [1.7] | 226/202 |
| NIH-PD [51] | 1,211 | 0-22.3 [9.8] | 585/626 |
| ABIDE-I [52] | 567 | 6.47-56.2 [14.8] | 469/98 |
| BGSP [53] | 1,570 | 19-53 [21] | 665/905 |
| BeijingEN¹ | 180 | 17-28 [21] | 73/107 |
| IXI² | 556 | 20.0-86.3 [48.6] | 247/309 |
| DLBS [54] | 315 | 20-89 [54] | 117/198 |
| OASIS-3 [55] | 1,222 | 42-97 [69] | 750/472 |
| Total | 6,049 | 0-97 [22.8] | 3,132/2,917 |
Fig. 2.
Age distribution of the datasets used in this paper, covering the age from 0 to 97 years, with a mean of μ = 30.58 years (median 22.8 years) and a standard deviation of σ = 24.52 years.
Similar to [37], we perform minimal pre-processing and harmonization of the T1w images with the following steps: (1) N4 bias correction [58]; (2) field of view normalization [59]; (3) Multi-Atlas Skull Stripping (MASS) [60], [61]; (4) non-rigid registration to the SRI24 atlas [62] by the Deformable Registration via Attribute-Matching and Mutual-Saliency weighting (DRAMMS) algorithm [63]; (5) splitting the registered image into two channels: an intensity image containing the contrast information and a RAVENS map [64] containing the morphological information. The final size of the MRIs after pre-processing is 80 × 130 × 170 per channel without the black background voxels on the boundary. We concatenate the intensity image and RAVENS map as the input of the 3D neural network so the image size of each subject is 2 × 80 × 130 × 170. Our previous study showed that explicitly splitting the T1w image into two channels led to more accurate age estimation than using each channel alone [37].
B. Network training
The network is trained with the Adam optimizer built into the PyTorch platform, with an initial learning rate of 0.0001 that is halved every 35 epochs over a total of 80 training epochs. The batch size is set to 20 due to the limitation of the GPU memory. The training of the neural network takes around 24 hours on a single NVIDIA RTX 6000 GPU with 24 GB of memory. We divide the training images into 100 groups according to their ages. To obtain a balanced distribution of the input pairs (x, y) during training, on each iteration, we first randomly select an age group and then randomly select an image from this group to collect the batch of training samples. Fig. 3 shows the training loss and testing accuracies of the four relations. The MAEs of the four relations decrease quickly in the first 35 epochs. After that, the neural network starts to converge: the accuracies of the four relations improve slowly and are stable by 80 training epochs.
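The step learning-rate schedule and the age-balanced sampling described above can be sketched in plain Python. Several details are assumptions: the 100 age groups are taken to be equal-width bins over 0-100 years, only non-empty groups are sampled, and each element of a batch is a pair of independently sampled images. All function names are illustrative.

```python
import random
from collections import defaultdict


def learning_rate(epoch, base_lr=1e-4, step=35, factor=0.5):
    """Learning rate that is halved every `step` epochs."""
    return base_lr * factor ** (epoch // step)


def group_by_age(ages, n_groups=100, max_age=100.0):
    """Map each image index to an equal-width age group (assumed binning)."""
    groups = defaultdict(list)
    width = max_age / n_groups
    for idx, age in enumerate(ages):
        groups[min(int(age / width), n_groups - 1)].append(idx)
    return groups


def sample_balanced(groups, rng):
    """Pick a non-empty age group uniformly, then an image within it."""
    group = rng.choice(sorted(groups))
    return rng.choice(groups[group])


def sample_pair_batch(groups, batch_size, rng):
    """Draw a batch of (x, y) index pairs with age-balanced sampling."""
    return [(sample_balanced(groups, rng), sample_balanced(groups, rng))
            for _ in range(batch_size)]


ages = [0.5, 1.2, 30.0, 30.4, 97.0]  # toy chronological ages
groups = group_by_age(ages)
batch = sample_pair_batch(groups, batch_size=20, rng=random.Random(0))
```

Sampling the group first means that sparsely populated ages are drawn as often as densely populated ones, which counteracts the skewed age distribution of the merged dataset.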
Fig. 3.
The training loss and testing accuracies (in terms of MAE, the lower MAE, the better accuracy) of different relations. The bold curves are the average MAEs over the five-fold cross-validation.
C. Optimization of the accuracy for relationship estimation
1). Accuracy metric in cross validation for relationship estimation:
We use the cross-validation strategy [65], [37] to evaluate the accuracy of the relation estimation. The merged dataset is randomly split into 5 non-overlapping folds of approximately equal sample sizes. Each time, one fold of the samples is used for evaluation and the remaining four folds are used for training. This is repeated five times so that each sample is left out for testing once and only once.
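A minimal sketch of such a 5-fold split is given below. The shuffling seed and the helper name `k_fold_indices` are illustrative choices, not details from the paper.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k non-overlapping folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


splits = list(k_fold_indices(6049, k=5, seed=0))
```

Each index appears in exactly one test fold, so concatenating the five test folds covers all 6,049 subjects once.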
Our relations are computed on a pair of input images, and the evaluation of the relation estimation is performed on a set of test pairs that contains N pairs of input images. In other words, N is the total number of testing pairs involved in the evaluation. We use three accuracy metrics computed on the test pairs: mean absolute error (MAE), cumulative score (CS) and Pearson correlation coefficient. The MAE is computed as the mean absolute error between the estimation and ground-truth of the relations on the test pairs. Note that this MAE is computed on the test samples for evaluation, which is different from the MAE loss defined in Eq. 6, computed on the training samples for training. The CS is computed by: CS(α) = N∣e∣≤α/N × 100%, where N∣e∣≤α is the number of test pairs whose absolute error ∣e∣ is no higher than a given threshold α. Following previous works [37], [66], we set α = 5 (years) in experiments. The Pearson correlation is computed between the ground-truth ri and the estimated r̂i on the whole test set.
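The three metrics can be written down directly. This is a plain-Python sketch with illustrative function names and toy values; it is not the authors' evaluation code.

```python
import math


def mae(pred, truth):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)


def cumulative_score(pred, truth, alpha=5.0):
    """Percentage of samples whose absolute error is at most alpha."""
    hits = sum(1 for p, t in zip(pred, truth) if abs(p - t) <= alpha)
    return 100.0 * hits / len(truth)


def pearson(pred, truth):
    """Pearson correlation coefficient between predictions and ground truth."""
    n = len(truth)
    mean_p, mean_t = sum(pred) / n, sum(truth) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, truth))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in truth)
    return cov / math.sqrt(var_p * var_t)


truth = [10.0, 20.0, 30.0, 40.0]  # toy ground-truth relation values
pred = [12.0, 19.0, 37.0, 40.0]   # toy estimates

scores = (mae(pred, truth), cumulative_score(pred, truth), pearson(pred, truth))
```

In this toy example three of the four absolute errors (2, 1, 7, 0) are within 5 years, so CS(5) is 75% while the MAE is 2.5.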
2). Eight variations of underlying deep learning models:
The neural network for relation learning has two main parts given a pair of inputs (x, y): feature extraction and relation regression (as shown in Fig. 1). For the feature extraction part, we compare the accuracies of the SFCN [40] and the mSFCN, which applies a max-pooling layer (with a kernel size of 2 × 2 × 2 and a stride of 2) on the input image. The backbones ℱ1 and ℱ2 can be shared (similar to the Siamese network [1]: ℱ1 = ℱ2) or independent (the same network with different parameters [20]: ℱ1 ≠ ℱ2). For the relation regression part, we consider the traditional CNN-based method [1] as the baseline: deep features from the two images of the pair are concatenated and input to three fully-connected (FC) layers with output sizes of 64, 64 and 4, sequentially. The 4-channel output vector represents the 4 output relations. We also use the Transformer to compute the four relations and set the number of Transformer blocks to 2, matching the number of FC layers in the CNN-based method for a fair comparison.
Table II summarizes the 8 model variations compared in this paper and the main difference is which backbone (SFCN or mSFCN, and whether it is shared or independent) is used for feature extraction and which model (FCs or Transformer) is used for relation regression (Fig. 1).
TABLE II.
Model variations compared with different backbones and relation regressions. (FCs represents the three Fully-Connected layers for relation regression)
| Model Name | Configuration |
|---|---|
| SFCNs+FCs | Shared SFCN backbone (ℱ1 = ℱ2) with FCs for relation regression |
| SFCNi+FCs | Independent SFCN backbone (ℱ1 ≠ ℱ2) with FCs for relation regression |
| mSFCNs+FCs | Shared mSFCN backbone (ℱ1 = ℱ2) with FCs for relation regression |
| mSFCNi+FCs | Independent mSFCN backbone (ℱ1 ≠ ℱ2) with FCs for relation regression |
| SFCNs+Transformer | Shared SFCN backbone (ℱ1 = ℱ2) with Transformer for relation regression |
| SFCNi+Transformer | Independent SFCN backbone (ℱ1 ≠ ℱ2) with Transformer for relation regression |
| mSFCNs+Transformer | Shared mSFCN backbone (ℱ1 = ℱ2) with Transformer for relation regression |
| mSFCNi+Transformer | Independent mSFCN backbone (ℱ1 ≠ ℱ2) with Transformer for relation regression |
3). Using 1, 2 or 4 deep learning models for estimating 4 relations:
The outputs of the proposed neural network are the four relations of the pair of inputs (x, y). Three different configurations are considered: joint relation learning, pair relation learning and single relation learning. For joint relation learning, we train 1 neural network to learn the four relations within the multi-task framework [67]. For pair relation learning, we train 2 neural networks to learn the pairs of relations (r1, r2) and (r3, r4), respectively. For single relation learning, we train 4 neural networks and each neural network learns one relation. For evaluation, the input pairs (x, y) are randomly sampled from the testing set and the results from the 5-fold cross-validation are averaged.
D. Accuracy evaluation for brain age estimation
1). Accuracy metric in cross-validation for brain age estimation:
Similar to Section III-C, we also use 5-fold cross-validation strategy to evaluate the accuracy of brain age estimation based on estimated relations in different configurations given a pair of input images.
For evaluation of brain age estimation, we also use the three accuracy metrics: MAE, CS and Pearson correlation coefficient. Specifically, the MAE for brain age estimation is computed by: MAE = (1/M) Σx ∣τ̂x − τx∣, where τ̂x is the estimated brain age of the test scan x and τx is the corresponding chronological age. M is the total number of testing images. The CS for brain age estimation is computed by: CS(α) = M∣e∣≤α/M × 100%, which is the percentage of test images whose absolute error ∣e∣ is no higher than a given threshold α. We also set α = 5 (years) for brain age estimation in experiments. The Pearson correlation is computed between the ground-truth τx and the estimated τ̂x over all test images (M = 6,049 in our experiment since each subject has been left out once and only once in the 5-fold cross-validation).
To measure the significance of the improvement of the proposed method over other state-of-the-art models, we use a two-sided paired t-test to compute the p-value between the absolute errors obtained by the proposed method and the absolute errors obtained by each compared model over all 6,049 test images for brain age estimation. p ≤ 0.05 indicates that there is a significant difference between the MAEs of the two models in comparison.
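The paired test on per-subject absolute errors can be sketched with the standard library alone. The p-value below uses a normal approximation to the t distribution, which is an assumption that is only reasonable for large samples such as the 6,049 test images; for small samples an exact routine such as `scipy.stats.ttest_rel` would be the usual choice. The toy error lists are illustrative.

```python
import math
import statistics


def paired_t_test(errors_a, errors_b):
    """Two-sided paired t-test on per-subject absolute errors.

    Returns the t statistic and a p-value from a normal approximation
    (adequate only when the number of paired samples is large).
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    mean_d = statistics.fmean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation of differences
    t_stat = mean_d / (sd_d / math.sqrt(n))
    p_value = 2.0 * (1.0 - statistics.NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value


# Toy absolute errors (years) of a baseline and a competing model.
baseline = [3.0, 4.0, 3.0, 4.0, 3.0, 4.0, 3.0, 4.0]
proposed = [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
t_stat, p_value = paired_t_test(baseline, proposed)
```

The test is paired because both models are evaluated on the same subjects, so the per-subject differences rather than the two error distributions are compared.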
2). Accuracy of relation-based brain age estimation when x ≠ y, and x and y both from the testing set:
Based on the estimated relations r̂i, i ∈ {1, 2, 3, 4} of the input pair (x, y), the estimated brain ages τ̂x and τ̂y of both x and y can be computed simultaneously based on the following equations:
| (8) |
This gives two estimates for each of x and y. We estimate the age of image x by averaging its two estimates, and do the same for image y.
In this section, we group the testing images into pairs and feed them into the trained neural network for relation estimation. The estimated brain ages of both x and y are computed according to Eq.(8). The average of the MAE, CS(α=5 years) and Pearson correlation from models trained for 5-fold cross-validation are reported.
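One way to recover both ages from the four estimated relations is sketched below. Solving r1 = τx + τy and r2 = τx − τy for the two ages is plain algebra; using the sign of the estimated relative relation to decide which subject receives the maximum and which the minimum is an assumption of this sketch and may differ from the exact form of Eq. (8).

```python
def ages_from_sum_diff(r1, r2):
    """Solve r1 = tau_x + tau_y and r2 = tau_x - tau_y for both ages."""
    return (r1 + r2) / 2.0, (r1 - r2) / 2.0


def ages_from_max_min(r3, r4, r2):
    """Assign max/min to x and y using the sign of r2 (assumed rule)."""
    return (r3, r4) if r2 >= 0 else (r4, r3)


def estimate_pair_ages(r1, r2, r3, r4):
    """Average the two estimates of each age."""
    x_a, y_a = ages_from_sum_diff(r1, r2)
    x_b, y_b = ages_from_max_min(r3, r4, r2)
    return (x_a + x_b) / 2.0, (y_a + y_b) / 2.0


# Noisy relation estimates for a pair with true ages (30, 70).
tau_x_hat, tau_y_hat = estimate_pair_ages(101.0, -39.0, 69.0, 31.0)
```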
3). Accuracy of relation-based brain age estimation on x with reference y when x ≠ y, x from the testing set and y from the training set.:
In this section, we estimate the brain age of the input x from the testing set based on a reference y that is sampled from the training set with known brain age τy. The estimation of τy is close to the ground-truth because y is sampled from the training set. Thus, the error of the estimation comes only from the test sample x. We sample no more than 2 brain MRIs for each age from the training set, which gives roughly 186 reference samples y used in experiments. Based on the four relations defined in Eq. (1), for each input testing image x, there are four different ways to estimate the brain age of x based on the known age τy of the reference y:
| (9) |
where r̂i, i ∈ {1, 2, 3, 4} are the estimations of the four relations from the trained neural network. We also compare the proposed methods with the baseline model proposed in [1], which only considers the “relative relation r2” and converts it into a binary relation given a threshold t [1]: τx > τy if r2 > t, τx ≈ τy if ∣r2∣ ≤ t and τx < τy if r2 < −t. Given the binarized order relationship, the estimation τ̂x is obtained by the maximum consistency (MC) rule [1] given N different references y with the chronological ages τyi:
$$
\hat{\tau}_x = \arg\max_{\tau_x'} \sum_{i=1}^{N} \phi(\tau_x, \tau_{y_i}, \tau_x') \tag{10}
$$
where ϕ(τx, τyi, τx′) is the consistency function defined by:
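$$
\phi(\tau_x, \tau_{y_i}, \tau_x') = [\tau_x > \tau_{y_i}]\,[\tau_x' - \tau_{y_i} > t] + [\tau_x \approx \tau_{y_i}]\,[\,|\tau_x' - \tau_{y_i}| \le t\,] + [\tau_x < \tau_{y_i}]\,[\tau_x' - \tau_{y_i} < -t] \tag{11}
$$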
where [•] is the indicator function and ϕ(τx, τyi, τx′) returns either 0 (inconsistent) or 1 (consistent). More computation and explanation details of the MC rule can be found in [1]. The complexity of the MC rule is 𝓞(MN) where M is the number of references and N is the total age bins (N = 97 in this paper, covering the age from 0-97 years of age).
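The reference-based estimation and the MC baseline can be sketched as follows. The two direct estimates follow from inverting r1 and r2 with a known τy. The MC part follows the description above and in [1] as understood here: each reference votes for the candidate ages that are consistent with its predicted order. Returning the lowest candidate in case of ties is an assumption of this sketch, and all function names are illustrative.

```python
def estimates_from_reference(r1_hat, r2_hat, tau_y):
    """Invert the cumulative and relative relations with a known tau_y."""
    return r1_hat - tau_y, r2_hat + tau_y


def order_label(r2_hat, t):
    """Threshold the relative relation: 1 (older), 0 (similar), -1 (younger)."""
    if r2_hat > t:
        return 1
    if r2_hat < -t:
        return -1
    return 0


def consistency(label, tau_ref, candidate, t):
    """1 if a candidate age agrees with the predicted order, else 0."""
    diff = candidate - tau_ref
    if label == 1:
        return int(diff > t)
    if label == -1:
        return int(diff < -t)
    return int(abs(diff) <= t)


def mc_rule(r2_hats, ref_ages, candidates, t):
    """Pick the candidate age consistent with the most references.

    The double loop over references and candidates gives the O(MN) cost.
    Ties go to the first (lowest) candidate, an assumed tie-break.
    """
    labels = [order_label(r, t) for r in r2_hats]

    def score(c):
        return sum(consistency(l, a, c, t) for l, a in zip(labels, ref_ages))

    return max(candidates, key=score)


# Toy example: a test subject aged about 40 compared with four references.
ref_ages = [10.0, 30.0, 50.0, 70.0]
r2_hats = [31.0, 9.0, -11.0, -29.0]
mc_age = mc_rule(r2_hats, ref_ages, range(0, 98), t=5.0)
direct = estimates_from_reference(71.0, 9.0, 30.0)
```

In the toy example every candidate from 36 to 44 is consistent with all four references, which shows why the MC rule needs many references to localize the age, whereas the direct estimates yield a value from a single reference.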
4). Accuracy of relation-based brain age estimation when x = y, and both from the testing set:
The pair of inputs (x, y) can be the same testing image when we set the input y = x. Ideally, when y = x, the “relative relation” r2 = 0 and r3 = r4 = τx. However, due to possible regression errors of the neural network, the estimated r̂2 might not be zero and r̂3 ≠ r̂4 may occur. Thus, all estimations of the four relations r̂i, i ∈ {1, 2, 3, 4} can be used to estimate the brain age based on the following calculations:
| (12) |
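For the identical pair (x, x), the ideal values r1 = 2τx and r3 = r4 = τx each give an age estimate. The sketch below averages these three candidates as a simple ensemble; leaving out r̂2, whose ideal value is zero, and using an unweighted mean are assumptions of this sketch and may differ from the exact form of Eq. (12).

```python
def age_from_identical_pair(r1_hat, r3_hat, r4_hat):
    """Ensemble age estimate for the pair (x, x).

    Ideal values: r1 = 2 * tau_x and r3 = r4 = tau_x, so each relation
    yields one candidate; the candidates are averaged (assumed ensemble).
    """
    candidates = [r1_hat / 2.0, r3_hat, r4_hat]
    return sum(candidates) / len(candidates)


# Noisy relation estimates for a subject aged about 45.
tau_hat = age_from_identical_pair(90.0, 46.0, 44.0)
```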
5). Accuracy comparison with state-of-the-art brain age estimation algorithms:
Eight other deep learning based methods for brain age estimation are compared in this section, including the Hi-Net [39], FiA-Net [37], GL-Transformer [66], 3D CNN [4], SFCN [40] and DeepBrainNet [9]. The Hi-Net [39] and FiA-Net [37] fuse the multi-channel input MRI images (intensity and RAVENS) in a layer-level fusion. The GL-Transformer [66] applies a global-local transformer to exploit the global-context information from the whole image and the local fine-grained information from local patches on 2D input images. The 3D CNN [4] uses 5 convolutional layers with a kernel size of 3 × 3 × 3, followed by ReLU and max-pooling layers. The number of channels at the first layer is eight, and is doubled after each max-pooling layer. We use a global average pooling layer and a fully-connected layer for the brain age estimation. The SFCN [40] is also used as the backbone in our method, as described in Section II-B. The DeepBrainNet [9] is based on the Inception-ResNet-v2 [34] model, which works on 2D slices for brain age estimation. We take the results of Hi-Net [39] and FiA-Net [37] from the original research papers since we have similar experimental configurations and datasets. We also compare the accuracies of models with relation learning (mSFCN+Transformer+Relation) and without relation learning (mSFCN+Transformer), where the model is directly trained to estimate the brain age. For the GL-Transformer [66], 3D CNN [4], SFCN [40], DeepBrainNet [9] and mSFCN+Transformer, we train them from scratch with the same training configuration as our neural network for a fair comparison.
IV. Results
This section has two major parts: Section IV-A reports the accuracies for relation learning, which are results from experiment setting in III-C; and Section IV-B presents the accuracies for relation-based brain age estimation, which are results from experiment setting introduced in Section III-D.
A. Accuracy of relation estimation
This section presents the accuracies of relation estimation which are measured by the three metrics of MAE, CS(α=5) and Pearson correlation described in Section III-C1 on the test samples. Higher accuracy means lower MAE and higher scores of CS and Pearson correlation for relation estimation in this section.
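The three metrics can be sketched in a few lines of plain Python. This is a minimal illustration rather than the evaluation code used in the experiments: it assumes that CS(α) is the percentage of test samples whose absolute error is at most α years (the formal definition is given in Section III-C1), and the function names and toy ages are hypothetical.

```python
import math

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def cumulative_score(y_true, y_pred, alpha=5.0):
    """Percentage of samples with absolute error <= alpha (assumed definition)."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= alpha)
    return 100.0 * hits / len(y_true)

def pearson_correlation(y_true, y_pred):
    """Pearson correlation coefficient between targets and predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

# Toy values for illustration only (not taken from the experiments).
true_ages = [10.0, 25.0, 40.0, 70.0]
pred_ages = [12.0, 24.0, 47.0, 66.0]

mae = mean_absolute_error(true_ages, pred_ages)       # absolute errors: 2, 1, 7, 4
cs5 = cumulative_score(true_ages, pred_ages, alpha=5.0)
r = pearson_correlation(true_ages, pred_ages)
```

On this toy input the absolute errors are 2, 1, 7 and 4 years, so the MAE is 3.5 years and three of the four samples fall within the 5-year tolerance.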
1). Effects of underlying CNN models:
Table III shows the accuracies of 8 model variations when the pair of input images is sampled from the testing set. Several observations can be made: (1) The MAEs of the “maximal relation (r3)” and “minimal relation (r4)” are lower than the MAEs of the “cumulative relation (r1)” and “relative relation (r2)”, indicating that the neural network is better at capturing the non-linear relations than the linear relations of the pair of input images. The “minimal relation (r4)” has the lowest MAEs among the four relation estimations. (2) Using the Transformer for relation regression provides better accuracies than using FCs in most configurations. (3) The mSFCNs with the Transformer provides the best accuracies among all model variations for the regression of the four relations. (4) When using the Transformer for relation regression, the shared backbone ℱ1 = ℱ2 provides better results than the independent backbone ℱ1 ≠ ℱ2.
TABLE III.
The accuracies of relation estimation with pair samples from the test set.
| Relation | Metric | FCs [1], SFCN [40], shared (ℱ1 = ℱ2) | FCs [1], SFCN [40], independent (ℱ1 ≠ ℱ2) | FCs [1], mSFCN, shared (ℱ1 = ℱ2) | FCs [1], mSFCN, independent (ℱ1 ≠ ℱ2) | Transformer, SFCN [40], shared (ℱ1 = ℱ2) | Transformer, SFCN [40], independent (ℱ1 ≠ ℱ2) | Transformer, mSFCN, shared (ℱ1 = ℱ2) | Transformer, mSFCN, independent (ℱ1 ≠ ℱ2) |
|---|---|---|---|---|---|---|---|---|---|
| r1 (cumulative) | MAE | 4.88±0.45 | 4.69±0.30 | 5.57±0.74 | 5.08±0.51 | 4.13±0.24 | 4.74±0.28 | 3.71±0.18 | 3.88±0.14 |
| | CS(α = 5) | 62.13%±5.35% | 63.15%±3.75% | 58.66%±6.43% | 61.89%±5.41% | 70.74%±2.43% | 64.60%±3.38% | 75.04%±2.01% | 73.18%±1.63% |
| | Pearson | 0.9851±0.0012 | 0.9844±0.0016 | 0.9871±0.0022 | 0.9845±0.0028 | 0.9862±0.0018 | 0.9822±0.0019 | 0.9878±0.0016 | 0.9866±0.0014 |
| r2 (relative) | MAE | 5.40±0.53 | 5.69±0.29 | 5.65±0.39 | 5.70±0.47 | 5.63±0.12 | 5.01±0.44 | 3.80±0.24 | 3.88±0.14 |
| | CS(α = 5) | 59.22%±4.81% | 53.96%±2.18% | 57.51%±3.69% | 57.41%±4.66% | 57.96%±1.21% | 62.46%±4.80% | 74.17%±2.06% | 74.02%±1.76% |
| | Pearson | 0.9844±0.0011 | 0.9832±0.0018 | 0.9861±0.0016 | 0.9846±0.0036 | 0.9742±0.0020 | 0.9804±0.0028 | 0.9880±0.0018 | 0.9868±0.0017 |
| r3 (maximal) | MAE | 4.81±0.54 | 5.17±0.39 | 5.80±0.68 | 5.10±0.60 | 3.56±0.24 | 3.84±0.14 | 3.03±0.16 | 3.15±0.10 |
| | CS(α = 5) | 61.88%±4.41% | 58.21%±4.55% | 54.07%±5.30% | 60.40%±5.55% | 76.36%±1.55% | 74.82%±2.46% | 81.62%±1.44% | 80.79%±0.93% |
| | Pearson | 0.9778±0.0023 | 0.9723±0.0035 | 0.9798±0.0032 | 0.9779±0.0026 | 0.9783±0.0043 | 0.9749±0.0020 | 0.9838±0.0027 | 0.9823±0.0015 |
| r4 (minimal) | MAE | 3.81±0.23 | 4.36±0.33 | 3.10±0.20 | 3.75±0.12 | 1.82±0.04 | 2.24±0.12 | 1.69±0.11 | 1.84±0.13 |
| | CS(α = 5) | 72.83%±3.03% | 67.01%±3.14% | 82.18%±1.28% | 75.46%±1.82% | 93.09%±0.50% | 89.42%±1.96% | 94.19%±0.36% | 93.07%±1.06% |
| | Pearson | 0.9502±0.0050 | 0.9343±0.0090 | 0.9654±0.0023 | 0.9508±0.0029 | 0.9803±0.0028 | 0.9744±0.0036 | 0.9825±0.0044 | 0.9777±0.0066 |
Fig. 4 shows the MAEs of relation estimation as a function of the age difference τx – τy of the pair input (x, y) sampled from the testing set. We only compare two models: mSFCNs+FCs and mSFCNs+Transformer. The figure shows that mSFCNs+Transformer provides lower MAEs than mSFCNs+FCs and is less sensitive to the age difference τx – τy.
Fig. 4.
The MAEs of the relation estimation as a function of the age difference τx – τy of the pair input image (x, y) when both x and y are sampled from the testing set. (The blue line shows the MAEs of mSFCNs+FCs while the red line shows the MAEs of mSFCNs+Transformer.)
2). Effects of 1, 2, or 4 CNN models for estimating 4 relations:
Table IV shows the accuracies of deep relation learning with joint, pair and single learning, which use 1, 2 and 4 CNNs, respectively. There is no significant difference in results among these three configurations. However, joint relation learning needs only 1 neural network to learn the four relations, so it requires fewer parameters, less memory and less computation time than single relation learning, which needs 4 different neural networks, and pair relation learning, which needs 2 different neural networks. In the following sections, we use the relations estimated by joint relation learning for brain age estimation, which requires training only 1 CNN model to estimate the 4 relations.
TABLE IV.
Accuracy of relation estimation with different configurations.
| Relation | Learning | MAE | CS(α = 5) | Pearson |
|---|---|---|---|---|
| r1 (cumulative) | Single | 3.95±0.20 | 73.48%±1.91% | 0.9861±0.0024 |
| | Pair | 3.85±0.26 | 73.88%±3.00% | 0.9869±0.0020 |
| | Joint | 3.71±0.18 | 75.04%±2.01% | 0.9878±0.0016 |
| r2 (relative) | Single | 3.98±0.20 | 71.80%±2.06% | 0.9869±0.0018 |
| | Pair | 3.89±0.21 | 73.95%±2.41% | 0.9872±0.0019 |
| | Joint | 3.80±0.24 | 74.17%±2.06% | 0.9880±0.0018 |
| r3 (maximal) | Single | 3.15±0.14 | 80.78%±0.89% | 0.9828±0.0027 |
| | Pair | 3.00±0.14 | 83.12%±1.24% | 0.9834±0.0029 |
| | Joint | 3.03±0.16 | 81.62%±1.44% | 0.9838±0.0027 |
| r4 (minimal) | Single | 1.64±0.09 | 94.46%±0.60% | 0.9804±0.0067 |
| | Pair | 1.67±0.12 | 93.99%±0.79% | 0.9805±0.0079 |
| | Joint | 1.69±0.11 | 94.19%±0.36% | 0.9825±0.0044 |
B. Accuracy of relation-based brain age estimation
Results in this section correspond to the experiment setting described in Section III-D. Accuracies of brain age estimation are measured by the three metrics of MAE, CS(α=5) and Pearson correlation as described in Section III-D1 on the test samples. Higher accuracy means lower MAE and higher scores of CS and Pearson correlation for brain age estimation in this section.
1). Accuracy when x ≠ y, x and y both from the testing set:
Sub Table I in Table V shows the accuracies of the 8 underlying CNN variations for brain age estimation based on pairs of input images sampled from the test set. In general, models with shared feature extraction backbones provide better accuracies than models with independent backbones, and using the Transformer for relation regression gives better results than using FCs. The mSFCNs+Transformer provides the lowest MAEs and the highest scores of CS(α=5) and Pearson correlation among all models. The ensemble of different relations lowers the MAE slightly further, to 2.42 years.
TABLE V.
The summary of the accuracies for brain age estimation.
| Strategy 𝒮i | Estimation | Metric | FCs [1], SFCN [40], shared (ℱ1 = ℱ2) | FCs [1], SFCN [40], independent (ℱ1 ≠ ℱ2) | FCs [1], mSFCN, shared (ℱ1 = ℱ2) | FCs [1], mSFCN, independent (ℱ1 ≠ ℱ2) | Transformer, SFCN [40], shared (ℱ1 = ℱ2) | Transformer, SFCN [40], independent (ℱ1 ≠ ℱ2) | Transformer, mSFCN, shared (ℱ1 = ℱ2) | Transformer, mSFCN, independent (ℱ1 ≠ ℱ2) |
|---|---|---|---|---|---|---|---|---|---|---|
| Sub Table I: The accuracies of brain age estimation with the pair inputs x ≠ y sampled from the test set. The accuracies of the estimated ages of x and y are measured together for brain age estimation. | | | | | | | | | | |
| 𝒮1 | | MAE | 3.57±0.21 | 3.62±0.15 | 3.85±0.33 | 3.71±0.26 | 3.31±0.09 | 3.19±0.23 | 2.43±0.13 | 2.54±0.07 |
| | | CS(α = 5) | 76.53%±2.57% | 75.84%±2.25% | 72.83%±4.07% | 74.67%±2.82% | 78.91%±1.33% | 80.40%±3.09% | 87.37%±0.62% | 86.33%±0.61% |
| | | Pearson | 0.9837±0.0007 | 0.9825±0.0015 | 0.9860±0.0016 | 0.9836±0.0030 | 0.9800±0.0014 | 0.9811±0.0023 | 0.9879±0.0017 | 0.9867±0.0015 |
| 𝒮2 | , if , Otherwise | MAE | 4.46±0.24 | 4.97±0.34 | 4.56±0.40 | 4.57±0.31 | 2.98±0.10 | 3.17±0.11 | 2.45±0.11 | 2.58±0.08 |
| | | CS(α = 5) | 65.63%±1.76% | 59.99%±3.67% | 66.83%±3.12% | 66.00%±3.03% | 82.23%±0.83% | 80.85%±1.76% | 87.19%±0.92% | 85.98%±0.86% |
| | | Pearson | 0.9770±0.0015 | 0.9705±0.0036 | 0.9813±0.0011 | 0.9768±0.0031 | 0.9815±0.0019 | 0.9808±0.0010 | 0.9878±0.0017 | 0.9857±0.0019 |
| 𝒮3 | Ensemble | MAE | 3.92±0.21 | 4.14±0.22 | 4.13±0.36 | 4.05±0.29 | 3.07±0.10 | 3.13±0.17 | 2.42±0.12 | 2.53±0.08 |
| | | CS(α = 5) | 72.18%±1.89% | 69.27%±2.80% | 70.39%±3.64% | 71.47%±2.96% | 81.38%±0.92% | 81.06%±2.28% | 87.34%±0.90% | 86.41%±0.77% |
| | | Pearson | 0.9817±0.0009 | 0.9790±0.0019 | 0.9847±0.0012 | 0.9816±0.0032 | 0.9815±0.0015 | 0.9815±0.0015 | 0.9879±0.0016 | 0.9863±0.0017 |
| Sub Table II: The accuracies of different reference methods to estimate the age of x according to the learned relations with the references y sampled from the training samples with known age information. | | | | | | | | | | |
| 𝒮4 | MC [1] | MAE | 2.99±0.14 | 3.75±0.30 | 2.73±0.04 | 3.06±0.22 | 5.27±0.24 | 3.82±0.47 | 2.55±0.16 | 2.61±0.15 |
| | | CS(α = 5) | 84.48%±1.87% | 75.06%±3.14% | 87.58%±0.54% | 83.90%±3.17% | 64.99%±1.66% | 75.84%±4.46% | 87.67%±1.15% | 87.31%±1.45% |
| | | Pearson | 0.9859±0.0013 | 0.9827±0.0016 | 0.9876±0.0014 | 0.9864±0.0017 | 0.9475±0.0048 | 0.9731±0.0062 | 0.9871±0.0018 | 0.9860±0.0024 |
| 𝒮5 | | MAE | 4.06±0.60 | 3.77±0.55 | 5.96±1.18 | 4.80±0.95 | 2.63±0.10 | 3.54±0.36 | 2.41±0.11 | 2.59±0.12 |
| | | CS(α = 5) | 71.06%±7.54% | 73.83%±8.38% | 47.94%±12.96% | 62.77%±9.09% | 85.57%±0.76% | 76.97%±3.39% | 87.26%±0.84% | 86.30%±1.09% |
| | | Pearson | 0.9859±0.0010 | 0.9853±0.0006 | 0.9877±0.0017 | 0.9867±0.0020 | 0.9866±0.0014 | 0.9775±0.0045 | 0.9879±0.0017 | 0.9863±0.0019 |
| 𝒮6 | | MAE | 3.90±0.44 | 4.31±0.68 | 4.48±0.71 | 4.39±0.51 | 4.23±0.14 | 3.72±0.44 | 2.50±0.13 | 2.56±0.14 |
| | | CS(α = 5) | 71.43%±4.22% | 66.68%±6.46% | 65.30%±7.50% | 65.63%±6.00% | 71.26%±1.59% | 75.62%±4.35% | 86.98%±1.05% | 86.43%±1.57% |
| | | Pearson | 0.9834±0.0014 | 0.9824±0.0014 | 0.9855±0.0024 | 0.9858±0.0023 | 0.9653±0.0035 | 0.9737±0.0061 | 0.9874±0.0018 | 0.9864±0.0025 |
| 𝒮7 | | MAE | 3.57±0.28 | 3.67±0.16 | 3.74±0.25 | 3.82±0.27 | 3.30±0.07 | 3.58±0.39 | 2.41±0.11 | 2.55±0.13 |
| | | CS(α = 5) | 73.24%±3.56% | 72.24%±1.94% | 73.67%±5.15% | 69.70%±3.44% | 79.01%±0.86% | 76.43%±3.86% | 87.33%±0.71% | 86.44%±1.28% |
| | | Pearson | 0.9849±0.0008 | 0.9840±0.0011 | 0.9872±0.0017 | 0.9864±0.0021 | 0.9788±0.0018 | 0.9760±0.0051 | 0.9878±0.0017 | 0.9864±0.0022 |
| 𝒮8 | | MAE | 4.17±0.56 | 3.75±0.61 | 6.17±1.35 | 4.92±0.96 | 2.60±0.10 | 3.42±0.22 | 2.38±0.11 | 2.56±0.14 |
| | | CS(α = 5) | 67.94%±7.14% | 74.16%±10.03% | 43.53%±15.36% | 61.71%±9.32% | 85.59%±0.84% | 78.35%±2.06% | 87.40%±0.80% | 85.87%±1.39% |
| | | Pearson | 0.9861±0.0011 | 0.9855±0.0006 | 0.9875±0.0016 | 0.9866±0.0020 | 0.9861±0.0018 | 0.9754±0.0073 | 0.9881±0.0017 | 0.9860±0.0024 |
| 𝒮9 | Ensemble | MAE | 3.66±0.36 | 3.60±0.12 | 4.04±0.38 | 3.96±0.39 | 3.13±0.08 | 3.50±0.34 | 2.40±0.11 | 2.53±0.13 |
| | | CS(α = 5) | 73.50%±4.52% | 73.42%±2.19% | 72.29%±3.87% | 69.90%±5.06% | 80.42%±0.81% | 77.32%±3.22% | 87.31%±0.84% | 86.42%±1.36% |
| | | Pearson | 0.9853±0.0007 | 0.9849±0.0008 | 0.9874±0.0019 | 0.9867±0.0020 | 0.9808±0.0018 | 0.9763±0.0051 | 0.9879±0.0017 | 0.9864±0.0022 |
| Sub Table III: The accuracies of brain age estimation when the pair inputs are the same y = x. | | | | | | | | | | |
| 𝒮10 | | MAE | 3.11±0.17 | 3.05±0.11 | 3.27±0.24 | 3.18±0.24 | 2.71±0.12 | 2.84±0.18 | 2.41±0.11 | 2.40±0.08 |
| | | CS(α = 5) | 81.37%±2.24% | 82.04%±1.60% | 78.79%±3.41% | 79.27%±2.92% | 84.81%±0.86% | 83.52%±1.85% | 87.28%±0.89% | 87.35%±0.94% |
| | | Pearson | 0.9867±0.0010 | 0.9868±0.0013 | 0.9881±0.0019 | 0.9880±0.0021 | 0.9861±0.0014 | 0.9852±0.0014 | 0.9879±0.0017 | 0.9880±0.0015 |
| 𝒮11 | | MAE | 3.09±0.17 | 3.07±0.19 | 3.33±0.24 | 3.26±0.36 | 2.69±0.05 | 2.85±0.11 | 2.46±0.14 | 2.52±0.03 |
| | | CS(α = 5) | 81.57%±2.22% | 82.02%±2.49% | 78.60%±3.60% | 79.13%±4.22% | 85.67%±0.93% | 84.07%±1.35% | 87.46%±0.91% | 86.75%±0.55% |
| | | Pearson | 0.9865±0.0009 | 0.9862±0.0016 | 0.9879±0.0019 | 0.9870±0.0027 | 0.9869±0.0014 | 0.9852±0.0004 | 0.9877±0.0017 | 0.9870±0.0013 |
| 𝒮12 | | MAE | 3.16±0.19 | 3.26±0.20 | 3.24±0.25 | 3.24±0.28 | 3.55±0.13 | 3.65±0.45 | 2.44±0.11 | 2.56±0.15 |
| | | CS(α = 5) | 80.45%±2.38% | 79.32%±1.85% | 78.82%±3.34% | 78.37%±3.31% | 77.21%±1.03% | 76.10%±3.93% | 87.28%±0.81% | 86.44%±1.48% |
| | | Pearson | 0.9868±0.0011 | 0.9861±0.0012 | 0.9882±0.0020 | 0.9878±0.0018 | 0.9767±0.0024 | 0.9757±0.0053 | 0.9878±0.0017 | 0.9864±0.0022 |
| 𝒮13 | | MAE | 3.91±0.84 | 4.91±0.59 | 3.04±0.48 | 3.84±0.32 | 3.60±0.23 | 3.86±0.15 | 2.98±0.15 | 3.20±0.19 |
| | | CS(α = 5) | 72.44%±9.74% | 58.97%±6.93% | 82.13%±5.72% | 73.49%±4.53% | 76.69%±2.79% | 74.01%±2.78% | 84.59%±1.27% | 81.95%±1.72% |
| | | Pearson | 0.9865±0.0012 | 0.9865±0.0013 | 0.9881±0.0018 | 0.9872±0.0020 | 0.9832±0.0017 | 0.9833±0.0012 | 0.9879±0.0015 | 0.9873±0.0018 |
| 𝒮14 | | MAE | 4.98±0.96 | 6.23±0.27 | 5.32±0.65 | 6.17±0.69 | 3.27±0.20 | 3.25±0.16 | 2.93±0.19 | 2.76±0.13 |
| | | CS(α = 5) | 61.25%±10.55% | 47.37%±3.64% | 58.24%±6.22% | 49.23%±8.67% | 80.72%±1.84% | 81.45%±1.56% | 84.70%±1.03% | 85.51%±1.37% |
| | | Pearson | 0.9863±0.0010 | 0.9858±0.0011 | 0.9872±0.0020 | 0.9870±0.0024 | 0.9846±0.0017 | 0.9814±0.0056 | 0.9877±0.0018 | 0.9871±0.0024 |
| 𝒮15 | | MAE | 3.05±0.13 | 2.95±0.13 | 3.13±0.26 | 3.09±0.22 | 2.75±0.09 | 2.90±0.13 | 2.39±0.10 | 2.44±0.09 |
| | | CS(α = 5) | 81.70%±1.81% | 83.40%±1.79% | 80.99%±3.95% | 80.38%±2.76% | 84.44%±0.90% | 83.45%±1.53% | 87.51%±0.85% | 87.32%±0.99% |
| | | Pearson | 0.9867±0.0010 | 0.9868±0.0013 | 0.9881±0.0019 | 0.9880±0.0021 | 0.9852±0.0015 | 0.9841±0.0024 | 0.9880±0.0016 | 0.9877±0.0019 |
| 𝒮16 | Ensemble | MAE | 3.07±0.14 | 2.99±0.12 | 3.20±0.25 | 3.14±0.22 | 2.71±0.10 | 2.85±0.15 | 2.39±0.11 | 2.41±0.08 |
| | | CS(α = 5) | 81.63%±1.98% | 82.47%±1.52% | 79.97%±3.74% | 79.84%±2.84% | 84.57%±0.84% | 83.62%±1.63% | 87.33%±0.90% | 87.29%±0.89% |
| | | Pearson | 0.9867±0.0010 | 0.9868±0.0013 | 0.9881±0.0019 | 0.9880±0.0021 | 0.9857±0.0014 | 0.9849±0.0016 | 0.9880±0.0016 | 0.9879±0.0017 |
2). Accuracy on x with reference y when x ≠ y, x from the testing set and y from the training set:
Fig. 6 shows an example of the age estimation (x from the testing set whose age is to be estimated) based on different reference images y (from the training set) according to the four estimated relations ri, i ∈ {1, 2, 3, 4}. We average the brain age estimations obtained with the different references y to obtain the final estimated brain age based on the learned relations. The estimated brain ages differ slightly across reference images y.
Fig. 6.
An example of brain age estimation of the mSFCNs+Transformer given the test image x and the reference images y from 0-97 years of age sampled from the training set. Each dot represents the estimated age with one reference image according to the estimated relations ri, i ∈ {1, 2, 3, 4}. The red line indicates the chronological age.
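The reference-based averaging can be illustrated with a short sketch. It is not the estimator of Eqs. 8, 9 and 12; it assumes, from the relation names alone, that r1 = τx + τy, r2 = τx − τy, r3 = max(τx, τy) and r4 = min(τx, τy), and it uses a hypothetical rule that picks r3 when the estimated r2 is positive and r4 otherwise. All function names are illustrative, and noise-free relations stand in for the network outputs.

```python
def ideal_relations(age_x, age_y):
    """Noise-free relations under the ASSUMED definitions (sum, difference, max, min)."""
    return (age_x + age_y, age_x - age_y, max(age_x, age_y), min(age_x, age_y))

def candidates_from_reference(relations, ref_age):
    """Candidate ages for x implied by each relation, given the known reference age."""
    r1, r2, r3, r4 = relations
    from_cumulative = r1 - ref_age        # r1 = age_x + ref_age
    from_relative = r2 + ref_age          # r2 = age_x - ref_age
    # Hypothetical selection rule: x is the older image when r2 > 0.
    from_extreme = r3 if r2 > 0 else r4
    return [from_cumulative, from_relative, from_extreme]

def estimate_age(relations_per_reference, ref_ages):
    """Average the per-reference estimates to obtain the final age of x."""
    per_reference = []
    for relations, ref_age in zip(relations_per_reference, ref_ages):
        candidates = candidates_from_reference(relations, ref_age)
        per_reference.append(sum(candidates) / len(candidates))
    return sum(per_reference) / len(per_reference)

# Toy example: in practice the relations would come from the trained network.
reference_ages = [10.0, 30.0, 50.0]
true_age = 30.0
relations = [ideal_relations(true_age, ref) for ref in reference_ages]
estimated_age = estimate_age(relations, reference_ages)
```

With noise-free relations every candidate equals the true age, so the average recovers it exactly; with real network outputs each reference would yield a slightly different estimate, which is the spread visible in Fig. 6.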
Sub Table II in Table V shows the accuracies of the brain age estimation according to the learned relations ri, i ∈ {1, 2, 3, 4}, of the pair input (x, y) with the known reference y sampled from the training set. From the table we can make the following observations: (1) The method using the MC rule [1] (which uses only a binarized relation for brain age estimation) provides better results than the proposed methods when using FCs for the relation regression. However, our proposed relation regression based on the Transformer gives better results than the MC rule. In addition, the MC method [1] takes a long time to compute due to its high complexity. (2) In about 75% of the cases, the shared backbone (ℱ1 = ℱ2) for feature extraction gives higher accuracies than the independent backbone (ℱ1 ≠ ℱ2). (3) When using FCs for the relation regression, models using SFCN as the backbone provide lower MAEs than models using mSFCN as the backbone. However, when using the Transformer for the relation regression, models using mSFCN as the backbone have lower MAEs than models using SFCN as the backbone. (4) The mSFCNs+Transformer provides lower MAEs than the other model variations, and the highest accuracy in terms of MAE and Pearson coefficient is obtained using the “maximal relation (r3)” and “minimal relation (r4)”, with an MAE of 2.38 years. Fig. 7 shows the scatter plots between the estimated brain age and chronological age based on different relations.
Fig. 7.
Scatter plots of the estimated and chronological ages based on different relations with the reference y sampled from the training set. The orange lines indicate the ideal estimation when the estimated age equals the chronological age while the green lines are the fitted regression lines. The r is the Spearman correlation between the estimated brain age and chronological age.
3). Accuracy when x = y and both from the testing set:
In this section, we feed the neural network with the same image as the pair of inputs: y = x. Ideally, when y = x, the “relative relation r2” should be zero, and the “maximal relation r3” and the “minimal relation r4” should both be equal to the age τx of the input x. In practice, the estimated relations deviate from these ideal values due to the regression errors of the neural network. Note that, to learn the relations between the pair of input samples, the training pairs are randomly sampled with different ground-truth ages (x ≠ y). The distributions of these three estimated relations over different chronological ages are shown in Fig. 8. The estimation error of the relative relation r2 represents the uncertainty of the model for brain age estimation when an image is compared with itself. From Fig. 8 (b) we can see that the estimated “maximal relation r3” is greater than the input age τx for most testing samples. Similarly, from Fig. 8 (c) we can see that the estimated “minimal relation r4” is smaller than the input age τx for most testing samples.
Fig. 8.
The scatter plots of the estimated r2 (a), r3 (b) and r4 (c) versus chronological age when the pair input is the same: y = x. These three figures show the regression errors from the mSFCNs+Transformer model for relation learning.
Sub Table III in Table V shows the accuracies of different model variations for brain age estimation when y = x. The lowest MAEs are from the on different CNN model variations. In addition, using the Transformer provides better results than using FCs for relation regression. The highest accuracies in terms of MAE and CS score are given by the mSFCNs+Transformer. We also find that combining these six estimations (ensemble by averaging) does not improve the accuracies.
4). Robustness/uncertainty in relation-based brain age estimation:
Fig. 5 shows the MAEs of the joint, pair, and single relation learning for brain age estimation with the different strategies 𝒮i indexed in Table V. In most cases, joint relation learning provides the highest accuracy, except for 𝒮2, 𝒮3, 𝒮13 and 𝒮14, where the relations r3 and r4 are involved in the computation. Table V and Fig. 5 show that the best results of brain age estimation are obtained with strategies 𝒮8 and 𝒮15.
Fig. 5.
The Mean Absolute Error (MAE) of the joint, pair, and single relation learning, with 𝒮i indexing the different brain age estimation strategies shown in Table V.
Different brain age estimations can be obtained according to Eqs. 8, 9 and 12, and the uncertainty can be measured as the standard deviation of these different estimations [68], [69], which shows how much uncertainty in brain age estimation is introduced by deep relation learning. Fig. 9 shows the distribution of the uncertainty over different chronological ages. The uncertainties computed from the estimations in Eqs. 8, 9 and 12 correspond to three different brain age estimation methods. The mSFCNs+Transformer has almost the same uncertainty at different ages, and its Pearson correlations between the uncertainty and chronological age are smaller than those of the mSFCNs+FCs for all three estimation methods.
Fig. 9.
The uncertainty which is the standard deviation of estimated brain age from the different estimations in Eq. 8 (Figure (a)), Eq. 9 (Figure (b)) and Eq. 12 (Figure (c)) over different chronological ages. The Pearson correlation is computed between the uncertainty and chronological age on all test subjects. The red lines are the average uncertainty over years.
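As a rough illustration of this uncertainty measure, the sketch below computes the standard deviation of several age estimates for each subject. Whether the population or the sample standard deviation is used is not stated here, so the population form (`statistics.pstdev`) is an assumption; the subject names and estimates are made-up toy values.

```python
import statistics

def estimation_uncertainty(age_estimates):
    """Standard deviation of the different age estimates of one subject.

    The population form is an assumption; statistics.stdev would give the
    sample form instead.
    """
    return statistics.pstdev(age_estimates)

# Toy estimates for illustration only: one list of estimates per subject,
# e.g. one estimate per relation or per reference image.
estimates_by_subject = {
    "subject_a": [29.0, 30.0, 31.0],
    "subject_b": [62.0, 62.0, 62.0],
}

uncertainty = {
    name: estimation_uncertainty(values)
    for name, values in estimates_by_subject.items()
}
mean_uncertainty = sum(uncertainty.values()) / len(uncertainty)
```

A subject whose estimates agree perfectly has zero uncertainty, while disagreement among the estimates increases it; correlating these per-subject values with chronological age gives the quantity reported in Fig. 9.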
5). Accuracy comparison with state-of-the-art brain age estimation algorithms:
Table VI shows the accuracies of different state-of-the-art models for brain age estimation. With the same training dataset and configuration, the 2D DeepBrainNet [9] gives slightly higher accuracies than the 3D SFCN [40]. The mSFCN model, which is similar to SFCN [40] but has a max-pooling layer at the beginning, has an accuracy similar to that of the SFCN. However, mSFCN is more efficient than SFCN since it reduces the image size at the beginning using max-pooling. Our proposed deep relation learning, either with a reference image y or with the same input pair y = x, provides higher accuracies than all other models. The MAE of the proposed method (2.38 years) is lower than the MAEs of the other models. A two-sided paired t-test on the absolute errors across all 6,049 test samples indicates that the differences in MAE between the proposed method and the other models are statistically significant (p < 0.05). In addition, using different reference images y provides a slightly lower MAE and a slightly higher CS score than using the same input pair y = x. The experimental results show that the combination of the mSFCN and the Transformer provides better results than the other models. Moreover, the neural network with deep relation learning (mSFCN+Transformer+Relations) provides a lower MAE and higher scores of CS and Pearson correlation than the neural network without relation learning (mSFCN+Transformer), demonstrating that using the four different relations can further reduce the error of brain age estimation.
TABLE VI.
The comparison accuracies of different methods on the discovery cohort with 5-fold cross-validation.
| Method | MAE | CS(α = 5) | Pearson |
|---|---|---|---|
| 1 Hi-Net [39] | 3.12±0.22**** | 80.5% | 0.983 |
| 1 FiA-Net [37] | 3.00±0.06**** | 82.1% | 0.984 |
| 3D CNN in [4] | 3.93±0.56**** | 71.26%±6.98% | 0.981±0.002 |
| GL-Transformer [66] | 2.59±0.18*** | 85.83%±1.45% | 0.987±0.001 |
| SFCN [40] | 2.62±0.07**** | 84.47%±0.79% | 0.987±0.002 |
| 2D DeepBrainNet [9] | 2.59±0.09*** | 86.45%±0.39% | 0.985±0.003 |
| mSFCN | 2.64±0.17**** | 85.58%±1.28% | 0.986±0.003 |
| mSFCN+Transformer | 2.56±0.07** | 85.92%±1.13% | 0.987±0.001 |
| § mSFCNs+Transformer+Relation(with y = x) | 2.39±0.11 | 87.33%±0.90% | 0.988±0.002 |
| § mSFCNs+Transformer+Relation(with y ≠ x) | 2.38±0.11 | 87.40%±0.80% | 0.988±0.002 |
1 Results are from the work [37]. The methods are based on 3D CNN.
§ The best results of the proposed deep relation learning in different configurations.
Significance level: * (p < 0.05), ** (p < 0.01), *** (p < 0.001), **** (p < 0.0001).
Significance is measured using a two-sided paired t-test based on the absolute errors across all test samples.
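The significance test can be sketched as follows. This is an illustrative, dependency-free version and not the analysis code behind Table VI: the t statistic is the standard paired form, but the two-sided p-value uses a normal approximation to the t distribution, which is an assumption that is only reasonable for large samples such as the several thousand test subjects used here. The function name and the toy errors are hypothetical.

```python
import math

def paired_t_test(errors_a, errors_b):
    """Paired t statistic and an approximate two-sided p-value.

    errors_a and errors_b are per-subject absolute errors of two models on
    the same test subjects. The p-value uses a normal approximation (an
    assumption, adequate only for large n). The differences must not all be
    identical, otherwise the variance is zero.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    t_stat = mean_d / math.sqrt(var_d / n)
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))        # 2 * (1 - Phi(|t|))
    return t_stat, p_value

# Toy absolute errors in years, for illustration only.
baseline_errors = [3.0, 2.5, 4.0, 3.5, 2.0, 3.0, 4.5, 2.5]
proposed_errors = [2.5, 2.0, 3.0, 3.5, 1.5, 2.5, 4.0, 2.0]

t_stat, p_value = paired_t_test(baseline_errors, proposed_errors)
```

A positive t statistic here means the baseline has larger absolute errors than the proposed model on the same subjects. For small samples an exact t distribution should replace the normal approximation.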
Fig. 10 shows the MAE of different models on the 8 different datasets involved in the cross-validation. We first rank the models on each dataset and assign a rank score (rank=1,2,…,10 for 10 different models) and the average rank score of each model on the 8 datasets is computed and shown in Fig. 10. The result shows that our proposed deep relation learning gives the lowest rank score, demonstrating that deep relation learning provides good generalization on different datasets.
Fig. 10.
The MAE of different models on each data set. The ranking score is the average rank of each model on the 8 different datasets involved in the cross-validation.
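The average rank score can be computed as in the sketch below. The model and dataset names and the MAE values are placeholders rather than the values plotted in Fig. 10, and ties are broken by insertion order, which is a simplification.

```python
def average_ranks(mae_by_dataset):
    """Average rank of each model across datasets (rank 1 = lowest MAE).

    mae_by_dataset maps dataset name -> {model name: MAE}. Every model is
    assumed to be evaluated on every dataset.
    """
    totals = {}
    for scores in mae_by_dataset.values():
        ordered = sorted(scores, key=scores.get)  # ascending MAE
        for rank, model in enumerate(ordered, start=1):
            totals[model] = totals.get(model, 0) + rank
    n_datasets = len(mae_by_dataset)
    return {model: total / n_datasets for model, total in totals.items()}

# Placeholder MAEs (years) for three hypothetical models on two datasets.
toy_mae = {
    "dataset_A": {"model_1": 2.4, "model_2": 2.6, "model_3": 3.0},
    "dataset_B": {"model_1": 2.7, "model_2": 2.5, "model_3": 3.1},
}

rank_scores = average_ranks(toy_mae)
```

In this toy case the first two models each win on one dataset and tie with an average rank of 1.5, while the third is ranked last on both.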
V. DISCUSSION AND CONCLUSION
We proposed a deep relation learning for regression given a pair of input images and evaluated it for brain age estimation on brain MRIs. Four non-linearly correlated relations were evaluated, including the “cumulative relation”, “relative relation”, “maximal relation” and “minimal relation”. To learn these relations on a pair of input images, we used a neural network with two parts: feature extraction which was based on convolutional neural networks and relation regression which was based on Transformer. We evaluated the accuracies of the proposed deep relation learning on a merged dataset with 6,049 healthy brain MRIs acquired between 0-97 years of age.
Our experimental results demonstrate the advantages of relation learning for brain age estimation. (1) Our proposed relation learning is an extension of order learning [1], and the experimental results in Table V show that the MAE (2.38 years) obtained using the four different relations is lower than the MAE (2.55 years) of order learning based on the MC rule [1]. (2) We evaluated different strategies to estimate the brain ages from a pair of input images: brain age estimation with different test images (strategies 𝒮1–3 in Table V), brain age estimation with a reference of known age (strategies 𝒮4–9 in Table V) and brain age estimation with a pair of the same image (strategies 𝒮10–16 in Table V). The accuracies of these different strategies are similar, and the lowest MAE is 2.38 years, which is better than other state-of-the-art models for brain age estimation (as shown in Table VI). (3) We also evaluated three different training configurations to learn the four relations: joint, pair, and single learning. The experimental results in Table IV and Fig. 5 show that the MAEs and the scores of CS(α=5) and Pearson correlation of these three learning strategies are similar. However, joint relation learning needs only 1 neural network to learn the 4 relations simultaneously, which requires fewer parameters, less memory and less computation time than pair and single relation learning. (4) The uncertainty shown in Fig. 9 indicates that our proposed method is robust for brain age estimation across different ages.
Fig. 7 also shows the bias problem known as “regression to the mean”: the ages of the older subjects are underestimated while the ages of the younger subjects are overestimated. The regression to the mean (RTM) problem [15] is a natural statistical phenomenon in many regression problems, and age estimation is no exception. This bias, or RTM problem, also exists in other age estimation studies that focus purely on adults, where bias correction has been proposed [16], [14], [17]. Bias correction, however, remains controversial with respect to which model is optimal for correction, how best to quantify bias, and how best to evaluate the effects of correction [18]. Bias correction is also part of our ongoing work.
The unbalanced age distribution introduces a bias in relation learning between pairs of input images from different age ranges. As shown in Fig. 4, the errors of relation r1 increase among subjects with age differences between 20 and 40 years. The same holds for the relations r3 and r4. In addition, the accuracy of the relation r4 is higher than that of the other relations, and one possible reason is that the number of young subjects is larger than the number of old subjects. A balanced age distribution may mitigate this problem. Our future work will investigate the data imbalance problem for relation learning and brain age estimation across the lifespan.
Table V shows that there is no significant difference among the accuracies of deep relation learning with different strategies (the pair of input images are the same image (x=y) or different images (x≠y)). The results demonstrate the consistency and generality of the proposed method. For consistency, our proposed method provides consistent accuracies with the same image or different images as a pair of input images. For the generality, our proposed method can be used in different scenarios and it can predict the brain age based on reference images if they are available or based on itself if reference images are not available.
Experimental results in Table V show that the lowest MAEs and the highest CS and Pearson correlations are achieved based on the relations r3 and r4, demonstrating that the neural network can capture non-linear relations better than linear relations. In addition, joint relation learning can provide slightly lower MAEs (𝒮8 and 𝒮15) than single and pair relation learning.
The limitations and future work include: (1) We only focused on the evaluation of the proposed deep relation learning and did not apply it to potential applications, such as building different chains [1] for subjects with different sex, race, ethnicity, etc. Our proposed method can be directly used for age comparison between different subject groups or cohorts, which can provide richer information than only the “relative relations” proposed in [1]. (2) Only healthy subjects are involved in this study. However, the age differences between healthy and diseased subjects can be compared by the proposed method with the pair of input images (x, y). In the future, we will apply the model to compute the brain age difference when x is from the healthy cohort and y is from the disease cohort. (3) As shown in Table I and Fig. 2, the age distribution of our dataset is unbalanced, with more samples in 15-30 years and elderly ages. As a result, we see larger errors in age ranges where fewer samples are available [37]. Feng et al. [5] recently showed that balancing the age distribution (by under-sampling in more populated age bins) may reduce the errors in less populated age bins. However, under-sampling the data results in a smaller overall sample size. Our ongoing work is on gathering more images [4], [5], and even resampling or synthesizing new images, so that we can increase the sample size while balancing the age distribution. (4) The proposed deep relation learning can provide good accuracy for brain age estimation. However, deep neural networks are usually hard to interpret, especially the convolutional neural networks that are used to extract deep features. One future direction is to interpret the Transformer by visualizing the attention heatmaps based on the deep Taylor decomposition principle [70]. These attention heatmaps may be useful for understanding the interaction and influence among the four relations for brain age estimation. (5) Table VI compares the proposed method to other state-of-the-art models with the same training configurations. However, another fair comparison might be to run the different models with their own optimal parameters. In practice, we have found that these parameters do not affect the accuracies if they are in a reasonable range (e.g., a learning rate between 0.0001 and 0.001). We also ran SFCN [40], another 3D model, with only the T1w image and with the parameters specified in its original paper [40], and we found an accuracy (MAE=2.67±0.11) statistically equivalent to the SFCN accuracy listed in Table VI (MAE=2.62±0.07), which does not affect the ranking and findings in the comparison. In future work, we will study more thoroughly how the training settings and the choice of backbone affect deep relation learning.
In conclusion, we have proposed a novel deep relation learning method for brain age estimation, and our proposed method achieves a lower MAE than the other eight state-of-the-art models. The proposed method was validated on a lifespan dataset with 5-fold cross-validation, yielding an MAE of 2.38 years, a CS(α = 5 years) of 87.40% and a Pearson correlation of 0.988.
Acknowledgment
The work of Sheng He was supported by Charles A. King Trust Research Fellowship. The work of Yangming Ou was supported in part by Harvard Medical School/Boston Children’s Hospital Faculty Development Award, and in part by St. Baldrick Foundation Scholar Award Grace Fund and R03 HD104891, R21 NS121735.
Contributor Information
Sheng He, Boston Children’s Hospital and Harvard Medical School, Harvard University, 300 Longwood Ave., Boston, MA, USA.
Yanfang Feng, Massachusetts General Hospital and Harvard Medical School, Harvard University, 55 Fruit St., Boston, MA, USA.
P. Ellen Grant, Boston Children’s Hospital and Harvard Medical School, Harvard University, 300 Longwood Ave., Boston, MA, USA.
Yangming Ou, Boston Children’s Hospital and Harvard Medical School, Harvard University, 300 Longwood Ave., Boston, MA, USA.
References
- [1] Lim K, Shin N-H, Lee Y-Y, and Kim C-S, “Order learning and its application to age estimation,” in International Conference on Learning Representations, 2019.
- [2] Lu M, Zhao Q, Poston KL, Sullivan EV, Pfefferbaum A, Shahid M, Katz M, Kouhsari LM, Schulman K, Milstein A et al., “Quantifying Parkinson’s disease motor severity under uncertainty using MDS-UPDRS videos,” Medical Image Analysis, vol. 73, p. 102179, 2021.
- [3] Signoroni A, Savardi M, Benini S, Adami N, Leonardi R, Gibellini P, Vaccher F, Ravanelli M, Borghesi A, Maroldi R et al., “BS-Net: Learning COVID-19 pneumonia severity on a large chest X-ray dataset,” Medical Image Analysis, vol. 71, p. 102046, 2021.
- [4] Cole JH, Poudel RP, Tsagkrasoulis D, Caan MW, Steves C, Spector TD, and Montana G, “Predicting brain age with deep learning from raw imaging data results in a reliable and heritable biomarker,” NeuroImage, vol. 163, pp. 115–124, 2017.
- [5] Feng X, Lipton ZC, Yang J, Small SA, Provenzano FA, Initiative ADN, Initiative FLDN et al., “Estimating brain age based on a uniform healthy population with deep learning and structural MRI,” Neurobiology of Aging, vol. 91, pp. 15–25, 2020.
- [6] Saha S, Pagnozzi A, Bradford D, and Fripp J, “Predicting fluid intelligence in adolescence from structural MRI with deep learning methods,” Intelligence, vol. 88, p. 101568, 2021.
- [7] LeCun Y, Bengio Y, and Hinton G, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
- [8] Jónsson BA, Bjornsdottir G, Thorgeirsson T, Ellingsen LM, Walters GB, Gudbjartsson D, Stefansson H, Stefansson K, and Ulfarsson M, “Brain age prediction using deep learning uncovers associated sequence variants,” Nature communications, vol. 10, no. 1, pp. 1–10, 2019.
- [9] Bashyam VM, Erus G, Doshi J, Habes M, Nasralah I, Truelove-Hill M, Srinivasan D, Mamourian L, Pomponio R, Fan Y et al., “MRI signatures of brain age and disease over the lifespan based on a deep brain network and 14 468 individuals worldwide,” Brain, vol. 143, no. 7, pp. 2312–2324, 2020.
- [10] Zhang Z, Song Y, and Qi H, “Age progression/regression by conditional adversarial autoencoder,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 5810–5818.
- [11] Lee J-H and Kim C-S, “Monocular depth estimation using relative depth maps,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 9729–9738.
- [12] Wang R, Cao S, Ma K, Zheng Y, and Meng D, “Pairwise learning for medical image segmentation,” Medical Image Analysis, vol. 67, p. 101876, 2021.
- [13] Chen W, Fu Z, Yang D, and Deng J, “Single-image depth perception in the wild,” Advances in neural information processing systems, vol. 29, pp. 730–738, 2016.
- [14] de Lange A-MG and Cole JH, “Commentary: Correction procedures in brain-age prediction,” NeuroImage: Clinical, vol. 26, 2020.
- [15] Barnett AG, Van Der Pols JC, and Dobson AJ, “Regression to the mean: what it is and how to deal with it,” International journal of epidemiology, vol. 34, no. 1, pp. 215–220, 2005.
- [16] Liang H, Zhang F, and Niu X, “Investigating systematic bias in brain age estimation with application to post-traumatic stress disorders,” Wiley Online Library, Tech. Rep, 2019.
- [17] Beheshti I, Nugent S, Potvin O, and Duchesne S, “Bias-adjustment in neuroimaging-based brain age frameworks: A robust scheme,” NeuroImage: Clinical, vol. 24, p. 102063, 2019.
- [18] Butler ER, Chen A, Ramadan R, Le TT, Ruparel K, Moore TM, Satterthwaite TD, Zhang F, Shou H, Gur RC et al., “Pitfalls in brain age analyses,” Wiley Online Library, Tech. Rep, 2021.
- [19] Parikh D and Grauman K, “Relative attributes,” in 2011 International Conference on Computer Vision. IEEE, 2011, pp. 503–510.
- [20] Havasi M, Jenatton R, Fort S, Liu JZ, Snoek J, Lakshminarayanan B, Dai AM, and Tran D, “Training independent subnetworks for robust prediction,” in International Conference on Learning Representations, 2020.
- [21] Gaser C and Franke K, “Ten years of BrainAGE as a neuroimaging biomarker of brain aging: What insights did we gain?” Frontiers in Neurology, vol. 10, p. 789, 2019.
- [22] Armanious K, Abdulatif S, Shi W, Salian S, Küstner T, Weiskopf D, Hepp T, Gatidis S, and Yang B, “Age-Net: An MRI-based iterative framework for brain biological age estimation,” IEEE Transactions on Medical Imaging, vol. 40, no. 7, pp. 1778–1791, 2021.
- [23] Cheng J, Liu Z, Guan H, Wu Z, Zhu H, Jiang J, Wen W, Tao D, and Liu T, “Brain age estimation from MRI using cascade networks with ranking loss,” IEEE Transactions on Medical Imaging, vol. 40, no. 12, pp. 3400–3412, 2021.
- [24] Tønnesen S, Kaufmann T, de Lange A-MG, Richard G, Doan NT, Alnæs D, van der Meer D, Rokicki J, Moberget T, Maximov II et al., “Brain age prediction reveals aberrant brain white matter in schizophrenia and bipolar disorder: A multisample diffusion tensor imaging study,” Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, vol. 5, no. 12, pp. 1095–1103, 2020.
- [25] Tunç B, Yankowitz LD, Parker D, Alappatt JA, Pandey J, Schultz RT, and Verma R, “Deviation from normative brain development is associated with symptom severity in autism spectrum disorder,” Molecular autism, vol. 10, no. 1, pp. 1–14, 2019.
- [26] Cropley VL, Tian Y, Fernando K, Pantelis C, Cocchi L, Zalesky A et al., “Brain-predicted age associates with psychopathology dimensions in youths,” Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, vol. 6, no. 4, pp. 410–419, 2021.
- [27] Han LK, Dinga R, Hahn T, Ching CR, Eyler LT, Aftanas L, Aghajani M, Aleman A, Baune BT, Berger K et al., “Brain aging in major depressive disorder: results from the ENIGMA major depressive disorder working group,” Molecular psychiatry, pp. 1–16, 2020.
- [28] Høgestøl EA, Kaufmann T, Nygaard GO, Beyer MK, Sowa P, Nordvik JE, Kolskår K, Richard G, Andreassen OA, Harbo HF et al., “Cross-sectional and longitudinal MRI brain scans reveal accelerated brain aging in multiple sclerosis,” Frontiers in neurology, vol. 10, p. 450, 2019.
- [29] Chung Y, Addington J, Bearden CE, Cadenhead K, Cornblatt B, Mathalon DH, McGlashan T, Perkins D, Seidman LJ, Tsuang M et al., “Use of machine learning to determine deviance in neuroanatomical maturity associated with future psychosis in youths at clinically high risk,” JAMA psychiatry, vol. 75, no. 9, pp. 960–968, 2018.
- [30] Kaufmann T, van der Meer D, Doan NT, Schwarz E, Lund MJ, Agartz I, Alnæs D, Barch DM, Baur-Streubel R, Bertolino A et al., “Common brain disorders are associated with heritable patterns of apparent aging of the brain,” Nature neuroscience, vol. 22, no. 10, pp. 1617–1623, 2019.
- [31] Abrol A, Fu Z, Salman M, Silva R, Du Y, Plis S, and Calhoun V, “Deep learning encodes robust discriminative neuroimaging representations to outperform standard machine learning,” Nature communications, vol. 12, no. 1, pp. 1–17, 2021.
- [32] Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, and Rabinovich A, “Going deeper with convolutions,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 1–9.
- [33] Iandola FN, Han S, Moskewicz MW, Ashraf K, Dally WJ, and Keutzer K, “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and < 0.5 MB model size,” arXiv preprint arXiv:1602.07360, 2016.
- [34] Szegedy C, Ioffe S, Vanhoucke V, and Alemi AA, “Inception-v4, inception-resnet and the impact of residual connections on learning,” in Thirty-first AAAI conference on artificial intelligence, 2017.
- [35] Huang G, Liu Z, Van Der Maaten L, and Weinberger KQ, “Densely connected convolutional networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 4700–4708.
- [36] Simonyan K and Zisserman A, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
- [37] He S, Pereira D, Perez JD, Gollub RL, Murphy SN, Prabhu S, Pienaar R, Robertson RL, Grant PE, and Ou Y, “Multi-channel attention-fusion neural network for brain age estimation: Accuracy, generality, and interpretation with 16,705 healthy MRIs across lifespan,” Medical Image Analysis, vol. 72, p. 102091, 2021.
- [38] He K, Zhang X, Ren S, and Sun J, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
- [39] Zhou T, Fu H, Chen G, Shen J, and Shao L, “Hi-Net: hybrid-fusion network for multi-modal MR image synthesis,” IEEE transactions on medical imaging, vol. 39, no. 9, pp. 2772–2781, 2020.
- [40] Peng H, Gong W, Beckmann CF, Vedaldi A, and Smith SM, “Accurate brain age prediction with lightweight deep neural networks,” Medical Image Analysis, p. 101871, 2020.
- [41] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, and Polosukhin I, “Attention is all you need,” in Advances in neural information processing systems, 2017, pp. 5998–6008.
- [42] Liu X, He P, Chen W, and Gao J, “Multi-task deep neural networks for natural language understanding,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019, pp. 4487–4496.
- [43] Yu J, Li J, Yu Z, and Huang Q, “Multimodal transformer with multiview visual representation for image captioning,” IEEE transactions on circuits and systems for video technology, vol. 30, no. 12, pp. 4467–4480, 2019.
- [44] Gabeur V, Sun C, Alahari K, and Schmid C, “Multi-modal transformer for video retrieval,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part IV 16. Springer, 2020, pp. 214–229.
- [45] Prakash A, Chitta K, and Geiger A, “Multi-modal fusion transformer for end-to-end autonomous driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 7077–7087.
- [46] Schmidt G and Ströhlein T, Relations and graphs: discrete mathematics for computer scientists. Springer Science & Business Media, 2012.
- [47] Ioffe S and Szegedy C, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International conference on machine learning. PMLR, 2015, pp. 448–456.
- [48] Krizhevsky A, Sutskever I, and Hinton GE, “ImageNet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105.
- [49] Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” in International Conference on Learning Representations, 2020.
- [50] He S, Gollub RL, Murphy SN, Perez JD, Prabhu S, Pienaar R, Robertson RL, Grant PE, and Ou Y, “Brain age estimation using LSTM on children’s brain MRI,” ISBI 2020, pp. 420–423, 2020.
- [51] Evans AC, Group BDC et al., “The NIH MRI study of normal brain development,” Neuroimage, vol. 30, no. 1, pp. 184–202, 2006.
- [52] Di Martino A, Yan C-G, Li Q, Denio E, Castellanos FX, Alaerts K, Anderson JS, Assaf M, Bookheimer SY, Dapretto M et al., “The autism brain imaging data exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism,” Molecular psychiatry, vol. 19, no. 6, pp. 659–667, 2014.
- [53] Holmes AJ, Hollinshead MO, O’keefe TM, Petrov VI, Fariello GR, Wald LL, Fischl B, Rosen BR, Mair RW, Roffman JL et al., “Brain genomics superstruct project initial data release with structural, functional, and behavioral measures,” Scientific data, vol. 2, p. 150031, 2015.
- [54] Park J, Carp J, Kennedy KM, Rodrigue KM, Bischof GN, Huang C-M, Rieck JR, Polk TA, and Park DC, “Neural broadening or neural attenuation? investigating age-related dedifferentiation in the face network in a large lifespan sample,” Journal of Neuroscience, vol. 32, no. 6, pp. 2154–2158, 2012.
- [55] LaMontagne PJ, Keefe S, Lauren W, Xiong C, Grant EA, Moulder KL, Morris JC, Benzinger TL, and Marcus DS, “OASIS-3: Longitudinal neuroimaging, clinical, and cognitive dataset for normal aging and Alzheimer’s disease,” Alzheimer’s & Dementia: The Journal of the Alzheimer’s Association, vol. 14, no. 7, p. P1097, 2018.
- [56] Becker BG, Klein T, Wachinger C, Initiative ADN et al., “Gaussian process uncertainty in age estimation as a measure of brain abnormality,” NeuroImage, vol. 175, pp. 246–258, 2018.
- [57] Pomponio R, Erus G, Habes M, Doshi J, Srinivasan D, Mamourian E, Bashyam V, Nasrallah IM, Satterthwaite TD, Fan Y et al., “Harmonization of large MRI datasets for the analysis of brain imaging patterns throughout the lifespan,” NeuroImage, vol. 208, p. 116450, 2020.
- [58] Tustison NJ, Avants BB, Cook PA, Zheng Y, Egan A, Yushkevich PA, and Gee JC, “N4ITK: improved N3 bias correction,” IEEE transactions on medical imaging, vol. 29, no. 6, pp. 1310–1320, 2010.
- [59] Ou Y, Zöllei L, Da X, Retzepi K, Murphy SN, Gerstner ER, Rosen BR, Grant PE, Kalpathy-Cramer J, and Gollub RL, “Field of view normalization in multi-site brain MRI,” Neuroinformatics, vol. 16, no. 3-4, pp. 431–444, 2018.
- [60] Doshi J, Erus G, Ou Y, Gaonkar B, and Davatzikos C, “Multi-atlas skull-stripping,” Academic radiology, vol. 20, no. 12, pp. 1566–1576, 2013.
- [61] Ou Y, Gollub RL, Retzepi K, Reynolds N, Pienaar R, Pieper S, Murphy SN, Grant PE, and Zöllei L, “Brain extraction in pediatric ADC maps, toward characterizing neuro-development in multi-platform and multi-institution clinical images,” NeuroImage, vol. 122, pp. 246–261, 2015.
- [62] Rohlfing T, Zahr NM, Sullivan EV, and Pfefferbaum A, “The SRI24 multichannel atlas of normal adult human brain structure,” Human brain mapping, vol. 31, no. 5, pp. 798–819, 2010.
- [63] Ou Y, Sotiras A, Paragios N, and Davatzikos C, “DRAMMS: Deformable registration via attribute matching and mutual-saliency weighting,” Medical image analysis, vol. 15, no. 4, pp. 622–639, 2011.
- [64] Davatzikos C, Genc A, Xu D, and Resnick SM, “Voxel-based morphometry using the RAVENS maps: methods and validation using simulated longitudinal atrophy,” NeuroImage, vol. 14, no. 6, pp. 1361–1369, 2001.
- [65] Schaffer C, “Selecting a classification method by cross-validation,” Machine Learning, vol. 13, no. 1, pp. 135–143, 1993.
- [66] He S, Grant PE, and Ou Y, “Global-local transformer for brain age estimation,” IEEE Transactions on Medical Imaging, vol. 41, no. 1, pp. 213–224, 2021.
- [67] Ruder S, “An overview of multi-task learning in deep neural networks,” arXiv preprint arXiv:1706.05098, 2017.
- [68] Hepp T, Blum D, Armanious K, Schölkopf B, Stern D, Yang B, and Gatidis S, “Uncertainty estimation and explainability in deep learning-based age estimation of the human brain: Results from the German National Cohort MRI study,” Computerized Medical Imaging and Graphics, vol. 92, p. 101967, 2021.
- [69] Palma M, Tavakoli S, Brettschneider J, Nichols TE, Initiative ADN et al., “Quantifying uncertainty in brain-predicted age using scalar-on-image quantile regression,” Neuroimage, vol. 219, p. 116938, 2020.
- [70] Chefer H, Gur S, and Wolf L, “Transformer interpretability beyond attention visualization,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 782–791.