Author manuscript; available in PMC 2021 Sep 1. Published in final edited form as: IEEE Trans Med Imaging. 2020 Mar 24;39(9):2965–2975. doi: 10.1109/TMI.2020.2983085

Spatially-Constrained Fisher Representation for Brain Disease Identification with Incomplete Multi-Modal Neuroimages

Yongsheng Pan 1, Mingxia Liu 1,*, Chunfeng Lian 1, Yong Xia 1,*, Dinggang Shen 1,*
PMCID: PMC7485604  NIHMSID: NIHMS1625177  PMID: 32217472

Abstract

Multi-modal neuroimages, such as magnetic resonance imaging (MRI) and positron emission tomography (PET), can provide complementary structural and functional information of the brain, thus facilitating automated brain disease identification. The incomplete-data problem is unavoidable in multi-modal neuroimaging studies due to patient dropout and/or poor data quality. Conventional methods usually discard subjects with missing data, thus significantly reducing the number of training samples. Even though several deep learning methods have been proposed, they usually rely on pre-defined regions-of-interest in neuroimages, requiring disease-specific expert knowledge. To this end, we propose a spatially-constrained Fisher representation framework for brain disease diagnosis with incomplete multi-modal neuroimages. We first impute missing PET images based on their corresponding MRI scans using a hybrid generative adversarial network. With the complete (after imputation) MRI and PET data, we then develop a spatially-constrained Fisher representation network to extract statistical descriptors of neuroimages for disease diagnosis, assuming that these descriptors follow a Gaussian mixture model with a strong spatial constraint (i.e., images from different subjects have similar anatomical structures). Experimental results on three databases suggest that our method can synthesize reasonable neuroimages and achieve promising results in brain disease identification, compared with several state-of-the-art methods.

Index Terms—: Multi-Modal Neuroimage, Incomplete Data, Generative Adversarial Network, Fisher Vector, Brain Disease Diagnosis, MRI, PET

I. Introduction

STRUCTURAL magnetic resonance imaging (MRI) and fluorodeoxyglucose positron emission tomography (PET) have been widely used for computer-aided diagnosis of Alzheimer’s disease (AD) and mild cognitive impairment (MCI) [1]–[4]. While recent studies have shown that MRI and PET can provide complementary information for brain disease diagnosis [5]–[7], existing approaches face two practical challenges, i.e., (1) how to deal with incomplete multi-modal data, and (2) how to effectively represent multi-modal neuroimages for diagnosis.

The issue of incomplete data is a common challenge in multi-modal studies [8]–[11]. In clinical practice, subjects undergoing MRI may decline PET scans, possibly due to the high cost of PET or concerns about radiation exposure. For example, in the large-scale Alzheimer’s Disease Neuroimaging Initiative (ADNI-1) database [12], only approximately half of the subjects have baseline PET scans due to scanning costs, even though all subjects have baseline MRI data. Previous studies usually tackle this problem by simply discarding subjects without PET scans [5], [13], [14]. However, such a simple approach significantly reduces the number of training subjects available for learning a reliable model, thus inevitably degrading diagnostic performance. Another commonly-used strategy is to impute the missing data/features of a subject using the mean or median feature values of other subjects (with complete data), or even using random values [8], [15], which introduces additional noise and is only feasible for handcrafted features. To utilize all available subjects, an intuitive strategy is to directly impute the missing PET scans [16], [17].

As another challenge in multi-modal studies, it is usually difficult to effectively define feature representations for neuroimages. The main reason is that there may be millions of voxels in each 3D volume, and many voxels may not be affected by a specific disease. Therefore, representations defined on the entire brain image may not have sufficient discriminative power for disease diagnosis due to the inclusion of those non-informative voxels/regions. Also, we usually have very limited (e.g., tens or hundreds of) training subjects, which severely limits the generalization capacity of learned models, especially for deep-learning-based model construction [18]–[21]. To effectively employ multi-modal neuroimages, a commonly-used strategy is to include disease-related prior knowledge to guide the extraction of feature representations from neuroimages. For example, biological/anatomical prior knowledge on dementia-associated brain changes/abnormalities has been widely used to identify informative local image patches for image representation and automated brain disease diagnosis. Another widely-used method is to compute gray matter (GM) volumes as the feature representation, by warping a standard atlas with pre-defined regions-of-interest (ROIs) onto MRI/PET scans. Such handcrafted features depend on the chosen atlas and may not coordinate well with the subsequent diagnosis model. Deep learning methods have recently been proposed to learn task-oriented features of neuroimages based on anatomical landmarks [22]–[25]. These methods generally first define AD-related anatomical landmarks and then automatically extract features from image patches (located by landmarks) via deep networks, avoiding non-informative voxels and regions in the brain. Since these methods rely heavily on expert knowledge of specific brain diseases, their generalization performance may be limited in practical applications.

In this paper, we develop a two-stage deep learning framework (see Fig. 1) to deal with the above-mentioned challenges in computer-aided brain disease diagnosis with incomplete multi-modal (i.e., MRI and PET) data. As shown in Fig. 1, in the first stage, we propose a hybrid generative adversarial network (HGAN) to synthesize the missing PET images based on their corresponding MRI scans. Notably, we create a hybrid loss function in HGAN, containing a unique voxel-wise-consistent loss, a cycle-consistent loss, and an adversarial loss. In the second stage, we develop a spatially-constrained Fisher representation (SCFR) model to make use of multi-modal neuroimages for automated AD diagnosis and MCI conversion prediction. The proposed SCFR model is a deep network to efficiently extract statistical descriptors from neuroimages, assuming that these descriptors follow a Gaussian mixture model (GMM) with a strong spatial constraint (i.e., brain images from different subjects have similar anatomical structure). The proposed method is evaluated on 2,317 subjects from three public databases for automated brain disease diagnosis. Experimental results demonstrate that our method can synthesize reasonable PET and MRI scans, and outperforms several state-of-the-art methods in AD diagnosis and MCI conversion prediction.

Fig. 1.

Illustration of our two-stage deep learning framework for brain disease diagnosis using incomplete MRI and PET scans, including two sequential stages: (1) missing PET imputation using our proposed hybrid generative adversarial network (HGAN) based on pairwise MRI and PET scans, and (2) disease diagnosis using our spatially-constrained Fisher representation (SCFR) model based on the complete (after imputation) data.

II. Related Work

A. Incomplete Multi-modal Neuroimage Analysis

By providing complementary structural and functional information of the brain, multi-modal neuroimages (e.g., MRI and PET) have been widely used for computer-aided diagnosis of brain diseases, leading to improved diagnostic performance. Kohannim et al. [26] verified that directly concatenating multi-modal (MRI, PET, and cerebrospinal fluid) features results in better performance than using single-modal features alone. Suk et al. [27] proposed to learn a shared feature representation for small patches in MRI and PET scans via a deep Boltzmann machine, where the 398 studied subjects have complete MRI and PET scans. However, these methods cannot utilize subjects with incomplete multi-modal data, while, in reality, subjects may lack specific modalities due to patient dropout and/or poor data quality in clinical practice.

To handle the incomplete-data issue, conventional methods typically discard subjects with missing data [14], [26]–[28], which significantly reduces the number of training subjects for learning a reliable model, thus degrading diagnostic performance. Although several data imputation methods have been proposed [8], [15], [17], [29], most of them focus on imputing missing handcrafted features, which may not coordinate well with the subsequent diagnosis model. Another commonly-used strategy is to impute the missing features of a subject using the mean or median feature values of other subjects (with complete data), or even using random values [30]. However, such a strategy may introduce additional noise and is only applicable to certain scenarios with handcrafted features.

To utilize all available subjects, an intuitive strategy is to directly impute the missing images [16], [17], [30], [31]. Considering that MRI and PET data (scanned from the same subject) have underlying relevance, it could be a promising direction to employ generative methods [32]–[34] to construct a mapping between MRI and PET for synthesizing missing scans. For instance, Li et al. [35] proposed a shallow network with two hidden layers to estimate missing PET images based on their corresponding MRI scans. However, this method only learns a unidirectional mapping from MRI to PET, and thus cannot fully model the complex relationship between MRI and PET. Also, this method cannot guarantee that the synthesized PET scans follow the true distribution of PET data. In our previous work [31], we developed a 3D cycle-consistent generative adversarial network to synthesize missing PET based on the corresponding MRI scans. Specifically, this method learns a bidirectional mapping between MRI and PET by using a cycle-consistent loss, and constrains the synthesized PET scans to follow the true data distribution via an adversarial loss. However, this work does not consider the spatial consistency between a pair of PET and MRI scans acquired from the same subject.

B. Neuroimage Representation for Disease Diagnosis

While multi-modal neuroimages (e.g., MRI and PET) have been shown to be useful in enhancing diagnostic performance, it is usually difficult to effectively define neuroimage representations in 3D volumes, where each volume contains millions of voxels and many voxels may not be affected by disease. Also, we usually have limited (e.g., tens or hundreds of) training subjects, which severely limits the generalization capacity of learned models (especially for deep learning methods) [18]–[21]. Previous approaches usually rely on handcrafted features for representing neuroimaging data, which can be categorized into three types: (1) voxel-based methods, (2) ROI-based methods, and (3) patch-based methods. Voxel-based methods use voxel-wise features (e.g., image intensity) for classification [36], [37]; however, their performance is usually limited by the high dimensionality of features and the limited number of training subjects. ROI-based methods [38], [39] extract features from structurally or functionally pre-defined brain regions in neuroimages, covering the whole brain with relatively low feature dimensionality. However, ROI features are too coarse to sensitively represent small or subtle changes caused by brain diseases. Patch-based methods [40], [41] dissect brain areas into small 3D patches and combine the features extracted from these patches for classification. Since patch-based methods assume that the abnormal regions affected by neurodegenerative diseases can be part of an ROI or span multiple ROIs, they are usually superior to ROI-based methods in capturing disease-related pathology [27], [40], [41]. Since these voxel-based, ROI-based, and patch-based features are handcrafted and defined independently of model construction, such methods typically result in suboptimal learning performance. Deep learning methods have recently been proposed to learn task-oriented features of neuroimages for brain disease diagnosis, by integrating feature extraction and model construction into a unified framework. For example, Zhou et al. [42] first extracted ROI-based features from MRI and PET scans, and then fed these features into a fully-connected network to jointly learn a high-level neuroimaging representation and a classification model, achieving superior performance to conventional methods with handcrafted features. However, previous deep learning methods usually suffer from the limited-data problem, because we typically have only tens or hundreds of training subjects in neuroimaging analysis.

To address this issue, previous studies proposed to use prior knowledge to assist feature extraction. Laakso et al. [43] and Barnes et al. [44] found that brain regions strongly associated with AD include the amygdala, hippocampus, and frontal lobes, and that extracting neuroimaging features from these regions can improve diagnostic performance. Zhang et al. [22], [24] first defined AD-related anatomical landmarks, and then extracted handcrafted features from image patches centered at those landmarks for disease diagnosis. Liu et al. [25] proposed an anatomical-landmark-based deep learning model, learning local patch-level features via multiple sub-networks (each corresponding to a particular landmark) and further extracting global image-level representations of MRIs for AD diagnosis and MCI conversion prediction. Due to the unified learning of features and classifiers, this method significantly improves diagnostic performance over conventional methods using handcrafted features. Since the hippocampus is reported to be one of the AD-associated regions [45], [46], Cui and Liu [47] employed a densely connected convolutional network [48] to combine global and local features of the hippocampus for AD diagnosis. Ortiz et al. [49] partitioned the brain into multiple regions and applied an ensemble of deep belief networks for AD diagnosis, with each network trained on a specific region. By using prior anatomical knowledge to reduce redundant or noisy information in neuroimages, these methods help the models focus on discriminative regions for diagnosis. However, they often rely heavily on expert knowledge of disease-associated brain regions, which may limit their generalization performance.

III. Methodology

A. Problem Formulation

Suppose we have a multi-modal database $\{X_M^i, X_P^i\}_{i=1}^{N}$ with $N$ subjects, where the $i$-th subject has complete MRI (i.e., $X_M^i$) and PET (i.e., $X_P^i$) data. As shown in Fig. 2 (a), a general disease diagnosis model $F$ using complete multi-modal data can be formulated as

$\hat{y}^i = F(X_M^i, X_P^i),$  (1)

where $\hat{y}^i$ is the predicted label (e.g., AD) for the $i$-th subject. For problems with incomplete multi-modal data (e.g., the PET scan is missing for the $i$-th subject) in Fig. 2 (b), the diagnosis model becomes

$\hat{y}^i = F(X_M^i, \varnothing),$  (2)

which cannot be executed due to the missing PET scan.

Fig. 2.

Illustration of multi-modal neuroimage based diagnosis problem scenarios with complete and incomplete MRI and PET scans, respectively. (a) A general diagnosis model $F$ trained on the complete MRI and PET scans; (b) A diagnosis model $F$ with missing PET; and (c) A diagnosis model $F$ trained on the complete (after imputation for missing PET scans via an image synthesis model $G$) multi-modal data. We denote $X_M^i$ and $X_P^i$ as the MRI and PET scans of the $i$-th subject, respectively. Also, $\hat{X}_P^i$ denotes the PET scan synthesized by $G$, and $\hat{y}^i$ is the predicted label for the $i$-th subject.

An intuitive solution to address this issue is to directly impute the missing PET image based on its corresponding MRI (see Fig. 2 (c)), because these two modalities have underlying relevance (i.e., scanned from the same subject). Given $G$ as the mapping function from MRI to PET, we denote the virtual/synthetic PET image as $\hat{X}_P^i = G(X_M^i)$. Thus, the diagnosis model with complete (after PET synthesis) multi-modal data can be executed as

$\hat{y}^i = F(X_M^i, \hat{X}_P^i) = F(X_M^i, G(X_M^i)).$  (3)

From Fig. 2 and Eqs. (1)–(3), one can see that there are two sequential tasks in this computer-aided disease diagnosis framework based on incomplete multi-modal neuroimages: (1) learning a reliable mapping function $G$ for missing image imputation, and (2) constructing an effective classification model $F$ to extract task-oriented neuroimaging features for automated disease diagnosis. In the following, we present our proposed HGAN model for missing image imputation and our SCFR model for disease classification.
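For clarity, the following minimal Python sketch shows how the two stages fit together at inference time; the module names `G` and `F` are hypothetical stand-ins for a trained MRI-to-PET generator and a multi-modal classifier, and PyTorch is assumed only for illustration.

```python
import torch

def diagnose(x_mri, x_pet, G, F):
    """Predict a diagnostic label from MRI and (possibly missing) PET, as in Eqs. (1)-(3).

    x_mri : torch.Tensor of shape (1, 1, D, H, W); always available.
    x_pet : torch.Tensor of the same shape, or None when the PET scan is missing.
    G     : MRI-to-PET generator; F : multi-modal classifier (both hypothetical modules).
    """
    if x_pet is None:
        with torch.no_grad():
            x_pet = G(x_mri)            # synthetic PET, i.e., G(X_M) in Eq. (3)
    logits = F(x_mri, x_pet)            # joint MRI+PET diagnosis model, Eq. (1)
    return logits.argmax(dim=1)         # predicted label
```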

B. Hybrid Generative Adversarial Network

1). Network Architecture:

Denote $X_M$ and $X_P$ as the MRI domain and the PET domain, respectively. Based on the underlying relevance between MRI and PET, we consider imputing missing PET images for subjects with MRI scans by learning a mapping function $G: X_M \rightarrow X_P$. Also, we require $G$ to be a one-to-one mapping, i.e., there should exist an inverse function $G^{-1}: X_P \rightarrow X_M$ to keep the mapping consistent. As in our previous work [31], we employ a cycle-consistent loss to learn the bidirectional mapping between MRI and PET, aiming to guarantee the interactive relationship between the two modalities. However, that work [31] does not consider the spatial consistency between a pair of PET and MRI scans acquired from the same subject, while such spatial consistency can be used as prior knowledge to improve the to-be-learned image synthesis model.

To deal with this limitation, we develop a hybrid generative adversarial network (HGAN), by introducing a unique voxel-wise-consistent loss to explicitly capture the spatial consistency between paired MRI and PET scans from the same subject. The architecture of our HGAN model is illustrated in Fig. 3, containing two generators, i.e., $G_1: X_M \rightarrow X_P$ and $G_2: X_P \rightarrow X_M$ with $G_2 = G_1^{-1}$, and two adversarial discriminators, i.e., $D_1$ and $D_2$. Specifically, each generator (e.g., $G_1$) consists of three sequential parts, i.e., the encoding, transferring, and decoding parts. The encoding part consists of three convolutional (Conv) layers (with 8, 16, and 32 channels, respectively) to extract the knowledge of images in the original domain (e.g., $X_M$). The transferring part contains 6 residual network blocks (RNBs) [20] to transfer the knowledge from the original domain (e.g., $X_M$) to the target domain (e.g., $X_P$). The decoding part contains 2 deconvolutional (Deconv) layers (with 32 and 16 channels, respectively) and 1 Conv layer (with 1 channel) for constructing the images in the target domain (e.g., $X_P$). Besides, each discriminator (e.g., $D_2$) contains 5 Conv layers with 16, 32, 64, 128, and 1 channel(s), respectively. The discriminator takes as input a pair of real (e.g., $X_P^i$) and synthetic (e.g., $G_1(X_M^i)$) images, and outputs a binary indicator telling whether the real and synthetic images are distinguishable (output: 0) or not (output: 1).
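As a concrete reference, the PyTorch sketch below instantiates generator and discriminator modules with the channel counts listed above (8/16/32 encoder channels, 6 RNBs, a 32/16-channel decoder, and a 16/32/64/128/1-channel discriminator). Kernel sizes, the exact placement of the stride-2 layers, the output activations, and the single-image (rather than paired) discriminator input are assumptions, since the text does not fully specify them.

```python
import torch
import torch.nn as nn

class RNB(nn.Module):
    """Residual network block used in the transferring part (3x3x3 kernels assumed)."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(ch, ch, 3, padding=1))

    def forward(self, x):
        return torch.relu(x + self.body(x))           # identity shortcut, no downsampling

class Generator(nn.Module):
    """Encoder (8/16/32 channels) -> 6 RNBs -> decoder (32/16-channel Deconv + 1-channel Conv)."""
    def __init__(self):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv3d(1, 8, 3, stride=1, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(8, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True),   # downsample x2
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True))  # downsample x2
        self.transfer = nn.Sequential(*[RNB(32) for _ in range(6)])
        self.decode = nn.Sequential(
            nn.ConvTranspose3d(32, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(16, 1, 3, padding=1), nn.Tanh())  # 1-channel output volume

    def forward(self, x):
        return self.decode(self.transfer(self.encode(x)))

class Discriminator(nn.Module):
    """Five Conv layers with 16/32/64/128/1 channels; stride 2 is used for downsampling."""
    def __init__(self):
        super().__init__()
        layers, chans = [], [1, 16, 32, 64, 128]
        for cin, cout in zip(chans[:-1], chans[1:]):
            layers += [nn.Conv3d(cin, cout, 3, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
        layers.append(nn.Conv3d(128, 1, 3, stride=1, padding=1))  # per-region real/fake scores
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return torch.sigmoid(self.net(x))              # values in (0, 1): 1 = judged real

G1, G2, D1, D2 = Generator(), Generator(), Discriminator(), Discriminator()
```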

Fig. 3.

Illustration of the proposed hybrid generative adversarial network (HGAN) for PET image synthesis based on the corresponding MRI scan. Two generators (i.e., $G_1$ and $G_2$) are used to generate passable images (to lie without being caught by the discriminator), and two adversarial discriminators (i.e., $D_1$ and $D_2$) are utilized to identify images coming from the generators as fake. A bidirectional mapping between MRI and PET is learned, including the mapping from MRI to PET (with $G_1$ and $D_2$) and that from PET to MRI (with $G_2$ and $D_1$). A hybrid loss function is created, including a unique voxel-wise-consistent loss $\mathcal{L}_v$, a cycle-consistent loss $\mathcal{L}_c$, and an adversarial loss $\mathcal{L}_g$. Six residual network blocks (RNBs) are included in each generator, and five convolutional layers are involved in each discriminator. A stride of 2 is used for downsampling in the convolutional layers, except in the RNBs.

2). Hybrid Loss Function:

Our proposed HGAN model has three complementary losses with respect to $G_1$, $G_2$, $D_1$, and $D_2$, including (1) the adversarial loss $\mathcal{L}_g$, (2) the cycle-consistent loss $\mathcal{L}_c$ [31], [33], and (3) our proposed voxel-wise-consistent loss $\mathcal{L}_v$. Denoting $\|\cdot\|_1$ as the $\ell_1$-norm, these three losses are defined as follows:

$\mathcal{L}_g(X_M, X_P; G_1, G_2, D_1, D_2) = \log(D_2(X_P)) + \log(1 - D_2(G_1(X_M))) + \log(D_1(X_M)) + \log(1 - D_1(G_2(X_P))),$  (4)
$\mathcal{L}_c(X_M, X_P; G_1, G_2) = \|G_2(G_1(X_M)) - X_M\|_1 + \|G_1(G_2(X_P)) - X_P\|_1,$  (5)
$\mathcal{L}_v(X_M, X_P; G_1, G_2) = \|G_1(X_M) - X_P\|_1 + \|G_2(X_P) - X_M\|_1,$  (6)

where the adversarial loss $\mathcal{L}_g$ ensures the generated images to be, in principle, indistinguishable from real images. The cycle-consistent loss $\mathcal{L}_c$ guarantees the interactive relationship between MRI and PET scans, while our proposed voxel-wise-consistent loss $\mathcal{L}_v$ is employed to encourage the spatial consistency between a pair of MRI and PET scans from the same subject. Finally, the hybrid loss function $\mathcal{L}$ of our proposed HGAN model is defined as

$\mathcal{L}(G_1, G_2, D_1, D_2) = \mathcal{L}_g(X_M, X_P; G_1, G_2, D_1, D_2) + \mathcal{L}_c(X_M, X_P; G_1, G_2) + \mathcal{L}_v(X_M, X_P; G_1, G_2).$  (7)

To speed up network training, we optimize the proposed HGAN model in an iterative manner. That is, given a mini-batch, we first train the two adversarial discriminators (i.e., $D_1$ and $D_2$) by minimizing $\mathcal{L}_g$ with the generators (i.e., $G_1$ and $G_2$) fixed, and then train the two generators by minimizing $\mathcal{L}$ with the discriminators fixed. These two steps are performed iteratively.
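The sketch below illustrates this alternating scheme under the hybrid loss, assuming PyTorch; the non-saturating binary-cross-entropy form of the adversarial loss and the equal weighting of the three terms follow Eq. (7), while the optimizer objects (`opt_d`, `opt_g`) and the generator-side adversarial term are standard GAN practice rather than details stated in the text.

```python
import torch
import torch.nn.functional as F

bce = F.binary_cross_entropy

def d_loss(x_m, x_p, G1, G2, D1, D2):
    """Discriminator objective derived from the adversarial loss of Eq. (4):
    D1/D2 should score real scans as 1 and synthetic scans as 0."""
    fake_p, fake_m = G1(x_m).detach(), G2(x_p).detach()   # generators fixed in this step
    rp, rm, fp, fm = D2(x_p), D1(x_m), D2(fake_p), D1(fake_m)
    return (bce(rp, torch.ones_like(rp)) + bce(fp, torch.zeros_like(fp)) +
            bce(rm, torch.ones_like(rm)) + bce(fm, torch.zeros_like(fm)))

def g_loss(x_m, x_p, G1, G2, D1, D2):
    """Generator objective: fool the fixed discriminators (non-saturating adversarial term),
    plus the cycle-consistent loss of Eq. (5) and the voxel-wise-consistent loss of Eq. (6)."""
    fake_p, fake_m = G1(x_m), G2(x_p)
    dp, dm = D2(fake_p), D1(fake_m)
    l_adv = bce(dp, torch.ones_like(dp)) + bce(dm, torch.ones_like(dm))
    l_c = (G2(fake_p) - x_m).abs().mean() + (G1(fake_m) - x_p).abs().mean()
    l_v = (fake_p - x_p).abs().mean() + (fake_m - x_m).abs().mean()
    return l_adv + l_c + l_v                               # hybrid loss of Eq. (7)

def train_step(x_m, x_p, G1, G2, D1, D2, opt_d, opt_g):
    """One iteration of the alternating scheme: update D1/D2 first, then G1/G2."""
    opt_d.zero_grad(); d_loss(x_m, x_p, G1, G2, D1, D2).backward(); opt_d.step()
    opt_g.zero_grad(); g_loss(x_m, x_p, G1, G2, D1, D2).backward(); opt_g.step()
```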

C. Spatially-Constrained Fisher Representation Model

1). Network Architecture:

Using the complete multi-modal data (after imputation via HGAN), we further develop a spatially-constrained Fisher representation (SCFR) network to extract statistical descriptors of neuroimages for end-to-end classification, without the need for any expert knowledge of disease-associated brain regions. As illustrated in Fig. 4, the SCFR model is a deep network that simultaneously extracts statistical descriptors from MRI and PET scans and constructs the classifier. There are three major components in this network, including (1) two backbone subnetworks to capture the local textural information of the input images (with each backbone corresponding to a specific modality), (2) a spatially-constrained Fisher layer with several location-specific Fisher units to capture the global spatial structure information of the input images, and (3) a fully-connected layer for classification.

Fig. 4.

Illustration of our proposed spatially-constrained Fisher representation (SCFR) network for brain disease classification with complete paired MRI and PET data. There are three major components in SCFR, including (1) a backbone subnetwork to capture the local textural information of input images, (2) a spatially-constrained Fisher layer with several location-specific Fisher units to model the globally spatial structure information of input images, and (3) a fully-connected layer for classification. Note that the Fisher layer contains several location-specific Fisher units (see top-right panel), with each unit being used to learn the statistical information of the input feature map (generated by the backbone subnetwork) at a specific location. Here, μ and σ are the to-be-learned network parameters in each Fisher unit.

Each backbone subnetwork consists of 5 convolutional layers, with the rectified linear unit (ReLU) used as the activation function. The stride of each convolutional layer is set to 1. Each of the first four convolutional layers is followed by a max pooling layer to downsample the output with a stride of 2. These 5 convolutional layers share the same kernel size (i.e., 3 × 3 × 3), while the numbers of channels are 16, 32, 64, 64, and 64, respectively. For an input MRI/PET scan of size 144 × 176 × 144, we generate an output feature map (size: 9 × 11 × 9) with 64 channels through the backbone subnetwork. With the learned feature map (corresponding to a specific modality) as the input, our proposed spatially-constrained Fisher layer contains L = 4 × 5 × 4 = 80 location-specific Fisher units (size: 3 × 3 × 3; stride: 2; no padding). Here, each Fisher unit is used to learn the location-specific statistical information of the feature map produced by the backbone subnetwork (see the top-right of Fig. 4). Also, μ and σ are the to-be-learned network parameters in each location-specific Fisher unit. More information on the proposed Fisher unit can be found in Section III-C2. The output of each Fisher unit is a 128-dimensional feature vector, and we concatenate the features learned from all 80 Fisher units to represent the input image. Given a pair of MRI and PET scans, we finally obtain a 20,480-dimensional feature vector for each subject. We further feed this feature vector to a fully-connected layer with soft-max activation for disease classification.
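A minimal PyTorch sketch of one backbone subnetwork, reproducing the layer configuration and output size described above (padding and pooling details are assumptions where unstated):

```python
import torch
import torch.nn as nn

def conv_block(cin, cout, pool=True):
    """3x3x3 convolution (stride 1) + ReLU, optionally followed by stride-2 max pooling."""
    layers = [nn.Conv3d(cin, cout, kernel_size=3, stride=1, padding=1), nn.ReLU(inplace=True)]
    if pool:
        layers.append(nn.MaxPool3d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

backbone = nn.Sequential(                 # one such backbone per modality (MRI, PET)
    conv_block(1, 16), conv_block(16, 32),
    conv_block(32, 64), conv_block(64, 64),
    conv_block(64, 64, pool=False))       # the 5th convolutional layer has no pooling

x = torch.randn(1, 1, 144, 176, 144)      # one pre-processed MRI or PET volume
feat = backbone(x)
print(feat.shape)                         # torch.Size([1, 64, 9, 11, 9])
# Tiling this 9 x 11 x 9 map with 3 x 3 x 3 windows (stride 2, no padding)
# yields 4 x 5 x 4 = 80 location-specific Fisher units.
```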

2). Spatially-Constrained Fisher Representation:

As shown in Fig. 4, using the backbone subnetwork, we can generate a 64-channel feature map (size: 9 × 11 × 9) for each input MRI/PET scan. We now investigate how to efficiently extract statistical features from this feature map for representing the input image. In the literature, the Fisher vector (FV) [50]–[53] is a statistical model to build a local-to-global image representation and has been successfully used in visual classification. Therefore, we resort to the FV-based methods to learn statistical representations of multi-modal neuroimages for disease classification.

Given a feature map containing $K$ ($K = 9 \times 11 \times 9$) elements, we can represent it as $A = \{a_k \in \mathbb{R}^D; k = 1, \ldots, K\}$, where each element $a_k$ denotes a local descriptor corresponding to a specific location in the feature map. Conventional FV-based methods [50]–[52] usually assume that the prior distribution of all descriptors follows $u_\theta$, i.e., a Gaussian mixture model (GMM), whose parameters $\theta = \{\omega_l, \mu_l, \sigma_l; l = 1, \ldots, L\}$ are estimated by maximum likelihood estimation (MLE), with $L$ being the number of Gaussian components. Specifically, the FV of $A$ is defined as

$B_\theta = G_\theta \nabla B_\theta = (B_{1,\mu}^{T}, \ldots, B_{L,\mu}^{T}, B_{1,\sigma}^{T}, \ldots, B_{L,\sigma}^{T})^{T},$  (8)

where

$B_{l,\mu} = \frac{1}{K\sqrt{\omega_l}} \sum_{k=1}^{K} \tau_l(a_k) \left[\frac{a_k - \mu_l}{\sigma_l}\right], \qquad B_{l,\sigma} = \frac{1}{K\sqrt{2\omega_l}} \sum_{k=1}^{K} \tau_l(a_k) \left[\frac{(a_k - \mu_l)^2}{\sigma_l^2} - 1\right],$  (9)

and $\tau_l(a_k) = \frac{\omega_l u_l(a_k)}{\sum_{j=1}^{L} \omega_j u_j(a_k)}$ is the posterior probability of $a_k$ being assigned to the $l$-th Gaussian component $u_l$, and $\omega_l$ indicates the importance of the $l$-th component. $\nabla B_\theta$ is the Fisher score, given by the gradient of the log-likelihood as

$\nabla B_\theta = \nabla_\theta \log P(A \mid \theta) = \frac{1}{K} \sum_{k=1}^{K} \nabla_\theta \log u_\theta(a_k).$  (10)

Besides, the term $G_\theta$ in (8) is the Cholesky decomposition of the Fisher information matrix, defined by $G_\theta^{T} G_\theta = \mathbb{E}_{a \sim u_\theta}[\nabla_\theta \log u_\theta(a) \, \nabla_\theta \log u_\theta(a)^{T}]$. From Eqs. (8) and (9), we can see that the FV is the concatenation of normalized gradient components with respect to each mean $\mu_l$ and standard deviation $\sigma_l$. Hence, the FV is a statistical representation in which each component can be associated with elements from different locations in the input feature map. That is, in the conventional FV method there is no correspondence between an FV component and a spatial location in the input image.

Different from natural images, neuroimages have strong spatial consistency of structures across subjects, due to the globally-similar and locally-different characteristics of human brains. Based on this, previous studies usually segment the brain into several ROIs for neuroimage analysis, where each specific ROI statistically has a similar location, shape, and volume in different brains [54]. Intuitively, such spatial structure consistency among different brains can be utilized as prior knowledge to help learn discriminative representations of neuroimages for brain disease diagnosis. Accordingly, we develop a spatially-constrained Fisher representation method by explicitly incorporating a spatial constraint (i.e., each Gaussian component corresponds to a fixed location in the brain) into the FV method.

For a feature map $A = \{a_k \in \mathbb{R}^D; k = 1, \ldots, K\}$ with $K$ elements, we use each element $a_k$ as a location-specific descriptor of $A$. Assume that the local features of all subjects at the $l$-th location follow a Gaussian distribution $u_l = (\mu_l, \sigma_l)$, where $\mu_l$ and $\sigma_l$ are the to-be-learned network parameters (see Fig. 4). Then, the $l$-th location can be represented by a spatially-constrained FV component as

$B_{l,\mu} = \frac{1}{K_l} \sum_{k=1}^{K_l} \tau_l(a_k) \left[\frac{a_k - \mu_l}{\sigma_l}\right], \qquad B_{l,\sigma} = \frac{1}{\sqrt{2}\,K_l} \sum_{k=1}^{K_l} \tau_l(a_k) \left[\frac{(a_k - \mu_l)^2}{\sigma_l^2} - 1\right],$  (11)

where

$\tau_l(a_k) = \begin{cases} 1, & a_k \in \eta_l; \\ 0, & \text{otherwise}, \end{cases}$  (12)

and $\eta_l$ is the set containing all neighboring elements of the $l$-th location in the feature map $A$, and $K_l = |\eta_l|$ is the number of elements in $\eta_l$. For example, as shown in Fig. 4, there are $K_l = 27$ elements in $\eta_l$, since the size of our proposed Fisher unit is 3 × 3 × 3. Note that the term $\tau_l(a_k)$ is a spatial constraint used to encode the spatial structure consistency among different subjects (e.g., the $k$-th locations in different brains jointly contribute to the computation of the $l$-th FV component). Also note that (11) can be considered a special case of (9) with $\omega_l = 1/L$, where all $L$ components have equal importance; thus, we do not explicitly include $\omega_l$ in (11).

Given L components, we can finally represent the input feature map A via our proposed spatially-constrained Fisher representation (SCFR) method, by concatenating all these location-specific FV components as follows:

$B = (B_1^{T}, \ldots, B_L^{T})^{T}.$  (13)

From (13), one can see that SCFR is an aggregation transformation from local features to a global image-level representation. Based on this representation, for any two subjects represented by $B^{(i)}$ and $B^{(j)}$, their similarity $K(B^{(i)}, B^{(j)})$ is a Mercer kernel [55]:

$K(B^{(i)}, B^{(j)}) = B^{(i)T} B^{(j)} = \sum_{l=1}^{L} \left[ B_l^{(i)T} B_l^{(j)} \right],$  (14)

which is the sum of component-wise similarities, where each component corresponds to a specific location in the brain. The proposed SCFR is itself a non-linear transformation that maps an image to a high-dimensional representation, which can be followed by a linear or non-linear kernel for classification. For simplicity, we feed the learned SCFR representation of each subject to a fully-connected layer with a soft-max activation for classification, as shown in Fig. 4.
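To make Eqs. (11)–(13) concrete, the following sketch computes the spatially-constrained Fisher representation from a backbone feature map, assuming PyTorch; in a full module, `mu` and `sigma` would be `nn.Parameter`s (one pair per Fisher unit), and the outputs of the two modalities would be concatenated as in Eq. (15).

```python
import torch

def scfr_layer(feat, mu, sigma, eps=1e-6):
    """Spatially-constrained Fisher layer (Eqs. (11)-(13)) on a backbone feature map.

    feat  : (B, 64, 9, 11, 9) feature map from one backbone subnetwork
    mu    : (L, 64) learnable means, one row per location-specific Fisher unit
    sigma : (L, 64) learnable standard deviations
    Returns a (B, L * 128) representation (128 = 64 mean terms + 64 deviation terms).
    """
    B, D = feat.shape[0], feat.shape[1]
    # Unfold the map into L = 4 * 5 * 4 = 80 windows of size 3 x 3 x 3 (stride 2, no padding).
    windows = (feat.unfold(2, 3, 2).unfold(3, 3, 2).unfold(4, 3, 2)  # (B, D, 4, 5, 4, 3, 3, 3)
                   .reshape(B, D, -1, 27)                            # (B, D, L, K_l)
                   .permute(0, 2, 3, 1))                             # (B, L, K_l, D)
    L = windows.shape[1]
    mu = mu.view(1, L, 1, D)
    sigma = sigma.view(1, L, 1, D).abs() + eps
    # Eq. (11): within each window, tau_l = 1, so the sums reduce to means over K_l elements.
    b_mu = ((windows - mu) / sigma).mean(dim=2)                                  # (B, L, D)
    b_sigma = ((windows - mu) ** 2 / sigma ** 2 - 1).mean(dim=2) / (2 ** 0.5)    # (B, L, D)
    # Eq. (13): concatenate all location-specific components into one vector per subject.
    return torch.cat([b_mu, b_sigma], dim=2).reshape(B, -1)

feat = torch.randn(2, 64, 9, 11, 9)                    # toy backbone output for 2 subjects
mu, sigma = torch.zeros(80, 64), torch.ones(80, 64)    # nn.Parameters in a full module
print(scfr_layer(feat, mu, sigma).shape)               # torch.Size([2, 10240]) per modality
```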

Our proposed SCFR method in (13) is defined on single-modal data (e.g., MRI). Given a pair of MRI and PET scans from the same subject as the input, we further extend SCFR to a multi-modal variant as

$B = (B_1^{T}, \ldots, B_L^{T}, B_{L+1}^{T}, \ldots, B_{2L}^{T})^{T},$  (15)

where the first L components are from the first modality (e.g., MRI) and the remaining L components are from the second modality (e.g., PET).

As reported in previous studies [13], [43], [44], [56], the AD-associated brain regions are mainly located in the hippocampus, amygdala, etc. This implies that different brain regions could have different contributions to disease classification. Hence, we can further improve the SCFR method by explicitly considering the different contributions of different regions as follows:

$B = (w_1 B_1^{T}, \ldots, w_L B_L^{T})^{T},$  (16)

where $w_l$ ($l = 1, \ldots, L$) is the to-be-learned weight for the $l$-th component, with each component corresponding to a specific brain region. Besides, one can perform network pruning by empirically setting $w_l = 0$; in this way, the $l$-th brain region will not contribute to the final classification. In our experiments, we initialize $w_l = 1$, and its optimal value is automatically tuned during the training of SCFR.

Given the feature map (size: 9 × 11 × 9) learned by the backbone, the Fisher layer contains 80 Fisher units, with each unit corresponding to a specific location in the brain. Accordingly, we can further prune our SCFR network to discard uninformative or less informative regions. After training the initial SCFR model, we represent each of the 80 location proposals by the 128-dimensional feature from its Fisher unit, and infer the discriminative capability of each location proposal (with each location corresponding to a specific Fisher unit) based on its resulting classification score on the training set. We then refine the initial SCFR model by removing uninformative Fisher units, keeping only the top 10 informative ones. Such a network pruning strategy is expected to boost classification performance by excluding uninformative brain regions.
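The exact scoring protocol for ranking Fisher units is not detailed here; the sketch below illustrates one plausible realization, in which each unit's 128-dimensional output is scored by a cross-validated linear classifier on the training set and only the top-ranked units are kept (scikit-learn is assumed, and `LogisticRegression` is a hypothetical stand-in for whatever per-unit scoring classifier is actually used).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def rank_fisher_units(features, labels, n_units=80, unit_dim=128, top_k=10):
    """Score each location-specific Fisher unit by how well its own 128-D output
    separates the training classes, and return the indices of the top_k units to keep.

    features : (n_subjects, n_units * unit_dim) SCFR representations on the training set
    labels   : (n_subjects,) diagnostic labels (e.g., AD = 1, CN = 0)
    """
    scores = []
    for l in range(n_units):
        unit_feat = features[:, l * unit_dim:(l + 1) * unit_dim]
        # Cross-validated accuracy of a simple linear classifier serves as the unit's score.
        scores.append(cross_val_score(LogisticRegression(max_iter=1000),
                                      unit_feat, labels, cv=5).mean())
    return np.argsort(scores)[::-1][:top_k]    # indices of the retained Fisher units
```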

3). Network Extension:

As illustrated in Fig. 4, our SCFR is an end-to-end classification framework using multi-modal data (such as MRI and PET), with 3D backbone subnetworks being employed to extract features from input 3D volumes. Besides using 3D backbone subnetworks, we can also extend this framework by using 2D backbone subnetworks, with 2D slices extracted from 3D neuroimages as the input data. Details of two backbone variants of our proposed SCFR model are shown in Fig. S1 in the Supplementary Materials. We denote SCFR with a 2D backbone as SCFR-2D, while SCFR with a pre-trained VGG-M backbone as SCFR-VGG.

IV. Experiments

A. Materials and Image Pre-processing

We evaluate the proposed methods on three public databases, including (1) the Alzheimer’s Disease Neuroimaging Initiative database (ADNI-1) [12], (2) the ADNI-2 database, and (3) the Australian Imaging, Biomarkers and Lifestyle (AIBL) database [57]. These three databases contain baseline brain images from (1) AD patients, (2) cognitively normal (CN) subjects, and (3) MCI individuals. The MCI subjects can be further divided into progressive MCI (pMCI), who progressed to AD within 18 months after baseline, and static MCI (sMCI), who did not progress to AD within 18 months after baseline. A total of 2,355 subjects are used in this work, which is more than in most existing studies on neuroimaging-based Alzheimer’s disease diagnosis [13], [24], [25], [47]. The demographic and clinical information of the studied subjects is reported in Table SI of the Supplementary Materials. All MRI and PET scans are pre-processed via a standard pipeline, with details given in the Supplementary Materials.

B. Evaluation of Image Imputation Model

1). Experimental Setup:

We now evaluate the quality of synthetic images generated by our proposed HGAN method. In this group of experiments, we compare HGAN with five generative models, including (1) the baseline GAN method (GAN) with only the adversarial loss, (2) the cycle-consistent GAN (CGAN) with both the adversarial loss and the cycle-consistent loss, (3) a variant of our HGAN model (called VGAN) with both the adversarial loss and our proposed voxel-wise-consistent loss, (4) the 3D UNet, which has the same architecture as [58] but is trained with the voxel-wise-consistent loss (see (6)) to handle continuous output values, and (5) the Pixel-2-Pixel GAN (P2PGAN), which uses the original architecture [34] but with 3D convolutional/deconvolutional layers. Note that the first three competing methods (i.e., GAN, CGAN, and VGAN) have the same network architecture as HGAN but different loss functions, while UNet and P2PGAN differ in both network architecture and loss function. We use the Adam solver [59] for network optimization (batch size: 1; learning rate: 2 × 10⁻³; epochs: 100) for GAN, CGAN, VGAN, and HGAN. For all comparison methods, rather than using small image patches as input, we use the entire image (size: 144 × 176 × 144) as the input to avoid the fringe effect caused by padding.

In this group of experiments, we use all subjects with complete MRI and PET data in ADNI-1 to train these networks, and then apply the resulting image imputation models to subjects in ADNI-2 to synthesize MRI and PET images. Note that all subjects in ADNI-1 have real/ground-truth MRI scans, while only about half of them have PET scans. More details can be found in Table SI in the Supplementary Materials. Three evaluation metrics are used to measure the quality of synthetic images generated by different models, including (1) the mean absolute error (MAE) [60], (2) the peak signal-to-noise ratio (PSNR), and (3) the structural similarity index measure (SSIM) [61].
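For reference, the three image-quality metrics can be computed as in the following sketch (scikit-image is assumed; how intensities are normalized and whether MAE/SSIM are reported as percentages, as in Table I, are assumptions):

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def image_quality(real, synthetic):
    """MAE, PSNR, and SSIM between a real scan and its synthetic counterpart.

    real, synthetic : 3D numpy arrays of identical shape (e.g., 144 x 176 x 144).
    """
    data_range = float(real.max() - real.min())
    mae = np.abs(real - synthetic).mean()
    psnr = peak_signal_noise_ratio(real, synthetic, data_range=data_range)
    ssim = structural_similarity(real, synthetic, data_range=data_range)
    return mae, psnr, ssim
```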

2). Results of Neuroimage Synthesis:

The average results in terms of PSNR, MAE, and SSIM for all complete subjects (i.e., those having both real/ground-truth MRI and PET scans) in ADNI-2 are reported in Table I. From Table I, one can make the following observations. On the one hand, the results yielded by VGAN on both synthetic MRI and PET data are consistently better than those achieved by GAN and CGAN. As two improved variants of GAN, VGAN outperforms CGAN on all three evaluation metrics. The possible reason is that the cycle-consistent loss cannot guarantee the spatial consistency between a synthetic image and its corresponding real image, thus leading to relatively poor image quality. In contrast, the voxel-wise-consistent loss directly encourages the spatial consistency between a synthetic image and its corresponding real image. These results demonstrate that our proposed voxel-wise-consistent loss is useful in improving the quality of synthetic images. On the other hand, our proposed HGAN method (with the voxel-wise-consistent, cycle-consistent, and adversarial losses) usually outperforms the competing methods. For example, for synthetic PET images, the PSNR achieved by HGAN is 30.24, which is higher than that of the other methods (27.13 for GAN, 27.16 for CGAN, and 28.98 for VGAN). These results suggest that the synthetic scans produced by our HGAN model have acceptable image quality regarding all three evaluation metrics.

TABLE I.

Image synthesis results (mean ± standard deviation) achieved by six different image synthesis methods for both MRI and PET scans of subjects in ADNI-2, with models trained on ADNI-1.

Method — Synthetic MRI: PSNR / SSIM (%) / MAE (%) — Synthetic PET: PSNR / SSIM (%) / MAE (%)
UNet 26.03 ± 0.79 67.89 ± 3.02 10.67 ± 1.07 29.74 ± 1.49 68.41 ± 5.55 8.01 ± 1.56
P2PGAN 24.51 ± 0.70 62.81 ± 3.56 13.04 ± 1.18 29.46 ± 1.33 67.12 ± 5.06 8.35 ± 1.42
GAN 23.54 ± 1.02 54.35 ± 4.32 17.02 ± 2.67 27.13 ± 1.50 55.27 ± 4.62 11.62 ± 2.83
CGAN 23.19 ± 0.95 55.62 ± 3.71 15.34 ± 2.46 27.16 ± 1.72 58.15 ± 5.25 10.70 ± 3.18
VGAN 24.96 ± 1.06 64.35 ± 4.73 12.24 ± 1.98 28.98 ± 2.19 65.14 ± 7.43 8.78 ± 3.52
HGAN (Ours) 26.07 ± 1.02 66.83 ± 4.51 10.70 ± 2.13 30.24 ± 2.06 69.45 ± 7.17 7.57 ± 3.46

Furthermore, we fed the synthetic PET images to the single-modal SCFR model (using PET data only) for the classification of AD vs. CN, and then measured the classification performance by the area under the receiver operating characteristic (AUC), which is 94.63% for real PET images. We achieved an AUC value of 90.76% when the PET images are synthesized by our HGAN and achieved AUC values of 88.38%, 87.86%, 65.81%, 72.64%, and 88.06% when the PET images are synthesized by U-Net, P2PGAN, GAN, CGAN, and VGAN, respectively. These results show that the use of different neuroimage synthesis models affects the performance of disease diagnosis and, compared to five competing models, our HGAN can generate more reasonable PET scans for more accurate AD diagnosis.

Besides, in Fig. 5 we visually show the synthetic MRI and PET scans generated by the six methods, along with their ground-truth images, for two typical subjects from ADNI-2. Note that the two subjects shown in Fig. 5 have both real MRI and PET scans, while not all subjects in the three datasets (i.e., ADNI-1, ADNI-2, and AIBL) have real PET images. This figure suggests that the PET and MR images synthesized by our HGAN look very similar to their corresponding real images, and that the images generated by GAN and CGAN look worse than those yielded by HGAN and VGAN. These results demonstrate that our HGAN method can synthesize missing PET scans of acceptable quality. More visual results on ADNI-2 and AIBL can be found in Fig. S2 and Fig. S3 of the Supplementary Materials, respectively.

Fig. 5.

Illustration of synthetic PET and MRI scans generated by six methods for two typical subjects in ADNI-2 as well as their corresponding ground-truth images. Each row denotes a specific subject. The first six columns show the synthetic images generated by six different methods, while the last column indicates the ground truth. For the enlarged synthetic PET images, the pink and green regions mean higher and lower intensities than the ground truth.

C. Evaluation of Disease Diagnosis Model

1). Experimental Setup:

After imputing the missing PET images using our HGAN model, we evaluate the proposed SCFR method on both tasks of AD classification (AD vs. CN) and MCI conversion prediction (pMCI vs. sMCI) using subjects with complete MRI and PET (both real and synthetic) scans. Six metrics are used for performance evaluation, including accuracy (ACC), sensitivity (SEN), specificity (SPE), F1-Score (F1S), the area under the receiver operating characteristic (AUC) and Matthews correlation coefficient (MCC) [62]. We use ADNI-1 as the training set, while ADNI-2 and AIBL are treated as two independent test sets in the experiments.
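For reference, the six classification metrics can be computed from the predictions as in the following sketch (scikit-learn is assumed; the positive-class convention is an assumption):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             matthews_corrcoef, recall_score, roc_auc_score)

def classification_metrics(y_true, y_pred, y_prob):
    """The six evaluation metrics used in the diagnosis experiments.

    y_true : binary ground-truth labels (1 = AD/pMCI, 0 = CN/sMCI, by assumption)
    y_pred : predicted labels; y_prob : predicted probability of the positive class
    """
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "ACC": accuracy_score(y_true, y_pred),
        "SEN": recall_score(y_true, y_pred),    # sensitivity = TP / (TP + FN)
        "SPE": tn / (tn + fp),                  # specificity = TN / (TN + FP)
        "F1S": f1_score(y_true, y_pred),
        "AUC": roc_auc_score(y_true, y_prob),
        "MCC": matthews_corrcoef(y_true, y_pred),
    }
```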

We compare SCFR with five existing approaches, including (1) gray matter (GM) volumes within 116 regions-of-interest (denoted as ROI) [13], [63], (2) patch-based morphometry (denoted as PBM) [41], (3) landmark-based local energy patterns (LLEP) [7], (4) landmark-based deep single-instance learning (LDSIL) [23], [25], and (5) landmark-based deep multi-instance learning (LDMIL) [23]. We further compare SCFR with its two variants with different backbones, including the SCFR-2D method with a 2D backbone (see Fig. S1 (a)) and the SCFR-VGG method with VGG-M [64] as the backbone (see Fig. S1 (b)). The details of these seven competing methods can be found in the Supplementary Materials. Note that four methods (i.e., LDSIL, LDMIL, SCFR-2D, and SCFR) perform classification in an end-to-end manner, with feature learning and classifier construction integrated into a unified framework. The remaining four methods (i.e., ROI, PBM, LLEP, and SCFR-VGG) rely on a linear support vector machine (SVM) for classification. To validate the usefulness of the synthetic PET images generated by our HGAN model, we use both single-modal (i.e., MRI) data and multi-modal (i.e., MRI and PET) data for disease classification in each method. To avoid overfitting, we stop the network training of SCFR and its variants early, at 100 epochs.

2). Classification Results:

Using models trained on ADNI-1, the classification results achieved by eight different methods using single-modal and multi-modal data on ADNI-2 are reported in Table II, while those on AIBL are shown in Table SII in the Supplementary Materials. From Table II, we can make the following observations. First, methods using multi-modal data (i.e., MRI and PET) usually outperform their counterparts with single-modal data (i.e., MRI). For instance, our SCFR method using MRI and PET data achieves an ACC value of 93.58% in AD vs. CN classification, which is higher than SCFR with only MRI data (i.e., 91.44%). These results suggest that the complementary information provided by MRI and PET is essential to improving disease classification performance, and also validate the usefulness of our synthetic PET scans generated via HGAN. Second, deep learning methods (i.e., LDSIL, LDMIL, SCFR-VGG, SCFR-2D, and SCFR) usually outperform the conventional methods with handcrafted features (i.e., ROI, PBM, and LLEP). The possible reason is that the features learned from deep networks are task-oriented, while the handcrafted features are defined independently of classifier construction. Third, among the deep learning methods, our SCFR-2D and SCFR methods (with spatially-constrained Fisher representation) usually outperform LDSIL and LDMIL, suggesting the rationality of our proposed statistical-information-based image representation strategy. Besides, the competing methods (i.e., LLEP, LDSIL, and LDMIL) using pre-defined anatomical landmarks generally perform worse than our SCFR method, which automatically discovers disease-associated brain regions. This may be due in part to the fact that the landmark definition in these competing methods is independent of classification model construction, so features learned from patches (located by landmarks) are not well coordinated with the subsequent classifiers. Finally, SCFR with a 3D backbone achieves overall better performance than SCFR-VGG (with the pre-trained VGG-M backbone) and SCFR-2D (with the 2D backbone). This could be due to the fact that the 3D backbone used in SCFR can take advantage of the 3D structural information of brain images, thus yielding more discriminative features and improved classification performance.

TABLE II.

Performance (%) of eight different methods using single-modal data (i.e., MRI) and multi-modal data (i.e., MRI+PET) on ADNI-2 in both tasks of AD classification (AD vs. CN classification) and MCI conversion prediction (pMCI vs. sMCI classification).

Modality — Method — AD vs. CN classification: ACC / AUC / SEN / SPE / F1S / MCC — pMCI vs. sMCI classification: ACC / AUC / SEN / SPE / F1S / MCC
MRI ROI 79.41 86.22 83.64 76.08 78.19 59.30 68.25 68.62 60.47 69.30 31.33 20.37
PBM 82.22 88.11 77.36 86.07 79.35 63.83 71.59 72.64 65.12 72.47 35.44 26.15
LLEP 84.76 90.47 80.61 88.04 82.35 69.00 72.98 72.39 65.12 74.05 36.60 27.59
LDSIL 89.30 94.80 86.06 91.87 87.65 78.27 72.42 80.21 74.42 72.15 39.26 32.06
LDMIL 90.37 95.77 88.48 91.87 89.02 80.46 74.09 80.98 74.42 74.05 40.76 33.81
SCFR-VGG 88.50 94.97 87.27 89.47 87.01 76.70 74.37 80.50 72.09 74.68 40.26 32.86
SCFR-2D 88.50 95.22 90.30 87.08 87.39 76.98 76.32 82.79 74.42 76.58 42.95 36.30
SCFR (Ours) 91.44 96.26 89.70 92.82 90.24 82.63 76.32 81.50 79.07 75.95 44.44 38.75
MRI+PET ROI 82.89 90.96 78.18 86.60 80.12 65.18 70.75 71.50 62.79 71.84 33.96 24.04
PBM 82.22 88.11 77.36 86.07 79.35 63.83 74.09 73.82 67.44 75.00 38.41 30.05
LLEP 87.43 91.89 84.24 89.95 85.54 74.46 71.03 75.00 74.42 70.57 38.10 30.66
LDSIL 90.91 96.14 87.88 93.30 89.51 81.54 73.54 80.94 74.42 73.42 40.25 33.21
LDMIL 91.18 96.08 89.70 92.34 89.97 82.10 75.77 81.45 72.09 76.27 41.61 34.42
SCFR-VGG 91.44 95.50 88.48 93.78 90.12 82.62 75.49 81.97 72.09 75.95 41.33 34.10
SCFR-2D 92.78 96.17 89.70 95.22 91.64 85.36 77.77 83.32 76.74 77.85 45.20 39.18
SCFR (Ours) 93.58 96.95 91.52 95.22 92.64 86.97 77.44 82.51 79.07 77.22 45.64 40.06

D. Influence of Network Pruning in SCFR

In Section IV-C, we employ the pruned SCFR network for classification to reduce computational complexity, keeping only 10 Fisher units. We now investigate the influence of the network pruning strategy in SCFR by varying the number of retained Fisher units from 5 to 80. We report the classification results achieved by SCFR with different numbers of Fisher units in both the AD diagnosis and MCI conversion prediction tasks in Fig. 6. It can be seen from Fig. 6 that our SCFR method achieves good results when the number of Fisher units is within [10, 20], and the performance is not largely improved by using more than 20 Fisher units. This implies that less informative brain regions could negatively affect diagnostic performance, further validating the rationality of our proposed network pruning strategy in SCFR to discard uninformative regions. In the Supplementary Materials, we further evaluate the predictive capability of SCFR in MCI conversion prediction within 18 and 36 months after baseline, illustrate the importance maps for disease diagnosis, investigate the influence of different losses and different numbers of RNBs in HGAN, and study the reliability of synthetic PET scans.

Fig. 6.

Influence of the number of Fisher units on SCFR in (a) AD vs. CN classification, and (b) pMCI vs. sMCI classification.

E. Limitations and Future Work

There are several technical issues to be considered in the future. First, the training of our proposed HGAN model for image imputation is independent of the subsequent classification task, which limits the discriminative capacity of the generated images for disease identification. It is desirable to integrate missing image synthesis and classification model training into a unified framework. Second, we employ Fisher units with a fixed size (i.e., 3 × 3 × 3) in the proposed SCFR model, and hence the learned statistical features are limited to describing brain regions of a fixed size (i.e., 32 × 32 × 32). It would be reasonable to use Fisher units with flexible sizes in SCFR to capture multi-scale structural information of neuroimages, which will be our future work. Besides, we directly apply models trained on ADNI-1 to ADNI-2 and AIBL, without considering that these databases may have different data distributions. Data harmonization/adaptation techniques [65], [66] will be used to alleviate the negative influence of different data distributions in the future.

V. Conclusion

In this paper, we have proposed a spatially-constrained Fisher representation framework for brain disease diagnosis, using incomplete multi-modal neuroimaging data (i.e., MRI and PET). Specifically, in the first stage, we develop a hybrid generative adversarial network (HGAN) with a hybrid loss function to impute those missing PET images based on their corresponding MRI scans. In the second stage, with the complete (after imputation) MRI and PET for each subject, we develop a spatially-constrained Fisher representation (SCFR) network to extract statistical descriptors of multi-modal neuroimaging data for brain disease diagnosis. Experimental results on three datasets have demonstrated the efficacy of our method in neuroimage synthesis and brain disease diagnosis.

Supplementary Material

supp1-2983085

Acknowledgment

We thank the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and the Australian Imaging, Biomarker & Lifestyle Flagship Study of Ageing (AIBL) for data collection and sharing.

Y. Pan and Y. Xia were partially supported by the National Natural Science Foundation of China under Grant 61771397, the Science and Technology Innovation Committee of Shenzhen Municipality, China, under Grants JCYJ20180306171334997, and the Innovation Foundation for Doctor Dissertation of Northwestern Polytechnical University under Grant CX201835. M. Liu, C. Lian, and D. Shen were partially supported by NIH grants (Nos. EB008374, AG041721).

References

  • [1].Buyken A, Mitchell P, Ceriello A, and Brand-Miller J, “Optimal dietary approaches for prevention of type 2 diabetes: A life-course perspective,” Diabetologia, vol. 53, no. 3, pp. 406–418, 2010. [DOI] [PubMed] [Google Scholar]
  • [2].James BD, Leurgans SE, Hebert LE et al. , “Contribution of Alzheimer disease to mortality in the United States,” Neurology, vol. 82, no. 12, pp. 1045–1050, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [3].Davatzikos C, Genc A, Xu D, and Resnick SM, “Voxel-based morphometry using the RAVENS maps: Methods and validation using simulated longitudinal atrophy,” NeuroImage, vol. 14, no. 6, pp. 1361–1369, 2001. [DOI] [PubMed] [Google Scholar]
• [4].Nordberg A, Rinne JO, Kadir A, and Långström B, “The use of PET in Alzheimer disease,” Nature Reviews Neurology, vol. 6, no. 2, p. 78, 2010. [DOI] [PubMed] [Google Scholar]
  • [5].Calhoun VD and Sui J, “Multimodal fusion of brain imaging data: A key to finding the missing link(s) in complex mental illness,” Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, vol. 1, no. 3, pp. 230–244, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [6].Fan Y, Gur RE, Gur RC, Wu X, Shen D, Calkins ME, and Davatzikos C, “Unaffected family members and schizophrenia patients share brain structure patterns: a high-dimensional pattern classification study,” Biological psychiatry, vol. 63, no. 1, pp. 118–124, 2008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [7].Wee C-Y, Yap P-T, Zhang D, Wang L, and Shen D, “Group-constrained sparse fmri connectivity modeling for mild cognitive impairment identification,” Brain Structure and Function, vol. 219, no. 2, pp. 641–656, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [8].Sterne JA, White IR, Carlin JB, Spratt M, Royston P, Kenward MG, Wood AM, and Carpenter JR, “Multiple imputation for missing data in epidemiological and clinical research: Potential and pitfalls,” British Medical Journal, vol. 338, p. b2393, 2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [9].Marlin B, “Missing data problems in machine learning,” Ph.D. dissertation, 2008.
  • [10].Fan Y, Rao H, Hurt H, Giannetta J, Korczykowski M, Shera D, Avants BB, Gee JC, Wang J, and Shen D, “Multivariate examination of brain abnormality using both structural and functional mri,” NeuroImage, vol. 36, no. 4, pp. 1189–1199, 2007. [DOI] [PubMed] [Google Scholar]
  • [11].Jie B, Zhang D, Gao W, Wang Q, Wee C-Y, and Shen D, “Integration of network topological and connectivity properties for neuroimaging classification,” IEEE Transactions on Biomedical Engineering, vol. 61, no. 2, pp. 576–589, 2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [12].Jack CR Jr, Bernstein MA, Fox NC, Thompson P, Alexander G, Harvey D, Borowski B, Britson PJ, Whitwell JL, Ward C et al. , “The Alzheimer’s disease neuroimaging initiative (ADNI): MRI methods,” Journal of Magnetic Resonance Imaging, vol. 27, no. 4, pp. 685–691, 2008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [13] Zhang D, Wang Y, Zhou L et al., “Multimodal classification of Alzheimer’s disease and mild cognitive impairment,” NeuroImage, vol. 55, no. 3, pp. 856–867, 2011.
  • [14] Zhang D and Shen D, “Multi-modal multi-task learning for joint prediction of multiple regression and classification variables in Alzheimer’s disease,” NeuroImage, vol. 59, no. 2, pp. 895–907, 2012.
  • [15] Donders ART, Van Der Heijden GJ, Stijnen T, and Moons KG, “A gentle introduction to imputation of missing values,” Journal of Clinical Epidemiology, vol. 59, no. 10, pp. 1087–1091, 2006.
  • [16] Parker R, Missing data problems in machine learning. VDM Verlag, 2010.
  • [17] Efron B, “Missing data, imputation, and the bootstrap,” Journal of the American Statistical Association, vol. 89, no. 426, pp. 463–475, 1994.
  • [18] Krizhevsky A, Sutskever I, and Hinton GE, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
  • [19] Simonyan K and Zisserman A, “Very deep convolutional networks for large-scale image recognition,” in Int. Conf. on Learning Representations, 2015.
  • [20] He K, Zhang X, Ren S, and Sun J, “Deep residual learning for image recognition,” in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
  • [21] Szegedy C, Ioffe S, Vanhoucke V, and Alemi AA, “Inception-v4, Inception-ResNet and the impact of residual connections on learning,” in AAAI Conf. on Artificial Intelligence, 2017.
  • [22] Zhang J, Liu M, An L et al., “Alzheimer’s disease diagnosis using landmark-based features from longitudinal structural MR images,” IEEE Journal of Biomedical and Health Informatics, vol. 21, no. 6, pp. 1607–1616, 2017.
  • [23] Liu M, Zhang J, Nie D et al., “Anatomical landmark based deep feature representation for MR images in brain disease diagnosis,” IEEE Journal of Biomedical and Health Informatics, vol. 22, no. 5, pp. 1476–1485, 2018.
  • [24] Zhang J, Gao Y, Gao Y, Munsell BC, and Shen D, “Detecting anatomical landmarks for fast Alzheimer’s disease diagnosis,” IEEE Transactions on Medical Imaging, vol. 35, no. 12, pp. 2524–2533, 2016.
  • [25] Liu M, Zhang J, Adeli E, and Shen D, “Landmark-based deep multi-instance learning for brain disease diagnosis,” Medical Image Analysis, vol. 43, pp. 157–168, 2018.
  • [26] Kohannim O, Hua X, Hibar DP, Lee S, Chou Y-Y, Toga AW, Jack CR Jr, Weiner MW, and Thompson PM, “Boosting power for clinical trials using classifiers based on multiple biomarkers,” Neurobiology of Aging, vol. 31, no. 8, pp. 1429–1442, 2010.
  • [27] Suk H-I, Lee S-W, and Shen D, “Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis,” NeuroImage, vol. 101, pp. 569–582, 2014.
  • [28] Suk H-I and Shen D, “Deep learning-based feature representation for AD/MCI classification,” in Int. Conf. on Medical Image Computing and Computer-Assisted Intervention. Springer, 2013, pp. 583–590.
  • [29] Van Buuren S, Flexible imputation of missing data. Chapman and Hall/CRC, 2018.
  • [30] Cismondi F, Fialho AS, Vieira SM, Reti SR, Sousa JM, and Finkelstein SN, “Missing data in medical databases: Impute, delete or classify?” Artificial Intelligence in Medicine, vol. 58, no. 1, pp. 63–72, 2013.
  • [31] Pan Y, Liu M, Lian C, Zhou T, Xia Y, and Shen D, “Synthesizing missing PET from MRI with cycle-consistent generative adversarial networks for Alzheimer’s disease diagnosis,” in Proc. of Int. Conf. on Medical Image Computing and Computer-Assisted Intervention. Springer, 2018, pp. 455–463.
  • [32] Goodfellow I, Pouget-Abadie J, Mirza M et al., “Generative adversarial nets,” in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
  • [33] Zhu J-Y, Park T, Isola P, and Efros AA, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proc. of IEEE Int. Conf. on Computer Vision, 2017, pp. 2223–2232.
  • [34] Isola P, Zhu J-Y, Zhou T, and Efros AA, “Image-to-image translation with conditional adversarial networks,” in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, 2017, pp. 1125–1134.
  • [35] Li R, Zhang W, Suk H-I, Wang L, Li J, Shen D, and Ji S, “Deep learning based imaging data completion for improved brain disease diagnosis,” in Int. Conf. on Medical Image Computing and Computer-Assisted Intervention. Springer, 2014, pp. 305–312.
  • [36] Baron J, Chetelat G, Desgranges B, Perchey G, Landeau B, De La Sayette V, and Eustache F, “In vivo mapping of gray matter loss with voxel-based morphometry in mild Alzheimer’s disease,” NeuroImage, vol. 14, no. 2, pp. 298–309, 2001.
  • [37] Ishii K, Kawachi T, Sasaki H, Kono AK, Fukuda T, Kojima Y, and Mori E, “Voxel-based morphometric comparison between early- and late-onset mild Alzheimer’s disease and assessment of diagnostic performance of z score images,” American Journal of Neuroradiology, vol. 26, no. 2, pp. 333–340, 2005.
  • [38] Cuingnet R, Gerardin E, Tessieras J, Auzias G, Lehéricy S, Habert M-O, Chupin M, Benali H, and Colliot O, “Automatic classification of patients with Alzheimer’s disease from structural MRI: A comparison of ten methods using the ADNI database,” NeuroImage, vol. 56, no. 2, pp. 766–781, 2011.
  • [39] Walhovd K, Fjell A, Brewer J, McEvoy L, Fennema-Notestine C, Hagler D, Jennings R, Karow D, and Dale A, “Combining MR imaging, positron-emission tomography, and CSF biomarkers in the diagnosis and prognosis of Alzheimer disease,” American Journal of Neuroradiology, vol. 31, no. 2, pp. 347–354, 2010.
  • [40] Liu M, Zhang D, and Shen D, “Ensemble sparse classification of Alzheimer’s disease,” NeuroImage, vol. 60, no. 2, pp. 1106–1116, 2012.
  • [41] Liu M, Zhang D, Shen D, and Alzheimer’s Disease Neuroimaging Initiative, “Hierarchical fusion of features and classifier decisions for Alzheimer’s disease diagnosis,” Human Brain Mapping, vol. 35, no. 4, pp. 1305–1319, 2014.
  • [42] Zhou T, Thung K-H, Zhu X, and Shen D, “Feature learning and fusion of multimodality neuroimaging and genetic data for multi-status dementia diagnosis,” in International Workshop on Machine Learning in Medical Imaging. Springer, 2017, pp. 132–140.
  • [43] Laakso M, Soininen H, Partanen K, Helkala E-L, Hartikainen P, Vainio P, Hallikainen M, Hänninen T, and Riekkinen Sr P, “Volumes of hippocampus, amygdala and frontal lobes in the MRI-based diagnosis of early Alzheimer’s disease: Correlation with memory functions,” Journal of Neural Transmission-Parkinson’s Disease and Dementia Section, vol. 9, no. 1, pp. 73–86, 1995.
  • [44] Barnes J, Whitwell JL, Frost C, Josephs KA, Rossor M, and Fox NC, “Measurements of the amygdala and hippocampus in pathologically confirmed Alzheimer disease and frontotemporal lobar degeneration,” Archives of Neurology, vol. 63, no. 10, pp. 1434–1439, 2006.
  • [45] Lötjönen J, Wolz R, Koikkalainen J, Julkunen V, Thurfjell L, Lundqvist R, Waldemar G, Soininen H, and Rueckert D, “Fast and robust extraction of hippocampus from MR images for diagnostics of Alzheimer’s disease,” NeuroImage, vol. 56, no. 1, pp. 185–196, 2011.
  • [46] Achterberg HC, van der Lijn F, den Heijer T, Vernooij MW, Ikram MA, Niessen WJ, and de Bruijne M, “Hippocampal shape is predictive for the development of dementia in a normal, elderly population,” Human Brain Mapping, vol. 35, no. 5, pp. 2359–2371, 2014.
  • [47] Cui R and Liu M, “Hippocampus analysis by combination of 3D DenseNet and shapes for Alzheimer’s disease diagnosis,” IEEE Journal of Biomedical and Health Informatics, 2018.
  • [48] Huang G, Liu Z, Van Der Maaten L, and Weinberger KQ, “Densely connected convolutional networks,” in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, 2017, pp. 4700–4708.
  • [49] Ortiz A, Munilla J, Gorriz JM, and Ramirez J, “Ensembles of deep learning architectures for the early diagnosis of the Alzheimer’s disease,” International Journal of Neural Systems, vol. 26, no. 7, p. 1650025, 2016.
  • [50] Sánchez J, Perronnin F, Mensink T, and Verbeek J, “Image classification with the Fisher vector: Theory and practice,” International Journal of Computer Vision, vol. 105, no. 3, pp. 222–245, 2013.
  • [51] Cimpoi M, Maji S, Kokkinos I, and Vedaldi A, “Deep filter banks for texture recognition, description, and segmentation,” International Journal of Computer Vision, vol. 118, no. 1, pp. 65–94, 2016.
  • [52] Liu L, Wang P, Shen C, Wang L, Van Den Hengel A, Wang C, and Shen HT, “Compositional model based Fisher vector coding for image classification,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2335–2348, 2017.
  • [53] Dixit M, Chen S, Gao D, Rasiwasia N, and Vasconcelos N, “Scene classification with semantic Fisher vectors,” in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, 2015, pp. 2974–2983.
  • [54] Mai JK, Majtanik M, and Paxinos G, Atlas of the human brain. Academic Press, 2015.
  • [55] Lyu S, “Mercer kernels for object recognition with local features,” in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, vol. 2. IEEE, 2005, pp. 223–229.
  • [56] Wachinger C, Salat DH, Weiner M, and Reuter M, “Whole-brain analysis reveals increased neuroanatomical asymmetries in dementia for hippocampus and amygdala,” Brain, vol. 139, no. 12, pp. 3253–3266, 2016.
  • [57] Ellis KA, Bush AI, Darby D, De Fazio D, Foster J, Hudson P, Lautenschlager NT, Lenzo N, Martins RN, Maruff P et al., “The Australian Imaging, Biomarkers and Lifestyle (AIBL) study of aging: Methodology and baseline characteristics of 1112 individuals recruited for a longitudinal study of Alzheimer’s disease,” International Psychogeriatrics, vol. 21, no. 4, pp. 672–687, 2009.
  • [58] Çiçek Ö, Abdulkadir A, Lienkamp SS, Brox T, and Ronneberger O, “3D U-Net: Learning dense volumetric segmentation from sparse annotation,” in Int. Conf. on Medical Image Computing and Computer-Assisted Intervention, Ourselin S, Joskowicz L, Sabuncu MR, Unal G, and Wells W, Eds. Cham: Springer International Publishing, 2016, pp. 424–432.
  • [59] Kingma DP and Ba J, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
  • [60] Willmott CJ and Matsuura K, “Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance,” Climate Research, vol. 30, no. 1, pp. 79–82, 2005.
  • [61] Hore A and Ziou D, “Image quality metrics: PSNR vs. SSIM,” in Proc. of IEEE Int. Conf. on Pattern Recognition. IEEE, 2010, pp. 2366–2369.
  • [62] Koyejo OO, Natarajan N, Ravikumar PK, and Dhillon IS, “Consistent binary classification with generalized performance metrics,” in Advances in Neural Information Processing Systems, 2014, pp. 2744–2752.
  • [63] Tzourio-Mazoyer N, Landeau B, Papathanassiou D, Crivello F, Etard O, Delcroix N, Mazoyer B, and Joliot M, “Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain,” NeuroImage, vol. 15, no. 1, pp. 273–289, 2002.
  • [64] Chatfield K, Simonyan K, Vedaldi A, and Zisserman A, “Return of the devil in the details: Delving deep into convolutional nets,” in British Machine Vision Conference, 2014.
  • [65] Wang M, Zhang D, Huang J, Yap P-T, Shen D, and Liu M, “Identifying autism spectrum disorder with multi-site fMRI via low-rank domain adaptation,” IEEE Transactions on Medical Imaging, 2019.
  • [66] Jovicich J, Barkhof F, Babiloni C, Herholz K, Mulert C, van Berckel BN, Frisoni GB, S.-N. J. W. Group et al., “Harmonization of neuroimaging biomarkers for neurodegenerative diseases: A survey in the imaging community of perceived barriers and suggested actions,” Alzheimer’s & Dementia: Diagnosis, Assessment & Disease Monitoring, vol. 11, pp. 69–73, 2019.

Supplementary Materials

supp1-2983085