Human Brain Mapping. 2023 May 25;44(11):4256–4271. doi: 10.1002/hbm.26343

Addressing multi‐site functional MRI heterogeneity through dual‐expert collaborative learning for brain disease identification

Yuqi Fang 1, Guy G Potter 2, Di Wu 3, Hongtu Zhu 3, Mingxia Liu 1
PMCID: PMC10318248  PMID: 37227019

Abstract

Several studies employ multi-site rs-fMRI data for major depressive disorder (MDD) identification, with a specific site as the to-be-analyzed target domain and other site(s) as the source domain. However, these studies usually suffer from significant inter-site heterogeneity caused by the use of different scanners and/or scanning protocols, and they fail to build generalizable models that adapt well to multiple target domains. In this article, we propose a dual-expert fMRI harmonization (DFH) framework for automated MDD diagnosis. Our DFH is designed to simultaneously exploit data from a single labeled source domain/site and two unlabeled target domains to mitigate data distribution differences across domains. Specifically, the DFH consists of a domain-generic student model and two domain-specific teacher/expert models that are jointly trained to perform knowledge distillation through a deep collaborative learning module. The result is a student model with strong generalizability that can be applied to unseen target domains and to the analysis of other brain diseases. To the best of our knowledge, this is among the first attempts to investigate multi-target fMRI harmonization for MDD diagnosis. Comprehensive experiments on 836 subjects with rs-fMRI data from three different sites show the superiority of our method. The discriminative brain functional connectivities identified by our method could be regarded as potential biomarkers for fMRI-related MDD diagnosis.

Keywords: functional MRI, harmonization, knowledge distillation, major depressive disorder


We propose a dual-expert fMRI harmonization framework to reduce inter-site data heterogeneity in MDD diagnosis. A domain-generic model with strong generalizability is derived, which adapts well to unseen data. The discriminative functional connectivities identified by our method could be used as potential clinical biomarkers for MDD analysis.


1. INTRODUCTION

Major depressive disorder (MDD) is a heterogeneous psychiatric disorder with high morbidity and mortality rates, caused by a combination of genetic, environmental, and psychological factors (Fan et al., 2020; Lewinsohn et al., 1998; Lohoff, 2010). It is estimated that the lifetime prevalence of MDD is around 16.2% (Yildirim et al., 2022), and long-term MDD treatment imposes a substantial labor and economic burden on society. Timely and accurate diagnosis of MDD is therefore of great significance in helping clinicians develop early intervention plans. Resting-state functional magnetic resonance imaging (rs-fMRI) is a noninvasive imaging technique that measures spontaneous neuronal activity through blood-oxygen-level-dependent (BOLD) signals. It has been widely used to investigate brain functional architecture and has demonstrated great clinical value in automated MDD diagnosis (Dichter et al., 2015; Luo et al., 2021).

Several methods have been designed for MDD identification based on multi-site fMRI data (Gallo et al., 2023; Shi et al., 2020). However, these methods usually assume that fMRI data are sampled from the same distribution and neglect the significant inter-site heterogeneity caused by the use of different scanners/protocols, which leads to unsatisfactory classification results. A few studies consider inter-site heterogeneity and study cross-domain fMRI adaptation for the diagnosis of other brain diseases, such as autism spectrum disorder (Gao et al., 2019; Shi, Xin, & Zhang, 2021). These methods are generally designed for single-target domain adaptation (STDA) problems (see Figure 1a), where the source and target domains correspond to training and testing data, respectively. They cannot produce models that generalize well to multi-target domains with more complex data distributions. Moreover, the spatiotemporal feature patterns and topological characteristics conveyed in fMRI signals are often ignored in these studies.

FIGURE 1. Illustration of (a) single-target domain adaptation (STDA) and (b–d) three kinds of multi-target domain adaptation (MTDA) strategies, with one labeled source domain (S) and two unlabeled target domains (T1 and T2) involved. (a) STDA performs cross-domain adaptation f between a source and a target domain. (b) Unified MTDA directly combines two targets into one cohort and then performs feature adaptation. (c) Progressive MTDA distills knowledge from each target domain at each iteration, without considering knowledge transfer among targets. (d) Collaborative MTDA simultaneously exploits data from the source and two target domains, encouraging knowledge transfer among them for cross-domain data adaptation.

Several methods in the computer vision field have been proposed to tackle multi‐target domain adaptation (MTDA) problems (Nguyen‐Meidine et al., 2021; Peng et al., 2019). Some of them (Jin et al., 2020; Peng et al., 2019) directly merge multiple targets into one target (see Figure 1b), and then perform domain adaptation following the STDA scheme, neglecting complementary and diverse knowledge contained in multiple target domains. As shown in Figure 1c, Nguyen‐Meidine et al. (2021) consider the specificity of each target domain and propose to progressively/iteratively distill knowledge from multiple teachers/targets to a common student model. However, the progressive MTDA strategy cannot exploit all targets simultaneously and may lead to suboptimal performance.

To this end, we introduce a dual‐expert fMRI harmonization (DFH) framework for automated MDD diagnosis. As illustrated in Figure 2, the DFH consists of (1) a domain‐generic student model that is trained on a labeled source domain and two unlabeled target domains, (2) two domain‐specific teacher/expert models, each trained on a specific source‐target domain pair, and (3) a deep collaborative learning module for cross‐domain knowledge distillation and feature adaptation. The student and expert models are simultaneously learned in a collaborative manner, significantly mitigating data distribution differences across domains, resulting in a generalized student model. To capture spatiotemporal patterns and topological characteristics of fMRI time series, we employ a graph convolution network (GCN) as the feature extractor for both source and target domains. Experiments on 836 subjects with rs‐fMRI data from 3 different sites suggest that our method outperforms several state‐of‐the‐art (SOTA) approaches in automated MDD identification. Our contributions are summarized as follows.

  • We propose a dual‐expert fMRI harmonization (DFH) framework for MDD diagnosis, which is one of the first attempts for single‐source multi‐target fMRI analysis.

  • Our DFH can simultaneously exploit data from a single labeled source and two unlabeled target domains through the cooperation of a domain‐generic student model and two domain‐specific expert models based on a hybrid loss function. The student model can be well generalized to unseen targets not involved in the training process, as demonstrated by its good performance on unseen sites.

  • Experiments on the REST-meta-MDD Consortium (Yan et al., 2019) show the effectiveness of our method. The discriminative functional connectivities identified by DFH could be used as potential clinical biomarkers for MDD analysis.

  • The proposed DFH is a general framework that can be applied to the analysis of other brain disorders (e.g., mild cognitive impairment, MCI), as evidenced by its superior results in early MCI detection on the Alzheimer's Disease Neuroimaging Initiative (ADNI) (Jack Jr et al., 2008).

FIGURE 2. Illustration of the proposed dual-expert fMRI harmonization (DFH) framework for MDD diagnosis. The DFH consists of (a) a domain-generic student model trained on the labeled source and unlabeled target data, (b) two domain-specific expert models trained on each source-target domain pair, and (c) a deep collaborative learning module for joint learning of the student model and two expert models, encouraging them to cooperate with each other via a hybrid loss. Each student/expert model consists of a feature encoder with a graph convolutional network (GCN) embedded, a class label predictor, and a domain discriminator.

2. RELATED WORK

2.1. Automated MDD identification

Many learning-based methods have been proposed for MDD diagnosis based on fMRI time series; they can be mainly categorized into two groups, that is, conventional machine learning methods and deep learning-based methods. Conventional machine learning approaches mainly utilize handcrafted fMRI features (e.g., node strength) that require expert knowledge, and the commonly used classification models include support vector machines (SVMs) (Ramasubbu et al., 2016; Rosa et al., 2015; Sen et al., 2021), partial least squares regression (Yoshida et al., 2017), and linear discriminant analysis (Ma et al., 2013). Deep learning approaches make it possible to learn fMRI features in a task-oriented, data-driven manner. For instance, Noman et al. (2021) design a graph autoencoder to map non-Euclidean fMRI patterns into compact latent representations, followed by fully connected deep neural networks. Jun et al. (2020) distinguish MDD patients from healthy controls by applying a spectral GCN to a population graph that incorporates resting-state effective connectivity and non-imaging phenotypic information.

Existing methods usually use data from a single hospital/site for model training and thus often suffer from the small-sample-size issue. Several studies (Gallo et al., 2023; Shi, Zhang, et al., 2021) employ multi-site fMRI data to increase the sample size for MDD diagnosis. Nevertheless, these methods often combine data acquired from different sites into one cohort while neglecting significant inter-site data heterogeneity, leading to unsatisfactory classification results. Different from current studies, our method takes site diversity into account and performs fMRI feature alignment among different sites via knowledge distillation.

2.2. Functional MRI adaptation

Only a few studies perform cross‐site fMRI harmonization for brain disease diagnosis (Gao et al., 2019; Shi, Xin, & Zhang, 2021). However, these methods only consider the single‐source single‐target setting and cannot produce generalized models that can well adapt to multi‐target domains. Moreover, these methods ignore the spatiotemporal and topological characteristics inherent in fMRI data.

In the computer vision field, several methods (Jin et al., 2020; Nguyen-Meidine et al., 2021; Peng et al., 2019) have been proposed to tackle multi-target domain adaptation problems. For instance, Peng et al. (2019) and Jin et al. (2020) directly merge multiple target domains into one target and then perform domain adaptation following the single-source single-target setting, thereby neglecting data diversity among different domains. Instead, Nguyen-Meidine et al. (2021) take into consideration the specificity of each target domain and propose to iteratively distill target domain knowledge from multiple teachers to a common student. However, this method exploits only one target at a time, and this progressive practice ignores knowledge transfer among different target domains, which may negatively affect model performance. In contrast, our proposed DFH simultaneously exploits data from the source and multiple target domains in a collaborative manner, resulting in a model with strong generalizability (as evidenced by its superior performance in automated MDD diagnosis and early MCI detection). Recently, Tian et al. (2022) propose a multi-site data harmonization method for site-effect removal with a traveling-subject dataset. Specifically, they first design four encoders to extract site factors and brain factors for each subject in the source and target domains, and then use two decoders to synthesize harmonized MR images by combining the encoded site and brain factors. Data harmonization is achieved by disentangling site and brain factors through minimizing two reconstruction losses and two consistency losses. This harmonization method is designed for image data and is not well suited to fMRI time-series data. Moreover, Bell et al. (2022) propose to use ComBat (Johnson et al., 2007) for the harmonization of multi-site magnetic resonance spectroscopy data, where ComBat performs site harmonization based on the estimation of an empirical statistical distribution.

3. MATERIALS AND METHODOLOGY

3.1. Materials and image preprocessing

3.1.1. Data acquisition

The REST-meta-MDD Consortium (Yan et al., 2019), which contains rs-fMRI signals from 25 research groups across 17 Chinese sites, is used to evaluate the effectiveness of our DFH. In this work, we select the three largest sites for model validation, including Site-20 with 282 MDD patients and 251 healthy controls (HCs), Site-21 with 85 MDD patients and 70 HCs, and Site-1 with 74 MDD patients and 74 HCs. Site-20 is treated as the source domain (S), while Site-21 and Site-1 are treated as the first target domain (T1) and the second target domain (T2), respectively.

For Site‐20 (S), the resting‐state fMRI data are collected using a Siemens Tim Trio 3 T MRI scanner with a 12‐channel receiver coil. For each MRI scan, the repetition time (TR) is 2,000 ms, echo time (TE) is 30 ms, voxel size is 3.44 × 3.44 × 4.00 mm3, slice number is 32, slice thickness is 3.0 mm, interslice gap is 1.0 mm, number of time points is 242, field‐of‐view (FOV) is 220 × 220, and flip angle is 90°. For Site‐21 (T 1), fMRI data are collected using a Siemens Tim Trio 3 T MRI scanner with a 32‐channel receiver coil. For each MRI scan, TR is 2,000 ms, TE is 30 ms, voxel size is 3.12 × 3.12 × 4.20 mm3, slice number is 33, slice thickness is 3.5 mm, interslice gap is 0.7 mm, number of time points is 240, FOV is 200 × 200, and flip angle is 90°. For Site‐1 (T 2), fMRI data are collected using a Siemens Tim Trio 3 T MRI scanner with a 32‐channel receiver coil. For each MRI scan, TR is 2,000 ms, TE is 30 ms, voxel size is 3.28 × 3.28 × 4.80 mm3, slice number is 30, slice thickness is 4.0 mm, interslice gap is 0.8 mm, number of time points is 210, FOV is 210 × 210, and flip angle is 90°. The demographic characteristics of the studied subjects are reported in Table 1.

TABLE 1.

Demographic characteristics of the studied subjects from three sites (i.e., Site‐20, Site‐21, and Site‐1) in the REST‐meta‐MDD Consortium. Site‐20 is used as the source domain, while Site‐21 and Site‐1 are used as two target domains. S: source domain; T i: ith target domain; MDD: major depressive disorder; HC: healthy control; M/F: Male/Female; Edu: education.

Site‐20 (S) Site‐21 (T 1) Site‐1 (T 2)
Group MDD HC MDD HC MDD HC
Subject No. 282 251 85 70 74 74
Gender (M/F) 99/183 87/164 38/47 31/39 31/43 32/42
Age (mean ± std) 38.74 ± 13.74 39.64 ± 15.87 34.84 ± 12.65 36.13 ± 12.64 31.72 ± 8.19 31.80 ± 8.99
Edu (mean ± std) 10.78 ± 3.61 12.97 ± 3.94 11.61 ± 2.88 12.83 ± 2.59 13.80 ± 2.94 15.23 ± 2.26

3.1.2. Data preprocessing

The rs-fMRI BOLD signals are preprocessed following a standardized pipeline based on the Data Processing Assistant for Resting-State fMRI (DPARSF) (Yan & Zang, 2010). Specifically, we first discard the first 10 volumes to allow for a steady state of magnetization. The remaining volumes are processed through slice timing correction, head motion correction, bandpass filtering (0.01–0.10 Hz), nuisance covariate removal (i.e., head motion parameters, white matter, and cerebrospinal fluid), co-registration between T1-weighted and mean functional images, and spatial normalization from individual native space to Montreal Neurological Institute (MNI) space.

The processed volumes are partitioned into 116 regions of interest (ROIs) based on the Automated Anatomical Labeling (AAL) atlas. Since rs-fMRI data from different domains have different numbers of time points (i.e., 232 for S, 230 for T1, and 200 for T2), we keep the first 200 time points for all domains. Thus, each subject is represented by a Γ × N matrix, where Γ = 200 and N = 116 denote the numbers of time points and ROIs, respectively. We also standardize each ROI's BOLD signal across all subjects using z-score normalization. The resulting signals are used as the inputs of the proposed DFH.
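As a rough illustration of this input preparation step, the sketch below truncates each subject's preprocessed ROI time series to 200 volumes and applies z-score normalization. It is a minimal sketch, not the DPARSF pipeline itself; the variable names are placeholders, and the exact normalization axes are an assumption since they are not fully specified above.

```python
# Minimal sketch of the input preparation described above (not the DPARSF pipeline itself).
# `bold_list` is a hypothetical list of (T_k x 116) ROI time-series arrays, one per subject.
import numpy as np

def prepare_inputs(bold_list, n_timepoints=200, n_rois=116):
    # Keep the first 200 volumes so subjects from all domains share the same temporal length.
    data = np.stack([x[:n_timepoints, :n_rois] for x in bold_list])   # (subjects, 200, 116)
    # z-score each ROI's BOLD signal (pooled over subjects and time; an assumption).
    mean = data.mean(axis=(0, 1), keepdims=True)
    std = data.std(axis=(0, 1), keepdims=True) + 1e-8
    return (data - mean) / std                                        # inputs to the DFH
```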

3.2. Proposed methodology

3.2.1. Domain‐generic student model

Given a labeled source domain and two unlabeled target domains, our goal is to learn a generalizable student model, with only the label information of the source data as supervision. By taking into consideration the complementary knowledge contained in each specific target domain, the student model can be well adapted to the target domains by mitigating cross-domain discrepancy. We denote the labeled source domain as $\mathcal{S} = \{X^{src}, Y^{src}\}$ and the $i$th unlabeled target domain as $\mathcal{T}_i = \{X^{t_i}\}$ ($i = 1, 2$).

As demonstrated in Figure 2, the student model is comprised of a feature extractor, a class label predictor, and a domain discriminator. The feature extractor aims to capture spatiotemporal fMRI patterns based on the graph convolutional network (GCN) (Gadgil et al., 2020). Specifically, we first concatenate the BOLD signals of each ROI across all subjects to construct an adjacency/functional connectivity matrix $F$, with each node being an ROI and each edge weight denoting the Pearson correlation coefficient (PCC) between paired ROIs. Based on this functional connectivity matrix, for each ROI (represented by a Γ-dimensional BOLD signal vector), we use a spatial graph convolutional unit to aggregate spatial features from its functionally connected neighbors. This models coactivation characteristics among connectomic ROIs, which facilitates a deeper understanding of brain activity. With the derived spatial features, a temporal convolutional unit (a standard 1D convolutional operation) is then applied to model the temporal dynamics across consecutive MRI volumes. In this way, both spatial and temporal patterns of fMRI data are explicitly characterized. Here, we stack four such spatial and temporal graph convolutional units for efficient feature abstraction; this hierarchical structure helps model functional interactions among multi-hop and cross-volume brain regions. Moreover, to quantify the unique contributions of different functional connectivities to MDD diagnosis, we embed a learnable attention matrix/mask $M^{st} \in \mathbb{R}^{N \times N}$ into the functional connectivity matrix $F$ during forward propagation. For this attention mask, larger elements indicate that the corresponding functional connectivities play more important roles in differentiating MDD patients from healthy controls.
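The following PyTorch sketch shows how one such spatial-temporal graph convolutional unit with a learnable attention mask over the FC matrix might look. The layer shapes, temporal kernel size, and aggregation details are illustrative assumptions rather than the exact architecture of Gadgil et al. (2020) or of the DFH.

```python
# Hedged sketch of a spatial-temporal graph convolutional unit with a learnable
# attention mask applied to the functional connectivity (FC) matrix.
import torch
import torch.nn as nn

class STGraphConvUnit(nn.Module):
    def __init__(self, in_ch, out_ch, n_rois=116, t_kernel=11):
        super().__init__()
        # Learnable attention mask M (N x N), multiplied element-wise with the FC matrix.
        self.mask = nn.Parameter(torch.ones(n_rois, n_rois))
        self.spatial = nn.Conv2d(in_ch, out_ch, kernel_size=1)          # per-node feature mixing
        self.temporal = nn.Conv2d(out_ch, out_ch, kernel_size=(t_kernel, 1),
                                  padding=(t_kernel // 2, 0))           # 1D conv over time points
        self.relu = nn.ReLU()

    def forward(self, x, fc):
        # x: (batch, in_ch, T, N) BOLD features; fc: (N, N) PCC-based FC matrix.
        adj = fc * self.mask                        # weight each connectivity by its learned importance
        x = self.spatial(x)                         # (batch, out_ch, T, N)
        x = torch.einsum('bctn,nm->bctm', x, adj)   # aggregate features from functionally connected ROIs
        return self.relu(self.temporal(x))          # model temporal dynamics across consecutive volumes

# Per the description above, four such units would be stacked to form the feature extractor.
```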

Given the derived spatiotemporal features, a class label predictor Ψc and a domain discriminator Ψd are introduced for MDD identification and domain classification, respectively, each with a series of convolutional blocks for feature extraction. Ψc aims to predict the probability of each subject in all domains belonging to the MDD patient group, with the label information of the source data as supervision. Ψd aims to identify the original domain (i.e., source/target) each input subject belongs to, encouraging the feature distribution of the source samples to be indistinguishable from that of the target samples. The proposed student model incorporates fMRI feature learning, category-level prediction, and domain-level adversarial training into a unified network, encouraging the model to capture fMRI features that are discriminative for MDD diagnosis and invariant across domains.

Mathematically, the student model is trained through a binary cross-entropy loss $\mathcal{L}_{ce}^{st}$ and an adversarial loss $\mathcal{L}_{adv}^{st}$. The cross-entropy loss $\mathcal{L}_{ce}^{st}$ penalizes the classification error by summing the negative log-likelihood of the prediction for each input, formulated as

$$\mathcal{L}_{ce}^{st} = -\sum_{r=1}^{N_S} \Big[ y_r^{st} \log p_r^{st} + \big(1 - y_r^{st}\big) \log\big(1 - p_r^{st}\big) \Big], \quad (1)$$

where $p_r^{st} \in [0, 1]$ denotes the predicted probability of the $r$th source subject categorized as the patient group; $y_r^{st} \in \{1, 0\}$ means the ground-truth category label (i.e., 1 denotes MDD and 0 denotes HC); and $N_S$ represents the number of subjects in the source domain.

The adversarial loss $\mathcal{L}_{adv}^{st}$ aims to constrain features derived from source and target domains into a common feature space by penalizing the domain classification error:

$$\mathcal{L}_{adv}^{st} = -\sum_{j=1}^{N_S + N_{T_1} + N_{T_2}} \Big[ d_j^{st} \log q_j^{st} + \big(1 - d_j^{st}\big) \log\big(1 - q_j^{st}\big) \Big], \quad (2)$$

where $q_j^{st} \in [0, 1]$ is the probability of the $j$th input subject being categorized as source data; $d_j^{st} \in \{1, 0\}$ denotes the ground-truth domain label (i.e., 1 for the source domain and 0 for a target domain); and $N_{T_i}$ ($i = 1, 2$) represents the number of subjects in the $i$th target domain. The objective function of the domain-generic student model is finally formulated as:

$$\mathcal{L}^{st} = \mathcal{L}_{ce}^{st} + \mathcal{L}_{adv}^{st}. \quad (3)$$
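A minimal sketch of the student objective in Equation (3) is given below, assuming generic encoder, class-predictor, and domain-discriminator modules. In practice, adversarial training also needs a mechanism such as a gradient-reversal layer or alternating updates, which is omitted here for brevity; the module and variable names are placeholders.

```python
# Sketch of the student loss in Equation (3): source-supervised BCE plus a domain-adversarial BCE.
# `encoder`, `predictor`, and `discriminator` are placeholder modules returning logits.
import torch
import torch.nn.functional as F

def student_loss(encoder, predictor, discriminator, x_src, y_src, x_t1, x_t2):
    f_src = encoder(x_src)                               # source features
    f_tgt = torch.cat([encoder(x_t1), encoder(x_t2)])    # features of both target domains

    # L_ce: binary cross-entropy on labeled source subjects (MDD = 1, HC = 0).
    l_ce = F.binary_cross_entropy_with_logits(predictor(f_src).squeeze(-1), y_src.float())

    # L_adv: domain discriminator with source = 1 and target = 0 labels.
    feats = torch.cat([f_src, f_tgt])
    d_labels = torch.cat([torch.ones(len(f_src), device=feats.device),
                          torch.zeros(len(f_tgt), device=feats.device)])
    l_adv = F.binary_cross_entropy_with_logits(discriminator(feats).squeeze(-1), d_labels)

    return l_ce + l_adv                                  # Equation (3)
```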

3.2.2. Domain‐specific expert model

Intuitively, different models trained on different domains may contain complementary knowledge that can be exploited to enhance the generalizability of the student model. Therefore, we first construct a domain-specific expert model on each source-target domain pair (e.g., $\mathcal{S}$–$\mathcal{T}_1$), and then further train the student model through knowledge distillation of the expert models.

Given data from the $i$th source-target domain pair (i.e., $\mathcal{S}$–$\mathcal{T}_i$) as input, the $i$th expert model shares the same architecture as the student model but with different learning parameters that are trained from scratch. Each expert model also leverages a GCN (Gadgil et al., 2020) for fMRI-related feature abstraction. Similar to the student model, we embed a learnable attention mask $M^{e_i} \in \mathbb{R}^{N \times N}$ into the functional connectivity matrix for each expert model, aiming to automatically identify discriminative functional connectivities related to MDD diagnosis. A class label predictor and a domain discriminator are utilized for MDD diagnosis and domain classification, respectively. The class label predictor of the $i$th expert model is supervised by a cross-entropy loss $\mathcal{L}_{ce}^{e_i}$, formulated as

$$\mathcal{L}_{ce}^{e_i} = -\sum_{r=1}^{N_S} \Big[ y_r^{e_i} \log p_r^{e_i} + \big(1 - y_r^{e_i}\big) \log\big(1 - p_r^{e_i}\big) \Big], \quad (4)$$

where $p_r^{e_i} \in [0, 1]$ denotes the predicted probability of the $r$th source subject categorized as the patient group; and $y_r^{e_i} \in \{1, 0\}$ means the ground-truth category label (i.e., 1 for MDD and 0 for HC). The domain discriminator of the $i$th expert model aims to constrain the feature distributions of the source and the $i$th target domain to be similar via an adversarial loss:

$$\mathcal{L}_{adv}^{e_i} = -\sum_{j=1}^{N_S + N_{T_i}} \Big[ d_j^{e_i} \log q_j^{e_i} + \big(1 - d_j^{e_i}\big) \log\big(1 - q_j^{e_i}\big) \Big]. \quad (5)$$

To sum up, the loss function of the $i$th expert model is

$$\mathcal{L}^{e_i} = \mathcal{L}_{ce}^{e_i} + \mathcal{L}_{adv}^{e_i}. \quad (6)$$

3.2.3. Collaborative learning with knowledge distillation

Since different expert models are learned from different domain pairs, it is expected that complementary information contained in both expert models can be captured, and their integration may help improve the identification accuracy for all domains. To this end, we design a deep collaborative learning module to jointly learn the student model and two expert models, encouraging effective knowledge transfer among them for cross‐domain data adaptation. Inspired by (Tarvainen & Valpola, 2017), we develop a dual‐target mean teacher strategy for knowledge distillation across multiple domains, where the expert models use exponential moving average (EMA) weights of the student model during optimization, according to:

$$\theta_j^{e_i} = \eta\, \theta_{j-1}^{e_i} + (1 - \eta)\, \theta_j^{st}, \quad (7)$$

where $\theta^{e_i}$ and $\theta^{st}$ represent the learnable parameters of the $i$th expert model and the student model, respectively; $j$ is the iteration number; and $\eta = 0.99$ is a hyperparameter. The parameter-sharing strategy via Equation (7) helps encourage consistency between the student and each expert model.
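A minimal PyTorch sketch of the EMA update in Equation (7) might look as follows; the module names are placeholders.

```python
# Sketch of Equation (7): each expert's parameters track an exponential moving
# average of the student's parameters, with eta = 0.99.
import torch

@torch.no_grad()
def ema_update(expert, student, eta=0.99):
    for p_e, p_s in zip(expert.parameters(), student.parameters()):
        # theta_expert <- eta * theta_expert + (1 - eta) * theta_student
        p_e.data.mul_(eta).add_(p_s.data, alpha=1.0 - eta)
```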

Denote $P_S^{e_i}$ and $Q_S^{st}$ as the predicted probability of a source sample generated by the $i$th expert model and the student model, respectively. Denote $P_{T_i}^{e_i}$ and $Q_{T_i}^{st}$ as the predicted probability of a sample from the $i$th target domain generated by the $i$th expert model and the student model, respectively. We further design a source-based consistency loss to encourage the probabilities of a source sample predicted by the student model and each expert model to be consistent:

$$\mathcal{L}_{con}^{S} = \sum_{i=1}^{2} \Big[ \alpha\, \mathcal{L}_{KLD}\big(P_S^{e_i} \,\|\, Q_S^{st}\big) + \beta\, \mathcal{L}_{MSE}\big(P_S^{e_i}, Q_S^{st}\big) \Big], \quad (8)$$

where $\mathcal{L}_{KLD}$ denotes the Kullback–Leibler divergence and $\mathcal{L}_{MSE}$ is the mean squared error; their coefficients $\alpha$ and $\beta$ are empirically set to 100 and 10, respectively. These two losses perform cross-domain bi-level feature alignment from two perspectives, that is, matching probability distributions and element-wise prediction outputs, respectively. This facilitates the full use of unlabeled data in an unsupervised manner, helping the student model distill common knowledge from the expert models. Furthermore, a target-based consistency loss is also designed to encourage the estimated probabilities of a target sample predicted by the student model and each expert model to be consistent, which is defined as follows:

$$\mathcal{L}_{con}^{T} = \sum_{i=1}^{2} \Big[ \alpha\, \mathcal{L}_{KLD}\big(P_{T_i}^{e_i} \,\|\, Q_{T_i}^{st}\big) + \beta\, \mathcal{L}_{MSE}\big(P_{T_i}^{e_i}, Q_{T_i}^{st}\big) \Big]. \quad (9)$$

Equations (8) and (9) facilitate inter-domain knowledge transfer, through which the student model is taught by the two experts to learn diverse knowledge. The hybrid loss function of the proposed DFH is then formulated as

$$\mathcal{L} = \mathcal{L}_{ce}^{st} + \mathcal{L}_{con}^{S} + \mathcal{L}_{con}^{T}. \quad (10)$$
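The sketch below illustrates how the consistency terms in Equations (8)–(10) could be assembled, assuming the expert and student outputs are already valid probability distributions (e.g., softmax outputs); the dictionary keys are hypothetical names.

```python
# Sketch of Equations (8)-(10): KL divergence plus MSE consistency between each expert's
# and the student's predictions on source and target samples (alpha = 100, beta = 10).
import torch
import torch.nn.functional as F

def consistency(p_expert, q_student, alpha=100.0, beta=10.0):
    # KL(P_expert || Q_student); F.kl_div expects log-probabilities as its first argument.
    kl = F.kl_div(q_student.log(), p_expert, reduction='batchmean')
    mse = F.mse_loss(p_expert, q_student)
    return alpha * kl + beta * mse

def hybrid_loss(l_ce_student, preds):
    # preds: dict of probabilities, e.g., preds['e1_S'] = expert-1 output on source samples.
    l_con_s = sum(consistency(preds[f'e{i}_S'], preds['st_S']) for i in (1, 2))         # Eq. (8)
    l_con_t = sum(consistency(preds[f'e{i}_T{i}'], preds[f'st_T{i}']) for i in (1, 2))  # Eq. (9)
    return l_ce_student + l_con_s + l_con_t                                             # Eq. (10)
```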

3.2.4. Implementation

Our proposed DFH follows a two‐stage optimization scheme. In the first stage, we pretrain the student and each expert model individually via Equation (3) and Equation (6), respectively. This stage encourages the model to learn discriminative and invariant features based on their involved domains. In the second stage, we simultaneously exploit all domains and optimize the whole network via Equation (10). All source and target data are involved in model training, while the label information of two targets is not available. The resulting domain‐generic student model is finally used for inference.

The proposed DFH is implemented with Python based on PyTorch using a GPU (NVIDIA TITAN Xp) with 12GB of memory. The stochastic gradient descent (SGD) optimizer is used for optimization, with a mini‐batch size of 64. The initial learning rate is set to 0.01 and reduced by a factor of 2 every 30 epochs, with a total of 100 training epochs.
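For reference, the training configuration described above could be set up roughly as follows; `dfh_model` and `train_loader` are placeholders, and the hybrid-loss call is a hypothetical method name rather than part of the released implementation.

```python
# Sketch of the second optimization stage: SGD, mini-batch size 64 (set in the DataLoader),
# initial learning rate 0.01 halved every 30 epochs, 100 epochs in total.
import torch

def train_dfh(dfh_model, train_loader, epochs=100):
    optimizer = torch.optim.SGD(dfh_model.parameters(), lr=0.01)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.5)
    for epoch in range(epochs):
        for batch in train_loader:
            optimizer.zero_grad()
            loss = dfh_model.hybrid_loss(batch)   # Equation (10); hypothetical method name
            loss.backward()
            optimizer.step()
        scheduler.step()
```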

4. EXPERIMENTS

4.1. Competing methods

In this study, we compare our DFH with 14 competing methods, including seven conventional methods and seven deep learning methods. The conventional methods are detailed as follows. (1) STR‐SVM where we first construct a functional connectivity (FC) matrix based on the PCC algorithm (same as our DFH), and then we use node strength calculated from the FC matrix as fMRI features (116‐dimensional) for each subject. A linear SVM classifier is finally used for classification. (2) CC‐SVM uses the same FC matrix as STR‐SVM but with clustering coefficient as fMRI features (116‐dimensional), and also uses a linear SVM for prediction. (3) GTF‐SVM is built based on the same FC matrix but with global topology property (i.e., density, global efficiency, assortativity coefficient, characteristic path length, and transitivity) as fMRI features and a linear SVM as a classifier. (4) PCC‐SVM and (5) PCC‐XG are constructed based on the same FC matrix as STR‐SVM. Specifically, we first flatten the upper triangle elements of each FC matrix and convert them into a 6,670‐dimensional vectorized feature to represent each subject, which is then fed into an SVM and an XGBoost model, respectively. (6) MaLRR (Wang et al., 2019) is a multi‐site domain adaptation method via low‐rank representation decomposition. (7) ComBat (Fortin et al., 2018) is a conventional cross‐domain harmonization method based on Empirical Bayes (Johnson et al., 2007). Specifically, we first employ GCN (similar to our DFH) to extract fMRI features (3200‐dimensional) for each subject, and then, we perform cross‐domain harmonization using the ComBat algorithm, resulting in harmonized features for each domain. With harmonized source features, we train a class predictor (same as our DFH) in a supervised manner, and then we apply the trained predictor on harmonized features of two unlabeled target domains to get the prediction results. The deep learning methods include three methods with single‐target settings, that is, (8) single‐target DANN (sDANN) (Ganin et al., 2016), (9) single‐target ADDA (sADDA) (Tzeng et al., 2017), (10) single‐target FLC (sFLC) (Rozantsev et al., 2018), and four methods with dual‐target settings, that is, (11) dual‐target DANN (dDANN) (Ganin et al., 2016), (12) dual‐target ADDA (dADDA) (Tzeng et al., 2017), (13) dual‐target FLC (dFLC) (Rozantsev et al., 2018), and (14) dual‐target UMT (dUMT) (Nguyen‐Meidine et al., 2021).
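As a reference for the simplest conventional baselines above (e.g., STR-SVM and PCC-SVM), the sketch below builds a PCC-based FC matrix per subject and feeds either node strength or the vectorized upper triangle to a linear SVM. Whether node strength is computed from absolute correlation values is an assumption, and the variable names are placeholders.

```python
# Hedged sketch of the STR-SVM / PCC-SVM baselines: PCC-based FC matrix per subject,
# then node strength (116-d) or the flattened upper triangle (6,670-d) as features.
import numpy as np
from sklearn.svm import LinearSVC

def fc_matrix(bold):                          # bold: (200, 116) ROI time series
    return np.corrcoef(bold.T)                # (116, 116) Pearson correlation matrix

def node_strength(fc):
    fc = fc.copy()
    np.fill_diagonal(fc, 0)
    return np.abs(fc).sum(axis=1)             # 116-d node strength (absolute values assumed)

def upper_triangle(fc):
    return fc[np.triu_indices_from(fc, k=1)]  # 6,670-d vectorized feature (PCC-SVM / PCC-XG)

def train_str_svm(source_bold, source_labels):
    # source_bold: list of (200, 116) arrays; source_labels: 0/1 array (placeholders).
    X_src = np.array([node_strength(fc_matrix(b)) for b in source_bold])
    return LinearSVC().fit(X_src, source_labels)   # applied directly to target subjects at test time
```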

For the five conventional methods (i.e., STR-SVM, CC-SVM, GTF-SVM, PCC-SVM, and PCC-XG), we train the models using all labeled source data (533 subjects). For the three single-target domain adaptation methods (i.e., sDANN, sADDA, and sFLC), we perform feature alignment between S and T1, and between S and T2, respectively. That is, these methods use all labeled source data (533 subjects) and one target domain (155 subjects in T1 or 148 subjects in T2) for model training; the potential complementary information contained in both T1 and T2, which is expected to boost classification performance, is neglected. To tackle this problem, we combine T1 and T2 into a unified target domain TU (TU = {T1, T2}) and then perform adaptation between S and TU. The three dual-target methods dDANN, dADDA, and dFLC share the same experimental setup as the proposed DFH, while the other three dual-target methods (i.e., MaLRR, ComBat, and dUMT) are designed for multi-domain harmonization. For these six dual-target methods, we use all labeled source data (533 subjects) and two target domains (155 subjects in T1 and 148 subjects in T2) for model training. Note that in all experiments, all target data (155 subjects in T1 and 148 subjects in T2) are used for model inference, which ensures that the comparison between our method and each competing method is as fair as possible. We repeat all deep learning methods five times independently to reduce the bias caused by parameter initialization. For the conventional methods except ComBat, we perform the experiments only once since they do not require initialization.

4.2. Experimental settings

Six metrics are leveraged for model evaluation, i.e., the area under the receiver operating characteristic curve (AUC), classification accuracy (ACC), F1 score (F1), sensitivity (SEN), specificity (SPE), and precision (PRE). Denote true positive (TP) as the number of subjects that are correctly classified as the positive group (i.e., MDD); true negative (TN) as the number of subjects correctly classified as the negative group (i.e., HC); false positive (FP) as the number of subjects wrongly classified as the positive group; and false negative (FN) as the number of subjects wrongly classified as the negative group. Here, $\mathrm{ACC} = \frac{TP + TN}{TP + FP + TN + FN}$, $\mathrm{F1} = \frac{2TP}{2TP + FP + FN}$, $\mathrm{SEN} = \frac{TP}{TP + FN}$, $\mathrm{SPE} = \frac{TN}{TN + FP}$, and $\mathrm{PRE} = \frac{TP}{TP + FP}$.
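These metrics can be computed directly from the confusion counts as sketched below; scikit-learn's roc_auc_score is used for AUC, and the 0.5 decision threshold is an assumption.

```python
# Sketch of the six evaluation metrics defined above, given labels and predicted probabilities.
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate(y_true, y_prob, threshold=0.5):
    y_pred = (y_prob >= threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    return {
        'AUC': roc_auc_score(y_true, y_prob),
        'ACC': (tp + tn) / (tp + fp + tn + fn),
        'F1':  2 * tp / (2 * tp + fp + fn),
        'SEN': tp / (tp + fn),
        'SPE': tn / (tn + fp),
        'PRE': tp / (tp + fp),
    }
```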

4.3. Classification results

The classification results of our DFH and the 14 competing methods for MDD identification are reported in Table 2. We also perform a two-tailed paired t-test between our DFH and each competing method in terms of their predicted probability scores, with "*" denoting that the DFH achieves statistically significantly better performance (p < .05). From Table 2, we have the following observations.

TABLE 2.

Results of 15 methods in MDD versus HC classification on two target domains.

Target domain 1 (T 1: Site‐21) Target domain 2 (T 2: Site‐1)
Method AUC (%) ACC (%) F1 (%) SEN (%) SPE (%) PRE (%) p AUC (%) ACC (%) F1 (%) SEN (%) SPE (%) PRE (%) p
STR‐SVM 50.84 50.32 56.50 58.82 40.00 54.35 * 56.78 56.08 59.12 63.51 48.65 55.29 *
CC‐SVM 50.62 51.61 59.46 64.71 35.71 55.00 55.66 54.05 58.02 63.51 44.59 53.41 *
GTF‐SVM 53.73 50.32 58.82 64.71 32.86 53.92 * 53.45 50.68 54.09 58.11 43.24 50.59 *
PCC‐SVM 52.05 52.26 58.89 62.35 40.00 55.79 * 56.25 52.70 57.32 63.51 41.89 52.22
PCC‐XG 55.41 54.19 61.20 65.88 40.00 57.14 * 54.27 50.68 51.01 51.35 50.00 50.67 *
MaLRR 50.13 50.97 56.32 57.65 42.86 55.06 * 55.06 55.41 58.75 63.51 47.30 54.65 *
ComBat 50.59 ± 5.19 52.00 ± 6.06 59.27 ± 5.80 64.00 ± 7.68 37.43 ± 5.06 55.26 ± 4.50 * 54.61 ± 1.15 53.51 ± 1.38 57.84 ± 0.60 63.78 ± 1.58 43.24 ± 3.92 52.96 ± 1.24 *
sDANN 50.65 ± 3.56 52.13 ± 3.96 55.80 ± 4.45 55.29 ± 5.57 48.29 ± 3.55 56.39 ± 3.44 52.18 ± 4.22 50.95 ± 3.33 53.25 ± 4.80 56.49 ± 8.44 45.41 ± 8.36 50.88 ± 2.98 *
sADDA 51.44 ± 1.66 50.45 ± 2.87 53.15 ± 11.63 56.94 ± 25.72 42.57 ± 24.95 54.43 ± 0.59 * 51.72 ± 3.74 50.27 ± 4.55 52.74 ± 12.35 61.35 ± 26.64 39.19 ± 22.02 49.30 ± 4.02
sFLC 50.27 ± 3.77 50.97 ± 5.41 42.73 ± 33.97 58.59 ± 47.38 41.71 ± 45.96 39.82 ± 21.66 * 51.45 ± 5.57 51.76 ± 3.07 45.37 ± 19.32 50.27 ± 32.06 53.24 ± 31.28 50.53 ± 7.10
dDANN 50.06 ± 1.77 52.64 ± 0.77 58.43 ± 3.92 61.65 ± 10.54 41.71 ± 11.86 56.35 ± 0.91 * 53.53 ± 3.12 52.70 ± 1.76 56.96 ± 5.38 64.86 ± 17.87 40.54 ± 19.55 52.50 ± 1.81 *
dADDA 50.26 ± 4.18 50.19 ± 4.13 52.65 ± 4.94 50.82 ± 6.80 49.43 ± 6.68 54.95 ± 4.19 52.33 ± 2.71 51.08 ± 2.79 53.66 ± 2.67 56.76 ± 4.52 45.41 ± 6.20 51.07 ± 2.67 *
dFLC 50.71 ± 2.96 51.82 ± 2.17 58.29 ± 6.96 64.71 ± 14.02 33.71 ± 12.80 54.05 ± 2.02 * 54.59 ± 4.51 56.48 ± 5.94 57.68 ± 7.05 63.24 ± 12.49 45.95 ± 5.86 53.55 ± 3.24
dUMT 51.65 ± 3.19 52.26 ± 3.62 59.62 ± 2.56 64.24 ± 3.07 37.71 ± 7.19 55.69 ± 2.96 * 51.30 ± 1.88 51.35 ± 2.48 49.14 ± 4.45 47.30 ± 6.41 55.41 ± 4.58 51.38 ± 2.61 *
DFH (Ours) 57.23 ± 0.96 56.77 ± 2.61 63.09 ± 9.34 66.12 ± 20.48 45.43 ± 20.45 60.32 ± 3.29 57.30 ± 2.62 57.16 ± 2.16 60.12 ± 8.72 64.05 ± 18.06 50.27 ± 16.07 56.65 ± 1.85

Note: “*” denotes that the difference between our DFH and the competing method is statistically significant (with p‐value <.05). The best results are shown in bold.

First, our method outperforms the first five conventional methods by a large margin. For instance, in terms of AUC, the DFH achieves improvements of 6.39%, 6.61%, 3.50%, 5.18%, and 1.82% on T1, and of 0.52%, 1.64%, 3.85%, 1.05%, and 3.03% on T2, compared with STR-SVM, CC-SVM, GTF-SVM, PCC-SVM, and PCC-XG, respectively. The underlying reason may be that our DFH automatically learns spatiotemporal characteristics of fMRI BOLD signals, while the conventional models are built on human-engineered features that do not capture such spatiotemporal patterns, resulting in suboptimal results. Moreover, our DFH considers inter-site data heterogeneity and employs an advanced adaptation strategy between source and target domains, whereas the conventional methods directly apply the model trained on source data to target samples, neglecting informative features from the target domains. Second, our DFH achieves superior performance compared with the three single-target domain adaptation methods (i.e., sDANN, sADDA, and sFLC). The possible reason is that our DFH simultaneously takes into account fMRI patterns from both targets, while these single-target methods only align features between the source and one specific target domain and ignore the complementary characteristics of the two targets; as a result, cross-domain knowledge is not effectively exchanged, which degrades classification performance. Their dual-target versions, that is, dDANN, dADDA, and dFLC, generally yield slightly better classification results, but these are still far from satisfactory. Furthermore, our DFH yields statistically significantly better performance compared with the other three dual-target methods (i.e., MaLRR, ComBat, and dUMT). Compared with MaLRR and ComBat, our DFH incorporates feature learning and class prediction into one unified framework trained in an end-to-end manner, whereas feature extraction in MaLRR and ComBat is independent of classification, which may degrade learning performance. Compared with dUMT, which iteratively distills target knowledge to a general student model, our DFH performs adaptation between the student and experts in a collaborative manner to exploit complementary information among multiple domains.

4.4. Ablation study

We conduct ablation studies to investigate the influence of several components on classification performance by comparing DFH with its six variants. DFHw/oDA denotes the DFH without any adaptation technique, that is, the model trained on the source domain is directly applied to the target domains. DFHw/oMT represents the DFH without the mean teacher strategy, that is, no parameters are shared between the student model and the expert models. DFHw/oMSE and DFHw/oKLD denote the DFH without $\mathcal{L}_{MSE}$ and without $\mathcal{L}_{KLD}$, respectively. DFHw/oSRC represents training the DFH without the source domain, that is, we train and test the model only on target data. Specifically, we first randomly divide each target domain into two subsets of equal sample size; each subset is then chosen in turn as the test set, with the remaining subset used as the training set. The final result is obtained by averaging the predictions over the test subsets, and this process is repeated five times. DFHw/oEXP represents the DFH without any expert models involved. Specifically, we pretrain a student model using all labeled source data and unlabeled target data by optimizing Equation (3), and then directly use this pretrained student model for inference; that is, no expert models participate in model training in DFHw/oEXP. The results of the DFH and its six variants are listed in Table 3.

TABLE 3.

Classification performance of the DFH and its six variants on two target domains.

Method Target domain 1 (T 1) Target domain 2 (T 2)
AUC (%) ACC (%) F1 (%) SEN (%) SPE (%) PRE (%) AUC (%) ACC (%) F1 (%) SEN (%) SPE (%) PRE (%)
DFHw/oDA 50.67 ± 3.83 51.35 ± 2.10 55.00 ± 2.24 54.35 ± 4.30 47.71 ± 6.80 55.92 ± 2.27 52.29 ± 3.40 51.76 ± 4.05 55.27 ± 4.95 60.00 ± 7.33 43.51 ± 5.01 51.40 ± 3.38
DFHw/oMT 57.43 ± 1.82 55.23 ± 3.10 58.34 ± 11.08 62.12 ± 20.60 46.86 ± 21.31 59.90 ± 4.22 57.10 ± 3.78 56.22 ± 2.47 56.99 ± 8.54 61.08 ± 16.38 51.35 ± 17.45 56.63 ± 3.62
DFHw/oMSE 54.68 ± 2.00 55.74 ± 2.73 60.95 ± 3.55 63.29 ± 5.97 46.57 ± 4.83 58.96 ± 2.16 55.91 ± 2.60 55.41 ± 3.68 58.88 ± 2.37 63.78 ± 3.35 47.03 ± 8.13 54.86 ± 3.37
DFHw/oKLD 57.18 ± 1.03 56.52 ± 2.50 60.57 ± 9.88 65.65 ± 21.14 45.43 ± 20.98 60.11 ± 2.71 57.30 ± 2.56 56.76 ± 2.56 58.15 ± 9.28 63.78 ± 18.77 49.73 ± 16.63 56.22 ± 1.98
DFHw/oSRC 50.47 ± 2.31 52.24 ± 2.03 56.34 ± 5.31 59.10 ± 8.36 44.00 ± 7.42 57.23 ± 2.09 52.21 ± 1.09 50.95 ± 2.03 54.39 ± 7.31 62.43 ± 12.40 39.46 ± 9.72 49.75 ± 2.55
DFHw/oEXP 48.62 ± 4.38 49.16 ± 2.36 51.28 ± 6.36 50.35 ± 12.41 47.71 ± 15.92 54.38 ± 2.77 57.76 ± 3.47 54.86 ± 2.72 57.75 ± 4.11 62.43 ± 9.61 47.30 ± 10.71 54.43 ± 2.79
DFH (Ours) 57.23 ± 0.96 56.77 ± 2.61 63.09 ± 9.34 66.12 ± 20.48 45.43 ± 20.45 60.32 ± 3.29 57.30 ± 2.62 57.16 ± 2.16 60.12 ± 8.72 64.05 ± 18.06 50.27 ± 16.07 56.65 ± 1.85

Note: The best results are shown in bold.

From Table 3, we can see that DFHw/oDA is inferior to DFH in most cases. The underlying reason is that DFHw/oDA does not consider cross-site data heterogeneity; these results suggest the necessity of domain adaptation in multi-site research. Besides, the overall performance of DFHw/oMT is worse than that of DFH. A possible reason is that each expert pays more attention to its own target, so general domain knowledge cannot be well transferred from the target domains to the student model, causing performance degradation. In addition, we observe that our DFH with both $\mathcal{L}_{MSE}$ and $\mathcal{L}_{KLD}$ consistently outperforms DFHw/oMSE and DFHw/oKLD. The possible reason is that the full DFH performs feature adaptation from two perspectives, allowing the student model to better exploit unlabeled data and distill more common knowledge from the experts, which helps improve classification performance. On the other hand, DFHw/oSRC could in principle yield upper-bound classification performance for a specific target domain, but its results are not as promising as those of our DFH, possibly due to the limited sample size of a single target domain. Finally, DFH generally outperforms DFHw/oEXP, which suggests the necessity of involving the experts for effective cross-domain knowledge transfer.

4.5. Comparison with state‐of‐the‐art

We compare the results of DFH with several SOTA studies (Gallo et al., 2023; Shi et al., 2020) that also employ the REST-meta-MDD Consortium to differentiate MDD patients from healthy controls, with results reported in Table 4. Specifically, Shi et al. (2020) develop an SVM model with 19 episodic-memory-related imaging features and validate it on the largest site of the REST-meta-MDD Consortium (i.e., Site-20 with 533 subjects). As shown in Table 4, our DFH generally achieves superior classification performance compared with this study, suggesting the effectiveness of our method. Moreover, Gallo et al. (2023) identify the MDD group using two machine learning algorithms (i.e., linear and rbf SVMs) and one deep learning method (i.e., GCN) based on 1948 subjects from multiple sites. Even though the number of subjects in Gallo et al. (2023) is more than twice that in our work, our method still achieves much higher sensitivity and comparable F1 results.

TABLE 4.

Comparison with SOTA studies using rs‐fMRI data from the REST‐meta‐MDD (Yan et al., 2019) in MDD identification. T i: ith target domain.

Study Model Subject number AUC (%) ACC (%) F1 (%) SEN (%) SPE (%) PRE (%)
Shi et al. (2020) SVM 533 51.6 51.3 47.8 54.8 51.7
Gallo et al. (2023) Linear SVM 1948 60.38 60.57 60.21
Gallo et al. (2023) rbf SVM 1948 63.11 63.93 62.38
Gallo et al. (2023) GCN 1948 61.29 61.47 61.47
Ours on T 1 DFH 836 57.23 ± 0.96 56.77 ± 2.61 63.09 ± 9.34 66.12 ± 20.48 45.43 ± 20.45 60.32 ± 3.29
Ours on T 2 DFH 836 57.30 ± 2.62 57.16 ± 2.16 60.12 ± 8.72 64.05 ± 18.06 50.27 ± 16.07 56.65 ± 1.85

Note: The best results are shown in bold. “‐” denotes that the results are not reported in the original paper.

4.6. Model generalizability to other unseen sites

Besides the two sites from the REST-meta-MDD Consortium (Yan et al., 2019) used as target domains (i.e., Site-21 and Site-1), we also apply the generalized student model produced by our DFH to five other independent sites (i.e., Site-8, Site-9, Site-14, Site-15, and Site-25) to test its generalization capability. The generalized student model is built from the source domain (S) and two target domains (T1 and T2) in two steps. Specifically, we first pretrain the student model and each expert model individually using a classification loss and an adversarial loss, that is, Equation (3) for the student and Equation (6) for the two experts. Then, the pretrained student model and two expert models are jointly optimized via the deep collaborative learning module, which encourages effective cross-domain knowledge transfer. In this way, the finally derived student model is guided by the knowledge learned from the two experts. This generalized student model is used for diagnosis on the five independent sites; note that data from these five test sites are not used for model training. The demographic information of the studied subjects can be found online.1 The classification results are reported in Table 5, showing that our student model achieves satisfactory performance when generalized to unseen sites. This implies that the student model generated by our DFH can effectively distill multi-target domain knowledge and extract rich diagnosis-related information that adapts to new sites.

TABLE 5.

Results (%) of our proposed DFH on five unseen target/test sites from the REST‐meta‐MDD Consortium (Yan et al., 2019) in the task of MDD versus HC classification.

Metric Site‐8 Site‐9 Site‐14 Site‐15 Site‐25
AUC 61.72 ± 4.77 59.59 ± 4.36 59.39 ± 9.46 57.11 ± 4.92 55.94 ± 4.90
ACC 56.78 ± 3.23 54.20 ± 6.24 60.83 ± 4.91 54.80 ± 3.43 57.50 ± 1.42
F1 59.04 ± 6.47 57.68 ± 10.40 68.46 ± 8.33 56.14 ± 10.94 67.84 ± 3.87
SEN 66.00 ± 16.30 66.40 ± 20.88 67.19 ± 16.89 63.20 ± 23.03 77.98 ± 12.50
SPE 47.95 ± 17.31 42.00 ± 20.28 48.12 ± 20.12 46.40 ± 22.85 28.57 ± 17.42
PRE 55.73 ± 4.80 53.51 ± 5.02 72.98 ± 3.13 54.85 ± 3.07 61.10 ± 2.57

5. DISCUSSION

5.1. Influence of different losses

As mentioned in Section 3.2, feature alignment between the student and expert models is achieved via two loss functions, that is, $\mathcal{L}_{KLD}$ and $\mathcal{L}_{MSE}$. To further investigate their specific contributions, we vary the hyperparameters $\alpha$ and $\beta$ in Equations (8) and (9) within {1, 50, 100, 500, 1000, 5000} and {1, 5, 10, 50, 100, 200}, respectively, and record the corresponding results in Figure 3. From Figure 3a,b, it can be observed that with a fixed $\alpha$, the prediction results fluctuate with different $\beta$ values, and the best results are achieved with $\beta = 10$ on both target domains. It can be seen from Figure 3c,d that, given a fixed $\beta$, our proposed DFH yields stable results when $\alpha \in [1, 1000]$, but performance decreases dramatically on both targets when $\alpha$ is too large.

FIGURE 3. AUC and ACC results of the proposed DFH with a fixed α (a, b) and a fixed β (c, d) on two target domains (T1 and T2) for MDD versus HC classification.

5.2. Discriminative functional connectivity identification

As mentioned in Section 3.2, a model-specific learnable importance mask (i.e., $M^{st}$ for the student and $M^{e_i}$ for each expert) is applied to the GCN to automatically capture diagnosis-related functional connectivity features for each model. In Figure 4, we visualize the top 30 discriminative connectivities corresponding to the domain-generic student model (Figure 4a) and the two domain-specific expert models (Figure 4b,c). As shown in Figure 4a, the connectivities and associated ROIs highlighted in green are also identified by both expert models, suggesting that the student model is capable of extracting common characteristics shared by all domains. From Figure 4a, we can see that the shared connectivities include: STG.L–STG.R, THA.R–THA.L, PUT.L–PUT.R, LING.R–LING.L, LING.L–CAL.L, PCG.L–PCG.R, CUN.L–CUN.R, Vermis 7–Vermis 6, IFGoperc.R–IFGtriang.R, LING.R–CAL.L, ROL.L–PoCG.L, and ANG.R–INS.L. Among these shared connectivities, the top five discriminative brain regions are the superior temporal gyrus, thalamus, putamen, lingual gyrus, and calcarine fissure. These regions have been verified to play important roles in MDD diagnosis in previous studies (Cui et al., 2018; Cullen et al., 2009; Geng et al., 2018; Hu et al., 2019; Meng et al., 2014; Zou et al., 2016). Besides these common patterns shared by the three models, we observe that each expert captures unique knowledge that is not shared by the other models, highlighted in blue (see Figure 4b) and red (see Figure 4c), respectively. Specifically, the unique connections identified by the first expert are: PCL.L–PCL.R, CRBL10.L–CRBL10.R, IOG.R–LING.R, IOG.R–SOG.R, CUN.L–SOG.L, FFG.R–MOG.R, CRBL3.R–CRBL3.L, and CRBL7b.L–CRBLCrus2.L. The unique connections identified by the second expert are: PCG.L–ANG.R and SOG.L–CRBL6.L. The involved brain regions, such as the paracentral lobule, superior occipital gyrus, posterior cingulate gyrus, and angular gyrus, have also been reported in previous studies (Jiang et al., 2020; Scheepens et al., 2020; Yan et al., 2021; Yang et al., 2016). These findings suggest that the student model can extract common characteristics shared by all domains, while each expert can learn domain-specific knowledge, which further validates the reliability and rationality of our method.

FIGURE 4. Top 30 discriminative functional connectivities identified by the (a) Student (st), (b) Expert 1 (e1), and (c) Expert 2 (e2) models in DFH. Thicker lines indicate that the corresponding connectivities hold more discriminative power in automated MDD diagnosis. The connectivities and associated ROIs marked in green are identified as discriminative by all three models, suggesting that the student model is able to extract common characteristics shared by all domains. Moreover, each expert model captures its unique knowledge, highlighted in blue (e1) and red (e2), respectively.

5.3. Visualization of fMRI features

We employ t-SNE (Van der Maaten & Hinton, 2008) to visualize feature distributions before and after fMRI harmonization in Figure 5. Each subject's BOLD signal is represented by a 200 × 116 matrix with Γ = 200 time points and N = 116 ROIs. Before fMRI harmonization, we represent each subject by the flattened feature vector (dimension: 23,200). In DFH, the BOLD signals are fed into a GCN followed by a class label predictor, which is comprised of a series of convolutional blocks with channel numbers of 3200, 1024, 512, 64, and 1, respectively. After fMRI harmonization via DFH, the intermediate 64-dimensional feature is extracted to represent each subject. In Figure 5, different colors represent different domains, that is, green, blue, and magenta denote S, T1, and T2, respectively.
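A minimal sketch of this visualization step is given below; the feature arrays, domain labels, and color choices follow the description above, while the t-SNE hyperparameters are library defaults rather than the authors' settings.

```python
# Sketch of the t-SNE visualization of pre-/post-harmonization features of the three domains.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(features, domains, title):
    # features: (n_subjects, d) array; domains: array of strings in {'S', 'T1', 'T2'}.
    emb = TSNE(n_components=2, random_state=0).fit_transform(features)
    colors = {'S': 'green', 'T1': 'blue', 'T2': 'magenta'}
    for name, color in colors.items():
        idx = (domains == name)
        plt.scatter(emb[idx, 0], emb[idx, 1], s=8, c=color, label=name)
    plt.legend()
    plt.title(title)
    plt.show()
```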

FIGURE 5. t-SNE visualization of feature distributions of three domains (a) before fMRI harmonization and (b) after fMRI harmonization using the proposed DFH.

From Figure 5a, we observe that data from T2 are clearly separated from the other two domains, and that the data distributions of S and T1 are much closer than those of S and T2. This finding is consistent with the demographic distributions reported in Table 1; for instance, S and T1 are much more similar in terms of age and education. As illustrated in Figure 5b, after fMRI harmonization using DFH, samples from all domains are mixed and mapped into a common latent feature space, indicating the effectiveness of our DFH in reducing cross-site distribution differences.

5.4. Generalizability to another brain disease

To further verify the robustness and effectiveness of our DFH, we apply it to the automatic diagnosis of another brain disorder, that is, prodromal Alzheimer's disease, aiming to differentiate early mild cognitive impairment (eMCI) patients from those with late mild cognitive impairment (lMCI). Three different sites from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database2 are selected for model validation. Specifically, Site‐2 (with 24 eMCI and 32 lMCI) serves as the source domain, while Site‐31 (with 28 eMCI and 25 lMCI) and Site‐130 (with 30 eMCI and 19 lMCI) are treated as two target domains. The classification results of several competing methods and our method are reported in Figure 6. It can be observed from Figure 6 that the proposed DFH achieves superior performance compared with the competing methods in terms of AUC and ACC values on two target domains. Even with a limited sample size (only ~50 subjects in each domain) for model training, our DFH can still achieve good classification results. In future work, we will increase the sample size to further improve the performance and apply DFH to other applications.

FIGURE 6. Classification results of the DFH and several competing methods in eMCI versus lMCI classification task on two target domains. eMCI, early mild cognitive impairment; lMCI, late mild cognitive impairment.

5.5. Influence of different image acquisition settings

To further verify the robustness of our method to different choices of MRI scanners and acquisition parameters, we apply the DFH to two new target domains (i.e., Site-8 and Site-25) of the REST-meta-MDD Consortium (Yan et al., 2019), still employing Site-20 as the source domain. Note that fMRI data of the source domain were collected using a Siemens Tim Trio scanner, while data of the two target domains were collected using a GE Signa scanner (Site-8) and a Siemens Verio scanner (Site-25), respectively. The demographic information of the studied subjects and the MRI parameters are listed in Table 6. We perform MDD versus HC classification on these three domains using our DFH and the competing methods, with results reported in Figure 7. It can be observed from Figure 7 that the DFH generally achieves superior performance compared with the competing methods. Comparing the results in Figure 7 and Table 2, we can see that our DFH produces comparable results in the two settings (i.e., with different target sites), which implies that our DFH is not very sensitive to image acquisition settings.

TABLE 6.

Demographic information of the studied subjects from three sites (i.e., Site‐20, Site‐8, and Site‐25) in the REST‐meta‐MDD Consortium. Here, Site‐20 is used as the source domain, while Site‐8 and Site‐25 are used as two target domains.

Information Site‐20 (source) Site‐8 (target) Site‐25 (target)
No. MDD (M/F) 282 (99/183) 70 (23/47) 89 (21/68)
No. HC (M/F) 251 (87/164) 73 (30/43) 63 (29/34)
Education (years) 11.81 ± 3.92 11.50 ± 3.40 11.92 ± 3.04
Age (years) 39.16 ± 14.73 28.73 ± 11.09 67.27 ± 6.68
No. time points 232 190 230
MRI scanner Siemens Tim Trio 3 T GE Signa 3 T Siemens Verio 3 T
Receiver channel 12 8 12
TR (ms) 2000 2000 2000
TE (ms) 30 30 25
Flip angle 90 90 90
Thickness/Gap 3.0 mm/1.0 mm 3.0 mm/0 mm 4.0 mm/0 mm
Slice number 32 35 36
Voxel size 3.44 × 3.44 × 4.00 3.75 × 3.75 × 3.00 3.75 × 3.75 × 4.00
Field‐of‐view 220 × 220 240 × 240 240 × 240

Abbreviations: HC, healthy control; M/F, Male/Female; MDD, major depressive disorder; TE, echo time; TR, repetition time.

FIGURE 7. Classification results (AUC and ACC) of the proposed DFH and competing methods on two new target sites (i.e., Site-8 and Site-25), with Site-20 as the source domain.

5.6. Influence of training sample size

In the proposed DFH, all labeled source data (533 subjects) and unlabeled target data (155 subjects in T1 and 148 subjects in T2) in the REST-meta-MDD Consortium (Yan et al., 2019) are used for training. To investigate the influence of training sample size, we train the DFH using different percentages of labeled source data, that is, 20%, 40%, 60%, 80%, and 100%. For a fair comparison, we use all unlabeled target data for model training and inference in all experiments. The results of DFH for MDD versus HC classification are reported in Figure 8, from which we can see that the performance of DFH generally rises as the training size increases, with the best results achieved when using all source data. The underlying reason may be that more labeled source data help capture more representative fMRI features, thus encouraging more effective cross-domain feature alignment and improving performance.

FIGURE 8. AUC results of our DFH on two target domains (i.e., T1: Site-21; T2: Site-1), with different percentages of labeled source data for model training and Site-20 as the source domain.

5.7. Influence of FC matrix construction methods

As mentioned in Section 3.2, our proposed DFH uses a functional connectivity (FC) matrix constructed by PCC for graph learning. To investigate the influence of different FC matrix construction algorithms on the performance of the DFH, we employ two other methods, including (1) Spearman's rank correlation coefficient (SCC) (Spearman, 1961) and (2) sparse representation (SR) (Liu et al., 2009). For SCC, we first concatenate the fMRI time series of each ROI across all subjects (as DFH does), and then compute Spearman's rank correlation between each pair of ROIs. Compared with PCC, which measures linear relationships between ROIs, SCC assesses monotonic relationships (whether linear or not) and has been used in many fMRI-related studies (Göttlich et al., 2015; Meskaldji et al., 2015; Savva et al., 2019). For SR, we first concatenate the fMRI time series of each ROI across all subjects and then represent each ROI's features using the remaining ROIs' features based on (Liu et al., 2009). Compared with the fully connected FC matrix constructed by PCC, the SR-based matrix is sparse, which can help learn fMRI representations efficiently. We denote the DFH with SCC and SR for FC matrix construction as DFH-SCC and DFH-SR, respectively, while DFH uses the PCC algorithm. For a fair comparison, the three methods (i.e., DFH, DFH-SCC, and DFH-SR) employ the same network architecture, source and target data, optimization strategy, and parameters. The results of the three methods with different FC matrix construction algorithms are shown in Table 7.

TABLE 7.

Influence of different functional connectivity matrix construction algorithms used in the proposed DFH for MDD versus HC classification on two target domains (i.e., T 1: Site‐21; T 2: Site‐1), with Site‐20 as the source domain.

Target domain 1 (T 1) Target domain 2 (T 2)
Method AUC (%) ACC (%) F1 (%) SEN (%) SPE (%) PRE (%) AUC (%) ACC (%) F1 (%) SEN (%) SPE (%) PRE (%)
DFH‐SCC 56.84 ± 2.53 56.26 ± 2.14 62.66 ± 5.79 69.41 ± 16.52 40.29 ± 21.31 59.55 ± 4.31 58.25 ± 2.92 56.22 ± 0.66 59.65 ± 5.26 66.49 ± 13.58 45.95 ± 13.73 55.50 ± 1.68
DFH‐SR 57.83 ± 5.23 53.16 ± 3.73 53.54 ± 26.46 69.18 ± 39.86 33.71 ± 42.47 65.63 ± 17.55 52.51 ± 1.86 50.00 ± 0.85 48.57 ± 25.98 67.30 ± 41.68 32.70 ± 41.24 40.38 ± 20.21
DFH 57.23 ± 0.96 56.77 ± 2.61 63.09 ± 9.34 66.12 ± 20.48 45.43 ± 20.45 60.32 ± 3.29 57.30 ± 2.62 57.16 ± 2.16 60.12 ± 8.72 64.05 ± 18.06 50.27 ± 16.07 56.65 ± 1.85

It can be observed from Table 7 that DFH‐SCC achieves classification results comparable to those of DFH with PCC, suggesting that rank‐based correlation can serve as an effective functional connectivity measure. Moreover, DFH‐SR performs worse than DFH with PCC in most cases, implying that the sparse FC matrix may discard some discriminative ROI‐level fMRI information needed for diagnosis. As future work, we will explore how to represent the fully connected FC matrix in a sparse yet more effective manner; one simple baseline is sketched below.
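As a simple reference point (not the method we plan to develop), proportional thresholding keeps only the strongest connections of the fully connected PCC matrix; the sketch below illustrates this option, with an arbitrarily chosen keep ratio.

```python
# Minimal sketch (one simple option, arbitrary keep ratio): proportional
# thresholding of a fully connected PCC matrix.
import numpy as np

def proportional_threshold(fc, keep_ratio=0.2):
    """Zero out all but the top `keep_ratio` fraction of off-diagonal |FC| values."""
    fc = fc.copy()
    np.fill_diagonal(fc, 0.0)
    strengths = np.abs(fc[np.triu_indices_from(fc, k=1)])
    cutoff = np.quantile(strengths, 1.0 - keep_ratio)
    fc[np.abs(fc) < cutoff] = 0.0
    return fc

# Example: sparsify a random 116 x 116 correlation matrix.
rng = np.random.default_rng(0)
fc_dense = np.corrcoef(rng.standard_normal((116, 200)))
fc_sparse = proportional_threshold(fc_dense, keep_ratio=0.2)
print("non-zero off-diagonal entries kept:", np.count_nonzero(fc_sparse))
```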

5.8. Limitations and future work

In this section, we discuss several limitations of the current work and potential future research directions. First, we only utilize functional MRI data, without taking advantage of structural MRI and/or non‐imaging information (e.g., demographic and genetic data). It would be interesting to combine complementary features from multi‐modality data to further boost classification performance. Second, we construct the functional connectivity matrix based on a single brain template (i.e., AAL) with 116 pre‐defined ROIs. In the future, multi‐scale brain atlases with different partition schemes will be taken into consideration, as their coarse‐to‐fine information may help improve MDD identification accuracy. Third, the current work investigates cross‐site fMRI harmonization based on only one labeled source domain and two target domains; we will explore multi‐source settings to enrich data diversity and improve model robustness. Furthermore, the current framework effectively handles two target domains; for problems with more than two target domains, it would be interesting to employ incremental learning strategies (Geng et al., 2020) that avoid excessive computational burden. In addition, the overall performance of our DFH and the competing methods is still modest, possibly due to the limited sample size used in this study. In the future, we will employ unsupervised contrastive learning to pretrain the feature extractor using all 2428 subjects in the REST‐meta‐MDD Consortium to extract more informative fMRI features, and we will also utilize fMRI data of other brain disorders (e.g., autism spectrum disorder, cognitive impairment) to pretrain the GCN‐based feature extractor, which may further improve its generalizability.

6. CONCLUSION

In this article, we propose an unsupervised dual‐expert fMRI harmonization (DFH) framework for MDD identification. The DFH consists of a domain‐generic student model and two domain‐specific expert models, which are collaboratively learned to perform knowledge distillation from a single labeled source domain and two unlabeled target domains. The resulting student model has strong generalizability: it identifies discriminative functional connectivities shared by all domains and performs well even on new target domains that are not involved in model training. Experiments on 836 subjects from three sites suggest the superiority of our method in MDD diagnosis. Additionally, our method extends well to another brain disease, that is, prodromal Alzheimer's disease, demonstrating its robustness and generalizability. Our method is also interpretable, detecting abnormal brain functional connectivities that may help facilitate fMRI‐based MDD analysis in clinical practice.

CONFLICT OF INTEREST STATEMENT

The authors have no conflict of interest to disclose.

ACKNOWLEDGMENTS

This work was supported by NIH grants RF1AG073297 and R01MH108560.

Fang, Y. , Potter, G. G. , Wu, D. , Zhu, H. , & Liu, M. (2023). Addressing multi‐site functional MRI heterogeneity through dual‐expert collaborative learning for brain disease identification. Human Brain Mapping, 44(11), 4256–4271. 10.1002/hbm.26343

Footnotes

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are openly available in REST‐meta‐MDD at https://www.pnas.org/doi/full/10.1073/pnas.1900390116, and Alzheimer's Disease Neuroimaging Initiative (ADNI) at https://doi.org/10.1002/jmri.21049/.

REFERENCES

1. Bell, T. K., Godfrey, K. J., Ware, A. L., Yeates, K. O., & Harris, A. D. (2022). Harmonization of multi‐site MRS data with ComBat. NeuroImage, 257, 119330.
2. Cui, X., Liu, F., Chen, J., Xie, G., Wu, R., Zhang, Z., Chen, H., Zhao, J., & Guo, W. (2018). Voxel‐wise brain‐wide functional connectivity abnormalities in first‐episode, drug‐naive patients with major depressive disorder. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics, 177(4), 447–453.
3. Cullen, K. R., Gee, D. G., Klimes‐Dougan, B., Gabbay, V., Hulvershorn, L., Mueller, B. A., Camchong, J., Bell, C. J., Houri, A., Kumra, S., Lim, K. O., Castellanos, F. X., & Milham, M. P. (2009). A preliminary study of functional connectivity in comorbid adolescent depression. Neuroscience Letters, 460(3), 227–231.
4. Dichter, G. S., Gibbs, D., & Smoski, M. J. (2015). A systematic review of relations between resting‐state functional‐MRI and treatment response in major depressive disorder. Journal of Affective Disorders, 172, 8–17.
5. Fan, T., Hu, Y., Xin, J., Zhao, M., & Wang, J. (2020). Analyzing the genes and pathways related to major depressive disorder via a systems biology approach. Brain and Behavior, 10(2), e01502.
6. Fortin, J.‐P., Cullen, N., Sheline, Y. I., Taylor, W. D., Aselcioglu, I., Cook, P. A., Adams, P., Cooper, C., Fava, M., McGrath, P. J., McInnis, M., Phillips, M. L., Trivedi, M. H., Weissman, M. M., & Shinohara, R. T. (2018). Harmonization of cortical thickness measurements across scanners and sites. NeuroImage, 167, 104–120.
7. Gadgil, S., Zhao, Q., Pfefferbaum, A., Sullivan, E. V., Adeli, E., & Pohl, K. M. (2020). Spatio‐temporal graph convolution for resting‐state fMRI analysis. In International Conference on Medical Image Computing and Computer‐Assisted Intervention. Springer, pp. 528–538.
8. Gallo, S., El‐Gazzar, A., Zhutovsky, P., Thomas, R. M., Javaheripour, N., Li, M., Bartova, L., Bathula, D., Dannlowski, U., Davey, C., et al. (2023). Functional connectivity signatures of major depressive disorder: Machine learning analysis of two multicenter neuroimaging studies. Molecular Psychiatry, 1–10.
9. Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M., & Lempitsky, V. (2016). Domain‐adversarial training of neural networks. The Journal of Machine Learning Research, 17(1), 2096–2030.
10. Gao, Y., Zhang, Y., Cao, Z., Guo, X., & Zhang, J. (2019). Decoding brain states from fMRI signals by using unsupervised domain adaptation. IEEE Journal of Biomedical and Health Informatics, 24(6), 1677–1685.
11. Geng, C., Huang, S.‐J., & Chen, S. (2020). Recent advances in open set recognition: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(10), 3614–3631.
12. Geng, X., Xu, J., Liu, B., & Shi, Y. (2018). Multivariate classification of major depressive disorder using the effective connectivity and functional connectivity. Frontiers in Neuroscience, 12, 38.
13. Göttlich, M., Beyer, F., & Krämer, U. M. (2015). BASCO: A toolbox for task‐related functional connectivity. Frontiers in Systems Neuroscience, 9, 126.
14. Hu, L., Xiao, M., Ai, M., Wang, W., Chen, J., Tan, Z., Cao, J., & Kuang, L. (2019). Disruption of resting‐state functional connectivity of right posterior insula in adolescents and young adults with major depressive disorder. Journal of Affective Disorders, 257, 23–30.
15. Jack, C. R., Jr., Bernstein, M. A., Fox, N. C., Thompson, P., Alexander, G., Harvey, D., Borowski, B., Britson, P. J., Whitwell, J. L., Ward, C., et al. (2008). The Alzheimer's disease neuroimaging initiative (ADNI): MRI methods. Journal of Magnetic Resonance Imaging: An Official Journal of the International Society for Magnetic Resonance in Medicine, 27(4), 685–691.
16. Jiang, X., Fu, S., Yin, Z., Kang, J., Wang, X., Zhou, Y., Wei, S., Wu, F., Kong, L., Wang, F., & Tang, Y. (2020). Common and distinct neural activities in frontoparietal network in first‐episode bipolar disorder and major depressive disorder: Preliminary findings from a follow‐up resting state fMRI study. Journal of Affective Disorders, 260, 653–659.
17. Jin, Y., Wang, X., Long, M., & Wang, J. (2020). Minimum class confusion for versatile domain adaptation. In European Conference on Computer Vision. Springer, pp. 464–480.
18. Johnson, W. E., Li, C., & Rabinovic, A. (2007). Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics, 8(1), 118–127.
19. Jun, E., Na, K.‐S., Kang, W., Lee, J., Suk, H.‐I., & Ham, B.‐J. (2020). Identifying resting‐state effective connectivity abnormalities in drug‐naive major depressive disorder diagnosis via graph convolutional networks. Human Brain Mapping, 41(17), 4997–5014.
20. Lewinsohn, P. M., Rohde, P., & Seeley, J. R. (1998). Major depressive disorder in older adolescents: Prevalence, risk factors, and clinical implications. Clinical Psychology Review, 18(7), 765–794.
21. Liu, J., Ji, S., & Ye, J. (2009). SLEP: Sparse learning with efficient projections. Arizona State University, 6(491), 7.
22. Lohoff, F. W. (2010). Overview of the genetics of major depressive disorder. Current Psychiatry Reports, 12(6), 539–546.
23. Luo, L., Wu, H., Xu, J., Chen, F., Wu, F., Wang, C., & Wang, J. (2021). Abnormal large‐scale resting‐state functional networks in drug‐free major depressive disorder. Brain Imaging and Behavior, 15(1), 96–106.
24. Ma, Z., Li, R., Yu, J., He, Y., & Li, J. (2013). Alterations in regional homogeneity of spontaneous brain activity in late‐life subthreshold depression. PLoS One, 8(1), e53148.
25. Meng, C., Brandl, F., Tahmasian, M., Shao, J., Manoliu, A., Scherr, M., Schwerthöffer, D., Bäuml, J., Förstl, H., Zimmer, C., Wohlschläger, A. M., Riedl, V., & Sorg, C. (2014). Aberrant topology of striatum's connectivity is associated with the number of episodes in depression. Brain, 137(2), 598–609.
26. Meskaldji, D.‐E., Morgenthaler, S., & Van De Ville, D. (2015). New measures of brain functional connectivity by temporal analysis of extreme events. In 2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI). IEEE, pp. 26–29.
27. Nguyen‐Meidine, L. T., Belal, A., Kiran, M., Dolz, J., Blais‐Morin, L.‐A., & Granger, E. (2021). Unsupervised multi‐target domain adaptation through knowledge distillation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 1339–1347.
28. Noman, F., Ting, C.‐M., Kang, H., Phan, R. C.‐W., Boyd, B. D., Taylor, W. D., & Ombao, H. (2021). Graph autoencoders for embedding learning in brain networks and major depressive disorder identification. arXiv Preprint arXiv:2107.12838.
29. Peng, X., Huang, Z., Sun, X., & Saenko, K. (2019). Domain agnostic learning with disentangled representations. In International Conference on Machine Learning. PMLR, pp. 5102–5112.
30. Ramasubbu, R., Brown, M. R., Cortese, F., Gaxiola, I., Goodyear, B., Greenshaw, A. J., Dursun, S. M., & Greiner, R. (2016). Accuracy of automated classification of major depressive disorder as a function of symptom severity. NeuroImage: Clinical, 12, 320–331.
31. Rosa, M. J., Portugal, L., Hahn, T., Fallgatter, A. J., Garrido, M. I., Shawe‐Taylor, J., & Mourao‐Miranda, J. (2015). Sparse network‐based models for patient classification using fMRI. NeuroImage, 105, 493–506.
32. Rozantsev, A., Salzmann, M., & Fua, P. (2018). Beyond sharing weights for deep domain adaptation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(4), 801–814.
33. Savva, A. D., Mitsis, G. D., & Matsopoulos, G. K. (2019). Assessment of dynamic functional connectivity in resting‐state fMRI using the sliding window technique. Brain and Behavior, 9(4), e01255.
34. Scheepens, D. S., van Waarde, J. A., Lok, A., de Vries, G., Denys, D. A., & van Wingen, G. A. (2020). The link between structural and functional brain abnormalities in depression: A systematic review of multimodal neuroimaging studies. Frontiers in Psychiatry, 11, 485.
35. Sen, B., Cullen, K. R., & Parhi, K. K. (2021). Classification of adolescent major depressive disorder via static and dynamic connectivity. IEEE Journal of Biomedical and Health Informatics, 25, 2604–2614.
36. Shi, C., Xin, X., & Zhang, J. (2021). Domain adaptation using a three‐way decision improves the identification of autism patients from multisite fMRI data. Brain Sciences, 11(5), 603.
37. Shi, Y., Wang, Z., Chen, P., Cheng, P., Zhao, K., Zhang, H., Shu, H., Gu, L., Gao, L., Wang, Q., et al. (2020). Episodic memory‐related imaging features as valuable biomarkers for the diagnosis of Alzheimer's disease: A multicenter study based on machine learning. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 8(2), 171–180.
38. Shi, Y., Zhang, L., Wang, Z., Lu, X., Wang, T., Zhou, D., & Zhang, Z. (2021). Multivariate machine learning analyses in identification of major depressive disorder using resting‐state functional connectivity: A multicentral study. ACS Chemical Neuroscience, 12(15), 2878–2886.
39. Spearman, C. (1961). The proof and measurement of association between two things.
40. Tarvainen, A., & Valpola, H. (2017). Mean teachers are better role models: Weight‐averaged consistency targets improve semi‐supervised deep learning results. Advances in Neural Information Processing Systems, 30, 1195–1204.
41. Tian, D., Zeng, Z., Sun, X., Tong, Q., Li, H., He, H., Gao, J.‐H., He, Y., & Xia, M. (2022). A deep learning‐based multisite neuroimage harmonization framework established with a traveling‐subject dataset. NeuroImage, 257, 119297.
42. Tzeng, E., Hoffman, J., Saenko, K., & Darrell, T. (2017). Adversarial discriminative domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7167–7176.
43. Van der Maaten, L., & Hinton, G. (2008). Visualizing data using t‐SNE. Journal of Machine Learning Research, 9(11), 2579–2605.
44. Wang, M., Zhang, D., Huang, J., Yap, P.‐T., Shen, D., & Liu, M. (2019). Identifying autism spectrum disorder with multi‐site fMRI via low‐rank domain adaptation. IEEE Transactions on Medical Imaging, 39(3), 644–655.
45. Yan, C., & Zang, Y. (2010). DPARSF: A MATLAB toolbox for "pipeline" data analysis of resting‐state fMRI. Frontiers in Systems Neuroscience, 4, 13.
46. Yan, C.‐G., Chen, X., Li, L., Castellanos, F. X., Bai, T.‐J., Bo, Q.‐J., Cao, J., Chen, G.‐M., Chen, N.‐X., Chen, W., Cheng, C., Cheng, Y. Q., Cui, X. L., Duan, J., Fang, Y. R., Gong, Q. Y., Guo, W. B., Hou, Z. H., Hu, L., … Zang, Y. F. (2019). Reduced default mode network functional connectivity in patients with recurrent major depressive disorder. Proceedings of the National Academy of Sciences, 116(18), 9078–9083.
47. Yan, M., He, Y., Cui, X., Liu, F., Li, H., Huang, R., Tang, Y., Chen, J., Zhao, J., Xie, G., & Guo, W. (2021). Disrupted regional homogeneity in melancholic and non‐melancholic major depressive disorder at rest. Frontiers in Psychiatry, 12, 124.
48. Yang, R., Gao, C., Wu, X., Yang, J., Li, S., & Cheng, H. (2016). Decreased functional connectivity to posterior cingulate cortex in major depressive disorder. Psychiatry Research: Neuroimaging, 255, 15–23.
49. Yildirim, M., Gaynes, B. N., Keskinocak, P., Pence, B. W., & Swann, J. (2022). DIP: Natural history model for major depression with incidence and prevalence. Journal of Affective Disorders, 296, 498–505.
50. Yoshida, K., Shimizu, Y., Yoshimoto, J., Takamura, M., Okada, G., Okamoto, Y., Yamawaki, S., & Doya, K. (2017). Prediction of clinical depression scores and detection of changes in whole‐brain using resting‐state functional MRI data with partial least squares regression. PLoS One, 12(7), e0179638.
51. Zou, K., Gao, Q., Long, Z., Xu, F., Sun, X., Chen, H., & Sun, X. (2016). Abnormal functional connectivity density in first‐episode, drug‐naive adult patients with major depressive disorder. Journal of Affective Disorders, 194, 153–158.
