Abstract
In practice, collecting auxiliary labeled data with the same feature space from multiple domains is difficult. We therefore focus on heterogeneous transfer learning to address the problem of insufficient sample sizes in neuroimaging. Viewing subjects, time, and features as dimensions, brain activation and dynamic functional connectivity data can be treated as high-order heterogeneous data, with the heterogeneity arising from their distinct feature spaces. To use heterogeneous prior knowledge from low-dimensional brain activation data to improve the classification performance on high-dimensional dynamic functional connectivity data, we propose a tensor dictionary-based heterogeneous transfer learning framework. It combines supervised tensor dictionary learning with heterogeneous transfer learning to enhance high-order heterogeneous knowledge sharing. The former encodes the underlying discriminative features of high-order data into dictionaries, while the latter transfers the heterogeneous knowledge encoded in the dictionaries through a feature transformation derived from the mathematical relationship between the domains. The primary focus of this paper is gender classification using fMRI data to identify emotion-related brain gender differences during adolescence. Additionally, experiments on simulated data and EEG data are included to demonstrate the generalizability of the proposed method. Experimental results indicate that incorporating prior knowledge significantly enhances classification performance. Further analysis of brain gender differences suggests that temporal variability in brain activity explains differences in emotion regulation strategies between genders. By adopting the heterogeneous knowledge sharing strategy, the proposed framework can capture the multifaceted characteristics of the brain, improve the generalization of the model, and reduce training costs. Understanding the gender-specific neural mechanisms of emotional cognition helps to develop gender-specific treatments for neurological diseases.
Keywords: Transfer learning, Tensor dictionary learning, Gender differences, Emotion processing
1. Introduction
Machine learning has proven extremely successful in neuroscience owing to its ability to decode brain activity, understand brain connectivity, diagnose brain disorders, and develop personalized treatment (Nielsen, Barch, Petersen, Schlaggar, & Greene, 2020). Traditional machine learning typically requires a substantial amount of labeled data for effective model training, and insufficient data can compromise learning ability and yield less reliable outcomes (Niu, Liu, Wang, & Song, 2020). Sample sizes in neuroscience are usually small because of the complexity of data acquisition, participant restrictions, difficulties in data processing and storage, and ethical and privacy issues. Transfer learning has emerged as a way to address this data scarcity: it emphasizes the effective use of prior knowledge from one or more source domains to improve performance in a related target domain (Aldayel, Ykhlef, & Al-Nafjan, 2020). Based on the type of knowledge transferred, transfer learning can be divided into instance-based, feature-based, parameter-based, relational-based, and hybrid-based approaches (Zhuang et al., 2021). Instance-based approaches rely on instance weighting strategies. Feature-based approaches learn a new feature representation from the original data. Parameter-based approaches encode knowledge into model parameters for transfer. Relational-based approaches transfer the logical relationships or rules acquired in the source domain to the target domain. Hybrid-based approaches combine any of the four approaches above. Additionally, transfer learning can be categorized as homogeneous or heterogeneous according to the consistency of the feature and label spaces between the source and target domains (Day & Khoshgoftaar, 2017). In homogeneous transfer learning, the two domains share the same feature and label spaces, and the aim is to bridge the gap between their data distributions. In heterogeneous transfer learning, the feature spaces and/or label spaces of the two domains differ, so knowledge transfer requires not only narrowing the difference in data distributions but also transforming the feature and/or label spaces.
In recent neuroscience research, homogeneous transfer learning with feature-based and parameter-based approaches has been applied to exploit multi-site data and cope with data scarcity. For example, transfer learning within the Gaussian graphical model framework has been introduced for brain connectivity analysis, allowing auxiliary information to further enhance graphical learning accuracy (He, Li, Hu, & Liu, 2022). In Yousefnezhad, Selvitella, Zhang, Greenshaw, and Greiner (2020), a shared space transfer learning method was introduced to improve multi-voxel pattern analysis with multi-site functional magnetic resonance imaging (fMRI) datasets. To analyze a primary dataset with a limited number of samples, dictionary learning-based transfer learning was used, in which a model is learned jointly with secondary fMRI datasets (Zhang, Chen, & Ramadge, 2018). In Ni, Gu, and Jiang (2023), transfer discriminative dictionary learning was proposed for electroencephalography (EEG) signal classification; it uses label information from the source domain samples and a few labeled samples from the target domain to learn dictionaries in the two domains concurrently. For cross-domain EEG signal classification, geometry-preserving transfer discriminative dictionary learning was developed, which transfers the discriminative information learned in the source domain to the target domain (Gu, Shen, Qu, & Ni, 2022). To enhance classification performance in a target domain with limited labeled samples, a semi-supervised transfer learning approach was introduced for the diagnosis of mental disorders using fMRI data (Hu et al., 2023).
However, the methods mentioned above address only scenarios in which the source and target domains share the same feature space, which greatly restricts their applicability. Compared with homogeneous transfer learning, heterogeneous transfer learning can acquire more comprehensive and diverse prior information from different domains or modalities, promoting knowledge sharing across fields and improving the generalization of the model in the target domain. Heterogeneous transfer learning is more challenging because of the significant differences between the source and target domains, which make it difficult to transfer heterogeneous features from the source domain to the target domain. In Mignone, Pio, and Ceci (2024), a distributed heterogeneous transfer learning model was presented that can process large source and target datasets, work in the positive-unlabeled (PU) learning setting, and align heterogeneous feature spaces. To enhance the interpretability of the transfer process and uncover hidden information between the two domains, heterogeneous feature transfer using fuzzy inference rules was proposed (Lou et al., 2024). Both Lou et al. (2024) and Mignone et al. (2024) focus on transfer learning for data with vector-based features, which limits their ability to transfer prior knowledge from higher-order data. In Zhan, Wang, Ren, and Huang (2022), the multi-band data stitching with label alignment and tangent space mapping algorithm was proposed for handling multivariate time series. It addresses cases in which the source and target domains share the same feature space but have different label spaces. In Aggarwal et al. (2024), a federated learning model that combines a convolutional neural network with transfer learning was proposed for classifying brain tumors in heterogeneously distributed images. It is therefore tailored to brain image classification. To effectively process multisource data represented as heterogeneous tensors, the heterogeneous support tensor machine was proposed to directly separate samples in heterogeneous tensor space (Gao, Gu, Chen, & Zhou, 2023). However, it focuses on tensor data that differ in feature dimensions while sharing the same feature attributes.
Unlike the heterogeneous transfer learning methods above, we focus on heterogeneous higher-order feature transfer for capturing brain differences using brain activation (BA) and dynamic functional connectivity (dFC) data. BA is a multivariate time series that reflects the neural activity of brain regions over a period of time, whereas dFC is the dynamic correlation among these variables and reflects the time-varying dependencies between pairs of brain regions. The two are thus heterogeneous in both feature dimensions and feature attributes, but the association between them can be used for knowledge transfer through relational-based approaches (Tomasi & Volkow, 2018). Whereas dFC data have high-dimensional features and a small sample size, BA data have low-dimensional features, which are more easily processed by traditional machine learning methods. Motivated by these properties, we use the relationship between the two to transfer prior knowledge obtained from BA data to dFC data, aiming for more accurate results and lower computational complexity in dFC analysis. In this way, we can enhance the learning ability in dFC analysis, capture the multifaceted characteristics of the brain, and reduce training and computational resource costs.
Because dictionary learning provides an intuitive representation of data structure and features and offers strong interpretability, it is often employed to uncover recurring brain patterns from BA and dFC data (Jin, Dontaraju, Kim, Akhonda, & Adali, 2020; Qiao, Yang, Calhoun, Xu, & Wang, 2021). Viewing subjects, time, and features as dimensions, both BA and dFC data can be arranged as high-order data (i.e., tensors). However, conventional vector-based dictionary learning can lose information or disrupt the underlying structural relationships when applied to high-order data. It therefore fails to deal effectively with high-order data that possess multi-dimensional structure. Tensor dictionary learning is instead employed to discover BA and dFC patterns from such data, because it can extract relevant features and represent high-order data sparsely while preserving the multi-dimensional structure. Moreover, prior knowledge can be encoded into the model parameters through tensor dictionary learning and then transferred by parameter-based approaches. To use high-order heterogeneous data (i.e., BA and dFC data) for brain analysis, we propose tensor dictionary-based heterogeneous transfer learning (TD-HTL). It combines tensor dictionary learning with both relational-based and parameter-based approaches for heterogeneous transfer learning. Specifically, BA and dFC data are considered the source and target domains, respectively. Supervised tensor dictionary learning is first used to decompose the source data into dictionaries and sparse representation tensors. Then, the dictionaries learned in the source domain undergo a feature transformation guided by a specific relationship between the two domains. Treating the transformed dictionaries as the corresponding dictionaries in the target domain, sparse representation tensors can then be obtained in the target domain.
In this paper, we analyze emotion-task fMRI data from the Philadelphia Neurodevelopmental Cohort (PNC) dataset. BA and dFC data were derived for the analysis of emotion-related brain gender differences during adolescence. Gender bias in brain disorders has been reported across various diseases, and gender-specific treatments may have an impact on neuropsychiatric disorders (Kundakovic & Tickerhoof, 2023). Previous reports suggest that emotion processing differs between genders, which may contribute to the varying prevalence of some psychopathological conditions between males and females (Zhang, Dougherty, Baum, White, & Michael, 2018). Consequently, understanding the gender-specific neural mechanisms of emotional cognition helps to develop gender-specific treatments for neurological diseases (İçer, Acer, & Baş, 2020; Küchenhoff et al., 2024). Additionally, it can shed light on the ways in which male and female brains are structurally and functionally distinct, which can refine our understanding of brain function and inform cognitive training and educational strategies (Derks & Krabbendam, 2013). Moreover, compared with analyzing brain gender differences from BA or dFC data alone, combining the two can provide complementary information, including neuronal activity, functional connectivity, and their associations, to enhance the understanding of the brain. When TD-HTL is applied to gender classification, the results indicate that utilizing prior knowledge from the source domain improves performance in the target domain. Furthermore, brain analysis shows that both genders exhibit a highly segregated and densely integrated network structure during emotion identification tasks, with gender differences primarily manifesting in temporal variability and transition patterns over time.
The main contributions are summarized as follows:
(1) Viewing subjects, time, and features as dimensions, BA and dFC data can be arranged as high-order data (i.e., tensors). However, in many BA or dFC analyses, the multi-dimensional structure of the data is overlooked, potentially leading to the loss of valuable information or the disruption of underlying structural relationships. To address this issue, we employ supervised tensor dictionary learning. It fully leverages the label information to extract the underlying discriminative features effectively and represents high-order data sparsely while preserving the multi-dimensional structure.
(2) In dFC analyses based on dictionary learning methods, the dictionary is typically learned from dFC data with high-dimensional features and a small sample size. This often results in poor dictionary learning, lengthy training times, and high computational resource requirements. To address this issue, we propose a heterogeneous transfer strategy that improves the classification performance on dFC data by utilizing the prior knowledge learned from low-dimensional BA data. Heterogeneous knowledge sharing across BA and dFC data helps to better capture the multifaceted characteristics of the brain, improve generalization, and reduce training costs.
(3) In conjunction with brain activation data, dynamic functional connectivity data not only reflect the functional connections associated with emotion-related gender differences but also reveal the brain activity of the regions corresponding to these functional connections. Our analysis suggests that brain regions with high functional connectivity are more likely to be activated during emotion tasks for both females and males.
2. Tensor basics
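A scalar is a zeroth-order tensor, denoted by lowercase letters (e.g., $a$). A vector is a first-order tensor, represented by bold lowercase letters (e.g., $\mathbf{a}$). A matrix is a second-order tensor, denoted by bold capital letters (e.g., $\mathbf{A}$). An $N$-dimensional array is an $N$th-order tensor, represented by bold Euler script letters (e.g., $\mathcal{X}$). Fibers are high-order analogues of matrix rows and columns, represented as column vectors obtained by fixing all but one index of a tensor. Slices are two-dimensional sections of a tensor, resulting from fixing all but two indices. For a third-order tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$, the $(i, j, k)$th element is denoted by $x_{ijk}$. Its column, row, and tube fibers are denoted by $\mathbf{x}_{:jk}$, $\mathbf{x}_{i:k}$, and $\mathbf{x}_{ij:}$, respectively. The horizontal, lateral, and frontal slices are denoted by $\mathbf{X}_{i::}$, $\mathbf{X}_{:j:}$, and $\mathbf{X}_{::k}$, respectively. Below are the tensor definitions and operations relevant to the proposed framework.
- Tensor Norm: For an $N$th-order tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, its norm is defined by
$$\|\mathcal{X}\| = \sqrt{\sum_{i_1=1}^{I_1} \sum_{i_2=1}^{I_2} \cdots \sum_{i_N=1}^{I_N} x_{i_1 i_2 \cdots i_N}^{2}} \tag{1}$$
- Matricization: The most common way to matricize a tensor is the mode-$n$ matricization, which arranges the mode-$n$ fibers to be the columns of the resulting matrix. For an $N$th-order tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, its mode-$n$ matricization $\mathbf{X}_{(n)}$ is an $I_n \times \prod_{k \neq n} I_k$ matrix with each column being one of the mode-$n$ fibers.
- Tensor Multiplication: The $n$-mode product is a type of tensor contraction that involves multiplying a tensor by a matrix along mode $n$. The $n$-mode product of an $N$th-order tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ with a matrix $\mathbf{U} \in \mathbb{R}^{J \times I_n}$ is defined by
$$(\mathcal{X} \times_n \mathbf{U})_{i_1 \cdots i_{n-1}\, j\, i_{n+1} \cdots i_N} = \sum_{i_n=1}^{I_n} x_{i_1 i_2 \cdots i_N}\, u_{j i_n} \tag{2}$$
In other words, the $n$-mode product multiplies each mode-$n$ fiber by the matrix $\mathbf{U}$. Thus, the $n$-mode product can be expressed as
$$\mathcal{Y} = \mathcal{X} \times_n \mathbf{U} \;\Longleftrightarrow\; \mathbf{Y}_{(n)} = \mathbf{U} \mathbf{X}_{(n)} \tag{3}$$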
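The matricization and $n$-mode product described above can be illustrated with a minimal NumPy sketch. The function names `unfold`, `fold`, and `mode_product` are hypothetical helpers introduced here for illustration only, modes are 0-indexed as in NumPy, and the column ordering of the unfolding is one common convention rather than necessarily the one used by the authors.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n matricization: the mode-n fibers become the columns."""
    # Bring the chosen mode to the front, then flatten the remaining modes.
    # order="F" lets the earlier remaining modes vary fastest.
    moved = np.moveaxis(tensor, mode, 0)
    return np.reshape(moved, (tensor.shape[mode], -1), order="F")

def fold(matrix, mode, shape):
    """Inverse of unfold for a tensor whose full shape is `shape`."""
    full = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(np.reshape(matrix, full, order="F"), 0, mode)

def mode_product(tensor, matrix, mode):
    """n-mode product: multiply every mode-n fiber by `matrix`."""
    new_shape = list(tensor.shape)
    new_shape[mode] = matrix.shape[0]
    # Uses the identity Y_(n) = U X_(n), then folds the result back.
    return fold(matrix @ unfold(tensor, mode), mode, new_shape)

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4, 5))   # third-order tensor
U = rng.standard_normal((2, 4))      # acts on the second mode (index 1)
Y = mode_product(X, U, mode=1)       # resulting shape: (3, 2, 5)

# The tensor norm is the square root of the sum of squared entries.
norm_X = np.sqrt(np.sum(X ** 2))
```

Computing the product through the unfolding keeps the code close to the matricized identity; an equivalent `np.einsum` contraction would avoid the explicit reshape.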
3. Methodology
In this section, we first introduce unsupervised and supervised tensor dictionary learning. Treating BA data as the source domain and dFC data as the target domain, we then derive the feature transformation from the source domain to the target domain based on the relationship between them. Building on this feature transformation, we present the proposed TD-HTL framework.
3.1. Unsupervised tensor dictionary learning
In data denoising, the goal is to separate the noise from the observed data while retaining significant features. In data decomposition, the goal is to decompose the data into meaningful components, facilitating tasks such as feature extraction and pattern recognition. To achieve both, a dictionary $\mathbf{D} \in \mathbb{R}^{I \times M}$ and sparse representations $\mathbf{s}_i \in \mathbb{R}^{M}$ are learned from samples $\mathbf{x}_i \in \mathbb{R}^{I}$ such that $\mathbf{x}_i \approx \mathbf{D}\mathbf{s}_i$ for all $i = 1, \ldots, n$. Traditional dictionary learning minimizes the reconstruction error subject to a sparsity constraint on the sparse representations:
$$\min_{\mathbf{D}, \{\mathbf{s}_i\}} \sum_{i=1}^{n} \|\mathbf{x}_i - \mathbf{D}\mathbf{s}_i\|_2^2 \quad \text{s.t.} \quad \|\mathbf{s}_i\|_0 \le T, \;\; i = 1, \ldots, n, \tag{4}$$
where $\|\cdot\|_2$ and $\|\cdot\|_0$ denote the $\ell_2$-norm and the $\ell_0$-norm, respectively, and $T$ is a sparsity threshold that controls the number of non-zero entries in $\mathbf{s}_i$. However, traditional dictionary learning is only applicable to samples whose features are first-order tensors (i.e., vectors). It fails when the features are second-order or higher-order tensors. To overcome this limitation, tensor dictionary learning has been proposed as an extension of dictionary learning from vectors to tensors. It aims to learn a dictionary and a sparse representation tensor that can effectively represent and reconstruct high-order data. A given third-order tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$ can be decomposed into a dictionary $\mathbf{D} \in \mathbb{R}^{I \times M}$ and a sparse representation tensor $\mathcal{S} \in \mathbb{R}^{M \times J \times K}$. The objective is to minimize the reconstruction error, which is defined as
$$\min_{\mathbf{D}, \mathcal{S}} \|\mathcal{X} - \mathcal{S} \times_1 \mathbf{D}\|^2 \quad \text{s.t.} \quad \|\mathbf{d}_m\|_2 = 1, \;\; m = 1, \ldots, M; \quad \|\mathbf{s}_{:jk}\|_0 \le T, \;\; \forall j, k, \tag{5}$$
where $\mathbf{d}_m$ is the $m$th atom of dictionary $\mathbf{D}$ and $\mathbf{s}_{:jk}$ is a column fiber of the representation tensor $\mathcal{S}$. The $\ell_2$-norm constraint on each dictionary atom prevents the scaling indeterminacy between the atoms and the sparse representations, and the $T$-sparse constraint on each column fiber of the representation tensor filters out redundant information. Problem (5) is jointly non-convex in $(\mathbf{D}, \mathcal{S})$. As a result, there is no unique solution that minimizes the objective function. As in the standard dictionary learning problem, an alternating minimization approach can naturally be employed to iteratively optimize one variable while keeping the other fixed, leading to a stationary point of the optimization problem. This approach does not guarantee finding a global minimum, but it guarantees convergence to a local minimum of the objective function.
With the dictionary fixed, the sparse representation tensor can be obtained by decoupling problem (5) into the following subproblems
$$\min_{\mathbf{s}_{:jk}} \|\mathbf{x}_{:jk} - \mathbf{D}\mathbf{s}_{:jk}\|_2^2 \quad \text{s.t.} \quad \|\mathbf{s}_{:jk}\|_0 \le T, \tag{6}$$
where $\mathbf{x}_{:jk}$ is a column fiber of the tensor data $\mathcal{X}$. Each subproblem can be solved by orthogonal matching pursuit (OMP) (Tropp & Gilbert, 2007), shown in Algorithm 3 of Appendix A. With the sparse representation tensor fixed, the dictionary can be updated by minimizing the unfolded form of problem (5):
$$\min_{\mathbf{D}} \|\mathbf{X}_{(1)} - \mathbf{D}\mathbf{S}_{(1)}\|_F^2 \quad \text{s.t.} \quad \|\mathbf{d}_m\|_2 = 1, \;\; m = 1, \ldots, M, \tag{7}$$
where $\mathbf{X}_{(1)}$ and $\mathbf{S}_{(1)}$ are the mode-1 matricizations of the tensor data $\mathcal{X}$ and the representation tensor $\mathcal{S}$, respectively. This problem can be solved by K-SVD (Aharon, Elad, & Bruckstein, 2006), as shown in Algorithm 4 of Appendix A. Problem (5) is then solved by alternately updating the sparse representation tensor and the dictionary until a stopping criterion is reached.
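The alternating scheme described above can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' Algorithm 3 or 4: the function names `omp`, `ksvd_update`, and `tensor_dictionary_learning`, the initialization from random data fibers, the fixed iteration count used as a stopping criterion, and the toy sizes are all hypothetical choices made here.

```python
import numpy as np

def omp(D, x, T):
    """Greedy T-sparse code of x over the unit-norm columns (atoms) of D."""
    residual, support = x.copy(), []
    s = np.zeros(D.shape[1])
    for _ in range(T):
        k = int(np.argmax(np.abs(D.T @ residual)))  # most correlated atom
        if k in support:                            # nothing new to add
            break
        support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    if support:
        s[support] = coef
    return s

def ksvd_update(D, S, X):
    """K-SVD-style update: refit each atom and its nonzero coefficients."""
    for m in range(D.shape[1]):
        used = np.nonzero(S[m])[0]        # columns that use atom m
        if used.size == 0:
            continue
        S[m, used] = 0.0
        E = X[:, used] - D @ S[:, used]   # residual without atom m
        U, sig, Vt = np.linalg.svd(E, full_matrices=False)
        D[:, m] = U[:, 0]                 # unit-norm by construction
        S[m, used] = sig[0] * Vt[0]
    return D, S

def tensor_dictionary_learning(X, M, T, n_iter=10, seed=0):
    """Alternate sparse coding of the mode-1 fibers and dictionary updates."""
    rng = np.random.default_rng(seed)
    X1 = X.reshape(X.shape[0], -1)        # mode-1 fibers as columns
    D = X1[:, rng.choice(X1.shape[1], M, replace=False)].copy()
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_iter):
        S1 = np.column_stack([omp(D, x, T) for x in X1.T])
        D, S1 = ksvd_update(D, S1, X1)
    return D, S1.reshape((M,) + X.shape[1:])

# Toy example with assumed sizes (features x time x subjects).
rng = np.random.default_rng(1)
X = rng.standard_normal((8, 10, 6))
T = 2
D, S = tensor_dictionary_learning(X, M=12, T=T)
X_hat = np.einsum("im,mjk->ijk", D, S)    # reconstruction along mode 1
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
```

Because each column fiber is coded independently, the order in which the fibers are laid out in the unfolding does not affect the result, as long as the codes are folded back with the same ordering.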
3.2. Supervised tensor dictionary learning
Compared with unsupervised tensor dictionary learning, which is solely used for data reconstruction and dimensionality reduction, supervised tensor dictionary learning can fully leverage the label information of the data, thereby more effectively capturing the underlying discriminative features in the data. The common way for classification with dictionary learning is to maintain separate dictionaries for each class. To better reconstruct the high-order data of a certain class by the dictionary learned from that class compared to all other class dictionaries, label information is used in tensor dictionary learning to promote the discrimination between each class. The flowchart of the proposed framework, i.e., supervised tensor dictionary learning (STDL), is shown in Fig. 1.
Fig. 1. The flowchart of the STDL framework. For each class $c$, the tensor data $\mathcal{X}_c$ is decomposed into the class-specific dictionary $\mathbf{D}_c$ and the sparse representation tensor $\mathcal{S}_c$.
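Assume there are $C$ different classes, where $\mathcal{X}_c \in \mathbb{R}^{I \times J \times K_c}$ is the third-order tensor from the $c$th class. The objective function of STDL is given by
$$\min_{\{\mathbf{D}_c, \mathcal{S}_c\}} \sum_{c=1}^{C} \|\mathcal{X}_c - \mathcal{S}_c \times_1 \mathbf{D}_c\|^2 \quad \text{s.t.} \quad \|\mathbf{d}_{c,m}\|_2 = 1, \;\; \|\mathbf{s}_{c,:jk}\|_0 \le T, \;\; \forall c, m, j, k, \tag{8}$$
in which $\mathbf{D}_c$ and $\mathcal{S}_c$ are the dictionary and sparse representation tensor of the $c$th class, respectively, $\mathbf{d}_{c,m}$ is the $m$th atom of $\mathbf{D}_c$, and $\mathbf{s}_{c,:jk}$ is a column fiber of $\mathcal{S}_c$. Problem (8) can be decoupled into $C$ subproblems, which can be solved by Algorithm 1. Generally, the initial dictionary can be randomly generated or obtained through SVD.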

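For given test data $\mathcal{X}_{\text{test}}$, the sparse representation $\hat{\mathcal{S}}_c$ over the class dictionary $\mathbf{D}_c$ can be obtained for $c = 1, \ldots, C$. The reconstructions for the different classes are computed as
$$\hat{\mathcal{X}}_c = \hat{\mathcal{S}}_c \times_1 \mathbf{D}_c, \quad c = 1, \ldots, C. \tag{9}$$
The label of the test data is assigned to the class that gives the closest approximation:
$$\text{label}(\mathcal{X}_{\text{test}}) = \arg\min_{c} \|\mathcal{X}_{\text{test}} - \hat{\mathcal{X}}_c\|. \tag{10}$$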
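The decision rule above (code the test data over each class dictionary, reconstruct, and pick the class with the smallest residual) can be sketched as follows. The helper names `omp` and `classify`, the use of a single test matrix whose columns are coded independently, and the toy dictionaries built from disjoint columns of an identity matrix are assumptions made purely for illustration.

```python
import numpy as np

def omp(D, x, T):
    """Greedy T-sparse code of x over the unit-norm columns (atoms) of D."""
    residual, support = x.copy(), []
    s = np.zeros(D.shape[1])
    for _ in range(T):
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break
        support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    if support:
        s[support] = coef
    return s

def classify(X_test, dictionaries, T):
    """Return (label, errors): the class whose dictionary reconstructs best."""
    errors = []
    for D in dictionaries:
        # Sparse-code every column (fiber) of the test data over this class.
        S = np.column_stack([omp(D, x, T) for x in X_test.T])
        errors.append(float(np.linalg.norm(X_test - D @ S)))
    return int(np.argmin(errors)), errors

# Toy setup: two class dictionaries with disjoint, orthonormal atoms.
eye = np.eye(6)
dictionaries = [eye[:, :2], eye[:, 2:4]]
rng = np.random.default_rng(0)
X_test = dictionaries[1] @ rng.standard_normal((2, 5))  # generated by class 1
label, errors = classify(X_test, dictionaries, T=2)
```

In this toy case the test data lie exactly in the span of the second dictionary, so its residual is numerically zero while the first dictionary cannot explain the data at all.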
3.3. Feature transformation
Due to the heterogeneity in the feature space between BA data and dFC data, a feature transformation is necessary for effective knowledge transfer. The STDL framework enables the encoding of underlying discriminative features in high-order data into dictionaries, thereby achieving a linear representation of the original data. Additionally, since dFC data is derived from BA data using the Pearson correlation coefficient, there exists an exact mathematical relationship between them. Therefore, based on the dictionaries obtained from the STDL framework and the mathematical relationship between the heterogeneous data, a feature transformation between BA data and dFC data can be established. For a given BA data , the subseries of the th variable within the th window is represented by for and . Denoting as the mean value of the subseries and , the Pearson correlation coefficient between the subseries and is defined as
| (11) |
Assuming there is a dictionary and a sparse representation that can represent BA data , it can be expressed as . Supposing is the mean value of the subseries that is the th row of within the th window, then Eq. (11) can be rewritten as
| (12) |
Assuming for , then the correlation matrix of variables within the th window is
| (13) |
Let be the correlation vector of variables within the th window, which is formed by vectorizing the upper triangular elements of matrix . Denote as the operation of vectorizing the upper triangular elements of a matrix. , is defined as . Let , then Eq. (13) can be rewritten as
| (14) |
Denote as the correlation matrix of variables within windows, which can be represented by
| (15) |
in which . In this way, the dictionary learned from the BA data can be transferred to the dFC data based on Eq. (15).
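The relationship that this subsection exploits can be checked numerically. The sketch below computes sliding-window Pearson correlations directly from a BA matrix and, separately, from a factorization of that matrix into a dictionary and a coefficient matrix, using the fact that the window covariance of a product `D @ S` equals `D @ cov(S) @ D.T`. It is not a reproduction of Eq. (15): the function names, the exact (noise-free) factorization, the toy sizes, and the reading of the step between windows are all assumptions made for illustration.

```python
import numpy as np

def sliding_windows(n_time, length, step):
    """Start/end indices of windows that fit entirely inside the series."""
    return [(s, s + length) for s in range(0, n_time - length + 1, step)]

def upper_triangle(R):
    """Vectorize the strictly upper triangular part of a square matrix."""
    i, j = np.triu_indices(R.shape[0], k=1)
    return R[i, j]

def dfc_from_ba(X, length, step):
    """dFC matrix: one column of pairwise correlations per window."""
    return np.column_stack([upper_triangle(np.corrcoef(X[:, a:b]))
                            for a, b in sliding_windows(X.shape[1], length, step)])

def dfc_from_factors(D, S, length, step):
    """Same quantity computed from X = D @ S without forming X."""
    cols = []
    for a, b in sliding_windows(S.shape[1], length, step):
        C = D @ np.cov(S[:, a:b]) @ D.T      # window covariance of X
        sd = np.sqrt(np.diag(C))
        cols.append(upper_triangle(C / np.outer(sd, sd)))
    return np.column_stack(cols)

# Toy example: 5 variables, 3 atoms, 40 time points (assumed sizes).
rng = np.random.default_rng(0)
D = rng.standard_normal((5, 3))
S = rng.standard_normal((3, 40))
X = D @ S
dfc = dfc_from_ba(X, length=20, step=10)             # shape: (10, 3)
dfc_factored = dfc_from_factors(D, S, length=20, step=10)
```

With real data the factorization holds only approximately, so the two routes would agree up to the reconstruction error of the dictionary model.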
3.4. TD-HTL
In dFC analysis based on dictionary learning, the dictionary is typically learned from dFC data with high-dimensional features and a small sample size. This often results in poorly learned dictionaries, lengthy training times, and high computational resource requirements. In this paper, we use the exact mathematical relationship between BA and dFC data to establish a feature transformation. Through heterogeneous transfer learning, the dictionary learned from BA data is mapped by this feature transformation into the corresponding dictionary for dFC data. Once obtained, this dictionary can be used for prediction from the dFC data of independent subjects. In this way, we avoid learning dictionaries directly from dFC data, which leads to more accurate results and lower time and computational resource costs. Specifically, we propose the TD-HTL framework to enhance model performance on dFC data using heterogeneous prior information obtained from BA data. This framework combines supervised tensor dictionary learning with both relational-based and parameter-based approaches for heterogeneous transfer learning. The STDL framework encodes the underlying discriminative features of BA data into dictionaries. By employing the feature transformation defined in Eq. (15), we transfer the heterogeneous prior information encoded in the dictionaries to improve model performance on dFC data. The flowchart of the proposed framework is shown in Fig. 2.
Fig. 2. The flowchart of the TD-HTL framework.
Assuming there are subjects for the class , each contains a BA matrix consisting of time series with a length of , which can form a third-order tensor for as the source domain. The dFC data derived from the BA data of class can be seen as the target domain. The dictionary of the th class for the source domain, , can be obtained by minimizing the following problem
| (16) |
where is the sparse representation tensor of the th class for the source domain. Then the dictionary learned in the source domain can be transferred into the target domain through the transformation rule . In other words, can be regarded as the dictionary of the target domain. Then, the sparse representation tensor for the target domain can be obtained by minimizing the following problem
| (17) |
The optimization process of the TD-HTL is shown in Algorithm 2.
Algorithm 2: TD-HTL
| Input: Tensors and , Dictionary size , Sparsity and |
| Output: Dictionaries and , Sparse representation tensors and , |
| 1 |
| 2 for |
| 3 for |
For given testing data from the target domain, the sparse representation over dictionary can be obtained for . The reconstructions for the different classes are computed as
| (18) |
The label of the testing data from the target domain is assigned to the class which gives the closest approximation as
| (19) |
4. Results and analysis
To compare the performance of the proposed models with and without knowledge transfer, we evaluate both the transfer learning model TD-HTL and the non-transfer model STDL. Specifically, we apply the STDL model separately to the source and target domains, referred to as STDL-S and STDL-T, respectively. Experiments are conducted on simulated data and on two real datasets: EEG data for alcoholic subject classification and fMRI data for gender classification. Our primary focus is on identifying emotion-related brain gender differences during adolescence using the proposed method; the experiments on EEG and simulated data are conducted to demonstrate the effectiveness of the proposed model on different datasets. Grid search is used to find the optimal hyperparameters because it can effectively find a stable solution and is easily parallelized to expedite the search (Saud et al., 2020). To ensure the robustness and reliability of the experimental results, we conduct 10 random trials of 5-fold cross-validation (5-fold-CV) as well as leave-one-subject-out cross-validation (LOSO-CV), which provides a more comprehensive evaluation of the performance of each method. The code is available at https://github.com/monicalan/TD-HTL.
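The two evaluation protocols can be sketched as index generators in plain Python. This is a minimal illustration and not the procedure in the released code: the function names are hypothetical, the folds here are not stratified by class, and how the grid search is nested inside the folds is not specified in the text and is left out.

```python
import random

def repeated_kfold(n_samples, k=5, repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated random k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)                          # new random partition per repeat
        folds = [idx[f::k] for f in range(k)]     # k nearly equal folds
        for f, test in enumerate(folds):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield r, f, train, test

def leave_one_subject_out(n_subjects):
    """Yield (train_idx, test_idx) with exactly one held-out subject per split."""
    for s in range(n_subjects):
        yield [i for i in range(n_subjects) if i != s], [s]

# Example with 175 subjects (94 males and 81 females in the fMRI data).
kfold_splits = list(repeated_kfold(175, k=5, repeats=10))
loso_splits = list(leave_one_subject_out(175))
```

Reported accuracies would then be averaged over the 50 repeated 5-fold splits or over the 175 leave-one-subject-out splits.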
4.1. Baselines
- Support vector machine (SVM) (Cervantes, Garcia-Lamont, Rodríguez-Mazahua, & Lopez, 2020): A single-domain method whose input features are functional connectivity (FC) represented as vectors. We use SVM as a competitor to compare vector-based single-domain processing of FC features with the proposed tensor-based transfer learning method for processing dFC features.
- Least squares support tensor machine (LSSTM) (Cichocki et al., 2017): A single-domain method whose input features are higher-order tensor data. LSSTM is used as a competitor to compare tensor-based single-domain methods with the proposed tensor-based transfer learning method. It is applied separately to the source and target domains for gender classification, referred to as LSSTM-S and LSSTM-T, respectively.
- Graph convolutional neural networks (GCNs): Single-domain methods whose input features include both FC and BA. These methods leverage graph structure and graph signal information simultaneously to improve classification. Classic high-performance GCN methods, namely the graph isomorphism network (GIN) (Xu, Hu, Leskovec, & Jegelka, 2018), gated graph neural networks (GGNN) (Li, Zemel, Brockschmidt, & Tarlow, 2016), and the graph convolutional neural network (GCNN) (Wagh & Varatharajah, 2020), are applied to gender classification as competitors.
- Domain adaptation support tensor machine (DASTM) (Gao et al., 2023): A heterogeneous transfer learning method that processes heterogeneous tensor data from different sources for classification. DASTM is used as a benchmark to compare tensor-based heterogeneous transfer learning methods with the proposed method.
4.2. Datasets
4.2.1. Simulation data
The simulated multivariate time series data for the two classes are generated as follows:
The generation of -dimensional basis vectors: Each basis vector is generated from a multivariate normal distribution for . In which, with being a base value and being a deviation that derived from uniform distribution . for and with being the autocorrelation coefficient. Here, is randomly derived from a normal distribution and is randomly derived from uniform distribution when generating each basis vector. The deviation is set to be 0.3 when generating each basis vector. In this way, we can generate basis vectors denoted as a matrix , each basis vector with a distinct correlation pattern among the variables. Assuming that the number of common basis vectors for two classes is , then we view the first basis vectors of as common basic vectors denoted as . To generate differentiated atoms for two classes, we add a perturbation to the rest basis vectors of denoted as . The perturbation is generated as a random normal distribution scaled by a perturbation factor , i.e., . In this way, we generate two sets of basis vectors, i.e., and . Finally, we shuffle and normalize all columns of and . Here, we set the perturbation factor as 0.003.
The generation of time-varying information: We generate the time-varying matrix for samples. Each column of is derived from for . In which, with being a base value and being a deviation that derived from uniform distribution . is an identity matrix. To obtain the sparsity, we set the elements in that are less than sparse threshold to 0. Here, we set , and sparse threshold for each sample.
The generation of multivariate time series for two classes: We set , and the number of samples in each class is 50 . The multivariate time series for each sample is generated by multiplying the set of basic vectors of that class and its time-varying matrix, i.e., if the th sample belongs to the th class. Besides, each multivariate time series is added with a random normal noise . Here, we set the noise level as 0.1 For each sample, the source data is a multivariate time series , and the target data derived from the source data is a multivariate time series with variables and time points. In which, the window length and the scan length are 20 and 10 respectively.
4.2.2. EEG data
The EEG data comes from a large study aimed at examining the EEG correlates of genetic predisposition to alcoholism (Vuokko & Kaski, 2011). It includes measurements from 64 electrodes placed on the subjects’ scalps, sampled at 256 Hz (3.9-millisecond epochs) over a 1-second period. The 116 subjects were divided into two groups: 66 alcoholics and 50 controls. For each subject, the source data is a multivariate time series with 64 variables and 256 time points, and the target data derived from the source data is a multivariate time series with variables and time points. Here, the window length and the scan length are 15 and 5, respectively.
4.2.3. fMRI data
The fMRI data is from the PNC project, a collaboration between the Brain Behavior Laboratory at the University of Pennsylvania and the Children’s Hospital of Philadelphia. It involves data collection from nearly 900 youth aged 8 to 22 using three fMRI paradigms, i.e., emotion identification, working memory, and resting-state (Satterthwaite et al., 2014). Among these subjects, 94 males and 81 females, aged between 144 and 179 months, completed the emotion identification fMRI paradigm. All fMRI scans were collected on a single 3T Siemens TIM Trio whole-body scanner using a single-shot, interleaved multi-slice, gradient-echo, echo planar imaging sequence, where the scan duration of the emotion identification fMRI paradigm was 10.5 min (210 TR). During the emotion identification task, subjects were asked to identify 60 faces with neutral, happy, sad, angry, or fearful expressions. Statistical Parametric Mapping 12 was used for preprocessing, including motion correction, co-registration, spatial normalization to standard Montreal Neurological Institute space (spatial resolution of 3×3×3 mm), and spatial smoothing with a 3 mm full width half maximum Gaussian kernel. To remove the influence of motion, a regression procedure was used, and the time series were band-pass filtered using a 0.01 Hz to 0.1 Hz frequency range.
According to the Power coordinates with a sphere radius parameter of 5 mm (Power et al., 2011), 264 regions of interest (ROIs) containing 21384 voxels were extracted. The 264 ROIs are assigned to 13 functional networks for better understanding of the functional relationships (Power et al., 2011). Functional networks mainly associated with the perception of movement, memory, language, vision, cognition and other functions of the brain include the sensory/somatomotor network (SSN), cingulo-opercular task control network (COTCN), auditory network (AN), default mode network (DMN), memory retrieval network (MRN), visual network (VN), frontoparietal task control network (FPTCN), salience network (SN), subcortical network (SCN), ventral attention network (VAN), dorsal attention network (DAN), and cerebellar network (CN). The remaining ROIs belong to the uncertain network (UN). In this way, after averaging the time series of all voxels within the same ROI, each subject has a BA matrix consisting of 264 time series with a length of 210 for the emotion identification fMRI paradigm. To uncover temporal fluctuations in neuronal time series, the dFC is derived by the sliding window correlation (SWC) method. Mokhtari, Akhlaghi, Simpson, Wu, and Laurienti (2019) reported that gradually increasing the step size reduced the occurrence of fast-weak ripples in the SWC time series. However, when the step size became excessively large, the SWC time series was distorted, making it challenging to distinguish the fast dynamic correlation. Their results indicated that a step size of 10 is reasonable for computing SWC time series, because it suppresses the fast-weak ripples without losing information. By computing the SWC time series with step sizes of 1 and 10 for features 1 and 21 for a subject, we reached the same conclusion as Mokhtari et al. (2019), and the results are shown in Fig. 3.
In this paper, the window length and the scan length are 20 and 10, respectively. For each subject, the dFC contains 34716 time series with a length of 20. The labels of the BA data and the dFC data are the same and correspond to the gender of the subject.
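The derivation of the dFC from the BA matrix can be sketched as below. This is a minimal sketch, assuming a rectangular (unweighted) window, Pearson correlation within each window, and that the scan length acts as the step between consecutive windows; the function name `sliding_window_fc` and the random input are hypothetical. With 264 ROIs, 210 time points, a window length of 20, and a step of 10, it yields 264 × 263 / 2 = 34716 connections and (210 − 20) / 10 + 1 = 20 windows, in agreement with Table 1.

```python
import numpy as np

def sliding_window_fc(ts, window=20, step=10):
    """Sliding window correlation.

    ts: array of shape (n_rois, n_time).
    Returns an array of shape (n_pairs, n_windows), where each column holds
    the upper-triangular Pearson correlations of one window.
    """
    n_rois, n_time = ts.shape
    starts = range(0, n_time - window + 1, step)
    upper = np.triu_indices(n_rois, k=1)  # unique ROI pairs, diagonal excluded
    columns = [np.corrcoef(ts[:, s:s + window])[upper] for s in starts]
    return np.stack(columns, axis=1)

rng = np.random.default_rng(0)
ba = rng.normal(size=(264, 210))   # placeholder for one subject's BA matrix
dfc = sliding_window_fc(ba)
print(dfc.shape)                   # (34716, 20)
```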
Fig. 3.

Sliding window correlation time series with step sizes of 1 (top) and 10 (bottom) of feature 1 (left) and 21 (right) for a subject.
Viewing brain regions, time, and subjects as dimensions, the BA matrices of all subjects within the same class can be organized into a third-order tensor. Similarly, by considering pairwise co-fluctuations of brain regions, time, and subjects as dimensions, the dFC of all subjects within the same class can also be represented as a third-order tensor. The BA data is viewed as the source domain, and the dFC data is viewed as the target domain. To better describe the differences between the source and target domains, the data characteristics are shown in Table 1.
Table 1.
Data characteristics of the source and target domains.
| | Source (BA data) | Target (dFC data) |
|---|---|---|
| Number of classes | 2 (males and females) | 2 (males and females) |
| Class distributions | 94/81 (males/females) | 94/81 (males/females) |
| Feature space | 264 × 210 (ROIs × Time length) | 34716 × 20 (FC × Time length) |
| Feature ranges | [−0.7, 0.8] | [−1, 1] |
| Feature attribute | The neural activity of brain regions over a period of time | The time-varying dependencies between pairs of brain regions |
| Instances | 175 | 175 |
4.3. Results
4.3.1. Results on EEG data and simulation data
The dictionary size for each class in the source domain is set to 22 and 10 for EEG data and simulation data, respectively. The sparsity for each class in the source domain is set to 13 and 8 for EEG data and simulation data, respectively, while the sparsity for each class in the target domain is set to 177 and 44 for EEG data and simulation data, respectively. The accuracies on EEG data and simulation data are shown in Table 2. Experimental results on EEG data demonstrate improved accuracy in the target domain compared to other methods. Additionally, results from simulation data reveal that our proposed single-domain method, STDL, achieves the highest classification scores in the source domain, while our heterogeneous transfer-based method, TD-HTL, excels in the target domain. These results not only demonstrate the effectiveness of our proposed approach but also indicate its capacity to efficiently transfer information from the source domain, significantly enhancing classification performance in the target domain.
Table 2.
Comparison of accuracies across methods on EEG data and simulation data.
| Method | 5-fold CV, EEG (mean ± std) | 5-fold CV, Simulation (mean ± std) | LOSO-CV, EEG | LOSO-CV, Simulation |
|---|---|---|---|---|
| GIN | 0.71 ± 0.098 | 0.63 ± 0.059 | 0.3913 | 0.5000 |
| GGNN | 0.73 ± 0.063 | 0.76 ± 0.081 | 0.5652 | 0.6500 |
| GCNN | 0.63 ± 0.043 | 0.62 ± 0.127 | 0.3478 | 0.4000 |
| SVM | 0.80 ± 0.086 | 0.74 ± 0.097 | 0.7826 | 0.6000 |
| LSSTM-S | 0.56 ± 0.108 | 0.82 ± 0.067 | 0.6522 | 0.5000 |
| LSSTM-T | 0.66 ± 0.144 | 0.57 ± 0.088 | 0.5217 | 0.3500 |
| STDL-S | 0.84 ± 0.063 | 0.92 ± 0.048 | 0.8696 | 0.8500 |
| STDL-T | 0.83 ± 0.090 | 0.52 ± 0.109 | 0.6957 | 0.6000 |
| DASTM | 0.59 ± 0.087 | 0.42 ± 0.109 | 0.5652 | 0.1500 |
| TD-HTL | 0.88 ± 0.073 | 0.84 ± 0.046 | 0.8261 | 0.7500 |
4.3.2. Results on fMRI data
The dictionary size for each class in the source domain is set to 20. The sparsity for each class in the source domain is set to 16, while the sparsity for each class in the target domain is set to 63. The size of the dictionary plays a crucial role in model performance, as it directly impacts both the representational capacity and the quality of the learned features. Fig. 4 shows the impact of dictionary size on model performance, indicating that the proposed method performs best when the number of dictionary atoms is 20. In contrast, having too few or too many dictionary atoms can lead to suboptimal model performance. The accuracies for gender classification are shown in Fig. 5 and Table 3. Experimental results for gender classification show an improvement over other methods, indicating that transferring information from the source domain significantly improves the classification performance in the target domain.
Fig. 4.

The boxplot visualizes the accuracy distribution from 10 repeated experiments for gender classification with different dictionary sizes.
Fig. 5.

The boxplot visualizes the accuracy distribution from 10 repeated experiments for gender classification.
Table 3.
Comparison of accuracies across methods for gender classification.
| Method | 5-fold CV (mean ± std (best)) | LOSO-CV |
|---|---|---|
| GIN | 0.57 ± 0.056 (0.66) | 0.5429 |
| GGNN | 0.54 ± 0.108 (0.71) | 0.6286 |
| GCNN | 0.57 ± 0.021 (0.61) | 0.4857 |
| SVM | 0.72 ± 0.070 (0.81) | 0.7143 |
| LSSTM-S | 0.62 ± 0.068 (0.71) | 0.6286 |
| LSSTM-T | 0.62 ± 0.121 (0.80) | 0.4000 |
| STDL-S | 0.77 ± 0.016 (0.80) | 0.6286 |
| STDL-T | 0.66 ± 0.039 (0.71) | 0.5429 |
| DASTM | 0.54 ± 0.069 (0.69) | 0.6000 |
| TD-HTL | 0.82 ± 0.056 (0.91) | 0.7143 |
4.4. Emotion-related brain gender differences
4.4.1. Recurring patterns
Considering each atom of the BA dictionary as a recurring BA pattern and each atom of the dFC dictionary as a recurring dFC pattern, there are 20 recurring BA patterns and 210 recurring dFC patterns for both males and females, respectively. Fig. 6 presents all of the recurring BA patterns for both males and females. Additionally, there are 20 recurring dFC patterns specific to males and 22 recurring dFC patterns specific to females, which are shown in Figs. 1–2 of the supplementary material. One randomly selected recurring dFC pattern for each gender is displayed in Fig. 7. By considering the spatial information of the brain, the recurring BA patterns and dFC patterns are also visualized in the brain for both males and females. One randomly selected recurring BA pattern and dFC pattern for each gender are shown in Figs. 8–9. All of the recurring BA patterns and dFC patterns for both males and females can be found in Figs. 3–6 of the supplementary material.
Fig. 6.

Recurring BA patterns of the whole brain for males (top) and females (bottom).
Fig. 7.

Recurring dFC pattern 2 is observed in males (left), and recurring dFC pattern 3 is observed in females (right).
Fig. 8.

Recurring BA pattern 2 is observed in males (left), and recurring BA pattern 3 is observed in females (right).
Fig. 9.

Recurring dFC pattern 2 is observed in males (left), and recurring dFC pattern 3 is observed in females (right).
The similarity between the recurring patterns in males and females was calculated to identify the common and distinct characteristics of BA and dFC during an emotion task across genders. The similarity was quantified using the Pearson correlation coefficient, and the significance was assessed using a permutation test. To reduce the false positives caused by multiple hypothesis tests, the p-values were corrected using the false discovery rate (FDR) correction method. The results are presented in Fig. 10, where a strong correlation, defined as an absolute correlation greater than or equal to 0.7 with a corresponding p-value smaller than 0.005, is marked with a yellow box, and * and ** denote the significance levels 0.01 and 0.005, respectively.
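The similarity analysis described above can be sketched as follows. This is a minimal sketch under stated assumptions: the permutation test shuffles the entries of one pattern, the test is two-sided, and the FDR correction is taken to be the Benjamini-Hochberg procedure (the text does not name the specific FDR variant); the function names `permutation_pvalue` and `fdr_bh` and the synthetic data are hypothetical.

```python
import numpy as np

def permutation_pvalue(x, y, n_perm=2000, rng=None):
    """Pearson correlation of x and y with a two-sided permutation p-value."""
    rng = np.random.default_rng() if rng is None else rng
    r_obs = np.corrcoef(x, y)[0, 1]
    count = 0
    for _ in range(n_perm):
        # Shuffling x breaks any association with y (null hypothesis).
        r_perm = np.corrcoef(rng.permutation(x), y)[0, 1]
        count += int(abs(r_perm) >= abs(r_obs))
    # Add-one correction so that the p-value is never exactly zero.
    return r_obs, (count + 1) / (n_perm + 1)

def fdr_bh(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, starting from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

rng = np.random.default_rng(0)
male_pattern = rng.normal(size=264)                        # synthetic atom
female_pattern = male_pattern + 0.5 * rng.normal(size=264)  # correlated atom
r, p = permutation_pvalue(male_pattern, female_pattern, n_perm=999, rng=rng)
adjusted = fdr_bh([p, 0.04, 0.03, 0.005])  # p-values of several comparisons
```

In practice, the adjusted p-values of all male-female pattern pairs would then be compared with the thresholds 0.01 and 0.005 mentioned above.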
Fig. 10.

The similarity between males and females for the recurring BA patterns (left) and recurring dFC patterns (right).
To better understand the local and global information processing efficiency, clustering patterns, network integration, and modular organization of the brain’s dFC in both males and females, several network metrics were computed for each recurring dFC pattern in males and females, which are shown in Fig. 11. These metrics include local efficiency, global efficiency, average clustering coefficient, average participation coefficient, and modularity (Guimera & Nunes Amaral, 2005; Latora & Marchiori, 2001; Newman & Girvan, 2004; Watts & Strogatz, 1998). Local efficiency measures the effectiveness of information exchange between the immediate neighbors of a node, indicating the degree of local integration within the network. Global efficiency quantifies how efficiently information can spread from one node to any other node in the entire network, reflecting the level of global integration in the network. The average clustering coefficient measures the likelihood that the neighbors of a node are also neighbors of each other, indicating the tendency of nodes to form tightly interconnected groups. The participation coefficient quantifies the level of interaction between a node and different modules in the network, indicating its role in facilitating communication between different modules. Modularity evaluates the strength of the clustering of nodes within modules compared to random expectations, reflecting the degree of segregation of a network into distinct modules.
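Two of these metrics, global efficiency and the average clustering coefficient, can be illustrated on a small graph as follows. This is a simplified sketch, assuming a binary undirected adjacency matrix with a zero diagonal (for example, obtained by thresholding a dFC pattern, which is an assumption and not necessarily the procedure used here); the function names are hypothetical, and local efficiency, participation coefficient, and modularity are omitted for brevity.

```python
import numpy as np
from collections import deque

def shortest_path_lengths(adj):
    """All-pairs shortest path lengths of an unweighted graph via BFS."""
    n = adj.shape[0]
    dist = np.full((n, n), np.inf)
    for src in range(n):
        dist[src, src] = 0.0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in np.flatnonzero(adj[u]):
                if np.isinf(dist[src, v]):
                    dist[src, v] = dist[src, u] + 1
                    queue.append(v)
    return dist

def global_efficiency(adj):
    """Mean inverse shortest path length over all ordered node pairs."""
    n = adj.shape[0]
    dist = shortest_path_lengths(adj)
    off_diag = ~np.eye(n, dtype=bool)
    return float(np.mean(1.0 / dist[off_diag]))  # unreachable pairs give 0

def average_clustering(adj):
    """Mean over nodes of (triangles through node) / (possible triangles)."""
    a = adj.astype(float)
    degree = a.sum(axis=1)
    triangles = np.diag(a @ a @ a) / 2.0
    possible = degree * (degree - 1) / 2.0
    coeff = np.divide(triangles, possible,
                      out=np.zeros_like(triangles), where=possible > 0)
    return float(coeff.mean())

triangle = np.ones((3, 3), dtype=int) - np.eye(3, dtype=int)
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
print(global_efficiency(triangle), average_clustering(triangle))
print(global_efficiency(path), average_clustering(path))
```

The fully connected triangle attains the maximum of both metrics, whereas the three-node path has no closed triangles and a lower global efficiency because its end nodes are two steps apart.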
Fig. 11.

The network metrics for each recurring dFC pattern in males (left) and females (right).
4.4.2. Temporal variability information
By considering the sparse representation tensor as the strength of recurring patterns, we can analyze the temporal variation of the intensity for each recurring pattern in males and females. Specifically, we calculate the median of the coefficients of all subjects in the same class at a specific time point for a particular recurring pattern to capture the intensity of the temporal variability information. In Figs. 12–13, we present the time-varying strength for one selected recurring BA pattern and dFC pattern in males and females, and the detailed information can be found in Figs. 7–10 in the supplementary material.
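The median-based summary described above can be sketched as below. This is a minimal sketch, assuming the sparse representation tensor of one class is stored with the hypothetical layout (patterns × time points × subjects); the function name `pattern_strength` and the toy tensor are illustrative only.

```python
import numpy as np

def pattern_strength(coeffs):
    """Median coefficient across subjects for every pattern and time point.

    coeffs: array of shape (n_patterns, n_time, n_subjects).
    Returns an array of shape (n_patterns, n_time).
    """
    return np.median(coeffs, axis=2)

# Toy tensor: 2 patterns, 3 time points, 3 subjects.
toy = np.arange(18, dtype=float).reshape(2, 3, 3)
strength = pattern_strength(toy)
print(strength.shape)   # (2, 3)
```

Each row of the result is the time-varying strength of one recurring pattern for that class. The median is less sensitive than the mean to a few subjects with unusually large coefficients.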
Fig. 12.

The time-varying strength of recurring BA pattern 2 in males (top) and recurring BA pattern 3 in females (bottom).
Fig. 13.

The time-varying strength of recurring dFC pattern 2 in males (top) and recurring dFC pattern 3 in females (bottom).
To further analyze the state transition patterns of the time-varying strength for recurring patterns in males and females, we employed the $k$-means method. The time-varying strength for a chosen recurring pattern was clustered into $k$ states, with each state represented by a cluster centroid. The optimal value of $k$ was determined using the elbow method with the coefficient of determination as the evaluation index for the quality of the clustering results, which helps identify the number of clusters that best captures the underlying state transition patterns. Denote the state time series obtained by clustering the time-varying strength for a chosen recurring pattern as $\{s_t\}_{t=1}^{T}$, where $s_t \in \mathcal{S}$ represents the state at time step $t$ and $\mathcal{S} = \{1, \ldots, k\}$ is the state space. The element $p_{ij}$ of the transition probability matrix $P$ represents the probability of transitioning from state $i$ to state $j$, and it is defined as follows
$$p_{ij} = \frac{n_{ij}}{\sum_{l \in \mathcal{S}} n_{il}},$$
where $n_{ij}$ represents the count of transitions from state $i$ to state $j$. The state transition patterns of the time-varying strength for recurring patterns in males and females are depicted in Figs. 14–15, and the detailed information can be found in Figs. 11–14 of the supplementary material.
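The estimation of the transition probabilities from a clustered state sequence can be sketched as follows. This is a minimal sketch, assuming the state labels (integers starting at 0) have already been produced by any k-means implementation; the function name `transition_matrix` and the example sequence are hypothetical. Each row of the count matrix is normalized by its sum, and rows of states that are never left are kept at zero.

```python
import numpy as np

def transition_matrix(states, n_states):
    """Row-normalized counts of transitions between consecutive states."""
    counts = np.zeros((n_states, n_states))
    for current, following in zip(states[:-1], states[1:]):
        counts[current, following] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Avoid division by zero for states without outgoing transitions.
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)

states = [0, 0, 1, 0, 1, 1]          # hypothetical clustered state sequence
P = transition_matrix(states, n_states=2)
print(P)
```

For this sequence, state 0 is followed once by state 0 and twice by state 1, while state 1 is followed once by each state, so the rows of `P` are (1/3, 2/3) and (1/2, 1/2).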
Fig. 14.

The transition patterns of the time-varying strength for recurring BA pattern 2 in males (left) and recurring BA pattern 3 in females (right).
Fig. 15.

The transition patterns of the time-varying strength for recurring dFC pattern 2 in males (left) and recurring dFC pattern 3 in females (right).
5. Discussion
5.1. Differences in BA patterns
Notably, both females and males show a fundamental whole-brain synchronized pattern (i.e., BA pattern 1 for both genders) during emotion tasks. However, males tend to maintain a relatively consistent intensity level for a duration, while females demonstrate a tendency to transition from a high to a low intensity over time. In Markett, Jawinski, Kirsch, and Gerchen (2020), decreased activity in the SSN and SN was observed during the processing of emotional faces. In contrast to these prior findings, BA pattern 3 for both males and females exhibits substantial contrasting activity within the SSN and SN. Furthermore, this pattern also reveals opposing activity in the AN and FPTCN, particularly among females. When this pattern is temporarily activated, females tend to exhibit a gradual regression toward a baseline level, while males can demonstrate either gradual or rapid regression toward a baseline level. As reported in Wessing, Rehbein, Postert, Fürniss, and Junghöfer (2013), the interactions in the FPTCN and DAN play a crucial role in emotion reappraisal, particularly within core regions associated with top-down visual selective attention. Li et al. (2022) indicated that the sensory system, encompassing the SSN and VN, potentially engages in distinct attentional processes and mechanisms for emotion regulation. These findings from previous studies are consistent with the results observed in BA patterns 5 and 6 for both genders. Specifically, both females and males exhibit consistent activity within the FPTCN and DAN, while displaying opposite activity within the SSN and VN. Furthermore, the results suggest that MRN activity is concurrent with the activity within the FPTCN and DAN in males, while it coincides with the activity within the SSN and VN in females. The activity within pattern 5 predominantly remains at the baseline level for most of the time, with intermittent periods of activation for females and suppression for males.
Both males and females exhibit stable activity within pattern 6 for the majority of the time, with occasional minor fluctuations.
Within BA patterns 2, 4, and 7 for both genders, we observe considerable activity within the DMN, contrasting activity within the COTCN and DMN, and also opposite activity within the DMN and FPTCN during emotion processing. Moreover, females concurrently exhibit reversed activity in the VN and SN. It is posited that DMN activity tends to decrease during attention-demanding tasks while increasing during various cognitive processes, significantly contributing to the processing of abnormally negative emotions (Grimm et al., 2009; Smallwood et al., 2021). Previous research has underscored the importance of the COTCN and FPTCN in emotion regulation, alongside their distinctive roles in task control operations (Neta, Kelley, & Whalen, 2013; Neta, Schlaggar, & Petersen, 2014; Wessing et al., 2013). Notably, the COTCN appears to excel in maintaining stable task conditions, while the FPTCN displays a greater proficiency in facilitating online adaptive control. Thus, the observed opposite activity between the task-negative network DMN and the task-control networks COTCN and FPTCN during emotion tasks is an understandable finding. For both males and females, the DMN goes through periods of activation or inhibition within a short time. However, DMN activity ultimately returns to a baseline level for males, while females exhibit frequent minor fluctuations in DMN activity. Compared to males, who tend to quickly return to baseline levels, the opposite activity within the COTCN and DMN is longer and more intense for females. The difference between genders in the opposite activity within the DMN and FPTCN lies in the greater intensity of this activity observed in females in comparison to males.
5.2. Differences in dFC patterns
Within dFC pattern 1 for both genders, an underlying whole-brain co-activity pattern is consistently observed throughout the duration of emotional tasks. Notably, in the early stages, the intensity of this whole-brain co-activity remains low for males, while it fluctuates continuously at relatively low levels for females. Toward the end of the time course, there is a transition to higher levels of fluctuation in both genders. Additionally, sparse distributed connectivity is evident across the whole brain in dFC pattern 18 for females, and across the whole brain except for most regions in the SSN and VN in dFC pattern 17 for males. The intensity of this pattern alternates cyclically between high and low levels over time, with males showing higher intensity than females. Sripada et al. (2014) suggested that engaging in reappraisal is associated with alterations in functional connections across the VN, DAN, FPTCN, and DMN compared to maintaining emotional responses. These alterations in functional connections within these networks are believed to play significant roles in various aspects of emotion regulation, including visual processing, stimulus salience, attentional control, and the interpretation and contextualization of stimuli. Consistent findings are also observed in dFC pattern 12 for both genders, as well as pattern 19 for males and pattern 20 for females. Notably, the intensity of pattern 12 is lower for males and fluctuates more frequently over time than for females. Moreover, the intensity of connectivity between the FPTCN and DMN oscillates more strongly in females than in males.
During emotion tasks, dFC patterns 15, 16, and 18 in males, along with patterns 16, 17, and 19 in females, exhibit three connectivity profiles, i.e., SSN-AN-SN, COTCN-VN-SN-SCN, and SSN-DMN-MRN-VN-UN. The intensity of the SSN-AN-SN profile decreases in an oscillatory manner over time for males, whereas for females, the intensity alternates between high and low levels. The intensity of the COTCN-VN-SN-SCN profile gradually increases, then remains stable and gradually decreases over time for females, while it alternates between high and low levels for males. The intensity of the SSN-DMN-MRN-VN-UN profile remains relatively stable over time for males, while it oscillates more frequently for females. These findings are largely consistent with those reported in the literature (van den Heuvel & Pol, 2010; Markett et al., 2020; Smith et al., 2009). Connections involving the COTCN, MRN, and SN are thought to support working memory for task instructions and internal attentional control, while connections with the DMN might contribute to self-reflection and mentalizing, facilitating the understanding of evoked emotions. Furthermore, SSN might reflect overt motor responses, facial expressions, or the preparation of action tendencies related to emotional responses, while the VN may play a crucial role in visual processing during facial emotion recognition. The observed connectivity patterns offer support in comprehending emotion processing by identifying diverse affective, cognitive, and motor-related regions that are prone to activation during varying emotion tasks.
5.3. Association between BA patterns and dFC patterns
During the emotion task, BA pattern 1 for both genders exhibits whole-brain synchronous activation or inhibition. This observation further supports the presence of an underlying whole-brain co-activity pattern in dFC pattern 1 for both genders. Additionally, BA patterns indicate that brain regions in the SSN, VN, DMN, COTCN, and FPTCN are most likely to activate during emotion tasks for both genders. It can be observed from the dFC patterns that these activated regions are likely to form dFC networks during emotion tasks for both genders. This observation aligns with Tomasi and Volkow (2018), who suggested that brain regions with high dFC are more likely to activate during tasks. In terms of emotion-related gender differences in the brain, BA patterns and dFC patterns reveal consistent results. Furthermore, in conjunction with BA patterns, dFC patterns not only identify functional connections associated with emotion-related gender differences in the brain but also reveal the brain activity of regions corresponding to these functional connections.
6. Conclusion
This study introduces tensor dictionary-based heterogeneous transfer learning, which aims to capture gender-related differences in both BA patterns and dFC patterns during emotion processing. Specifically, dictionaries learned in the source domain through supervised tensor dictionary learning are transferred to the target domain using a feature transformation between the two domains. Experimental results indicate that incorporating prior knowledge from the source domain is beneficial for improving target performance. Moreover, the analysis reveals that both genders display a network structure characterized by high segregation and dense integration during emotion identification tasks. Gender differences are primarily observed in the temporal variability and transition patterns over time, supporting the finding that variations in brain activity dynamics contribute to trait emotional intelligence and differences in emotion regulation strategies between genders.
Supplementary Material
Appendix B. Supplementary data
Supplementary material related to this article can be found online at https://doi.org/10.1016/j.neunet.2024.106974.
Acknowledgments
This research was supported by the National Natural Science Foundation of China (Nos. 12271429, 12090021, and 12226007), the Natural Science Basic Research Program of Shaanxi (No. 2022JM-005), JSPS KAKENHI (Nos. 19H04071, 20H00576, and 23H03460), the National Institutes of Health (R01 MH104680, R01 GM109068, R01 MH121101, R01 MH116782, R01 MH118013 and P20-GM144641), and the HPC Platform, Xi’an Jiaotong University.
Appendix A. Algorithms related to unsupervised tensor dictionary learning


Footnotes
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
CRediT authorship contribution statement
Lan Yang: Writing – original draft, Visualization, Software, Methodology, Formal analysis, Data curation, Conceptualization. Chen Qiao: Writing – review & editing, Supervision, Methodology, Funding acquisition. Takafumi Kanamori: Writing – review & editing, Supervision, Methodology, Funding acquisition. Vince D. Calhoun: Writing – review & editing, Supervision, Resources. Julia M. Stephen: Writing – review & editing. Tony W. Wilson: Writing – review & editing. Yu-Ping Wang: Writing – review & editing, Supervision, Funding acquisition.
Data availability
I have shared the link to the code.
References
- Aggarwal M, Khullar V, Goyal N, Rastogi R, Singh A, Torres VY, et al. (2024). Privacy preserved collaborative transfer learning model with heterogeneous distributed data for brain tumor classification. International Journal of Imaging Systems and Technology, 34(2), Article e22994. [Google Scholar]
- Aharon M, Elad M, & Bruckstein A (2006). K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11), 4311–4322. [Google Scholar]
- Aldayel MS, Ykhlef M, & Al-Nafjan AN (2020). Electroencephalogram-based preference prediction using deep transfer learning. IEEE Access, 8, 176818–176829. [Google Scholar]
- Cervantes J, Garcia-Lamont F, Rodríguez-Mazahua L, & Lopez A (2020). A comprehensive survey on support vector machine classification: Applications, challenges and trends. Neurocomputing, 408, 189–215. [Google Scholar]
- Cichocki A, et al. (2017). Tensor networks for dimensionality reduction and large-scale optimization: Part 2 applications and future perspectives. Foundations and Trends® in Machine Learning, 9(6), 431–673. 10.1561/2200000067. [DOI] [Google Scholar]
- Day O, & Khoshgoftaar TM (2017). A survey on heterogeneous transfer learning. Journal of Big Data, 4(29), 1–42. [Google Scholar]
- Derks J, & Krabbendam L (2013). Is the brain the key to a better understanding of gender differences in the classroom? International Journal of Gender, Science and Technology, 5(3), 281–291. [Google Scholar]
- Gao T, Gu L, Chen H, & Zhou B (2023). Domain adaptation support tensor machine: An extended STM for object recognition using cross-source heterogeneous remote sensing data. IEEE Transactions on Geoscience and Remote Sensing, 61, 1–21. [Google Scholar]
- Grimm S, et al. (2009). Altered negative BOLD responses in the default-mode network during emotion processing in depressed subjects. Neuropsychopharmacology, 34(4), 932–943. [DOI] [PubMed] [Google Scholar]
- Gu X, Shen Z, Qu J, & Ni T (2022). Cross-domain EEG signal classification via geometric preserving transfer discriminative dictionary learning. Multimedia Tools and Applications, 81(29), 41733–41750. [Google Scholar]
- Guimera R, & Nunes Amaral LA (2005). Functional cartography of complex metabolic networks. Nature, 433(7028), 895–900. [DOI] [PMC free article] [PubMed] [Google Scholar]
- He Y, Li Q, Hu Q, & Liu L (2022). Transfer learning in high-dimensional semiparametric graphical models with application to brain connectivity analysis. Statistics in Medicine, 41(21), 4112–4129. [DOI] [PMC free article] [PubMed] [Google Scholar]
- van den Heuvel MP, & Pol HEH (2010). Exploring the brain network: A review on resting-state fMRI functional connectivity. European Neuropsychopharmacology, 20(8), 519–534. [DOI] [PubMed] [Google Scholar]
- Hu Y, et al. (2023). Source free semi-supervised transfer learning for diagnosis of mental disorders on fMRI scans. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(11), 13778–13795. [DOI] [PubMed] [Google Scholar]
- İçer S, Acer İ, & Baş A (2020). Gender-based functional connectivity differences in brain networks in childhood. Computer Methods and Programs in Biomedicine, 192, Article 105444. [DOI] [PubMed] [Google Scholar]
- Jin R, Dontaraju KK, Kim S-J, Akhonda MABS, & Adali T (2020). Dictionary learning-based fMRI data analysis for capturing common and individual neural activation maps. IEEE Journal of Selected Topics in Signal Processing, 14(6), 1265–1279. [Google Scholar]
- Küchenhoff S, Bayrak Ş, Zsido RG, Saberi A, Bernhardt BC, Weis S, et al. (2024). Relating sex-bias in human cortical and hippocampal microstructure to sex hormones. Nature Communications, 15(1), 7279. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kundakovic M, & Tickerhoof M (2023). Epigenetic mechanisms underlying sex differences in the brain and behavior. Review Trends Neuroscience, 47(1), 18–35. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Latora V, & Marchiori M (2001). Efficient behavior of small-world networks. Physical Review Letters, 87(19), Article 198701. [DOI] [PubMed] [Google Scholar]
- Li Y, Zemel R, Brockschmidt M, & Tarlow D (2016). Gated graph sequence neural networks. In Proceedings of iCLR’16 (pp. 1–20). [Google Scholar]
- Li Y, Zhuang K, Yi Z, Wei D, Sun J, & Qiu J (2022). The trait and state negative affect can be separately predicted by stable and variable resting-state functional connectivity. Psychological Medicine, 52(5), 813–823.
- Lou Q, Sun W, Zhang W, Deng Z, Choi K-S, & Wang S (2024). Rules-based heterogeneous feature transfer learning using fuzzy inference. IEEE Transactions on Fuzzy Systems, 32(1), 306–321.
- Markett S, Jawinski P, Kirsch P, & Gerchen MF (2020). Specific and segregated changes to the functional connectome evoked by the processing of emotional faces: A task-based connectome study. Scientific Reports, 10(1), 4822.
- Mignone P, Pio G, & Ceci M (2024). Distributed heterogeneous transfer learning. Big Data Research, 37, Article 100456.
- Mokhtari F, Akhlaghi MI, Simpson SL, Wu G, & Laurienti PJ (2019). Sliding window correlation analysis: Modulating window shape for dynamic brain connectivity in resting state. NeuroImage, 189, 655–666.
- Neta M, Kelley WM, & Whalen PJ (2013). Neural responses to ambiguity involve domain-general and domain-specific emotion processing systems. Journal of Cognitive Neuroscience, 25(4), 547–557.
- Neta M, Schlaggar BL, & Petersen SE (2014). Separable responses to error, ambiguity, and reaction time in cingulo-opercular task control regions. NeuroImage, 99, 59–68.
- Newman ME, & Girvan M (2004). Finding and evaluating community structure in networks. Physical Review E, 69(2), Article 026113.
- Ni T, Gu X, & Jiang Y (2023). Transfer discriminative dictionary learning with label consistency for classification of EEG signals of epilepsy. Journal of Ambient Intelligence and Humanized Computing, 14(5), 5529–5540.
- Nielsen AN, Barch DM, Petersen SE, Schlaggar BL, & Greene DJ (2020). Machine learning with neuroimaging: Evaluating its applications in psychiatry. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 5(8), 791–798, Understanding the Nature and Treatment of Psychopathology: Letting the Data Guide the Way.
- Niu S, Liu Y, Wang J, & Song H (2020). A decade survey of transfer learning (2010–2020). IEEE Transactions on Artificial Intelligence, 1(2), 151–166.
- Power JD, et al. (2011). Functional network organization of the human brain. Neuron, 72(4), 665–678.
- Qiao C, Yang L, Calhoun VD, Xu Z-B, & Wang Y-P (2021). Sparse deep dictionary learning identifies differences of time-varying functional connectivity in brain neuro-developmental study. Neural Networks, 135, 91–104.
- Satterthwaite TD, et al. (2014). Neuroimaging of the Philadelphia Neurodevelopmental Cohort. NeuroImage, 86, 544–553.
- Saud S, et al. (2020). Performance improvement of empirical models for estimation of global solar radiation in India: A k-fold cross-validation approach. Sustainable Energy Technologies and Assessments, 40, Article 100768.
- Smallwood J, Bernhardt BC, Leech R, Bzdok D, Jefferies E, & Margulies DS (2021). The default mode network in cognition: A topographical perspective. Nature Reviews Neuroscience, 22(8), 503–513.
- Smith SM, et al. (2009). Correspondence of the brain’s functional architecture during activation and rest. Proceedings of the National Academy of Sciences, 106(31), 13040–13045.
- Sripada C, et al. (2014). Volitional regulation of emotions produces distributed alterations in connectivity between visual, attention control, and default networks. NeuroImage, 89, 110–121.
- Tomasi D, & Volkow ND (2018). Association between brain activation and functional connectivity. Cerebral Cortex, 29(5), 1984–1996.
- Tropp JA, & Gilbert AC (2007). Signal recovery from random measurements via orthogonal matching pursuit. IEEE Transactions on Information Theory, 53(12), 4655–4666.
- Vuokko N, & Kaski P (2011). Significance of patterns in time series collections. In Proceedings of the 2011 SIAM international conference on data mining (pp. 676–686).
- Wagh N, & Varatharajah Y (2020). EEG-GCNN: Augmenting electroencephalogram-based neurological disease diagnosis using a domain-guided graph convolutional neural network. In Machine learning for health (pp. 367–378). PMLR.
- Watts DJ, & Strogatz SH (1998). Collective dynamics of ‘small-world’ networks. Nature, 393(6684), 440–442.
- Wessing I, Rehbein MA, Postert C, Fürniss T, & Junghöfer M (2013). The neural basis of cognitive change: Reappraisal of emotional faces modulates neural source activity in a frontoparietal attention network. NeuroImage, 81, 15–25.
- Xu K, Hu W, Leskovec J, & Jegelka S (2018). How powerful are graph neural networks? In International conference on learning representations (pp. 1–17).
- Yousefnezhad TM, Selvitella A, Zhang D, Greenshaw A, & Greiner R (2020). Shared space transfer learning for analyzing multi-site fMRI data. In Larochelle H, Ranzato M, Hadsell R, Balcan M, & Lin H (Eds.), Vol. 33, Advances in neural information processing systems (pp. 15990–16000). Curran Associates, Inc.
- Zhan Q, Wang L, Ren L, & Huang X (2022). A novel heterogeneous transfer learning method based on data stitching for the sequential coding brain computer interface. Computers in Biology and Medicine, 151, Article 106220.
- Zhang H, Chen P-H, & Ramadge P (2018). Transfer learning on fMRI datasets. In Storkey A, & Perez-Cruz F (Eds.), Proceedings of machine learning research: vol. 84, Proceedings of the twenty-first international conference on artificial intelligence and statistics (pp. 595–603). PMLR.
- Zhang C, Dougherty CC, Baum SA, White T, & Michael AM (2018). Functional connectivity predicts gender: Evidence for gender differences in resting brain connectivity. Human Brain Mapping, 39(4), 1765–1776.
- Zhuang F, et al. (2021). A comprehensive survey on transfer learning. Proceedings of the IEEE, 109(1), 43–76.
Data Availability Statement
I have shared the link to the code.
