Abstract
Autism spectrum disorder (ASD) is a neurological and developmental disorder that affects social and communicative behaviors. It emerges in early life and is generally associated with lifelong disabilities. Thus, accurate and early diagnosis could improve treatment outcomes for those with ASD. Functional magnetic resonance imaging (fMRI) is a useful tool that measures changes in brain signaling to facilitate our understanding of ASD. Much effort is being made to identify ASD biomarkers using various connectome-based machine learning and deep learning classifiers. However, correlation-based models cannot capture the non-linear interactions between brain regions. To solve this problem, we introduce a causality-inspired deep learning model that uses time-series information from fMRI and captures causality among ROIs useful for ASD classification. The model is compared with baseline and state-of-the-art models using 5-fold cross-validation on the ABIDE dataset. To ensure data quality, we filtered the dataset by keeping all images with mean framewise displacement (FD) less than 0.15 mm. Our proposed model achieved the highest average classification accuracy of 71.9% and an average AUC of 75.8%. Moreover, the inter-ROI causality interpretation of the model suggests that the left precuneus, right precuneus, and cerebellum rank among the top 10 ROIs in inter-ROI causality in the ASD population, whereas these ROIs do not rank in the top 10 in the control population. We have validated our findings against the literature and found that abnormalities in these ROIs are often associated with ASD.
Index Terms— Autism spectrum disorder, Functional MRI, Causal inference, Interpretability
1. INTRODUCTION
The human brain is a complex structure and operates as a constantly communicating and dynamic network. To study the change in brain activities, functional magnetic resonance imaging (fMRI) is often used to measure the blood-oxygen-level-dependent (BOLD) signals, which could provide insight into the diagnosis and treatment of brain disorders such as autism spectrum disorder (ASD) [1]. fMRI could also be utilized to study the connectivity between brain regions, revealing abnormal connectivity patterns associated with ASD.
A functional connectivity (FC) map is often reconstructed by calculating the correlations between fMRI signals of regions of interest (ROIs) to represent the brain networks. In recent years, machine learning models, such as CNNs, RNNs, and GNNs, have been developed to identify abnormal patterns in FC maps to help classify ASD [2][3][4][5][6]. However, FC is inherently linear, while interactions in functional networks are known to be complex and nonlinear [7][8]. Furthermore, correlation between brain regions does not imply causality. Thus, developing techniques that leverage the temporal structure of BOLD signals to identify brain region interactions is crucial for investigating brain mechanisms.
One approach for inferring causality among time series is based on the Wiener-Granger principle [9], which measures directed relationships between multiple ROIs of the brain by fitting a vector autoregressive model to forecast the time series [10]. In this paper, we propose a new causal neural network that identifies subjects with ASD with Granger-inspired interpretability by expanding the concept of the multivariate vector autoregressive (VAR) model [9][10]. Moreover, a frequency loss function is used to train the network for better performance [11]. We further compare and validate our model's performance against state-of-the-art models with 5-fold cross-validation on the filtered Autism Brain Imaging Data Exchange I (ABIDE I) dataset [12]. Our model captures and explains the top 10 ROIs with the highest Granger-inspired causality.
2. METHOD
In this paper, we propose a new supervised learning network that takes ROI-level BOLD fMRI signals as input and outputs ROI-level BOLD fMRI signal forecasts and ASD classification results. The causality encoder is a single-layer long short-term memory (LSTM) network. The encoded causality embeddings are further processed by the attentional pooling layer and a fully connected neural network for classification. The causality decoder is a 2-layer LSTM that decodes the causality embeddings. The decoder output can be compared with the real ROI time series to measure predictability.
2.1. Causality-inspired fMRI ROI time series forecasting
We define the fMRI ROI time series as $X = \{x^{(i)}\}$, with each ROI time series $x^{(i)} = (x^{(i)}_1, \ldots, x^{(i)}_T)$, where $i$ represents each ROI in the brain and the number of ROIs is 116, determined by the total number of regions in the Automated Anatomical Labeling (AAL) atlas [13]. $x^{(i)}_t$ is the signal at time $t \cdot \mathrm{TR}$, where $\mathrm{TR}$ is the sampling interval and $T \cdot \mathrm{TR}$ is the total duration of the acquisition. In multivariate scenarios, the Granger causality could be captured by the linear VAR model $x_t = \sum_{l=1}^{L} A_l\, x_{t-l} + \epsilon_t$, where $l$ is the predetermined time lag, $L$ is the total time lag, and $\epsilon_t$ is the random error [10]. $A_l$ is a matrix, different for every lag $l$, for the mixing of the different variables. The limitation of this model is the linear relationship between past and future time points, which could be an oversimplification of brain data, while deep learning models could capture more complex causal relationships.
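The Wiener-Granger idea behind this VAR formulation can be illustrated with a small numerical sketch (a hypothetical two-variable example, not the authors' implementation): a series y driven by the past of x is forecast with and without x's history, and the drop in forecast error indicates Granger causality.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500

# Simulate two series where x Granger-causes y (y depends on lagged x).
x = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.8 * x[t - 1] + 0.1 * rng.standard_normal()

# Restricted model: predict y[t] from y[t-1] only.
A = np.column_stack([y[:-1], np.ones(T - 1)])
coef_r, *_ = np.linalg.lstsq(A, y[1:], rcond=None)
err_r = np.mean((y[1:] - A @ coef_r) ** 2)

# Full model: predict y[t] from y[t-1] and x[t-1] (one row of a lag-1 VAR).
B = np.column_stack([y[:-1], x[:-1], np.ones(T - 1)])
coef_f, *_ = np.linalg.lstsq(B, y[1:], rcond=None)
err_f = np.mean((y[1:] - B @ coef_f) ** 2)

# Including x's past sharply reduces the error, i.e. x Granger-causes y.
print(err_r > 10 * err_f)  # True
```

The same logic underlies the deep model: instead of the linear regressions above, a nonlinear network produces the forecast, and predictability is read off the forecast error.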
We create input and target sequences separated by a time lag of $\tau$ seconds to train the model to forecast future ROI time series. Specifically, we discard the initial $\tau$ seconds of the data to form the target sequence, while the last $\tau$ seconds of the fMRI data are removed to create the input sequence. The structure of our network is illustrated in Fig. 1. Our model takes all the previous time points and predicts the next. The time series forecast can be expressed as:
$$\hat{x}_{t+1} = \mathcal{F}(x_1, x_2, \ldots, x_t) \quad (1)$$
$\mathcal{F}$ is the non-linear function learned by the encoder-decoder path of our model. The choice of the lag $\tau$ is 1 TR because the fMRI sampling rate is low [14].
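The one-TR shift between input and target sequences can be sketched as follows (array shapes are illustrative; `ts` is a hypothetical ROI-by-time matrix):

```python
import numpy as np

n_rois, T = 116, 200             # AAL ROIs x time points
ts = np.random.randn(n_rois, T)  # hypothetical ROI time series

lag = 1  # time lag of 1 TR, as in the paper
inputs = ts[:, :-lag]   # drop the last `lag` time points  -> t = 0 .. T-2
targets = ts[:, lag:]   # drop the first `lag` time points -> t = 1 .. T-1

# The target at each index is the input shifted one TR into the future.
print(inputs.shape, targets.shape)  # (116, 199) (116, 199)
print(np.array_equal(inputs[:, 1:], targets[:, :-1]))  # True
```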
Fig. 1.

Architecture of the proposed network. The 4D fMRI data are combined with the 3D AAL atlas, and the fMRI signals in the same ROI are averaged at each time point. The multichannel ROI time series from t=0 to t=T-1 is encoded by the causality encoder, made up of long short-term memory (LSTM) modules, and used both to (1) predict future time points and (2) predict ASD classification labels. The attentional pooling layer further processes the multichannel causality embeddings, and the fully connected layer then makes the ASD classification prediction. The causality decoder decodes the causality embeddings and predicts the future time points from t=1 to t=T.
After training, we evaluate the model based on inter-ROI causalities inspired by Granger causality. Unlike the original VAR model, where the causality between all ROIs is expressed in the coefficient matrices, the neural network cannot be simply expressed in the form of a linear multiplication. However, we can measure how close the time series forecast is to the real time series: the inter-ROI causality depends on the similarity between the true time series $x_{t+1}$ and the predicted time series $\hat{x}_{t+1}$ generated from the previous time points. The mean squared error serves as a proxy for measuring predictability, as lower values mean better similarity between the true time series and the forecast generated by the model. The predictability can be measured by:
$$P = \frac{1}{1 + E}$$

Here, $P$ represents predictability and $E$ is the mean squared forecast error. As $E$ approaches 0, $P$ approaches 1, indicating perfect predictability.
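A minimal sketch of such a predictability proxy, assuming the mapping $P = 1/(1+E)$ (an illustrative choice; any decreasing function with $P \to 1$ as $E \to 0$ behaves similarly):

```python
import numpy as np

def predictability(true_ts, pred_ts):
    """Map forecast MSE E to a predictability score P in (0, 1].

    P = 1 / (1 + E) is one mapping consistent with the stated
    property that P -> 1 as E -> 0 (assumed form, for illustration).
    """
    E = np.mean((np.asarray(true_ts) - np.asarray(pred_ts)) ** 2)
    return 1.0 / (1.0 + E)

perfect = predictability([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
noisy = predictability([1.0, 2.0, 3.0], [1.5, 2.5, 2.5])
print(perfect)          # 1.0
print(noisy < perfect)  # True
```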
2.2. Frequency loss function
In previous studies, frequency-domain information of ROI time series has proven useful for ASD label prediction [4]. In this study, we train the model with the frequency loss function, which captures the frequency-domain features of the ROI fMRI signals [11]. As shown in Equation 2, the frequency loss is calculated from the differences between the discrete Fourier transforms of the predicted and true signals. The absolute value is used instead of the square to avoid overemphasizing frequency bands with a large spectrum.
$$\mathcal{L}_{\mathrm{freq}} = \sum_{k=0}^{T-1} \left| \mathrm{DFT}(\hat{x})_k - \mathrm{DFT}(x)_k \right| \quad (2)$$
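The frequency loss can be sketched with numpy's FFT (a simplified single-signal version; the batched, multichannel form is analogous):

```python
import numpy as np

def frequency_loss(pred, target):
    """Mean absolute difference between the DFTs of two signals.

    Taking the absolute value (rather than the square) of the complex
    difference keeps large-spectrum frequency bands from dominating.
    """
    diff = np.fft.fft(pred) - np.fft.fft(target)
    return np.mean(np.abs(diff))

t = np.linspace(0, 1, 128, endpoint=False)
sig = np.sin(2 * np.pi * 5 * t)
print(frequency_loss(sig, sig))                     # 0.0
print(frequency_loss(sig, np.zeros_like(sig)) > 0)  # True
```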
2.3. ASD classification with Attentional Pooling
We use an attentional pooling module to compress the multichannel causal embedding before feeding it into the fully connected layer to generate the ASD classification. The multi-head attention layer first projects the embedding with 3 linear layers into 3 vectors: $Q$, $K$, and $V$ [15]. Attention is calculated as shown in Equation 3. $d_k$ is the dimension of the vector $K$. By dividing the dot product by $\sqrt{d_k}$, we control the magnitude of $QK^\top$ to avoid network training problems such as gradient explosion. We use multiple sets of $Q$, $K$, and $V$ so that the model can focus on multiple aspects of the embedding. Finally, a linear layer combines these learned outputs back to the original size of the embedding. We use the validation dataset to experiment with the number of heads in our application.
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right) V \quad (3)$$
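Equation 3 can be sketched in numpy for a single head, without the learned projection layers:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, as in Equation 3."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax (subtract the row max for numerical stability).
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))  # 4 query tokens, d_k = 8
K = rng.standard_normal((6, 8))  # 6 key tokens
V = rng.standard_normal((6, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

In the multi-head case, several such attention maps are computed over separately projected $Q$, $K$, and $V$ sets and then recombined by a final linear layer.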
The embedded representation refined by the pooling layer is then fed into a dense network for the ASD classification. Binary cross-entropy loss is combined with the frequency loss function to train the network, with the frequency loss scaled by a factor $\lambda$ as shown in Equation 4. $y_i$ denotes the actual ASD/control label of the $i$-th subject, $N$ is the total number of subjects, and $\hat{y}_i$ is the predicted probability of ASD.
$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right] + \lambda \mathcal{L}_{\mathrm{freq}} \quad (4)$$
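The combined objective can be sketched as follows (`lam` names the frequency loss scaling factor, a symbol of our choosing; the value 12.5 is an arbitrary illustrative frequency loss):

```python
import numpy as np

def bce_loss(y_true, y_prob, eps=1e-7):
    """Binary cross-entropy averaged over subjects."""
    p = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def combined_loss(y_true, y_prob, freq_loss, lam=1e-4):
    """BCE plus the scaled frequency loss; `lam` is the scaling
    factor searched over 0.1 .. 0.0001 in the experiments."""
    return bce_loss(y_true, y_prob) + lam * freq_loss

y = np.array([1.0, 0.0, 1.0])   # hypothetical ASD/control labels
p = np.array([0.9, 0.2, 0.8])   # hypothetical predicted probabilities
total = combined_loss(y, p, freq_loss=12.5, lam=1e-4)
print(total > bce_loss(y, p))  # True: the frequency term adds a penalty
```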
3. EXPERIMENTS
3.1. Datasets and pre-processing
ABIDE I
The resting-state fMRI data were obtained from the ABIDE I database [12]. ABIDE I was collected from 17 international sites, and the neuroimaging and phenotypic data of 1112 subjects are publicly shared. As quality control, fMRI images with mean framewise displacement (FD) larger than 0.15 mm were excluded from this study, leaving a total of 860 subjects.
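The FD-based quality control amounts to a simple threshold filter; a sketch with hypothetical subject IDs and mean FD values:

```python
# Hypothetical quality-control step: keep only subjects whose mean
# framewise displacement (FD) is below the 0.15 mm threshold.
mean_fd = {"sub-01": 0.08, "sub-02": 0.21, "sub-03": 0.149, "sub-04": 0.31}

FD_THRESHOLD = 0.15  # mm
kept = [sub for sub, fd in mean_fd.items() if fd < FD_THRESHOLD]
print(kept)  # ['sub-01', 'sub-03']
```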
Implementation Details
We set the learning rate to 0.005 and train the network for 100 epochs, halving the learning rate every 8 epochs. We use the frequency loss function combined with binary cross-entropy loss as the loss function and the Adam optimizer to train the network [16].
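The step-decay schedule (halving every 8 epochs from a base rate of 0.005) can be sketched as:

```python
def learning_rate(epoch, base_lr=0.005, step=8):
    """Halve the learning rate every `step` epochs (step decay)."""
    return base_lr * (0.5 ** (epoch // step))

print(learning_rate(0))   # 0.005
print(learning_rate(8))   # 0.0025
print(learning_rate(16))  # 0.00125
```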
Baselines and Ablation Studies
We have compared our proposed method with various baseline methods. One baseline method that does not involve deep learning is Connectome-based Predictive Modeling (CPM) using ridge regression [17]. The hyperparameter alpha in CPM, representing the coefficient of the regularization term, is searched within the range of 0.5 to $5 \times 10^9$. The optimal alpha value is selected by assessing the model's performance on a validation dataset. We also compare the model with the graph-based methods STAGIN [3], BrainGNN [2], and SpectBGNN [4], and an attention-based approach, BolT [18].
Furthermore, we also test the choice of the frequency loss scaling factor $\lambda$, which controls the trade-off between training the network for better time series forecasts and better ASD label predictions. We experiment with different scales of $\lambda$ from 0.1 to 0.0001 to determine the best value for training the network. The $\lambda$ we select is the one that yields high accuracy and comparable frequency loss on the validation set.
Evaluation Methods
We use 5-fold cross-validation, randomly splitting the dataset into train, validation, and test sets. We apply each model and measure accuracy, AUC, recall, and precision to compare performance. We also measure the mean absolute error of the forecasts to select the most suitable number of layers for the LSTM decoder.
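The splitting procedure can be sketched as follows; the paper does not specify how the validation fold is chosen within each round, so this sketch assumes a rotating-fold scheme:

```python
import random

def five_fold_splits(n_subjects, seed=0):
    """Yield (train, val, test) index lists for 5-fold CV: each fold is
    the test set once, the next fold is validation, the rest train."""
    idx = list(range(n_subjects))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::5] for i in range(5)]
    for k in range(5):
        test = folds[k]
        val = folds[(k + 1) % 5]
        train = [i for j in range(5) if j not in (k, (k + 1) % 5)
                 for i in folds[j]]
        yield train, val, test

splits = list(five_fold_splits(860))  # 860 subjects after QC
print(len(splits))  # 5
train, val, test = splits[0]
print(len(train) + len(val) + len(test))  # 860
```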
4. RESULTS
4.1. Experimental Results
ASD classification performance
The performance of all baseline methods is shown in Table 1. Deep learning-based methods achieve better average accuracy and lower variance than CPM. Our proposed method achieves the best accuracy, AUC, and precision, as shown in Table 1.
Table 1.
The model performance in 5-fold cross-validation is summarized. Bold indicates the highest average among all models.
| Model | Accuracy ↑ | AUC ↑ | Recall ↑ | Precision ↑ |
|---|---|---|---|---|
| CPM | 59.4% ± 5.2% | 66.7% ± 2.7% | 53.5% ± 10.5% | 56.1% ± 5.4% |
| BrainGNN | 61.6% ± 1.0% | 60.4% ± 4.0% | 61.3% ± 11.5% | 48.1% ± 3.5% |
| SpectBGNN | 61.5% ± 0.5% | 60.0% ± 1.6% | 63.4% ± 9.3% | 45.3% ± 10.5% |
| STAGIN | 69.4% ± 0.8% | 75.6% ± 0.7% | **74.2% ± 1.3%** | 71.7% ± 1.7% |
| BolT | 68.0% ± 2.7% | 74.3% ± 3.7% | 73.5% ± 4.5% | 68.6% ± 2.4% |
| Ours | **71.9% ± 0.8%** | **75.8% ± 1.0%** | 59.6% ± 8.9% | **76.5% ± 11.7%** |
Choice of $\lambda$
Fig. 2 shows the classification performance, measured by accuracy, and the time series prediction performance, measured by scaled frequency loss, on the validation set. Overall, the lower the $\lambda$, the better the performance in classifying ASD subjects. However, when $\lambda$ is lower than $1 \times 10^{-4}$, the accuracy does not increase further. Therefore, our best model is trained with $\lambda = 1 \times 10^{-4}$.
Fig. 2.

Performance for multiple values of $\lambda$ is visualized. For ease of interpretation, the frequency loss is scaled by dividing by the maximum frequency loss in the experiment.
Predictability rank
We ranked each subject's ROIs by their predictability as defined in the previous section. For each individual whom our model correctly classified as ASD or control, we recorded which ROIs ranked in that subject's top 10 most predictable ROIs, and from these occurrences we obtained the 10 most frequent ROIs for both the control and ASD populations. Compared to the control population, the ASD population has the left precuneus, right precuneus, and left cerebellum lobule IV/V among its top 10 ROIs (shown in Fig. 3). The precuneus is associated with communication, and recent studies have found that it shows different connectivity patterns in the ASD population than in control groups [19]. The cerebellum is also reported to be connected to other social brain areas and has been shown to be significantly associated with ASD pathology [20]. The control population has the left precentral gyrus, right supplementary motor area, and left calcarine fissure in its top 10, which the ASD population does not. The top 10 ASD predictability values range from 0.09 to 0.2, while the top 10 control predictability values range from 0.07 to 0.2. Both the ASD and control populations have these ROIs in common: right mid-cingulum, left cerebellum, vermis, right lingual gyrus, left cerebellum crus I, right cerebellum crus I, and right cerebellum crus II. This means that these ROIs are important to both populations in the resting state, but it should not be interpreted as denying an association between these ROIs and ASD.
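The ranking procedure can be sketched as follows (hypothetical ROI names and scores; the paper uses the top 10, shown here with k=2 on toy data for brevity):

```python
from collections import Counter

def topk_occurrences(scores_per_subject, k=10):
    """Count how often each ROI lands in a subject's top-k most
    predictable ROIs (k=10 in the paper)."""
    counts = Counter()
    for scores in scores_per_subject:  # {roi_name: predictability}
        topk = sorted(scores, key=scores.get, reverse=True)[:k]
        counts.update(topk)
    return counts

# Hypothetical predictability scores for two subjects.
subjects = [
    {"Precuneus_L": 0.20, "Precuneus_R": 0.15, "Lingual_R": 0.05},
    {"Precuneus_L": 0.18, "Lingual_R": 0.12, "Precuneus_R": 0.04},
]
counts = topk_occurrences(subjects, k=2)
print(counts.most_common(1))  # [('Precuneus_L', 2)]
```

The most frequent top-k ROIs, tallied separately over correctly classified ASD and control subjects, give the population-level rankings reported above.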
Fig. 3.

ROIs with high predictability for ASD and control population shown in MNI space.
5. DISCUSSION AND CONCLUSIONS
We propose a new network to classify ASD with the interpretability of causality. We use a combined frequency loss function and binary cross-entropy function to train the network to forecast time series and classify ASD simultaneously. The scaling factor to combine these functions is selected with thorough experiments. Our model is compared with other baseline methods and achieves a higher average ASD classification accuracy and AUC.
Furthermore, we identify the left precuneus, right precuneus, and cerebellum as ranking higher in the inter-ROI causality relationship in the ASD population than in the control population. These findings are consistent with other studies that report associations between these ROIs and ASD.
6. COMPLIANCE WITH ETHICAL STANDARDS
This research study was conducted retrospectively using human subject data made available in open access by NITRC IR. Ethical approval was not required as confirmed by the license attached with the open-access data.
ACKNOWLEDGMENTS
The authors would like to thank all participants. This study is supported by the National Institute of Neurological Disorders and Stroke (NINDS) of the National Institutes of Health through grant R01 NS035193.
REFERENCES
- [1]. Kennedy DP and Courchesne E, "The intrinsic functional organization of the brain is altered in autism," Neuroimage, vol. 39, 2008.
- [2]. Li X et al., "BrainGNN: Interpretable brain graph neural network for fMRI analysis," Medical Image Analysis, vol. 74, 2021.
- [3]. Kim B et al., "Learning dynamic graph representation of brain connectome with spatio-temporal attention," in Advances in Neural Information Processing Systems, 2021, vol. 34.
- [4]. Duan P et al., "Spectral brain graph neural network for prediction of anxiety in children with autism spectrum disorder," 2024 IEEE International Symposium on Biomedical Imaging (ISBI), 2024.
- [5]. Kawahara J et al., "BrainNetCNN: Convolutional neural networks for brain networks; towards predicting neurodevelopment," Neuroimage, vol. 146, 2017.
- [6]. Zhou Y et al., "Self-supervised pre-training tasks for an fMRI time-series transformer in autism detection," in Machine Learning in Clinical Neuroimaging, 2024.
- [7]. Mohanty R et al., "Rethinking measures of functional connectivity via feature extraction," Sci Rep, vol. 10, 2020.
- [8]. Park H and Friston K, "Structural and functional brain networks: From connections to cognition," Science, vol. 342, no. 6158, 2013.
- [9]. Granger CWJ, "Investigating causal relations by econometric models and cross-spectral methods," Econometrica, vol. 37, 1969.
- [10]. Sims CA, "Macroeconomics and reality," Econometrica, vol. 48, 1980.
- [11]. Zhang X et al., "Not all frequencies are created equal: Towards a dynamic fusion of frequencies in time-series forecasting," in ACM MM, 2024.
- [12]. Di Martino A et al., "The autism brain imaging data exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism," Molecular Psychiatry, vol. 19, no. 6, pp. 659-667, 2014.
- [13]. Rolls ET et al., "Automated anatomical labelling atlas 3," NeuroImage, vol. 206, 2020.
- [14]. Bielczyk N et al., "Disentangling causal webs in the brain using functional magnetic resonance imaging: A review of current approaches," Netw Neurosci, vol. 3, 2019.
- [15]. Vaswani A et al., "Attention is all you need," in Neural Information Processing Systems, 2017.
- [16]. Kingma DP and Ba J, "Adam: A method for stochastic optimization," in International Conference on Learning Representations (ICLR), 2015.
- [17]. Shen X et al., "Using connectome-based predictive modeling to predict individual behavior from brain connectivity," Nat Protoc, vol. 12, 2017.
- [18]. Bedel HA et al., "BolT: Fused window transformers for fMRI time series analysis," Medical Image Analysis, vol. 88, 2023.
- [19]. Xiao Y et al., "Atypical functional connectivity of temporal cortex with precuneus and visual regions may be an early-age signature of ASD," Mol Autism, vol. 14, 2023.
- [20]. Mapelli L et al., "The cerebellar involvement in autism spectrum disorders: From the social brain to mouse models," Int J Mol Sci, vol. 23, 2022.
