Author manuscript; available in PMC 2022 Nov 7.
Published in final edited form as: Proc Indian Conf Comput Vis Graphics Image Proc. 2021 Dec 19;2021:12. doi: 10.1145/3490035.3490269

Robust Brain State Decoding using Bidirectional Long Short Term Memory Networks in functional MRI

Anant Mittal 1, Priya Aggarwal 2, Luiz Pessoa 3, Anubha Gupta 4
PMCID: PMC9639335  NIHMSID: NIHMS1789845  PMID: 36350798

Abstract

Decoding the brain states underlying cognitive processes by learning discriminative feature representations has recently gained considerable interest in brain imaging studies. In particular, there has been an impetus to encode the dynamics of brain functioning by analyzing the temporal information available in fMRI data. Long short-term memory (LSTM), a class of machine learning models with a “memory” component that retains previously seen temporal information, has been observed to perform well in various applications with dynamic temporal behavior, including brain state decoding. However, because of the dynamics and inherent latency of fMRI BOLD responses, future temporal context is also crucial, and it is neither encoded nor captured by the conventional LSTM model. This paper performs robust brain state decoding by encapsulating information from both past and future instances of fMRI data via a bidirectional LSTM, which allows the dynamics of the BOLD response to be modeled explicitly without any delay adjustment. To this end, we utilize a bidirectional LSTM in which the input sequence is fed in normal time order to one LSTM network and in reverse time order to another. The hidden activations of the forward and reverse directions are collated to build the “memory” of the model and are used to robustly predict the brain state at every time instance. The method is validated on working memory data from the Human Connectome Project (HCP) and is observed to perform 18% better than its unidirectional counterpart in terms of brain-state prediction accuracy.

Keywords: Brain State Decoding, Recurrent Neural Networks, Long Short-Term Memory Networks

1. INTRODUCTION

Learning informative and discriminative representations of the brain states underlying various cognitive processes has gained considerable interest in brain-computer interface (BCI) applications [20, 30]. Advances in non-invasive neuroimaging methods such as functional magnetic resonance imaging (fMRI) are proving helpful in determining a person’s cognitive or perceptual state [25], e.g., in decoding motor functions [8], identifying the perceived stimulus [3, 12], classifying shifts in attention [9], and “brain reading” [6]. As a result, several techniques have been proposed for carrying out brain-state decoding, ranging from multi-voxel pattern analysis to deep learning architectures that integrate spatio-temporal information to understand behavior.

Conventional decoding methods involved massive univariate analysis, measuring activity from thousands of brain locations and analysing each of them separately [23]. Multivariate analysis, in contrast, takes into account the brain activity occurring at several locations simultaneously [1, 2, 4], which helps integrate distributed but overlapping information across the spatial domain [12, 13, 16]. Recent advances in time-sensitive machine learning frameworks [15] have attracted remarkable attention for sequential modelling. In particular, two variants of the general recurrent neural network (RNN) [29], namely echo-state networks [28] and long short-term memory models [21], have been shown to outperform conventional decoding models in characterizing dynamic fMRI information during both naturalistic and task conditions.

During the acquisition of fMRI data, the ratio of oxygenated to de-oxygenated blood at any location in the brain serves as a proxy for the underlying neuronal activation. Because of the time lag in the peak of the blood oxygen level dependent (BOLD) response, the signal is typically not synchronized with the presentation of the stimuli [5]. Thus, before training a brain-state decoding model, each time point is usually adjusted according to the estimated delay of the BOLD signal [22], under the assumption that all fMRI voxels share the same response delay [26]. However, fMRI studies [19] have empirically revealed variations in response delay of several seconds [18]. Long short-term memory (LSTM) networks [14], a class of RNNs, have been shown to model such temporally dynamic behaviour well. An LSTM model stores information from the past that has already passed through it and uses it as contextual information for learning robust features for the intended task, say classification. Recently, some fMRI studies have used these networks to integrate temporal information from the past [21, 28].

Because of the variations in the latency of the BOLD response across time, we assert that temporal context from the future is also important for capturing the dynamics of the BOLD response and generating accurate representations. In this paper, we employ a variant of the LSTM architecture called the bidirectional LSTM [27], which acquires information from both past and future time instances. In particular, the input sequence is fed in normal time order to one LSTM network and in reverse time order to another, and the two hidden activations are collated to generate the hidden state features of the RNN. We evaluate this method for predicting brain states in working memory fMRI data obtained from the Human Connectome Project (HCP) [11], and compare the performance of the bi-LSTM network with its conventional unidirectional counterpart on the brain-state decoding task. Note further that, unlike previous works, this framework does not require any time-delay adjustment to synchronize the stimuli with the BOLD response.

2. MATERIAL AND METHODS

2.1. Data

We evaluated the bi-LSTM framework on working memory task fMRI data from the Human Connectome Project (HCP) [11]. We randomly selected N = 400 participants from the 1200-subject data release. Participants performed a working memory task, indicating whether the current stimulus matched the one presented two stimuli before (“2-back” task) or a control condition (“0-back”). The HCP working memory task also embeds a category representation task; hence, participants were presented with separate blocks of trials consisting of four different stimulus types, namely tools, places, faces, and body parts, separated by fixation periods. Data for two runs are available for each participant. Within each run, there were 8 task blocks, one for each combination of task (2-back or 0-back) and stimulus type (places, tools, faces, body), each lasting 27.5 seconds, with 4 fixation blocks of 15 seconds each, one after every two task blocks. Each scan is thus a 405 time-point sequence of fMRI volumes. More details about the fMRI data acquisition and task paradigm are available in [11].

2.2. Data Pre-processing

The available preprocessed data [11] includes field-map based distortion correction, functional-to-structural alignment, and intensity normalization. Additionally, motion-related variables (6 motion parameters and their derivatives) were regressed out using the 3dDeconvolve routine with the “ortvec” option in the AFNI software [7], and low-frequency signal changes were regressed out using 3dDeconvolve with the “polort” option. Since our goal is to evaluate a general brain-state decoding methodology, we used only the cortical data, which is directly available in surface representation as part of the HCP preprocessing pipeline. To separate brain areas based on architecture and functional connectivity, we employed the cortical parcellation developed by [10]. The parcellation collates the individual voxels within each region by averaging, yielding 360 cortical regions of interest (ROIs). The region-averaged time series were used as the input feature vectors for the temporal analysis, and the resulting 405 time-point sequence of 360 ROIs was structured as a 360 × 405 2D tensor. No stimulus-to-brain-state synchronization was performed to adjust for the delay in the BOLD response in the bi-LSTM network. Each time point in the task blocks was labelled with one of the above-mentioned brain states, and the time instants belonging to the fixation blocks were labelled as “others”, yielding a total of S = 9 brain states.
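For concreteness, the sketch below illustrates one way to implement this step in NumPy. It is our reconstruction rather than the authors’ released code, and the array names (voxel_ts, nuisance, parcel_id) are assumed placeholders for the preprocessed BOLD data, the nuisance regressors, and the Glasser parcel labels.

```python
# Hedged sketch (not the authors' code): nuisance regression followed by
# parcel averaging to build the 360 x 405 input tensor described above.
# Assumed inputs: `voxel_ts` (T x V BOLD time series), `nuisance`
# (T x K motion/drift regressors), `parcel_id` (V,) parcel labels in 1..360.
import numpy as np

def build_roi_timeseries(voxel_ts, nuisance, parcel_id, n_rois=360):
    T, V = voxel_ts.shape
    # Regress out nuisance variables from every voxel via least squares.
    X = np.column_stack([np.ones(T), nuisance])
    beta, *_ = np.linalg.lstsq(X, voxel_ts, rcond=None)
    residual = voxel_ts - X @ beta
    # Average the cleaned voxels within each of the 360 cortical ROIs.
    roi_ts = np.stack([residual[:, parcel_id == r].mean(axis=1)
                       for r in range(1, n_rois + 1)], axis=0)
    return roi_ts  # shape (360, T); here T = 405 time points
```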

2.3. Bidirectional Long short-term memory RNNs

Brain-state decoding is essentially modelled as the task of classifying the brain state at each time point. Given the time series of ROI brain features xt at time t, the RNN model predicts the brain state of each time point based on the input activation xt and the temporal dependency on its preceding time points up to time t − 1. The LSTM, in particular, defines gated cells that act on the received input activation by passing or blocking information according to the importance of the feature for the task. The learning process, called backpropagation through time (BPTT) [29], estimates the parameters that allow the data in the cells to be either retained or deleted. The transition equations for an LSTM model are as follows:

$$
\begin{aligned}
f_t^l &= \sigma\left(W_f^l\,[h_{t-1}^l, x_t^l] + b_f^l\right), &
i_t^l &= \sigma\left(W_i^l\,[h_{t-1}^l, x_t^l] + b_i^l\right),\\
\tilde{C}_t^l &= \tanh\left(W_c^l\,[h_{t-1}^l, x_t^l] + b_c^l\right), &
C_t^l &= f_t^l * C_{t-1}^l + i_t^l * \tilde{C}_t^l,\\
o_t^l &= \sigma\left(W_o^l\,[h_{t-1}^l, x_t^l] + b_o^l\right), &
h_t^l &= o_t^l * \tanh\left(C_t^l\right)
\end{aligned} \tag{1}
$$

where $f_t^l$, $i_t^l$, $C_t^l$, $h_t^l$, and $x_t^l$ denote the output of the forget gate, the input gate, the cell activation, the hidden activation, and the input activation of the $l$th LSTM layer at time point $t$, $o_t^l$ is the output-gate activation, and $\sigma$ denotes the sigmoid activation function. A schematic illustration of the bidirectional LSTM [27] is provided in Fig. 1. It processes the time-series data in both directions using separate hidden layers. The forward hidden activation $\overrightarrow{h}_t^l$ is computed using the equations above, while the backward hidden activation $\overleftarrow{h}_t^l$ is calculated using the future temporal dependency $\overleftarrow{h}_{t+1}^l$. The input to the subsequent layer is generated by combining the forward and backward hidden activations as:

$$h_t^l = \overrightarrow{h}_t^l \cdot \overleftarrow{h}_t^l, \tag{2}$$

Figure 1:

Schematic representation of the proposed framework for robust brain-state decoding using a bidirectional LSTM. At time t, 360 regions of interest (ROIs) are passed as input activations, xt, to a 3-layer deep RNN architecture. The architecture has two stacked bidirectional layers for encoding temporal dependencies, followed by a fully-connected layer with softmax activation for predicting the brain states. One bidirectional layer comprises a pair of forward and backward LSTM layers and is highlighted with a dashed-line box. The hidden activations from the forward and backward LSTMs can be merged in three different ways, depicted in the box at the middle left; based on the merging scheme, the models are named bi-LSTM-c, bi-LSTM-μ, and bi-LSTM-a. The output vector yt from the fully-connected layer indicates the predicted brain state corresponding to the input xt.

where the dot (·) represents a merging scheme. The three possible ways of combining the activations are vector concatenation (bi-LSTM-c), element-wise vector addition (bi-LSTM-a), and element-wise averaging (bi-LSTM-μ). The input activations xt to layer l = 1 at time t are extracted from the 360 brain ROIs, and the inputs to the subsequent LSTM layers l = 2, 3, …, n are the hidden activations of the previous (l − 1)th layer. The last bi-LSTM layer is followed by a fully-connected layer with S neurons and softmax activation, which learns a mapping from the learned feature representations to the brain states as:

$$s_t = \operatorname{softmax}\left(W_s h_t^n + b_s\right). \tag{3}$$
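To make Eqs. (1)–(3) concrete, the NumPy sketch below implements a single LSTM time step, the concatenation-based bidirectional merge, and the softmax readout. It is an illustrative reconstruction under the notation above, not the trained model; the parameter dictionaries W, b and the readout weights W_s, b_s are assumed placeholders.

```python
# Illustrative NumPy version of Eqs. (1)-(3) for one layer (a sketch only).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM step; W/b hold the forget (f), input (i), candidate (c),
    and output (o) gate parameters as in Eq. (1)."""
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    f = sigmoid(W["f"] @ z + b["f"])           # forget gate
    i = sigmoid(W["i"] @ z + b["i"])           # input gate
    C_tilde = np.tanh(W["c"] @ z + b["c"])     # candidate cell state
    C = f * C_prev + i * C_tilde               # cell activation
    o = sigmoid(W["o"] @ z + b["o"])           # output gate
    h = o * np.tanh(C)                         # hidden activation
    return h, C

def readout(h_fwd, h_bwd, W_s, b_s):
    """Eq. (2) with concatenation as the merge, then the Eq. (3) softmax.
    h_fwd comes from processing x_1..x_t, h_bwd from x_T..x_t."""
    h = np.concatenate([h_fwd, h_bwd])
    logits = W_s @ h + b_s
    logits -= logits.max()                     # numerical stability
    return np.exp(logits) / np.exp(logits).sum()
```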

3. EXPERIMENTAL RESULTS AND DISCUSSION

To compare the performance of the bi-LSTM with the conventional LSTM previously evaluated on the working memory task, we built a bi-LSTM architecture with the same specifications as in [21]. At any given time t, the input activations xt pass through two hidden bi-LSTM layers, each LSTM cell having 256 hidden activations, to encode the temporal dependencies. This is followed by a fully-connected layer with S = 9 neurons that predicts the brain states. We employed inter-subject 10-fold cross-validation [17]: data from N = 400 participants was divided into 10 parts, of which 9 parts (360 subjects) were used for model development and the remaining part (40 subjects) was held out for evaluating model performance. The development set (9 folds) was randomly shuffled and split 80–20 into training and validation sets; the validation data was used to tune the hyper-parameters and to prevent over-fitting. Since the task paradigm was the same for each run and each subject during the acquisition of the working memory task data [21], the full-length training data was split into small overlapping windows of size w = 40 with an overlap of 10 points [21].
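A minimal Keras sketch of this architecture and of the windowing step is given below, assuming the specifications stated above (two stacked bidirectional layers with 256 units per direction, S = 9 output states, windows of 40 time points over 360 ROI features). It is our reconstruction, not the authors’ released code.

```python
# Hedged reconstruction of the bi-LSTM-c architecture and window splitting.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

WIN, N_ROI, S = 40, 360, 9   # window length, number of ROIs, brain states

model = keras.Sequential([
    keras.Input(shape=(WIN, N_ROI)),
    layers.Bidirectional(layers.LSTM(256, return_sequences=True,
                                     dropout=0.3)),   # bi-LSTM layer 1
    layers.Bidirectional(layers.LSTM(256, return_sequences=True,
                                     dropout=0.3)),   # bi-LSTM layer 2
    layers.Dense(S, activation="softmax"),  # per-time-point state prediction
])

def make_windows(run_ts, run_labels, win=40, overlap=10):
    """Split one time-major run (405 x 360) and its per-time-point labels
    into overlapping windows (stride = win - overlap = 30)."""
    step = win - overlap
    starts = range(0, run_ts.shape[0] - win + 1, step)
    X = np.stack([run_ts[s:s + win] for s in starts])
    y = np.stack([run_labels[s:s + win] for s in starts])
    return X, y
```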

The proposed model was implemented in the Keras deep learning framework and trained on a GeForce GTX 980 GPU with a batch size of 32, using the ADAM optimizer with a learning rate of 0.001. The model was trained for 100 epochs with no early stopping. To prevent over-fitting, a dropout of 0.3 was applied in the LSTM layers during training. The number of time instances of the class “others” was much larger than that of any other class; thus, to prevent the model from simply predicting states according to the underlying class distribution, weights for the imbalanced classes were estimated using Sklearn’s “compute_class_weight” routine [24] and applied in the loss function, weighting instances inversely proportional to their class frequency.
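A hedged sketch of this training configuration follows, continuing the model sketch above; X_train and y_train (integer state labels per time point) are assumed to come from the windowing step, and per-time-point class weighting is shown via temporal sample weights as one possible realization.

```python
# Hedged training sketch: Adam (lr 0.001), batch size 32, 100 epochs, and
# class weights inversely proportional to class frequency.
import numpy as np
from tensorflow import keras
from sklearn.utils.class_weight import compute_class_weight

class_w = compute_class_weight(class_weight="balanced",
                               classes=np.arange(S), y=y_train.ravel())

model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# With per-time-point targets, one simple route is a (n_windows, 40) array
# of temporal sample weights (older Keras versions may additionally require
# sample_weight_mode="temporal" in compile()).
sample_w = class_w[y_train]
model.fit(X_train, y_train, sample_weight=sample_w,
          batch_size=32, epochs=100)
```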

We compared the proposed architecture (bi-LSTM) with its conventional unidirectional counterpart (LSTM) and with a three-layer feed-forward neural network (ff-NN) that uses the ROIs at individual time points as features, discarding temporal dependencies. For a fair comparison, the number of layers and the number of neural units per layer in the other models were kept the same as in the proposed model. We also evaluated the different bi-LSTM variants (Fig. 1) based on how the forward $\overrightarrow{h}^l$ and backward $\overleftarrow{h}^l$ hidden activations are combined, namely merging by concatenation (bi-LSTM-c), element-wise addition (bi-LSTM-a), and element-wise averaging (bi-LSTM-μ). The results in Table 1 were obtained by evaluating performance on the unseen test data of 40 subjects in each fold; the averaged F1 scores for each class are tabulated. The bidirectional models outperform both ff-NN and LSTM. Further, bi-LSTM-c performs slightly better than bi-LSTM-μ and bi-LSTM-a, possibly because summation or averaging blends the feature activations, leading to slightly inferior performance.
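In Keras, the three merging schemes map directly onto the merge_mode argument of the Bidirectional wrapper; the sketch below shows how the variants could be instantiated (our illustration, not the authors’ code).

```python
# "concat" -> bi-LSTM-c, "sum" -> bi-LSTM-a, "ave" -> bi-LSTM-mu.
from tensorflow.keras import layers

def bi_lstm_layer(merge_mode):   # merge_mode in {"concat", "sum", "ave"}
    return layers.Bidirectional(layers.LSTM(256, return_sequences=True),
                                merge_mode=merge_mode)
```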

Mean normalised confusion matrices of the classification accuracy are shown in Fig. 2 to compare the misclassifications of LSTM and bi-LSTM-c. The overall accuracy of the unidirectional LSTM model was 0.66 ± 0.18, whereas the classification accuracy of the bidirectional LSTM (bi-LSTM-c) was 0.84 ± 0.02. For every brain state, the LSTM model misclassifies to a larger extent than the bi-LSTM-c model, although for both models the misclassification is highest towards the “others” class.
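The reported metrics can be computed as sketched below (row-normalised confusion matrix as in Fig. 2, per-class and weighted F1 as in Table 1); y_true and y_pred are assumed flattened per-time-point integer labels for the held-out subjects of one fold.

```python
# Hedged evaluation sketch using scikit-learn metrics.
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

cm = confusion_matrix(y_true, y_pred, labels=np.arange(9)).astype(float)
cm_norm = cm / cm.sum(axis=1, keepdims=True)   # each row sums to 1
per_class_f1 = f1_score(y_true, y_pred, average=None)
weighted_f1 = f1_score(y_true, y_pred, average="weighted")
```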

Figure 2:

Brain-state decoding performance of (a) long short-term memory (LSTM) and (b) bidirectional LSTM with concatenation-based merging (bi-LSTM-c) on the unseen data of 40 participants from the working memory task fMRI data. The color bar indicates mean accuracy across the 10 cross-validation folds.

Furthermore, when participants were presented with the “faces” and “places” stimuli, the second-highest confusion is between the tasks (0-back vs. 2-back), even though the stimulus category itself was detected correctly. The model also confuses the “body” and “tools” stimuli.

4. CONCLUSIONS AND FUTURE WORK

In this study, we used a bidirectional LSTM network to decode brain states from task fMRI data in order to appropriately capture the dynamics of the fMRI BOLD response. Experimental results on the working memory task fMRI data demonstrated the superior performance of the bi-LSTM compared to the unidirectional LSTM. Further, the model works well without any hard-coded delay adjustment, emphasizing that useful information is also available in the immediate future samples. The model improved performance on the brain-decoding task in working memory task fMRI data compared with the earlier proposed unidirectional LSTM-based decoder. We worked with a fixed window length; future work may involve tuning the window size and overlap for time-series chunking. The problem of class imbalance, although largely handled by class weighting, could benefit from more sophisticated treatment. From an analysis point of view, it would be interesting to study the cortical regions engaged by the “body” and “tools” stimuli, as the model sometimes confuses them.

Table 1:

Comparative performance of different models in terms of cross-validated F1 score for each brain state, and the weighted-average performance, using the unseen data of 40 participants from the working memory task fMRI data.

Model 0-back Body 0-back Faces 0-back Places 0-back Tools 2-back Body 2-back Faces 2-back Places 2-back Tools Others Weighted Average
ff-NN 0.53 0.54 0.52 0.48 0.48 0.60 0.53 0.52 0.79 0.55
u-LSTM 0.68 0.64 0.69 0.62 0.56 0.70 0.69 0.61 0.71 0.66
bi-LSTM-μ 0.85 0.83 0.86 0.81 0.79 0.87 0.86 0.84 0.87 0.85
bi-LSTM-c 0.85 0.83 0.87 0.83 0.80 0.87 0.87 0.85 0.88 0.86
bi-LSTM-a 0.85 0.83 0.85 0.81 0.80 0.86 0.87 0.85 0.87 0.85

Note: Best classification performance for each brain-state is highlighted in bold.

CCS CONCEPTS.

• Computing methodologies → Neural networks.

ACKNOWLEDGMENTS

A.M. is supported by a research assistantship by the Laboratory of Emotion and Cognition, University of Maryland, College Park. L.P. is supported by the National Institute of Mental Health (R01 MH071589 and R01 MH112517). Task data were provided by the Human Connectome Project, WU-Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research; and by the McDonnell Center for Systems Neuroscience at Washington University. The authors would like to thank Manasij Venkatesh, University of Maryland, for his help in carrying out this research.


Contributor Information

Anant Mittal, IIIT-Delhi, Delhi, India.

Priya Aggarwal, IIIT-Delhi, Delhi, India.

Luiz Pessoa, Laboratory of Cognition and Emotion, University of Maryland, USA.

Anubha Gupta, SBILab, Dept. of ECE, IIIT-Delhi, Delhi, India.

REFERENCES

  • [1]. Aggarwal Priya and Gupta Anubha. 2019. Group-fused multivariate regression modeling for group-level brain networks. Neurocomputing 363 (2019), 140–148. 10.1016/j.neucom.2019.06.042
  • [2]. Aggarwal Priya and Gupta Anubha. 2019. Multivariate graph learning for detecting aberrant connectivity of dynamic brain networks in autism. Medical Image Analysis 56 (2019), 11–25. 10.1016/j.media.2019.05.007
  • [3]. Aggarwal Priya, Gupta Anubha, and Garg Ajay. 2015. Joint estimation of activity signal and HRF in fMRI using fused LASSO. In 2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP). 829–833. 10.1109/GlobalSIP.2015.7418313
  • [4]. Aggarwal Priya, Gupta Anubha, and Garg Ajay. 2017. Multivariate brain network graph identification in functional MRI. Medical Image Analysis 42 (2017), 228–240. 10.1016/j.media.2017.08.007
  • [5]. Buxton Richard B, Wong Eric C, and Frank Lawrence R. 1998. Dynamics of blood flow and oxygenation changes during brain activation: the balloon model. Magnetic Resonance in Medicine 39, 6 (1998), 855–864.
  • [6]. Cox David D and Savoy Robert L. 2003. Functional magnetic resonance imaging (fMRI) “brain reading”: detecting and classifying distributed patterns of fMRI activity in human visual cortex. NeuroImage 19, 2 (2003), 261–270.
  • [7]. Cox Robert W. 1996. AFNI: software for analysis and visualization of functional magnetic resonance neuroimages. Computers and Biomedical Research 29, 3 (1996), 162–173.
  • [8]. Dehaene Stanislas, Le Clec’H Gurvan, Cohen Laurent, Poline Jean-Baptiste, van de Moortele Pierre-François, and Le Bihan Denis. 1998. Inferring behavior from functional brain images. Nature Neuroscience 1, 7 (1998), 549.
  • [9]. Esterman Michael, Chiu Yu-Chin, Tamber-Rosenau Benjamin J, and Yantis Steven. 2009. Decoding cognitive control in human parietal cortex. Proceedings of the National Academy of Sciences 106, 42 (2009), 17974–17979.
  • [10]. Glasser Matthew F, Coalson Timothy S, Robinson Emma C, Hacker Carl D, Harwell John, Yacoub Essa, Ugurbil Kamil, Andersson Jesper, Beckmann Christian F, Jenkinson Mark, et al. 2016. A multi-modal parcellation of human cerebral cortex. Nature 536, 7615 (2016), 171.
  • [11]. Glasser Matthew F, Sotiropoulos Stamatios N, Wilson J Anthony, Coalson Timothy S, Fischl Bruce, Andersson Jesper L, Xu Junqian, Jbabdi Saad, Webster Matthew, Polimeni Jonathan R, et al. 2013. The minimal preprocessing pipelines for the Human Connectome Project. NeuroImage 80 (2013), 105–124.
  • [12]. Haxby James V, Gobbini M Ida, Furey Maura L, Ishai Alumit, Schouten Jennifer L, and Pietrini Pietro. 2001. Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science 293, 5539 (2001), 2425–2430.
  • [13]. Haynes John-Dylan and Rees Geraint. 2006. Neuroimaging: decoding mental states from brain activity in humans. Nature Reviews Neuroscience 7, 7 (2006), 523.
  • [14]. Hochreiter Sepp and Schmidhuber Jürgen. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
  • [15]. Jaeger Herbert. 2001. The “echo state” approach to analysing and training recurrent neural networks – with an erratum note. Bonn, Germany: German National Research Center for Information Technology, GMD Technical Report 148, 34 (2001), 13.
  • [16]. Jang Hojin, Plis Sergey M, Calhoun Vince D, and Lee Jong-Hwan. 2017. Task-specific feature extraction and classification of fMRI volumes using a deep neural network initialized with a deep belief network: Evaluation using sensorimotor tasks. NeuroImage 145 (2017), 314–328.
  • [17]. Kohavi Ron et al. 1995. A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI, Vol. 14. Montreal, Canada, 1137–1145.
  • [18]. Kruggel Frithjof and von Cramon D Yves. 1999. Modeling the hemodynamic response in single-trial functional MRI experiments. Magnetic Resonance in Medicine 42, 4 (1999), 787–797.
  • [19]. Kruggel Frithjof and von Cramon D Yves. 1999. Temporal properties of the hemodynamic response in functional MRI. Human Brain Mapping 8, 4 (1999), 259–271.
  • [20]. Lemm Steven, Blankertz Benjamin, Dickhaus Thorsten, and Müller Klaus-Robert. 2011. Introduction to machine learning for brain imaging. NeuroImage 56, 2 (2011), 387–399.
  • [21]. Li Hongming and Fan Yong. 2018. Brain decoding from functional MRI using long short-term memory recurrent neural networks. In International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 320–328.
  • [22]. Liao Chien Heng, Worsley Keith J, Poline J-B, Aston John AD, Duncan Gary H, and Evans Alan C. 2002. Estimating the delay of the fMRI response. NeuroImage 16, 3 (2002), 593–606.
  • [23]. Mumford Jeanette A, Turner Benjamin O, Ashby F Gregory, and Poldrack Russell A. 2012. Deconvolving BOLD activation in event-related designs for multi-voxel pattern classification analyses. NeuroImage 59, 3 (2012), 2636–2643.
  • [24]. Pedregosa Fabian, Varoquaux Gaël, Gramfort Alexandre, Michel Vincent, Thirion Bertrand, Grisel Olivier, Blondel Mathieu, Prettenhofer Peter, Weiss Ron, Dubourg Vincent, et al. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, Oct (2011), 2825–2830.
  • [25]. Poldrack Russell A. 2011. Inferring mental states from neuroimaging data: from reverse inference to large-scale decoding. Neuron 72, 5 (2011), 692–697.
  • [26]. Saad Ziad S, Ropella Kristina M, Cox Robert W, and DeYoe Edgar A. 2001. Analysis and use of FMRI response delays. Human Brain Mapping 13, 2 (2001), 74–93.
  • [27]. Schuster Mike and Paliwal Kuldip K. 1997. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing 45, 11 (1997), 2673–2681.
  • [28]. Venkatesh Manasij, Jaja Joseph, and Pessoa Luiz. 2019. Brain dynamics and temporal trajectories during task and naturalistic processing. NeuroImage 186 (2019), 410–423.
  • [29]. Williams Ronald J and Zipser David. 1989. A learning algorithm for continually running fully recurrent neural networks. Neural Computation 1, 2 (1989), 270–280.
  • [30]. Wolpaw Jonathan R, Birbaumer Niels, McFarland Dennis J, Pfurtscheller Gert, and Vaughan Theresa M. 2002. Brain–computer interfaces for communication and control. Clinical Neurophysiology 113, 6 (2002), 767–791.
