Abstract
Functional ultrasound imaging (fUSI) is a cutting-edge technology that measures changes in cerebral blood volume (CBV) by detecting backscattered echoes from red blood cells moving within its field of view (FOV). It offers high spatiotemporal resolution and sensitivity, allowing for detailed visualization of cerebral blood flow dynamics. While fUSI has been utilized in preclinical drug development studies to explore the mechanisms of action of various drugs targeting the central nervous system, many of these studies rely on predetermined regions of interest (ROIs). This focus may overlook relevant brain activity outside these specific areas, which could influence the results. To address this limitation, we compared three machine learning approaches—convolutional neural network (CNN), support vector machine (SVM), and vision transformer (ViT)—combined with fUSI to analyze the pharmacodynamics of dizocilpine (MK-801), a potent non-competitive NMDA receptor antagonist commonly used in preclinical models for memory and learning impairments. While all three machine learning techniques could distinguish between drug and control conditions, CNN proved particularly effective due to its ability to capture hierarchical spatial features while maintaining anatomical specificity. Class activation mapping revealed brain regions, including the prefrontal cortex and hippocampus, that are significantly affected by drug administration, consistent with literature reporting a high density of NMDA receptors in these areas. Overall, the combination of fUSI and CNN creates a novel analytical framework for examining pharmacological mechanisms, allowing for data-driven identification and regional mapping of drug effects while preserving anatomical context and physiological relevance.
Keywords: functional ultrasound imaging, machine learning, MK-801, convolutional neural network, vision transformer, support vector machine
1. Introduction
Functional ultrasound imaging (fUSI) is an emerging hemodynamic-based neuroimaging technology that measures cerebral blood volume (CBV) changes by detecting backscattered echoes from red blood cells moving within its field of view (FOV) (Macé et al., 2011, 2013). It provides a unique combination of large spatial coverage, high spatiotemporal resolution (~100 µm and ~100 ms), and sufficient sensitivity to detect slow blood flow changes (less than 1 mm/s). The relative simplicity and portability of ultrasound scanners have allowed fUSI to be performed in a wide range of preclinical and clinical studies, providing minimally invasive neural imaging in species ranging from mice to humans (Demene et al., 2017; Griggs et al., 2023; Imbault et al., 2017; Norman et al., 2021; Osmanski, Martin, et al., 2014; Osmanski, Pezet, et al., 2014). Among its various applications, fUSI has been employed in preclinical drug development to elucidate the mechanisms of action of drugs targeting the central nervous system. In particular, the mechanisms of various drugs, including anesthetics, cholinesterase agonists and antagonists, N-methyl-D-aspartate (NMDA) receptor antagonists, and selective norepinephrine reuptake inhibitors (SNRIs), have been investigated to better understand their effects on brain function and vascular dynamics, and to help accelerate the development of new drug therapies (Crown et al., 2024; Rabut et al., 2020; Vidal, Droguerre, Valdebenito, et al., 2020; Vidal, Droguerre, Venet, et al., 2020).
Despite the important contribution of these neuropharmacological studies, they primarily examine the effects of drugs on hemodynamic signals within specific regions of interest (ROIs) and/or the functional connectivity between these regions. This approach may introduce bias by neglecting drug-induced changes in neural activity occurring outside these targeted areas. Therefore, there is a critical need to develop advanced analytical tools capable of assessing the dynamic effects of drugs on the brain without relying on predefined region specification. In this study, we develop and compare multiple machine learning approaches—convolutional neural network (CNN), support vector machine (SVM), and vision transformer (ViT)—combined with fUSI to elucidate the pharmacodynamics of dizocilpine (MK-801) in the brain. MK-801 is a potent and selective NMDA receptor antagonist originally used as a pharmacological model of psychosis in rodents (Andiné et al., 1999), and is still extensively employed to model schizophrenia (Zepeda et al., 2022). Our comparative analysis showed that although CNN and ViT achieve similar classification performance—with SVM exhibiting comparatively lower accuracy—they differ notably in biological interpretability. When combined with axiom-based grad-class activation mapping (XGrad-CAM) (Fu et al., 2020), CNN revealed anatomically specific activation patterns in cortical and hippocampal regions with high NMDA receptor expression (Watanabe et al., 1994). Post-hoc quantitative analysis of CBV changes in these regions confirmed that MK-801 administration caused significant hemodynamic reduction, validating that CNN can accurately detect and localize drug effects while maintaining interpretability. Overall, the combination of fUSI technology with CNN provides a powerful new framework for investigating pharmacological mechanisms in the brain and has potential applications for accelerating drug development.
2. Materials and Methods
2.1. Animals
Twenty-three (23) 8–12-week-old male mice were used in this study (C57BL/6, Charles River Laboratories, Hollister, CA). All mice were group-housed, fed ad libitum, and maintained on a regular 12-hour light–dark cycle. The animals were divided into two groups: MK-801 drug group (n = 10) and saline control group (n = 13).
2.2. Surgical procedures
The mice were anesthetized using 5% isoflurane solution delivered in a mixture of oxygen and nitrous oxide (1:2 ratio) and then maintained at a constant isoflurane concentration of 1.5–2% throughout the experiment. Body temperature was kept stable using an electric warming pad. The animals were head-fixed in a stereotaxic frame with ear bars to minimize head movement and reduce motion artifacts. A commercially available depilatory cream (Nair, Pharmapacks) was utilized to remove hair from the scalp, followed by application of an echographic ultrasonic gel on the intact scalp-skin to enhance acoustic coupling during fUSI signal acquisition. All the experimental and surgical protocols were approved by the Institutional Animal Care and Use Committee of the University of Southern California (IACUC #21006).
2.3. Data acquisition
Transcranial power Doppler (pD) images were obtained using the Iconeus One scanner (Iconeus, Paris, France). A 128-element linear array transducer probe with a 15.6 MHz center frequency and 0.1 mm pitch was placed on intact mouse skulls. The probe was fixed in a motorized system during the course of the experiment (Fig. 1A). Before recording, the target sagittal plane for imaging was determined by performing a 3-D whole-brain fUSI scan for each animal. The plane was then aligned with the standard Allen Mouse Common Coordinate Framework brain atlas using the specialized software provided with the Iconeus One system (Wang et al., 2020). The details of fUSI parameters are described in Crown et al. (2024). Briefly, each image was constructed from 200 compounded frames, acquired at 500 Hz using 11 tilted plane waves (−10° to 10°, 2° increments). The pulse repetition frequency was 5.5 kHz, with continuous acquisition of 400 ms blocks of compounded images, separated by 600 ms intervals. This approach enables pD image acquisition with in-plane spatial resolution of 100 µm × 100 µm, slice thickness of 400 µm, FOV of 12.8 mm in width and 10.0 mm in depth, and an overall image-frame production rate of 1 Hz (Fig. 1B). The fUSI data acquisition protocol consisted of 5 minutes of pre-injection recording, followed by intraperitoneal (i.p.) injection of either 0.2 ml of saline or MK-801 (1.5 mg/kg) at the 5-minute mark, followed by an additional 55 minutes of post-injection recording (Fig. 1C). For data analysis purposes, only the final 2 minutes of the pre-injection period (minutes 3–5) were used as the baseline reference, as the signal had stabilized during this interval.
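The acquisition figures quoted above are mutually consistent, which a short calculation can confirm. The sketch below only restates numbers from the text; the variable and function names are illustrative and are not taken from the scanner software.

```python
# Acquisition parameters quoted in the text (names are illustrative).
ANGLES_DEG = list(range(-10, 11, 2))  # tilted plane waves, -10 to 10 degrees in 2-degree steps
COMPOUND_RATE_HZ = 500                # compounded frames per second
FRAMES_PER_IMAGE = 200                # compounded frames per power Doppler image
PAUSE_S = 0.6                         # interval between acquisition blocks
N_ELEMENTS = 128                      # transducer elements
PITCH_MM = 0.1                        # element pitch


def acquisition_summary():
    """Derive the quantities reported in the text from the raw parameters."""
    n_angles = len(ANGLES_DEG)
    block_s = FRAMES_PER_IMAGE / COMPOUND_RATE_HZ
    return {
        "n_angles": n_angles,
        "prf_hz": n_angles * COMPOUND_RATE_HZ,       # one emission per angle per compounded frame
        "block_s": block_s,                          # duration of one acquisition block
        "image_rate_hz": 1.0 / (block_s + PAUSE_S),  # one image per block plus pause
        "fov_width_mm": N_ELEMENTS * PITCH_MM,
    }


summary = acquisition_summary()
for name, value in summary.items():
    print(f"{name}: {value:g}")
```

Under these assumptions, 11 angles at 500 Hz give the 5.5 kHz pulse repetition frequency, 200 frames fill a 400 ms block, and one block plus the 600 ms pause yields the 1 Hz image rate.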
Fig. 1.
Experimental setup and fUSI acquisition. (A) fUSI motorized system setup for signal acquisition. (B) Power Doppler-based 2D vascular map image of a mouse brain in a sagittal plane. (C) Diagram of the experimental protocol for the 60 minutes of continuous fUSI acquisition. After 5 minutes, saline or 1.5 mg/kg MK-801 was injected intraperitoneally into the animals. The blue and orange colors indicate whether the mouse was injected with MK-801 or saline.
2.4. Data pre-processing and registration
We applied the NoRMCorre motion correction technique (Pnevmatikakis & Giovannucci, 2017), combined with our in-house algorithms for filtering breathing and high-frequency signal to reduce noise. Specifically, to eliminate high-frequency fluctuations, we applied a lowpass filter with a normalized passband frequency of 0.02 Hz and a stopband attenuation of 60 dB; the filter also compensates for the delay introduced by the filtering process.
Furthermore, all images were registered to a common reference image to prevent the machine learning models from being influenced by inter-subject variability in brain structures. We first identified the reference animal based on imaging quality, selecting one with clearly defined subcortical structures. We then averaged the first 10 images from the beginning of the third minute of the 5-minute pre-injection baseline period to create a stable fixed (reference) image, mitigating the variability that might arise from using a single image as a reference. All images were then aligned to this reference image using imregdeform, a function in the MATLAB Image Processing Toolbox (available since R2022b) that employs a parametric approach to non-rigid image registration with total variation regularization. The default parameters were used for imregdeform.
2.5. Models and feature selection
2.5.1. Implementation and parameter selection
SVM was implemented using scikit-learn (Pedregosa et al., 2011), while CNNs and ViTs were implemented using PyTorch (Paszke et al., 2019). Most of the code was executed on a 2.8 GHz Quad-Core Intel Core i7 processor with 16 GB of memory. Parameter sweeps were run as grid sweeps in Weights & Biases (Biewald, 2020) on a Tesla K80 GPU with 64 GB of memory. After the grid sweep, we subjected the 15 best hyperparameter combinations to 10-fold cross-validation, from which the best model (that is, the model yielding the highest peak accuracy) was chosen.
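To give a sense of the size of such a grid sweep, the sketch below enumerates every combination of the swept ViT values listed in Table 2 and shows how the 15 best configurations could be shortlisted. It uses only the Python standard library rather than the Weights & Biases API, and the dictionary keys and the `top_k` helper are hypothetical names chosen for illustration.

```python
from itertools import product

# Swept values from Table 2 (key names are illustrative).
VIT_GRID = {
    "batch_size": [16, 32],
    "learning_rate": [1e-5, 1e-4, 1e-3],
    "depth": [4, 8, 16],
    "num_heads": [2, 4, 8],
    "embedding_dim": [128, 256, 512],
}


def grid_configurations(grid):
    """Return one dict per point of the full Cartesian grid."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def top_k(scored_configs, k=15):
    """Keep the k (config, score) pairs with the highest score."""
    return sorted(scored_configs, key=lambda item: item[1], reverse=True)[:k]


configs = grid_configurations(VIT_GRID)
print(len(configs))  # 2 * 3 * 3 * 3 * 3 = 162 configurations
```

In a sweep of this kind, each configuration would be trained once, and only the shortlisted ones would go on to the more expensive cross-validation step.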
2.5.2. Convolutional neural network (CNN)
The CNN progressively reduces the spatial dimensions of the image input while increasing the depth of feature maps through multiple convolutional layers. To ensure compatibility with class activation mapping, the architecture includes a global average pooling (GAP) layer followed by a single linear layer, which outputs two features corresponding to the binary classification task. We conducted extensive parameter sweeps to determine the optimal architecture and hyperparameters (Table 1, best-performing values in bold). The final architecture consists of 4 convolutional layers with 32 filters each, employing the exponential linear unit (ELU) activation function. ELU, which was selected over the ReLU and LeakyReLU alternatives, is defined in Equation (1).
Table 1.
Hyperparameters used in the CNN parameter sweep.
| Hyperparameter | Value |
|---|---|
| Batch size | **32**, 64 |
| Learning rate | 1e-4, **1e-3**, 1e-2 |
| CNN nonlinearity | ReLU, **ELU**, LeakyReLU |
| Number of filters | **32**, 64 |
| Number of convolutional layers | 2, 3, **4** |
| Optimizer | Adam (Kingma & Ba, 2014) |
| Visualization technique | XGrad-CAM (Lu et al., 2017) |
| Loss function | Cross-entropy |
Bold values indicate hyperparameters used in the final selected model.
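$$\mathrm{ELU}(x) = \begin{cases} x, & x > 0 \\ \alpha\left(e^{x} - 1\right), & x \le 0 \end{cases} \tag{1}$$
where $\alpha > 0$ sets the saturation value for negative inputs.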
This activation function is known to speed up learning and lead to higher classification accuracies by pushing the network’s mean activation toward zero (Clevert, 2015). The model is trained using the Adam optimizer with a learning rate of 1e-3 and batch size of 32.
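The zero-centering property can be illustrated numerically. The sketch below assumes that the activation referred to above is the exponential linear unit (ELU) described by Clevert (2015) and uses `alpha = 1.0`, a value the text does not state; it compares the mean output of ELU and ReLU on a small symmetric set of inputs.

```python
import math


def elu(x, alpha=1.0):
    """Exponential linear unit; alpha = 1.0 is an assumed value."""
    return x if x > 0 else alpha * math.expm1(x)


def relu(x):
    """Rectified linear unit, shown for comparison."""
    return max(0.0, x)


def mean(values):
    return sum(values) / len(values)


inputs = [-2.0, -1.0, 0.0, 1.0, 2.0]
elu_mean = mean([elu(x) for x in inputs])
relu_mean = mean([relu(x) for x in inputs])
print(f"mean ELU output:  {elu_mean:.3f}")
print(f"mean ReLU output: {relu_mean:.3f}")
```

Because ELU keeps small negative outputs instead of clipping them to zero, its mean output on these inputs lies closer to zero than that of ReLU.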
2.5.3. Class activation mapping
Class activation mapping was performed using XGrad-CAM (Lu et al., 2017) with a sliding time window of 1 minute. The localization map for a given input is computed as follows:
$$L^{c} = \mathrm{ReLU}\left(\sum_{k} w_{k}^{c} A^{k}\right) \tag{2}$$
where $A^{k}$ is the $k$-th feature map of the target layer, $\mathrm{ReLU}(x) = \max(0, x)$, and the coefficient $w_{k}^{c}$ is defined as
$$w_{k}^{c} = \sum_{i=1}^{H} \sum_{j=1}^{W} \frac{A_{ij}^{k}}{\sum_{m=1}^{H} \sum_{n=1}^{W} A_{mn}^{k}} \, \frac{\partial y^{c}}{\partial A_{ij}^{k}} \tag{3}$$
where $A_{ij}^{k}$ is the activation of node $k$ in the target layer of the model at position $(i, j)$ and $y^{c}$ is the model output score for class $c$ before activation. Here, $H$ and $W$ are the height and width of the feature maps, respectively.
The target layer for this analysis was chosen to be the final convolutional layer, which contains higher-level feature representations and is commonly chosen for CAM (Zhou et al., 2016).
The weights in Equation (3) were computed as the product of the gradient of the model’s raw score for class $c$ with respect to each activation and a normalization term. This encodes the importance of each activation in the final convolutional layer for the final classification decision for class $c$. In our case, the final convolutional layer produces 128 feature maps with height $H = 12$ and width $W = 16$.
As described in Equation (2), after calculating the weights for the given class and node, the weighted average was computed and the ReLU activation function was applied to obtain the activation at each region. The result is a 12 × 16 activation map, corresponding to the dimensions of the feature maps from the final convolutional layer. For better comparison with the original image, the 12 × 16 map is up-sampled to the original image size of 91 × 128 via bilinear interpolation.
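The localization map can be sketched in a few lines once the feature maps and their gradients are available. The example below is a minimal pure-Python illustration of the XGrad-CAM weighting (each gradient is scaled by the activation's share of its feature map, summed into one weight per map, and the weighted maps are then summed and rectified). It is not the implementation used in this study: in practice the gradients would come from automatic differentiation, and the function name and toy inputs are hypothetical. Up-sampling to the image size is omitted.

```python
def xgrad_cam(activations, gradients):
    """Compute a rectified localization map from K feature maps.

    activations, gradients: lists of K maps, each an H x W nested list.
    gradients[k][i][j] is the derivative of the class score with respect
    to activations[k][i][j].
    """
    height = len(activations[0])
    width = len(activations[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for a_k, g_k in zip(activations, gradients):
        total = sum(sum(row) for row in a_k)
        if total == 0:
            continue  # an all-zero feature map contributes nothing
        # Weight: gradients averaged with the normalized activations as weights.
        w_k = sum(
            a * g
            for a_row, g_row in zip(a_k, g_k)
            for a, g in zip(a_row, g_row)
        ) / total
        for i in range(height):
            for j in range(width):
                cam[i][j] += w_k * a_k[i][j]
    # Rectify so that only positive evidence for the class remains.
    return [[max(0.0, v) for v in row] for row in cam]


# Toy example with K = 2 feature maps of size 1 x 2.
toy_activations = [[[1.0, 3.0]], [[2.0, 2.0]]]
toy_gradients = [[[1.0, 1.0]], [[-1.0, -1.0]]]
toy_cam = xgrad_cam(toy_activations, toy_gradients)
print(toy_cam)  # [[0.0, 1.0]]
```

In the toy example the first map receives a weight of 1 and the second a weight of -1, so the combined map is [-1, 1] before rectification and [0, 1] after it.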
2.5.4. Vision transformer (ViT)
ViTs are deep learning models designed for computer vision tasks (Alexey, 2020). Unlike CNNs, ViTs process images by first splitting them into fixed-size patches and then applying a transformer architecture to the sequence of these patches. The architecture details of our ViT implementation were determined through parameter sweeps (Table 2, best-performing values in bold). The final model uses 16 transformer layers with 4 attention heads and an embedding dimension of 256, and was trained with a batch size of 16 and a learning rate of 1e-4.
Table 2.
Hyperparameters used in the vision transformer parameter sweep.
| Hyperparameter | Value |
|---|---|
| Batch size | **16**, 32 |
| Learning rate | 1e-5, **1e-4**, 1e-3 |
| Depth | 4, 8, **16** |
| Number of heads | 2, **4**, 8 |
| Embedding dimension | 128, **256**, 512 |
| Patch size | (7, 8) |
| Kernel size | (7, 8) |
| Optimizer | Adam (Kingma & Ba, 2014) |
| Visualization technique | Attention rollout (Abnar & Zuidema, 2022) |
| Loss function | Cross-entropy |
Bold values indicate hyperparameters used in the final selected model.
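The patch size in Table 2 determines how many tokens the transformer sees per image. The sketch below assumes that the ViT input has the 91 × 128 (height × width) image size mentioned for the CNN and that the (7, 8) patch size is given as (height, width); both are assumptions for illustration, and the function names are hypothetical.

```python
IMAGE_SIZE = (91, 128)  # assumed (height, width) of the input image
PATCH_SIZE = (7, 8)     # assumed (height, width) of one patch


def patch_grid(image_size, patch_size):
    """Number of patches along each axis for non-overlapping patches."""
    (height, width), (p_h, p_w) = image_size, patch_size
    if height % p_h or width % p_w:
        raise ValueError("image size must be divisible by patch size")
    return height // p_h, width // p_w


def split_into_patches(image, patch_size):
    """Split a 2-D nested list into flattened patches in row-major order."""
    p_h, p_w = patch_size
    rows, cols = patch_grid((len(image), len(image[0])), patch_size)
    return [
        [image[r * p_h + i][c * p_w + j] for i in range(p_h) for j in range(p_w)]
        for r in range(rows)
        for c in range(cols)
    ]


blank_image = [[0.0] * IMAGE_SIZE[1] for _ in range(IMAGE_SIZE[0])]
patches = split_into_patches(blank_image, PATCH_SIZE)
print(patch_grid(IMAGE_SIZE, PATCH_SIZE), len(patches), len(patches[0]))
```

Under these assumptions each image becomes a 13 × 16 grid of 208 patches with 56 pixels each, which is the sequence the transformer layers operate on.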
To visualize the regions important for classification, we employed attention rollout (Abnar & Zuidema, 2022) with max fusion and a discard ratio of 0.9. For an in-depth description of attention rollout, see Abnar and Zuidema (2022). Briefly, attention rollout recursively computes the token attentions in each layer of a trained transformer model, resulting in an attention map, analogous to class activation maps in CNNs.
2.5.5. Support vector machine (SVM)
SVM is a supervised learning model that classifies data by identifying the optimal hyperplane that separates data points into distinct classes. We implemented a linear SVM classifier that directly processes the registered fUSI data, without relying on predefined ROIs. In this setup, each pixel in the fUSI images is treated as an individual feature, allowing the SVM to perform unbiased feature selection across the entire imaging field. This approach enables the model to explore all aspects of the fUSI data, providing a comprehensive classification without any imposed spatial constraints.
2.6. Training and testing data sets
All MK-801 vs. saline classification task models were subjected to 10-fold cross-validation for robust performance assessment. In each cross-validation fold, data from 5 out of 10 MK-801-injected animals and 7 out of 13 saline-injected animals were randomly selected for training, while the remaining data were held out for testing. This near 50–50 training/testing split was chosen to ensure that a substantial number of animals were allocated to the test set for each cross-validation fold, which is particularly important for generating reliable class activation maps, feature weights visualizations, and attention maps. For each cross-validation fold, these visualizations were generated by averaging single-animal maps from the given testing set. While increasing the training set size could improve classification accuracy, it would reduce the number of animals available for visualization, potentially introducing bias. Specifically, with fewer test subjects, the class activation maps/attention maps would be averaged over a smaller sample, making them less representative and more susceptible to individual variability. For smaller datasets, a larger proportion of the available data may need to be allocated for training to ensure the models learn effectively.
CNN and ViT models were trained on fUSI data from the final 5 minutes of post-injection recordings (Supplementary Figure S1), since our previous study showed that the effects of MK-801 on brain activity become more potent over time (Crown et al., 2024). Testing was conducted on the entire post-injection period using a 1-minute sliding window. For SVM, the feature weights are determined solely by the data used to train the model, so we trained and tested a separate model on each successive 1-minute interval to capture temporal variations in feature importance. This allowed us to compare the resulting feature weights visualization with the class activation maps and attention maps from CNN and ViT, which are dynamic despite static training data. Each SVM generates a feature weight map for its interval, and these individual maps were concatenated to create a comprehensive feature weights visualization showing how feature importance evolves over time.
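An animal-level split of this kind keeps all frames from one animal on the same side of the split. The sketch below is one possible way to draw such a fold with the group sizes given above; the animal identifiers, the seeding scheme, and the function name are hypothetical and are not taken from the study's code.

```python
import random


def split_animals(mk801_ids, saline_ids, n_train_mk801=5, n_train_saline=7, seed=0):
    """Randomly assign whole animals to the training or testing set."""
    rng = random.Random(seed)
    train = set(rng.sample(mk801_ids, n_train_mk801))
    train |= set(rng.sample(saline_ids, n_train_saline))
    test = (set(mk801_ids) | set(saline_ids)) - train
    return sorted(train), sorted(test)


# Hypothetical identifiers for the 10 MK-801 and 13 saline animals.
mk801_animals = [f"MK{i:02d}" for i in range(1, 11)]
saline_animals = [f"SAL{i:02d}" for i in range(1, 14)]

# One split per cross-validation fold, each with its own seed.
folds = [split_animals(mk801_animals, saline_animals, seed=fold) for fold in range(10)]
train_ids, test_ids = folds[0]
print(len(train_ids), len(test_ids))  # 12 11
```

With these group sizes, every fold trains on 12 animals and tests on 11 (5 MK-801 and 6 saline), so the averaged visualizations in each fold are based on a test set of nearly the same size as the training set.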
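The temporal bookkeeping can be made concrete with frame indices. The sketch below assumes the 1 Hz image rate and the 60-minute protocol described in Section 2.3, and it treats the 1-minute windows as non-overlapping, which is an assumption because the step size of the sliding window is not stated. The names are illustrative.

```python
FRAME_RATE_HZ = 1   # images per second
INJECTION_MIN = 5   # injection time in minutes
TOTAL_MIN = 60      # total recording length in minutes


def minute_windows(start_min, end_min, frame_rate_hz=FRAME_RATE_HZ):
    """Return (first_frame, last_frame_exclusive) for each 1-minute interval."""
    frames_per_min = 60 * frame_rate_hz
    return [
        (minute * frames_per_min, (minute + 1) * frames_per_min)
        for minute in range(start_min, end_min)
    ]


test_windows = minute_windows(INJECTION_MIN, TOTAL_MIN)    # post-injection period
train_windows = minute_windows(TOTAL_MIN - 5, TOTAL_MIN)   # final 5 minutes
train_span = (train_windows[0][0], train_windows[-1][1])
print(len(test_windows), test_windows[0], test_windows[-1], train_span)
```

Under these assumptions there are 55 post-injection intervals of 60 frames each, and the final 5 minutes used to train the CNN and ViT correspond to frames 3300 to 3599.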
2.7. Model performance
Model performance was assessed using 10-fold cross-validation. For each iteration, the model was trained on the designated training set and tested on the remaining test set, and a sliding window of 1 minute was utilized to analyze the model’s performance over time. The adequacy of the model was evaluated using the following metrics:
Accuracy: (TP + TN) / (P + N).
Precision: TP / (TP + FP).
Recall: TPR = TP / P.
Area under curve (AUC): Area under the receiver operating characteristic (ROC) curve, which is the plot of the true positive rate (TPR) against the false positive rate (FPR). FPR is defined as FP / N.
F1-Score: Harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
Here P and N represent the number of positive and negative instances, respectively, and TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives. To reduce potential biases from individual training/testing set selections, we averaged evaluation metrics in each temporal window across all cross-validation iterations.
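The threshold-based metrics above follow directly from the four confusion-matrix counts. The sketch below computes them from binary labels, taking MK-801 as the positive class (label 1), which is an assumption; AUC is omitted because it requires continuous scores rather than hard predictions. The function name and the toy labels are illustrative.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(numerator, denominator):
        # Return 0.0 when a metric is undefined (empty denominator).
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy example: 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
predictions = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(labels, predictions)
print(metrics)
```

In this toy example all four metrics equal 0.75, since TP = TN = 3 and FP = FN = 1.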
2.8. Statistical analysis
For each model (CNN, SVM, and ViT), two-way repeated measures ANOVAs were conducted separately for MK-801-injected and saline-injected animals to evaluate differences in CBV changes between identified and non-identified regions during the post-injection period, with region (identified vs. non-identified) and time as factors and with data pooled across all animals within each group (MK-801 or saline).
A separate two-way repeated measures ANOVA compared classification performance over time among the three models during the same post-injection period, with model type (CNN, SVM, and ViT) and time as factors. Here, performance metrics for each model incorporated all cross-validation folds. For significant main effects of model type in the model comparison ANOVA, post-hoc pairwise comparisons were conducted using Tukey’s test to determine which specific models differed significantly from each other, while controlling for the family-wise error rate in these multiple comparisons. All statistical analyses were conducted using MATLAB R2020b.
3. Results
3.1. Comparative model performance in detecting MK-801 effects
To investigate the spatiotemporal effects of MK-801 on brain hemodynamics, we developed an analytical framework that integrates fUSI with machine learning to classify drug-induced changes (Fig. 2). Power Doppler (pD) images were recorded from anesthetized mice receiving either MK-801 or saline injections. The analysis consisted of three main stages: data acquisition and preprocessing, model development and training, and performance evaluation (Fig. 2A-C). In the preprocessing stage, we applied the non-rigid image registration function imregdeform. To establish a common reference frame, we selected a mouse with high-quality vascular maps to serve as the reference animal. We then created a stable reference image by averaging the first 10 consecutive frames taken from the third minute of the 5-minute baseline recording period from this animal. All images were subsequently aligned to this reference frame, ensuring the models did not rely on varying brain anatomical structures across animals for classification.
Fig. 2.
Overview of analysis combining fUSI and machine learning to detect MK-801 effects in the brain. (A) fUSI signal acquisition from the mouse brain. (B) Alignment of the 2D power Doppler (pD) image time series to a common reference image. (C) Data partitioning. (D) CNN, feature selection, and class activation mapping. (E) SVM and feature weights visualization. (F) ViT and attention map.
The CNN and ViT architectures underwent extensive parameter sweep through grid search followed by cross-validation of the top-performing configurations (Tables 1, 2), while we also included a linear SVM as a classical machine learning approach for comparison. We trained each model using pD images from the final 5 minutes of recording, a period in which MK-801 effects are expected to peak, then evaluated their performance across successive 1-minute intervals spanning the entire 60-minute experiment using all cross-validation folds. We assessed the ability of each model to distinguish the effect of MK-801 over time (Fig. 3). All three models showed dynamic classification performance that evolved with the progression of MK-801-induced effects. Consistent with our hypothesis that classification accuracy would improve as the drug effects emerged, all models performed near chance level at the time of injection (CNN: 55.9 ± 1.4%, ViT: 55.7 ± 2.4%, SVM: 52.8 ± 2.2%, mean ± SD across folds). As MK-801 gradually took effect, both CNN and ViT demonstrated robust performance improvements, with CNN reaching a peak accuracy of 79.2 ± 6.2% at 32 minutes post-injection, and ViT achieving a comparable peak of 79.5 ± 5.8% at 37 minutes post-injection. SVM showed more modest performance, reaching a maximum accuracy of 74.2 ± 12.8% at 35 minutes post-injection. Two-way repeated measures ANOVA performed on the post-injection period (minutes 5–60) revealed significant main effects of method and time but no significant method × time interaction. Post-hoc pairwise comparisons with the Tukey test showed that CNN significantly outperformed SVM (mean difference: 6.69%), while no significant differences were found between CNN and ViT (mean difference: 0.71%) or between SVM and ViT (mean difference: -5.98%). Additional model performance metrics including precision, recall, F1-score, and ROC-AUC are presented in Supplementary Figure S2.
Fig. 3.
Model classification accuracy for distinguishing MK-801 group from saline group using CNN, ViT, and SVM. Shaded areas represent standard error. The vertical dotted line at 5 minutes marks the injection time, and the gray shaded area (minutes 3–5) indicates the baseline period.
3.2. CNN captures anatomically relevant MK-801-induced brain hemodynamic responses
To ensure a fair comparison between models, we standardized the region selection approach by using a fixed area percentage threshold across all three methods. For each model’s output (CAMs for CNN, feature weights for SVM, and attention maps for ViT), we first computed the mean across the duration of the experiment, and then selected the top 15% of pixels using percentile-based thresholding. This approach addresses the inherent differences in value distributions across methods and provides a consistent basis for comparing their biological relevance. To validate the robustness of our findings, we performed sensitivity analyses across different threshold percentages (see Videos 1–3 in Supplementary Materials), which showed that our conclusions remained consistent across a range of reasonable thresholds.
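The region selection step can be sketched as follows. The example averages a stack of relevance maps over time and keeps the highest-valued 15% of pixels; it is a pure-Python illustration rather than the code used in the study, the function names are hypothetical, and ties at the threshold are resolved by keeping all tied pixels, which is an assumption.

```python
def mean_map(maps):
    """Average equally sized 2-D maps (nested lists) pixel by pixel."""
    n_maps = len(maps)
    height, width = len(maps[0]), len(maps[0][0])
    return [
        [sum(m[i][j] for m in maps) / n_maps for j in range(width)]
        for i in range(height)
    ]


def top_fraction_mask(relevance, fraction=0.15):
    """Boolean mask of the pixels whose value lies in the top `fraction`."""
    flat = sorted((v for row in relevance for v in row), reverse=True)
    n_keep = max(1, round(fraction * len(flat)))
    threshold = flat[n_keep - 1]
    return [[v >= threshold for v in row] for row in relevance]


# Toy stack of two 4 x 5 maps standing in for per-minute relevance maps.
map_a = [[float(r * 5 + c) for c in range(5)] for r in range(4)]
map_b = [[float(r * 5 + c + 2) for c in range(5)] for r in range(4)]
mask = top_fraction_mask(mean_map([map_a, map_b]))
n_selected = sum(v for row in mask for v in row)
print(n_selected)  # 3 of 20 pixels
```

Because the threshold is a rank rather than an absolute value, the same rule can be applied to CAMs, SVM weights, and attention maps even though their value ranges differ.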
The class activation maps for CNN (generated using XGrad-CAM) revealed consistent activation patterns centered on cortical regions and extending into specific subcortical structures including the hippocampus and midbrain (Fig. 4A). These activation patterns offer an unbiased view of the brain regions that play a key role in differentiating the effects of MK-801 from saline control conditions. By focusing on the MK-801 group, the activation patterns reflect regions where the hemodynamic signal is strongly influenced by MK-801 administration. To validate the biological relevance of these CNN-identified regions, we measured CBV changes within these areas. The CBV changes were calculated as the percentage change of pD signal relative to baseline, defined as the mean pD signal during the last 2 minutes (minutes 3–5) before injection. This specific baseline window was chosen to ensure signal stability before drug administration. Notably, in MK-801-injected animals, CNN-identified brain regions exhibited a progressive decrease in CBV, reaching -13.63 ± 10.02% (mean ± SD across animals) from baseline at 55 minutes post-injection, compared with -2.55 ± 2.04% for non-identified regions (that is, regions not identified by the CNN as affected by MK-801 administration). A repeated measures ANOVA performed on the post-injection period (5–60 minutes) revealed a significant main effect of region, a significant effect of time, and a significant region × time interaction, indicating that the CBV values significantly changed over the course of the experiment regardless of region type, and that the temporal evolution of CBV changes was significantly different between regions (Fig. 4B). This interaction reflects the gradual divergence of CBV responses, with identified regions showing an increasing reduction over time.
However, analysis of the same regions in saline-injected animals showed minimal changes at 55 minutes post-injection (0.20 ± 5.94% in identified regions vs -0.19 ± 2.03% in non-identified regions). A repeated measures ANOVA performed on the post-injection period (5–60 minutes) revealed no significant main effect of region, and although there was a significant effect of time, there was no significant region × time interaction (Fig. 4C). The spatial distribution of drug-sensitive regions aligns with previous findings demonstrating MK-801-induced CBV reduction in the hippocampus and medial prefrontal cortex (mPFC) (Crown et al., 2024), areas known for their high density of NMDA receptors (Watanabe et al., 1994).
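The CBV change used in this section is a percentage change relative to the pre-injection baseline. The sketch below shows that calculation for a region-averaged pD time series, assuming the 1 Hz image rate so that minutes 3–5 correspond to frames 180 to 299; the function name and the toy signal are illustrative.

```python
def percent_cbv_change(pd_series, baseline_start=180, baseline_end=300):
    """Percentage change of a pD time series relative to its baseline mean.

    pd_series: region-averaged power Doppler values, one per frame.
    baseline_start, baseline_end: frame range of the baseline window
    (minutes 3-5 at an assumed rate of one frame per second).
    """
    baseline = pd_series[baseline_start:baseline_end]
    baseline_mean = sum(baseline) / len(baseline)
    return [100.0 * (value - baseline_mean) / baseline_mean for value in pd_series]


# Toy signal: stable at 100 before injection, then dropping to 90.
toy_series = [100.0] * 300 + [90.0] * 10
toy_change = percent_cbv_change(toy_series)
print(toy_change[0], toy_change[-1])  # 0.0 -10.0
```

Normalizing each region by its own baseline makes the identified and non-identified regions directly comparable even when their absolute pD levels differ.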
Fig. 4.
CNN-derived CAM and CBV dynamics in MK-801 and saline-injected animals. (A) Mean CAM for the MK-801 class overlaid on the reference image, highlighting regions of importance in CNN-based classification of MK-801 effects. The top 15% of pixels were selected using percentile-based thresholding. Brighter regions indicate higher relevance. (B) Time series of pD (i.e., CBV) changes in CNN-identified regions (red curves) compared with non-identified regions (blue curves) for the MK-801 group of animals. (C) Similar to B but for the saline group of animals. The shaded areas represent standard error derived from averaging across animals. The red dashed line at 5 minutes marks the injection time. A gray patch (3–5 minutes) indicates the pre-injection baseline window.
3.3. SVM and ViT show limited spatial specificity
In contrast to CNN’s anatomically specific activation patterns, SVM’s feature weights revealed diffuse activation across the entire imaging field (Fig. 5A). In MK-801-injected animals, SVM-identified regions showed a significant decrease in CBV compared with non-identified regions, reaching -11.60 ± 7.04% (mean ± SD across animals) at 55 minutes post-injection, compared with -2.91 ± 2.48% in non-identified regions. A repeated measures ANOVA performed on the post-injection period (5–60 minutes) revealed a significant main effect of region, a significant effect of time, and a significant region × time interaction, indicating that the CBV responses in SVM-identified regions differed significantly from non-identified regions throughout the recording (Fig. 5B). In contrast, while saline-injected animals showed significant differences between regions (main effect of region) and changes over time (main effect of time), they lacked the differential temporal evolution seen in the MK-801 group (no significant region × time interaction). SVM-identified regions in saline-injected animals exhibited minimal CBV changes at 55 minutes post-injection compared with non-identified regions (Fig. 5C). These findings indicate that while SVM successfully identified brain regions with distinct hemodynamic responses to MK-801, these regions were less spatially specific than those identified by CNN and CAM. SVM appears to capture meaningful temporal dynamics following MK-801 administration but relies on a broader, more diffuse spatial distribution of features, potentially reducing its anatomical specificity compared with the more focused CNN-based techniques.
Fig. 5.
SVM-derived feature weights and CBV dynamics in MK-801 and saline-injected animals. (A) Mean feature weights overlaid on the reference image, highlighting regions of importance in SVM-based classification. The top 15% of pixels were selected using percentile-based thresholding. Negative weights were extracted and inverted to specifically identify regions driving MK-801 (versus saline) classification, then normalized using percentile-based thresholding. Brighter regions indicate higher relevance for MK-801 detection. (B) Time series of pD (i.e., CBV) changes in SVM-identified regions (red curves) compared with non-identified regions (blue curves) for the MK-801 group of animals. (C) Similar to B but for the saline group of animals. The shaded areas represent standard error derived from averaging across animals. The red dashed line at 5 minutes marks the injection time. A gray patch (3–5 minutes) indicates the pre-injection baseline window.
ViT also detected drug-induced changes in brain hemodynamics, but its spatial activation patterns differed markedly from CNN and SVM (Fig. 6A). Focusing on the MK-801 attention map, in MK-801-injected animals, ViT-identified regions exhibited a substantial decrease in CBV, reaching -8.87 ± 5.86% (mean ± SD across animals) at 55 minutes post-injection, compared with -3.39 ± 2.74% in non-identified regions. A repeated measures ANOVA performed on the post-injection period (5–60 minutes) revealed significant effects of region and time, and a significant region × time interaction, indicating that the CBV reduction was more pronounced in ViT-identified regions and evolved distinctly over time (Fig. 6B). In contrast, analysis of saline-injected animals showed no significant differences between ViT-identified regions (reaching -0.01 ± 4.10% at 55 minutes post-injection) and non-identified regions (-0.16 ± 2.24% at 55 minutes post-injection), with no significant main effect of region and no significant region × time interaction, despite a significant effect of time (Fig. 6C). While ViT successfully identified regions showing differential drug responses, its attention maps included areas outside the brain structure (Fig. 6A), potentially limiting the biological interpretation of these findings. This “checkerboard” phenomenon has been highlighted as a drawback of attention rollout (Achtibat et al., 2024). Together, these results suggest that while SVM and ViT could effectively distinguish between MK-801 and saline groups, they lacked anatomical precision in their feature selection compared with CNN’s more anatomically precise identification of the corresponding brain regions.
Fig. 6.
ViT-derived attention maps and CBV dynamics in MK-801- and saline-injected animals. (A) Mean attention map for the MK-801 class overlaid on the reference image, highlighting regions of importance in ViT-based classification. The top 15% of pixels were selected using percentile-based thresholding. Brighter regions indicate higher relevance. (B) Time series of CBV changes in ViT-identified regions (red curves) compared with non-identified regions (blue curves) for the MK-801 group of animals. (C) Similar to B but for the saline group of animals. The shaded areas represent the standard error across animals. The red dashed line at 5 minutes marks the injection time. A gray patch (3–5 minutes) indicates the pre-injection baseline window.
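The region-wise time courses in panels B and C of Figs. 5 and 6 express CBV as a percentage change relative to the 3–5 minute pre-injection window. A minimal sketch of that computation is given below, under stated assumptions: the names `region_timecourses` and `group_mean_sem` are hypothetical, the signal is averaged over pixels before normalization (the reverse order is equally plausible), and frame times are supplied in minutes.

```python
import numpy as np

def region_timecourses(pd_movie, mask, times_min, baseline=(3.0, 5.0)):
    """Percent signal change inside and outside a model-identified region.

    pd_movie  : (T, H, W) power Doppler frames for one animal
    mask      : (H, W) boolean map of model-identified pixels
    times_min : (T,) acquisition time of each frame, in minutes
    """
    movie = np.asarray(pd_movie, dtype=float)
    t = np.asarray(times_min, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    in_baseline = (t >= baseline[0]) & (t < baseline[1])

    def percent_change(selection):
        # Spatial mean per frame, then normalize to the baseline mean.
        trace = movie[:, selection].mean(axis=1)
        base = trace[in_baseline].mean()
        return 100.0 * (trace - base) / base

    return percent_change(mask), percent_change(~mask)

def group_mean_sem(traces):
    """Mean and standard error across animals for (n_animals, T) traces."""
    traces = np.asarray(traces, dtype=float)
    mean = traces.mean(axis=0)
    sem = traces.std(axis=0, ddof=1) / np.sqrt(traces.shape[0])
    return mean, sem
```

Stacking the per-animal traces and passing them to `group_mean_sem` yields the curve and shaded band of the kind shown in the figures; the statistical comparison itself (repeated measures ANOVA) is not reproduced here.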
4. Discussion
fUSI enables high-resolution measurement of CBV changes (Macé et al., 2011, 2013). While its applications in neuropharmacology have grown (Crown et al., 2024; Rabut et al., 2020; Vidal, Droguerre, Valdebenito, et al., 2020; Vidal, Droguerre, Venet, et al., 2020), most analyses rely on predefined ROIs, which may miss important drug effects in other brain areas and introduce bias in result interpretation. To address these limitations, we developed and evaluated multiple machine learning approaches for automated, whole-brain analysis of drug-induced hemodynamic changes using MK-801 as a model NMDAR antagonist.
Our systematic comparison of CNN, SVM, and ViT revealed distinct capabilities in analyzing fUSI data. CNN achieved both superior classification performance and biological specificity. While SVM generated diffuse activation patterns and ViT exhibited attention extending beyond brain boundaries, CNN demonstrated a unique ability to capture meaningful spatial patterns in brain hemodynamics. The biological relevance of these CNN-identified regions is further supported by our recent comprehensive analysis using ROI-based methods (Hakopian et al., 2025), which independently showed that the hippocampus and mPFC exhibit the most pronounced CBV reductions and the greatest disruption in functional connectivity (hippocampus-mPFC pathway, t-score = -3.86). This convergence between our data-driven machine learning approach and ROI-based quantitative analysis provides strong validation that CNN can reliably identify biologically relevant brain regions without prior anatomical assumptions. Furthermore, the temporal progression of CNN performance, which peaks at approximately 32 minutes post-injection, corresponds with the period of pronounced connectivity disruption (25–30 minutes) transitioning to sparse network connectivity (35–40 minutes) (Hakopian et al., 2025), consistent with the gradual, progressive nature of NMDAR antagonist effects on brain function.
4.1. ROI-independent analysis
The validation of CNN’s biological specificity highlights the broader methodological advantages of our data-driven framework over traditional ROI-dependent approaches. Conventional analyses typically require manual or atlas-based ROI selection before signal extraction, which inherently constrains the analysis to predetermined regions. Our CNN-based technique evaluates spatial patterns across the complete imaging field, identifying relevant hemodynamic changes based on functional responses rather than anatomical assumptions. This methodology mitigates potential selection bias in neuropharmacological imaging analysis. The class activation mapping technique provides visualization of regions contributing to classification decisions, effectively generating data-driven ROIs based solely on functional responses to drug administration. This approach allows the detection of drug effects that might occur outside canonical regions associated with a drug’s known mechanism of action.
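For readers unfamiliar with class activation mapping, the sketch below follows the original formulation of Zhou et al. (2016), in which the map for a class is the weighted sum of the last convolutional feature maps, using the weights of the fully connected layer that follows global average pooling. It is an illustration only: the exact CAM variant, network architecture, and post-processing used in this study may differ, and the function name, argument layout, and rectification step are assumptions.

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx, normalize=True):
    """Class activation map in the style of Zhou et al. (2016).

    feature_maps : (K, h, w) activations of the last convolutional layer
    fc_weights   : (n_classes, K) weights of the layer after global
                   average pooling (bias ignored)
    class_idx    : index of the class to explain (e.g., MK-801)
    """
    fmap = np.asarray(feature_maps, dtype=float)
    w = np.asarray(fc_weights, dtype=float)[class_idx]   # shape (K,)
    # Weighted sum over the K channels gives an (h, w) map.
    cam = np.tensordot(w, fmap, axes=([0], [0]))
    if not normalize:
        return cam
    # Assumed post-processing: keep positive evidence and scale to [0, 1].
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

Because the class score under global average pooling equals the spatial mean of the unnormalized map, each pixel's value can be read as its contribution to the classification decision. In practice the low-resolution map is upsampled to the image size before thresholding, a step omitted here.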
4.2. Limitations and future directions
While our study demonstrates the potential of combining fUSI with interpretable machine learning techniques for investigating drug-induced changes in brain activity, it also opens avenues for technical refinement and further methodological exploration. The image registration performed using the Imregdeform algorithm provided sufficient alignment to detect drug-induced effects. Nonetheless, future optimization of registration parameters may further improve sensitivity to subtle hemodynamic changes. Additionally, while our current analysis effectively captured drug-induced patterns, the interpretability of these models presents opportunities for advancement. For instance, future studies could consider ViT architectures with advanced interpretability techniques to overcome the “checkerboard” artifacts common in traditional attention visualizations as noted by Achtibat et al. (2024), potentially providing complementary insights while maintaining anatomical specificity.
The interpretation of our findings should take into account the anesthetic context under which they were obtained. Isoflurane use is an important factor to consider when comparing our findings with those from conscious animal studies, where MK-801 has been reported to increase CBV in rats (Roussel et al., 1992). In contrast, our observation of a CBV decrease aligns with results obtained under other anesthetics, such as halothane or α-chloralose, where MK-801 has similarly been shown to reduce CBV (Park et al., 1989; Roussel et al., 1992). These findings suggest that the cerebrovascular response to MK-801 is influenced by the anesthetic state. The known properties of isoflurane may add further complexity to this interaction: its vasodilatory effects (Franceschini et al., 2010) could alter baseline vascular tone, thereby modulating drug-induced hemodynamic changes, while its inhibitory action on NMDA receptors (Nishikawa & MacIver, 2000) could interact directly with MK-801’s mechanism of action. By applying an identical anesthesia protocol to both control and treatment groups, our design allows for a valid comparison between conditions. Thus, the observed between-group differences likely reflect the impact of MK-801 within this anesthetized context. An important future direction will be to adapt this methodology for awake, freely moving animals to disentangle the intrinsic effects of MK-801 from those of anesthesia.
Looking ahead, the framework presented here holds considerable potential for broader application and eventual clinical translation. Its generalizability can be evaluated by extending it to other pharmacological agents and experimental models. Moreover, since fUSI has already been successfully applied to human brain and spinal cord imaging in various contexts (K. Agyeman et al., 2025; K. A. Agyeman et al., 2024; Demene et al., 2017; Imbault et al., 2017; Rabut et al., 2024; Soloukey et al., 2020), our machine learning framework could be adapted to these clinical scenarios, providing novel insights into physiological processes, pathological conditions, and therapeutic interventions in the human central nervous system. While challenges exist—including the need for larger training datasets and potentially more sophisticated registration techniques—these represent technical rather than fundamental limitations. The minimally invasive nature, high spatiotemporal resolution, and cost-effectiveness of functional ultrasound make this technique favorable for clinical applications.
4.3. Conclusion
Overall, our study establishes a novel data analysis framework integrating machine learning with fUSI for neuropharmacological research. The proposed methodology offers an unbiased approach for detecting and localizing drug-induced changes without relying on predetermined ROIs. Our comparison of machine learning approaches demonstrates that CNNs provide superior biological specificity while maintaining classification performance, establishing a foundation for future fUSI data analysis. This methodological innovation has the potential to accelerate drug development by revealing previously overlooked drug effects and enabling more comprehensive characterization of therapeutic compounds across both preclinical and clinical settings.
Acknowledgments
This work has been partially supported by the Army Research Laboratory Cooperative Agreement No. W911NF2120186, the Army Research Office W911NF-21-1-0094, the Keck School of Medicine Dean’s Pilot Funding Program (DL-PI), the National Institute of Mental Health (NIMH: 1K08MH121757-01A1), the USC Neurorestoration Center, and the Marlan and Rosemary Bourns College of Engineering at the University of California Riverside through start-up funding.
Data and Code Availability
The datasets generated and analyzed during the current study are available from the corresponding authors on reasonable request and after signing a formal data sharing agreement. The code is available on GitHub at https://github.com/DeightonJared/fUSI_CAM.
Author Contributions
J.D.: Methodology, Software, Formal Analysis, Visualization, Writing—Original Draft. S.Z.: Methodology, Formal Analysis, Visualization, Writing—Original Draft. K.A.: Methodology, Visualization, Writing—Original Draft. W.C.: Investigation. C.L.: Conceptualization, Writing—Review & Editing, Funding Acquisition. D.L.: Conceptualization, Writing—Review & Editing, Funding Acquisition. V.M.: Conceptualization, Supervision, Writing—Review & Editing, Funding Acquisition. V.C.: Conceptualization, Supervision, Writing—Review & Editing, Funding Acquisition.
Declaration of Competing Interest
The authors declare no competing interests.
Supplementary Materials
Supplementary material for this article is available with the online version here: https://doi.org/10.1162/IMAG.a.139
References
- Abnar, S., & Zuidema, W. (2022). Quantifying attention flow in transformers. arXiv preprint arXiv:2005.00928.
- Achtibat, R., Hatefi, S. M. V., Dreyer, M., Jain, A., Wiegand, T., Lapuschkin, S., & Samek, W. (2024). AttnLRP: Attention-aware layer-wise relevance propagation for transformers. arXiv preprint arXiv:2402.05602.
- Agyeman, K., Lee, D., Abedi, A., Sakellaridi, S., Kreydin, E., Russin, J., Lo, Y., Wu, K., Choi, W., Iyer, S., Edgerton, V., Liu, C., & Christopoulos, V. (2025). Human spinal cord activation during filling and emptying of the bladder. Nature Communications, 16(1), 6506. https://doi.org/10.1101/2024.02.16.580736
- Agyeman, K. A., Lee, D. J., Russin, J., Kreydin, E. I., Choi, W., Abedi, A., Lo, Y. T., Cavaleri, J., Wu, K., Edgerton, V. R., Liu, C., & Christopoulos, V. (2024). Functional ultrasound imaging of the human spinal cord. Neuron, 112(10), 1710–1722. https://doi.org/10.1016/j.neuron.2024.02.012
- Alexey, D. (2020). An image is worth 16 x 16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.
- Andiné, P., Widermark, N., Axelsson, R., Nyberg, G., Olofsson, U., Mårtensson, E., & Sandberg, M. (1999). Characterization of MK-801-induced behavior as a putative rat model of psychosis. The Journal of Pharmacology and Experimental Therapeutics, 290(3), 1393–1408. https://doi.org/10.1016/s0022-3565(24)35047-5
- Biewald, L. (2020). Experiment tracking with weights and biases [Software available from wandb.com].
- Clevert, D.-A. (2015). Fast and accurate deep network learning by exponential linear units (ELUs). arXiv preprint arXiv:1511.07289.
- Crown, L. M., Agyeman, K. A., Choi, W., Zepeda, N., Iseri, E., Pahlavan, P., Siegel, S. J., Liu, C., Christopoulos, V., & Lee, D. J. (2024). Theta-frequency medial septal nucleus deep brain stimulation increases neurovascular activity in MK-801-treated mice. Frontiers in Neuroscience, 18, 1372315. https://doi.org/10.3389/fnins.2024.1372315
- Demene, C., Baranger, J., Bernal, M., Delanoe, C., Auvin, S., Biran, V., Alison, M., Mairesse, J., Harribaud, E., Pernot, M., Tanter, M., & Baud, O. (2017). Functional ultrasound imaging of brain activity in human newborns. Science Translational Medicine, 9(411), eaah6756. https://doi.org/10.1126/scitranslmed.aah6756
- Franceschini, M. A., Radhakrishnan, H., Thakur, K., Wu, W., Ruvinskaya, S., Carp, S., & Boas, D. A. (2010). The effect of different anesthetics on neurovascular coupling. NeuroImage, 51(4), 1367–1377. https://doi.org/10.1016/j.neuroimage.2010.03.060
- Fu, R., Hu, Q., Dong, X., Guo, Y., Gao, Y., & Li, B. (2020). Axiom-based Grad-CAM: Towards accurate visualization and explanation of CNNs. arXiv preprint arXiv:2008.02312. https://doi.org/10.5244/c.34.146
- Griggs, W. S., Norman, S. L., Deffieux, T., Segura, F., Osmanski, B.-F., Chau, G., Christopoulos, V., Liu, C., Tanter, M., Shapiro, M. G., & Anderson, R. (2023). Decoding motor plans using a closed-loop ultrasonic brain–machine interface. Nature Neuroscience, 27(1), 196–207. https://doi.org/10.1038/s41593-023-01500-7
- Hakopian, E., Stepanian, A. E., Zhong, S., Agyeman, K. A., Zepeda, N., Wu, K., Liu, C., Lee, D. J., & Christopoulos, V. (2025). Functional ultrasound imaging and prewhitening analysis reveal MK-801-induced disruption of brain network connectivity. Frontiers in Pharmacology, 16, 1562102. https://doi.org/10.3389/fphar.2025.1562102
- Imbault, M., Chauvet, D., Gennisson, J.-L., Capelle, L., & Tanter, M. (2017). Intraoperative functional ultrasound imaging of human brain activity. Scientific Reports, 7(1), 7304. https://doi.org/10.1038/s41598-017-06474-8
- Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- Lu, F., Wu, F., Hu, P., Peng, Z., & Kong, D. (2017). Automatic 3D liver location and segmentation via convolutional neural network and graph cut. International Journal of Computer Assisted Radiology and Surgery, 12, 171–182. https://doi.org/10.1007/s11548-016-1467-3
- Macé, E., Montaldo, G., Cohen, I., Baulac, M., Fink, M., & Tanter, M. (2011). Functional ultrasound imaging of the brain. Nature Methods, 8(8), 662–664. https://doi.org/10.1038/nmeth.1641
- Macé, E., Montaldo, G., Osmanski, B.-F., Cohen, I., Fink, M., & Tanter, M. (2013). Functional ultrasound imaging of the brain: Theory and basic principles. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, 60(3), 492–506. https://doi.org/10.1109/tuffc.2013.2592
- Nishikawa, K.-I., & MacIver, M. (2000). Excitatory synaptic transmission mediated by NMDA receptors is more sensitive to isoflurane than are non-NMDA receptor-mediated responses. Anesthesiology, 92(1), 228. https://doi.org/10.1097/00000542-200001000-00035
- Norman, S. L., Maresca, D., Christopoulos, V. N., Griggs, W. S., Demene, C., Tanter, M., Shapiro, M. G., & Andersen, R. A. (2021). Single-trial decoding of movement intentions using functional ultrasound neuroimaging. Neuron, 109(9), 1554–1566. https://doi.org/10.1016/j.neuron.2021.03.003
- Osmanski, B.-F., Martin, C., Montaldo, G., Lanièce, P., Pain, F., Tanter, M., & Gurden, H. (2014). Functional ultrasound imaging reveals different odor-evoked patterns of vascular activity in the main olfactory bulb and the anterior piriform cortex. NeuroImage, 95, 176–184. https://doi.org/10.1016/j.neuroimage.2014.03.054
- Osmanski, B.-F., Pezet, S., Ricobaraza, A., Lenkei, Z., & Tanter, M. (2014). Functional ultrasound imaging of intrinsic connectivity in the living rat brain with high spatiotemporal resolution. Nature Communications, 5(1), 5023. https://doi.org/10.1038/ncomms6023
- Park, C., Nehls, D., Teasdale, G., & McCulloch, J. (1989). Effect of the NMDA antagonist MK-801 on local cerebral blood flow in focal cerebral ischaemia in the rat. Journal of Cerebral Blood Flow & Metabolism, 9(5), 617–622. https://doi.org/10.1038/jcbfm.1989.88
- Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., & Chintala, S. (2019). PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32, 1–12.
- Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 12, 2825–2830.
- Pnevmatikakis, E., & Giovannucci, A. (2017). NoRMCorre: An online algorithm for piecewise rigid motion correction of calcium imaging data. Journal of Neuroscience Methods, 291, 83–94. https://doi.org/10.1016/j.jneumeth.2017.07.031
- Rabut, C., Ferrier, J., Bertolo, A., Osmanski, B., Mousset, X., Pezet, S., Deffieux, T., Lenkei, Z., & Tanter, M. (2020). Pharmaco-fUS: Quantification of pharmacologically-induced dynamic changes in brain perfusion and connectivity by functional ultrasound imaging in awake mice. NeuroImage, 222, 117231. https://doi.org/10.1016/j.neuroimage.2020.117231
- Rabut, C., Norman, S. L., Griggs, W. S., Russin, J. J., Jann, K., Christopoulos, V., Liu, C., Andersen, R. A., & Shapiro, M. G. (2024). Functional ultrasound imaging of human brain activity through an acoustically transparent cranial window. Science Translational Medicine, 16(749), eadj3143. https://doi.org/10.1126/scitranslmed.adj3143
- Roussel, S., Pinard, E., & Seylaz, J. (1992). The acute effects of MK-801 on cerebral blood flow and tissue partial pressures of oxygen and carbon dioxide in conscious and alpha-chloralose anaesthetized rats. Neuroscience, 47(4), 959–965. https://doi.org/10.1016/0306-4522(92)90043-2
- Soloukey, S., Vincent, A. J., Satoer, D. D., Mastik, F., Smits, M., Dirven, C. M., Strydis, C., Bosch, J. G., Steen van der, A. F., De Zeeuw, C. I., Koekkoek, S. K., & Kruizinga, P. (2020). Functional ultrasound (fUS) during awake brain surgery: The clinical potential of intra-operative functional and vascular brain mapping. Frontiers in Neuroscience, 13, 1384. https://doi.org/10.3389/fnins.2019.01384
- Vidal, B., Droguerre, M., Valdebenito, M., Zimmer, L., Hamon, M., Mouthon, F., & Charvériat, M. (2020). Pharmaco-fUS for characterizing drugs for Alzheimer’s disease–the case of THN201, a drug combination of donepezil plus mefloquine. Frontiers in Neuroscience, 14, 835. https://doi.org/10.3389/fnins.2020.00835
- Vidal, B., Droguerre, M., Venet, L., Zimmer, L., Valdebenito, M., Mouthon, F., & Charvériat, M. (2020). Functional ultrasound imaging to study brain dynamics: Application of pharmaco-fUS to atomoxetine. Neuropharmacology, 179, 108273. https://doi.org/10.1016/j.neuropharm.2020.108273
- Wang, Q., Ding, S.-L., Li, Y., Royall, J., Feng, D., Lesnar, P., Graddis, N., Naeemi, M., Facer, B., Ho, A., Dolbeare, T., Blanchard, B., Dee, N., Wakeman, W., Hirokawa, K. E., Szafer, A., Sunkin, S. M., Oh, S. W., Bernard, A., … Ng, L. (2020). The Allen mouse brain common coordinate framework: A 3D reference atlas. Cell, 181(4), 936–953. https://doi.org/10.1016/j.cell.2020.04.007
- Watanabe, M., Mishina, M., & Inoue, Y. (1994). Distinct distributions of five NMDA receptor channel subunit mRNAs in the brainstem. Journal of Comparative Neurology, 343(4), 520–531. https://doi.org/10.1002/cne.903430403
- Zepeda, N. C., Crown, L. M., Medvidovic, S., Choi, W., Sheth, M., Bergosh, M., Gifford, R., Folz, C., Lam, P., Lu, G., Featherstone, R., Liu, C., Siegel, S. J., & Lee, D. J. (2022). Frequency-specific medial septal nucleus deep brain stimulation improves spatial memory in MK-801-treated male rats. Neurobiology of Disease, 170, 105756. https://doi.org/10.1016/j.nbd.2022.105756
- Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A. (2016). Learning deep features for discriminative localization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2921–2929. https://doi.org/10.1109/cvpr.2016.319