Author manuscript; available in PMC: 2023 Feb 21.
Published in final edited form as: IEEE Biomed Circuits Syst Conf. 2022 Nov 16;2022:650–654. doi: 10.1109/BioCAS54905.2022.9948604

Estimating Intrinsic Manifold Dimensionality to Classify Task-Related Information in Human and Non-Human Primate Data

Zachary Bretton-Granatoor 1,1, Hannah Stealey 2,1, Samantha R Santacruz 3, Jarrod A Lewis-Peacock 4
PMCID: PMC9942267  NIHMSID: NIHMS1873284  PMID: 36820790

Abstract

Feature selection, or dimensionality reduction, has become a standard step in reducing large-scale neural datasets into usable signals for brain-machine interface and neurofeedback decoders. Current techniques in fMRI data reduce the number of voxels (features) by performing statistics on individual voxels or using traditional techniques that utilize linear combinations of features (e.g., principal component analysis (PCA)). However, these methods often do not account for the cross-correlations found across voxels and do not sufficiently reduce the feature space to support efficient real-time feedback. To overcome these limitations, we propose using factor analysis on fMRI data. This technique has become increasingly popular for extracting a minimal number of latent features to explain high-dimensional data in non-human primates (NHPs). Here, we demonstrate these methods in both NHP and human data. In NHP subjects (n=2), we reduced the number of features to an average of 26.86% and 14.86% of the total feature space to build our multinomial classifier. In one NHP subject, the average accuracy of classifying eight target locations over 64 sessions was 62.43% (+/−6.19%) compared to a PCA-based classifier with 60.26% (+/−6.02%). In healthy fMRI subjects, we reduced the feature space to an average of 0.33% of the initial space. Group average (n=5) accuracy of FA-based category classification was 74.33% (+/− 4.91%) compared to a PCA-based classifier with 68.42% (+/−4.79%). FA-based classifiers can maintain the performance fidelity observed with PCA-based decoders. Importantly, FA-based methods allow researchers to address specific hypotheses about how underlying neural activity relates to behavior.

Keywords: fMRI, non-human primate, electrophysiology, factor analysis, dimensionality reduction, feature selection, classification

I. Introduction

Rapid advancements in recording technologies have enabled researchers to collect large-scale neural datasets. Surveying the entire brain or large swaths of neural activity is ideal when investigating complex behaviors, but interpreting this high-dimensional data in terms of low-dimensional behavioral output presents a challenge. Reducing the number of features is necessary to reduce the noise of the input data, gain interpretable neuroscientific insight, and train pattern classifier algorithms for neurofeedback applications.

Functional magnetic resonance imaging (fMRI) data typically contains up to 100,000 voxels in whole brain recordings, collected every 1 or 2 seconds for about 60 minutes. fMRI data often suffers from the “curse of dimensionality,” where the feature dimensions outweigh the available samples to train on, requiring a reduction in dimensions to ensure validity in the results [1]. Statistical tests and dimensionality reduction techniques can produce meaningful results while reducing computational demands. A typical first step is anatomical feature selection, whereby a priori regions of interest are selected based on experimental task demands or specific hypotheses. The next step is functional feature selection, whereby statistical tests (e.g., t-tests, ANOVAs, Pearson’s correlations) are computed for individual voxels to determine if the activity correlates with the experimental variables [2]. These methods can still leave thousands to tens of thousands of voxels for analysis.

Typical voxel-wise feature selection methods do not reduce the cross-correlations between voxels, so the selected features may still be redundant. Alternative methods (i.e., wrapper methods) allow for the discovery of interactions among features but are computationally expensive. They require extensive heuristic optimization approaches with no universal parameter and hyperparameter selection guidelines. Furthermore, this unsupervised method may result in relevant features being removed. An alternative technique to the traditional feature selection approach is some form of dimensionality reduction (e.g., principal component analysis (PCA), independent component analysis (ICA), or Random Forest) [2]. While these methods can reduce the feature dimensions, the minimization often occurs at the loss of unique information represented in the higher dimensional data that can contribute to decoding performance. For example, PCA, which assumes no variance is unique, can miss vital multivariate information necessary for proficient classifier performance [3].

In the non-human primate (NHP) space, researchers have applied factor analysis (FA) to transform neural spike data before input into a brain-machine interface (BMI) [4]–[7]. This aims to define a low-dimensional space that captures common neural patterns in task-relevant activity. This plane, coined the “intrinsic manifold,” is determined by latent factors not immediately observable by traditional dimensionality reduction techniques. Whereas conventional methods (e.g., PCA) derive an independent linear combination of features, FA extracts salient information from factors that describe common sources of variance. While this approach has been explored in NHP data, its application to fMRI data has yet to be thoroughly investigated. Often we find that insight gained from animal models can be challenging to transfer directly to humans [8]. Our work seeks to address these limitations by applying FA to human fMRI data and NHP spiking data in parallel to emphasize the potential for cross-species translation of this approach. We demonstrate how FA captures task-relevant neural patterns and maintains classification accuracy relative to traditional methods in both model species while substantially reducing the number of features.

II. METHODS

A. fMRI Data Collection and Pre-Processing

A total of 25 healthy participants (13 female, 12 male; age: M = 20.413 years, SD = 2.44; handedness: all right-handed) were recruited from the Austin, Texas, area for the fMRI study. A subset of five participants was used in the proof-of-concept analysis presented here. All participants had normal or corrected-to-normal vision, provided informed consent, and received $60 in compensation.

MRI data were acquired on a Siemens Skyra 3.0 Tesla scanner at the Biomedical Imaging Center on the campus of The University of Texas at Austin. Functional MRI scans of a picture viewing task were acquired using a sequence with the following parameters: TR (repetition time) = 1000ms, TE (echo time) = 30ms, FOV (field of view) = 230mm - 100% phase, multiband acceleration factor = 4, with a 2.4 × 2.4 × 2.4 mm3 voxel size, acquired across 56 axial slices and aligned along the anterior commissure-posterior commissure line. Six runs were acquired in total, with each run consisting of 246 echo planar images (EPIs), for a total of 1476 images.

To characterize the image representations used in the task (faces and scenes), we focused our analyses on the ventral visual stream (VVS) in the occipitotemporal lobes [9]. This mask consists of anatomically defined regions: intracalcarine cortex, lingual gyrus, lateral occipital cortex, occipital fusiform gyrus, occipital pole, parahippocampal gyrus, temporal fusiform cortex, temporal occipital fusiform cortex, inferior temporal gyrus, middle temporal gyrus, superior temporal gyrus, and temporal pole. To construct masks for each participant, individual masks of these regions were extracted from each subject’s parcellated cortical MNI map (from fmriprep) and were summed together. The ROI masks were then binarized so that voxels within the mask had a value of 1 and voxels outside the mask had a value of 0 (VVS: M = 15,122, SD = 540).
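As a minimal sketch of this mask-construction step (purely illustrative: the region masks, array shape, and values below are synthetic stand-ins, not real parcellation data), the summing and binarization can be expressed in NumPy:

```python
import numpy as np

# Illustrative stand-ins for the 12 anatomical region masks extracted from
# a subject's parcellated cortical map (shapes and values are synthetic).
rng = np.random.default_rng(0)
roi_masks = [rng.integers(0, 2, size=(64, 64, 56)) for _ in range(12)]

# Sum the region masks, then binarize: a voxel inside at least one region
# gets a value of 1, and all voxels outside the combined mask get 0.
combined = np.sum(roi_masks, axis=0)
vvs_mask = (combined > 0).astype(np.uint8)

n_voxels = int(vvs_mask.sum())  # number of voxels retained in the ROI
```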

fMRI results included in this manuscript come from preprocessing performed using fMRIPrep 21.0.1 [10], [11], which is based on Nipype 1.6.1 [12], [13].

The fMRIPrep preprocessing pipeline was used to prepare the BOLD data for further processing (https://fmriprep.org/). This Nipype-based pipeline combines software from several neuroimaging packages with custom-written code to apply state-of-the-art preprocessing algorithms to structural and functional neuroimages. The steps involved brain extraction, distortion correction, head motion correction, and confound calculation. The fMRI data used here were collected in an ongoing project in our lab. They consist of a picture viewing task in which subjects performed a recognition memory test on a set of face and scene images (a subset of which was shown during a prior experimental phase). Participants completed six runs (4.1 min each, 24.60 min total), with 30 trials per run, for a total of 180 trials (60 faces/120 scenes). A trial consisted of a unique image presentation for 2 s, followed by a memory response and an inter-trial interval jittered between 5 and 7 s. The 2 s of brain data during which the stimulus was on screen were labeled according to the image category (face or scene). These labels were shifted forward by 5 s to account for the hemodynamic delay.

B. NHP Data Collection and Pre-Processing

Rhesus macaque monkeys (Macaca mulatta; n=2) were trained in a center-out BMI task in which subjects volitionally modulated neural activity from their premotor and motor cortices to control a computer cursor. On a given trial, subjects moved the cursor from the center of the screen to a peripheral target located 10 cm from the center. Eight target locations were defined, spaced evenly in a circle. Daily sessions included a baseline block of 384 trials (48 trials × 8 targets), which was used for the analyses presented here. Subject A completed 64 sessions and Subject B completed 40 sessions, with an average of 37 (±7) and 106 (±15) decoder neurons, respectively. Spiking activity summed in 100 ms bins, aligned to the 10 Hz decoder updates, was used for analysis.

C. Factor Analysis and Estimating Intrinsic Manifold Dimensionality

FA is a dimensionality reduction technique used to identify hidden structure in data by maximizing the shared variance explained. Through maximum likelihood estimation (MLE), combinations of neural units are weighted by their factor loadings to form latent factors; a factor loading is the amount that each feature correlates with a latent variable. The data are assumed to follow a Gaussian distribution. Using (1), the probabilistic model fit by MLE, the variance of the data (x) decomposes into shared (WWᵀ) and private (ψ) components.

p(x) = N(μ, WWᵀ + ψ)  (1)

Shared variance is maximized in the first factor, with subsequent factors contributing a decreasing proportion of explained shared variance. To calculate the shared variance explained by factors, we squared the factor loadings and summed across neural units (i.e., (NHP) decoder neurons and (fMRI) voxels).
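As a concrete illustration, the shared-variance calculation can be sketched with scikit-learn's FactorAnalysis on synthetic data (all array sizes, the number of factors, and the data itself are illustrative, not taken from the study):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Synthetic data with latent structure: 3 true factors driving 40 units.
rng = np.random.default_rng(1)
latent = rng.normal(size=(1600, 3))
loading = rng.normal(size=(3, 40))
X = latent @ loading + 0.5 * rng.normal(size=(1600, 40))

fa = FactorAnalysis(n_components=5).fit(X)

# fa.components_ has shape (n_factors, n_units). Squaring the loadings and
# summing across units gives each factor's shared variance. scikit-learn
# does not guarantee factors arrive sorted by variance, so sort explicitly.
shared_var = np.sum(fa.components_ ** 2, axis=1)
shared_var = np.sort(shared_var)[::-1]
cum_frac = np.cumsum(shared_var) / shared_var.sum()  # cumulative fraction
```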

To determine the number of factors that define the estimated intrinsic dimensionality (EID), we computed the average log-likelihood (LL) of observing the data transformed by a specific number of factors. For fMRI data, 12 candidate factor counts (2, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100) were tested. In NHP data, we used the first 160 seconds (1,600 bins of 100 ms) from each session’s baseline to replicate the procedure from Sadtler et al. [4]. We then tested 19 candidate counts for Subject A and 72 for Subject B, ranging from a minimum of two factors to a maximum defined by the lowest number of decoder units across each subject’s sessions (A: 21, B: 74). The maximum of the LL curve was interpreted as the number of factors that define the intrinsic manifold. Factor analysis was performed using the scikit-learn Python library.
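The EID procedure above can be sketched with scikit-learn (a hedged illustration on synthetic data: the candidate grid mirrors the fMRI grid in the text, but the data, dimensions, and fold count are invented). `FactorAnalysis.score` returns the average log-likelihood of held-out samples, so `cross_val_score` directly yields the cross-validated LL:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.model_selection import cross_val_score

# Synthetic data with 10 true latent factors driving 120 observed features.
rng = np.random.default_rng(2)
latent = rng.normal(size=(300, 10))
X = latent @ rng.normal(size=(10, 120)) + rng.normal(size=(300, 120))

candidates = [2, 5, 10, 20, 30]  # candidate factor counts to evaluate
mean_ll = []
for k in candidates:
    fa = FactorAnalysis(n_components=k)
    # Average held-out log-likelihood across folds; higher is better.
    mean_ll.append(cross_val_score(fa, X, cv=3).mean())

# The factor count maximizing the cross-validated LL is taken as the EID.
eid = candidates[int(np.argmax(mean_ll))]
```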

D. Classifier Training and Validation

To validate that the EID contained sufficient task-relevant information, we transformed the data and built one-vs-rest (non-multinomial) logistic regression models with L2 regularization. A three-category classifier (face, scene, and rest) was built on the FA-transformed fMRI data. In the NHP data, an eight-category classifier (one class per target) was built on FA-transformed spiking data. The L2 logistic regression model training and cross-validation were performed using the scikit-learn Python library.

fMRI data were iteratively broken up into samples of each class (face, scene, and rest). There were twice as many scene trials as face trials, so only half of the scene trials and a matching sample of the resting periods between trials were selected for analysis. This resulted in an equal set of 120 face, scene, and rest samples per subject. To construct the model and validate its performance, these data were split in half, and we performed 2-fold cross-validation. This procedure was then repeated using the other half of the scene trials, so that all data were used.

NHP spiking data were aligned to trials and labeled based on the end target location. This resulted in eight classes of equal size (48 trials/target). To ensure that the datasets were balanced when training a classifier, a single time point (100 ms bin, 500 ms after center hold) of spiking data was extracted from each trial. To construct the model and validate its performance, we performed 4-fold cross-validation: the data were split into four balanced folds, three of which were used to train the classifier, with the remaining fold used to test the model’s performance. This process was repeated until each fold had served as the test set. The folds were then randomly resampled, and the entire process was repeated over 10,000 iterations.
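A minimal sketch of this classification pipeline, assuming synthetic stand-in data (the trial counts match the text, but the firing rates and class structure below are invented), uses an FA transform feeding an L2-regularized logistic regression under stratified 4-fold cross-validation:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic "spiking" data: 384 trials (48 per target), 40 units, with a
# target-dependent shift in firing rate so the classes are separable.
rng = np.random.default_rng(3)
n_trials, n_units, n_targets = 384, 40, 8
y = np.repeat(np.arange(n_targets), n_trials // n_targets)
X = rng.poisson(3.0, size=(n_trials, n_units)) + y[:, None]

clf = make_pipeline(
    FactorAnalysis(n_components=10),  # 10-dimensional intrinsic manifold
    LogisticRegression(penalty="l2", max_iter=1000),
)

# Stratified folds keep the eight classes balanced within each fold.
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
acc = cross_val_score(clf, X, y, cv=cv).mean()
```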

To define random-chance performance, we created empirical null distributions. We randomized the training data labels prior to assessing classifier accuracy. Repeating this process 10,000 times created a distribution that enabled us to determine the chance level empirically. Scores that fell above the 95th percentile of this distribution can be accepted as above-chance performance.
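The permutation-based null distribution can be sketched as follows (an illustrative reduction: synthetic data, a three-class problem, and far fewer iterations than the 10,000 used in the study):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic features with no real class structure: 120 samples, 3 classes.
rng = np.random.default_rng(4)
X = rng.normal(size=(120, 20))
y = np.repeat([0, 1, 2], 40)

null_acc = []
for _ in range(200):  # the paper uses 10,000 iterations
    y_perm = rng.permutation(y)  # shuffle labels to break label-feature pairing
    clf = LogisticRegression(max_iter=500)
    null_acc.append(cross_val_score(clf, X, y_perm, cv=2).mean())

# Accuracies above this threshold are accepted as above chance.
chance_threshold = np.percentile(null_acc, 95)
```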

Results are presented in a mean confusion matrix in which the on-diagonals represent the percentage of correctly classified (i.e., true positive) testing labels. As an additional performance comparison, we repeated this process using PCA as a reduction technique in place of FA. We used the average estimated intrinsic manifold dimensionality determined by the initial log-likelihood procedure in both FA and PCA processes.
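The FA-vs-PCA comparison can be sketched by swapping the reduction step while holding the dimensionality and classifier fixed (a hedged illustration: the data, the `eid` value, and the fold count below are synthetic placeholders, not the study's values):

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic three-class data with a class-dependent mean shift.
rng = np.random.default_rng(5)
y = np.repeat([0, 1, 2], 40)
X = rng.normal(size=(120, 200)) + y[:, None] * 0.5

eid = 10  # stand-in for the average EID from the log-likelihood procedure
accs = {}
for reducer in (FactorAnalysis(n_components=eid), PCA(n_components=eid)):
    # Same dimensionality and same L2 logistic regression for both methods.
    pipe = make_pipeline(reducer, LogisticRegression(max_iter=500))
    accs[type(reducer).__name__] = cross_val_score(pipe, X, y, cv=4).mean()
```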

III. Results

A. Intrinsic Manifold Dimensionality

In Fig. 1, we show example log-likelihood plots (arbitrary units) for NHP Subject A (Fig. 1A) and five human fMRI subjects (Fig. 1D). We defined the maximum of these plots as the EID. In both humans and NHPs, the EID was a fraction of the total possible features. Specifically, we found the EID to be, on average, 27% (Subject A) and 15% (Subject B) of the total neural units in the NHP subjects (Fig. 1B). Notably, in the fMRI data, the number of features was reduced to 0.33% of the initial dimensionality (after selecting voxels in our VVS ROI). Not only does the FA transformation reduce the “noise” in the data, but it also retains relevant information within and between voxels. Across the five subjects, we observed an average EID of 50 factors. We posit that the difference in percent dimensionality across species is due to the nature of the features. The NHP data consist of a priori spike-sorted neural units, which have more private components of variability. Conversely, fMRI voxels capture more population-level activity, contributing to shared variability. Therefore, we expect the feature reduction to be more drastic in fMRI data.

Fig. 1.

A. Representative NHP cross-validated log-likelihoods (LL; arbitrary units) plots from three sessions from Subject A. The peak of the LL plot indicates the estimated intrinsic manifold dimensionality (EID) for each session and is represented by an open circle. A closed circle represents the average EID over all sessions for Subject A (n=64). This value (10-dimensional EID) is used for subsequent analyses. B. Distribution of session EID for Subject A (blue, n=64 sessions) and Subject B (orange, n=40 sessions). Mean values are displayed above a triangle color-coded to each subject. C. Cumulative shared variance explained by the 10-dimensional intrinsic manifold (IM) used during subsequent analyses. Line style corresponds to sessions from Fig 1A. D. Cross-validated LL plots of the five fMRI subjects’ data. The peaks (open circles) indicate the EID of the subject’s BOLD data. A 50-dimensional intrinsic manifold was used for subsequent fMRI analyses (solid circles). E. Cumulative shared variance explained by the 50-dimensional intrinsic manifold used during the fMRI analyses. Colored curves indicate the subjects shown in Fig. 1D.

B. Classification Performance

To train the logistic regression models on NHP data, we transformed each session’s data by the average number of factors (10; Fig. 1B, blue). Over Subject A’s 64 sessions, the average classification accuracy was 62.43% (Fig. 2), significantly above chance performance (12.47%). Notably, accuracy across the eight targets (on-diagonal) did not differ significantly (ANOVA, p >> 0.05); the 10-dimensional feature set captured approximately equal information for decoding each target. Furthermore, the FA-based decoder performed similarly to, and slightly better than, the PCA-based decoder using the same number of features (mean accuracy, FA: 62.43%; PCA: 60.26%).

Fig. 2.

NHP classifier confusion matrix for decoding the eight target locations based on spiking data aligned to cursor updates. Mean chance level of 12.47%. Colorbar indicates percent correct [N = 64 sessions].

One limitation of fitting an FA model on fMRI data is that the maximum possible dimensionality is bounded by the smallest dimension of the data matrix: the number of trials. However, we found that the EID was well within the maximum number of samples, and including additional factors beyond the estimated intrinsic dimensionality did not significantly improve performance.

In the fMRI data, we found that the category classifier retained performance fidelity (Fig. 3) when compared to a classifier trained on data feature-selected with an ANOVA F-test (accuracy: (FA) 74.3% vs. (ANOVA) 71.1%). Notably, the FA model used approximately 28-fold fewer features than the ANOVA model (mean number of features: (FA) 50 vs. (ANOVA) 1,413). While there was some error in distinguishing the face and scene classes, the results were well above chance (34.3%). The FA-based decoder had accuracy similar to PCA in identifying the categories using the same number of features (mean accuracy, FA: 74.3%; PCA: 68.42%). In addition, the results show that the FA-fitted model could robustly separate task-negative samples (rest) from task-positive samples (scenes/faces) with a low error rate.

Fig. 3.

fMRI classifier confusion matrix for decoding the category being viewed on each trial of the functional localizer using within-subject classifiers. Mean chance level of 34.3%. Colorbar indicates percent correct [N = 5 participants, in VVS].

IV. Conclusions

We presented evidence that FA provides a dimensionality reduction framework that maintains classification performance in both NHP spiking data and human fMRI data. Importantly, we observe that FA was able to achieve this performance using far fewer features than the traditional voxel-wise ANOVA selection method. Reducing the feature dimensions by orders of magnitude while preserving vital task-relevant information shows the robustness of this dimensionality reduction technique in fMRI data. This research serves as a bridge for translating more complex factor analyses from the NHP space to fMRI data, including rotating the dimensions to align directly with task variables to increase classification performance.

In addition to being a valid feature selection method, FA can be used to define a low-dimensional intrinsic manifold, as presented here. This subspace of neural activity can be used to define and capture neural representations of task-relevant variables. Borrowing from the NHP domain, this process would allow fMRI researchers to identify patterns of neural activity that are easier to learn (e.g., within-manifold) or more difficult to learn (e.g., outside-manifold) and how representations change over time. Gaining mechanistic insight into the learning process will help to improve neurofeedback paradigms that have implications ranging from stroke rehabilitation to alternative treatments for neuropsychiatric disorders.

Acknowledgments

This work was supported by funds from the National Institutes of Health Training Grant T32-MH-106454 for Z.H.B.G. and the National Institutes of Health Grant R01-EY 028746 awarded to J.A.L.-P. H.M.S. is funded by the National Defense Science and Engineering Graduate Fellowship.

Contributor Information

Zachary Bretton-Granatoor, Institute for Neuroscience The University of Texas at Austin Austin, TX, USA.

Hannah Stealey, Department of Biomedical Engineering The University of Texas at Austin Austin, TX, USA.

Samantha R. Santacruz, Department of Biomedical Engineering The University of Texas at Austin Austin, TX, USA

Jarrod A. Lewis-Peacock, Department of Psychology The University of Texas at Austin Austin, TX, USA

REFERENCES

  • [1].Lewis-Peacock JA and Norman KA, “Multi-Voxel Pattern Analysis of fMRI Data,” in The Cognitive Neurosciences, MIT Press, 2014, pp. 911–920. [Google Scholar]
  • [2].Mwangi B, Tian TS, and Soares JC, “A Review of Feature Reduction Techniques in Neuroimaging,” Neuroinform, vol. 12, no. 2, pp. 229–244, Apr. 2014, doi: 10.1007/s12021-013-9204-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [3].Thomas CG, Harshman RA, and Menon RS, “Noise Reduction in BOLD-Based fMRI Using Component Analysis,” NeuroImage, vol. 17, no. 3, pp. 1521–1537, Nov. 2002, doi: 10.1006/nimg.2002.1200. [DOI] [PubMed] [Google Scholar]
  • [4].Sadtler PT et al. , “Neural constraints on learning,” Nature, vol. 512, no. 7515, pp. 423–426, Aug. 2014, doi: 10.1038/nature13665. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [5].Hennig JA et al. , “Constraints on neural redundancy,” eLife, vol. 7, p. e36774, Aug. 2018, doi: 10.7554/eLife.36774. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [6].Golub MD et al. , “Learning by neural reassociation,” Nat Neurosci, vol. 21, no. 4, pp. 607–616, Apr. 2018, doi: 10.1038/s41593-018-0095-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [7].Oby ER et al. , “New neural activity patterns emerge with long-term learning,” Proc. Natl. Acad. Sci. U.S.A, vol. 116, no. 30, pp. 15210–15215, Jul. 2019, doi: 10.1073/pnas.1820296116. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [8].Leenaars CHC et al. , “Animal to human translation: a systematic scoping review of reported concordance rates,” J Transl Med, vol. 17, no. 1, p. 223, Dec. 2019, doi: 10.1186/s12967-019-1976-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [9].Grill-Spector K and Weiner KS, “The functional architecture of the ventral temporal cortex and its role in categorization,” Nat Rev Neurosci, vol. 15, no. 8, pp. 536–548, Aug. 2014, doi: 10.1038/nrn3747. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [10].Esteban O et al. , “FMRIPrep: a robust preprocessing pipeline for functional MRI,” p. 30, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [11].Esteban O et al. , “Crowdsourced MRI quality metrics and expert quality annotations for training of humans and machines,” Sci Data, vol. 6, no. 1, p. 30, Dec. 2019, doi: 10.1038/s41597-019-0035-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [12].Gorgolewski K et al. , “Nipype: A Flexible, Lightweight and Extensible Neuroimaging Data Processing Framework in Python,” Front. Neuroinform, vol. 5, 2011, doi: 10.3389/fninf.2011.00013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [13].Esteban O et al. , “nipy/nipype: 1.8.1.” Zenodo, May 16, 2022. doi: 10.5281/ZENODO.6555085. [DOI] [Google Scholar]
