Author manuscript; available in PMC: 2025 Aug 22.
Published in final edited form as: Proc IEEE Int Symp Biomed Imaging. 2024 Aug 22;2024:10.1109/isbi56570.2024.10635372. doi: 10.1109/isbi56570.2024.10635372

PRESERVING HUMAN LARGE-SCALE BRAIN CONNECTIVITY FINGERPRINT IDENTIFIABILITY WITH RANDOM PROJECTIONS

Duy Duong-Tran 1,2,*, Mark Magsino 1,*, Joaquín Goñi 3,4,5, Li Shen 2
PMCID: PMC11452154  NIHMSID: NIHMS1974011  PMID: 39371470

Abstract

The complex etiology of various neurodegenerative diseases and psychiatric disorders, especially at the individual level, has posed unmatched challenges to the advancement of personalized medicine. Recent technical advancements in functional magnetic resonance imaging have enabled researchers to map large-scale brain connectivity at an unprecedented level of subject precision. Nonetheless, along with the early promise of personalized medicine based on various neuroimaging modalities came the challenge of the clinical utility of brain connectomics (e.g., functional connectomes). Besides the many established challenges to functional connectome utility, such as edge reliability, there is an easily overlooked challenge that does not receive the same level of attention: the computational cost of functional connectomes. To improve the clinical utility of functional connectomics, we propose a random projection method that preserves a practically similar level of subject identifiability while sampling and retaining only a proportion of the functional edges in subjects’ functional connectomes. Our work paves the way towards computational improvements, and hence clinical utility, of functional connectomes without compromising the integrity of biomarkers learnt from whole-brain large-scale functional connectivity imaging.

Keywords: brain fingerprint, identifiability, personalized medicine, connectome sub-sampling, random projection

1. INTRODUCTION

Recent developments in large-scale network neuroscience have seen a flurry of successes, providing compelling evidence of subject-level cognitive signatures in functional Magnetic Resonance Imaging (fMRI) data [1, 2, 3]. Indeed, improved acquisition parameters and fMRI technology, coupled with an explosion of publicly available neuroimaging datasets, have supplied unprecedented opportunities to investigate and develop stable subject-level biomarkers, captured by the concept of the brain fingerprint. The brain fingerprint plays an increasingly critical role in studies of both healthy controls and neurodegenerative diseases, as it enables researchers to learn subtle cognitive patterns in each individual that could potentially lead to a wide spectrum of observable behavioral differences. In the clinical domain, capturing subject-level biomarkers, also known as the subject fingerprint, has shown initial promise in advancing personalized medicine for neurological and psychiatric disorders.

The concept of identifiability based on brain connectivity is relatively new, originating with the seminal work of Finn and colleagues [4]. In this pioneering work, the authors show that it is feasible to robustly identify the functional connectome (FC; see Footnote 1) of a target subject from a sample database of FCs simply by computing the spatial (Pearson) correlation of the target FC against those in the database. To quantify identifiability by comparing target and database sets of FCs, a dataset needs to comprise at least two scans of the same subject for a fixed fMRI task or resting condition. In 2018, using a dimensionality reduction technique, Amico and Goñi [3] showed that the degree of subject differentiability (i.e., the subject fingerprint) can be thought of as a spectrum of identifiability, obtained by heuristically optimizing the peer-reviewed and biologically relevant objective function known as the Identifiability Score (ID):

$$\mathrm{ID} = \operatorname{mean}(\text{within-subject variability}) - \operatorname{mean}(\text{between-subject variability}) \tag{1}$$

Along with early signs of promise in connectivity-based brain fingerprint [5, 6] and personalized medicine comes the challenge of clinical utility of functional connectomes [7]. Besides the addressed challenges (e.g., scan length, choice of parcellations) that would impact reliability of FC edges [7], an easily overlooked aspect of clinical utility of functional connectome is its computational cost. In this paper, we show that the identifiability score (ID), as proposed in [3], can be approximated effectively without sampling the entire large-scale FC, but rather a random subset of functional edges. The rest of the paper is structured as follows: in Sections 2 and 3, we propose the approximation model; in Section 4, we create a simulation based on the proposed model; in Section 5, we apply equation 1 and our proposed model to the Human Connectome Project 100 Unrelated subject data release and show, to a great extent, that identifiability score is statistically preserved under random projection. In Section 6, we provide some closing remarks and future directions of random matrix computations in the context of brain connectomics.

2. APPROXIMATION MODEL

Suppose our similarity scores are stored in an m×n matrix X. Rows correspond to the similarity score for a pair of brain regions and columns correspond to each patient. Without loss of generality, we assume that the columns of X are normalized to be unit vectors and have zero mean. Let A be the matrix of correlations of the columns of X. Following [3] we define the identifiability, I, as

$$I = 1 - \mu(A_{\mathrm{off}}),$$

where $\mu(A_{\mathrm{off}})$ is the average of the off-diagonal entries of A.

The goal is to approximate A in a way that preserves I. Let $s = (s_1, \dots, s_m)$ be a random vector where each entry $s_i$ is an independent Bernoulli random variable with probability of success p. We construct a matrix, Y, by the matrix product

$$Y = SX,$$

where $S = \operatorname{diag}(s)$ is the diagonal projection matrix corresponding to s. This is equivalent to keeping or discarding each row of X independently, with probability p of keeping each row. Note that the expected number of retained rows of Y is pm, but the actual number depends on the outcome of the random procedure.

We then compute the Pearson correlations of the columns of Y and call the resulting matrix B. This procedure renormalizes and recenters Y so that the correlations in B are comparable to the original normalized correlations contained in A. We then compute the approximate identifiability score, $\bar{I}$, as before:

$$\bar{I} = 1 - \mu(B_{\mathrm{off}}),$$

where $\mu(B_{\mathrm{off}})$ is the average of the off-diagonal entries of B.
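The procedure of this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' released code: the function names, the Gaussian toy data, and the random seed are all hypothetical, and discarded rows are dropped outright rather than zeroed, matching the "keep or discard each row" reading of the projection.

```python
import numpy as np

def identifiability(X):
    """I = 1 - mean of the off-diagonal Pearson correlations between columns of X."""
    A = np.corrcoef(X, rowvar=False)        # n x n correlation matrix of the columns
    n = A.shape[0]
    off_diag = A[~np.eye(n, dtype=bool)]    # every entry except the main diagonal
    return 1.0 - off_diag.mean()

def subsampled_identifiability(X, p, rng):
    """Keep each row of X independently with probability p, then recompute I."""
    keep = rng.random(X.shape[0]) < p       # one Bernoulli(p) draw per row
    return identifiability(X[keep])         # corrcoef recenters and renormalizes

rng = np.random.default_rng(0)              # seed chosen arbitrarily for the sketch
X = rng.normal(size=(10_000, 100))          # toy data: 10000 edges x 100 subjects
I_full = identifiability(X)
I_approx = subsampled_identifiability(X, p=0.05, rng=rng)
print(f"full: {I_full:.4f}  subsampled: {I_approx:.4f}")
```

Because `np.corrcoef` recenters and rescales each column of the retained rows, no separate normalization step is needed after subsampling.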

3. APPROXIMATING DAY-TO-DAY DIFFERENCES

Suppose our similarity scores for the first snapshot of a network are stored in an m×n matrix $X_1$. We model the scores from another time snapshot as $X_2 = X_1 + \eta$, where $\eta$ is a Gaussian noise matrix with small standard deviation relative to the size of the entries of $X_1$. Let A be the matrix of cross-correlations comparing the columns of $X_1$ with the columns of $X_2$. This time, we define the identifiability score, I, by

$$I = \mu(A_{\mathrm{diag}}) - \mu(A_{\mathrm{off}}),$$

where $\mu(A_{\mathrm{diag}})$ is the average of the diagonal entries of A and $\mu(A_{\mathrm{off}})$ is the average of the off-diagonal entries of A. Note that this time, because we are computing cross-correlations between columns of $X_1$ and $X_2$, the main diagonal is not necessarily all ones. Hence, the $\mu(A_{\mathrm{diag}})$ term is necessary. We now approximate this score using the same random projection method as in the noiseless case. Create submatrices $Y_1$ and $Y_2$ by applying the random projection matrix S and computing the matrix products

$$Y_1 = SX_1 \quad \text{and} \quad Y_2 = SX_2.$$

Once again, this is equivalent to randomly keeping or discarding rows of $X_1$ and $X_2$ with retention probability p. It is important to keep or discard the same rows in both matrices, since the same (random) projection matrix must be applied to both. We then compute B, the matrix of Pearson cross-correlations between the columns of $Y_1$ and the columns of $Y_2$. We now define the approximate identifiability score, $\bar{I}$, by

$$\bar{I} = \mu(B_{\mathrm{diag}}) - \mu(B_{\mathrm{off}}),$$

where $\mu(B_{\mathrm{diag}})$ is the average of the main diagonal of B and $\mu(B_{\mathrm{off}})$ is the average of the off-diagonal entries of B.
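The two-snapshot score can be sketched as follows. This is an illustrative sketch only: the helper names, matrix sizes, and the noise standard deviation of 0.1 are assumptions rather than values taken from the paper. The key point it demonstrates is that a single Bernoulli mask is drawn once and applied to both snapshots.

```python
import numpy as np

def cross_corr(X1, X2):
    """Pearson cross-correlations: entry (i, j) compares column i of X1 with column j of X2."""
    Z1 = (X1 - X1.mean(axis=0)) / X1.std(axis=0)   # z-score each column
    Z2 = (X2 - X2.mean(axis=0)) / X2.std(axis=0)
    return Z1.T @ Z2 / X1.shape[0]

def diff_identifiability(X1, X2):
    """mu(diag) - mu(off-diag) of the cross-correlation matrix."""
    A = cross_corr(X1, X2)
    diag_mask = np.eye(A.shape[0], dtype=bool)
    return A[diag_mask].mean() - A[~diag_mask].mean()

rng = np.random.default_rng(1)                     # arbitrary seed
m, n, p = 10_000, 50, 0.05
X1 = rng.normal(size=(m, n))                       # first snapshot (toy data)
X2 = X1 + 0.1 * rng.normal(size=(m, n))            # second snapshot; noise level is an assumption

keep = rng.random(m) < p                           # ONE mask, shared by both snapshots
I_full = diff_identifiability(X1, X2)
I_approx = diff_identifiability(X1[keep], X2[keep])
print(f"full: {I_full:.4f}  subsampled: {I_approx:.4f}")
```

Drawing separate masks for the two snapshots would pair up different edges and destroy the diagonal correlations, which is why the mask is computed once and reused.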

4. SIMULATION RESULTS

The code used to perform this simulation is publicly available on GitHub at https://github.com/magsino-usna/ISBI24-brain-id-projection.

Our empirical results show that I can be preserved with high accuracy even when p is as small as 0.05. For the following experiments, we created three 10000 × 100 data matrices for X. For the first, each entry is drawn from a uniform distribution. For the second, each entry is drawn from a Gaussian distribution. For the third, a heavily correlated matrix is formed by making each column a copy of the previous column with some white noise added.

For the first experiment, we took each matrix and created 1000 random subsamples with sampling probability p = 0.05. We computed the identifiability score of each subsample and compared it with that of the original matrix by plotting a histogram of the subsampled scores and marking the identifiability score of the original matrix. The subsampled scores form a Gaussian-like distribution that appears to be centered around the identifiability of the original matrix, as seen in Figure 1 A-C. We repeated this experiment using cross-correlations of each matrix with a noise-added version of itself to simulate day-to-day noise, as described in Section 3, and obtained similar results, depicted in Figure 1 D-F.

For the second experiment, we took each matrix and created 1000 random subsamples at various sampling probabilities p. We then created a box-and-whisker plot at each probability level, marking the true identifiability score of the original matrix. We observed that the median at each level was generally very close to the true identifiability score, with the variance decreasing as the probability of edge retention increases. We then repeated this procedure for identifiability scores computed from cross-correlations of each matrix with a noise-added version of itself, simulating day-to-day changes, and saw similar results. These results are depicted in Figure 2.

To further highlight the accuracy of the approximations, we computed the root mean square error (RMSE) of the trials and plotted it in Figure 3. Even at a 1% retention rate, the RMSE indicated agreement to within two decimal places in some cases. In all three cases, the identifiability was extremely well preserved, with the averages accurate to about four decimal places. The RMSE for each type of data is very low, indicating a tight concentration of the estimates around the true score.
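A scaled-down version of the second experiment, using the correlated-columns construction, might look like the sketch below. It is not the code from the linked repository: the matrix size (2000 × 20 instead of 10000 × 100), the noise level, the number of trials, and the probability grid are assumptions chosen so the example runs in a moment.

```python
import numpy as np

def identifiability(X):
    """I = 1 - mean off-diagonal Pearson correlation between columns of X."""
    A = np.corrcoef(X, rowvar=False)
    return 1.0 - A[~np.eye(A.shape[0], dtype=bool)].mean()

def correlated_matrix(m, n, noise_sd, rng):
    """Each column is a copy of the previous column plus white noise."""
    X = np.empty((m, n))
    X[:, 0] = rng.normal(size=m)
    for j in range(1, n):
        X[:, j] = X[:, j - 1] + noise_sd * rng.normal(size=m)
    return X

def rmse_at(X, p, trials, rng):
    """RMSE of subsampled identifiability scores against the full-matrix score."""
    truth = identifiability(X)
    estimates = np.array([
        identifiability(X[rng.random(X.shape[0]) < p])   # fresh Bernoulli(p) mask per trial
        for _ in range(trials)
    ])
    return np.sqrt(np.mean((estimates - truth) ** 2))

rng = np.random.default_rng(2)                            # arbitrary seed
X = correlated_matrix(m=2_000, n=20, noise_sd=0.5, rng=rng)
probs = [0.05, 0.25, 0.75]                                # assumed probability grid
rmse = {p: rmse_at(X, p, trials=200, rng=rng) for p in probs}
for p in probs:
    print(f"p = {p:.2f}  RMSE = {rmse[p]:.5f}")
```

Under these assumptions the RMSE should shrink as p grows, mirroring the trend described for Figure 3.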

Fig. 1.

Approximated identifiability of subsampled functional connectivity data. In all experiments, 1000 subsamples are generated by keeping rows with probability 0.05. A-C simulate data with each row as a “connection” and each column as a “patient”. D-F simulate day-to-day noise by adding noise to the generated data. G uses resting-state data from the HCP dataset with the Glasser parcellation.

Fig. 2.

Approximate identifiability scores from functional connectivity data subsampled with various probabilities of keeping rows. All experiments draw 1000 subsamples for each probability. A-C simulate data with each row as a “connection” and each column as a “patient”. D-F simulate day-to-day noise by adding noise to the generated data. G uses resting-state data from the HCP dataset with the Glasser parcellation.

Fig. 3.

Root mean square error plot of approximate identifiability scores from subsampled functional connectivity data. All experiments draw 1000 subsamples for each probability.

5. BRAIN NETWORK IDENTIFIABILITY PRESERVATION THROUGH RANDOM PROJECTION

5.1. Background

Problem Formulation.

Let I be the “identifiability matrix” as defined in [3] (see also Fig. 1). The dimension of I is N×N, where N is the number of subjects. The ID score, which quantifies the differential identifiability ($\mathrm{ID} = I_{\mathrm{diff}}$), is defined, based on Equation (1), as follows:

$$\mathrm{ID} = I_{\mathrm{diff}} = I_{\mathrm{self}} - I_{\mathrm{others}} = \mu(I_{\mathrm{diag}}) - \mu(I_{\mathrm{off}}) \tag{2}$$

where $I_{\mathrm{self}} = \mu(I_{\mathrm{diag}})$ represents the average of the main diagonal elements of I (e.g., within-subject variability - the Pearson correlation values between different visits of the same subjects), and $I_{\mathrm{others}} = \mu(I_{\mathrm{off}})$ is the average of the off-diagonal elements of matrix I (e.g., between-subject variability - the correlation between different visits of different subjects).
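As a small worked illustration of the definition above, consider a hypothetical identifiability matrix for N = 3 subjects. The numerical entries are invented for the example and do not come from any dataset.

$$
I = \begin{pmatrix} 0.9 & 0.3 & 0.2 \\ 0.4 & 0.8 & 0.1 \\ 0.3 & 0.2 & 0.7 \end{pmatrix},
\qquad
I_{\mathrm{self}} = \frac{0.9 + 0.8 + 0.7}{3} = 0.8,
\qquad
I_{\mathrm{others}} = \frac{0.3 + 0.2 + 0.4 + 0.1 + 0.3 + 0.2}{6} = 0.25,
$$

$$
\mathrm{ID} = I_{\mathrm{self}} - I_{\mathrm{others}} = 0.8 - 0.25 = 0.55.
$$

A larger gap between the diagonal and off-diagonal averages means that repeat visits of the same subject resemble each other more than visits of different subjects do.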

HCP Functional Data Processing and FC construction.

The fMRI data from the 100 unrelated subjects in the HCP Q3 release (http://www.humanconnectome.org/) were employed in this study. The two resting-state functional MRI acquisitions (HCP filenames: rfMRI REST1) were acquired in separate sessions on two different days, with two distinct scanning patterns (left to right and right to left) on each day. This release also includes fMRI data from seven different fMRI tasks, which are outside the scope of this work. After download, the fMRI data were processed and FCs were constructed using the parcellation from Glasser and colleagues, which contains 360 cortical brain regions [8]; see [3, 9] for further details.
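To connect the FC construction with the edge-by-subject matrix X used in Section 2, the following sketch vectorizes the strict upper triangle of each 360 × 360 FC into one column. It is an illustration under assumptions: random time series stand in for parcellated BOLD signals, and the number of time points, the number of subjects, and the helper name `fc_to_edge_vector` are hypothetical.

```python
import numpy as np

def fc_to_edge_vector(fc):
    """Vectorize the strict upper triangle of a symmetric FC matrix (one value per region pair)."""
    iu = np.triu_indices(fc.shape[0], k=1)   # k=1 skips the main diagonal
    return fc[iu]

rng = np.random.default_rng(3)                # arbitrary seed
n_regions = 360                               # Glasser parcellation size stated in the text
n_timepoints, n_subjects = 400, 4             # toy sizes, assumed for the sketch

columns = []
for _ in range(n_subjects):
    ts = rng.normal(size=(n_timepoints, n_regions))   # stand-in for parcellated time series
    fc = np.corrcoef(ts, rowvar=False)                # 360 x 360 Pearson FC
    columns.append(fc_to_edge_vector(fc))

X = np.column_stack(columns)                  # rows: functional edges, columns: subjects
print(X.shape)                                # (64620, 4), since 360 * 359 / 2 = 64620
```

Each column of X then holds the 64,620 unique functional edges of one subject's connectome, which is the object the random row subsampling acts on.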

5.2. Results

In Figure 1-G, 1000 simulations were performed at an edge retention probability of p = 0.05 for the resting condition. The empirical result indicates that with as little as 5% retention of the connectome, the identifiability score does not vary significantly compared with the full projection on the functional edge space, which lives in a $\binom{360}{2}$-dimensional space. In Figure 2-G, we observed that, as expected, the degree of variation and the approximation error reduced significantly with increasing edge retention probability. By randomly sampling half (p = 0.5) of the functional edges of the Glasser-parcellated cortex, we were able to achieve a practically similar identifiability score compared with the full approach (i.e., using all functional edges) in [3].

6. DISCUSSION AND FUTURE WORK

As the neuroscientific community marches towards the era of big neurodata, a critical quest has emerged: associating large-scale brain connectivity features with genetic and behavioral/cognitive measures. To this end, unraveling more robust fingerprinting features in the connectivity domain (both functional and structural) becomes the next major mission in brain connectomics. Recent advancements in fMRI technology have enabled subject-level brain connectivity maps of unprecedented detail, allowing researchers to make tremendous progress towards the clinical utility of functional connectomes [7]. Here, we show that although it might be appealing to sample the entire parcellated cortex to extract robust subject differentiability biomarkers, a random sampling approach can deliver a practically indistinguishable outcome without significant compromises. Furthermore, random sampling speeds up FC processing (a computationally demanding job) and subsequent analyses (e.g., quantifying subject identifiability). This work paves the way for random matrix computational research to be applied to various challenges in the clinical utility of brain connectomics (e.g., functional connectomes). In future studies, random projections should be investigated in different fMRI task conditions [3, 5, 9] and in the context of other known challenges in FC clinical utility, such as scanning length [7], cognitive measure associations [9, 10], neurodegenerative diseases such as Alzheimer’s disease [11], and parcellation choices [11]. Future studies should also analyze the trade-off between computational cost and the quality of the associated biomarkers. In terms of broader impact, it is worth noting that our framework can be generalized to other similarity measures among snapshot instances of networked systems.

7. COMPLIANCE WITH ETHICAL STANDARDS

This study was performed in line with the principles of the Declaration of Helsinki. Approval was granted by the Ethics Committee of our institutions.

ACKNOWLEDGMENTS

This work was supported in part by the NIH grant RF1 AG068191. Data were obtained from the Human Connectome Project WU-Minn Consortium.

Footnotes

1

FC: a large-scale weighted brain connectivity matrix whose entries represent the functional couplings between pairs of brain regions of interest (ROIs). Functional coupling is typically measured by Pearson’s correlation.

9. REFERENCES

  • [1] Gratton Caterina, Laumann Timothy O, Nielsen Ashley N, Greene Deanna J, Gordon Evan M, Gilmore Adrian W, Nelson Steven M, Coalson Rebecca S, Snyder Abraham Z, Schlaggar Bradley L, et al., “Functional brain networks are dominated by stable group and individual factors, not cognitive or daily variation,” Neuron, vol. 98, no. 2, pp. 439–452, 2018.
  • [2] Seitzman Benjamin A, Gratton Caterina, Laumann Timothy O, Gordon Evan M, Adeyemo Babatunde, Dworetsky Ally, Kraus Brian T, Gilmore Adrian W, Berg Jeffrey J, Ortega Mario, et al., “Trait-like variants in human functional brain networks,” Proceedings of the National Academy of Sciences, vol. 116, no. 45, pp. 22851–22861, 2019.
  • [3] Amico Enrico and Goñi Joaquín, “The quest for identifiability in human functional connectomes,” Scientific Reports, vol. 8, no. 1, pp. 8254, 2018.
  • [4] Finn Emily S, Shen Xilin, Scheinost Dustin, Rosenberg Monica D, Huang Jessica, Chun Marvin M, Papademetris Xenophon, and Constable R Todd, “Functional connectome fingerprinting: identifying individuals using patterns of brain connectivity,” Nature Neuroscience, vol. 18, no. 11, pp. 1664–1671, 2015.
  • [5] Abbas Kausar, Amico Enrico, Svaldi Diana Otero, Tipnis Uttara, Duong-Tran Duy Anh, Liu Mintao, Rajapandian Meenusree, Harezlak Jaroslaw, Ances Beau M, and Goñi Joaquín, “Geff: Graph embedding for functional fingerprinting,” NeuroImage, vol. 221, pp. 117181, 2020.
  • [6] Chiêm Benjamin, Abbas Kausar, Amico Enrico, Duong-Tran Duy Anh, Crevecoeur Frédéric, and Goñi Joaquín, “Improving functional connectome fingerprinting with degree-normalization,” Brain Connectivity, vol. 12, no. 2, pp. 180–192, 2022.
  • [7] Svaldi Diana O, Goñi Joaquín, Abbas Kausar, Amico Enrico, Clark David G, Muralidharan Charanya, Dzemidzic Mario, West John D, Risacher Shannon L, Saykin Andrew J, et al., “Optimizing differential identifiability improves connectome predictive modeling of cognitive deficits from functional connectivity in alzheimer’s disease,” Human Brain Mapping, vol. 42, no. 11, pp. 3500–3516, 2021.
  • [8] Glasser Matthew F, Coalson Timothy S, Robinson Emma C, Hacker Carl D, Harwell John, Yacoub Essa, Ugurbil Kamil, Andersson Jesper, Beckmann Christian F, Jenkinson Mark, et al., “A multi-modal parcellation of human cerebral cortex,” Nature, vol. 536, no. 7615, pp. 171–178, 2016.
  • [9] Duong-Tran Duy, Abbas Kausar, Amico Enrico, Corominas-Murtra Bernat, Dzemidzic Mario, Kareken David, Ventresca Mario, and Goñi Joaquín, “A morphospace of functional configuration to assess configural breadth based on brain functional networks,” Network Neuroscience, vol. 5, no. 3, pp. 666–688, 2021.
  • [10] Garai Sumita, Xu Frederick, Duong-Tran Duy Anh, Zhao Yize, and Shen Li, “Mining correlation between fluid intelligence and whole-brain large scale structural connectivity,” AMIA Summits on Translational Science Proceedings, vol. 2023, pp. 225, 2023.
  • [11] Xu Frederick, Garai Sumita, Duong-Tran Duy, Saykin Andrew J, Zhao Yize, and Shen Li, “Consistency of graph theoretical measurements of alzheimer’s disease fiber density connectomes across multiple parcellation scales,” in 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 2022, pp. 1323–1328.
