Author manuscript; available in PMC: 2025 Aug 22.
Published in final edited form as: Proc IEEE Int Symp Biomed Imaging, 2025 May 12. doi: 10.1109/isbi60581.2025.10980709

TV-BASED DEEP 3D SELF SUPER-RESOLUTION FOR FMRI

Fernando Pérez-Bueno a,*, Hongwei B Li b, Matthew S Rosen b, Shahin Nasr b, César Caballero-Gaudes a,c,**, Juan E Iglesias b,e,d,**
PMCID: PMC12370177  NIHMSID: NIHMS2104447  PMID: 40852640

Abstract

While functional Magnetic Resonance Imaging (fMRI) offers valuable insights into cognitive processes, its inherent spatial limitations pose challenges for the detailed analysis of the fine-grained functional architecture of the brain. More specifically, MRI scanner and sequence specifications impose a trade-off between temporal resolution, spatial resolution, signal-to-noise ratio, and scan time. Deep Learning (DL) Super-Resolution (SR) methods have emerged as a promising solution to enhance fMRI resolution, generating high-resolution (HR) images from low-resolution (LR) images typically acquired with shorter scan times. However, most existing SR approaches depend on supervised DL techniques, which require ground-truth (GT) HR data for training; such data are often difficult to acquire and simultaneously bound how far SR can go. In this paper, we introduce a novel self-supervised DL SR model that combines a DL network with an analytical approach and Total Variation (TV) regularization. Our method eliminates the need for external GT images, achieves performance competitive with supervised DL techniques, and preserves the functional maps.

Keywords: fMRI, Super Resolution, Self-Supervised, Deep Learning, Total Variation

1. INTRODUCTION

fMRI provides a non-invasive window for observing brain function in vivo. However, the spatial resolution of fMRI scans is significantly lower than that of other MRI approaches, as it is limited by the trade-off between scan time, temporal resolution, and signal-to-noise ratio [1]. Ultra-high-field scanners enable imaging at the mesoscale organization of the brain, yet acquiring higher spatial resolution remains challenging due to the increased scan times required to compensate for the low signal- and contrast-to-noise ratios. Specifically, limits in spatial resolution hamper the ability to study fine-scale neural processes involved in sensory and cognitive processing, e.g., at the level of cortical layers or small subcortical regions.

Super Resolution (SR) methods, especially Deep Learning (DL) based, have arisen as a possible solution to circumvent the limitations of fMRI [2]. However, most recent SR approaches for fMRI use a supervised approach that relies on pairs of LR and HR images [3]. While this approach can effectively yield SR fMRI images, it has two major drawbacks: (i) It requires access to adequate GT images for supervised learning, and (ii) it cannot surpass current fMRI limitations, as the resolution is limited to that of the available GT images.

This paper presents a novel approach to overcoming these limitations through the development of a self-supervised 3D DL SR method specifically designed for fMRI data. Our model assumes a degradation model and a Total Variation (TV) prior for the HR images and trains a convolutional neural network (CNN) using only the observed data. Our method does not require HR GT for training and can double the resolution of the input.

1.1. Related work

Very few works have explored SR for fMRI and its impact on functional maps. One of the pioneering works was the 2D patch-based approach in [4], which combined an adapted fMRI acquisition with in-plane model-based SR and Huber regularization. This work, and its extension in [5], facilitated the acquisition of fMRI by using SR to reduce the slice thickness.

The lack of matching fMRI training data, which should have the same contrast, field of view, sequence, and even possible pathology as the test data, makes it difficult to adapt supervised SR methods to every need. This domain-shift problem motivated the work in [6], which used a Generative Adversarial Network (GAN) to produce synthetic HR fMRI with T2*-weighted HR images as a reference, and the work in [2], which trained a 3D CNN on resting-state (RS) images and tested it on visual tasks.

SR methods are explored more widely in other MRI modalities. The supervised approach with paired LR (synthetic) and HR training data is the most common and has been explored using diverse methods such as compressive sensing [7], random forests [8], or non-parametric patch-based techniques [9]. Supervised CNNs have been applied to MRI SR in combination with spline interpolation [10] or nearest-neighbor upsampling [11]. The different resolutions between planes in anisotropic MRI have also been exploited as a source of self-similarity [12, 13] to reconstruct thick-slice MRI.

However, despite recent advances in DL, single-image analytical interpolation methods are still commonly used in MRI [13]. They combine an observation model for the LR image with regularization terms for the HR image, such as TV [14, 15] or low-rank constraints [15], but require a new optimization for each image.

2. METHOD

We propose a DL SR CNN trained with a self-supervised loss that builds on the classical analytical approach [16]. Let y denote the vectorized (unavailable) HR 3D image and x the vectorized (observed) LR image.

The relation between the HR and LR images is defined by x = DHy + n = By + n, where n is assumed to be independent Gaussian noise and B = DH is a downsampling operator that combines decimation (D) and blurring (H). The observation model can then be expressed using a normal distribution:

p(x | y; λ²) = 𝒩(x | By, λ²I)    (1)

where λ² is the noise variance. Although the actual downsampling operator is unknown, we assume it is linear, a common choice in SR [2, 10].
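As a concrete illustration of the degradation model above, the following sketch implements a linear operator B as a separable moving-average blur followed by decimation. The box blur and integer factor are illustrative assumptions; the paper only assumes B is linear, not this particular form.

```python
import numpy as np

def degrade(y, factor=2):
    """Sketch of x = By: a separable moving-average blur (H) applied along
    each axis, followed by decimation (D). Illustrative choice only; the
    paper assumes B = DH is linear but otherwise unknown."""
    kernel = np.ones(factor) / factor
    x = y.astype(float)
    for ax in range(3):
        # moving-average blur of width `factor` along this axis
        x = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="same"), ax, x)
    return x[::factor, ::factor, ::factor]  # keep every factor-th voxel

# a toy 8^3 "HR" volume degraded to 4^3
y = np.random.rand(8, 8, 8)
x = degrade(y, factor=2)
```

Because blur and decimation are both linear maps, degrade(a·y₁ + b·y₂) equals a·degrade(y₁) + b·degrade(y₂), which is exactly the linearity assumption made in the text.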

A TV prior is placed on the unknown HR image y to promote a sharp image. This prior is a common choice in analytical approaches [14, 15]: it is easy to compute and has good noise-reduction properties while preserving edges in the image. Furthermore, it has been shown to work well in combination with deep priors [17]. We use p(y | α) ∝ exp[−α TV(y)], with α > 0 controlling the image smoothness. The TV function is the isotropic 3D norm of the first-order differences along each axis at each image voxel.
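The isotropic 3D TV described above can be sketched directly with NumPy forward differences; handling of the far boundary (here, replication, so the boundary difference is zero) is an implementation assumption not specified in the text.

```python
import numpy as np

def tv3d(y):
    """Isotropic 3D total variation: at each voxel, the Euclidean norm of
    the forward first-order differences along x, y and z, summed over the
    volume. Boundary differences are zero via last-slice replication."""
    g = np.zeros(y.shape, dtype=float)
    for ax in range(3):
        last = np.take(y, [-1], axis=ax)            # replicate last slice
        g += np.diff(y, axis=ax, append=last) ** 2  # squared diff per axis
    return np.sqrt(g).sum()
```

A constant volume has zero TV, while a single axial step of height 1 across a 4×4 slice contributes exactly one unit of gradient magnitude per crossing voxel.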

2.1. Deep inference

Instead of classical variational inference, which requires bounding the TV prior and re-training the model for each new image [18], we use a combination of analytical and DL modeling [16], where a CNN fθ(·) predicts a MAP estimate of y (see Fig. 1).

Fig. 1. Overview of the proposed self-SR model. The observed LR image is fed to the SR network to produce an HR estimate. During training, the HR output is used to compute the TV regularization and is downsampled for comparison with the observed input.

This choice simplifies the Evidence Lower Bound (ELBO) into a classical CNN loss function with fidelity and regularization terms [16]. The noise in the observation model is assumed constant across the dataset; hence, the two terms are weighted by the confidence parameter α of the TV prior:

ℒ(θ) = ‖x − Bfθ(x)‖₂² + α TV(fθ(x))    (2)

The first term (“fidelity”) encourages the SR network fθ to produce an output that, when downsampled, is as close as possible to the observed LR image. The second term (“TV regularizer”) promotes a sharp HR image where the edges are preserved.

This leads to a self-supervised model that requires no HR GT image for training. When α = 0, it reduces to an unconstrained Deep Image Prior (DIP) approach [19], where the only requirement on the estimated HR image is that it be the output of a CNN.
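The self-supervised loss in Eq. (2) can be sketched as follows, with plain decimation standing in for the unknown linear operator B (an assumption for illustration; in the actual pipeline, y_hat would be the network output fθ(x)).

```python
import numpy as np

def tv3d(y):
    # isotropic 3D TV with zero boundary differences (last-slice replication)
    g = np.zeros(y.shape, dtype=float)
    for ax in range(3):
        g += np.diff(y, axis=ax, append=np.take(y, [-1], axis=ax)) ** 2
    return np.sqrt(g).sum()

def self_sr_loss(x, y_hat, alpha=0.01, factor=2):
    """Eq. (2): squared-error fidelity between the observed LR volume x and
    the downsampled HR estimate y_hat, plus alpha * TV(y_hat). Decimation
    stands in here for the unknown linear operator B."""
    fidelity = np.sum((x - y_hat[::factor, ::factor, ::factor]) ** 2)
    return fidelity + alpha * tv3d(y_hat)

# toy check: nearest-neighbour upsampling of x is consistent with decimation,
# so with alpha = 0 the loss vanishes (the unconstrained DIP limit)
x = np.random.rand(4, 4, 4)
y_hat = x.repeat(2, axis=0).repeat(2, axis=1).repeat(2, axis=2)
```

With α = 0 only the fidelity term remains, matching the DIP limit described above; for α > 0 the TV term penalizes noisy, non-sharp HR estimates.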

2.2. Network architecture and training details

Following [2], we use trilinear interpolation followed by a fully convolutional 3D CNN with ten dense residual layers, kernel size 3, and zero padding. Interpolating first lets us use the same architecture irrespective of the upscaling factor. The convolutional layers and residual connections learn local features that recover fine details in the upscaled image. We use the Adam optimizer with an initial learning rate of 10⁻³, halved on plateau with a patience of 5 epochs.
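The interpolate-then-refine design can be sketched in PyTorch as below. This is a simplification: plain residual convolutions stand in for the dense residual layers of [10], and the channel width and depth are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfSRNet(nn.Module):
    """Sketch of the SR network: trilinear upsampling followed by a stack
    of 3x3x3 residual convolutions with zero padding. Simplified stand-in
    for the dense residual architecture of [10]."""
    def __init__(self, factor=2.0, channels=32, depth=10):
        super().__init__()
        self.factor = factor
        self.head = nn.Conv3d(1, channels, 3, padding=1)
        self.body = nn.ModuleList(
            nn.Conv3d(channels, channels, 3, padding=1) for _ in range(depth))
        self.tail = nn.Conv3d(channels, 1, 3, padding=1)

    def forward(self, x):
        # interpolate first, so the same weights serve any upscaling factor
        x = F.interpolate(x, scale_factor=self.factor,
                          mode="trilinear", align_corners=False)
        h = F.relu(self.head(x))
        for conv in self.body:
            h = h + F.relu(conv(h))  # residual connections
        return x + self.tail(h)      # residual over the interpolation

net = SelfSRNet(factor=2.0, channels=8, depth=2)
hr = net(torch.zeros(1, 1, 8, 8, 8))  # LR volume as (batch, channel, D, H, W)
```

Training would pair this with torch.optim.Adam (lr = 1e-3) and torch.optim.lr_scheduler.ReduceLROnPlateau (factor = 0.5, patience = 5) to match the schedule described above.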

We train the SR network for a fixed upsampling factor f ∈ {1.25, 1.5, 1.75, 2}. The network can be trained using a single image (non-amortized) or with a dataset of images (amortized). While single-image training might offer higher structural fidelity, the amortized approach significantly reduces the test-time computational cost and can improve SR performance [10]. Therefore, our network is trained with random images from the time series of as many subjects and runs as are available in a given dataset.

3. EXPERIMENTS AND RESULTS

3.1. Data: Gorgolewski Resting State dataset

This publicly available dataset [20] contains whole-brain T2*-weighted fMRI scans acquired at 7T with a gradient-recalled-echo echo planar imaging sequence during 15 min while subjects remained at rest with eyes open (FOV = 192 × 192 mm² (R-L; A-P), matrix size = 128 × 128, 70 axial oblique slices with 1.5 mm isotropic voxel size, TR = 3.0 s, TE = 17 ms, FA = 70°, partial Fourier 6/8, GRAPPA = 3 with 36 reference lines). For further details, please see [20]. We used data from 12 participants, avoiding those with reported acquisition issues. The first eight subjects were used for training and the other four for validation.

3.2. Image quality comparison

We compare our proposed method with standard interpolation approaches (trilinear, nearest neighbor, and cubic spline, implemented in nilearn) and the following DL-SR methods: the resolution-agnostic method proposed by Li et al. [2] (Agnostic-Li), the fixed-factor supervised version of the same method (Super-Li), our base network [10] using supervised training (Super-MSE), and the unconstrained DIP approach [19]. For this comparison, we generated synthetic LR observations that were then upsampled to the resolution of the original images (1.5 mm). The value of the hyperparameter α in Eq. (2) was experimentally set to 0.01 after a grid search in the interval [0, 0.1]. Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) were calculated between the original (GT) and reconstructed images using validation data.
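For reference, the PSNR reported in Tables 1 and 2 follows the standard definition below; the default data range taken from the GT volume is an assumption, since the paper does not state the intensity normalization used. SSIM can be computed analogously with skimage.metrics.structural_similarity.

```python
import numpy as np

def psnr(gt, rec, data_range=None):
    """Peak Signal-to-Noise Ratio in dB between a GT and a reconstructed
    volume. By default the data range is estimated from the GT volume."""
    gt = gt.astype(float)
    rec = rec.astype(float)
    if data_range is None:
        data_range = gt.max() - gt.min()
    mse = np.mean((gt - rec) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)
```

For example, a constant error of 0.1 on data in [0, 1] gives an MSE of 0.01 and hence a PSNR of 20 dB.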

Table 1 shows the comparison with the interpolation methods, where only the validation data is used and no GT is required. Table 2 reports the results of the DL approaches, where the methods are trained and validated on separate subjects. The training subset is the same for all methods, but the supervised approaches have access to the training GT, while DIP and the proposed method only see the LR observations. The proposed method outperforms the interpolation methods and is competitive with the supervised DL approaches despite training only on LR images. It also outperforms the unconstrained DIP approach, showing that the addition of the TV prior can improve performance by up to 1.28 dB for an upscaling factor of ×1.25. The larger training set used in Table 2 benefits the proposed method, which performs better than when trained only with the validation set.

Table 1.

Comparison with interpolation methods. No GT is required; all methods run on the validation set only.

Input (mm) | Factor | Trilinear (PSNR / SSIM) | Nearest (PSNR / SSIM) | 3D-spline (PSNR / SSIM) | Proposed (PSNR / SSIM)
1.875 | ×1.25 | 78.16 / 0.9446 | 74.92 / 0.9351 | 81.01 / 0.9571 | 83.93 / 0.9673
2.25 | ×1.5 | 75.07 / 0.9265 | 70.68 / 0.9077 | 75.57 / 0.9334 | 79.76 / 0.9497
2.62 | ×1.75 | 73.54 / 0.9169 | 66.90 / 0.8712 | 71.60 / 0.9073 | 76.50 / 0.9365
3 | ×2 | 69.59 / 0.8855 | 68.47 / 0.8874 | 67.42 / 0.8729 | 73.53 / 0.9218

Table 2.

Comparison with DL methods. Separate training/validation sets. Target resolution: 1.5 mm.

Input (mm) | Factor | Agnostic-Li (PSNR / SSIM) | Super-Li (PSNR / SSIM) | Super-MSE (PSNR / SSIM) | DIP (PSNR / SSIM) | Proposed (PSNR / SSIM)
1.875 | ×1.25 | 84.09 / 0.9707 | 85.91 / 0.9733 | 86.21 / 0.9731 | 84.54 / 0.9648 | 85.82 / 0.9713
2.25 | ×1.5 | 81.32 / 0.9590 | 80.46 / 0.9572 | 82.46 / 0.9607 | 79.85 / 0.9503 | 80.04 / 0.9546
2.62 | ×1.75 | 77.43 / 0.9426 | 78.58 / 0.9468 | 78.66 / 0.9460 | 75.74 / 0.9273 | 76.72 / 0.9368
3 | ×2 | 73.78 / 0.9204 | 76.32 / 0.9375 | 76.27 / 0.9332 | 73.17 / 0.9104 | 73.45 / 0.9192

3.3. Functional analysis

In this section, we evaluate the effect of the proposed SR algorithm on seed-based correlation analyses to assess functional connectivity in single datasets. A minimal preprocessing pipeline was applied to a single subject's original and reconstructed images in the testing set, including volume realignment; nuisance regression of up to 4th-order Legendre polynomials, motion parameters, and their derivatives; band-pass filtering between 0.005 and 2 Hz; censoring of scans with excessive motion (Euclidean norm of motion derivatives ≥ 0.3); and spatial smoothing with a 3 mm FWHM Gaussian kernel¹. Seed correlation maps were calculated with AFNI's @InstaCorr plugin using a seed of 3 mm radius.

The original and reconstructed brain masks from 3dAutomask had a Jaccard similarity index of 0.9769 and 0.9684 for factors ×1.25 and ×2, respectively. Figure 2 displays the corresponding seed correlation maps from a seed located in the precentral gyrus of the primary motor cortex. Both SR-reconstructed maps exhibit high spatial similarity, although the larger upsampling factor (×2) produces a slightly more blurred map. The edge-preserving effect of the TV prior is most visible in the unthresholded maps shown in the bottom row of Figure 2: even as the factor increases, the edges of the temporally correlated clusters remain clearly delimited.
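The Jaccard index used above to compare the brain masks is simply intersection over union of the two binary masks:

```python
import numpy as np

def jaccard(a, b):
    """Jaccard similarity of two binary masks: |A ∩ B| / |A ∪ B|.
    Empty masks are treated as identical (similarity 1)."""
    a = a.astype(bool)
    b = b.astype(bool)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum()) / union if union else 1.0
```

A value of 1 means the original and reconstructed masks coincide exactly; the reported values of 0.9769 and 0.9684 indicate near-complete overlap.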

Fig. 2. Single-subject seed correlation maps. Top: thresholded maps (r ≥ 0.5) showing a clear sensorimotor-network pattern with clusters in bilateral motor and somatosensory cortices and supplementary motor areas. Bottom: unthresholded maps. (a) Real observed image at 1.5 mm. Super-resolved image at 1.5 mm from an input of isotropic voxel size (b) 1.875 mm (f = 1.25) and (c) 3 mm (f = 2).

For a quantitative comparison, we binarize the thresholded functional maps and treat those obtained from the original image as GT. The accuracy and false discovery rate (FDR) are presented in Table 3. The resulting maps achieve an average accuracy of 0.9694 and 0.9129 and a false discovery rate of 0.0064% and 0.0120% for factors ×1.25 and ×2, respectively. This demonstrates the ability of the proposed method to preserve the functional analysis.
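The accuracy and FDR reported above can be computed from the binarized maps as follows (a sketch of the standard definitions; the paper does not give its exact implementation):

```python
import numpy as np

def map_agreement(gt_map, rec_map):
    """Accuracy and false discovery rate of a binarized functional map
    against the map from the original image (treated as GT)."""
    gt = gt_map.astype(bool).ravel()
    rec = rec_map.astype(bool).ravel()
    tp = np.sum(rec & gt)    # voxels active in both maps
    fp = np.sum(rec & ~gt)   # voxels active only in the reconstruction
    acc = float(np.mean(rec == gt))
    fdr = float(fp) / (tp + fp) if (tp + fp) else 0.0
    return acc, fdr
```

Accuracy counts all voxels where the two binary maps agree, while FDR measures the fraction of reconstructed-map activations absent from the original map.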

Table 3.

Accuracy and FDR (%) for the functional maps obtained from the reconstructed images.

RS Network | f = ×1.25 (Acc. / FDR) | f = ×2 (Acc. / FDR)
Sensory Motor | 0.9690 / 0.0099 | 0.9598 / 0.0128
Default Mode | 0.9522 / 0.0055 | 0.7916 / 0.0209
Visual | 0.9869 / 0.0038 | 0.9873 / 0.0024

4. DISCUSSION AND CONCLUSION

This study introduces a novel method for 3D self super-resolution of fMRI images that integrates DL with a TV prior and does not require HR GT data for training. Our approach demonstrates competitive performance compared with supervised DL methods, effectively enhancing the spatial resolution of fMRI data while preserving the integrity of RS functional analyses. The TV prior enhances the output of the CNN, encouraging the super-resolved images to retain sharp edges while reducing noise. These results suggest that our method offers a promising alternative for enhancing fMRI spatial resolution without compromising functional information.

5. COMPLIANCE WITH ETHICAL STANDARDS

This research study was conducted retrospectively using human subject data made available in open access [20]. Ethical approval was not required as confirmed by the license attached with the data.

6. ACKNOWLEDGMENTS

This research is supported by JDC2022-048784-I, funded by MCIN/AEI/10.13039/501100011033 and the European Union "NextGenerationEU"/PRTR, by the Basque Government (BERC 2022-2025 program), by the Spanish State Research Agency (BCBL Severo Ochoa excellence accreditation CEX2020-001010/AEI/10.13039/501100011033), and by NIH (RF1MH123195, R01AG070988, UM1MH130981, RF1AG080371).

Footnotes

¹ AFNI commands are included in the available code.

The code will be made available at https://github.com/zalteck

7. REFERENCES

  • [1] Vu A et al., "Tradeoffs in pushing the spatial resolution of fMRI for the 7T Human Connectome Project," NeuroImage, vol. 154, pp. 23–32, 2017.
  • [2] Li H et al., "Resolution- and Stimulus-agnostic Super-Resolution of Ultra-High-Field fMRI: Application to Visual Studies," in IEEE Int Symp Biomed Imaging (ISBI), 2024.
  • [3] Lugmayr A et al., "Unsupervised Learning for Real-World Super-Resolution," in IEEE Int Conf Comput Vis (ICCV), 2019, pp. 3408–3416.
  • [4] Kornprobst P et al., "A Superresolution Framework for fMRI Sequences and Its Impact on Resulting Activation Maps," in Med Image Comput Comput Assist Interv (MICCAI), 2003, pp. 117–125.
  • [5] Peeters R et al., "The use of super-resolution techniques to reduce slice thickness in fMRI," Int J Imaging Syst Technol, vol. 14, pp. 131–138, 2004.
  • [6] Ota J et al., "Super-resolution generative adversarial networks with static T2*-WI-based subject-specific learning to improve spatial difference sensitivity in fMRI activation," Sci Rep, vol. 12, 2022.
  • [7] Li Y et al., "Super-Resolution of Brain MRI Images Using Overcomplete Dictionaries and Nonlocal Similarity," IEEE Access, vol. 7, pp. 25897–25907, 2019.
  • [8] Jog A et al., "Improving magnetic resonance resolution with supervised learning," in IEEE Int Symp Biomed Imaging (ISBI), 2014, pp. 987–990.
  • [9] Rousseau F et al., "A supervised patch-based image reconstruction technique: Application to brain MRI super-resolution," in IEEE Int Symp Biomed Imaging (ISBI), 2013, pp. 346–349.
  • [10] Pham C et al., "Multiscale brain MRI super-resolution using deep 3D convolutional networks," Comput Med Imaging Graph, vol. 77, 101647, 2019.
  • [11] Shi J et al., "MR Image Super-Resolution via Wide Residual Networks With Fixed Skip Connection," IEEE J Biomed Health Inform, vol. 23, pp. 1129–1140, 2019.
  • [12] Zhao C et al., "Self super-resolution for magnetic resonance images using deep networks," in IEEE Int Symp Biomed Imaging (ISBI), 2018, pp. 365–368.
  • [13] Zhao C et al., "SMORE: A Self-Supervised Anti-Aliasing and Super-Resolution Algorithm for MRI Using Deep Learning," IEEE Trans Med Imaging, vol. 40, pp. 805–817, 2021.
  • [14] Tourbier S et al., "An efficient total variation algorithm for super-resolution in fetal brain MRI with adaptive regularization," NeuroImage, vol. 118, pp. 584–597, 2015.
  • [15] Shi F et al., "LRTV: MR Image Super-Resolution With Low-Rank and Total Variation Regularizations," IEEE Trans Med Imaging, vol. 34, pp. 2459–2466, 2015.
  • [16] Yang S et al., "BCD-Net: Stain separation of histological images using deep variational Bayesian blind color deconvolution," Digit Signal Process, vol. 145, 104318, 2024.
  • [17] Ren D et al., "Neural Blind Deconvolution Using Deep Priors," in IEEE Conf Comput Vis Pattern Recognit (CVPR), 2020, pp. 3338–3347.
  • [18] Pérez-Bueno F et al., "A TV-based image processing framework for blind color deconvolution and classification of histological images," Digit Signal Process, vol. 101, 102727, 2020.
  • [19] Ulyanov D et al., "Deep Image Prior," Int J Comput Vis, vol. 128, pp. 1867–1888, 2020.
  • [20] Gorgolewski C et al., "A high resolution 7-T resting-state fMRI test-retest dataset with cognitive and physiological measures," OpenNeuro, 2019.
