Abstract
Data-hungry neuro-AI modelling requires ever larger neuroimaging datasets. CNeuroMod-THINGS meets this need by capturing neural representations for a wide set of semantic concepts using well-characterized images in a new densely-sampled, large-scale fMRI dataset. Importantly, CNeuroMod-THINGS exploits synergies between two existing projects: the THINGS initiative (THINGS) and the Courtois Project on Neural Modelling (CNeuroMod). THINGS has developed a common set of thoroughly annotated images broadly sampling natural and man-made objects which is used to acquire a growing collection of multimodal neural responses. Meanwhile, CNeuroMod is acquiring hundreds of hours of fMRI data from a core set of participants during controlled and naturalistic tasks, including visual tasks like movie watching and videogame playing. For CNeuroMod-THINGS, four CNeuroMod participants each completed 33–36 sessions of a continuous recognition paradigm using 4320 images from the THINGS stimulus set spanning 720 categories. We report behavioural and neuroimaging metrics that showcase the quality of the data. By bridging together large existing resources, CNeuroMod-THINGS expands our capacity to model human vision in controlled and naturalistic settings.
Subject terms: Neural encoding, Object vision, Perception
Background & Summary
The growing availability of large neuroimaging datasets is creating new opportunities to apply data-hungry computational techniques to model how the brain supports cognitive functions like perception and object processing. We introduce CNeuroMod-THINGS1, an extensively sampled functional magnetic resonance imaging (fMRI) dataset that captures brain responses across a broad segment of the human visual experience. Four subjects from the Courtois Project on Neural Modelling (CNeuroMod; https://www.cneuromod.ca) each completed between 33 and 36 fMRI sessions of a continuous image recognition task during which they were shown up to 4320 naturalistic images from the THINGS dataset covering 720 categories of concrete nameable objects2,3. By design, the CNeuroMod-THINGS dataset forms a bridge between two large data ecosystems, CNeuroMod and the THINGS initiative. In doing so, it expands our ability to model visual brain processes along semantically diverse dimensions defined by a well-characterized stimulus set, using subject-specific data from the most extensively scanned neuroimaging participants to date.
The CNeuroMod-THINGS dataset1 adds to a growing number of large fMRI datasets that also feature brain responses to naturalistic images, including BOLD50004,5, the Natural Scenes Dataset (NSD; https://naturalscenesdataset.org)6, the fMRI dataset from THINGS-data (THINGS-fMRI)7–9 and the Natural Object Dataset (NOD)10,11. Importantly, CNeuroMod-THINGS contributes to the growing collection of datasets assembled under the THINGS initiative (https://things-initiative.org), which includes multimodal behavioral, neurophysiological and neuroimaging correlates of a common core set of stimulus images7,12–14. These images provide a broad, comprehensive and systematic sampling of nameable object concepts from the American English language, in contrast with other large image datasets that focus primarily on size rather than sampling of semantic space, and that feature strong biases toward overrepresented object categories. The THINGS images are also accompanied by a growing body of meta-data, ratings and annotations3,15–18, including 4.7 million human judgments of perceived image similarities collected from over 12,000 participants via online crowdsourcing7.
For the CNeuroMod-THINGS dataset, four participants were shown images from the same 720 object categories sampled by the THINGS-fMRI dataset7, which were selected to be visually and conceptually representative of the full THINGS image set. Each participant was shown the same set of approximately 4,300 images (6 images/category), making it possible to contrast representations across individuals. Images were shown three times per participant according to a continuous recognition paradigm adapted from the NSD task6. The inclusion of image repetitions makes CNeuroMod-THINGS the only fMRI dataset based on THINGS that supports data-driven analyses at the single image level. For comparison, the THINGS-fMRI dataset includes twice as many unique images from the same 720 categories each shown once to three participants7, while NSD includes ~73k unique images shown three times to at least one of eight subjects6. Of note, NSD maximized the number of images shown by presenting mostly distinct stimuli to each of their participants. With the current paradigm, we aimed to strike a balance between wide sampling and robust image-specific signal that can be compared across individuals.
Crucially, the CNeuroMod-THINGS dataset is part of CNeuroMod19, a deep phenotyping20 project for which six core subjects have each completed several controlled and naturalistic fMRI tasks, including movie watching, video game playing, listening to and recalling narratives, resting state, reading, working memory and language tasks. Other deep phenotyping fMRI datasets include MyConnectome21,22, the Midnight Scan Club23 and the Individual Brain Charting (IBC)24–27 datasets. Notably, the CNeuroMod subjects are the most extensively scanned neuroimaging participants to date, with approximately 200 hours of fMRI data per subject that include around 80 hours of video watching28. Four of these exceptionally well-characterized individuals each completed 33–36 sessions of the CNeuroMod-THINGS task, complementing free-viewing video data with controlled image viewing defined by the THINGS stimulus set. CNeuroMod’s deep phenotyping approach makes it possible to train and test models of visual brain processes under naturalistic and well-controlled conditions using data from single subjects, and to combine data across tasks that target different modalities and cognitive domains in order to build versatile individual models of brain function20.
The core CNeuroMod-THINGS dataset1 includes raw and pre-processed fMRI data and key derivatives like trial-specific beta scores estimated at the voxel level. It also comprises naturalistic image stimuli and annotations that characterize their content, behavioural data that reflect performance on the image recognition task and eye-tracking data to assess trial-wise gaze fixation. To help delineate subject-specific variability, we also provide fMRI data from two vision localiser tasks—fLoc29 and retinotopy30—and derivatives that include individually-defined functional regions of interest (ROIs). Finally, the current data release includes anatomical scans and whole-brain patches to project statistical results onto individual flat maps of the cortical surface. With the results reported below, we characterize the CNeuroMod-THINGS dataset and report proof-of-concept analyses that showcase the quality of the data.
Methods
Participants
The Courtois Project on Neural Modelling (CNeuroMod) has acquired hundreds of hours of fMRI data from six core participants using a large variety of tasks. Four of the six CNeuroMod participants contributed to the CNeuroMod-THINGS dataset: sub-01, sub-02, sub-03 and sub-06. All were healthy right-handed adults (aged 39 to 49 at the beginning of acquisition) with no history of neurological disorders, normal hearing, and normal or corrected-to-normal visual acuity for their age. Two were female (sub-03 and sub-06) and two were male (sub-01 and sub-02). Participants provided informed consent for participation and data sharing. The research was approved by the Comité d’éthique de la recherche — Vieillissement et neuroimagerie — of the CIUSSS du centre-sud-de-l’île-de-Montréal (under number CER VN 18-19-22).
Task stimuli
Stimulus images were drawn from 720 of the 1854 object categories available through the THINGS initiative (https://things-initiative.org). The THINGS images provide a broad and systematic sampling of object concepts representative of the American English language, with each category depicting a unique nameable concept that is either manmade or natural2. The 720 categories used in the current study were also used to collect the THINGS-fMRI dataset7, and were selected to be visually and conceptually representative of the full THINGS image set. To characterize images during data-driven analyses, we used higher-order categorical labels (e.g., “animal”, “plant”) and object concept annotations (e.g., “size”, “natural”) generated by the THINGSplus project17, as well as Boolean flags that reflect image content (e.g., the presence of human or animal faces) generated manually by author MSL (Supplementary Information, Table S2).
The three participants (sub-01, sub-02 and sub-03) who completed 36 sessions of the image recognition task were shown 6 unique images per category (the first 6 images of a category based on their numbering in the THINGS image set). Sub-06, who completed 33 sessions, was shown 5 images for 480 categories, and 6 images for the remaining 240. With the exception of 120 images shown for the first time during the last session, every image was repeated once within a session and once between consecutive sessions. In total, participants saw 4320 unique stimuli throughout the experiment (3840 for sub-06). By comparison, for the THINGS-fMRI dataset from THINGS-data7, 8,640 unique images from the same 720 categories (12 images / category) were shown once (and 100 images were shown 12 times) to three participants who completed 12 fMRI sessions7, sampling twice as many images as the current paradigm with a single presentation for most images.
Continuous image recognition task paradigm
Trial structure
Participants completed between 33 (sub-06) and 36 (sub-01, sub-02 and sub-03) fMRI sessions during which they performed a continuous image recognition task (Fig. 1a) designed to ensure subject engagement without introducing block-design effects in the neural signal6. The first session included 3 fMRI runs while all subsequent sessions included 6 runs, totalling 213 runs for sub-01, 02 and 03, and 195 runs for sub-06. Each run included 60 experimental trials during which 190 functional brain volumes (TR = 1.49 s) were acquired over 283 s. For each trial, a single 900 × 900 pixel stimulus image was presented in the center of a 1280 × 1024 screen, occupying 10° of visual angle. The image was shown for 2.98 s, followed by a 1.49 s ISI (onset and offset times were time-locked to the fMRI sequence). A black fixation marker (2° of visual angle) combining a crosshair and a bullseye (a central dot and four wedges located cardinally31) was visible at all times and overlaid onto the image center during image presentation. Participants were instructed to maintain fixation on the central dot throughout a run.
Fig. 1.
Experimental paradigm. (a) Unique stimulus images were shown three times over the course of the experiment. fMRI sessions are represented as 3 × 2 blocks where each square (n = 60 trials) represents the proportion of images shown for the first (yellow, “unseen”), second (orange, “seen”) and third (red, “seen”) time. Half the images were first repeated within and then across sessions, and vice versa. Trials from each type (1st, 2nd and 3rd viewing, repeated within/between sessions) were intermixed within each run. (b) Response-to-button mapping symbol shown on the central fixation marker (sure seen:++, unsure seen:+, unsure unseen: -, and sure unseen:- -). Responses were made with the right thumb by pressing buttons on a custom-made MRI compatible video game controller [top: X (green), bottom: B (yellow), left: Y (blue), right: A (red)]. (c) Frequency distribution of delays between image repetitions, per trial (“seen” trials only, 2nd and 3rd viewings), for each subject—identified by a different color. Top charts: repetition delays for all repeated trials (“all reps”), measured in days. Images were either repeated within session (0 days) or between consecutive sessions, most of which were 7 days apart. Bottom charts: delays for within-session repetitions only (“within-sess. reps”, 2nd and 3rd viewing), measured in seconds. The red vertical line indicates the duration of a single run (283 s), illustrating how the majority of within-session repeats were within the same run.
Subject response
Most stimulus images were presented three times throughout the duration of the experiment to capture robust image-specific responses. For each trial, participants reported whether the displayed image was shown for the first time (“unseen”) or whether it had been shown previously (“seen”), either during the current or a previous session (or both). Participants also reported whether or not they felt confident in their answer. Responses (seen/unseen × sure/unsure) were made with the right thumb by pressing one of four buttons (top, bottom, left, right) on a video game controller (Fig. 1b). The response-to-button pairing was indicated on the fixation marker wedges during image presentation (++ sure seen, +unsure seen, - unsure unseen, - -sure unseen). To dissociate memory responses from motor responses, the response-to-button pairing varied from trial to trial with random vertical and horizontal flips. Participants could provide their answer until the next image appeared. Multiple answers were recorded within the 4.47 s response time window to allow for self-correction. Results reported below are based on the first recorded button press. Behavioural metrics derived from the last button press and raw records of all button presses are also included in the released dataset. No feedback was given to participants about their performance on the memory task throughout the entire duration of the longitudinal experiment.
Image repetition pattern
A given image was either repeated first within-session and then between-sessions, or vice versa. Runs from the first session (3 runs) had a 2:1 ratio of unseen and seen images, and no between-session repeats. Runs from subsequent sessions (6 runs per session) had a 1:2 ratio of unseen and seen images, and an equal number of images shown for the first, second and third time within each run (20 each). The number of within- and between-session repeats was the same for all sessions (except for session 1), while the ratio of within- to between-session repeats increased over the runs of a session. Typically, the within-to-between repeat ratio was 12:28, 16:24, 20:20, 20:20, 24:16 and 28:12 for runs 1 to 6, although a small number of sessions administered out of the pre-planned order featured atypical patterns of repetition (see Supplementary Information, Table S1 for a list of misordered sessions). Within a typical run, the within-session repeats and the between-session repeats were each split evenly between second and third viewings. Most sessions were spaced one week apart, with a few exceptions due to scanner and participant availability (Fig. 1c).
Image categories
No more than one image per category (out of 720) was shown per session in order to minimize interference with the image recognition task. For each subject, the 720 image categories were randomly assigned to one of six folds of 120 categories each (e.g., for sub-01, images from the “walrus” category were assigned to the 2nd fold). These six folds, used in rotation, determined the categories of the 120 novel images introduced in a session. Exemplar images from these categories were sampled in an order unique to each subject (e.g., 2, 4, 3, 6, 5, 1 for sub-01). Specifically, all 720 exemplars with a given number (here starting with number “2”; e.g., alligator_02s.jpg, banner_02s.jpg, bucket_02n.jpg) were introduced over 6 consecutive sessions (one session for each 120-category fold). The 720 exemplars with the next number (“4”; e.g., alligator_04s.jpg) were then introduced over the next 6 sessions, etc.
For a given session, one half (60) of the 120 novel (“unseen”) images were randomly set to be repeated within the current and then the subsequent session; the other half (60) was set to be repeated twice during the subsequent session. Stimulus images were assigned to runs by sampling randomly from the novel and repeated image viewings planned for a session, according to the number of trials per condition planned for each run. For example, for a first run, 6 and 14 images were sampled randomly among the novel images set to be repeated within and between sessions, respectively (totalling 20 novel image trials in that run). The order of trial presentations was also randomized within each run. The exact stimulus ordering can be re-created using the following script: https://github.com/courtois-neuromod/task_stimuli/blob/main/src/sessions/ses-thingsmem.py.
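As a concrete illustration, the sketch below reproduces the fold-rotation logic described above under simplified assumptions (a placeholder subject seed, generic category names, and no repetition scheduling or run assignment); the actual procedure is implemented in the session-generation script linked above.

```python
import random

# Simplified sketch of the category-fold rotation described above.
# Category names, the seed and file suffixes are illustrative placeholders;
# the linked ses-thingsmem.py script implements the actual procedure.
rng = random.Random(1)  # subject-specific seed (illustrative)

categories = [f"category{i:03d}" for i in range(720)]
rng.shuffle(categories)
folds = [categories[f * 120:(f + 1) * 120] for f in range(6)]  # six folds of 120 categories

exemplar_order = [2, 4, 3, 6, 5, 1]  # e.g., sub-01's exemplar sampling order

for session in range(36):
    exemplar = exemplar_order[session // 6]  # one exemplar number per block of six sessions
    fold = folds[session % 6]                # folds used in rotation across sessions
    novel_images = [f"{category}_{exemplar:02d}.jpg" for category in fold]
    # novel_images lists the 120 images introduced for the first time in this session
```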
MRI setup and data acquisition
Participants were scanned with a Siemens PRISMA Fit scanner equipped with a 64-channel receive head/neck coil available at the functional neuroimaging unit (UNF) of the Centre de Recherche de l’Institut Universitaire de Gériatrie de Montréal (CRIUGM). During scanning, each participant wore a personalized polystyrene headcase to minimize head movement32. Visual stimuli were projected with an Epson Powerlite L615U projector onto a blank screen positioned behind the MRI bore, made visible to the participant through a mirror mounted on the head coil. The presentation of stimuli, the recording of responses and the synchronization of the task with scanner trigger pulses were performed with a custom overlay based on the PsychoPy library (> = v2020.2.4)33. This software also triggered the onset and calibration of the eye-tracking system, which collected eye-tracking data from the right eye at 250 Hz with Pupil Core software34 and a head-coil mounted MRC High-Speed camera. Participant responses to stimuli were collected using a 3D printed custom-made MRI compatible video game controller35. Throughout each session, physiological data were acquired using BIOPAC MP160 MRI compatible systems and amplifiers (BIOPAC AcqKnowledge 5.0 software, 10000 Hz sampling rate). Physiological signals included electrocardiogram (ECG) activity (EL-508 wet electrodes, ECG100C-MRI amplifier), plethysmography (PPG; TSD200-MRI transducer, PPG100C-MRI amplifier), skin conductance (EDA; EL509 dry electrodes with BIOPAC 101 A isotonic gel, EDA 100C-MRI amplifier) and respiratory activity (DA100C amplifier with a respiratory belt, TSD221-MRI transducer). All data acquisition scripts are available on the CNeuroMod task_stimuli repository (https://github.com/courtois-neuromod/task_stimuli).
Task sessions included only functional MRI runs. fMRI data were acquired with an accelerated simultaneous multi-slice, gradient echo-planar imaging sequence36 developed for the Human Connectome Project37 (slice acceleration factor = 4, TR = 1.49 s, TE = 37 ms, flip angle = 52°, 2 mm isotropic spatial resolution, 60 slices, 96 × 96 acquisition matrix). All fMRI data were preprocessed using the fMRIprep pipeline38,39 (version 20.2.5; slice-timing correction was applied). For the data quality analyses reported below, all images were processed in native subject space (T1w). However, preprocessed BOLD data are available in both T1w and MNI152NLin2009cAsym volumetric space and in fsLR surface space in the full data release. The 6th run of sub-06’s ses-08 was excluded from the final dataset and from all fMRI analyses due to poor spatial alignment after preprocessing. Anatomical images of each participant were acquired periodically during separate dedicated sessions40. Structural data were acquired using a T1-weighted MPRAGE 3D sagittal sequence (duration = 6:38 min, TR = 2.4 s, TE = 2.2 ms, flip angle = 8°, voxel size = 0.8 mm isotropic, R = 2 acceleration) and a T2-weighted FSE (SPACE) 3D sagittal sequence (duration = 5:57 min, TR = 3.2 s, TE = 563 ms, voxel size = 0.8 mm isotropic, R = 2 acceleration). The T1w and T2w scans from each subject’s first two anatomical sessions were coregistered, averaged and preprocessed with sMRIprep version 0.7.041. Please see the Courtois-Neuromod documentation for additional details on data acquisition and preprocessing42.
Temporal signal-to-noise ratio
We used Nipype (version 1.10.0)43 to compute the temporal signal-to-noise ratio (tSNR) for each run of fMRI volumes preprocessed with fMRIprep, by dividing a voxel’s mean signal intensity by its standard deviation. We also averaged the voxelwise tSNR across runs for each subject.
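The released tSNR maps were generated with Nipype; the following sketch reproduces the same voxelwise definition (temporal mean divided by temporal standard deviation) with nibabel and NumPy. The file names are illustrative.

```python
import nibabel as nib
import numpy as np

# Voxelwise tSNR for one preprocessed run: temporal mean / temporal standard deviation.
# The file path below is a placeholder for a run preprocessed with fMRIprep.
bold = nib.load("sub-01_ses-002_task-things_run-01_desc-preproc_bold.nii.gz")
data = bold.get_fdata()                                   # shape: (x, y, z, time)

tsnr = data.mean(axis=-1) / (data.std(axis=-1) + 1e-8)    # small constant avoids division by zero
nib.save(nib.Nifti1Image(tsnr.astype(np.float32), bold.affine), "sub-01_run-01_tsnr.nii.gz")
```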
Cortical flat maps
Brain surfaces reconstructed using recon-all (FreeSurfer 6.0.1, RRID:SCR_001847) by the sMRIprep pipeline version 0.7.0 were cut manually into whole-brain cortical patches and flattened with TkSurfer 6.0.0. Flattened patches were imported into PyCortex 1.2.544 to support the visualisation of brain data on individual cortical flat maps. The current release includes anatomical files that can be used to project results from any CNeuroMod dataset on flattened cortical surfaces for all six CNeuroMod subjects, including the four subjects who completed the CNeuroMod-THINGS task.
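As an example of how the released anatomical files can be used, the sketch below projects a voxelwise map onto a flattened cortical surface with PyCortex. The subject identifier, transform name and file path are illustrative assumptions and must match the entries installed in the local PyCortex database.

```python
import cortex

# Project a voxelwise map (here, a hypothetical noise-ceiling map in the subject's
# functional T1w space) onto the subject's flattened cortical surface.
volume = cortex.Volume(
    "sub-01_noiseceilings_T1w.nii.gz",  # placeholder file name
    subject="sub-01",                   # must exist in the local PyCortex database
    xfmname="align_auto",               # placeholder transform name
    vmin=0, vmax=60, cmap="inferno",
)
cortex.quickflat.make_figure(volume, with_curvature=True, with_rois=True)
```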
Fixation compliance
We derived trial-wise measures of fixation compliance from in-scan eye-tracking data. We performed quality checks to exclude runs with missing, corrupt or low quality (i.e., very noisy) data, and performed drift-correction on the remaining runs with the following steps. First, pupils detected with low confidence by Pupil Core software were filtered out (threshold >0.9 for sub-02, sub-03 and sub-06; a lower threshold of >0.75 was adopted for sub-01 because Pupil Core had more difficulty detecting that participant’s pupil). Then, gaze positions recorded during a trial were realigned (i.e., drift-corrected) based on their distance from the median gaze position during the last known period of central fixation (i.e., the reference period), which we assumed to correspond closely to the central fixation mark. Because sustained central fixation was required throughout the task, this reference period was defined as the previous trial’s image presentation and subsequent ISI. For a run’s first trial, its own image presentation and ISI was used as the reference period.
To estimate fixation compliance, we calculated the proportion of drift-corrected gaze points within different bins of visual angle from central fixation (0.5, 1.0, 2.0 and 3.0°) during the image viewing portion of each trial. We also compiled trial-wise quality metrics like captured gaze count (to flag eye-tracking camera freezes) and the proportion of pupils detected above 0.9 and 0.75 confidence thresholds (to estimate data quality; e.g., good camera focus). To estimate gradual shifts in head position, we also calculated the distance in median gaze position during image viewing between consecutive trials. These trial-wise metrics are included in the *events.tsv file for each run for which usable eye-tracking data were available.
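The sketch below illustrates the drift correction and compliance metrics described above under simplified assumptions: gaze positions are already expressed in degrees of visual angle relative to the screen center, and the reference position is the median gaze from the previous trial's fixation period. Function and variable names are illustrative.

```python
import numpy as np

def fixation_compliance(gaze_deg, reference_deg, thresholds=(0.5, 1.0, 2.0, 3.0)):
    """Proportion of drift-corrected gaze points within each angular threshold.

    gaze_deg: (n, 2) gaze positions during image viewing, in degrees from the screen center.
    reference_deg: (2,) median gaze during the previous trial's fixation (reference period),
                   assumed to correspond to the central fixation mark.
    """
    corrected = gaze_deg - np.asarray(reference_deg)   # drift correction
    distances = np.linalg.norm(corrected, axis=1)      # distance from central fixation
    return {t: float(np.mean(distances <= t)) for t in thresholds}

# Example with simulated gaze samples (250 Hz over a 2.98 s image presentation)
rng = np.random.default_rng(0)
gaze = rng.normal(loc=[0.3, -0.2], scale=0.4, size=(745, 2))
print(fixation_compliance(gaze, reference_deg=[0.3, -0.2]))
```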
Single trial response estimate with GLMsingle
For each subject, we estimated single trial responses to individual image presentations with beta scores computed with the GLMsingle toolbox45. BOLD volumes in native (T1w) functional space preprocessed with fMRIPrep were masked with a whole brain functional mask and normalized (i.e., z-scored) within voxel along the time dimension. The first two volumes of each run were dropped for signal equilibrium before submitting BOLD data to GLMSingle (https://github.com/cvnlab/GLMsingle.git at commit c4e298e). Denoising was performed internally by the GLMsingle toolbox with GLMdenoise46. Cross-validation was performed to select denoising and regularization (ridge regression) parameters to prevent overfitting and improve the amount of variance explained by the beta scores. We specified a custom 13-fold cross-validation scheme (15–17 runs per fold) for which consecutive runs (e.g. run 6 of session 4, followed by run 1 of session 5) were systematically assigned to the next fold, so that runs from two consecutive sessions were never assigned to the same fold. In this manner, trials with the same image were split across at least two folds, since images were repeated once between and once within sessions (sometimes but not always within the same run; Fig. 1c). fMRI sessions were specified to the model to account for gross changes in betas across sessions during hyperparameter selection. Final trial-wise beta scores were estimated with the best combination of hyperparameters selected for each voxel. The final beta scores were normalized (z-scored) across all voxels and all trials for each subject and saved as trial-specific beta maps (one map per trial). Beta maps associated with the same stimulus image were also averaged (over 1–3 repetitions) and saved as image-specific beta maps. All maps are included in the data release.
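A minimal sketch of how the toolbox might be invoked with a custom cross-validation scheme and session indicator is shown below; it assumes the Python port of GLMsingle, and the input shapes, file names, option values and output keys are illustrative rather than a reproduction of the exact pipeline (option and key names may differ across toolbox versions).

```python
import numpy as np
from glmsingle.glmsingle import GLM_single

# Illustrative inputs: one binary design matrix (time x conditions) and one masked,
# z-scored BOLD array (voxels x time) per run; file names are placeholders.
design = [np.load(f"design_run-{r:03d}.npy") for r in range(1, 214)]
data = [np.load(f"bold_run-{r:03d}.npy") for r in range(1, 214)]

opt = {
    "wantlibrary": 1,      # fit a library of HRFs per voxel
    "wantglmdenoise": 1,   # GLMdenoise-style nuisance regressors
    "wantfracridge": 1,    # voxelwise ridge-regression shrinkage
    # session number of each run, used to account for gross changes across sessions
    "sessionindicator": np.load("run_to_session.npy"),
    # custom 13-fold scheme: lists of run indices (check the toolbox docs for indexing)
    "xvalscheme": [list(fold) for fold in np.load("cv_folds.npy", allow_pickle=True)],
    "wantmemoryoutputs": [0, 0, 0, 1],  # keep only the final single-trial betas in memory
}

glm = GLM_single(opt)
results = glm.fit(design, data, stimdur=2.98, tr=1.49, outputdir="glmsingle_out")
betas = results["typed"]["betasmd"]  # single-trial beta estimates
```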
Analyses of memory conditions
All behavioural and fMRI analyses of memory performance excluded trials with no recorded answer and trials from session 1, during which the absence of between-session repetition reduced the task difficulty. Analyses also excluded a handful of trials impacted by out-of-order sessions that modified the planned repetition pattern for a subset of images (see Supplementary Information, Table S1 for a list of misordered sessions). Specifically, we excluded sub-03’s sessions 24–26 and sub-06’s sessions 19–26 from all analyses of memory performance.
To assess whether recognition effects are present in the BOLD data, we performed t-tests on normalized trial-wise beta scores estimated with GLMsingle45 for each subject, using a procedure similar to the one adopted for the Natural Scenes Dataset6. Specifically, we used behavioural responses to identify trials for which a subject successfully recognized previously shown images as “seen” (hits), and correctly identified never-shown images as “unseen” (correct rejections). Our task design allowed us to further dissociate hits for images last repeated within and between consecutive imaging sessions (“within-session hits” and “between-session hits”, respectively), highlighting memory recognition after short and long retention intervals (between-session hits typically followed a 7-day retention interval; Fig. 1c).
For each voxel, we performed two-sample t-tests comparing betas from either “within-session hit” or “between-session hit” trials to betas from “correct rejections” trials. Betas were concatenated per subject across all sessions. Unequal variance was allowed across conditions in the two-sample t-test to account for variability in the relative difficulty of long- and short-term memory recognition compared to correct rejections. The resulting t-values are included in the released dataset.
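The voxelwise contrast can be expressed with SciPy's two-sample t-test with unequal variances (Welch's test); the sketch below uses illustrative arrays of normalized single-trial betas grouped by the behavioural labels described above.

```python
import numpy as np
from scipy import stats

# Illustrative arrays of normalized single-trial betas (trials x voxels),
# selected according to the behavioural labels described above.
betas_within_hits = np.load("sub-01_betas_within-session-hits.npy")
betas_correct_rej = np.load("sub-01_betas_correct-rejections.npy")

# Two-sample t-test per voxel, allowing unequal variances across conditions (Welch's test).
t_values, p_values = stats.ttest_ind(
    betas_within_hits, betas_correct_rej, axis=0, equal_var=False
)
```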
Noise ceilings
We computed voxelwise noise ceilings on trial-specific beta maps to estimate the maximal proportion of beta score variance that could be explained by the identity of the stimulus image, given the presence of measurement noise. Noise ceilings were estimated with a technique described by Allen et al.6, which assumes that voxel variance can be separated into stimulus-driven signal and unrelated noise. Raw voxel betas were split into repetitions 1, 2 and 3, and standardized (z-scored) across images within each repetition. A voxel’s noise variance was estimated as the beta variance across repetitions for a given image (normalized with n-1 to correct for small sample size), averaged across all images. Since the variance of the standardized betas is 1, we estimated the signal variance as 1 minus the noise variance, corrected with a half-wave rectification. We then calculated the noise ceiling (NC) as:

NC = 100 × ncsnr² / (ncsnr² + 1/n),

where n is the number of repetitions per image, and ncsnr is the signal standard deviation divided by the noise standard deviation6. We excluded from this calculation images that did not have a behavioural response (i.e., button press) on three distinct trials, on the assumption that subjects may have been inattentive during no-response trials. Final noise ceiling scores for each subject were based on 3247 (sub-01), 4178 (sub-02), 4179 (sub-03) and 3398 (sub-06) images, respectively. Voxelwise noise ceilings are included in the data release.
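A compact sketch of this computation is given below, assuming the raw betas for one subject are arranged as an (images × repetitions × voxels) array containing only images with all three repetitions; the function name and array layout are illustrative.

```python
import numpy as np

def noise_ceiling(betas):
    """Voxelwise noise ceiling (% variance explained) following the formula above.

    betas: array of shape (n_images, n_reps, n_voxels) holding raw single-trial betas.
    """
    n_images, n_reps, _ = betas.shape
    # z-score across images within each repetition (per voxel)
    z = (betas - betas.mean(axis=0)) / (betas.std(axis=0) + 1e-12)
    # noise variance: across-repetition variance per image (ddof=1), averaged over images
    noise_var = z.var(axis=1, ddof=1).mean(axis=0)
    # signal variance: 1 minus the noise variance, half-wave rectified
    signal_var = np.clip(1.0 - noise_var, 0.0, None)
    ncsnr = np.sqrt(signal_var) / np.maximum(np.sqrt(noise_var), 1e-12)
    return 100 * ncsnr ** 2 / (ncsnr ** 2 + 1.0 / n_reps)
```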
Population receptive fields
Task design
In addition to the main THINGS image-recognition task, three participants also completed multiple sessions (6 sessions for sub-01 and sub-02; 5 sessions for sub-03) of a retinotopy task adapted from Kay and colleagues30 and implemented in Psychopy33. These data were used to derive population receptive field (pRF) properties at the voxel level and to delineate ROIs from the early visual cortex. Each session included three functional runs of 301 s (202 volumes at TR = 1.49 s), each of which used a different aperture shape to stimulate the visual field: ring, bar or wedge. Each run included eight cycles of 31.2 s during which an aperture moved slowly across the visual field to reveal a portion of the visual pattern. Patterns were made of colored objects shown at multiple spatial scales on a pink-noise background to drive both low-level and high-level visual areas. Patterns were drawn randomly at a rate of 15 frames per second from 100 different RGB images of 768 × 768 pixels (the Human Connectome Project retinotopy stimuli47,48).
Visual stimulation
The stimulated visual field was a circular area, 10° of visual angle in diameter, at the center of the screen (1280 × 1024 pixels covering 17.5 × 14° of visual angle). The exact progression of the pattern depended on the aperture type.
For ring runs, a thick circle aperture expanded from the center of the stimulated visual field for four consecutive cycles (each 28 s of stimulation +3.2 s of rest), followed by a 12 s pause, and then by four more cycles during which the ring contracted from the periphery to the center.
For wedge runs, a rotating wedge aperture that corresponded to 1/4 of the stimulated visual field completed four consecutive counter-clockwise rotations (each a 31.2 s cycle), followed by a 12 s pause and then by four more clockwise rotations.
For bar runs, a wide bar aperture swept eight times (each 28 s of sweep + 3.2 s of rest) across the stimulated visual field, first from left to right, then from bottom to top, then right to left and top to bottom. After a 12 s pause, the bar then swept diagonally from bottom left to top right, bottom right to top left, top right to bottom left, and top left to bottom right.
Participants were instructed to fixate their gaze throughout on a dot (diameter = 0.15° visual angle) presented centrally that alternated in color between blue and orange, and to press a button with the right thumb whenever the dot changed color using a custom MRI compatible video game controller35.
Population receptive field estimation
Voxel-wise population receptive fields were estimated with the analyzePRF MATLAB toolbox30 (https://github.com/cvnlab/analyzePRF; commit a3ac908 based on release 1.6) in MATLAB R2021a. We used temporal averaging to downsample binary aperture masks and obtain TR-locked (TR = 1.49 s) binary masks, which we resized to 192 × 192 pixels to reduce processing time. For each subject, BOLD data were preprocessed with the fMRIPrep pipeline38,39 (version 20.2.6), detrended, normalized and averaged over repeated runs of the same type (e.g., ring aperture). The first three volumes of each run were dropped to allow for signal equilibrium. Whole-brain voxels were vectorized, split into chunks of up to 240 voxels each, and processed in parallel with analyzePRF, after which voxelwise output metrics were reassembled into volumes. Receptive field sizes and eccentricities were converted from pixels into degrees of visual angle, while angles were converted from compass to signed north-south. Volumes were converted into surfaces with FreeSurfer’s mri_vol2surf, and visual ROI boundaries (V1, V2, V3, hV4, VO1/VO2, LO1/LO2, TO1/TO2, and V3a/V3b) were estimated with Neuropythy49 (https://pypi.org/project/neuropythy/; version 0.11.9) using a Bayesian mapping approach that refines individual parameters with group atlas priors. Surface values were reconverted into volumes in native functional subject space (T1w) with FreeSurfer’s mri_convert, FSL’s fslreorient2std and Nilearn’s resample_to_img. The current data release includes binary ROI masks in native (T1w) volume space for sub-01, sub-02 and sub-03.
Functional localizer (fLoc)
Task design
Three subjects (sub-01, sub-02 and sub-03) also completed six sessions of a functional localizer task to identify brain regions responding preferentially to specific stimulus categories. The task was based on a Psychopy implementation (https://github.com/NBCLab/pyfLoc) of the Stanford VPN lab’s fLoc task29 using stimuli from the fLoc functional localizer package downloaded from https://github.com/VPNL/fLoc (commit 9f29cbe). Each session included two functional runs of 231 s (155 volumes at TR = 1.49 s) with randomly ordered 5.96 s blocks of rapidly presented images from one of five categories: faces, places, body parts, objects and characters. Each block included 12 trials for which a 768×768 image from the block’s category was displayed centrally for 0.4 s, followed by a 0.095-0.1 s ISI. Subjects were instructed to fixate on a red dot shown in the middle of the screen throughout the run. To maintain engagement, they were instructed to press a button on a custom MRI compatible video game controller35 with the right thumb whenever the same image appeared twice in a row; i.e., the “one-back” task variation. Blocks during which no image was shown—only the red fixation dot appeared on a grey background for 5.96 s—were intermixed in the block sequence to estimate a baseline condition. Each functional run included 6 blocks from each of the five image categories and 6 blocks of baseline. The first run of each session featured images from the house (places), body (body parts), word (characters), adult (faces) and car (objects) sub-categories from the fLoc package; the second run featured images from the corridor (places), limb (body parts), word (characters), adult (faces) and instrument (objects) sub-categories.
ROI delineation
Functional data were preprocessed with fMRIPrep (version 20.2.5)38,39 and analyzed in native space (T1w) with a general linear model implemented in nilearn 0.9.250. Each run’s first three functional volumes were dropped for signal equilibrium. Data were fitted with the canonical SPM HRF using a cosine drift model and an autoregressive noise model of order 1. Regressed out confounds included the mean global, white matter and CSF signal as well as the six basic head motion parameters. Data were normalized (z-scored within voxel along the time dimension), spatially smoothed (FWHM = 5 mm) and masked with a binary mask created from the intersection of all 12 run-specific functional masks outputted by fMRIPrep. The following t-contrasts were applied to identify category-specific ROIs:
face > (bodies + characters + places + objects): fusiform face area (FFA), occipital face area (OFA) and posterior superior temporal sulcus (pSTS)
place > (face + bodies + characters + objects): parahippocampal place area (PPA), occipital place area (OPA) and medial place area (MPA)
bodies > (face + character + place + object): extrastriate body area (EBA)
To delineate ROI boundaries (FFA, OFA, pSTS, EBA, PPA, OPA and MPA), clusters from subject-specific maps of fLoc t-contrasts were intersected with an existing group-derived parcellation of category-selective brain regions made accessible by the Kanwisher lab51,52. Binary group parcels were warped from CVS (cvs_avg35) to MNI space with Freesurfer 7.1.1 and from MNI to T1w space with ANTs 2.3.5. Each subject-specific ROI was identified within a mask of thresholded clusters from the relevant t-contrast map (e.g., face > other conditions for FFA; alpha = 0.0001, t > 3.72, clusters > 20 voxels) that intersected with the smoothed group parcel (mask values > 0.01 post spatial smoothing, FWHM = 5 mm). Within this intersection mask, voxels with the highest t-values (at least 3.72) were selected in proportion to the warped group parcel size (up to 80% of the pre-smoothing voxel count). The current data release includes binary ROI masks in native volume space (T1w) for sub-01, sub-02 and sub-03.
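The sketch below illustrates the first-level model and one category contrast with Nilearn for a single run; file names, events columns and the reduced confound set are illustrative assumptions, and the steps described above (voxelwise normalization, mask intersection across runs, cluster thresholding) are omitted for brevity.

```python
import numpy as np
import pandas as pd
from nilearn.glm.first_level import FirstLevelModel

# Illustrative inputs for one fLoc run: an events table with a 'trial_type' column
# (face, bodies, characters, places, objects) and a reduced confound table.
events = pd.read_table("sub-01_ses-001_task-floc_run-1_events.tsv")
confounds = pd.read_table("sub-01_ses-001_task-floc_run-1_confounds.tsv")

model = FirstLevelModel(
    t_r=1.49,
    hrf_model="spm",          # canonical SPM HRF
    drift_model="cosine",
    noise_model="ar1",        # autoregressive noise model of order 1
    smoothing_fwhm=5,
    mask_img="sub-01_task-floc_brain_mask.nii.gz",  # placeholder mask
).fit(
    "sub-01_ses-001_task-floc_run-1_desc-preproc_bold.nii.gz",
    events=events,
    confounds=confounds,
)

# Face > mean of the other four categories
columns = model.design_matrices_[0].columns
contrast = np.zeros(len(columns))
for condition, weight in [("face", 1.0), ("bodies", -0.25), ("characters", -0.25),
                          ("places", -0.25), ("objects", -0.25)]:
    contrast[columns.get_loc(condition)] = weight
face_t_map = model.compute_contrast(contrast, stat_type="t", output_type="stat")
```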
ROIs on cortical flat maps
For visualization purposes, ROI boundaries delineated with fLoc for the FFA, OFA, pSTS, EBA, PPA, OPA and MPA, and ROI boundaries estimated with Neuropythy for V1, V2 and V3, were projected onto flat cortical surfaces using Pycortex 1.2.544, and drawn manually in Inkscape 1.3.2. For V1, V2 and V3, voxels with estimated eccentricities greater than 10° of visual angle were masked out, restricting the final ROIs to the portion of the visual field stimulated by our retinotopy paradigm. Individual ROI boundaries can be made visible as annotations on the cortical flat maps that are part of this data release to help interpret the location of brain activity patterns (e.g., Fig. 2a). Users unfamiliar with flattened cortical maps may also appreciate the “Cortical anatomy viewer” (https://gallantlab.org/brain-viewers/), an interactive tool developed by Jack Gallant’s laboratory that projects commonly referenced brain sulci and gyri onto folded, inflated and flattened views of the human brain.
Fig. 2.
Quality metrics. (a) Voxelwise noise ceiling (% of variance explained; left) and mean temporal signal-to-noise ratio (tSNR, averaged across runs; right) per participant shown on flattened cortical surfaces. For subjects who completed visual localizer tasks, the labeled outlines indicate functionally defined ROIs identified with the retinotopy (V1, V2 and V3) and fLoc (face preference: FFA, OFA and pSTS; body preference: EBA; scene preference: PPA, OPA and MPA) tasks. (b) Gaze position in relation to the screen center, in ° of visual angle, during image presentation (data downsampled to 50 Hz). Contours represent 25, 50 and 75% of the gaze density. (c) For each subject—identified by a different color: proportion of trials with a recorded behavioral response (“response rate”) shown per run (left), and distribution of framewise displacement (FD, in mm) per run (averaged; middle) and per frame (right) as an indicator of head motion.
Data Records
We use DataLad53, a data version control tool built on top of git and git-annex, to track the provenance of all data assets in this release. The data, documentation and code are organized as a nested set of DataLad submodules inside a repository that can be cloned directly from Github (https://github.com/courtois-neuromod/cneuromod-things). Alternatively, an archived version of the repository (1.0.1) that matches the data described in the current report can be downloaded from Zenodo1. Due to the dataset size, the repository includes many symbolic links to larger data files hosted remotely by the Canadian Open Neuroscience Platform54 (CONP).
Data files can be downloaded from CONP servers via symbolic links using DataLad without requiring registered access (see Usage Notes). The four participants have requested access to their own data, and have released them openly as citizen scientists under a liberal Creative Commons (CC0) data license via the Data Portal of the CONP. Non-identifying raw and derivative data are available, while identifying files (i.e., detailing scan dates) are only shared in their anonymized form. BOLD data are organized in the Brain Imaging Data Structure (BIDS) standard55.
The repository structure and content are detailed in the main README.md file. Files related to the main THINGS image recognition task are found under cneuromod-things/THINGS:
cneuromod-things/THINGS/fmriprep includes raw and preprocessed BOLD data, eye-tracking data, *events.tsv files with trial-wise metrics (stimulus- and task-related), image stimuli, and stimulus annotations.
cneuromod-things/THINGS/tsnr includes maps of temporal signal-to-noise ratios per run and averaged per subject.
cneuromod-things/THINGS/behaviour includes analyses of the subjects’ fixation compliance and performance on the continuous recognition task.
cneuromod-things/THINGS/glmsingle includes fMRI analyses and derivatives, including trial-wise and image-wise beta scores estimated with GLMsingle45, voxel-wise noise ceilings, and data-driven analyses to showcase the quality of the data.
cneuromod-things/THINGS/glm-memory includes GLM-based analyses of memory effects in the preprocessed BOLD data and associated statistical maps.
In addition, the CNeuroMod-THINGS dataset includes data, scripts and derivatives from the two vision localizer tasks completed by three of the four subjects (sub-01, sub-02 and sub-03), which we used to derive subject-specific ROIs. Those files are found under cneuromod-things/fLoc and cneuromod-things/retino for the functional localizer and retinotopy tasks, respectively. The cneuromod-things/anatomical submodule also includes patch files and instructions to project voxel-wise statistics from any CNeuroMod dataset onto subject-specific flattened cortical surfaces (flat maps) for visualization. Surfaces feature manually traced outlines of visual ROIs for subjects who completed the fLoc and retinotopy localizers.
Finally, cneuromod-things/datapaper includes a collection of Jupyter Notebooks with step-by-step instructions and code to recreate figures from the current manuscript using source data and files from the relevant DataLad sub-modules.
Technical Validation
Data quality metrics
Response rate
Response rate was high across participants, ranging from 91.13% (sub-01) to 99.84% (sub-03). For each subject, a majority of runs had near perfect response rates (median run response rate > 98%), although the number of runs with lower response rates was higher for sub-01 than for the other participants due to self-reported bouts of drowsiness (Fig. 2c). The number of trials with no recorded response out of 12,780 trials was 1134 (sub-01), 24 (sub-02) and 21 (sub-03), and 96 out of 11,700 trials (sub-06).
fMRI data
We assessed the intrinsic quality of the fMRI data by computing the temporal signal-to-noise ratio (tSNR) for each fMRI run, and show the mean voxelwise tSNR (averaged across runs) for each subject (Fig. 2a, right). We also report framewise displacement (FD, in mm) as a measure of head motion. FD distributions indicate low levels of motion in each participant: a majority of frames had an FD below 0.1 mm, and the mean FD per run was below 0.15 mm for a majority of runs in all subjects (Fig. 2c). We further note that sub-03 demonstrated exceptionally low levels of motion, with a mean FD below 0.1 mm. To assess task-evoked signal, we calculated noise ceilings as an estimate of the percentage of signal variance explained by stimulus images in each voxel (Fig. 2a, left). Higher noise ceilings were observed in low-level visual areas (V1, V2 and V3, identified via pRF mapping) as well as in visual cortical regions with known categorical preferences like the FFA, EBA and PPA (identified with fLoc), indicating consistent stimulus-specific signal in the BOLD data. Maximal noise ceilings were 56.51% (sub-01), 66.43% (sub-02), 73.89% (sub-03) and 54.19% (sub-06). We note that the exceptionally low levels of motion in sub-03 likely contributed to their high noise ceilings.
Visual fixation
We used eye-tracking data to estimate fixation compliance during image viewing. We performed quality checks to exclude runs with missing, corrupt or low quality (i.e., very noisy) data. The proportion of runs with usable, drift-corrected eye-tracking data was generally high for each subject: 140/213 (sub-01), 188/213 (sub-02), 190/213 (sub-03) and 164/195 (sub-06). The distribution of drift-corrected gaze position in relation to the fixation marker during image viewing indicates high levels of fixation compliance in sub-01, sub-02 and sub-03, and to a lesser extent in sub-06 (Fig. 2b).
Memory effects
Behavioural performance
To determine whether subjects recognized images above chance level, we computed d’, the difference between the z-transformed hit rate (the proportion of previously shown items correctly recognized as “seen”) and the z-transformed false alarm rate (the proportion of items shown for the first time wrongly identified as “seen”). High d’ scores indicate that all subjects performed above chance: 1.744 (sub-01), 1.536 (sub-02), 1.623 (sub-03) and 1.898 (sub-06). Predictably, hit rates show that more images were correctly recognized when repeated within rather than between sessions (Fig. 3a), indicating greater task difficulty at longer delays (days versus minutes). This effect was dampened in sub-02, whose hit rate was closest to ceiling, although this subject’s responses also included the highest number of false alarms. Faster response times were also observed for within-session hits compared to between-session hits for all subjects (Fig. 4a).
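For reference, d’ can be computed from hit and false alarm counts with SciPy's inverse normal CDF; the sketch below uses illustrative counts, not the dataset values.

```python
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false alarm rate)."""
    hit_rate = hits / (hits + misses)
    false_alarm_rate = false_alarms / (false_alarms + correct_rejections)
    return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)

# Illustrative counts only (not the dataset values)
print(round(d_prime(hits=5000, misses=1500, false_alarms=700, correct_rejections=3300), 3))
```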
Fig. 3.
Memory accuracy. (a) Proportion of “seen” answers per image repetition, averaged across sessions for each subject. For the 1st image presentation (left, in grey), “seen” answers are false alarms. For the 2nd and 3rd image presentations (middle and right, in green), “seen” answers are hits, and response rates are split between images repeated within (w, darker green) and between (b, paler green) sessions. Error bars are standard deviations. (b) Proportion of answer types per image repetition (1st, 2nd and 3rd presentation), averaged across sessions for each subject. Responses include “seen” and “unseen” answers split between low and high confidence (LC and HC). Error bars are standard deviations. For the 2nd and 3rd presentation, results are split between images repeated within (w) and between (b) sessions. “Seen” answers (high confidence in darker blue, low confidence in pale blue) are incorrect—false alarms—for the 1st rep, and correct—hits—for the 2nd and 3rd reps. “Unseen” answers (high confidence in red, low confidence in pink) are correct—correct rejections—for the 1st rep, and incorrect—misses—for the 2nd and 3rd reps.
Fig. 4.
Memory effects. (a) Reaction time in seconds for hits, misses, correct rejections (CR) and false alarms (FA) is averaged across sessions for each subject. Hit and Miss reaction times are plotted separately for within-session (“-w”) and between-session (“-b”) repetitions. Error bars represent the standard deviation. (b) Thresholded t-scores per participant for two-sample t-tests contrasting trial-wise beta scores between memory conditions shown on flattened cortical surfaces (betas concatenated across all sessions, p < 0.0001 uncorrected, unequal variance assumed). Top panel results compare BOLD responses for “within-session hits” (positive values) and for “correct rejections” (negative values). Bottom panel results compare BOLD responses for “between-session hits” (positive values) and for “correct rejections” (negative values).
The distribution of response types (“seen” and “unseen” answers given with low and high confidence) per condition further illustrates the difficulty of recognizing images after longer delays. For all subjects, within-session repetitions were mostly high confidence hits (previously seen images correctly labeled as “seen”; Fig. 3b). Between-session repetition trials included greater proportions of low confidence hits, and of low or high confidence misses (previously seen images incorrectly labeled as “unseen”), indicating much weaker memory. The impact of repetition delays was observed for both second and third image presentations. In fact, the distribution of response types for images repeated between sessions (labelled “b” under 2nd and 3rd reps, Fig. 3b) is comparable to the response profile of first-time image presentations (1st rep) for which there is no memory (identical response distributions between seen and unseen conditions indicate chance level). Of note, the response profile of sub-06 indicates the strongest memory signal for between-session repetitions, as it is most distinct from the response distribution for first image presentations.
fMRI signal
Contrasting BOLD activity patterns between memory conditions also highlights more salient memory effects for images repeated within- rather than between-sessions. We performed two-sample t-tests contrasting trial-wise beta scores (estimated with GLMsingle and concatenated across sessions) associated with hits (correctly recognized images) and correct rejections (never seen images correctly identified as “unseen”). The results (Fig. 4b) reveal widespread deactivation in visual cortical areas for within-session hits compared to first-time presentations. This “repetition suppression effect” could be mediated by neural fatigue at very short delays (e.g., for consecutive trials), and by familiarity, attention, perceptual expectations and response time at slightly longer delays56–58. Of note, this effect was greatly reduced when contrasting between-session hits to first-time presentations. Successful memory recognition was also associated with enhanced prefrontal and parietal activation.
To assess within-run memory effects, we also performed fixed-effects analyses on first-level GLMs applied to fMRIPrep preprocessed run-level BOLD data using Nilearn50. These analyses, whose resulting t-values are included in the current data release, revealed non-significant patterns similar to those shown in Fig. 4b when contrasting within-session hits and between-session hits to correct rejections. This lack of significance indicates high variability in memory effects across runs and sessions. Further modelling of memory effects that takes different sources of variability into account is therefore warranted.
Dimensionality reduction analyses (t-SNE)
We conducted data-driven dimensionality reduction analyses to visualize the representation of semantic information in brain regions with categorical preferences. Specifically, we generated t-distributed stochastic neighbor embedding (t-SNE) plots59 from beta scores estimated with GLMsingle within voxels with preferences for faces, bodies or scenes. Voxels were selected individually for each subject using the following criteria: all voxels with t > 2.5 for either the face, the body or the scene fLoc contrast from unsmoothed BOLD data (sub-01: 1345 voxels, sub-02: 1224 voxels, sub-03: 1817 voxels). For each subject, one t-SNE plot was generated from trial-specific beta scores (excluding no-response trials) and another from image-specific beta scores averaged across repetitions (including only images with three repetitions with recorded responses). Beta scores were z-scored within voxel and then reduced with PCA, keeping the top 50 components (PCs). T-SNE plots generated with Scikit-learn 1.0.1 were initialized with the betas’ first two PCs scaled by 0.0001 of the first PC’s standard deviation (perplexity = 100, learning rate = 500, max 2000 iterations).
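The sketch below reproduces this embedding procedure with scikit-learn under illustrative assumptions about the input array; the initialization scaling follows the description above, and parameter names follow scikit-learn 1.0.x (n_iter was later renamed max_iter).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Illustrative input: image-wise betas (images x voxels) from category-selective
# voxels, already z-scored within voxel; the file name is a placeholder.
betas = np.load("sub-01_imagewise_betas_category_voxels.npy")

pcs = PCA(n_components=50).fit_transform(betas)

# Initialize t-SNE with the first two PCs, scaled by 0.0001 of PC1's standard deviation
init = pcs[:, :2] * (0.0001 / pcs[:, 0].std())
embedding = TSNE(
    n_components=2, perplexity=100, learning_rate=500,
    n_iter=2000, init=init, random_state=0,
).fit_transform(pcs)
```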
Figure 5 showcases annotated t-SNE plots derived from trial-wise (left columns) and image-wise (right columns) beta values for sub-01, sub-02 and sub-03. We used higher-order WordNet categorical labels (“animal”, “plant”, “vehicle”) and object concept ratings between 0 and 7 (i.e., “moves”, “size”, “natural”) from the THINGSplus project17 to annotate image content within the plot clusters. Clustering patterns indicate greater coherence for image-wise signal, suggesting that averaging over repetitions reduces noise in stimulus-specific signal and increases the robustness of analyses conducted at the item level.
Fig. 5.
Dimension reduction analyses (t-SNE). T-SNE plots on 50 principal components derived from normalized beta scores per trial and per image from voxels with category-specific signal identified functionally for each subject (t > 2.5 for either the face, the body or the scene fLoc contrast from unsmoothed BOLD data; 1345, 1224 and 1817 voxels for sub-01, sub-02 and sub-03, respectively). Image content is annotated with categorical labels (i.e., “animal”, “plant”, “vehicle”) and object concept ratings ranging between 0 and 7 (i.e., “moves”, “natural”) from the THINGSplus project17.
For greater specificity, we generated additional t-SNE plots using beta values from subject-specific ROIs delineated according to the following criteria:
Face-specific ROIs: concatenation of voxels from the FFA and OFA. Final ROI extents: 409 voxels (sub-01), 327 voxels (sub-02), and 501 voxels (sub-03).
Place-specific ROIs: concatenation of voxels from the PPA, OPA and MPA. Final extents: 176 voxels (sub-01), 190 voxels (sub-02), and 537 voxels (sub-03).
Early visual cortex ROI: voxels from V1, V2 and V3, identified with Neuropythy. Final extents: 4458 voxels (sub-01), 4543 voxels (sub-02), and 4274 voxels (sub-03).
We selected the FFA, OFA, PPA, OPA and MPA voxels with the highest-ranking t-scores (no lower than 3.72) on the relevant fLoc contrast (face or place > all other conditions, unsmoothed BOLD data) within a group-derived binary parcel51,52 (e.g., FFA) warped to the subject’s native functional (T1w) space and smoothed with a FWHM = 5 mm kernel. A cut-off was set so that the number of selected voxels did not exceed 30% of the group parcel voxel count (post-warping, pre-smoothing). Figure 6 showcases annotated t-SNE plots derived from image-wise beta values from these three sets of ROIs for sub-01, sub-02 and sub-03. Clustering patterns indicate the greatest coherence in face-specific ROIs, and the lowest coherence in low-level visual ROIs.
Fig. 6.
Dimension reduction analyses (t-SNE) per ROI. T-SNE plots on 50 principal components derived from normalized beta scores per image from functionally-defined, subject-specific ROIs. “V1-V2-V3” ROIs include voxels from low-level visual ROIs V1, V2 and V3 (4458, 4543 and 4274 voxels for sub-01, sub-02 and sub-03). “Place” ROIs include voxels from the PPA, OPA and MPA (176, 190 and 537 voxels for sub-01, sub-02 and sub-03). “Face” ROIs include voxels from the FFA and OFA (409, 327 and 501 voxels for sub-01, sub-02 and sub-03). Image content is annotated with categorical labels (i.e., “animal”, “plant”, “vehicle”) and object concept ratings ranging between 0 and 7 (i.e., “moves”, “natural”) from the THINGSplus project17.
Beta distributions within single ROI voxels
We assessed whether stimulus image content—i.e., the presence of faces and the complexity of scene elements in the image—influenced the distribution of image-specific beta scores within functionally defined ROIs known for their preference for faces (FFA), body parts (EBA) and scenes (PPA). For sub-06—who did not complete the fLoc task—ROIs were estimated using group-derived binary parcels of the FFA, EBA and PPA51,52 warped to the subject’s T1w space with Freesurfer 7.1.1 and smoothed with a FWHM = 3 mm kernel.
Within each ROI, we first identified the single voxel with the highest noise ceiling. We then split that voxel’s image-specific beta scores into separate distributions based on the type of face contained in the image (human or animal; Fig. 7, in blue), and then based on whether the image depicted a full-blown scene, an object on a rich or a minimal background, or a lone object (Fig. 7, in green). The split was based on Boolean image annotations generated manually by author MSL (Supplementary Information, Table S2). Only images with three repetitions with recorded button presses were included in this analysis. Beta distributions reveal a clear preference for images that contain faces (human or otherwise), and an indifference to scene elements, in both the FFA and the EBA voxel across subjects—keeping in mind that faces and body parts frequently co-occur in natural images. Meanwhile, beta distributions from the PPA voxel indicate a preference for complex scene elements, and a slight preference for the absence of faces.
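A minimal sketch of this split is shown below, assuming a table with one row per image that contains the selected voxel's beta score and the manual Boolean annotations; the column and file names are illustrative placeholders for the released annotation fields.

```python
import pandas as pd

# Illustrative table: one row per image, with the selected voxel's beta score
# and manual Boolean content annotations (column names are placeholders).
df = pd.read_table("sub-01_FFA_top-voxel_imagewise_betas.tsv")

groups = {
    "human face": df.loc[df["has_human_face"], "beta"],
    "animal face": df.loc[df["has_animal_face"], "beta"],
    "no face": df.loc[~(df["has_human_face"] | df["has_animal_face"]), "beta"],
}
for label, betas in groups.items():
    print(f"{label}: n = {len(betas)}, median beta = {betas.median():.3f}")
```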
Fig. 7.
Voxel-level image-specific beta distributions within functionally defined ROIs. ROIs include the FFA (left), EBA (middle) and PPA (right). For each subject, charts show image-specific betas from the voxel with the highest noise ceiling in the ROI split into separate distributions based on image content. In blue, betas are split according to the presence of human, non-human mammal, and other faces (e.g., bird or insect) within an image’s central focus. In green, betas are split according to whether the image is a scene, has a rich background, features some central object(s) with minimal background, or is a lone object without any background. For each subject and ROI, the images with the five highest beta scores are shown as red dots in the distributions. Five public domain or CC0 images that resemble these dataset images are shown above each set of distributions. Table 1 lists the THINGS stimulus images with the twelve highest beta values for each ROI voxel.
Public domain or CC0 look-alikes of the stimulus images that received the five highest beta scores per ROI are shown in Fig. 7, while the images that received the twelve highest beta scores are listed in Table 1. These results illustrate how clear categorical preferences can be observed for single images at the voxel level in unsmoothed BOLD signal in the current dataset.
Table 1.
Top-12 stimulus images per ROI.
| ROI \ subject | sub-01 | sub-02 | sub-03 | sub-06 |
|---|---|---|---|---|
| FFA | dog_05s, ferret_03s, man_04s, skunk_06s, weasel_02s, warthog_01b, beaver_02s, squirrel_02s, gargoyle_03s, chess_piece_06s, gargoyle_02s, lion_01b | koala_03s, undershirt_03s, gondola_03s, sweatsuit_01s, sloth_06s, boy_03s, lion_06s, cufflink_04s, trap_04s, chalice_04s, possum_03s, poster_04s | groundhog_04s, koala_05s, ferret_05s, tiger_03s, face_03s, face_01b, warthog_04n, weasel_05n, football_helmet_02s, bear_01b, wig_01s, hamster_05s | face_01b, shower_cap_03s, rat_03s, girl_04s, lion_01b, boy_04s, woman_03s, hamster_01b, man_04s, otter_06n, boy_01b, snorkel_05s |
| EBA | hip_02s, sweatsuit_05s, hip_01b, overalls_03s, kangaroo_05s, subway_01b, wheelbarrow_05s, boa_02s, uniform_02s, lawnmower_01b, raft_01b, tuxedo_05s | chin_04s, tuxedo_01b, monkey_02s, koala_06s, leggings_04s, sloth_04s, footbath_06s, undershirt_03s, chest1_04s, handcuff_01b, shower_cap_05s, sweatsuit_01s | hula_hoop_06s, gondola_03s, scarecrow_02s, lawnmower_01b, uniform_03s, clarinet_02n, sloth_03s, clarinet_06s, elephant_03n, uniform_04s, leggings_04s, coat_06s | sweatsuit_05s, hula_hoop_04s, penguin_06s, hula_hoop_03s, coverall_05s, leopard_04s, elephant_03n, sweatsuit_03s, horse_04s, pogo_stick_05s, wolf_03s, pogo_stick_01s |
| PPA | dishwasher_06s, sink_05s, chicken_wire_02s, burner_03s, locker_05s, punching_bag_01b, cassette_03s, anchor_05s, fence_02s, bunkbed_05s, trashcan_06s, birdcage_03s | windowsill_02s, television_04s, stair_01b, towel_rack_04s, stair_05s, scanner_05s, projector_04s, guardrail_06s, anvil_03s, doormat_01b, drawer_02s, tray_04s | guardrail_03s, guillotine_06s, stair_01b, tent_04s, railing_03s, mailbox_03s, windowsill_03s, mosquito_net_01b, stair_06s, windowsill_06s, guardrail_01b, fence_02s | computer_screen_05s, fence_03s, canoe_03s, windowsill_05s, train_03s, hedge_01b, candelabra_05s, bench_03s, stopwatch_01b, sink_04s, rug_04s, shower_03s |
Usage Notes
The CNeuroMod-THINGS dataset is available as a DataLad collection that can be cloned directly from GitHub (https://github.com/courtois-neuromod/cneuromod-things). Alternatively, users can download an archived version of the dataset repository (1.0.1) hosted on Zenodo1. Due to the dataset size, the repository contains many symbolic links to larger data files hosted remotely by the Canadian Open Neuroscience Platform54 (CONP).
Data files can be downloaded selectively, according to the user’s needs, from the CONP servers via symbolic links using DataLad without requiring registered access. The four subjects have requested access to their data and chosen to share them openly via the Data Portal of the CONP as citizen scientists (https://portal.conp.ca/dataset?id=projects/cneuromod). The data are distributed under a liberal Creative Commons (CC0) data license that authorizes the re-sharing of derivatives.
You will need the DataLad53 software (version > 1.0.0; https://www.datalad.org/), a tool for versioning and accessing large data collections organized as git repositories, available for Linux, macOS and Windows. For secure data transfers, we recommend using the SSH protocol: create an SSH key on the machine where the dataset will be installed and add it to your GitHub account.
Method 1: accessing the dataset via GitHub
First, clone the dataset repository onto your local machine:
datalad clone git@github.com:courtois-neuromod/cneuromod-things.git
A warning message will appear because the remote origin does not have git-annex installed; it does not prevent installation. You can now download the repository data. The CNeuroMod-THINGS repository is organised as a nested collection of submodules whose overall structure is described in the root README.md file. When you first clone the repository, submodules appear empty (e.g., cneuromod-things/THINGS/glmsingle). To download a specific data subset, navigate to the submodule whose content you need and pull the files directly from there: run the ‘datalad get’ command once to download the submodule’s symbolic links and the files stored directly on GitHub, then a second time to pull the (larger) files from the remote CONP server.
For example, the commands below download files saved under ‘sub-01/qc’ in the ‘cneuromod-things/THINGS/glmsingle’ submodule.
cd cneuromod-things/THINGS/glmsingle
datalad get *
datalad get -r sub-01/qc/*
While it is technically feasible to pull the entire content of all nested submodules recursively in a single command (with datalad get -r cneuromod-things/*), we strongly recommend against it due to the complexity and depth of the nested repository structure and sheer dataset size.
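For scripted downloads, DataLad also exposes a Python API that mirrors these shell commands; the snippet below is a minimal sketch, assuming DataLad (version > 1.0.0) is installed and SSH access to GitHub is configured, and reuses the same ‘sub-01/qc’ example as above.
# Minimal sketch: selective download with DataLad's Python API.
import datalad.api as dl

# Clone the top-level collection (submodules start out as empty directories).
ds = dl.clone(source="git@github.com:courtois-neuromod/cneuromod-things.git",
              path="cneuromod-things")
# Install the 'THINGS/glmsingle' submodule without pulling annexed data content...
ds.get("THINGS/glmsingle", get_data=False)
# ...then pull sub-01's QC files from the remote CONP server.
ds.get("THINGS/glmsingle/sub-01/qc", recursive=True)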
Method 2: downloading the dataset from Zenodo
From the Zenodo website (https://zenodo.org/records/17881592), download cneuromod-things-1.0.1.tar.gz, an archive of version 1.0.1 of the CNeuroMod-THINGS repository1.
Extract the archive onto your local machine:
tar -xvzf cneuromod-things-1.0.1.tar.gz
Once extracted, the repository mostly contains symbolic links to files hosted remotely. Navigate the repository structure (described in the root README.md file) and use DataLad to pull data files selectively from the remote CONP server.
For example, the commands below download files saved under ‘sub-01/qc’ in the ‘cneuromod-things-1.0.1/THINGS/glmsingle’ submodule.
cd cneuromod-things-1.0.1/THINGS/glmsingle
datalad get -r sub-01/qc/*
You can consult our official documentation42 for additional information on accessing the different CNeuroMod datasets.
Acknowledgements
This work was supported by the Courtois Foundation and an NSERC discovery grant awarded to LB (RGPIN-2025-06022), and by a Max Planck Research Group Grant (M.TN.A.NEPF0009) and an ERC Starting Grant COREDIM (StG-2021-101039712) awarded to MNH.
Author contributions
M.S.L.: Methodology, Software, Data curation, Formal analysis, Visualization, Writing – original draft, review & editing. B.P.: Conceptualization, Project administration, Investigation, Methodology, Software, Data curation, Formal analysis, Visualization. O.C.: Software, Formal analysis, Visualization, Writing – review & editing. E.D.: Software, Formal analysis, Data curation, Writing – review & editing. K.S.: Software, Writing – review & editing. V.B.: Conceptualization. J.B.: Conceptualization, Funding acquisition, Resources, Project administration, Investigation, Data curation, Writing – review & editing. L.B.: Conceptualization, Funding acquisition, Resources, Project administration, Investigation, Data curation, Formal analysis, Visualization, Writing – review & editing. M.N.H.: Conceptualization, Funding acquisition, Supervision, Methodology, Software, Data curation, Formal analysis, Visualization, Writing – original draft, review & editing.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Data availability
The data, documentation and code can be accessed with DataLad by cloning the dataset repository from GitHub (https://github.com/courtois-neuromod/cneuromod-things), or by downloading an archived version (1.0.1) of the repository hosted on Zenodo1. The repository contains a nested set of submodules with symbolic links to larger data files hosted remotely on the Canadian Open Neuroscience Platform54 (CONP) servers. Data files can be pulled with DataLad via symbolic links without requiring registered access. Data are released under a liberal Creative Commons (CC0) license that authorizes the re-sharing of derivatives.
Code availability
BOLD data were acquired during tasks presented with the PsychoPy library33 (https://www.psychopy.org/) and preprocessed with the fMRIPrep pipeline38,39 (versions 20.2.3 and 20.2.5; https://fmriprep.org/en/stable/). All CNeuroMod data acquisition and data preprocessing scripts are available on the CNeuroMod GitHub (https://github.com/courtois-neuromod).
• Data acquisition scripts: https://github.com/courtois-neuromod/task_stimuli
• BOLD data preprocessing scripts: https://github.com/courtois-neuromod/ds_prep
The code used to generate derivatives from the preprocessed CNeuroMod-THINGS data is integrated throughout the https://github.com/courtois-neuromod/cneuromod-things repository and its submodules. This repository includes scripts to:
• extract trial-wise and image-wise beta scores per voxel from preprocessed BOLD data (‘THINGS/glmsingle/code/glmsingle’)
• generate maps of temporal signal-to-noise ratio from preprocessed BOLD data (‘THINGS/tsnr/code’, ‘retinotopy/tsnr/code’, ‘fLoc/tsnr/code’)
• perform t-tests and GLM fixed-effects analyses to assess memory recognition effects on the BOLD data (‘THINGS/glm-memory/code’)
• quantify in-scan head motion (‘THINGS/glmsingle/code/qc’)
• organize trial-wise metrics (stimulus image annotations, task conditions, task accuracy, reaction time, gaze fixation compliance; ‘THINGS/fmriprep/sourcedata/things/code’)
• analyze behavioural performance on the image recognition task (‘THINGS/behaviour/code’)
• process eye-tracking data (‘THINGS/fmriprep/sourcedata/things/code’)
• perform data-driven analyses to characterize stimulus representation in the brain signal (‘THINGS/glmsingle/code/descriptive’)
• derive ROI masks from two functional localizer tasks (‘retinotopy/prf/code’ and ‘fLoc/rois/code’)
• visualize results onto flattened cortical maps of the subjects’ brains (‘anatomical/pycortex’)
The cneuromod-things repository also includes a collection of Jupyter Notebooks with step-by-step instructions and code to pull data files directly from the DataLad collection and reproduce the figures included in the current manuscript. The content of these Notebooks can be viewed directly on GitHub (https://github.com/courtois-neuromod/cneuromod-things/tree/main/datapaper), and provides concrete examples of analyses that can be conducted with the current dataset.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
The online version contains supplementary material available at 10.1038/s41597-026-06591-y.
References
- 1. St-Laurent, M. et al. cneuromod-things. Zenodo 10.5281/ZENODO.17881592 (2025).
- 2. Hebart, M. N. et al. THINGS: A database of 1,854 object concepts and more than 26,000 naturalistic object images. PLoS One 14, e0223792 (2019).
- 3. Stoinski, L. M. & Hebart, M. N. THINGS object concept and object image database. OSF 10.17605/OSF.IO/JUM2F (2019).
- 4. Chang, N. et al. BOLD5000, a public fMRI dataset while viewing 5000 visual images. Sci. Data 6, 49 (2019).
- 5. Chang, N., Pyles, J., Prince, J., Tarr, M. & Aminoff, E. BOLD5000 Collection. Carnegie Mellon University 10.1184/R1/C.5325683 (2021).
- 6. Allen, E. J. et al. A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nat. Neurosci. 25, 116–126 (2022).
- 7. Hebart, M. N. et al. THINGS-data, a multimodal collection of large-scale datasets for investigating object representations in human brain and behavior. eLife 12 (2023).
- 8. Hebart, M. et al. THINGS-data: A multimodal collection of large-scale datasets for investigating object representations in brain and behavior. Figshare+ 10.25452/FIGSHARE.PLUS.C.6161151 (2025).
- 9. Hebart, M. N. et al. THINGS-fMRI. OpenNeuro 10.18112/OPENNEURO.DS004192.V1.0.5 (2022).
- 10. Gong, Z. et al. A large-scale fMRI dataset for the visual processing of naturalistic scenes. Sci. Data 10, 559 (2023).
- 11. Gong, Z. et al. A large-scale fMRI dataset for the visual processing of naturalistic scenes. OpenNeuro 10.18112/openneuro.ds004496.v2.1.2 (2023).
- 12. Grootswagers, T., Zhou, I., Robinson, A. K., Hebart, M. N. & Carlson, T. A. Human EEG recordings for 1,854 concepts presented in rapid serial visual presentation streams. Sci. Data 9, 3 (2022).
- 13. Grootswagers, T., Zhou, I., Robinson, A. & Carlson, T. Human electroencephalography recordings from 50 subjects for 22,248 images from 1,854 object concepts. OpenNeuro 10.18112/OPENNEURO.DS003825.V1.1.0 (2021).
- 14. Grootswagers, T. THINGS-EEG: Human electroencephalography recordings for 22,248 images from 1,854 object concepts. figshare 10.6084/M9.FIGSHARE.14721282 (2021).
- 15. Zheng, C. Y., Pereira, F., Baker, C. I. & Hebart, M. N. Revealing interpretable object representations from human behavior. arXiv [stat.ML] Preprint at http://arxiv.org/abs/1901.02915 (2019).
- 16. Hebart, M. N., Zheng, C. Y., Pereira, F. & Baker, C. I. Revealing the multidimensional mental representations of natural objects underlying human similarity judgements. Nat. Hum. Behav. 4, 1173–1185 (2020).
- 17. Stoinski, L. M., Perkuhn, J. & Hebart, M. N. THINGSplus: New norms and metadata for the THINGS database of 1854 object concepts and 26,107 natural object images. Behav. Res. Methods 10.3758/s13428-023-02110-8 (2023).
- 18. Kramer, M. A., Hebart, M. N., Baker, C. I. & Bainbridge, W. A. The features underlying the memorability of objects. Sci. Adv. 9, eadd2981 (2023).
- 19. Boyle, J. et al. The Courtois NeuroMod project: quality assessment of the initial data release (2020). In 2023 Conference on Cognitive Computational Neuroscience 10.32470/ccn.2023.1602-0 (Cognitive Computational Neuroscience, Oxford, United Kingdom, 2023).
- 20. Thirion, B., Thual, A. & Pinho, A. L. From deep brain phenotyping to functional atlasing. Curr. Opin. Behav. Sci. 40, 201–212 (2021).
- 21. Poldrack, R. A. et al. Long-term neural and physiological phenotyping of a single human. Nat. Commun. 6, 8885 (2015).
- 22. Poldrack, R. Myconnectome. OpenNeuro 10.18112/openneuro.ds000031.v2.0.2 (2023).
- 23. Gordon, E. M. et al. The Midnight Scan Club (MSC) dataset. OpenNeuro 10.18112/openneuro.ds000224.v1.0.4 (2023).
- 24. Pinho, A. L. et al. Individual Brain Charting, a high-resolution fMRI dataset for cognitive mapping. Sci. Data 5, 180105 (2018).
- 25. Pinho, A. L. et al. Individual Brain Charting dataset extension, second release of high-resolution fMRI data for cognitive mapping. Sci. Data 7, 353 (2020).
- 26. Pinho, A. L. et al. Individual Brain Charting dataset extension, third release for movie watching and retinotopy data. Sci. Data 11, 590 (2024).
- 27. Pinho, A. L. G., Hertz-Pannier, L. & Thirion, B. IBC. OpenNeuro 10.18112/openneuro.ds002685.v2.0.0 (2024).
- 28. Gifford, A. T. et al. The Algonauts Project 2025 challenge: How the Human Brain Makes Sense of Multimodal Movies. arXiv [q-bio.NC] 10.48550/ARXIV.2501.00504 (2025).
- 29. Stigliani, A., Weiner, K. S. & Grill-Spector, K. Temporal Processing Capacity in High-Level Visual Cortex Is Domain Specific. J. Neurosci. 35, 12412–12424 (2015).
- 30. Kay, K. N., Winawer, J., Mezer, A. & Wandell, B. A. Compressive spatial summation in human visual cortex. J. Neurophysiol. 110, 481–494 (2013).
- 31. Thaler, L., Schütz, A. C., Goodale, M. A. & Gegenfurtner, K. R. What is the best fixation target? The effect of target shape on stability of fixational eye movements. Vision Res. 76, 31–42 (2013).
- 32. Power, J. D. et al. Customized head molds reduce motion during resting state fMRI scans. Neuroimage 189, 141–149 (2019).
- 33. Peirce, J. et al. PsychoPy2: Experiments in behavior made easy. Behav. Res. Methods 51, 195–203 (2019).
- 34. Kassner, M., Patera, W. & Bulling, A. Pupil: An Open Source Platform for Pervasive Eye Tracking and Mobile Gaze-based Interaction. arXiv [cs.CV] (2014).
- 35. Harel, Y. et al. Open design and validation of a reproducible videogame controller for MRI and MEG. PsyArXiv 10.31234/osf.io/m2x6y (2022).
- 36. Xu, J. et al. Evaluation of slice accelerations using multiband echo planar imaging at 3 T. Neuroimage 83, 991–1001 (2013).
- 37. Glasser, M. F. et al. The Human Connectome Project's neuroimaging approach. Nat. Neurosci. 19, 1175–1187 (2016).
- 38. Esteban, O. et al. fMRIPrep: a robust preprocessing pipeline for functional MRI. Nat. Methods 16, 111–116 (2019).
- 39. Esteban, O. et al. Analysis of task-based functional MRI data preprocessed with fMRIPrep. Nat. Protoc. 15, 2186–2202 (2020).
- 40. Boudreau, M. et al. Longitudinal reproducibility of brain and spinal cord quantitative MRI biomarkers. Imaging Neuroscience 3 (2025).
- 41. Esteban, O., Markiewicz, C. J., Blair, R., Poldrack, R. A. & Gorgolewski, K. J. sMRIPrep: Structural MRI PREProcessing Workflows. 10.5281/ZENODO.15579662 (Zenodo, 2025).
- 42. Courtois Project on Neuronal Modelling. CNeuroMod Documentation Version 82adf004. 10.5281/zenodo.17644207 (2025).
- 43. Esteban, O. et al. Nipy/nipype: 1.10.0. 10.5281/ZENODO.15054182 (Zenodo, 2025).
- 44. Gao, J. S., Huth, A. G., Lescroart, M. D. & Gallant, J. L. Pycortex: an interactive surface visualizer for fMRI. Front. Neuroinform. 9, 23 (2015).
- 45. Prince, J. S. et al. Improving the accuracy of single-trial fMRI response estimates using GLMsingle. eLife 11 (2022).
- 46. Kay, K. N., Rokem, A., Winawer, J., Dougherty, R. F. & Wandell, B. A. GLMdenoise: a fast, automated technique for denoising task-based fMRI data. Front. Neurosci. 7, 247 (2013).
- 47. Kay, K. analyzePRF: stimuli and code for pRF analysis. Accessed on July 14, 2021. http://kendrickkay.net/analyzePRF.
- 48. Benson, N. C. et al. The Human Connectome Project 7 Tesla retinotopy dataset: Description and population receptive field analysis. J. Vis. 18, 23 (2018).
- 49. Benson, N. C. & Winawer, J. Bayesian analysis of retinotopic maps. eLife 7 (2018).
- 50. Nilearn contributors et al. Nilearn. 10.5281/ZENODO.8397156 (Zenodo, 2025).
- 51. Julian, J. B., Fedorenko, E., Webster, J. & Kanwisher, N. An algorithmic method for functionally defining regions of interest in the ventral visual pathway. Neuroimage 60, 2357–2364 (2012).
- 52. Kanwisher, N. GSS. Kanwisher Lab. Accessed on November 15, 2022. https://web.mit.edu/bcs/nklab/GSS.shtml#download.
- 53. Halchenko, Y. O. et al. DataLad: distributed system for joint management of code, data, and their relationship. J. Open Source Softw. 6, 3262 (2021).
- 54. Harding, R. J. et al. The Canadian Open Neuroscience Platform-An open science framework for the neuroscience community. PLoS Comput. Biol. 19, e1011230 (2023).
- 55. Gorgolewski, K. J. et al. The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Sci. Data 3, 160044 (2016).
- 56. Epstein, R. A., Parker, W. E. & Feiler, A. M. Two kinds of FMRI repetition suppression? Evidence for dissociable neural mechanisms. J. Neurophysiol. 99, 2877–2886 (2008).
- 57. Larsson, J. & Smith, A. T. fMRI repetition suppression: neuronal adaptation or stimulus expectation? Cereb. Cortex 22, 567–576 (2012).
- 58. Barron, H. C., Garvert, M. M. & Behrens, T. E. J. Repetition suppression: a means to index neural representations using BOLD? Philos. Trans. R. Soc. Lond. B Biol. Sci. 371 (2016).
- 59. van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).