Abstract
Humans possess an intuitive understanding of the environment's physical properties and dynamics, which allows them to predict the outcomes of physical scenarios and successfully interact with the physical world. This predictive ability is thought to rely on mental simulations and has been shown to involve frontoparietal areas. Here, we investigate whether such mental simulations may be accompanied by visual imagery of the predicted physical scene. We designed an intuitive physical inference task requiring participants to infer the parabolic trajectory of an occluded ball falling in accordance with Newtonian physics. Participants underwent fMRI while (i) performing the physical inference task alternately with a visually matched control task, and (ii) passively observing falling balls depicting the trajectories that had to be inferred during the physical inference task. We found that performing the physical inference task activates early visual areas together with a frontoparietal network when compared with the control task. Using multivariate pattern analysis, we show that these regions contain information specific to the trajectory of the occluded ball (i.e., fall direction), despite the absence of visual inputs. Using a cross-classification approach, we further show that in early visual areas, trajectory-specific activity patterns evoked by the physical inference task resemble those evoked by the passive observation of falling balls. Together, our findings suggest that participants simulated the ball trajectory when solving the task, and that the outcome of these simulations may be represented in the form of the perceivable sensory consequences in early visual areas.
Keywords: decoding, fMRI, intuitive physics, MVPA, physical inference, simulation, visual imagery
We designed an intuitive physical inference task requiring participants to infer the parabolic trajectory of an occluded ball falling in accordance with Newtonian physics. Performing this task activates early visual areas together with a frontoparietal network. These regions contain information specific to the trajectory of the occluded ball despite the absence of visual inputs. Moreover, in early visual areas, the trajectory‐specific fMRI activity patterns evoked by the physical inference task resemble those evoked by the passive observation of falling balls. Together, our findings suggest that participants visually simulated the occluded trajectory.

1. INTRODUCTION
Humans and animals successfully interact with the physics of the world during most everyday tasks. For example, when they see an object fall, slide, or collide, they can predict its trajectory, a process which is essential for catching the object or moving away from it to avoid a collision. Formally, such predictions can be made by using explicit knowledge of Newton's laws of motion. In real-world situations, however, humans can solve these tasks effortlessly. This ability is thought to rely on an 'intuitive physics engine' that approximates physical outcomes by running mental simulations on generative models of the world (Battaglia et al., 2013). This framework has been successful at modelling human behaviour in a range of physics tasks, such as predicting whether and in what direction a tower of bricks is going to fall (Battaglia et al., 2013), inferring the relative masses of objects (Hamrick et al., 2016), estimating the extent of physical support provided by an object (Gerstenberg et al., 2017), making causal judgments about collisions (Gerstenberg et al., 2021), or predicting liquid dynamics (Bates et al., 2015). What remains relatively unexplored is where and how such simulations may occur in the brain.
A set of brain regions comprising the dorsal premotor cortex (PMd), superior parietal lobule (SPL), and supramarginal gyrus (SMG) has been shown to be systematically involved in intuitive physical inference (Fischer et al., 2016). This frontoparietal network was identified through a series of experiments comparing physical to matched nonphysical tasks, such as viewing a depiction of an unstable block tower and making a judgment about its fall direction versus the colours of its blocks. Recently, a functional magnetic resonance imaging (fMRI) study involving the observation of interacting objects of different masses and materials showed that this frontoparietal network contains mass representations that remain stable across variations in tasks and stimuli (Schwettmann et al., 2019). The abstract nature of these representations, as inferred from their task invariance, has been interpreted as evidence of a generalized neural intuitive physics engine that performs mental simulations. Here, we investigate whether such mental simulations are accompanied by visual imagery of the predicted physical scene, which is characterized by the generation of perception‐like images in the absence of visual stimulation (Pearson, 2019). As part of the dorsal visual pathway and together with visual regions, the frontoparietal network is involved in visuospatial processing (for a review, see Kravitz et al., 2011), which underlies most physical inference tasks. Thus, if a physical scene is being simulated visually (and not just abstractly), these regions should contain task‐specific information even if the inferred physical scene is occluded. In addition, there should be some overlap between neural activity evoked by physical inference and perception of the physical scene, similar to the shared neural representations between visual imagery and perception that have been reported in early visual areas (Albers et al., 2013).
To investigate this, we study the neural representations associated with predicting the parabolic trajectory of objects falling in accordance with Newtonian physics when the physical scene is occluded. The absence of physics‐related visual inputs allows us to isolate representations of physical scenarios that are purely internally generated. We designed an intuitive physical inference task in which participants had to estimate the fall time and location of an occluded ball falling in a parabolic arc, from various heights and with various horizontal velocities. Participants were first trained on the task by receiving feedback on their performance. Afterwards, they underwent fMRI while performing the physical inference task alternately with a visually matched control task, which was followed by passive observation of falling balls depicting the trajectories that had to be inferred during the physical inference task. We performed univariate analyses to identify brain regions preferentially involved in physical inference and used multivariate pattern analysis (MVPA) to test whether areas activated by the physical inference task represent trajectory‐specific information even if the true ball trajectory is occluded. Specifically, we tested whether direction‐specific activity patterns evoked by physical inference (i) are encoded in the frontoparietal regions that have previously been linked to physical inference (Fischer et al., 2016), (ii) extend to visual regions that are typically involved in imagery and perception of moving objects (Kamitani & Tong, 2006), and (iii) resemble those evoked by observing falling balls.
2. MATERIALS AND METHODS
2.1. Participants
Twenty healthy volunteers with normal or corrected-to-normal vision participated in this study, which was approved by the Ethics Committee of the Swiss Federal Institute of Technology (EK 2020-N-31; Zurich, Switzerland) and conducted in accordance with the Declaration of Helsinki. This sample size is comparable to or larger than those of previous studies using MVPA to decode physical inference or visual imagery content from fMRI data (Kamitani & Tong, 2006; Pramod et al., 2022; Schwettmann et al., 2019). All participants provided written informed consent before participation and received financial compensation upon completion. One participant had to be excluded from the analyses due to incorrect task execution in the scanner, and one because their behavioural performance following training was an outlier (beyond the group mean ± 2 SD). As a precautionary measure, two additional participants were excluded from the analyses because they reported exclusively using a cognitive strategy (i.e., counting) rather than intuitively solving the task, as revealed by the post-experimental debriefing (see Table S3 for further details). The final analyses included 16 participants (10 females; 6 males; mean age: 28.31 ± 9.26 years). It is worth noting, however, that control analyses including all 18 subjects yielded consistent results (see Figure S1).
2.2. Study procedure
The study consisted of a behavioural training session, followed by an fMRI session happening no more than 7 days later (Figure 1b). Participants received performance‐based feedback during the behavioural but not during the fMRI session. All participants were naïve to the purpose of the experiment throughout both sessions.
FIGURE 1.

Experimental design. (a) Depiction of the intuitive physical inference task. Participants first view a horizontally moving object with a ball attached, coming from either the left or the right side of the screen (left panel). Once the object reaches the centre, the screen gets occluded such that the moving object and falling ball (depicted with white dotted lines on the figure) are hidden. Participants then indicate via button presses (i) when (middle panel) and (ii) where (right panel) they think the ball lands. (b) Study procedure. Behavioural feedback‐based (FB) training session containing three phases (i.e., pre‐FB, FB and post‐FB), followed by an fMRI session happening no more than 7 days later. (c) fMRI experimental task. The top and bottom rows represent the sequence of events in an example physical inference and control trial, respectively.
To study intuitive physical inference in these two sessions, we exposed participants to a 3D physics world created with the Unity3D physics engine (version 2019.2.3; http://unity3d.com). The 3D environment was displayed with a resolution of 1920 × 1080 pixels.
2.2.1. Behavioural session
At the beginning of the behavioural session, participants received computerized instructions in which they were told to ignore friction from the air. To familiarize themselves with gravity in the virtual world, they were shown 16 falling balls. Half of these balls fell straight down from either the upper left (N = 4) or the upper right (N = 4) quadrant of the screen. The other half of the balls fell parabolically, starting from outside of the screen bounds and falling toward the middle, from either the left (N = 4) or the right (N = 4) side. The heights and velocities of these falling balls were randomly chosen but differed from the heights and velocities used in the intuitive physical inference task. Subsequently, participants were familiarized with the intuitive physical inference task and performed four practice trials. A cross was displayed at the centre of the screen and participants were instructed to fixate the cross while doing the task.
The intuitive physical inference task required participants to view an object that moved horizontally either from left to right or from right to left (Figure 1a). The object carried a ball which was suddenly dropped. At the same time, the screen was occluded such that neither the moving object nor the trajectory of the falling ball was visible. The scene followed Newtonian physics, with the ball entering projectile motion as soon as it got dropped. Subsequently, participants had to estimate (i) when the ball would reach the ground (i.e., 'time estimation'), and (ii) where it would land (i.e., 'location estimation'). Participants' time and location estimations were indicated via button presses. After the occlusion, a button had to be pressed at the moment the ball was judged to reach the ground. This was followed by the appearance of a basket at the bottom of the screen, which had to be moved to the location at which the ball was judged to land. Importantly, the height and velocity of the object carrying the ball varied across trials. Three sets of nine trials were generated by combining 3 heights × 3 velocities (set1: [44, 61, 78 m] × [1.3, 1.7, 2.1 m/s]; set2: [49, 59, 71 m] × [1.1, 1.5, 1.9 m/s]; set3: [54, 66, 76 m] × [1.2, 1.6, 2 m/s]). This resulted in trials representing varying true fall times (2.99–3.99 s) and locations (3.48–8.37 m from the drop location). Additionally, the object could move either toward the left or the right side of the screen, leading to a total of n = 54 trials (9 height/speed combinations × 3 sets × 2 directions). We used a large number of different trials to prevent participants from learning a simple heuristic to solve the task.
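For illustration, the minimal Python sketch below derives the true fall times and landing distances from the stated heights and velocities under ideal, frictionless projectile motion (t = sqrt(2h/g), x = v·t). This is an editorial reconstruction under these assumptions, not the Unity code used to generate the stimuli; g = 9.81 m/s² is assumed.

```python
import itertools
import math

G = 9.81  # standard gravity in m/s^2 (assumed; also Unity's default)

sets = {
    "set1": ([44, 61, 78], [1.3, 1.7, 2.1]),
    "set2": ([49, 59, 71], [1.1, 1.5, 1.9]),
    "set3": ([54, 66, 76], [1.2, 1.6, 2.0]),
}

trials = []
for set_name, (heights, velocities) in sets.items():
    for h, v in itertools.product(heights, velocities):
        t = math.sqrt(2 * h / G)  # fall time of a vertical drop from rest
        x = v * t                 # horizontal distance from the drop location
        for direction in ("left", "right"):
            trials.append((set_name, h, v, direction, t, x))

print(len(trials))                                               # 54 trials
print(min(tr[4] for tr in trials), max(tr[4] for tr in trials))  # ~2.99 to ~3.99 s
print(min(tr[5] for tr in trials), max(tr[5] for tr in trials))  # ~3.48 to ~8.37 m
```

Under these assumptions, the computed ranges reproduce the reported true fall times (2.99–3.99 s) and landing locations (3.48–8.37 m).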
The behavioural session consisted of three phases: (i) pre-feedback (pre-FB), (ii) training with feedback (FB), and (iii) post-feedback (post-FB), each containing the same 54 trials but presented in a different randomized order. During the pre-FB and post-FB phases, participants performed the intuitive physical inference task without receiving any feedback regarding time or location estimation. Only during the training phase did participants receive FB on their time and location estimations. Immediately after the fall time estimation, verbal feedback was displayed for 6 s, indicating whether the estimation was accurate (i.e., 'well done') or whether the ball fell faster or slower than estimated (i.e., 'a bit faster/slower' or 'a lot faster/slower'). The feedback 'well done' corresponded to a time estimation error <5%, 'a bit slower/faster' to an error of 5%–25%, and 'a lot slower/faster' to an error >25% of the true fall time. Once the location had been estimated, the true location was shown by the appearance of a red rectangle on the ground. Subsequently, participants could start the next trial via a button press.
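The feedback rule can be summarized in a short sketch. The category boundaries follow the percentages given above; the sign convention (a press after the true landing meaning the ball fell 'faster' than estimated) is our assumption.

```python
def time_feedback(true_t: float, estimated_t: float) -> str:
    """Map a fall time estimation to the verbal feedback categories above (sketch)."""
    rel_error = abs(estimated_t - true_t) / true_t
    if rel_error < 0.05:
        return "well done"
    # Assumption: a late press means the ball fell faster than estimated
    direction = "faster" if estimated_t > true_t else "slower"
    qualifier = "a bit" if rel_error <= 0.25 else "a lot"
    return f"{qualifier} {direction}"
```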
At the end of the behavioural session, participants received instructions for the fMRI session and performed two practice blocks of the fMRI experimental task.
2.2.2. fMRI session
The physical inference task performed in the scanner was nearly identical to the task of the behavioural session (Figure 1c). It was composed of 18 trials generated from the parameters of set1. Participants had to indicate the fall time estimation with a button press but received no FB. In contrast to the behavioural session, participants were only prompted to indicate the location estimation in 3 randomly chosen catch trials out of those 18.
In addition to the physical inference task, participants performed a control task on the same visual stimuli (i.e., on 18 trials generated from the same height and velocity parameters). During the control condition, instead of pressing a button to indicate fall time estimation, participants had to press a button as soon as the colour of the fixation cross changed. The timings of the colour changes were randomly drawn from a distribution ranging from the minimum (i.e., 3 s) to the maximum (i.e., 4 s) true fall times ±500 ms. Each trial was followed by a 3 s rest period.
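A one-line sketch of the colour-change timing, assuming a uniform draw over the stated window (the 3–4 s range of true fall times extended by ±500 ms, i.e., 2.5–4.5 s; the exact distribution shape is not specified above):

```python
import random

def draw_colour_change_time() -> float:
    """Colour-change latency after occlusion onset, in seconds (assumed uniform)."""
    return random.uniform(3.0 - 0.5, 4.0 + 0.5)
```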
The fMRI experimental task comprised 6 runs, each containing the same 18 physical inference and 18 control trials but differently pseudo‐randomized. Within each run, trials were presented in 3 blocks of 6 physical inference trials, and 3 blocks of 6 control trials. Each block started with a word cue indicating the task to be performed: ‘ball’ (i.e., physical inference task) or ‘cross’ (i.e., control task). The blocks were alternated within each run, with half of the runs starting with the physical inference and the other half with the control condition. One run lasted 10.42 min.
At the end of the fMRI session, an 8-min-long perception task was performed during which participants passively observed balls falling parabolically. This task served two purposes: (i) localizing the regions involved in the perception of falling objects, and (ii) providing non-occluded trials whose ball trajectories corresponded to those of the physical inference trials. There were 18 unique perception trials, again generated from the parameters of set1. Instead of being dropped from a moving object, the balls fell from the middle of the screen but from the respective heights and with the corresponding projectile motion velocities. These 18 trials were repeated three times, resulting in a total of 54 trials that were presented in a randomized order. Each trial was followed by a 3 s rest period. Participants were instructed to always fixate the cross at the centre of the screen during both the experimental and the perception fMRI tasks.
In a post‐fMRI debriefing questionnaire, participants were asked whether they ‘imagined the falling ball (i.e., saw it in their mind's eye) during the experiment’ (yes/no). They were also asked whether they ‘used any other strategy’ (yes/no) and if so, to describe it in a few words.
2.3. Behavioural data analyses
Processing of behavioural data was conducted in Matlab (version 9.9; The Mathworks Inc, Natick, MA) and statistical analyses in SPSS (version 26; SPSS Inc., Chicago, IL). To quantify performance, we computed fall time errors as the difference between true and estimated fall times, and location errors as the difference between true and estimated landing locations. For each participant, absolute time and location errors were computed for each trial and then averaged across all trials within each phase of the behavioural session (i.e., pre‐FB, FB, post‐FB), and across all trials of the physical inference condition in the fMRI experiment. Mean absolute time errors were also computed for each fMRI run separately.
The normality of the data was assessed by means of the Shapiro–Wilk test. Since the data did not significantly deviate from a normal distribution, repeated measures analyses of variance (rmANOVA) were used to analyse behavioural data. To test for time and location error differences over time, two rmANOVAs were performed with the factor phase (i.e., pre-FB, FB, post-FB, and fMRI). To test for time error differences between the runs of the fMRI experiment, a rmANOVA with the factor run was performed. Sphericity was assessed using Mauchly's sphericity test and violations of the sphericity assumption were accounted for with the Greenhouse–Geisser correction. In case of significant ANOVA results, post hoc paired t-tests were performed and a Bonferroni correction for multiple comparisons was applied. The significance level of all statistical tests was set at p < .05. In case of non-significant results of interest, we conducted the equivalent Bayesian post-hoc test and assessed the evidence in favour of the null hypothesis by means of the Bayes factor (BF10), using the JASP software (version 0.16.2, https://jasp-stats.org/).
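For readers who prefer an open-source pipeline, the phase analysis could be reproduced along the following lines. This is a sketch using statsmodels rather than SPSS (which was actually used); note that statsmodels' AnovaRM does not apply a Greenhouse–Geisser correction, so corrected results may differ slightly. The data frame `errors` is an assumed long-format table with one mean absolute error per participant and phase.

```python
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

# errors: long-format DataFrame with columns 'subject',
# 'phase' ('pre-FB', 'FB', 'post-FB', 'fMRI'), and 'abs_time_error'
def analyse_time_errors(errors: pd.DataFrame) -> None:
    # Shapiro-Wilk normality check within each phase
    for phase, grp in errors.groupby("phase"):
        w, p = stats.shapiro(grp["abs_time_error"])
        print(f"{phase}: W = {w:.3f}, p = {p:.3f}")
    # One-way repeated measures ANOVA with within-subject factor 'phase'
    print(AnovaRM(errors, depvar="abs_time_error",
                  subject="subject", within=["phase"]).fit())
```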
2.4. MRI data acquisition
MRI data were acquired using a 3 tesla Philips Ingenia system with a 32-channel head coil. Anatomical images were acquired in 160 sagittal slices using a T1-weighted sequence with the following parameters: TR/TE = 8.3/3.9 ms, voxel size = 1 × 1 × 1 mm, matrix size = 240 × 240, flip angle = 8°, FOV = 240 mm (AP) × 240 mm (RL) × 160 mm (FH). Functional images were acquired in 40 interleaved transversal slices using a whole-brain echo-planar imaging (EPI) sequence with the following parameters: TR/TE = 2500/35 ms, voxel size = 2.75 × 2.75 × 3.3 mm, matrix size = 80 × 78, flip angle = 82°, FOV = 220 mm (AP) × 220 mm (RL) × 132 mm (FH). We acquired 250 volumes per run of the experimental task, and 180 volumes during the perception task.
2.5. fMRI data pre‐processing
MRI data were pre-processed using FSL's Expert Analysis Tool (FEAT, version 6.0; https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FEAT). The first four volumes were discarded to account for T1 saturation effects. The following pre-processing steps were applied to each run: brain extraction using the automated Brain Extraction Tool (BET; Smith, 2002), motion correction using the Motion Correction Linear Image Registration Tool (MCFLIRT; Jenkinson et al., 2002), high-pass filtering using a cut-off of 100 s, and spatial smoothing using a Gaussian kernel of 5 mm full-width-at-half-maximum (FWHM). Each functional run was assessed for excessive motion and excluded from the analyses if the absolute mean displacement was greater than half the voxel size (i.e., 1.4 mm); for two participants, one of the six runs had to be excluded. Functional images were aligned to structural images using boundary-based registration (Greve & Fischl, 2009). Structural images were aligned to the 2 mm Montreal Neurological Institute (MNI-152) standard space using nonlinear registration (FNIRT), and the resulting warp fields were applied to the functional images.
2.6. Univariate fMRI analysis
Univariate fMRI analyses were conducted using FSL's Expert Analysis Tool. For the experimental task, a first-level general linear model (GLM) was computed for each run and averaged across runs for each participant separately using a fixed-effects analysis. The design matrix included two regressors of interest: a 'physical inference' regressor modelling the period between the start of the occlusion and the button press (indicating the estimated landing time of the ball) minus 500 ms to account for motor preparation, and a 'control' regressor modelling the period between the start of the occlusion and the colour change. Additionally, there were five regressors of no interest, modelling the periods of the (i) instructions (ball or cross), (ii) horizontally moving object, (iii) button presses (including 500 ms of motor preparation in the physical inference condition, and the time between the colour change and button press in the control condition), (iv) the occlusion period of missed trials in which there were no button presses, and (v) location estimation in the catch trials. All these conditions were modelled using an event-related design. Six motion parameters (i.e., rotations and translations along the x, y, and z-axes), as well as white matter (WM) and cerebrospinal fluid (CSF) time-series, were added as nuisance regressors in the GLM. To further reduce motion artefacts, volumes with an absolute mean displacement greater than half the voxel size were scrubbed. All regressors were convolved with a double gamma hemodynamic response function (HRF) and its first temporal derivative, except for the nuisance regressors, which were not convolved.
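The design logic can be illustrated with nilearn's GLM utilities (a sketch; the study itself used FSL FEAT). Onsets, durations, and confound values below are placeholders, and nilearn's Glover HRF stands in for FSL's double-gamma HRF.

```python
import numpy as np
import pandas as pd
from nilearn.glm.first_level import make_first_level_design_matrix

TR = 2.5                      # repetition time in s
n_scans = 246                 # 250 acquired volumes minus the 4 discarded ones
frame_times = np.arange(n_scans) * TR

# Event-related regressors; one row per modelled period (placeholder timings)
events = pd.DataFrame({
    "onset":      [0.0, 2.5, 7.5],
    "duration":   [2.5, 5.0, 3.5],
    "trial_type": ["instruction", "moving_object", "physical_inference"],
})

# Nuisance regressors: 6 motion parameters plus WM and CSF time series
# (added unconvolved, as in the analysis described above; zeros as placeholders)
confounds = pd.DataFrame(np.zeros((n_scans, 8)),
                         columns=[f"motion_{i}" for i in range(6)] + ["WM", "CSF"])

design = make_first_level_design_matrix(
    frame_times, events,
    hrf_model="glover + derivative",  # HRF plus its first temporal derivative
    drift_model=None,                 # high-pass filtering already applied in pre-processing
    add_regs=confounds,
)
```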
To isolate brain regions engaged in intuitive physical inference, a physical inference > control contrast was defined. Note that these conditions only differ due to the task instructions while the visual input is identical, that is, the moving object is occluded and only the fixation cross is visible. Contrast images for each participant and GLM were then entered into a mixed effects higher‐level analysis. The group z‐statistic images were thresholded at Z > 3.1 and corrected for family‐wise‐error (FWE) using a cluster significance level of p FWE < .05.
For the perception task where participants passively observed falling balls, the first‐level GLM included a ‘falling ball’ regressor of interest and the same motion and nuisance regressors as described above. To isolate brain regions engaged in observing falling objects, a perception > rest contrast was defined at the individual subject level, entered into a mixed effects higher‐level analysis, and thresholded at Z > 3.1, with p FWE < .05 at the cluster level.
2.7. Definition of regions of interest for MVPA
ROIs were defined on the basis of anatomical and functional criteria, that is, by intersecting predefined masks with relevant functional activations detected in our sample. Four masks, covering the regions that were hypothesized to be involved in physical inference of falling objects, were created using the Jülich Histological Atlas (Eickhoff et al., 2007), the Harvard–Oxford cortical structural atlas (Desikan et al., 2006), and the Human Motor Area Template (HMAT; Mayka et al., 2006). These masks were defined as follows: (a) bilateral early visual areas combining V1, V2, and V3, (b) bilateral MT/V5, (c) bilateral parietal regions combining SPL and SMG, and (d) bilateral PMd. To create the ROIs, we intersected these masks with the group‐level activation maps revealed by the physical inference > control contrast except for the bilateral MT/V5 mask which was intersected with the perception > rest contrast. All contrasts were thresholded at Z > 3.1 and cluster‐level FWE‐corrected (p FWE < .05).
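In code, this intersection amounts to a conjunction of binary volumes. The nibabel sketch below uses hypothetical file names and assumes both images are in the same (MNI) space, with the thresholded z-map set to zero outside surviving clusters.

```python
import nibabel as nib
import numpy as np

mask_img = nib.load("early_visual_mask.nii.gz")                  # anatomical mask (hypothetical name)
zstat_img = nib.load("physical_gt_control_thresh_zstat.nii.gz")  # cluster-corrected map (hypothetical name)

# Keep voxels that lie inside the mask AND survive the corrected threshold
roi_data = (mask_img.get_fdata() > 0) & (zstat_img.get_fdata() > 0)
nib.save(nib.Nifti1Image(roi_data.astype(np.uint8), zstat_img.affine),
         "early_visual_roi.nii.gz")
```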
Additionally, we defined four exploratory group ROIs corresponding to activity clusters that were found in our univariate analyses by contrasting physical inference > control but were not part of our predefined masks (see Section 3.2 for details). These were located in the ventral premotor cortex, paracingulate gyrus/supplementary motor area, middle frontal gyrus, and cerebellum. This yielded a total of eight group‐level ROIs (see Section 3.2 for details). Each ROI was nonlinearly transformed to each subject's native space. A conjunction between a grey matter mask from the Harvard–Oxford subcortical structural atlas (Desikan et al., 2006) and each cortical ROI was then computed in each subject's native space, to discard any voxel that may extend into WM or CSF.
2.8. Multivariate pattern analyses
Beta images were generated using statistical parametric mapping (SPM12; http://www.fil.ion.ucl.ac.uk/spm/), and multivariate pattern analyses (MVPA) were performed with the scikit‐learn python library (Pedregosa et al., 2011).
To investigate whether our ROIs contain representations of the occluded balls' fall direction during the physical inference task, we performed a first decoding analysis in which we trained a linear support vector machine (SVM) to predict which side the ball was falling toward (i.e., left vs. right) based on the respective period of interest. To investigate whether those representations are similar to those evoked by the perception of falling balls, we performed a second decoding analysis in which we trained a linear SVM on perception trials where balls were falling to the left versus right and tested it on physical inference trials. We used L2-regularized SVMs with a default regularization parameter C of 1. Note that the left versus right comparison is mathematically orthogonal to the contrasts that were used for functionally defining our ROIs (see Section 2.7), which ensures independence for subsequent statistical analyses (Kriegeskorte et al., 2009).
Single‐trial beta images for the physical inference and control conditions were computed using an HRF‐based first‐level GLM. The GLM's design matrix included one regressor of interest for each individual trial of the physical inference and control conditions, and five regressors of no interest corresponding to the periods of instructions, horizontally moving objects, button presses, missed trials, and location estimation. This resulted in 108 parameter‐estimate images (18 trials × 6 runs) per condition and participant. Within each condition, there were two types of trials: ‘left’ and ‘right’, corresponding to the motion direction of the previously shown horizontally moving object (i.e., the fall direction of the occluded ball in physical inference trials). Thus, there were 54 ‘left’ and 54 ‘right’ parameter estimate images per condition and participant. These beta images were used for both decoding analyses described below.
2.8.1. Decoding analysis 1—Fall direction
The first decoding analysis was carried out using a leave-one-run-out cross-validation approach in which an SVM was trained on trials from five runs and tested on trials from the left-out run. This process was iterated over each left-out run and resulted in one mean classification accuracy per ROI and participant. The significance of each classification accuracy was determined by generating a null distribution based on 1000 random permutations of the trial labels (i.e., 'left' and 'right'). An empirical p-value was then computed as the number of permutation-based classification accuracies that were greater than or equal to the true classification accuracy, plus one, divided by the number of permutations plus one. To determine statistical significance at the group level, the 16 participants' empirical p-values were combined for each ROI using Fisher's method (Fisher, 1925). This analysis was performed separately for each condition (i.e., physical inference and control) and the resulting p-values were corrected for multiple comparisons using a Bonferroni correction accounting for the 16 tests (i.e., 8 ROIs × 2 conditions). To compare the group classification accuracies between the physical inference and control conditions in each ROI, the normality of the data was assessed by means of the Shapiro–Wilk test, after which paired t-tests were used and a Bonferroni correction accounting for the eight tests was applied.
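The following sketch condenses this pipeline with scikit-learn. It assumes the single-trial beta patterns of one ROI have already been loaded into `X` (trials × voxels), with labels `y` ('left'/'right') and run indices `runs`; variable names and the loading step are our own.

```python
import numpy as np
from scipy import stats
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

def decode_fall_direction(X, y, runs, n_perm=1000, seed=0):
    """Leave-one-run-out decoding of 'left' vs. 'right' with an empirical p-value."""
    clf = SVC(kernel="linear", C=1.0)   # L2-regularised linear SVM, C = 1
    cv = LeaveOneGroupOut()             # leave-one-run-out cross-validation
    true_acc = cross_val_score(clf, X, y, groups=runs, cv=cv).mean()

    # Null distribution: re-run the full cross-validation on permuted labels
    rng = np.random.default_rng(seed)
    null_acc = np.array([
        cross_val_score(clf, X, rng.permutation(y), groups=runs, cv=cv).mean()
        for _ in range(n_perm)
    ])
    p_emp = (np.sum(null_acc >= true_acc) + 1) / (n_perm + 1)
    return true_acc, p_emp

# Group-level inference: combine per-participant p-values with Fisher's method
# stat, p_group = stats.combine_pvalues(per_participant_ps, method="fisher")
```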
Control analysis
For each ROI that yielded significant differences, a control classification analysis was performed on the period of the horizontally moving object, to account for potential condition‐dependent confounds associated with previously seeing a moving object. We were interested in whether such differences in classification accuracy were specific to the occluded phase. Therefore, we repeated the same classification analysis but for the period where the visible object moved either to the left or to the right (i.e., just before the occlusion) and calculated the classification accuracy differences between the physical inference and the control condition. Finally, we performed Bonferroni‐corrected repeated measures analyses of covariance (rmANCOVAs) to test whether condition‐specific classification differences during the occlusion remained significant after removing the variance explained by potential condition‐specific classification differences during the period of the horizontally moving object.
Follow‐up exploratory analysis
As our early visual and parietal ROIs covered multiple regions, we conducted a follow-up exploratory analysis to investigate the relative contribution of individual regions to the results obtained in our initial analysis. We used the same classification pipeline but on distinct early visual (i.e., V1, V2, and V3) and parietal (i.e., SPL and SMG) ROIs. For each participant, we separated early visual and parietal activity into distinct ROIs using a 'winner-takes-all' approach, which uses the weights of the probabilistic anatomical masks to assign overlapping voxels to a single ROI. We compared group classification accuracies between the physical inference and control condition using paired t-tests, and applied a Bonferroni correction accounting for the three tests conducted on the visual ROIs and the two tests conducted on the parietal ROIs. To assess differences in classification accuracies across individual early visual and parietal areas during physical inference specifically, we conducted a rmANOVA to compare mean accuracies between the three visual ROIs and a paired t-test to compare mean accuracies between the two parietal ROIs.
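The winner-takes-all assignment can be sketched as follows, assuming the probabilistic maps of V1, V2, and V3 are stacked in a 4D image (file name hypothetical):

```python
import nibabel as nib
import numpy as np

prob = nib.load("v1_v2_v3_prob_maps.nii.gz").get_fdata()  # assumed shape: (x, y, z, 3)
covered = prob.max(axis=-1) > 0    # voxels covered by at least one probability map
winner = prob.argmax(axis=-1)      # region with the highest probability per voxel

v1_mask = covered & (winner == 0)
v2_mask = covered & (winner == 1)
v3_mask = covered & (winner == 2)
```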
2.8.2. Decoding analysis 2—Shared representations
For the second decoding analysis, additional beta images were generated. Single‐trial beta images for the perception task were computed using an HRF‐based first‐level GLM, with one regressor modelling each individual trial. This resulted in 54 parameter‐estimate images per participant, corresponding to 27 ‘left’ and 27 ‘right’ perception trials representing the direction toward which the ball was falling.
The training set of the second decoding analysis consisted of the beta images of the perception task, during which participants passively observed falling balls. A linear SVM was fitted to these perception trials and used to predict the 'left' versus 'right' physical inference trials of each experimental run. The classification accuracies were averaged over the six runs, resulting in one mean classification accuracy per ROI and participant. A procedure similar to that of the first decoding analysis of fall direction (see Section 2.8.1) was used to determine the statistical significance of the classification accuracies, except that only the test set labels were permuted 1000 times. The same formula was then used to compute the empirical p-values and Fisher's method was used to perform group-level statistical inference. The same analysis steps were repeated for the trials of the control condition and a Bonferroni correction accounting for the 16 tests was applied. Condition differences in classification accuracies were again tested for each ROI using paired t-tests, applying a Bonferroni correction to account for the eight tests.
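A condensed sketch of this cross-classification scheme, with the same assumed inputs as in Section 2.8.1. Because the fitted classifier is fixed, permuting the test labels against its fixed predictions is equivalent to re-testing on permuted labels:

```python
import numpy as np
from sklearn.svm import SVC

def cross_classify(X_percept, y_percept, X_inference, y_inference,
                   n_perm=1000, seed=0):
    """Train on perception trials, test on physical inference trials."""
    clf = SVC(kernel="linear", C=1.0).fit(X_percept, y_percept)
    pred = clf.predict(X_inference)
    true_acc = np.mean(pred == y_inference)

    # Null distribution: permute only the test set labels
    rng = np.random.default_rng(seed)
    null_acc = np.array([np.mean(pred == rng.permutation(y_inference))
                         for _ in range(n_perm)])
    p_emp = (np.sum(null_acc >= true_acc) + 1) / (n_perm + 1)
    return true_acc, p_emp
```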
2.9. Eye tracking data acquisition
To assess whether decoding of fall direction could be driven by eye movements, we ran a control experiment using eye tracking. A new cohort of six participants (6 females; mean age: 26.33 ± 4.13 years) underwent the same testing procedure as in the original study, except that the 'fMRI experimental task' (outlined in Section 2.2.2) was performed in a laboratory setup instead of the scanner. During the second session, eye movement data were recorded with the Tobii Pro Nano eye tracker (Tobii Technology AG, Stockholm, Sweden) sampling at 60 Hz, while participants rested their head on a fixed chin rest placed 70 cm away from the screen. The eye tracker was calibrated at the beginning of the session and in-between runs using a 5-point calibration.
2.10. Eye tracking data analysis
Following eye-blink removal, we computed the mean eye positions along the x- and y-axis in degrees of visual angle relative to the centre of the screen, as well as the number, durations, amplitudes, and directions of saccades during 'left' and 'right' trials of the physical inference and control conditions. Saccades were detected using a velocity-based detection algorithm with a threshold of 30 degrees/second (Olsen & Matos, 2012). To compare the mean values of the six eye movement metrics listed above between the physical inference and control conditions, and between the 'left' and 'right' trials of each condition, we conducted paired samples t-tests. Although the sample size was relatively small, we chose to use parametric statistics to increase the sensitivity of our analysis. This was done separately for both the occluded period and the preceding period of the moving object.
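A simplified version of such a velocity-based detector is sketched below. This is our simplification of the I-VT logic described by Olsen and Matos (2012); the window-based velocity estimation and noise handling of the Tobii filter are omitted.

```python
import numpy as np

def detect_saccades(x_deg, y_deg, fs=60.0, threshold=30.0):
    """Flag samples whose angular velocity exceeds `threshold` (deg/s).

    x_deg, y_deg: gaze position in degrees of visual angle, sampled at fs Hz.
    Returns a boolean saccade mask (length n-1) and the saccade onset indices.
    """
    # Sample-to-sample angular velocity: displacement divided by the sampling interval
    velocity = np.hypot(np.diff(x_deg), np.diff(y_deg)) * fs
    is_saccade = velocity > threshold
    # Saccade onsets = transitions from below- to above-threshold velocity
    onsets = np.flatnonzero(np.diff(is_saccade.astype(int)) == 1) + 1
    if is_saccade.size and is_saccade[0]:
        onsets = np.insert(onsets, 0, 0)
    return is_saccade, onsets
```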
3. RESULTS
3.1. Performance improves after training
Participants significantly improved in the physical inference task during the training phase where performance FB was provided. Mean absolute errors for time estimation (Figure 2a) and for location estimation (Figure 2b) decreased substantially from the pre-FB to the FB, post-FB, and fMRI phase. This decrease, indicating performance improvement, is reflected in a significant main effect of phase on both time (F (1.11,16.58) = 23.146, p < .001, Greenhouse–Geisser corrected, ηp² = .61) and location (F (1.22,18.23) = 19.917, p < .001, Greenhouse–Geisser corrected, ηp² = .57) estimation errors. Post-hoc pairwise comparisons further revealed that both error measures were significantly higher in the pre-FB phase than in the FB, post-FB and fMRI phases (t (15) ≥ 3.31, p ≤ .05, g ≥ 0.78), indicating that both time and location estimation performance improved during feedback-based training. Time estimation errors during the fMRI experiment were similar to those of the post-FB phase of the behavioural session. Accordingly, a t-test revealed no significant difference (t (15) = −.578, p > .999 after Bonferroni correction) and the corresponding Bayesian t-test provided moderate evidence for the null hypothesis (BF10 = 0.296), suggesting that training-induced performance improvements in time estimation remained consistent for up to 7 days (Figure 2a). Location estimation errors during the fMRI experiment were higher than during the post-FB phase (t (15) = −4.724, p < .01), but lower than during the pre-FB phase (t (15) = 3.31, p < .05), showing that although training-induced performance improvements in location estimation decreased over time, performance remained significantly higher up to 7 days following training compared with before training (Figure 2b). The decrease in location estimation performance in the scanner compared with the post-FB phase of the behavioural session may be due to the slightly different setup and/or the lower number of trials containing location estimation in the scanner.
FIGURE 2.

Behavioural and univariate results. Asterisks represent significant differences with *p < .05; **p < .01 after Bonferroni correction. (a) Mean absolute time errors and 95% CI for each phase. (b) Mean absolute location errors and 95% CI for each phase. (c) Mean absolute time errors and 95% CI for each run of the experimental fMRI task. (d) Group random effects activation map for the physical inference > control contrast, thresholded at Z > 3.1 and FWE-corrected using a cluster significance level of p FWE < .05. The visualization was created by projecting the statistical map onto a Connectome Workbench cortical surface (Marcus et al., 2011). (e) Group ROIs in MNI space: pink = early visual areas, dark blue = MT/V5, cyan = parietal, light green = PMd, yellow = PMv, dark green = paraCG/SMA, red = MFG.
Time estimation errors were similar across the six runs of the fMRI experiment and did not differ significantly (F (5,75) = .265, p = .897, ηp² = .017). The corresponding Bayesian rmANOVA provided strong evidence in favour of the null hypothesis (BF10 = 0.047), indicating stable performance across the experiment (Figure 2c).
3.2. Physical inference of falling objects activates occipital, parietal and frontal regions
To identify brain regions preferentially involved in physical inference, a univariate mixed effects analysis was performed. As shown in Figure 2d, the contrast between the physical inference and control conditions revealed significant bilateral activations in early visual cortex (i.e., peak activity in V2, activations extending to V1 and V3), superior parietal lobule (SPL), supramarginal gyrus (SMG), dorsal premotor cortex (PMd), paracingulate gyrus extending to the supplementary motor area (paraCG/SMA), and middle frontal gyrus (MFG), as well as left activations in the ventral premotor cortex (PMv) and cerebellum (for further details see Figure S2 and Table S1). The univariate mixed effects analysis of the perception task revealed significant activations in bilateral MT/V5, right SPL, and right SMG during perception of falling balls (see Figure S3 and Table S2).
These univariate results were used to define the ROIs for the MVPA analyses (Figure 2e). We used the predefined masks to constrain the clusters from the physical inference > control contrast and ensure that voxels located outside of our ROIs were not included, resulting in the following three group-ROIs: 'early visual' (pink), 'parietal' (cyan), and 'PMd' (light green). The hypothesized motion-responsive 'MT/V5' ROI was obtained by intersecting the anatomical mask with the activations during the perception task. Additionally, we defined four exploratory group-ROIs corresponding to the remaining clusters of activity: 'PMv', 'paraCG/SMA', 'MFG', and 'cerebellum'. Finally, to obtain symmetrical bilateral ROIs, unilateral clusters were mirrored to the opposite hemisphere and merged with the original cluster, and the same mirroring procedure was applied to each unilateral portion of the bilateral clusters.
3.3. Fall direction can be decoded in the absence of visual stimuli in visual, parietal and premotor regions
Next, we investigated whether the fall direction of the occluded ball can be decoded from fMRI activity patterns during the physical inference task, when visual information was occluded (Figure 3a). Our results show that during physical inference and in the absence of corresponding visual stimuli, fall direction can successfully be decoded from parietal regions (i.e., SPL and SMG, mean accuracy = 0.68, p < .001), from PMd (mean accuracy = 0.58, p < .001), from early visual areas (mean accuracy = 0.81, p < .001), and from motion‐responsive (MT/V5) areas (mean accuracy = 0.63, p < .001; Figure 3a). The decoding accuracies did not reach significance in any of the exploratory ROIs (mean accuracy ≤ 0.53, p ≥ .26).
FIGURE 3.

MVPA results. Asterisks represent significant differences with *p < .05; **p < .01; ***p < .001 after Bonferroni correction. (a) Mean and 95% CI of the classification accuracies for the classifier trained and tested on physical inference trials (dark green) and control trials (light green) in each ROI. Grey dots represent single‐subject mean classification accuracies. (b) Mean and 95% CI of the classification accuracies for the classifier trained on perception trials and tested on physical inference trials (dark pink) or control trials (light pink) in each ROI. Grey dots represent single‐subject mean classification accuracies. (c) Mean and 95% CI of the classification accuracies that were significantly higher for classifying physical inference trials versus control trials.
In visual regions, fall directions could also be successfully decoded during the control condition, with a mean accuracy of 0.64 (p < .001) in early visual areas and of 0.59 (p < .001) in MT/V5 (Figure 3a). However, decoding accuracies were significantly lower during the control than the physical inference condition in early visual areas (t (15) = 7.28, p < .001), parietal regions (t (15) = 4.47, p < .01), and the PMd (t (15) = 3.28, p < .05) (Figure 3c). Regarding MT/V5, the mean decoding accuracy was also higher during the physical inference than the control condition, but this difference did not reach statistical significance.
We found similar results when condition differences were assessed using a rmANCOVA with decoding accuracies during the preceding period of the horizontally moving object (see S4 of the Supporting Information) modelled as a covariate: decoding accuracies during the occluded period remained significantly higher during the physical inference than the control condition in early visual (F (1,29) = 18.32, p < .001), parietal (F (1,29) = 15.14, p < .01), and PMd (F (1,29) = 8.08, p < .05) regions.
The follow-up decoding analysis revealed that during physical inference, fall direction could be decoded above chance level from all three visual (mean accuracy ≥ 0.70, p < .001) and both parietal (mean accuracy ≥ 0.56, p < .01) regions. Decoding accuracies were significantly higher during the physical inference than the control condition in V1 (t (15) = 6.15, p < .001), V2 (t (15) = 7.49, p < .001), and SPL (t (15) = 4.6, p < .05) (Figure 4a,b).
FIGURE 4.

MVPA results of the follow‐up analysis. Asterisks represent significant differences with *p < .05; **p < .01; ***p < .001 after Bonferroni correction. (a,b) Mean and 95% CI of the classification accuracies for the classifier trained and tested on physical inference trials (dark green) and control trials (light green) for each early visual (a) and parietal (b) ROI. Grey dots represent single‐subject mean classification accuracies.
The mean classification accuracies in the physical inference condition varied across the different early visual areas, as evidenced by the rmANOVA showing a significant effect of region (F (1,15) = 18.44, p < .001). Post‐hoc pairwise comparisons further revealed that classification accuracies were significantly higher in V2 compared with V1 (t (15) = 2.99, p < .05) and V3 (t (15) = 5.88, p < .001), and in V1 compared with V3 (t (15) = 3.17, p < .05). For parietal regions, the mean classification accuracies were significantly higher in SPL compared with SMG (t (15) = 7.07, p < .001).
3.4. Eye positions and movements do not differ across conditions
During the occluded period, saccades were rare (0.54/trial on average) and eye positions were kept relatively constant along both x‐ and y‐axes (−0.36 ± 0.27 to 0.2 ± 0.32 and −0.46 ± 0.48 to 0.27 ± 0.53 degrees of visual angle, respectively). Paired t‐tests revealed no difference between the physical inference and control condition nor between the ‘left’ and ‘right’ trials of either condition (t (5) ≤ 3.06, p ≥ .09).
During the period of the moving object, saccades were more frequent (1.73/trial on average) and were typically directed toward the moving object. Accordingly, pairwise comparisons revealed a significant difference in saccade direction between 'left' and 'right' trials of both the physical inference (t (5) = 3.04, p < .05) and the control (t (5) = 2.6, p < .05) conditions. We also observed significant differences in mean eye position along the x-axis for 'left' versus 'right' trials of the control condition (t (5) = 4.12, p < .05). No other comparison reached significance (t (5) ≤ 1.81, p ≥ .13). Importantly, during the period of the moving object, there were no significant differences in mean eye positions, nor in the number, durations, amplitudes, and directions of saccades between the physical inference and control conditions.
3.5. Physical inference and perception have shared representations in early visual areas
To test whether representations of the ball's fall direction during physical inference are similar to those evoked by observing falling balls during the perception task, we trained a classifier to decode fall direction on perception trials and tested it on physical inference trials, in the same eight ROIs. The classifier fitted to perception trials could successfully decode fall direction during physical inference from early visual areas (mean accuracy = 0.62, p < .001), MT/V5 (mean accuracy = 0.53, p < .001), parietal regions (i.e., SPL and SMG, mean accuracy = 0.54, p < .001), and PMd (mean accuracy = 0.53, p < .05; Figure 3b).
In control trials, fall direction could be decoded from early visual areas with a mean accuracy of 0.52 (p < .01), from MT/V5 with a mean accuracy of 0.55 (p < .001), from SPL and SMG with a mean accuracy of 0.53 (p < .01), and from paraCG (including SMA) with a mean accuracy of 0.53 (p < .05; Figure 3b).
Most importantly, early visual areas were the only ROI in which the classifier trained on perception trials decoded fall direction significantly better during physical inference than during control trials (t (15) = 3.49, p < .05; Figure 3c).
4. DISCUSSION
The present study investigated the neural representations involved in predicting the parabolic trajectory of objects falling in accordance with Newtonian physics. We demonstrated that participants can learn to predict the fall time and landing location of a ball falling from various heights and with various horizontal velocities when the ball trajectory is occluded. Solving this physical inference task engaged a network of visual, parietal, and frontal regions when compared with a visually matched control task. Using multivariate pattern analysis (MVPA), we showed that information specific to the trajectory of the occluded ball (i.e., fall direction) is represented in the fMRI activity patterns of these regions and that in early visual areas, some of these direction-specific activity patterns resemble those evoked by passively observing falling balls. Together, our results indicate that participants simulated the ball trajectory to solve the task, which is consistent with the idea that participants activated an internally generated model of physics. Our decoding results, which were strongest for early visual areas, further suggest that the outcomes of this physical model might be represented in the form of the perceivable sensory consequences.
4.1. Predicting the parabolic trajectory of occluded objects engages a network of visual, parietal and frontal regions
Our behavioural results show that after receiving feedback‐based training, participants can predict the fall time and landing location of balls falling from various heights and with various horizontal velocities when the ball trajectory is occluded. It is possible that participants have learnt a simple mapping between the height/velocity of the moving object and the landing time/position of the ball. Even though such a mapping strategy cannot be completely ruled out, we tried to prevent it by systematically varying the height and velocity parameters while keeping the physical principle constant. Accordingly, we consider it most likely that participants have successfully built a mental model of the physical environment and used it to simulate outcomes. In addition to imagining the falling ball's trajectory, many participants reported using mental counting to solve the time estimation task (see Table S3). However, having been trained to also estimate fall locations precluded them from exclusively relying on counting to solve the task. Random catch trials requiring location estimation were added to the fMRI experiment and confirmed that participants applied the trained strategy of predicting the ball's trajectory throughout the experiment.
Solving this physical inference task revealed increased activity in brain regions typically involved in visuospatial processing when compared with a control task that was matched for visual information (Kravitz et al., 2011). The activations observed in early visual and parietal regions overlap with areas that have been shown to process occluded motion (Ban et al., 2013; Erlikhman & Caplovitz, 2017; Olson et al., 2004; Shuwairi et al., 2007). These regions are thought to keep track of a moving object's position through perceptual gaps by retaining information about its direction and speed prior to the occlusion and using this information to predict the time and location of its reappearance (Teichmann et al., 2021). Moreover, the identified activity in the dorsal premotor cortex, superior parietal lobule, supramarginal gyrus and early visual areas (spanning V1, V2 and V3) is consistent with regions that are typically involved in visual imagery (Winlove et al., 2018), and that have previously been shown to be involved in intuitive physical inference (Fischer et al., 2016). Additionally, activation was found in the ventral premotor cortex, paracingulate gyrus, middle frontal gyrus and cerebellum, which were involved in solving the physical inference task but did not represent trajectory-specific information about the falling ball. This suggests that these areas were involved in other cognitive aspects of the task, and it is tempting to speculate that particularly the cerebellum and paracingulate gyrus might have contributed to time estimation (Hinton et al., 2004; O'Reilly et al., 2008).
4.2. Early visual regions contain perception‐like representations of fall direction
Our MVPA results showed that even in the absence of physics‐related visual input, fall direction can be decoded from fMRI activity patterns in early visual areas. Importantly, decoding accuracy was significantly higher during the physical inference than the control task and this task‐specific difference remained significant even after statistically accounting for differences between the physical inference and control task that might have occurred while previously seeing the horizontally moving object. This control analysis makes it unlikely that task‐specific differences in classification accuracy were driven by the sluggishness of the BOLD response evoked by the presentation of a moving object (see Section 4.4 for further details). Our follow‐up exploratory analysis suggests that decoding of fall direction in early visual areas is driven mostly by V2 but also by V1.
We suggest that activity in early visual areas is associated with visual simulation of the trajectory of the falling ball. This is consistent with previous studies on explicitly instructed visual imagery, which have shown that early visual areas carry information about the imagined content (Albers et al., 2013; Kamitani & Tong, 2006; Koenig-Robert & Pearson, 2019). Furthermore, we performed a cross-classification analysis in which we trained a classifier to discriminate fall direction when participants were passively observing falling balls (i.e., in the absence of any task related to the underlying physics) and then decoded fall direction during the physical inference task (i.e., in the absence of visual stimuli). Early visual cortex was the only ROI exhibiting significantly higher cross-classification accuracies during the physical inference than the control task, suggesting that perceiving and simulating falling objects activate similar neural representations in early visual areas. This echoes previous findings reporting shared neural representations between instructed visual imagery and perception in these regions based on a similar cross-classification approach (Albers et al., 2013), and is consistent with results showing representational similarity between simulation and perception of falling objects in a broader network of motion-sensitive regions (Ahuja et al., 2022). In summary, our fMRI results suggest that early visual areas encode perception-like representations of falling objects, which is consistent with the idea that participants have used visual imagery during physical inference. Although our findings show that simulations underlying physical inference involve perception-like depictions of physical scenarios encoded in early visual areas, they cannot establish a causal role of visual imagery in physical inference. Further work should be undertaken to assess the role of visual imagery when predicting physical outcomes, for example by using transcranial magnetic stimulation to interfere with activity in the early visual cortex that contains depictive representations of physical scenes. In addition, future studies on physical inference could benefit from the use of different control conditions to further disentangle the involvement of various cognitive processes. For instance, comparing intuitive physical inference with an explicit visual imagery task or a control task that has similar cognitive demands could provide valuable insights.
4.3. Frontoparietal regions represent the occluded ball's fall direction
We also found significantly higher decoding accuracies in the dorsal premotor and parietal cortex (comprising superior parietal lobule and supramarginal gyrus) when participants performed the physical inference versus control task. A follow-up analysis showed that decoding in parietal areas was mostly driven by the superior parietal lobule. Dorsal premotor and parietal areas have also been implicated in visuospatial processing (Kravitz et al., 2011) and are thought to modulate early visual activity during visual imagery (Dentico et al., 2014; Dijkstra et al., 2017; Ishai et al., 2000; Mechelli, 2004). Additionally, they have been linked to physical inference when participants had to observe specific scenarios and predict how a physical event would unfold (Fischer et al., 2016). For example, previous work has shown that the mass of various objects (Schwettmann et al., 2019) or the stability of statically depicted physical scenes (Pramod et al., 2022) can be decoded from parietal and premotor areas. Interestingly, we did not find shared representations between passive perception and physical inference in these regions, a finding which is consistent with previous work showing that frontoparietal activity encodes abstract rather than explicit physical properties (Pramod et al., 2022; Schwettmann et al., 2019). However, our study was not designed to decode abstract physical properties from neural activity; instead, the ability to decode fall direction indicates that task-specific information is represented in parieto-premotor areas. Direction-specific frontoparietal activity might be related to covert orientation of endogenous attention, that is, to participants moving their 'attentional spotlight' along the ball trajectory (Wu et al., 2022). This process might activate different neural representations when executed endogenously, as during the physical inference task, than when driven by a visual stimulus of a falling ball, as during the perception task. The frontoparietal network has been shown to modulate visual cortex activity via top-down control of the location of spatial attention (for a review, see Corbetta & Shulman, 2002). Thus, we could speculate that the covert orientation of endogenous attention during our physical inference task was driven by an internally generated model of physics and contributed to the generation of a visual simulation of the ball trajectory via top-down modulation of early visual activity.
4.4. Interpretational considerations
4.4.1. Potential perceptual confounds
In early visual regions, fall direction could also be decoded during the control condition in which participants did not have to perform physical inference. This may be explained by involuntary simulation of the falling ball despite it not being required by the task, which could result from having undergone repetitive training. An alternative explanation may be the presence of potential perceptual carryover effects linked to the preceding presentation of a horizontally moving object. Indeed, since the moving stimulus immediately preceded the occlusion phase, the BOLD signal of the latter may have been 'temporally contaminated' by the preceding brain activity. We tried to statistically account for this potential confound by, first, directly comparing the physical inference and control condition and, second, adding the decoding accuracies during the moving stimulus phase as a covariate of no interest. This rmANCOVA analysis showed that decoding accuracies during the occluded period remained significantly higher during the physical inference than the control condition in early visual, parietal and PMd regions. Accordingly, we suggest that 'temporal contamination effects' are unlikely to be of concern when comparing the physical inference and control conditions as both trial types feature identical visual stimuli. Note that this argument also holds for the cross-classification analysis, which revealed that a classifier trained on early visual areas during perception trials could decode fall direction significantly better during physical inference than control trials. Thus, significant differences in decoding accuracies between the physical inference and control conditions indicate that, even in the presence of potential residual motion-evoked perceptual confounds, early visual, parietal and dorsal premotor regions contain representations of inferred fall direction. Regarding MT/V5, however, our decoding results are less conclusive. Fall direction decoding accuracy was high for both the physical inference and the control task, and the two differed only nonsignificantly from each other. Similarly, we also did not find any significant MT/V5 activations during physical inference in our univariate analyses. A potential explanation for the lack of task-specific activation may be that previously seeing a horizontally moving object induced strong and lingering MT/V5 activity in both conditions, thus reducing the power of our physical inference versus control contrast. Taken together, the role of MT/V5 in physical inference remains elusive and needs further investigation.
4.4.2. Potential attentional confounds
Although physical inference and control trials feature the same visual stimuli, attentional demands may differ between the two conditions, since participants were required to attend to the horizontally moving object's height and velocity in physical inference but not in control trials. Such differences could contribute, via carryover effects, to the differences in classification accuracy observed during the occlusion period. To address this concern, we ran a control classification analysis on the period of the moving object in each region that had yielded significant differences in decoding accuracies, and then entered the resulting decoding accuracies as a covariate in a repeated-measures ANCOVA. Decoding accuracies during the occluded period remained significantly higher in the physical inference than in the control condition in all regions. Thus, the significant differences in classification accuracies during occlusion are unlikely to be driven mainly by carryover of condition-dependent attentional differences during the presentation of the horizontally moving object.
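As an illustration of this control analysis, the sketch below approximates the repeated-measures ANCOVA with a mixed-effects model in which subject is a random effect, condition a fixed effect, and moving-phase decoding accuracy the covariate. Note that this mixed-model formulation approximates, but is not identical to, the rmANCOVA we report; the data frame below is simulated.

```python
# Mixed-model approximation of the rmANCOVA control analysis (simulated data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n_subj = 20
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subj), 2),
    "condition": ["inference", "control"] * n_subj,
    "acc_moving": rng.normal(0.6, 0.05, 2 * n_subj),   # covariate of no interest
})
# Simulate a higher occlusion-period accuracy in the inference condition.
df["acc_occluded"] = (0.55 + 0.08 * (df["condition"] == "inference")
                      + 0.3 * (df["acc_moving"] - 0.6)
                      + rng.normal(0, 0.03, len(df)))

# Subject enters as a random intercept; the condition effect should survive
# adjusting for moving-phase decoding accuracy.
model = smf.mixedlm("acc_occluded ~ condition + acc_moving", df,
                    groups=df["subject"]).fit()
print(model.summary())
```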
4.4.3. Potential eye movement confounds
To minimize the impact of eye movements on our decoding analysis, we instructed participants to maintain fixation throughout the experiment. Nonetheless, some eye movements may still have occurred (Teichmann et al., 2022). To address this potential confound, we conducted an eye tracking experiment in which participants performed the same task. We found no significant differences in eye position or eye movements between the physical inference and control conditions, nor between the 'left' and 'right' trials of either condition, during the occluded period. Eye movements could also have contributed to our decoding results if the preceding moving stimulus triggered eye movements whose effects carried over into the occluded period. Although the eye tracking analysis revealed differences between left and right trials, the number, duration, amplitude, and direction of saccades did not differ significantly between the physical inference and control conditions during the period of the moving object. Overall, therefore, there is no evidence that our results were driven by unwanted eye movements.
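For readers unfamiliar with velocity-based event detection, the following is a minimal sketch of an I-VT-style saccade counter of the kind implemented by the fixation filter cited below (Olsen & Matos, 2012). The sampling rate, velocity threshold, and gaze traces are illustrative assumptions, not our recording parameters.

```python
# Illustrative I-VT-style saccade detection on simulated gaze traces.
import numpy as np

def count_saccades(x, y, fs=300.0, vel_thresh=30.0):
    """Count saccades as runs of samples whose angular velocity (deg/s)
    exceeds a fixed threshold, given gaze position in degrees."""
    vx, vy = np.gradient(x) * fs, np.gradient(y) * fs
    fast = np.hypot(vx, vy) > vel_thresh
    # A saccade onset is a below- to above-threshold velocity transition.
    return int(np.sum(np.diff(fast.astype(int)) == 1))

# Example: a fixation trace with two injected gaze shifts.
rng = np.random.default_rng(3)
t = np.arange(900)                  # 3 s at an assumed 300 Hz
x = rng.normal(0, 0.01, t.size)
x[300:] += 2.0                      # 2-degree rightward shift
x[600:] -= 2.0                      # shift back to center
y = rng.normal(0, 0.01, t.size)
print(count_saccades(x, y))         # expected: 2
```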
5. CONCLUSION
Our study shows that solving a physical inference task requiring participants to predict the parabolic trajectory of a ball falling under occlusion activates early visual areas together with a frontoparietal network. During physical inference, these areas represent trajectory-specific information (i.e., fall direction) even in the absence of visual inputs. Moreover, solving the physical inference task evokes activity patterns in early visual areas that resemble those evoked by observing a ball falling parabolically to the left or to the right. Our findings are in line with the idea that the task engaged an 'intuitive physics engine' (Fischer et al., 2016; Schwettmann et al., 2019) that generated sensory predictions of the expected outcome in early visual areas, simulating the ball's trajectory in sensory coordinates. These insights shed new light on how the brain draws intuitive physical inferences by simulating central aspects of our everyday interactions with the physical world.
CONFLICT OF INTEREST STATEMENT
The authors declare no conflict of interest.
Supporting information
Data S1. Supporting Information.
ACKNOWLEDGMENTS
We thank Daniel Woolley for his help and feedback on the implementation of the experiment, Weronika Potok for her help and feedback on the manuscript, Sven Frauchiger for his help with data collection, and all the participants for their time and effort.
DATA AVAILABILITY STATEMENT
Data are openly available on the ETH Library Research Collection with the DOI: 10.3929/ethz‐b‐000578094.
REFERENCES
- Ahuja, A., Desrochers, T. M., & Sheinberg, D. L. (2022). A role for visual areas in physics simulations. Cognitive Neuropsychology, 38, 425–439. 10.1080/02643294.2022.2034609
- Albers, A. M., Kok, P., Toni, I., Dijkerman, H. C., & De Lange, F. P. (2013). Shared representations for working memory and mental imagery in early visual cortex. Current Biology, 23(15), 1427–1431. 10.1016/j.cub.2013.05.065
- Ban, H., Yamamoto, H., Hanakawa, T., Urayama, S., Aso, T., Fukuyama, H., & Ejima, Y. (2013). Topographic representation of an occluded object and the effects of spatiotemporal context in human early visual areas. The Journal of Neuroscience, 33(43), 16992–17007. 10.1523/JNEUROSCI.1455-12.2013
- Bates, C. J., Yildirim, I., Tenenbaum, J. B., & Battaglia, P. W. (2015). Humans predict liquid dynamics using probabilistic simulation. In 37th Annual Conference of the Cognitive Science Society.
- Battaglia, P. W., Hamrick, J. B., & Tenenbaum, J. B. (2013). Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences of the United States of America, 110(45), 18327–18332. 10.1073/pnas.1306572110
- Corbetta, M., & Shulman, G. L. (2002). Control of goal-directed and stimulus-driven attention in the brain. Nature Reviews Neuroscience, 3(3), 201–215. 10.1038/nrn755
- Dentico, D., Cheung, B. L., Chang, J.-Y., Guokas, J., Boly, M., Tononi, G., & van Veen, B. (2014). Reversal of cortical information flow during visual imagery as compared to visual perception. NeuroImage, 100, 237–243. 10.1016/j.neuroimage.2014.05.081
- Desikan, R. S., Ségonne, F., Fischl, B., Quinn, B. T., Dickerson, B. C., Blacker, D., Buckner, R. L., Dale, A. M., Maguire, R. P., Hyman, B. T., Albert, M. S., & Killiany, R. J. (2006). An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage, 31(3), 968–980. 10.1016/j.neuroimage.2006.01.021
- Dijkstra, N., Zeidman, P., Ondobaka, S., van Gerven, M. A. J., & Friston, K. (2017). Distinct top-down and bottom-up brain connectivity during visual perception and imagery. Scientific Reports, 7(1), 5677. 10.1038/s41598-017-05888-8
- Eickhoff, S. B., Paus, T., Caspers, S., Grosbras, M. H., Evans, A. C., Zilles, K., & Amunts, K. (2007). Assignment of functional activations to probabilistic cytoarchitectonic areas revisited. NeuroImage, 36(3), 511–521. 10.1016/j.neuroimage.2007.03.060
- Erlikhman, G., & Caplovitz, G. P. (2017). Decoding information about dynamically occluded objects in visual cortex. NeuroImage, 146, 778–788. 10.1016/j.neuroimage.2016.09.024
- Fischer, J., Mikhael, J. G., Tenenbaum, J. B., & Kanwisher, N. (2016). Functional neuroanatomy of intuitive physical inference. Proceedings of the National Academy of Sciences, 113(34), E5072–E5081. 10.1073/pnas.1610344113
- Fisher, R. A. (1925). Statistical methods for research workers. Edinburgh: Oliver and Boyd.
- Gerstenberg, T., Goodman, N. D., Lagnado, D. A., & Tenenbaum, J. B. (2021). A counterfactual simulation model of causal judgments for physical events. Psychological Review, 128, 936–975. 10.1037/rev0000281
- Gerstenberg, T., Zhou, L., Smith, K. A., & Tenenbaum, J. B. (2017). Faulty towers: A hypothetical simulation model of physical support. In Proceedings of the 39th Annual Meeting of the Cognitive Science Society.
- Greve, D. N., & Fischl, B. (2009). Accurate and robust brain image alignment using boundary-based registration. NeuroImage, 48(1), 63–72. 10.1016/j.neuroimage.2009.06.060
- Hamrick, J. B., Battaglia, P. W., Griffiths, T. L., & Tenenbaum, J. B. (2016). Inferring mass in complex scenes by mental simulation. Cognition, 157, 61–76. 10.1016/j.cognition.2016.08.012
- Hinton, S. C., Harrington, D. L., Binder, J. R., Durgerian, S., & Rao, S. M. (2004). Neural systems supporting timing and chronometric counting: An fMRI study. Cognitive Brain Research, 21(2), 183–192. 10.1016/j.cogbrainres.2004.04.009
- Ishai, A., Ungerleider, L. G., & Haxby, J. V. (2000). Distributed neural systems for the generation of visual images. Neuron, 28(3), 979–990. 10.1016/S0896-6273(00)00168-9
- Jenkinson, M., Bannister, P., Brady, M., & Smith, S. (2002). Improved optimization for the robust and accurate linear registration and motion correction of brain images. NeuroImage, 17(2), 825–841. 10.1006/nimg.2002.1132
- Kamitani, Y., & Tong, F. (2006). Decoding seen and attended motion directions from activity in the human visual cortex. Current Biology, 16, 1096–1102. 10.1016/j.cub.2006.04.003
- Koenig-Robert, R., & Pearson, J. (2019). Decoding the contents and strength of imagery before volitional engagement. Scientific Reports, 9(1), 1–14. 10.1038/s41598-019-39813-y
- Kravitz, D. J., Saleem, K. S., Baker, C. I., & Mishkin, M. (2011). A new neural framework for visuospatial processing. Nature Reviews Neuroscience, 12(4), 217–230. 10.1038/nrn3008
- Kriegeskorte, N., Simmons, W. K., Bellgowan, P. S., & Baker, C. I. (2009). Circular analysis in systems neuroscience: The dangers of double dipping. Nature Neuroscience, 12, 535–540. 10.1038/nn.2303
- Marcus, D. S., Harwell, J., Olsen, T., Hodge, M., Glasser, M. F., Prior, F., Jenkinson, M., Laumann, T., Curtiss, S. W., & Van Essen, D. C. (2011). Informatics and data mining tools and strategies for the human connectome project. Frontiers in Neuroinformatics, 5, 4. 10.3389/fninf.2011.00004
- Mayka, M. A., Corcos, D. M., Leurgans, S. E., & Vaillancourt, D. E. (2006). Three-dimensional locations and boundaries of motor and premotor cortices as defined by functional brain imaging: A meta-analysis. NeuroImage, 31(4), 1453–1474. 10.1016/j.neuroimage.2006.02.004
- Mechelli, A. (2004). Where bottom-up meets top-down: Neuronal interactions during perception and imagery. Cerebral Cortex, 14(11), 1256–1265. 10.1093/cercor/bhh087
- O'Reilly, J. X., Mesulam, M. M., & Nobre, A. C. (2008). The cerebellum predicts the timing of perceptual events. Journal of Neuroscience, 28(9), 2252–2260. 10.1523/JNEUROSCI.2742-07.2008
- Olsen, A., & Matos, R. (2012). Identifying parameter values for an I-VT fixation filter suitable for handling data sampled with various sampling frequencies. In Proceedings of the Symposium on Eye Tracking Research and Applications (pp. 317–320). 10.1145/2168556.2168625
- Olson, I. R., Gatenby, J. C., Leung, H.-C., Skudlarski, P., & Gore, J. C. (2004). Neuronal representation of occluded objects in the human brain. Neuropsychologia, 42(1), 95–104. 10.1016/S0028-3932(03)00151-9
- Pearson, J. (2019). The human imagination: The cognitive neuroscience of visual mental imagery. Nature Reviews Neuroscience, 20(10), 624–634. 10.1038/s41583-019-0202-9
- Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
- Pramod, R., Cohen, M. A., Tenenbaum, J. B., & Kanwisher, N. (2022). Invariant representation of physical stability in the human brain. eLife, 11, e71736. 10.7554/eLife.71736
- Schwettmann, S. E., Tenenbaum, J. B., & Kanwisher, N. (2019). Invariant representations of mass in the human brain. eLife, 8, 1–14. 10.7554/eLife.46619
- Shuwairi, S. M., Curtis, C. E., & Johnson, S. P. (2007). Neural substrates of dynamic object occlusion. Journal of Cognitive Neuroscience, 19(8), 1275–1285. 10.1162/jocn.2007.19.8.1275
- Smith, S. M. (2002). Fast robust automated brain extraction. Human Brain Mapping, 17, 143–155. 10.1002/hbm.10062
- Teichmann, L., Edwards, G., & Baker, C. I. (2021). Resolving visual motion through perceptual gaps. Trends in Cognitive Sciences, 25(11), 978–991. 10.1016/j.tics.2021.07.017
- Teichmann, L., Moerel, D., Rich, A. N., & Baker, C. I. (2022). The nature of neural object representations during dynamic occlusion. Cortex, 153, 66–86. 10.1016/j.cortex.2022.04.009
- Winlove, C. I. P., Milton, F., Ranson, J., Fulford, J., MacKisack, M., Macpherson, F., & Zeman, A. (2018). The neural correlates of visual imagery: A co-ordinate-based meta-analysis. Cortex, 105, 4–25. 10.1016/j.cortex.2017.12.014
- Wu, T., Mackie, M.-A., Chen, C., & Fan, J. (2022). Representational coding of overt and covert orienting of visuospatial attention in the frontoparietal network. NeuroImage, 261, 119499. 10.1016/j.neuroimage.2022.119499
