eLife. 2024 Mar 28;12:RP90583. doi: 10.7554/eLife.90583

Body size as a metric for the affordable world

Xinran Feng 1, Shan Xu 2, Yuannan Li 1, Jia Liu 1
Editors: Clare Press 3, Timothy E Behrens 4
PMCID: PMC10987089  PMID: 38547366

Abstract

The physical body of an organism serves as a vital interface for interactions with its environment. Here, we investigated the impact of human body size on the perception of action possibilities (affordances) offered by the environment. We found that the body size delineated a distinct boundary on affordances, dividing objects of continuous real-world sizes into two discrete categories with each affording distinct action sets. Additionally, the boundary shifted with imagined body sizes, suggesting a causal link between body size and affordance perception. Intriguingly, ChatGPT, a large language model lacking physical embodiment, exhibited a modest yet comparable affordance boundary at the scale of human body size, suggesting the boundary is not exclusively derived from organism-environment interactions. A subsequent fMRI experiment offered preliminary evidence of affordance processing exclusively for objects within the body size range, but not for those beyond. This suggests that only objects capable of being manipulated are the objects capable of offering affordance in the eyes of an organism. In summary, our study suggests a novel definition of object-ness in an affordance-based context, advocating the concept of embodied cognition in understanding the emergence of intelligence constrained by an organism’s physical attributes.

Research organism: Human

Introduction

Man is the measure of all things. - Protagoras

The assertion by the ancient Greek philosopher Protagoras highlights the notion that reality is defined by how the world is perceived by humans. A contemporary interpretation of this statement is the embodied theory of cognition (e.g. Chemero, 2013; Gallagher, 2017; Yu et al., 2005; Wilson, 2002; Varela et al., 2017), which, diverging from the belief that size and shape are inherent object features (e.g. Op de Beeck et al., 2008; Konkle and Oliva, 2011), posits that human body scale (e.g. size) constrains the perception of objects and the generation of motor responses. For instance, humans evaluate the climbability of steps based on their leg length (Mark, 1987; Warren, 1984), and determine the navigability of apertures according to the critical aperture-to-shoulder-width ratio (Warren and Whang, 1987). Additionally, grasping strategies have been shown to be contingent upon object size relative to one’s body (Cesari and Newell, 2000; Newell et al., 1989) or hand size (Castiello et al., 1993; Tucker and Ellis, 2004). However, the question of how object perception is influenced by the relative size of objects in relation to the human body remains open. Specifically, it is unclear whether this relative size simply acts as a continuous variable for locomotion reference, or if it affects differentiating and organizing object representation based on their ensued affordances.

To underscore the latter point, Gibson (1979), the pioneer of embodied cognition research, stated that “Detached objects must be comparable in size to the animal under consideration if they are to afford behavior (p.124).” This implies that an object’s affordance, encompassing all action possibilities offered to an animal, is determined by the object’s size relative to the animal’s size rather than its real-world size. For instance, in a naturalistic environment, such as a picnic scene shown in Figure 1a, there may exist a qualitative distinction between objects within (the objects with warm tints in Figure 1a) and beyond (those with cold tints) the size range of humans. Only objects within the range, such as the apple, the umbrella, and the bottles, may afford actions, while those beyond this range, such as the trees and the tent, are largely viewed as part of the environment. Consequently, visual perception may be ecologically constrained, and the body may serve as a metric that facilitates meaningful engagement with the environment by differentiating objects that are accessible for interactions from those that are not. Further, grounded cognition theory (see Barsalou, 2008 for a review) suggests that the outputs of such differentiation might transcend sensorimotor processes and integrate into supramodal concepts and language. From this perspective, we proposed two hypotheses: first, the affordance of objects will exhibit a qualitative difference between objects within and beyond the size range of an organism’s body; second, affordance-related neural activity will emerge exclusively for objects within the organism’s size range.

Figure 1. An affordance boundary in the affordable world.

(a) An illustration of a picnic scene, featuring objects of various sizes relative to the human body. Example objects within the normal body size range are painted red, and those beyond green. We hypothesized qualitative differences between perceived affordances of these two kinds of objects. (b) A demonstration of the object-action relation judgement task for human participants (top) and AI models (bottom). The question in the task for human participants was presented in Chinese. (c) The representational similarity matrix (RSM) for objects based on human rating of affordance similarity. Object sizes are denoted with red to green. The two primary clusters that emerged in the clustering analysis of the similarity pattern are outlined with black boxes. (d) Left panel: The overall affordance similarity and that of each gender (left y-axis) as well as real-world size similarity (right y-axis) between neighboring size ranks. The error bars represent the standard error (SE), computed from sample size n = 14. Right panel: The point clouds of pairwise correlations between objects from the same rank or neighboring ranks. Each colored dot represents the affordance similarity (y-axis) and the average real-world size (x-axis) of a specific object pair. The grey dots indicate the averaged size (x-axis) and pairwise similarity (y-axis) of object pairs in different rank compositions. Left to right: both from size rank 3, from size rank 3 and 4, both from size rank 4, from size rank 4 and 5, both from size rank 5, from size rank 5 and 6, and both from size rank 6. The horizontal error bars represent the 95% confidence interval (CI) of the averaged object size in each pair, and the vertical error bars denote the CI of pairwise affordance similarity (sample size n = 100).

Figure 1—figure supplement 1. Two exemplar objects with different affordance profiles.

The values represented the percentage of participants who agreed on a certain action being afforded by an object.

To test these hypotheses, we first measured the affordance of a diverse array of objects varying in real-world sizes (e.g. Konkle and Oliva, 2011). We found a dramatic decline in affordance similarity between objects within and beyond the human body size range, as these objects afforded distinct sets of action possibilities. Notably, the affordance boundary varied in response to the imagined body sizes and showed supramodality. It could also be attained solely through language, as evidenced by the large language model (LLM), ChatGPT (OpenAI, 2023). A subsequent fMRI experiment corroborated the qualitative difference in affordances demarcated by the body size, as affordances of objects within humans’ size range, but not those beyond, were represented in both dorsal and ventral visual streams of the brain. This study advances our understanding of the role of body size in shaping object representation and underscores the significance of body size as a metric for determining object affordances that facilitates meaningful engagement with the environment.

Results

To illustrate how human body size affects the affordances of objects of different sizes, we first characterized the affordances of a set of daily objects. In each trial, we presented a matrix consisting of nine objects and asked participants to report which objects afforded a specific action (e.g. sit-able: a chair, a bed, a skateboard, but not a phone, a laptop, an umbrella, a kettle, a plate, or a hammer; Figure 1b). In this task, there were 14 actions commonly executed in daily life and 24 object images from the THINGS database (Hebart et al., 2019), with sizes ranging from size rank 2 to 8 according to the classification of Konkle and Oliva (2011). These objects covered real-world sizes from much smaller (17 cm on average, rank 2) to orders of magnitude larger (5317 cm on average, rank 8) than the human body size (see Materials and methods for details). Consequently, affordances for each object were indexed by a 14-dimensional action vector, with the value for each dimension representing the percentage of participants who agreed on a certain action being afforded by the object (e.g. 88% for the action of grasping on a hammer indicating 88% of participants agreed that a hammer affords grasping). Figure 1—figure supplement 1 shows the affordances of two example objects.
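The construction of the action vector can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the data layout, the three actions, the two objects, and all responses below are invented for the example (the study used 14 actions and 24 objects).

```python
# Hypothetical toy data: each participant's answers are stored as the set of
# (object, action) pairs that the participant endorsed. All values are invented.
ACTIONS = ["grasp", "sit", "kick"]   # the study used 14 actions
OBJECTS = ["hammer", "chair"]        # the study used 24 objects

responses = [
    {("hammer", "grasp"), ("chair", "sit")},
    {("hammer", "grasp"), ("chair", "sit"), ("chair", "grasp")},
    {("hammer", "grasp"), ("hammer", "kick"), ("chair", "sit")},
    {("chair", "sit")},
]

def affordance_vector(obj, responses, actions=ACTIONS):
    """Percentage of participants who endorsed each action for one object."""
    n = len(responses)
    return [100.0 * sum((obj, a) in r for r in responses) / n for a in actions]

# One vector per object, with one dimension per action.
profiles = {obj: affordance_vector(obj, responses) for obj in OBJECTS}
```

In this toy example three of the four participants endorsed grasping for the hammer, so the first dimension of its vector is 75.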

An affordance similarity matrix was then constructed where each cell corresponded to the similarity in affordances between a pair of objects (Figure 1c). A clustering analysis revealed a two-cluster structure. Visual inspection suggested that the upper-left cluster consisted of objects smaller than human body size (red labels), and the lower-right cluster contained objects larger than human body size (green labels). Critically, the between-cluster similarity in the affordance similarity matrix approached zero, suggesting a division in affordances located near the body size. To quantify this observation, we calculated the similarity in affordances between each neighboring size rank. Indeed, we identified a clear trough in affordance similarity, dropping to around zero, between size rank 4 (77 cm on average) and 5 (146 cm on average), which was significantly smaller than that between size rank 3 and 4 (Z=3.91, p<0.001) and that between size rank 5 and 6 (Z=1.66, p=0.048). This trough suggested an affordance boundary between size rank 4 and 5, while affordance similarities between neighboring ranks remained high (rs >0.45) and did not significantly differ from each other (ps >0.05, all BF10 < 10) on either side of the boundary (Figure 1d, left panel, green lines). This pattern was evident for both genders, indicating no gender difference. Note that the abrupt change in affordance similarity across the boundary cannot be explained by changes in objects’ real-world size, as the similarity in objects’ real-world size was relatively stable across ranks, without any trend of a trough-shape curve (Figure 1d, left panel, yellow line). Intriguingly, rank 4 and rank 5 correspond to 80 cm to 150 cm, so a boundary situated between these two ranks is within the range of the body size of a typical human adult. This finding suggested that objects were classified into two categories based on their affordances, with the boundary aligning with human body size.
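The rank-wise analysis can be illustrated with a short sketch. It assumes that affordance similarity is the Pearson correlation between two objects' action vectors and that the similarity between two ranks is the mean over all cross-rank object pairs; the text here does not specify either choice, and the objects, ranks, and values are invented toy data.

```python
from itertools import product
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical objects: name -> (size rank, toy 4-dimensional action vector).
objects = {
    "cup":    (3, [90, 80, 5, 0]),
    "bottle": (4, [85, 90, 10, 5]),
    "bed":    (5, [5, 0, 95, 80]),
    "sofa":   (6, [0, 10, 90, 85]),
}

def neighbor_rank_similarity(objects, rank_a, rank_b):
    """Mean pairwise correlation between objects of two size ranks."""
    a = [vec for rank, vec in objects.values() if rank == rank_a]
    b = [vec for rank, vec in objects.values() if rank == rank_b]
    pairs = [pearson(x, y) for x, y in product(a, b)]
    return sum(pairs) / len(pairs)

# Similarity curve across neighboring ranks; a boundary shows up as a trough.
curve = {(r, r + 1): neighbor_rank_similarity(objects, r, r + 1) for r in (3, 4, 5)}
```

With these toy vectors the value for ranks 4 and 5 is the lowest point of the curve, which is the qualitative signature described above.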

To better locate the boundary, we focused on the affordance similarity between individual objects within size rank 3–6 (approximately ranging from 30 cm to 220 cm in real-world size, the area with grey shade in Figure 1d), where the trough-shape curve was identified. Specifically, we traversed all pairs of objects with similar real-world diagonal sizes (from either the same rank or from neighboring ranks), calculated their average real-world size as an index of the approximate location of the boundary between this pair of objects, and plotted the affordance similarity against the average real-world size of each object pair. As shown in the inset (grey box) of Figure 1d, consistent with the rank-wise analysis, the abrupt decrease in affordance similarity exclusively happened between objects from size rank 4 and 5 (light green dots). The averaged real-world size in these object pairs was 104 cm (95% CI, 105–130 cm) and the affordance similarity in such object pairs was around zero. This result further narrowed the location estimation of the boundary, and demonstrated that the affordance boundary persisted at the level of individual objects.

One may argue that the location of the affordance boundary coincidentally fell within the range of human body size, rather than being directly influenced by it. To rule out this possibility, we directly manipulated participants’ body schema, referring to an experiential and dynamic functioning of the living body within its environment (Merleau-Ponty and Smith, 1962). This experimental approach was able to establish a causal link between body size and affordance boundary, as other potential factors remained constant. Specifically, we instructed a new group of participants to imagine themselves as small as a cat (typical diagonal size: 77 cm, size rank 4, referred to as the ‘cat condition’), and another new group to envision themselves as large as an elephant (typical diagonal size: 577 cm, size rank 7, referred to as the ‘elephant condition’) throughout the task (Figure 2a). A between-subject design was adopted to minimize contamination between conditions. This manipulation was effective, as evidenced by the imagined heights that participants reported when debriefed at the end of the task: on average, 42 cm (SD=25.6) in the cat condition and 450 cm (SD=426.8) in the elephant condition.

Figure 2. A disembodied origin of the affordance boundary.

(a) The schematic diagram of the imagined size in the cat condition (top) and the elephant condition (bottom), with the mean estimated height reported by participants for each condition. (b) The affordance similarity between neighboring size ranks for manipulated body sizes (Red line: cat-size body; Green line: elephant-size body). The dashed line marks the boundary of the human-size body. The red and green arrows indicate the corresponding boundary shift in each condition. (c) The affordance similarity between neighboring size ranks for different large language models, with human data from Figure 1d re-drawn as a reference. The stars indicate significant contrasts between affordance similarities between neighboring data points. (d) The trough value of each model between size rank 4 and 5. The stars here indicate trough values that differ significantly from zero. The error bars represent the estimated standard error (SE), computed from sample size n = 14. *p<0.05, **p<0.01, ***p<0.001.

With exactly the same set of objects, a distinct shift in the affordance boundary was observed for each condition (Figure 2b). In the cat condition, the affordance boundary was identified between size rank 3 and 4, with affordance similarity between size rank 3 and 4 being significantly lower than that between size rank 2 and 3 (Z=1.76, p=0.039) and that between size rank 4 and 5 (Z=1.68, p=0.047). In contrast, in the elephant condition, the affordance boundary shifted to the right, as demonstrated by a decrease in affordance similarity between size rank 6 and 7, and that between size rank 7 and 8 as compared to that between size rank 5 and 6, with a trend towards significance (with size rank 6–7: Z=1.28, p=0.099; with size rank 7–8: Z=1.48, p=0.069). The observation that the affordance boundary shifted to the left under the cat condition and to the right under the elephant condition suggests that affordance perception is influenced even by imagined body size. Furthermore, the cognitive penetrability (Pylyshyn, 1999) of affordance perception implies potential susceptibility of affordance perception to semantic or conceptual transformation or modification.

To test the further speculation that the affordance boundary can be derived solely from conceptual knowledge without direct sensorimotor experience, we employed a disembodied agent, the large language model (LLM) ChatGPT (Chat Generative Pre-trained Transformer; https://openai.com/blog/chatgpt/). This model was trained on a massive corpus of language materials originating from humans, yet it cannot receive any sensorimotor information from the environment. Here we asked whether language alone would be sufficient to form an affordance boundary in ChatGPT models as well as in smaller LLMs, BERT (Devlin et al., 2018), and GPT-2 (Radford et al., 2018).

The experimental procedure was similar to that conducted with human participants, except that images were replaced by the corresponding words (see Materials and methods). Given the randomness embedded in response generation, each model was tested 20 times to simulate the sampling of human participants. We found that the affordance similarity curves demonstrated by the ChatGPT models were both trough-shaped between size rank 4 and 5, the same location where the boundary emerged in human participants (Figure 2c, green lines). Further statistical analyses showed a significant difference in affordance similarity between size rank 3 and 4, and that between size rank 4 and 5 (ChatGPT (GPT-3.5): Z=1.98, p=0.024; ChatGPT (GPT-4): Z=2.73, p=0.003). The affordance similarity between size rank 4 and 5 was also lower than that between size rank 5 and 6, yet the difference did not reach significance (ChatGPT (GPT-3.5): Z=0.96, p=0.17; ChatGPT (GPT-4): Z=1.27, p=0.10). In contrast, no trough-shaped boundary was observed in either BERT or GPT-2 (Figure 2c, yellow lines), despite an apparent but non-significant decrease in affordance similarity in GPT-2 between size rank 5 and 6 (ps >0.20). To further quantify the magnitude of the decrease in affordance similarity between size rank 4 and 5, we measured the decrease by subtracting the similarity value at the trough from the neighboring similarity values and then subjected it to a permutation test (see Materials and methods). We found a significant decrease in affordance similarity in humans (permutation N=5000, p(T>Tobs) = 0.015) and ChatGPT (GPT-4) (permutation N=5000, p(T>Tobs) = 0.046), a marginally significant decrease in ChatGPT (GPT-3.5) (permutation N=5000, p(T>Tobs) = 0.061), and no significance in either BERT or GPT-2 (ps >0.46, Figure 2d). Thus, the affordance boundary can be derived solely from language, without sensorimotor information from the environment. Interestingly, it appears to emerge spontaneously when the language processing ability of the LLMs surpasses a certain threshold (i.e. GPT-2/BERT < ChatGPT models).
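The logic of a permutation test on the trough depth can be sketched as follows. The actual permutation scheme is described in Materials and methods and is not reproduced here, so this sketch makes its own assumptions: each run contributes three similarity values (ranks 3–4, 4–5, 5–6), the statistic is the mean of the two flanking values minus the trough value, and the null distribution is built by shuffling the three values within each run. The function names and the data are hypothetical.

```python
import random

def trough_depth(rows):
    """Mean over runs of (average of the two flanks minus the middle value)."""
    return sum((left + right) / 2 - mid for left, mid, right in rows) / len(rows)

def permutation_p(rows, n_perm=5000, seed=0):
    """One-sided p-value: how often shuffled data give a trough at least as deep."""
    rng = random.Random(seed)
    observed = trough_depth(rows)
    hits = 0
    for _ in range(n_perm):
        # Assumed null: the three positions are exchangeable within a run.
        shuffled = [tuple(rng.sample(row, 3)) for row in rows]
        if trough_depth(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p of 0.
    return observed, (hits + 1) / (n_perm + 1)

# Invented similarity values for eight runs: (rank 3-4, rank 4-5, rank 5-6).
runs = [
    (0.60, 0.05, 0.50), (0.55, 0.10, 0.45), (0.62, 0.02, 0.48), (0.58, 0.08, 0.52),
    (0.65, 0.00, 0.47), (0.57, 0.12, 0.49), (0.61, 0.04, 0.51), (0.59, 0.07, 0.46),
]
observed, p_value = permutation_p(runs)
```

A small p-value here means that a trough as deep as the observed one rarely arises when the position of each value along the size axis is arbitrary.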

A further analysis on the affordances separated by the boundary revealed that objects within human body size range were primarily subjected to hand-related actions such as grasping, holding, and throwing. These affordances typically involve object manipulation with humans’ effectors. In contrast, objects beyond the size range of human body predominantly afforded actions such as sitting and standing, which typically require locomotion or posture change of the whole body around or within the objects. The distinct categories of reported affordances demarcated by the boundary imply that the objects on either side of the boundary may be represented differently in the brain. We thus speculated that the observed behavioral discontinuity is likely underpinned by distinct neural activities, which give rise to these discrete ‘representations’ separated by the boundary.

To test this speculation, we ran an fMRI experiment with a small number of participants to preliminarily investigate the neural basis of the affordance boundary in the brain by measuring neural activity in the dorsal and ventral visual streams when participants were instructed to evaluate whether an action was affordable by an object (Figure 3a). Four objects were chosen from the behavioral experiment: two within the body size range (i.e. bottle and football, WITHIN condition) and two beyond (i.e. bed and piano, BEYOND condition). Accordingly, four representative actions (to grasp, to kick, to sit, and to lift) were selected in relation to the respective objects. During the scan, the participants were asked to decide whether a probe action was affordable (e.g. grasp-able – bottle, Congruent condition) or not (e.g. sit-able – bottle, Incongruent condition) by each subsequently presented object. The congruency effect, derived from the contrast of Congruent versus Incongruent conditions, is a well-established measure of affordance processing (e.g. Kourtis et al., 2018).

Figure 3. Affordance representation in the visual streams.

(a) An example block with the probe action ‘graspable’. The participants indicated whether each of the subsequently presented objects was graspable by pressing the corresponding button. The action probing question was presented in Chinese during the experiment. (b) The ROIs included in this experiment. (c) The activation of each condition in the pFs and SPL. The bars represent the contrast estimates of each condition versus baseline. The stars indicate the significant difference between congruent and incongruent conditions. *p<0.05, **p<0.01, ***p<0.001, otherwise non-significance. Error bars represent the standard error (SE), computed from sample size n = 12.

Figure 3—figure supplement 1. Brain areas showing significantly greater neural activation for objects within body size and beyond body size versus baseline.

Figure 3—figure supplement 2. The activation in LO and M1 in response to objects within body size in the congruent and incongruent conditions, respectively.

The bars refer to the contrast estimates of each condition versus baseline. The stars indicate whether the contrast was significant. *p<0.05, **p<0.01, ***p<0.001, otherwise not significant. Error bars represent the standard error (SE), computed from sample size n = 12.

Figure 3—figure supplement 3. Significant brain activations of different contrasts in the whole-brain level analysis.

(a) The effect of object type: positive values (warm color) indicate higher activation for objects within than for objects beyond, and negative values (cold color) indicate the opposite. (b) The effect of congruency: positive values indicate higher activation in the congruent than in the incongruent condition. (c) The interaction between object type and congruency: positive values indicate a larger congruency effect for objects within than for objects beyond. (d) The congruency effect for objects within. All contrasts were corrected with cluster-level correction at p<0.05. The detailed cluster-level results for each contrast map can be found in Supplementary file 1b-1e.

We examined the congruency effect in two object-selective regions defined by the contrast of objects against baseline (see Materials and methods), each representing a corresponding visual stream: the posterior fusiform (pFs) in the ventral stream, which is involved in object recognition (e.g. Grill-Spector et al., 2000; Malach et al., 1995) and objects’ real-world size processing (Konkle and Oliva, 2012; Snow et al., 2011), and the superior parietal lobule (SPL) in the dorsal stream, one of the core tool network regions (e.g. Filimon et al., 2007; Matić et al., 2020). For the remaining object-selective regions identified in this experiment, see Figure 3—figure supplement 1 and Supplementary file 1a. A repeated-measures ANOVA with object type (WITHIN versus BEYOND) and congruency (Congruent versus Incongruent) as within-subject factors was performed for each ROI, respectively. A significant interaction between object type and congruency was observed in both ROIs (SPL: F(1,11) = 15.47, p=0.002, η2 = .58; pFs: F(1,11) = 24.93, p<0.001, η2 = .69), suggesting that these regions represented affordances differentially based on object type (Figure 3c). A post hoc simple effect analysis revealed the congruency effect solely for objects within body size range (SPL: p<0.001; pFs: p=0.021), not for objects beyond (ps >0.41). In addition, the main effect of object type was not significant in either ROI (ps >0.17), suggesting that the absence of the congruency effect for objects beyond the body size cannot be attributed to compromised engagement in viewing these objects. Furthermore, a whole-brain analysis was performed, and no region showed the congruency effect for the objects beyond the body size. Taken together, the affordance boundary not only separated the objects into two categories based on their relative size to human body, but also delineated the range of objects that evoked neural representations associated with affordance processing.
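The interaction test in a 2 × 2 within-subject design can be illustrated with a short sketch. When both factors have two levels, the interaction F(1, n−1) equals the squared one-sample t statistic of each participant's difference of differences, which is what the code computes. The contrast estimates below are invented, and the function names are hypothetical. The second function assumes that the reported η2 is partial eta squared, which the text does not state; under that assumption the reported F values and effect sizes agree.

```python
from math import sqrt

def interaction_F(within_con, within_inc, beyond_con, beyond_inc):
    """Interaction F and its degrees of freedom for a 2x2 within-subject design."""
    # Per-participant congruency effect for WITHIN minus that for BEYOND.
    d = [(wc - wi) - (bc - bi)
         for wc, wi, bc, bi in zip(within_con, within_inc, beyond_con, beyond_inc)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)
    t = mean / sqrt(var / n)          # one-sample t against zero
    return t * t, (1, n - 1)          # F = t^2 with (1, n-1) degrees of freedom

def partial_eta_squared(F, df_effect, df_error):
    """Partial eta squared recovered from an F statistic."""
    return F * df_effect / (F * df_effect + df_error)

# Invented contrast estimates for four hypothetical participants.
F, df = interaction_F(
    within_con=[1.2, 1.0, 1.5, 1.1], within_inc=[0.6, 0.7, 0.9, 0.5],
    beyond_con=[0.8, 0.9, 1.0, 0.7], beyond_inc=[0.8, 0.8, 1.1, 0.7],
)

# Consistency check against the values reported above (n = 12, so df_error = 11).
spl_eta = partial_eta_squared(15.47, 1, 11)
pfs_eta = partial_eta_squared(24.93, 1, 11)
```

Rounded to two decimals, the two recovered effect sizes are 0.58 and 0.69, matching the values reported for the SPL and pFs.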

In addition to the pFs and SPL, we also examined the congruency effect in the lateral occipital cortex (LO), which is involved in object representation (e.g. Grill-Spector et al., 2000; Konkle and Caramazza, 2013) and provides inputs to both the pFs and SPL (Hebart et al., 2018), and in the primary motor cortex (M1), which receives inputs from the dorsal stream (Vainio and Ellis, 2020) and is involved in affordance processing (e.g. McDannald et al., 2018) and action executions (Binkofski et al., 2002). Although both the LO and M1 showed a significantly higher response to objects than baseline, no congruency effect in affordance for objects within the body size was observed (main effect of congruency: F(1,11) = 1.74, p=0.214, η2 = 0.13, Figure 3—figure supplement 2). Therefore, it is unlikely that the representation of affordance is exclusively dictated by visual inputs or automatically engaged in motor execution. This finding suggests that affordance perception likely requires perceptual processing and is not necessarily reflected in motor execution, diverging from the Gibsonian concept of direct perception.

Discussion

One long-lasting debate on affordance centers on the distinction between representational and direct perception of affordance. An outstanding theme shared by many embodied theories of cognition is the replacement hypothesis (e.g. van Gelder, 1998), which challenges the necessity of representation as posited by computationalism’s cognitive theories (e.g. Fodor, 1975). The computationalist theories hold that input is discretized/categorized and subjected to abstraction or symbolization, creating discrete stand-ins for the input (e.g. representations/states). Such representationalization would lead to a categorization between the affordable (the objects) and those beyond affordance (the environment), in contrast to the perspective offered by embodied theories. The present study probed this ‘representationalization’ of affordance by examining whether affordance perception introduces discontinuity and qualitative dissociation in response to continuous action-related physical features (such as object size relative to the agents), which allows sensorimotor input to be assigned into discrete states/kinds, in line with the representation-based view under the constraints of body size. Alternatively, it assessed whether activity directly mirrors the input, free from discretization/categorization/abstraction, in line with the representation-free view.

First, our study found evidence demonstrating discretization in affordance perception. Then, through the body imagination experiment, we provided causal evidence suggesting that this discretization originates from sensorimotor interactions with objects rather than amodal sources, such as abstract object concepts independent of agent motor capability. Finally, we demonstrated the supramodality of this embodied discontinuity by leveraging the recent advances in AI. We showed that the discretization in affordance perception is supramodally accessible to disembodied agents such as large language models (LLMs), which lack sensorimotor input but can access linguistic materials built upon discretized representations. These results collectively suggest that sensorimotor input undergoes discretization, as implied in computationalism’s idea of representation. Note that these results are not contradictory to the claim of the embodied theories, as these representations do shape processes beyond the sensorimotor domain but after discretization.

This observed boundary in affordance perception extends the understanding of the discontinuity in perception in response to the continuity of physical inputs (Harnad, 1987; Young et al., 1997). Perceptual boundaries have been demonstrated in various domains, such as color perception (Bornstein and Korda, 1984), speech-sound perception (Liberman et al., 1957), and facial gender discrimination (Campanella et al., 2001). The boundaries reflect a fundamental adaptation of perception to facilitate categorizations necessary for an organism (Goldstone and Hendrickson, 2010). Our study, for the first time, unveiled a boundary in object affordance, wherein affordance similarity across the boundary was significantly lower than that within the boundary. Critically, the boundary separating object affordances along a size axis coincided with human body size, suggesting that object affordances are characterized in a dimension scaled by human body size.

What is the function of the affordance boundary? About four decades ago, Gibson (1979) postulated that only objects of sizes comparable to an animal’s body size are amenable to interaction and capable of providing affordances to the animal, thereby possessing ecological values that distinctly differ from those of larger objects. In this study, we expand upon this notion by arguing that the affordance boundary serves to delineate (manipulable) objects from their surrounding environment. In other words, objects within the range of an animal’s body size are indeed objects in the animal’s eye and possess affordances as defined by Gibson. In contrast, objects larger than that range typically surpass the animal’s motor capabilities, rendering them too cumbersome for effective manipulation. Consequently, these larger objects are less likely to be considered as typical targets for manipulation by the animal, as opposed to the smaller objects. That is, they are perceived not as “objects” in the animal’s eye, but as part of the background environment, due to their impracticality for direct interactions. Future studies should incorporate a broader range of objects and a more comprehensive set of affordances for finer delineation of the representational discontinuity between objects and the environment.

This speculation aligns with previous fMRI studies where large objects activated the medial portion of the ventral temporal cortex (Huang et al., 2022; Magri et al., 2021), overlapping with the parahippocampal gyrus involved in scene representation (Park et al., 2011; Troiani et al., 2014), and smaller objects activated the lateral portion, such as the pFs, where the congruency effect of affordance was identified in our study. Furthermore, we found that the congruency effect was only evident for objects within the body size range, but not for objects beyond, supporting the idea that affordance is typically represented only for objects within the body size range. While it is acknowledged that the sample size of the fMRI study was small (12 participants), necessitating cautious interpretation of its results, the observed neural-level affordance discontinuity is notable. That is, qualitative differences in neural activity between objects within the affordance boundary and those beyond replicated our behavioral findings. This convergent evidence reinforced our claim that objects were discretized into two broad categories along the continuous size axis, with affordance only being manifested for objects within the boundary.

In this context, an animal’s body size and sensorimotor capacity determine the boundary of manipulation, and thus, the boundary between manipulable objects and the environment. Therefore, our study provides a novel perspective on a long-standing question in psychology, cognitive science, and philosophy: what constitutes an object? Existing psychological studies, especially in the field of vision, define objects in a disembodied manner, primarily relying on their physical properties such as shape (e.g. Op de Beeck et al., 2008) and absolute size (e.g. Konkle and Oliva, 2011). Our identification of the affordance boundary presents a new source of object-ness: the capability of being a source of affordance under the constraints of an animal’s sensorimotor capacity, which resonates with the embodied influence on the formation of abstract concepts (e.g. Barsalou, 1999; Lakoff and Johnson, 1980) of objects and environment. Consistently, our fMRI data did not show the congruency effect for objects beyond the body size range, distinct from objects within this range, suggesting a categorization influenced by objects’ relative size to the human body. In this respect, man is indeed the measure of all things.

The metric provided by the body size, however, was changeable when the body schema was intentionally altered through participants’ imagination of possessing either a cat- or elephant-sized body, with which the participants had no prior sensorimotor experience. Importantly, they perceived new affordances as if they had had embodied experience with this new body schema. Therefore, this finding suggests that the affordance boundary is cognitively penetrable, arguing against the directness of affordance perception (e.g. Gibson, 1979; Greeno, 1994; Prindle et al., 1980) or the exclusive sensorimotor origin of affordances (e.g. Gallagher, 2017; Thompson, 2010; Hutto and Myin, 2012; Chemero, 2013). Further, the finding that the boundary adapted to the manipulation of body schema suggests that the abstraction/representationalization may be dynamically updated in response to the current motor capacity and body schema of the agent. That is, the affordance-based process is probably determined dynamically by the nature of the agent-object dyads, rather than being a fixed belief about objects. Future studies could explore the dynamics of affordance representationalization, for example by investigating how affordance representations evolve during active interactions with novel objects or under conditions of altered motor capabilities. Finally, our findings also suggest that disembodied conceptual knowledge pertinent to action likely modulates affordance perception. Indeed, it has been proposed that conceptual knowledge is grounded in the same neural system that supports action (Barsalou, 1999; Glenberg et al., 2013; Wilson and Golonka, 2013), thereby suggesting that sensorimotor information, along with other model inputs, may be embedded in language (e.g. Casasanto, 2011; Glenberg and Gallese, 2012; Stanfield and Zwaan, 2001), as the grounded theory proposed (see Barsalou, 2008 for a review).

Direct evidence for this speculation comes from the disembodied ChatGPT models, which showed an evident affordance boundary despite lacking direct interaction with the environment. We speculated that ChatGPT models may have formed the affordance boundary through a human prism ingrained within its linguistic training corpus. In fact, when inquired about the size of a hypothetical body constructed for its use, ChatGPT (GPT-4) replied, “It could be the size of an average adult human, around 5 feet 6 inches (167.6 cm) tall. This would allow me to interact with the world and people in a familiar way.” Critically, this size corresponds to the location where the affordance boundary of ChatGPT models was found. In essence, a virtual body schema may have automatically emerged in ChatGPT models, possibly based on the body schema inherited from humans through language, enabling ChatGPT models to display a preliminary ability to reason about the relationship between bodily action and objects. It should be noted that the affordance boundary was not present in all LLMs tested. Specifically, LLMs with a smaller number of parameters, such as BERT and GPT-2, did not exhibit any robust boundary, suggesting the emergence of the boundary may depend on language processing ability determined by the scale of training datasets and the complexity of the model (Hestness et al., 2017; Brown et al., 2020), as well as alignment methods used in fine-tuning the model (Ouyang et al., 2022). Nevertheless, caution should be taken when interpreting the capability of LLMs like ChatGPT, which are often considered ‘black boxes’. That is, our observation indicates that certain sensorimotor information is embedded within human language materials presumably through linguistic statistics, but it is not sufficient to assert that LLMs have developed a human-like ability to represent affordances. Furthermore, such information alone may be insufficient for LLMs to mimic the characteristics of the affordance perception in biological intelligence. Future studies are needed to elucidate such limitations.

While the primary focus of our study concerns the nature of human perception of affordance, our findings on ChatGPT models raise an intriguing question that extends beyond psychology and neuroscience into the domain of artificial intelligence (AI). The AI field has predominantly concentrated on disembodied cognition, such as vision and language. In contrast, the utilization of sensorimotor information to interact with and adapt to the world, including affordance perception in our study, represents a crucial human cognitive achievement that remains elusive for AI systems. Traditional AI (i.e. task-specific AI) has been confined to narrowly defined tasks, with substantial limitations in adaptability and autonomy. Accordingly, these systems have served primarily as tools for humans to achieve specific outcomes, rather than as autonomous agents capable of independently formulating goals and translating them into actionable plans. In recent years, significant efforts have been directed towards evolving traditional AI into more agent-like entities, especially in domains like navigation, object manipulation, and other interactions with the physical world. Despite these advancements, the capabilities of AI still fall behind human-level intelligence. On the other hand, embodied cognition theories suggest that sensorimotor interactions with the environment are foundational for various cognitive domains. From this point of view, endowing AI with human-level abilities in physical agent-environment interactions might provide an irreplaceable missing piece for achieving Artificial General Intelligence (AGI). This development would significantly facilitate AI’s role in robotics, particularly in actions essential for survival and goal accomplishment, a promising direction for the next breakthrough in AI (Gupta et al., 2021; Smith and Gasser, 2005).

However, equipping a disembodied AI with the ability for embodied interaction planning within a specific environment remains a complex challenge. By testing the potential representationalization of action possibilities (affordances) in both humans and LLMs, the present study suggests a new approach to enhancing AI’s interaction ability with the environment. For instance, our finding of supramodal affordance representation may indicate a possible pathway for disembodied LLMs to engage in embodied physical interactions with their surroundings. From an optimistic view, these results suggest that LLM-based agents, if appropriately designed, may leverage affordance representations embedded in language to interact with the physical world. Indeed, by clarifying and aligning such representations with the physical constitution of LLM-based agents, and even by explicitly constructing an agent-specific object space, we may foster the sensorimotor interaction abilities of LLM-based agents. This progression could lead to achieving animal-level interaction abilities with the world, potentially sparking new developments in the field of embodied cognition theories.

Although our study showed the supramodality of the representationalization of affordance, two questions remain. First, the magnitude of the boundary observed in ChatGPT models was smaller than that in humans. This discrepancy might be compensated by merely enhancing the language processing ability of LLMs. Alternatively, direct interaction with the environment may be necessary for LLMs to achieve human-level performance in affordance perception. Second, the size of virtual body schema of ChatGPT models, if present, coincided with human body size. Integrating LLMs with real robots (e.g. Driess et al., 2023) may pose a challenge because the to-be-supported robots or cars for autopilot might not fall within human body size range. Future studies may be needed to align the inherited body schema with the actual constitution of the robots. Addressing these questions is beyond the scope of the present study but may hold significant implications for the development of AI systems possessing human-level ingenuity and adaptability in interacting with the world.

In summary, our findings regarding the affordance boundary highlight the interdependence between an agent and the external world in shaping cognition. Furthermore, taking our finding with embodied humans and disembodied LLMs into account, we propose a revision to the purely sensorimotor-based concept of affordance by emphasizing a disembodied, perhaps conceptual, addition to it. That is, the embodied cognition and symbolic processing of language may be more intricately and fundamentally connected than previously thought: perception-action problems and language problems can be treated as the same kind of process (Wilson and Golonka, 2013). In this context, man is the measure of both the world and the words, for both humans and AIs. The presence of such a metric may shed light on the development of AI systems that can fully capture essential human abilities founded on sensorimotor interactions with the world.

Materials and methods

Participants

A total of five hundred and thirty-four Chinese participants were recruited for the original object-action relation judgment task online (https://www.wjx.cn/) in China. Research advertisements were distributed through an online message board associated with Tsinghua University. Most participants were undergraduate or graduate students of Tsinghua University, and the rest were from the general public. Six participants were excluded from the data analyses because their task completion time did not pass the predetermined minimum completion time criterion, leaving us with a final sample of 528 participants (311 males, aged from 16 to 73, mean age=24.1 years). For the object-action relation judgment task with manipulated body schema, another 139 participants from the same population were recruited from the same platform. We chose a smaller sample size for the imagination experiment than for the object-action relation judgment task, because inspection of the data of the first sample showed that the affordance pattern became stable after the first 50 participants. To maximize the validity of the manipulation, data from participants whose imagined height fell within the average human size range (100–200 cm) were excluded from further analysis. Consequently, 100 participants (49 males, aged from 17 to 39 years, mean age=23.2 years) remained in the analysis. This exclusion criterion was broader than the standard adult human height range of 140 cm to 180 cm (NCD Risk Factor Collaboration (NCD-RisC), 2016). This approach ensured that our analysis focused on participants who unambiguously imagined a body schema different from humans, yet within the known height range of cats and elephants. Each participant completed an online consent form before starting the experiment. For the fMRI experiment, twelve undergraduate or graduate Chinese students (8 males, aged from 19 to 31 years, mean age=23.7 years) from Tsinghua University participated. The sample size of the fMRI study was decided according to the sample size of existing studies reporting action-related effects with similar paradigms. All participants reported normal or corrected-to-normal vision. Each participant completed a pre-scan MRI safety questionnaire and a consent form before the experiment.

This study was approved by the Institutional Review Board at Beijing Normal University (202003180020). All participants were compensated financially for their time.

Stimuli

For all the behavioral tasks, the stimuli comprised 27 objects from the THINGS database (Hebart et al., 2019). Each image was portrayed as a typical exemplar of a daily-life object isolated against a white background, sized 400×400 pixels. The objects spanned real-world size rank 2–8, as classified by Konkle and Oliva, 2011, where the actual size of each object was measured as the diagonal size of its bounding box. The size rank was calculated as a logarithmic function of the diagonal size, with smaller ranks corresponding to smaller real-world sizes (e.g. the airplane is in size rank 8 and the apple is in size rank 2). The full list of objects, their diagonal sizes, and size rankings are provided in Supplementary file 2. The objects were selected from the dataset in Konkle and Oliva, 2011 to cover typical object sizes in the world (ranging from 14 cm to 7618 cm), and actions related to these objects were selected to span a spectrum of daily human-object/environment interactions, from single-point movements (e.g. hand, foot) to whole-body movements (e.g. lying, standing), based on the Kinetics Human Action Video Dataset (Kay et al., 2017).

For the fMRI experiment, the stimuli included images of 4 objects (bed, bottle, ball, and piano), with 5 exemplars for each object. The resulting 20 images (4 objects ×5 exemplars, from the THINGS database) each depicted an isolated object against a white background, all sized 400×400 pixels.

Procedure

Object-action relation judgment task for human participants

To measure the perceived affordances of objects, we developed an object-action relation judgment task, requiring participants to map 27 objects to 14 actions. The 27 object images were pre-randomly divided into three groups (nine images each) to form nine-box grids for display convenience. The 14 actions covered common interactions between humans and objects or environments identified in the Kinetics Human Action Video Dataset (Kay et al., 2017).

The task comprised 42 trials (14 actions ×3 object groups) in total. In each trial, one group of object images (nine object images) and a question asking the appropriateness of applying a specific action to each object were shown (e.g. ‘Which objects are sit-able?’, see Figure 1b, top panel). Participants were asked to choose the objects that afforded the specific action according to their own senses. They were informed that there were no right or wrong answers. Each object-action combination was presented only once during the task. From this task, we calculated the percentage of participants who judged each object as affording each of the 14 actions. Since previous research has demonstrated a fundamental separation between the processing of animate and inanimate objects (e.g. Konkle and Caramazza, 2013), and the affordances of inanimate objects differ from those of animate objects (Gibson, 1979), we included only the 24 inanimate objects in the following analysis, excluding the three animate objects (animals: bird, dog, and horse).
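The tabulation described above can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' analysis code: the data layout (one set of selected object-action pairs per participant) and the names `selections` and `affordance_proportions` are assumptions made for the example, and the three participants are invented.

```python
from collections import Counter

def affordance_proportions(selections, objects, actions):
    """Return {(object, action): proportion of participants who selected it}.

    `selections` holds one set of (object, action) pairs per participant;
    a pair is present if that participant judged the object as affording
    the action. This layout is hypothetical.
    """
    n = len(selections)
    counts = Counter(pair for chosen in selections for pair in chosen)
    return {(o, a): counts[(o, a)] / n for o in objects for a in actions}

# Toy data: three invented participants, two objects, two actions.
objects = ["apple", "bed"]
actions = ["grasp", "sit"]
selections = [
    {("apple", "grasp"), ("bed", "sit")},
    {("apple", "grasp")},
    {("apple", "grasp"), ("bed", "sit"), ("bed", "grasp")},
]
props = affordance_proportions(selections, objects, actions)
```

Collecting the values for one object across all actions gives that object's affordance vector (14-dimensional in the study, 2-dimensional in this toy case).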

Manipulation of body schema

To manipulate participants’ perceived body schema, we asked the participants to imagine themselves as small as a cat, or as large as an elephant. Each participant was randomly assigned to one body-schema condition. Before the experiment started, we presented an instruction screen with an illustration: “Please imagine that you have now grown smaller/larger than your real size, to roughly the same size as a cat/an elephant, as shown in the image below. Please answer the following questions based on this imagined situation.” The illustration was also presented in each trial, above the action question and the object images. At the end of the task, as a manipulation check, participants were asked to indicate their imagined body size by responding to the question: “What is the approximate height (cm) you imagine yourself to be during the whole task?”

Object-action relation judgment task for large language models

To test the perceived affordance of the same set of objects by large language models (LLMs), BERT (Bidirectional Encoder Representations from Transformers), GPT-2, and ChatGPT models (based on GPT-3.5 and GPT-4, respectively) were given the same object-action judgment task. Unlike in the human task, nouns were presented to the models instead of object images (see Figure 1b, bottom panel, for an example).

For BERT, the task was formatted as a mask-filling task, in which the inputs were questions such as ‘Among airplanes, kettles, plates, umbrellas, laptops, beds, [MASK] can be sit-able’. We recorded the likelihood score that BERT provided for each listed object at the masked position. For the example question, the likelihood score for the word ‘airplane’ was 0.00026.

For GPT-2, the input questions were like, ‘Among airplanes, kettles, plates, umbrellas, laptops, beds, the thing that can be sit-able is the [blank space]’. The likelihood scores GPT-2 provided for each listed object in the position after the input sentence (blank space) were recorded.

To mimic sampling from human participants, we ran BERT and GPT-2 each 20 times with different random seeds in the dropout layers, considering them as different subjects.

For ChatGPT models, the task was in a direct question-and-answer format. We asked, for example, “Which objects are sit-able: ‘airplane, kettle, plate, …brick’? ” and the models responded by naming a subset of the object list. To get the probability for each object-action pair, ChatGPT models were run on the same task 20 times, with each new conversation on the OpenAI website (https://chat.openai.com/chat) considered as one subject. The percentage that an object was judged affording each of the 14 actions was calculated by averaging the output across conversations.

Representational similarity matrix for perceived affordance

For each object, we calculated the probability that it was judged affording each of the 14 actions across participants to create a 14-dimension vector. Affordance similarity (r) between each object pair was then calculated based on the Pearson’s correlation between these affordance vectors. A 24 × 24 symmetric matrix was then generated, with the affordance similarity between object i and object j being denoted in cell (i,j). A hierarchical clustering analysis was performed, employing the seaborn clustermap method with Euclidean distance and Complete linkage (Waskom, 2021).
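As a minimal sketch of how such a similarity matrix can be built, the following standard-library Python computes Pearson correlations between affordance vectors. The helper names and the toy vectors (3 objects × 4 actions, rather than the study's 24 × 14) are assumptions for illustration and are not taken from the authors' code.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def similarity_matrix(vectors):
    """Symmetric matrix whose cell (i, j) is the Pearson r of vectors i and j."""
    n = len(vectors)
    return [[pearson(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]

# Invented affordance vectors: two "small" objects and one "large" object.
vectors = [
    [0.9, 0.8, 0.1, 0.0],
    [0.8, 0.9, 0.2, 0.1],
    [0.1, 0.0, 0.9, 0.8],
]
rsm = similarity_matrix(vectors)
```

A matrix of this kind could then be passed to a clustering routine; with seaborn, for instance, `clustermap` accepts `metric="euclidean"` and `method="complete"`, matching the settings named above.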

Affordance similarity between neighboring size ranks

To test the relationship between object affordance and object sizes, we first averaged the affordance vector among objects within each size rank. Next, the Pearson’s correlation between the average vectors of neighboring size ranks was calculated as the similarity index for each pair of neighboring size ranks, representing how similar the affordances collectively provided by objects in these two ranks were. Pearson and Filon’s (1898) Z, implemented in the R package ‘cocor’ (Diedenhofen and Musch, 2015), was used to evaluate the significance of these similarities (alpha level=.05, one-tailed test). For significance tests, Bayesian statistical analyses were conducted using the web version of the ‘bayesplay’ R package (Colling, 2021). Specifically, the data (likelihood) model was specified as a normal distribution, where the correlation coefficients were transformed to Fisher’s z. The null hypothesis was specified as a standard normal distribution centered at zero. Conversely, the alternative hypothesis was specified as a normal distribution centered at 2. Bayes factors (BF10) were calculated and interpreted using the classification scheme suggested by Wagenmakers et al., 2011, wherein a Bayes factor greater than 10 is considered strong evidence for accepting H1 over H0.
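The first two steps (averaging within each size rank, then correlating neighboring ranks) and the Fisher transform can be sketched as follows. This is an illustrative sketch under stated assumptions: the dictionary layout, the function names, and the toy vectors are invented, and the significance tests run in ‘cocor’ and ‘bayesplay’ are not reproduced here.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def neighbor_similarity(vectors_by_rank):
    """Map each pair of adjacent ranks to the Pearson r of their mean vectors.

    Assumes the keys are consecutive size ranks (e.g. 2, 3, 4).
    """
    ranks = sorted(vectors_by_rank)
    means = {r: mean_vector(vectors_by_rank[r]) for r in ranks}
    return {(a, b): pearson(means[a], means[b]) for a, b in zip(ranks, ranks[1:])}

def fisher_z(r):
    """Fisher's z transform of a correlation coefficient."""
    return math.atanh(r)

# Invented affordance vectors grouped by size rank (3 actions each).
vectors_by_rank = {
    2: [[0.9, 0.8, 0.1], [0.7, 0.8, 0.1]],
    3: [[0.8, 0.9, 0.2]],
    4: [[0.1, 0.2, 0.9], [0.1, 0.0, 0.9]],
}
sim = neighbor_similarity(vectors_by_rank)
z_scores = {pair: fisher_z(r) for pair, r in sim.items()}
```

In this toy case the similarity between ranks 2 and 3 is high while that between ranks 3 and 4 is negative, which is the kind of drop the analysis is designed to detect.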

Size similarity between neighboring size ranks

The size of each object was indexed by its real-world size documented in Konkle and Oliva, 2011. Size similarity between size ranks i and j was defined as one minus the difference between the averaged diagonal sizes of objects in size ranks i and j, relative to that of objects in rank i:

$$\text{Size similarity}(i,j) = 1 - \frac{\text{diagonal size}_j - \text{diagonal size}_i}{\text{diagonal size}_i}$$
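A small worked example may make the index concrete. The sketch below assumes the formula reads as one minus the difference in mean diagonal size (rank j minus rank i) divided by the mean diagonal size of rank i; the function name and the numbers are hypothetical.

```python
def size_similarity(diag_i, diag_j):
    """One minus the size difference (rank j vs. rank i) relative to rank i."""
    return 1 - (diag_j - diag_i) / diag_i

# Hypothetical mean diagonal sizes (cm) for two neighboring ranks.
same = size_similarity(100.0, 100.0)      # identical sizes
larger = size_similarity(100.0, 150.0)    # rank j is 50% larger than rank i
```

Under this reading the index is 1 when the two ranks have the same mean size and decreases as rank j grows relative to rank i.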

Object-level affordance similarity

This analysis focused on objects within size rank 3–6. Pearson’s correlations between affordance vectors were conducted for objects within the same size rank as well as for objects from adjacent ranks. We traversed all possible object pairs and plotted the resulting correlation values against the mean sizes of the two objects. We also plotted the average similarity indexes across objects of the same rank composition.

Trough value

To quantify the magnitude of the trough (sharp decrease) observed in the affordance similarity curve, we first measured the trough value by subtracting the similarity value at the trough from the similarity values at its two banks (the sites neighboring the trough site):

$$\text{Trough value} = \frac{r_{i+1} + r_{i-1}}{2} - r_i$$

where $r_i$ indicates the affordance similarity between size rank i and size rank i+1. The higher the trough value is, the larger the decrease is.
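The trough value can be illustrated with a short sketch: the mean of the similarities at the two neighboring sites minus the similarity at the trough site. The function name and the similarity curve below are invented for illustration.

```python
def trough_value(similarities, i):
    """Mean similarity at the two neighboring sites minus the value at site i."""
    return (similarities[i + 1] + similarities[i - 1]) / 2 - similarities[i]

# Invented similarity curve with a sharp dip at index 2.
curve = [0.9, 0.8, 0.2, 0.85, 0.9]
tv = trough_value(curve, 2)
```

A flat curve yields a trough value of zero, and a deeper dip relative to its neighbors yields a larger value.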

A permutation test was conducted to evaluate whether the trough value was significantly above zero for both LLM and human data. The p-value for this test follows the formula adapted from Unpingco, 2016:

$$p(T > T_{obs}) = \frac{1}{N}\sum_{i=1}^{N} I(T_i > T_{obs})$$

where $T_{obs}$ is the observed trough value, $T_i$ is the trough value obtained in the i-th of N permutations, and I is the indicator function. Under the alpha level of 0.05, if p<0.05, then $T_{obs}$ is considered significantly above zero.
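The p-value formula amounts to the proportion of permuted trough values that exceed the observed one. The sketch below takes the permuted values as given, since how the permutations were generated (what was shuffled, and how many times) is not specified in this section; the function name and the numbers are hypothetical.

```python
def permutation_p_value(t_obs, t_perm):
    """Proportion of permuted statistics that exceed the observed statistic."""
    return sum(1 for t in t_perm if t > t_obs) / len(t_perm)

# Invented trough values from ten permutations, and an invented observed value.
t_perm = [0.1, -0.2, 0.05, 0.7, 0.0, -0.1, 0.3, 0.2, -0.05, 0.15]
t_obs = 0.6
p = permutation_p_value(t_obs, t_perm)
```

With only ten permutations the smallest attainable non-zero p-value is 0.1, so a real analysis would need many more permutations to resolve p<0.05.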

fMRI experiment

The fMRI scanning consisted of one high-resolution T1 anatomical run and four task runs for each participant. In each task run, participants performed four action blocks (grasp, kick, lift, and sit). The block order was counterbalanced across runs. Within each block (see Figure 3a), an introduction screen showing a question “Which objects are [grasp, kick, lift, sit]-able” was presented for 2 s at the beginning to indicate the action type, followed by 20 object images (4 objects ×5 exemplars). The object images were presented in a random order, for 2 s each, with a jittered inter-stimulus interval (ISI) varying between 2–4 s. Participants were asked to judge whether the object shown was grasp/kick/lift/sit-able or not by pressing corresponding buttons (e.g. yes: right index finger; no: left index finger). The response buttons were also counterbalanced across participants. The task run lasted for 464 s in total, with the four blocks separated by 10 s fixation periods.

With this design, we were able to measure the neural activation of objects within agent size range and those beyond. Further, for each object, there would be congruent trials (e.g. grasp-able – bottle: affordance=1) and incongruent trials (e.g. sit-able – bottle: affordance=0). We were then able to locate the brain regions representing the objects’ affordance by comparing trials in which the presented objects afforded the presented action option with those in which they did not, that is, to locate the regions showing a congruency effect (congruent - incongruent).

fMRI data acquisition

Imaging data were collected using a 3T Siemens Prisma MRI scanner with a 64-channel phased-array head coil at the Centre for Biomedical Imaging Research at Tsinghua University. High-resolution T1-weighted images were acquired with a magnetization-prepared rapid acquisition gradient-echo (MPRAGE) sequence (TR/TE=2530/2.27 ms, flip angle=7°, voxel resolution=1 × 1 × 1 mm). Functional blood-oxygen-level-dependent (BOLD) images were acquired with a T2*-weighted gradient echo-planar sequence (TR/TE=2000/34.0 ms, flip angle=90°, voxel resolution=2 × 2 × 2 mm, FOV=200 × 200 mm). Earplugs were used to attenuate the scanner noise, and a foam pillow and extendable padded head clamps were used to restrain head motion. All the stimuli were projected onto a screen at the back of the scanner with a resolution of 1024×768, and were viewed from a distance of approximately 110 cm via a mirror placed on the head coil.

fMRI data analyses

Structural T1 and functional images were preprocessed using FSL (FMRIB’s Software Library, https://fsl.fmrib.ox.ac.uk/fsl/fslwiki) v6.0.5 (Jenkinson et al., 2012). A standard preprocessing pipeline was applied, including skull stripping using the BET (Brain Extraction Tool; Smith, 2002), slice-timing correction, motion correction using the MCFLIRT method (Jenkinson et al., 2002), temporal high-pass filtering (100 s), and spatial smoothing using a Gaussian kernel of 5 mm full width at half maximum (FWHM). Each run’s functional data were registered to a T1-weighted standard image (MNI152) with FLIRT.

For functional data analysis, a first-level voxel-wise general linear model (GLM) implemented in a FEAT analysis was performed on each run separately. To get neural activation maps for objects within and beyond versus baseline, the GLM included three regressors: objects within body size (bottle and football), objects beyond body size (bed and piano), and fixation period as baseline; ISI period, response key press and introduction image were included as 3 nuisance factors. The resultant first-level contrasts of parameter estimates (COPE) were entered into higher-level group analyses, which were performed using a random-effects model (FLAME stage 1, Beckmann et al., 2003). We focused on two critical contrasts, objects within vs. fixation and objects beyond vs. fixation, and on the conjunction of these two contrasts. The resulting Z-statistic images were thresholded at Z>2.3, p=0.05 (Worsley, 2001), and corrected for multiple comparisons using an adjusted cluster-wise (FWE: family-wise error) significance threshold of p=0.05.

Region of interest (ROI) definition

We chose the pFs, LO, SPL, and M1 as ROIs based on existing literature highlighting their distinct contributions to affordance perception (Borghi, 2005; Sakreida et al., 2016). Eight ROIs depicted in Figure 3b were constructed based on the overlap between the whole-brain map activated by both objects within and beyond and corresponding functional atlases (the pFs and LO from Zhen et al., 2015; the SPL and M1 from Fan et al., 2016). To achieve ROIs of similar sizes, we applied varying thresholds to each cortical area: for the pFs and LO, the atlases were thresholded at 55% and 90%, resulting in 266 voxels in the lpFs, 427 voxels in the rpFs, 254 voxels in the lLO and 347 voxels in the rLO; for the SPL and M1, the atlases were thresholded at 78% and 94%, resulting in 661 voxels in the lSPL, 455 voxels in the rSPL, 378 voxels in the lM1, and 449 voxels in the rM1. In the subsequent analysis, homologous areas spanning both cortical hemispheres were merged.

Affordance congruency effect

For the affordance congruency effect of each object type, we modelled another GLM containing 5 regressors: congruent conditions for objects within/beyond, respectively, incongruent conditions for objects within/beyond, respectively, and fixation period as baseline; ISI period, response key press and introduction image were included as 3 nuisance factors. The resultant first-level COPEs were subjected to the following ROI analysis. A repeated-measures ANOVA with Object type (WITHIN and BEYOND) and Congruency (Congruent, Incongruent) as within-subjects factors was run on the average beta values (contrast estimate) extracted from their respective contrasts versus the fixation for each ROI.
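The logic of the ROI analysis can be sketched with the standard library. The congruency effect is the congruent minus incongruent beta per participant, and in a 2 × 2 within-subjects design the interaction F statistic equals the square of a one-sample t statistic computed on the difference between the two congruency effects. The sketch below is not the authors' pipeline: the function names are hypothetical and the beta values are made up for four imaginary participants.

```python
import math
import statistics

def congruency_effect(congruent, incongruent):
    """Per-participant congruency effect: congruent minus incongruent beta."""
    return [c - i for c, i in zip(congruent, incongruent)]

def interaction_t(within_effect, beyond_effect):
    """One-sample t statistic on the difference of the two congruency effects."""
    diffs = [w - b for w, b in zip(within_effect, beyond_effect)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

# Made-up beta values for four hypothetical participants in one ROI.
within_cong, within_incong = [1.0, 1.2, 0.9, 1.1], [0.5, 0.6, 0.6, 0.5]
beyond_cong, beyond_incong = [0.8, 0.9, 0.7, 1.0], [0.8, 0.8, 0.8, 0.9]

within_effect = congruency_effect(within_cong, within_incong)
beyond_effect = congruency_effect(beyond_cong, beyond_incong)
t = interaction_t(within_effect, beyond_effect)
f_interaction = t ** 2  # 2 x 2 interaction F with (1, n - 1) degrees of freedom
```

The main effects of Object type and Congruency would be tested analogously on the corresponding per-participant averages; a full repeated-measures ANOVA routine is omitted to keep the sketch short.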

To search all the possible brain regions that revealed congruency effect of objects beyond, we also ran a whole-brain analysis on the contrast between congruent vs. incongruent condition for objects beyond. The corresponding first-level COPE was entered into the group-level analyses with a random-effects model (FLAME stage 1, Beckmann et al., 2003). The resulting Z-statistic images were thresholded at Z>2.3, p=0.05 (Worsley, 2001), and corrected for multiple comparisons using an adjusted cluster-wise (FWE: family-wise error) significance threshold of p=0.05.

For the whole-brain analyses on the congruency effect, the object size effect, and their interaction, see Figure 3—figure supplement 3 and Supplementary file 1b-1e.

Acknowledgements

We thank our reviewers for thoughtful feedback. This study was funded by Natural Science Foundation of China (31600925, 32371099, 31861143039), Shuimu Scholar Program of Tsinghua University, China Postdoctoral International Exchange Program (YJ20220273), Beijing Municipal Science & Technology Commission, Administrative Commission of Zhongguancun Science Park (Z221100002722012), Tsinghua University Guoqiang Institute (2020GQG1016), Beijing Academy of Artificial Intelligence (BAAI), and Double First-Class Initiative Funds for Discipline Construction.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Jia Liu, Email: liujiathu@tsinghua.edu.cn.

Clare Press, University College London, United Kingdom.

Timothy E Behrens, University of Oxford, United Kingdom.

Funding Information

This paper was supported by the following grants:

  • National Natural Science Foundation of China 31600925 to Shan Xu.

  • National Natural Science Foundation of China 32371099 to Shan Xu.

  • National Natural Science Foundation of China 31861143039 to Jia Liu.

  • Beijing Academy of Artificial Intelligence to Jia Liu.

  • Double First-Class Initiative Funds for Discipline Construction to Jia Liu.

  • Guoqiang Institute, Tsinghua University 2020GQG1016 to Jia Liu.

  • Beijing Municipal Science and Technology Commission, Administrative Commission of Zhongguancun Science Park Z221100002722012 to Jia Liu.

  • Tsinghua University Shuimu Scholar Program to Xinran Feng.

  • China Postdoctoral Science Foundation YJ20220273 to Xinran Feng.

Additional information

Competing interests

No competing interests declared.

Author contributions

Data curation, Formal analysis, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing.

Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing.

Data curation, Investigation, Methodology, Writing - review and editing.

Conceptualization, Supervision, Funding acquisition, Writing - review and editing.

Ethics

Human subjects: Informed consent was obtained from each participant prior to each experiment. The study was approved by the Institutional Review Board at Beijing Normal University (202003180020).

Additional files

Supplementary file 1. Cortical regions showing significant results in whole-brain analyses (R=right hemisphere, L=left hemisphere; Z>2.3, P=0.05, cluster corrected).
Supplementary file 2. The full list of inanimate objects used in the behavioral study, with the corresponding size rank noted according to Konkle and Oliva, 2011.
MDAR checklist

Data availability

All analyses are included in the manuscript. The data are freely available from Figshare.

The following dataset was generated:

Feng XR, Xu S, Li Y, Liu J. 2024. Data for 'Body size as a metric for the affordable world'. figshare.

References

  1. Barsalou LW. Perceptual symbol systems. The Behavioral and Brain Sciences. 1999;22:577–609. doi: 10.1017/s0140525x99002149. [DOI] [PubMed] [Google Scholar]
  2. Barsalou LW. Grounded cognition. Annual Review of Psychology. 2008;59:617–645. doi: 10.1146/annurev.psych.59.103006.093639. [DOI] [PubMed] [Google Scholar]
  3. Beckmann CF, Jenkinson M, Smith SM. General multilevel linear modeling for group analysis in FMRI. NeuroImage. 2003;20:1052–1063. doi: 10.1016/S1053-8119(03)00435-X. [DOI] [PubMed] [Google Scholar]
  4. Binkofski F, Fink GR, Geyer S, Buccino G, Gruber O, Shah NJ, Taylor JG, Seitz RJ, Zilles K, Freund HJ. Neural activity in human primary motor cortex areas 4a and 4p is modulated differentially by attention to action. Journal of Neurophysiology. 2002;88:514–519. doi: 10.1152/jn.2002.88.1.514. [DOI] [PubMed] [Google Scholar]
  5. Borghi AM. In: Grounding Cognition: The Role of Perception and Action in Memory, Language, and Thinking. Pecher D, editor. Cambridge University Press; 2005. Object concepts and action. [Google Scholar]
  6. Bornstein MH, Korda NO. Discrimination and matching within and between hues measured by reaction times: some implications for categorical perception and levels of information processing. Psychological Research. 1984;46:207–222. doi: 10.1007/BF00308884. [DOI] [PubMed] [Google Scholar]
  7. Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Amodei D. Language Models Are Few-Shot Learners. arXiv. 2020 https://arxiv.org/abs/2005.14165
  8. Campanella S, Chrysochoos A, Bruyer R. Categorical perception of facial gender information: Behavioural evidence and the face-space metaphor. Visual Cognition. 2001;8:237–262. doi: 10.1080/13506280042000072. [DOI] [Google Scholar]
  9. Casasanto D. Different bodies, different minds: the body specificity of language and thought. Current Directions in Psychological Science. 2011;20:378–383. doi: 10.1177/0963721411422058. [DOI] [Google Scholar]
  10. Castiello U, Bennett KMB, Stelmach GE. Reach to grasp: the natural response to perturbation of object size. Experimental Brain Research. 1993;94:163–178. doi: 10.1007/BF00230479. [DOI] [PubMed] [Google Scholar]
  11. Cesari P, Newell KM. Body-scaled transitions in human grip configurations. Journal of Experimental Psychology. Human Perception and Performance. 2000;26:1657–1668. doi: 10.1037//0096-1523.26.5.1657. [DOI] [PubMed] [Google Scholar]
  12. Chemero A. Radical embodied cognitive science. Review of General Psychology. 2013;17:145–150. doi: 10.1037/a0032923. [DOI] [Google Scholar]
  13. Colling LJ. Ljcolling/go-Bayesfactor. Version v0.9.0. Zenodo. 2021 https://zenodo.org/records/4642331
  14. Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-Training of Deep Bidirectional Transformers for Language Understanding. arXiv. 2018 https://arxiv.org/abs/1810.04805
  15. Diedenhofen B, Musch J. cocor: A comprehensive solution for the statistical comparison of correlations. PLOS ONE. 2015;10:e0121945. doi: 10.1371/journal.pone.0121945. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Driess D, Xia F, Sajjadi MS, Lynch C, Chowdhery A, Ichter B, Florence P. PaLM-E: An Embodied Multimodal Language Model. arXiv. 2023 https://arxiv.org/abs/2303.03378
  17. Fan L, Li H, Zhuo J, Zhang Y, Wang J, Chen L, Yang Z, Chu C, Xie S, Laird AR, Fox PT, Eickhoff SB, Yu C, Jiang T. The human brainnetome atlas: a new brain atlas based on connectional architecture. Cerebral Cortex. 2016;26:3508–3526. doi: 10.1093/cercor/bhw157. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Filimon F, Nelson JD, Hagler DJ, Sereno MI. Human cortical representations for reaching: mirror neurons for execution, observation, and imagery. NeuroImage. 2007;37:1315–1328. doi: 10.1016/j.neuroimage.2007.06.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Fodor JA. The Language of Thought. Harvard University Press; 1975. [Google Scholar]
  20. Gallagher S. Enactivist Interventions: Rethinking the Mind. Oxford University Press; 2017. [Google Scholar]
  21. Gibson JJ. The Ecological Approach to Visual Perception. Psychology Press; 1979. [Google Scholar]
  22. Glenberg AM, Gallese V. Action-based language: A theory of language acquisition, comprehension, and production. Cortex; a Journal Devoted to the Study of the Nervous System and Behavior. 2012;48:905–922. doi: 10.1016/j.cortex.2011.04.010. [DOI] [PubMed] [Google Scholar]
  23. Glenberg AM, Witt JK, Metcalfe J. From the revolution to embodiment: 25 years of cognitive psychology. Perspectives on Psychological Science. 2013;8:573–585. doi: 10.1177/1745691613498098. [DOI] [PubMed] [Google Scholar]
  24. Goldstone RL, Hendrickson AT. Categorical perception. Wiley Interdisciplinary Reviews. Cognitive Science. 2010;1:69–78. doi: 10.1002/wcs.26. [DOI] [PubMed] [Google Scholar]
  25. Greeno JG. Gibson’s affordances. Psychological Review. 1994;101:336–342. doi: 10.1037/0033-295x.101.2.336. [DOI] [PubMed] [Google Scholar]
  26. Grill-Spector K, Kushnir T, Hendler T, Malach R. The dynamics of object-selective activation correlate with recognition performance in humans. Nature Neuroscience. 2000;3:837–843. doi: 10.1038/77754. [DOI] [PubMed] [Google Scholar]
  27. Gupta A, Savarese S, Ganguli S, Fei-Fei L. Embodied intelligence via learning and evolution. Nature Communications. 2021;12:5721. doi: 10.1038/s41467-021-25874-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Harnad S. Psychophysical and Cognitive Aspects of Categorical Perception: A Critical Overview. University of Southampton; 1987. [Google Scholar]
  29. Hebart MN, Bankson BB, Harel A, Baker CI, Cichy RM. The representational dynamics of task and object processing in humans. eLife. 2018;7:e32816. doi: 10.7554/eLife.32816. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Hebart MN, Dickter AH, Kidder A, Kwok WY, Corriveau A, Van Wicklin C, Baker CI. THINGS: A database of 1,854 object concepts and more than 26,000 naturalistic object images. PLOS ONE. 2019;14:e0223792. doi: 10.1371/journal.pone.0223792. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Hestness J, Narang S, Ardalani N, Diamos G, Jun H, Kianinejad H, Zhou Y. Deep Learning Scaling Is Predictable, Empirically. arXiv. 2017 https://arxiv.org/abs/1712.00409
  32. Huang T, Song Y, Liu J. Real-world size of objects serves as an axis of object space. Communications Biology. 2022;5:749. doi: 10.1038/s42003-022-03711-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Hutto DD, Myin E. Radicalizing Enactivism: Basic Minds without Content. MIT Press; 2012. [DOI] [Google Scholar]
  34. Jenkinson M, Bannister P, Brady M, Smith S. Improved optimization for the robust and accurate linear registration and motion correction of brain images. NeuroImage. 2002;17:825–841. doi: 10.1016/s1053-8119(02)91132-8. [DOI] [PubMed] [Google Scholar]
  35. Jenkinson M, Beckmann CF, Behrens TEJ, Woolrich MW, Smith SM. FSL. NeuroImage. 2012;62:782–790. doi: 10.1016/j.neuroimage.2011.09.015. [DOI] [PubMed] [Google Scholar]
  36. Kay W, Carreira J, Simonyan K, Zhang B, Hillier C, Vijayanarasimhan S, Zisserman A. The Kinetics Human Action Video Dataset. arXiv. 2017 https://arxiv.org/abs/1705.06950
  37. Konkle T, Oliva A. Canonical visual size for real-world objects. Journal of Experimental Psychology. Human Perception and Performance. 2011;37:23–37. doi: 10.1037/a0020413. [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Konkle T, Oliva A. A real-world size organization of object responses in occipitotemporal cortex. Neuron. 2012;74:1114–1124. doi: 10.1016/j.neuron.2012.04.036. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Konkle T, Caramazza A. Tripartite organization of the ventral stream by animacy and object size. The Journal of Neuroscience. 2013;33:10235–10242. doi: 10.1523/JNEUROSCI.0983-13.2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Kourtis D, Vandemaele P, Vingerhoets G. Concurrent cortical representations of function- and size-related object affordances: an fMRI study. Cognitive, Affective, & Behavioral Neuroscience. 2018;18:1221–1232. doi: 10.3758/s13415-018-0633-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Lakoff G, Johnson M. The metaphorical structure of the human conceptual system. Cognitive Science. 1980;4:195–208. doi: 10.1207/s15516709cog0402_4. [DOI] [Google Scholar]
  42. Liberman AM, Harris KS, Hoffman HS, Griffith BC. The discrimination of speech sounds within and across phoneme boundaries. Journal of Experimental Psychology. 1957;54:358–368. doi: 10.1037/h0044417. [DOI] [PubMed] [Google Scholar]
  43. Magri C, Konkle T, Caramazza A. The contribution of object size, manipulability, and stability on neural responses to inanimate objects. NeuroImage. 2021;237:118098. doi: 10.1016/j.neuroimage.2021.118098. [DOI] [PubMed] [Google Scholar]
  44. Malach R, Reppas JB, Benson RR, Kwong KK, Jiang H, Kennedy WA, Ledden PJ, Brady TJ, Rosen BR, Tootell RB. Object-related activity revealed by functional magnetic resonance imaging in human occipital cortex. PNAS. 1995;92:8135–8139. doi: 10.1073/pnas.92.18.8135. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Mark LS. Eyeheight-scaled information about affordances: a study of sitting and stair climbing. Journal of Experimental Psychology. Human Perception and Performance. 1987;13:361–370. doi: 10.1037//0096-1523.13.3.361. [DOI] [PubMed] [Google Scholar]
  46. Matić K, Op de Beeck H, Bracci S. It’s not all about looks: The role of object shape in parietal representations of manual tools. Cortex; a Journal Devoted to the Study of the Nervous System and Behavior. 2020;133:358–370. doi: 10.1016/j.cortex.2020.09.016. [DOI] [PubMed] [Google Scholar]
  47. McDannald DW, Mansour M, Rydalch G, Bolton DAE. Motor affordance for grasping a safety handle. Neuroscience Letters. 2018;683:131–137. doi: 10.1016/j.neulet.2018.05.040. [DOI] [PubMed] [Google Scholar]
  48. Merleau-Ponty M, Smith C. Phenomenology of Perception. London: Routledge; 1962. [DOI] [Google Scholar]
  49. NCD Risk Factor Collaboration (NCD-RisC). A century of trends in adult human height. eLife. 2016;5:e13410. doi: 10.7554/eLife.13410. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Newell KM, Scully DM, McDonald PV, Baillargeon R. Task constraints and infant grip configurations. Developmental Psychobiology. 1989;22:817–831. doi: 10.1002/dev.420220806. [DOI] [PubMed] [Google Scholar]
  51. Op de Beeck HP, Torfs K, Wagemans J. Perceived shape similarity among unfamiliar objects and the organization of the human object vision pathway. The Journal of Neuroscience. 2008;28:10111–10123. doi: 10.1523/JNEUROSCI.2511-08.2008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. OpenAI. Introducing ChatGPT. 2023. [February 12, 2023]. https://openai.com/blog/chatgpt
  53. Ouyang L, Wu J, Jiang X, Almeida D, Wainwright C, Mishkin P, Lowe R. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems. 2022;35:27730–27744. [Google Scholar]
  54. Park S, Brady TF, Greene MR, Oliva A. Disentangling scene content from spatial boundary: complementary roles for the parahippocampal place area and lateral occipital complex in representing real-world scenes. The Journal of Neuroscience. 2011;31:1333–1340. doi: 10.1523/JNEUROSCI.3885-10.2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Pearson K, Filon LNG. VII. Mathematical contributions to the theory of evolution.— IV. On the probable errors of frequency constants and on the influence of random selection on variation and correlation. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of A Mathematical or Physical Character. 1898;191:229–311. doi: 10.1098/rsta.1898.0007. [DOI] [Google Scholar]
  56. Prindle SS, Carello C, Turvey MT. Animal-environment mutuality and direct perception. Behavioral and Brain Sciences. 1980;3:395–397. doi: 10.1017/S0140525X0000563X. [DOI] [Google Scholar]
  57. Pylyshyn Z. Is vision continuous with cognition? The case for cognitive impenetrability of visual perception. The Behavioral and Brain Sciences. 1999;22:341–365. doi: 10.1017/s0140525x99002022. [DOI] [PubMed] [Google Scholar]
  58. Radford A, Narasimhan K, Salimans T, Sutskever I. Improving Language Understanding by Generative Pre-Training. OpenAI; 2018. [Google Scholar]
  59. Sakreida K, Effnert I, Thill S, Menz MM, Jirak D, Eickhoff CR, Ziemke T, Eickhoff SB, Borghi AM, Binkofski F. Affordance processing in segregated parieto-frontal dorsal stream sub-pathways. Neuroscience & Biobehavioral Reviews. 2016;69:89–112. doi: 10.1016/j.neubiorev.2016.07.032. [DOI] [PubMed] [Google Scholar]
  60. Smith SM. Fast robust automated brain extraction. Human Brain Mapping. 2002;17:143–155. doi: 10.1002/hbm.10062. [DOI] [PMC free article] [PubMed] [Google Scholar]
  61. Smith L, Gasser M. The development of embodied cognition: six lessons from babies. Artificial Life. 2005;11:13–29. doi: 10.1162/1064546053278973. [DOI] [PubMed] [Google Scholar]
  62. Snow JC, Pettypiece CE, McAdam TD, McLean AD, Stroman PW, Goodale MA, Culham JC. Bringing the real world into the fMRI scanner: repetition effects for pictures versus real objects. Scientific Reports. 2011;1:130. doi: 10.1038/srep00130. [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Stanfield RA, Zwaan RA. The effect of implied orientation derived from verbal context on picture recognition. Psychological Science. 2001;12:153–156. doi: 10.1111/1467-9280.00326. [DOI] [PubMed] [Google Scholar]
  64. Thompson E. Mind in Life: Biology, Phenomenology, and the Sciences of Mind. Harvard University Press; 2010. [Google Scholar]
  65. Troiani V, Stigliani A, Smith ME, Epstein RA. Multiple object properties drive scene-selective regions. Cerebral Cortex. 2014;24:883–897. doi: 10.1093/cercor/bhs364. [DOI] [PMC free article] [PubMed] [Google Scholar]
  66. Tucker M, Ellis R. Action priming by briefly presented objects. Acta Psychologica. 2004;116:185–203. doi: 10.1016/j.actpsy.2004.01.004. [DOI] [PubMed] [Google Scholar]
  67. Unpingco J. Python for Probability, Statistics, and Machine Learning. Cham: Springer International Publishing; 2016. [DOI] [Google Scholar]
  68. Vainio L, Ellis R. Action inhibition and affordances associated with a non-target object: An integrative review. Neuroscience & Biobehavioral Reviews. 2020;112:487–502. doi: 10.1016/j.neubiorev.2020.02.029. [DOI] [PubMed] [Google Scholar]
  69. van Gelder T. The dynamical hypothesis in cognitive science. The Behavioral and Brain Sciences. 1998;21:615–628. doi: 10.1017/s0140525x98001733. [DOI] [PubMed] [Google Scholar]
  70. Varela FJ, Thompson E, Rosch E. The Embodied Mind. MIT Press; 2017. [DOI] [Google Scholar]
  71. Wagenmakers E-J, Wetzels R, Borsboom D, van der Maas HLJ. Why psychologists must change the way they analyze their data: the case of psi: comment on Bem (2011). Journal of Personality and Social Psychology. 2011;100:426–432. doi: 10.1037/a0022790. [DOI] [PubMed] [Google Scholar]
  72. Warren WH. Perceiving affordances: visual guidance of stair climbing. Journal of Experimental Psychology. Human Perception and Performance. 1984;10:683–703. doi: 10.1037//0096-1523.10.5.683. [DOI] [PubMed] [Google Scholar]
  73. Warren WH, Jr, Whang S. Visual guidance of walking through apertures: body-scaled information for affordances. Journal of Experimental Psychology. Human Perception and Performance. 1987;13:371–383. doi: 10.1037//0096-1523.13.3.371. [DOI] [PubMed] [Google Scholar]
  74. Waskom ML. seaborn: statistical data visualization. Journal of Open Source Software. 2021;6:3021. doi: 10.21105/joss.03021. [DOI] [Google Scholar]
  75. Wilson M. Six views of embodied cognition. Psychonomic Bulletin & Review. 2002;9:625–636. doi: 10.3758/bf03196322. [DOI] [PubMed] [Google Scholar]
  76. Wilson AD, Golonka S. Embodied cognition is not what you think it is. Frontiers in Psychology. 2013;4:58. doi: 10.3389/fpsyg.2013.00058. [DOI] [PMC free article] [PubMed] [Google Scholar]
  77. Worsley KJ. Statistical analysis of activation images. Functional MRI: An Introduction to Methods. 2001;14:251–270. doi: 10.1093/acprof:oso/9780192630711.001.0001. [DOI] [Google Scholar]
  78. Young AW, Rowland D, Calder AJ, Etcoff NL, Seth A, Perrett DI. Facial expression megamix: tests of dimensional and category accounts of emotion recognition. Cognition. 1997;63:271–313. doi: 10.1016/s0010-0277(97)00003-6. [DOI] [PubMed] [Google Scholar]
  79. Yu C, Ballard DH, Aslin RN. The role of embodied intention in early lexical acquisition. Cognitive Science. 2005;29:961–1005. doi: 10.1207/s15516709cog0000_40. [DOI] [PubMed] [Google Scholar]
  80. Zhen Z, Yang Z, Huang L, Kong X-Z, Wang X, Dang X, Huang Y, Song Y, Liu J. Quantifying interindividual variability and asymmetry of face-selective regions: a probabilistic functional atlas. NeuroImage. 2015;113:13–25. doi: 10.1016/j.neuroimage.2015.03.010. [DOI] [PubMed] [Google Scholar]

eLife assessment

Clare Press

This paper presents valuable findings that shed light on the mental organisation of knowledge about real-world objects. It provides diverse, if incomplete and tentative, evidence from behaviour, brain, and large language models that this knowledge is divided categorically between relatively small objects (closer to the relevant scale for direct manipulation) and larger objects (further from the typical scope of human affordances for action).

Reviewer #1 (Public Review):

Anonymous

Ps observed 24 objects and were asked which afforded particular actions (14 action types). Affordances for each object were represented by a 14-item vector, values reflecting the percentage of Ps who agreed on a particular action being afforded by the object. An affordance similarity matrix was generated which reflected similarity in affordances between pairs of objects. Two clusters emerged, reflecting correlations between affordance ratings in objects smaller than body size and larger than body size. These clusters did not correlate with each other. There was a trough in similarity ratings between objects ~105 cm and ~130 cm, arguably reflecting the body size boundary. The authors subsequently provide some evidence that this clear demarcation is not simply an incidental reflection of body size, but likely causally related. This evidence comes in the flavour of requiring Ps to imagine themselves as small as a cat or as large as an elephant and showing a predicted shift in the affordance boundary. The manuscript further demonstrates that ChatGPT (theoretically interesting because it's trained on language alone without sensorimotor information; trained now on words rather than images) showed a similar boundary.
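
As an illustration of the analysis summarised above, the following is a minimal sketch rather than the authors' actual pipeline. It assumes that similarity is measured with Pearson correlation between the 14-item affordance vectors and that objects are ordered from smallest to largest; the function names, the toy data, and the "adjacent-pair" trough measure are hypothetical choices made for this example.

```python
import numpy as np


def affordance_similarity(responses):
    """Build affordance profiles and their pairwise similarity.

    responses: boolean array of shape (participants, objects, actions),
    True where a participant judged the action to be afforded by the object.
    """
    # Proportion of participants endorsing each object-action pair: (objects, actions)
    profiles = responses.mean(axis=0)
    # Pearson correlation between the affordance profiles of every object pair
    # (assumption: the source only says "correlations", not which coefficient)
    similarity = np.corrcoef(profiles)
    return profiles, similarity


def adjacent_similarity(similarity):
    """Similarity between each object and its next-larger neighbour.

    Assumes rows/columns are ordered by real-world object size.
    """
    return np.diagonal(similarity, offset=1)


# Toy data (hypothetical): 6 objects ordered by size, 14 actions, 50 participants.
# The three smallest objects mostly afford the first 7 actions, the rest the last 7.
rng = np.random.default_rng(0)
n_participants, n_objects, n_actions = 50, 6, 14
endorse_prob = np.full((n_objects, n_actions), 0.1)
endorse_prob[:3, :7] = 0.9
endorse_prob[3:, 7:] = 0.9
responses = rng.random((n_participants, n_objects, n_actions)) < endorse_prob

profiles, similarity = affordance_similarity(responses)
adjacent = adjacent_similarity(similarity)
# Index i means the lowest adjacent similarity lies between object i and object i + 1
boundary = int(np.argmin(adjacent))
```

In this toy setup the lowest adjacent similarity falls where the simulated affordance sets change, which is the kind of trough the review describes between ~105 cm and ~130 cm in the real data.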

The authors also conducted a small MRI study task where Ps decide whether a probe action was affordable (graspable?) and created a congruency factor according to the answer (yes/no). There was an effect of congruency in posterior fusiform and superior parietal lobule for objects within body size range, but not outside. No effects in LOC or M1.

The major strength of this manuscript in my opinion is the methodological novelty. I felt the correlation matrices were a clever method for demonstrating these demarcations, the imagination manipulation was also exciting, and the ChatGPT analysis provided excellent food for thought. These findings are important for our understanding of the interactions between action and perception, and hence for researchers from a range of domains of cognitive neuroscience.

The major element that limits conclusions is that an MRI study with 12 P in this context can really only provide pilot data. Certainly the effects are not strong enough for 12 P to generate much confidence. My other concerns have been addressed in the revision.

Reviewer #2 (Public Review):

Anonymous

Summary

In this work, the authors seek to test a version of an old idea, which is that our perception of the world and our understanding of the objects in it are deeply influenced by the nature of our bodies and the kinds of behaviours and actions that those objects afford. The studies presented here muster three kinds of evidence for a discontinuity in the encoding of objects, with a mental "border" between objects roughly of human body scale or smaller, which tend to relate to similar kinds of actions that are yet distinct from the kinds of actions implied by human-or-larger scale objects. This is demonstrated through observers' judgments of the kinds of actions different objects afford; through similar questioning of AI large-language models (LLMs); and through a neuroimaging study examining how brain regions implicated in object understanding make distinctions between kinds of objects at human and larger-than-human scales.

Strengths

The authors address questions of longstanding interest in the cognitive neurosciences -- namely how we encode and interact with the many diverse kinds of objects we see and use in daily life. A key strength of the work lies in the application of multiple approaches. Examining the correlations among kinds of objects, with respect to their suitability for different action kinds, is novel, as are the complementary tests of judgments made by LLMs. The authors include a clever manipulation in which participants are asked to judge action-object pairs, having first adopted the imagined size of either a cat or an elephant, showing that the discontinuity in similarity judgments effectively moved to a new boundary closer to the imagined scale than the veridical human scale. The dynamic nature of the discontinuity hints that action affordances may be computed dynamically, "on the fly", during actual action behaviours with objects in the real world.

Weaknesses

A limitation of the tests of LLMs may be that it is not always known what kinds of training material were used to build these models, leading to a possible "black box" problem. Further, presuming that those models are largely trained on previous human-written material, it may not necessarily be theoretically telling that the "judgments" of these models about action-object pairs show human-like discontinuities. Indeed, verbal descriptions of actions are very likely to mainly refer to typical human behaviour, and so the finding that these models demonstrate an affordance discontinuity may simply reflect those statistics, rather than providing independent evidence for affordance boundaries.

The relatively small sample size of the brain imaging experiment, and some design features (such as the task participants performed, and the relatively narrow range of objects tested) provide some limits on the extent to which it can be taken as support for the authors' claims.

Reviewer #3 (Public Review):

Anonymous

Summary:

Feng et al. test the hypothesis that human body size constrains the perception of object affordances, whereby only objects that are smaller than the body size will be perceived as useful and manipulable parts of the environment, whereas larger objects will be perceived as "less interesting components."

To test this idea, the study employs a multi-method approach consisting of three parts:

In the first part, human observers classify a set of 24 objects that vary systematically in size (e.g., ball, piano, airplane) based on 14 different affordances (e.g., sit, throw, grasp). Based on the average agreement of ratings across participants, the authors compute the similarity of affordance profiles between all object pairs. They report evidence for two homogenous object clusters that are separated based on their size with the boundary between clusters roughly coinciding with the average human body size. In follow-up experiments, the authors show that this boundary is larger/smaller in separate groups of participants who are instructed to imagine themselves as an elephant/cat.

In the second part, the authors ask different large language models (LLMs) to provide ratings for the same set of objects and affordances and conduct equivalent analyses on the obtained data. Some, but not all, of the models produce patterns of ratings that appear to show similar boundary effects, though less pronounced and at a different boundary size than in humans.

In the third part, the authors conduct an fMRI experiment. Human observers are presented with four different objects of different sizes and asked if these objects afford a small set of specific actions. Affordances are either congruent or incongruent with objects. Contrasting brain activity on incongruent trials against brain activity on congruent trials yields significant effects in regions within the ventral and dorsal visual stream, but only for small objects and not for large objects.

The authors interpret their findings as support for their hypothesis that human body size constrains object perception. They further conclude that this effect is cognitively penetrable, and only partly relies on sensorimotor interaction with the environment (and partly on linguistic abilities).

Strengths:

The authors examine an interesting and relevant question and articulate a plausible (though somewhat underspecified) hypothesis that certainly seems worth testing. Providing more detailed insights into how object affordances shape perception would be highly desirable. Their method of analyzing similarity ratings between sets of objects seems useful and the multi-method approach is original and interesting.

Weaknesses:

The study presents several shortcomings that clearly weaken the link between the obtained evidence and the drawn conclusions. Below I outline my concerns in no particular order:

(1) It is not entirely clear to me what the authors are proposing and to what extent the conducted work actually speaks to this. For example, in the introduction, the authors write that they seek to test if body size serves not merely as a reference for object manipulation but also "plays a pivotal role in shaping the representation of objects." This motivation seems rather vague, and it is not clear to me how it could be falsified.

Overall, the lack of theoretical precision makes it difficult to judge the appropriateness of the approaches and the persuasiveness of the obtained results. I would strongly suggest clarifying the theoretical rationale and explaining in more detail how the chosen experiments allow them to test falsifiable predictions.

(2) The authors used only a very small set of objects and affordances in their study and they do not describe in sufficient detail how these stimuli were selected. This renders the results rather exploratory and clearly limits their potential to discover general principles of human perception. Much larger sets of objects and affordances and explicit data-driven approaches for their selection would provide a more convincing approach and allow the authors to rule out that their results are just a consequence of the selected set of objects and actions.

(3) Relatedly, the authors could be more thorough in ruling out potential alternative explanations. Object size likely correlates with other variables that could shape human similarity judgments and the estimated boundary is quite broad (depending on the method, either between 80 and 150 cm or between 105 to 130 cm). More precise estimates of the boundary and more rigorous tests of alternative explanations would add a lot to strengthen the authors' interpretation.

(4) While I appreciate the manipulation of imagined body size, as a clever way to solidify the link between body size and affordance perception, I find it unfortunate that it is implemented in a between-subjects design, as this clearly leaves open the possibility of pre-existing differences between groups. I certainly disagree with the authors' statement that their findings suggest "a causal link between body size and affordance perception."

(5) The use of LLMs in the current study is not clearly motivated and I find it hard to understand what exactly the authors are trying to test through their inclusion. As it currently stands, I find it hard to discern how the presence of perceptual boundaries in LLMs could constitute evidence for affordance-based perception.

(6) Along the same lines, the fMRI study also provides little evidence to support the authors' claims. The use of congruency effects as a way of probing affordance perception is not well motivated. Importantly (and related to comment 2 above), the very small set of objects and affordances in this experiment heavily complicates any conclusions about object size being the crucial variable determining the occurrence of congruency effects.

Overall, I consider the main conclusions of the paper to be far beyond the reported data. Articulating a clearer theoretical framework with more specific hypotheses as well as conducting more principled analyses on more comprehensive data sets could help the authors obtain stronger tests of their ideas.

eLife. 2024 Mar 28;12:RP90583. doi: 10.7554/eLife.90583.3.sa4

Author Response

Xinran Feng, Shan Xu, Yuannan Li, Jia Liu

The following is the authors’ response to the current reviews.

Responses to the reviewers

We thank the editor and reviewers for their insightful feedback and valuable suggestions on our revised manuscript. In this reply, we provided further clarifications and made changes accordingly. Reviewers’ comments are in bold, and our responses are immediately below. Changes in the main text are presented in italics, accompanied by the specific line numbers in the revised manuscript where these changes can be found. Below, we respond to each reviewer’s comments in turn.

Reviewer #1 (Public Review):

Ps observed 24 objects and were asked which afforded particular actions (14 action types). Affordances for each object were represented by a 14-item vector, values reflecting the percentage of Ps who agreed on a particular action being afforded by the object. An affordance similarity matrix was generated which reflected similarity in affordances between pairs of objects. Two clusters emerged, reflecting correlations between affordance ratings in objects smaller than body size and larger than body size. These clusters did not correlate with each other. There was a trough in similarity ratings between objects ~105 cm and ~130 cm, arguably reflecting the body size boundary. The authors subsequently provide some evidence that this clear demarcation is not simply an incidental reflection of body size, but likely causally related. This evidence comes in the flavour of requiring Ps to imagine themselves as small as a cat or as large as an elephant and showing a predicted shift in the affordance boundary. The manuscript further demonstrates that ChatGPT (theoretically interesting because it's trained on language alone without sensorimotor information; trained now on words rather than images) showed a similar boundary.

The authors also conducted a small MRI study task where Ps decide whether a probe action was affordable (graspable?) and created a congruency factor according to the answer (yes/no). There was an effect of congruency in posterior fusiform and superior parietal lobule for objects within body size range, but not outside. No effects in LOC or M1.

The major strength of this manuscript in my opinion is the methodological novelty. I felt the correlation matrices were a clever method for demonstrating these demarcations, the imagination manipulation was also exciting, and the ChatGPT analysis provided excellent food for thought. These findings are important for our understanding of the interactions between action and perception, and hence for researchers from a range of domains of cognitive neuroscience.

The major element that limits conclusions is that an MRI study with 12 P in this context can really only provide pilot data. Certainly the effects are not strong enough for 12 P to generate much confidence. My other concerns have been addressed in the revision.

Reviewer #1 (Recommendations For The Authors):

I think that the authors need to mention in the abstract that the MRI study constitutes a small pilot.

Response: We appreciate the reviewer’s positive evaluation and constructive suggestions. In response to the concern about the limited number of participants in the fMRI study, we fully acknowledge the implications this has for the generalizability and robustness of our findings related to the congruency effect. To clarify, we have explicitly stated the preliminary nature of the fMRI study in the abstract [line 22]: “A subsequent fMRI experiment offered preliminary evidence of affordance processing exclusively for objects within the body size range, but not for those beyond.”

Reviewer #2 (Public Review):

Summary

In this work, the authors seek to test a version of an old idea, which is that our perception of the world and our understanding of the objects in it are deeply influenced by the nature of our bodies and the kinds of behaviours and actions that those objects afford. The studies presented here muster three kinds of evidence for a discontinuity in the encoding of objects, with a mental "border" between objects roughly of human body scale or smaller, which tend to relate to similar kinds of actions that are yet distinct from the kinds of actions implied by human-or-larger scale objects. This is demonstrated through observers' judgments of the kinds of actions different objects afford; through similar questioning of AI large-language models (LLMs); and through a neuroimaging study examining how brain regions implicated in object understanding make distinctions between kinds of objects at human and larger-than-human scales.

Strengths

The authors address questions of longstanding interest in the cognitive neurosciences -- namely how we encode and interact with the many diverse kinds of objects we see and use in daily life. A key strength of the work lies in the application of multiple approaches. Examining the correlations among kinds of objects, with respect to their suitability for different action kinds, is novel, as are the complementary tests of judgments made by LLMs. The authors include a clever manipulation in which participants are asked to judge action-object pairs, having first adopted the imagined size of either a cat or an elephant, showing that the discontinuity in similarity judgments effectively moved to a new boundary closer to the imagined scale than the veridical human scale. The dynamic nature of the discontinuity hints that action affordances may be computed dynamically, "on the fly", during actual action behaviours with objects in the real world.

Weaknesses

A limitation of the tests of LLMs may be that it is not always known what kinds of training material were used to build these models, leading to a possible "black box" problem. Further, presuming that those models are largely trained on previous human-written material, it may not necessarily be theoretically telling that the "judgments" of these models about action-object pairs show human-like discontinuities. Indeed, verbal descriptions of actions are very likely to mainly refer to typical human behaviour, and so the finding that these models demonstrate an affordance discontinuity may simply reflect those statistics, rather than providing independent evidence for affordance boundaries.

The relatively small sample size of the brain imaging experiment, and some design features (such as the task participants performed, and the relatively narrow range of objects tested) provide some limits on the extent to which it can be taken as support for the authors' claims.

Response: We thank the reviewer for the positive evaluation and the constructive comments. We agree that how LLMs work is a “black box”, and thus it is speculative to assume that they possess any human-like ability, because, as the reviewer pointed out, the finding that “these models demonstrate an affordance discontinuity may simply reflect those statistics.” Indeed, our manuscript has expressed a similar idea [line 338]: “We speculated that ChatGPT models may have formed the affordance boundary through a human prism ingrained within its linguistic training corpus.” That is, our intention was not to suggest that such information could replace sensorimotor-based interaction or achieve human-level capability, but rather to highlight that embodied interaction is necessary. Additionally, the scope of the present study does not extend to elucidating the mechanisms behind the affordance boundary observed in LLMs, whether it arises through statistical learning or actual comprehension. To address this point, the revised manuscript now states that the mechanisms underlying the observed affordance boundary in LLMs may differ from human cognitive processes, and calls for future studies to explore this possibility [line 415]: “Nevertheless, caution should be taken when interpreting the capability of LLMs like ChatGPT, which are often considered “black boxes.” That is, our observation indicates that certain sensorimotor information is embedded within human language materials presumably through linguistic statistics, but it is not sufficient to assert that LLMs have developed a human-like ability to represent affordances. Furthermore, such information alone may be insufficient for LLMs to mimic the characteristics of the affordance perception in biological intelligence. Future studies are needed to elucidate such limitation.”

Regarding the concern about the models’ results not “providing independent evidence for affordance boundaries”, our objective in employing LLMs was to explore if an affordance boundary could emerge from conceptual knowledge without direct sensorimotor experience, rather than to validate the existence of the affordance boundary per se.

As for the concern about the limitations imposed by the small sample size and certain design features of our brain imaging experiment, please see our reply to Reviewer #1.

Reviewer #3 (Public Review):

Summary:

Feng et al. test the hypothesis that human body size constrains the perception of object affordances, whereby only objects that are smaller than the body size will be perceived as useful and manipulable parts of the environment, whereas larger objects will be perceived as "less interesting components."

To test this idea, the study employs a multi-method approach consisting of three parts:

In the first part, human observers classify a set of 24 objects that vary systematically in size (e.g., ball, piano, airplane) based on 14 different affordances (e.g., sit, throw, grasp). Based on the average agreement of ratings across participants, the authors compute the similarity of affordance profiles between all object pairs. They report evidence for two homogenous object clusters that are separated based on their size with the boundary between clusters roughly coinciding with the average human body size. In follow-up experiments, the authors show that this boundary is larger/smaller in separate groups of participants who are instructed to imagine themselves as an elephant/cat.

In the second part, the authors ask different large language models (LLMs) to provide ratings for the same set of objects and affordances and conduct equivalent analyses on the obtained data. Some, but not all, of the models produce patterns of ratings that appear to show similar boundary effects, though less pronounced and at a different boundary size than in humans.

In the third part, the authors conduct an fMRI experiment. Human observers are presented with four different objects of different sizes and asked if these objects afford a small set of specific actions. Affordances are either congruent or incongruent with objects. Contrasting brain activity on incongruent trials against brain activity on congruent trials yields significant effects in regions within the ventral and dorsal visual stream, but only for small objects and not for large objects.

The authors interpret their findings as support for their hypothesis that human body size constrains object perception. They further conclude that this effect is cognitively penetrable, and only partly relies on sensorimotor interaction with the environment (and partly on linguistic abilities).

Strengths:

The authors examine an interesting and relevant question and articulate a plausible (though somewhat underspecified) hypothesis that certainly seems worth testing. Providing more detailed insights into how object affordances shape perception would be highly desirable. Their method of analyzing similarity ratings between sets of objects seems useful and the multi-method approach is original and interesting.

Weaknesses:

The study presents several shortcomings that clearly weaken the link between the obtained evidence and the drawn conclusions. Below I outline my concerns in no particular order:

(1) It is not entirely clear to me what the authors are proposing and to what extent the conducted work actually speaks to this. For example, in the introduction, the authors write that they seek to test if body size serves not merely as a reference for object manipulation but also "plays a pivotal role in shaping the representation of objects." This motivation seems rather vague and it is not clear to me how it could be falsified.

Overall, the lack of theoretical precision makes it difficult to judge the appropriateness of the approaches and the persuasiveness of the obtained results. I would strongly suggest clarifying the theoretical rationale and explaining in more detail how the chosen experiments allow them to test falsifiable predictions.

(2) The authors used only a very small set of objects and affordances in their study and they do not describe in sufficient detail how these stimuli were selected. This renders the results rather exploratory and clearly limits their potential to discover general principles of human perception. Much larger sets of objects and affordances and explicit data-driven approaches for their selection would provide a more convincing approach and allow the authors to rule out that their results are just a consequence of the selected set of objects and actions.

(3) Relatedly, the authors could be more thorough in ruling out potential alternative explanations. Object size likely correlates with other variables that could shape human similarity judgments and the estimated boundary is quite broad (depending on the method, either between 80 and 150 cm or between 105 and 130 cm). More precise estimates of the boundary and more rigorous tests of alternative explanations would add a lot to strengthen the authors' interpretation.

(4) While I appreciate the manipulation of imagined body size, as a clever way to solidify the link between body size and affordance perception, I find it unfortunate that it is implemented in a between-subjects design, as this clearly leaves open the possibility of pre-existing differences between groups. I certainly disagree with the authors' statement that their findings suggest "a causal link between body size and affordance perception."

(5) The use of LLMs in the current study is not clearly motivated and I find it hard to understand what exactly the authors are trying to test through their inclusion. As it currently stands, I find it hard to discern how the presence of perceptual boundaries in LLMs could constitute evidence for affordance-based perception.

(6) Along the same lines, the fMRI study also provides little evidence to support the authors' claims. The use of congruency effects as a way of probing affordance perception is not well motivated. Importantly (and related to comment 2 above), the very small set of objects and affordances in this experiment heavily complicates any conclusions about object size being the crucial variable determining the occurrence of congruency effects.

Overall, I consider the main conclusions of the paper to be far beyond the reported data. Articulating a clearer theoretical framework with more specific hypotheses as well as conducting more principled analyses on more comprehensive data sets could help the authors obtain stronger tests of their ideas.

Response: We appreciate the insightful inquiries regarding our manuscript. Below, we explain the theoretical motivation and rationale of each part of our experiments.

In response to the reviewer’s insights, we have modified the expression “plays a pivotal role in shaping the representation of objects” in the revised manuscript and have restated the general question of our study in the introduction. Our motivation is on the long-lasting debate over the representation versus direct perception of affordance, specifically examining the “representationalization” of affordance. That is, we tested whether object affordance simply covaried directly with continuous constraints such as object size, a perspective aligned with the representation-free (direct perception) view, or whether affordance became representationalized, adhering to the representation-based view, constrained by body size. Such representationalization would generate a categorization between objects that are affordable and the environment that exceeds affordance.

To test these hypotheses, we first delineated the affordance of various objects. We agree with the reviewer that in this step a broader selection of objects and actions could mitigate the risk of our results being influenced by the specific selection of objects and actions. However, our results are unlikely to be biased, because our selection was guided by two key criteria, rather than being arbitrary. First, the objects were selected from the dataset in Konkle and Oliva's study (2011), which systematically investigated the impact of object size on object recognition, thus providing a well-calibrated range of sizes (i.e., from 14 cm to 7,618 cm) reflective of real-world objects. Second, the selected actions covered a wide range of daily human-object/environment interactions, from single-point movements (e.g., hand, foot) to whole-body movements (e.g., lying, standing), based on the Kinetics human action video dataset (Kay et al., 2017). Thus, this set of objects and actions is a representative sampling of typical human experiences.

Upon demonstrating a trough in perceived affordance similarity, we recognized that the location of the affordance boundary fell within the range of human body size. We agree with the reviewer that this coincidence between body size and the location of the boundary is not, on its own, sufficient for a mechanistic explanation, because variables co-varying with object size might also generate it. Identifying a more precise location for the boundary is unlikely to rule out alternative explanations of this kind. To establish a causal link between body size and the affordance boundary, we opted for a direct manipulation of body sizes through imagination, while keeping all other variables constant across conditions. This approach allowed us to examine whether and how the affordance boundary shifts in response to body size changes.
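To make the notion of a trough in affordance similarity concrete, the following minimal sketch computes pairwise similarity between object affordance profiles and locates the weakest link between size-adjacent objects. It assumes Pearson correlation as the similarity measure and uses a small made-up ratings matrix; the function names, the toy values, and the choice of adjacent-object similarity as the boundary index are illustrative assumptions, not the authors' actual pipeline.

```python
import numpy as np

def affordance_similarity(ratings):
    """Pearson correlation between affordance profiles of all object pairs.

    ratings: (n_objects, n_actions) array of mean agreement per
    object-action pair, with rows ordered from smallest to largest object.
    """
    return np.corrcoef(ratings)

def adjacent_similarity(sim):
    """Similarity between each object and the next-larger object."""
    return np.diag(sim, k=1)

# Hypothetical toy data: 4 objects (small to large) x 4 actions.
ratings = np.array([
    [0.9, 0.8, 0.1, 0.0],
    [0.8, 0.9, 0.2, 0.1],
    [0.1, 0.2, 0.9, 0.8],
    [0.0, 0.1, 0.8, 0.9],
])

sim = affordance_similarity(ratings)
adj = adjacent_similarity(sim)

# The trough lies between object `boundary_index` and the next-larger one.
boundary_index = int(np.argmin(adj))
print(boundary_index)
```

In this toy case the two smaller objects share one action profile and the two larger objects share another, so the lowest adjacent similarity falls between the second and third objects.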

Regarding the between-subjects design of the imagination experiment, we wish to clarify that this design aimed to prevent carryover effects. Although a within-subjects design is indeed more sensitive in detecting manipulation effects because it accounts for subject variability, it risks contamination across conditions. Specifically, transitioning immediately between different imagined body sizes poses a challenge, and sequential participation could induce undesirable response strategies, such as deliberately altering responses to the same objects in different conditions. The between-subjects design, although susceptible to participant variability (e.g., “pre-existing differences between groups”, as the reviewer noted), avoids such contamination. In addition, we randomly assigned participants to the two conditions (cat-size versus elephant-size).

The body imagination experiment provided causal evidence of an embodied discontinuity, suggesting the boundary is tied to the agent’s motor capacity, rather than amodal sources. The LLMs experiment then sought to test a prediction from the embodied theories of cognition: the supramodality of object perception. Specifically, we used LLMs to assess whether the discretization of affordance perception is supramodally accessible, that is, accessible beyond the sensorimotor domain through linguistic understanding. From this perspective, our LLM experiment was employed not to affirm affordance-based perception but to examine and support a prediction of the embodied theories of cognition.

Finally, our preliminary fMRI study aimed to conceptually replicate the perceptual discontinuity and explore its neural correlates using a subset of objects and actions from the behavioural experiments. This approach was chosen to achieve stable neural responses and enhance study power, employing the congruency effect (congruent - incongruent) as a metric for affordance processing (e.g., Kourtis et al., 2018), which reflects facilitated responses when actions are congruent with objects’ affordances (e.g., Ellis & Tucker, 2000). Nevertheless, we recognize the limitation of a relatively small sample size; for details, please see our reply to Reviewer #1.

In summary, our findings contribute to the discourse on computationalism’s concept of representation and on the influence of these representations, after discretization, on processes beyond the sensorimotor domain. We hope that these additional explanations and revisions effectively address the concerns raised and demonstrate our commitment to enhancing the quality of our work in light of your valuable feedback. By acknowledging these limitations and directions for future research, we hope to further the discourse on affordance perception and embodied cognition.

References

Ellis, R., & Tucker, M. (2000). Micro‐affordance: The potentiation of components of action by seen objects. British Journal of Psychology, 91(4), 451-471.

Kay, W., Carreira, J., Simonyan, K., Zhang, B., Hillier, C., Vijayanarasimhan, S., ... & Zisserman, A. (2017). The Kinetics human action video dataset. arXiv preprint arXiv:1705.06950.

Konkle, T., & Oliva, A. (2011). Canonical visual size for real-world objects. Journal of Experimental Psychology: Human Perception and Performance, 37(1), 23.

Kourtis, D., Vandemaele, P., & Vingerhoets, G. (2018). Concurrent cortical representations of function- and size-related object affordances: an fMRI study. Cognitive, Affective, & Behavioral Neuroscience, 18, 1221-1232.

The following is the authors’ response to the original reviews.

Responses to the reviewers

We deeply appreciate the reviewers’ comments. In response to the concerns raised, we have revised the manuscript accordingly. Below we address each of the reviewers’ comments in turn. Reviewers’ comments are in bold, and our responses are immediately below. Changes in the main text are presented in italics, followed by corresponding page and line numbers in the revised manuscript. We have also highlighted the tracked changes in the revised manuscript.

Reviewer #1 (Public Review):

(1) The main behavioural work appears well-powered (>500 Ps). This sample reduces to 100 for the imagination study, after removing Ps whose imagined heights fell within the human range (100-200 cm). Why 100-200 cm? 100 cm is pretty short for an adult. Removing 80% of data feels like conclusions from the imagination study should be made with caution.

R1: Sorry for the confusion. We did not remove 80% of the participants; instead, a separate sample of participants was recruited in the imagination experiment. The size of this sample (100 participants) was indeed smaller than the first experiment (528 participants), because the first experiment was set for exploratory purposes and was designed to be over-powered. Besides, inspection of the data of the first sample showed that the affordance pattern became stable after the first 50 participants. We explained this consideration in the revised manuscript:

(p 21, ln 490) “…, another one hundred and thirty-nine participants from the same population were recruited from the same platform. We chose a smaller sample size for the imagination experiment compared to that for the object-action relation judgement task, because inspection of the data of the first sample showed that the affordance pattern became stable after the first 50 participants.”

The average adult human height ranges from 140-170 cm for women and 150-180 cm for men (NCD-RisC, 2016). Accordingly, the criterion of 100-200 cm covered this range and was set to ensure that participants unambiguously imagined a body schema different from that of a human, as the tallest domestic cat is below 100 cm according to the Guinness World Records and an elephant is above 200 cm according to Crawley et al. (2017). We clarified these considerations in the revised manuscript:

(p 21, ln 494) “To maximize the validity of the manipulation, data from participants whose imagined height fell within the average human size range (100cm - 200cm) were excluded from further analysis. Consequently, 100 participants (49 males, aged from 17 to 39 years, mean age = 23.2 years) remained in the analysis. This exclusion criterion was broader than the standard adult human height range of 140cm to 180cm (NCD-RisC, 2016). This approach ensured that our analysis focused on participants who unambiguously imagined a body schema different from humans, yet within the known height range of cats and elephants.”

In addition, we reanalysed the data with a more conservative criterion of 140cm to 180cm, and the results held.
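The exclusion rule can be illustrated with a short sketch that drops participants whose imagined height falls within a given human range, first with the 100-200 cm criterion and then with the 140-180 cm criterion. The function name, the example heights, and the treatment of the range limits as inclusive are assumptions made for illustration only; the response does not specify how boundary values were handled.

```python
def keep_participant(imagined_height_cm, lower=100.0, upper=200.0):
    """Return True if the imagined height lies outside [lower, upper].

    Assumption: both limits count as inside the excluded human range.
    """
    return imagined_height_cm < lower or imagined_height_cm > upper

# Hypothetical imagined heights in cm (not data from the study).
heights = [30.0, 95.0, 100.0, 150.0, 200.0, 320.0]

# Criterion used in the manuscript: exclude 100-200 cm.
kept_broad = [h for h in heights if keep_participant(h)]

# Alternative criterion from the reanalysis: exclude 140-180 cm only.
kept_narrow = [h for h in heights if keep_participant(h, 140.0, 180.0)]

print(kept_broad)
print(kept_narrow)
```

Because the 140-180 cm window is narrower, it excludes fewer participants than the 100-200 cm window, which is why the two criteria can yield different samples.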

(2) There are only 12 Ps in the MRI study, which I think should mean the null effects are not interpreted. I would not interpret these data as demonstrating a difference between SPL and LOC/M1, but rather that some analyses happened to fall over the significance threshold and others did not.

R2: We would like to clarify that the null hypothesis of this fMRI study is the lack of two-way interaction between object size and object-action congruency, which was rejected by the observed significant interaction. That is, the interpretation of the present study did not rely on accepting any null effect.
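For readers less familiar with the design, the two-way interaction in a 2-by-2 within-subject layout can be expressed as a per-participant difference of differences. The sketch below assumes one response value per participant in each of the four cells and uses the standard equivalence between the interaction F(1, n-1) and the squared one-sample t statistic on that contrast. All names and numbers are hypothetical and do not reproduce the study's data or analysis code.

```python
import numpy as np

def interaction_contrast(within_con, within_incon, beyond_con, beyond_incon):
    """Test the size-by-congruency interaction in a 2x2 within-subject design.

    Each argument holds one value per participant. The contrast is
    (congruent - incongruent) for objects within the body size range minus
    the same difference for objects beyond it. For this design the
    interaction F(1, n-1) equals the squared one-sample t on the contrast.
    """
    d = (np.asarray(within_con) - np.asarray(within_incon)) - (
        np.asarray(beyond_con) - np.asarray(beyond_incon)
    )
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    return t, t ** 2, (1, n - 1)

# Hypothetical responses for 4 participants (illustration only).
t_stat, f_stat, df = interaction_contrast(
    within_con=[1.0, 1.2, 0.8, 1.1],
    within_incon=[0.5, 0.6, 0.5, 0.4],
    beyond_con=[0.3, 0.2, 0.4, 0.3],
    beyond_incon=[0.3, 0.3, 0.3, 0.3],
)
print(t_stat, f_stat, df)
```

With 12 participants the same computation gives degrees of freedom (1, 11), matching the F(1,11) statistics reported for the ROI analyses. Reversing the sign of the congruency difference changes the sign of t but leaves F unchanged.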

Having said this, we admit that the fMRI experiment is exploratory and the sample size is small (12 participants), which might lead to low power in estimating the affordance effect. In the revision, we acknowledge this issue explicitly:

(p 16, ln 354) “…, supporting the idea that affordance is typically represented only for objects within the body size range. While it is acknowledged that the sample size of the fMRI study was small (12 participants), necessitating cautious interpretation of its results, the observed neural-level affordance discontinuity is notable. That is, qualitative differences in neural activity between objects within the affordance boundary and those beyond replicated our behavioral findings. This convergent evidence reinforced our claim that objects were discretized into two broad categories along the continuous size axis, with affordance only being manifested for objects within the boundary.”

(3) I found the MRI ROI selection and definition a little arbitrary and not really justified, which rendered me even more cautious of the results. Why these particular sensory and motor regions? Why M1 and not PMC or SMA? Why SPL and not other parietal regions? Relatedly, ROIs were defined by thresholding pF and LOC at "around 70%" and SPL and M1 "around 80%", and it is unclear how and why these (different) thresholds were determined.

R3: Our selection of these specific sensory and motor regions was based on prior literature reporting their distinct contributions to affordance perception (e.g., Borghi, 2005; Sakreida et al., 2016). The pFs was chosen as a representative region of the ventral visual stream, involved in object identification and classification, and the SPL was chosen as a representative region of the dorsal visual stream, involved in object perception and manipulation. The primary motor cortex (M1) has also been reported to be involved in affordance processing (e.g., McDannald et al., 2018), and we chose this region to probe the affordance congruency effect in the motor execution stage of the sense-think-act pathway. We did not choose the premotor cortex (PMC) or the supplementary motor area (SMA) because they have been proposed to be involved in processes beyond motor execution as well (e.g., Hertrich et al., 2016; Kantak et al., 2012); had any effect been observed there, it could not be attributed exclusively to motor execution. As for the parietal regions, our choice of the SPL over the IPL/IPS is based on a meta-analysis of affordance processing areas in which only the SPL shows consistent activation for both stable and variable affordances (Sakreida et al., 2016). We chose the SPL to capture effects on either type of affordance. We explained these considerations in the revised manuscript:

(p 14, ln 280) “In addition to the pFs and SPL, we also examined the congruency effect in the lateral occipital cortex (LO), which is involved in object representation (e.g., Grill-Spector et al., 2000; Konkle & Caramazza, 2013) and provides inputs to both the pFs and SPL (Hebart et al., 2018). Meanwhile, the primary motor cortex (M1), which receives inputs from the dorsal stream (Vainio & Ellis, 2020), is involved in affordance processing (e.g., McDannald et al., 2018) and action executions (Binkofski et al., 2002).”

(p 29, ln 684) “We chose the pFs, LO, SPL, and M1 as ROIs based on existing literature highlighting their distinct contributions to affordance perception (Borghi, 2005; Sakreida et al., 2016).”

Regarding ROI thresholding, we apologize for the lack of clarity in reporting the thresholds in the original manuscript. The thresholds were different between ventral regions (from Zhen et al., 2015) and dorsal regions (from Fan et al., 2016) because they are from two different atlases. The former was constructed by probability maps of task-state fMRI activity during localizer contrast with stationary images and the latter by a parcellation of the brain's functional connectivity; therefore, the numerical values in these two atlases are not comparable. To extract ROIs with comparable sizes, we selected a threshold of 55% for the pFs, 90% for the LO, 78% for the SPL, and 94% for the M1 in the original manuscript.

To rule out the possibility that the results were distorted by the specific choice of thresholds, we re-ran the analysis with a threshold of 80% for all ROIs (resulting in 456 voxels in the lpFs, 427 voxels in the rpFs, 1667 voxels in the lLO, 999 voxels in the rLO, 661 voxels in the lSPL, 310 voxels in the rSPL, 231 voxels in the lM1, and 327 voxels in the rM1) with the 2-by-2 repeated-measures ANOVA. Our results remained qualitatively the same. A significant interaction between object type and congruency was observed in the pFs (F(1,11) = 24.87, p < .001, ηp² = .69) and SPL (F(1,11) = 14.62, p = .003, ηp² = .57). The simple effect analysis revealed the congruency effect solely for objects within the body size range (pFs: p = .003; SPL: p < .001), not for objects beyond (ps > .30). For the M1 and LO, neither significant main effects (ps > .11) nor interactions (ps > .20) were found.
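As a quick consistency check on the effect sizes above, partial eta squared can be recovered from an F statistic and its degrees of freedom. The sketch assumes that the reported effect sizes are partial eta squared, and the function name is ours; it applies the standard identity that partial eta squared equals F times the effect df, divided by the sum of that product and the error df.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# F values and degrees of freedom as reported for the 80% threshold analysis.
pfs_effect = partial_eta_squared(24.87, 1, 11)
spl_effect = partial_eta_squared(14.62, 1, 11)

print(round(pfs_effect, 2))  # 0.69
print(round(spl_effect, 2))  # 0.57
```

Both values agree with the reported effect sizes to two decimal places, which supports reading them as partial eta squared.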

We clarified our choice of thresholds in the methods section in the revised manuscript:

(p 29, ln 686) “Eight ROIs depicted in Fig. 3b were constructed based on the overlap between the whole-brain map activated by both objects within and beyond and corresponding functional atlases (the pFs and LO from Zhen et al., 2015; the SPL and M1 from Fan et al., 2016). To achieve ROIs of similar sizes, we applied varying thresholds to each cortical area: for the pFs and LO, the atlases were thresholded at 55% and 90%, resulting in 266 voxels in the lpFs, 427 in the rpFs, 254 in the lLO and 347 in the rLO; for the SPL and M1, the atlases were thresholded at 78% and 94%, resulting in 661 voxels in the lSPL, 455 in the rSPL, 378 in the lM1, and 449 in the rM1. In the subsequent analysis, homologous areas spanning both cortical hemispheres were merged.”
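The ROI construction described here (threshold an atlas map, intersect it with the task activation map, then merge homologous hemispheres) can be sketched with boolean masks. The arrays below are tiny made-up vectors standing in for voxel maps, and the function names and the use of a greater-than-or-equal comparison at the threshold are assumptions for illustration; this is not the authors' code and does not reproduce their voxel counts.

```python
import numpy as np

def threshold_roi(atlas_map, threshold, activation_mask):
    """Voxels at or above the atlas threshold that are also task-activated."""
    return (np.asarray(atlas_map) >= threshold) & np.asarray(activation_mask)

def merge_hemispheres(left_mask, right_mask):
    """Union of homologous left and right ROI masks."""
    return left_mask | right_mask

# Hypothetical atlas values for 4 voxels per hemisphere (illustration only).
atlas_left = np.array([0.20, 0.60, 0.90, 0.50])
atlas_right = np.array([0.70, 0.10, 0.55, 0.30])

# Hypothetical whole-brain activation map (True = activated voxel).
activated = np.array([True, True, False, True])

left_roi = threshold_roi(atlas_left, 0.55, activated)
right_roi = threshold_roi(atlas_right, 0.55, activated)
bilateral_roi = merge_hemispheres(left_roi, right_roi)

print(int(left_roi.sum()), int(right_roi.sum()), int(bilateral_roi.sum()))
```

Raising the threshold shrinks each mask, which is why different thresholds were needed to obtain ROIs of similar size from atlases whose values are not on a comparable scale.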

(4) Discussion and theoretical implications. The authors discuss that the MRI results are consistent with the idea we only represent affordances within body size range. But the interpretation of the behavioural correlation matrices was that there was this similarity also for objects larger than body size, but forming a distinct cluster. I therefore found the interpretation of the MRI data inconsistent with the behavioural findings.

R4: We speculated that the similarity in action perception among objects beyond the body size range may be due to these objects being similarly conceptualized as ‘environment’, in contrast to the objects within the body size range, which are categorized differently, namely as the ‘objects for the animal.’ Accordingly, in cortical regions involved in object processing, objects conceptualized as ‘environment’ are unlikely to show the congruency effect, unlike objects within the body size range. We have explained this point in the revised manuscript:

(p 17, ln 370) “…which resonates the embodied influence on the formation of abstract concepts (e.g., Barsalou, 1999; Lakoff & Johnson, 1980) of objects and environment. Consistently, our fMRI data did not show the congruency effect for objects beyond the body size range, distinct from objects within this range, suggesting a categorization influenced by objects’ relative size to the human body.”

(5) In the discussion, the authors outline how this work is consistent with the idea that conceptual and linguistic knowledge is grounded in sensorimotor systems. But then reference Barsalou. My understanding of Barsalou is the proposition of a connectionist architecture for conceptual representation. I did not think sensorimotor representation was privileged, but rather that all information communicates with all other to constitute a concept.

R5: We are sorry for the confusion. We do not intend to argue that sensorimotor representation is privileged. Instead, we simply wish to emphasize its engagement in concepts. According to our understanding, Barsalou’s Perceptual Symbol Theory proposes that grounded concepts include sensorimotor information, and conceptual knowledge is grounded in the same neural system that supports action (Barsalou, 1999). This is consistent with our proposal that the affordance boundary locked to an animal’s sensorimotor capacity might give rise to a conceptual-ish representation of object-ness specific to the very animal. We have clarified this point in the introduction and discussion on the conceptual knowledge and sensorimotor information:

In the introduction (p 2, ln 59) “…, and the body may serve as a metric that facilitates meaningful engagement with the environment by differentiating objects that are accessible for interactions from those not. Further, grounded cognition theory (see Barsalou, 2008 for a review) suggests that the outputs of such differentiation might transcend sensorimotor processes and integrate into supramodal concepts and language. From this perspective, we proposed two hypotheses...”

In the discussion (p 18, ln 392) “Indeed, it has been proposed that conceptual knowledge is grounded in the same neural system that supports action (Barsalou, 1999; Glenberg et al., 2013; Wilson & Golonka, 2013), thereby suggesting that sensorimotor information, along with other modal inputs, may be embedded in language (e.g., Casasanto, 2011; Glenberg & Gallese, 2012; Stanfield & Zwaan, 2001), as the grounded theory proposed (see Barsalou, 2008 for a review).”

(6) More generally, I believe that the impact and implications of this study would be clearer for the reader if the authors could properly entertain an alternative concerning how objects may be represented. Of course, the authors were going to demonstrate that objects more similar in size afforded more similar actions. It was impossible that Ps would ever have responded that aeroplanes afford grasping and balls afford sitting, for instance. What do the authors now believe about object representation that they did not believe before they conducted the study? Which accounts of object representation are now less likely?

R6: We thank the reviewer for this suggestion. The theoretical motivation of the present study is to explore whether, for continuous action-related physical features (such as object size relative to the agents), affordance perception introduces discontinuity and qualitative dissociation, i.e., to allow the sensorimotor input to be assigned into discrete states/kinds, as representations envisioned by the computationalists; alternatively, whether the activity may directly mirror the input, free from discretization/categorization/abstraction, as proposed by the Replacement proposal of some embodied theories on cognition.

By addressing this debate, we hoped to shed light on the nature of representation in, and resulting from, vision-for-action processing. Our finding of affordance discontinuity suggests that sensorimotor input undergoes discretization, as implied by the computationalist idea of representation. Further, and not contradictory to the claims of the embodied theories, these representations do shape processes outside the sensorimotor domain, but only after discretization.

We have now explained our hypotheses and alternatives explicitly in the revised introduction and discussion:

In the introduction (p 2, ln 45) “However, the question of how object perception is influenced by the relative size of objects in relation to the human body remains open. Specifically, it is unclear whether this relative size simply acts as a continuous variable for locomotion reference, or if it affects differentiating and organizing object representation based on their ensued affordances.”

In the discussion (p 14, ln 295) “One long-lasting debate on affordance centers on the distinction between representational and direct perception of affordance. An outstanding theme shared by many embodied theories of cognition is the replacement hypothesis (e.g., Van Gelder, 1998), which challenges the necessity of representation as posited by computationalism’s cognitive theories (e.g., Fodor, 1975). This hypothesis suggests that input is discretized/categorized and subjected to abstraction or symbolization, creating discrete stand-ins for the input (e.g., representations/states). Such representationalization would lead to a categorization between the affordable (the objects) and those beyond affordance (the environment), in contrast to the perspective offered by embodied theories. The present study probed this ‘representationalization’ of affordance by examining whether affordance perception introduces discontinuity and qualitative dissociation in response to continuous action-related physical features (such as object size relative to the agents), which allows sensorimotor input to be assigned into discrete states/kinds, in line with the representation-based view under the constraints of body size. Alternatively, it assessed whether activity directly mirrors the input, free from discretization/categorization/abstraction, in line with the representation-free view.

First, our study found evidence demonstrating discretization in affordance perception. Then, through the body imagination experiment, we provided causal evidence suggesting that this discretization originates from sensorimotor interactions with objects rather than amodal sources, such as abstract object concepts independent of agent motor capability. Finally, we demonstrated the supramodality of this embodied discontinuity by leveraging the recent advances in AI. We showed that the discretization in affordance perception is supramodally accessible to disembodied agents such as large language models (LLMs), which lack sensorimotor input but can access linguistic materials built upon discretized representations. These results collectively suggest that sensorimotor input undergoes discretization, as implied in the computationalism’s idea of representation. Note that, these results are not contradictory to the claim of the embodied theories, as these representations do shape processes beyond the sensorimotor domain but after discretization.

This observed boundary in affordance perception extends the understanding of the discontinuity in perception in response to the continuity of physical inputs (Harnad, 1987; Young et al., 1997).”

Reviewer #1 (Recommendations For The Authors):

a) I would recommend providing further justification for why 100-200 cm were used as the cut-offs reflecting acceptable imagined body size. Were these decisions preregistered anywhere? If so, please state.

Ra: Please see R1.

b) I would encourage the authors to call the MRI a small pilot study throughout, including in the abstract.

Rb: We completely agree and have indicated the preliminary nature of this study in the revised version:

(p 11, ln 236) “To test this speculation, we ran an fMRI experiment with a small number of participants to preliminarily investigate the neural basis of the affordance boundary in the brain by measuring neural activity in the dorsal and ventral visual streams when participants were instructed to evaluate whether an action was affordable by an object (Fig. 3a).”

c) Please provide much further justification of ROI selection, why these thresholds were chosen, and therefore why they are different across regions.

Rc: Please see R3.

d) Further elucidation in the discussion would help the reader interpret the MRI data, which should always be interpreted also in light of the behavioural findings.

Rd: Please see R4.

e) The authors may wish to outline precisely what they claim concerning the nature of conceptual/linguistic representation. Is sensorimotor information privileged or just part of the distributed representation of concepts?

Re: This is a great point. For details of corresponding revision, please see R5.

f) There are some nods to alternative manners in which we plausibly represent objects (e.g. about what the imagination study tells us) but I think this theoretical progression should be more prominent.

Rf: We thank the reviewer for this suggestion. For details of corresponding revision, please see R6.

Reviewer #2 (Public Review):

(1) A limitation of the tests of LLMs may be that it is not always known what kinds of training material was used to build these models, leading to a possible "black box" problem. Further, presuming that those models are largely trained on previous human-written material, it may not necessarily be theoretically telling that the "judgments" of these models about action-object pairs show human-like discontinuities. Indeed, verbal descriptions of actions are very likely to mainly refer to typical human behaviour, and so the finding that these models demonstrate an affordance discontinuity may simply reflect those statistics, rather than evidence that affordance boundaries can arise independently even without "organism-environment interactions" as the authors claim here.

R1: We agree that how LLMs work is a “black box”, and thus it is speculative to assume that they possess any human-like ability, because, as the reviewer pointed out, “these models demonstrate an affordance discontinuity may simply reflect those statistics.” Indeed, our manuscript has expressed a similar idea: “We speculated that ChatGPT models may have formed the affordance boundary through a human prism ingrained within its linguistic training corpus. (p 16 ln 338)”. That is, we did not intend to claim that such information is sufficient to replace sensorimotor-based interaction, or to restore human-level capability, for which we indeed speculated that embodied interaction is necessary. In the revised manuscript, we have clarified our stance that the mechanism generating the observed affordance boundary in LLMs might be different from that in human cognition, and urged future studies to explore this possibility:

(p 18, ln 413) “…, as well as alignment methods used in fine-tuning the model (Ouyang et al., 2022). Nevertheless, caution should be taken when interpreting the capabilities of LLMs like ChatGPT, which are often considered “black boxes.” That is, our observation indicates that some degree of sensorimotor information is embedded within human language materials presumably through linguistic statistics, but it is not sufficient to assert that LLMs have developed a human-like ability to represent affordances. Furthermore, such information alone may be insufficient for LLMs to mimic the characteristics of the affordance perception in biological intelligence. Future studies are needed to elucidate such limitation.”

Indeed, because of this potential dissociation, our LLM study might bear novel implications for the development of AI agents. We elaborated on them in the revised discussion on LLMs:

(p 19, ln 427) “…, represents a crucial human cognitive achievement that remains elusive for AI systems. Traditional AI (i.e., task-specific AI) has been confined with narrowly defined tasks, with substantial limitations in adaptability and autonomy. Accordingly, these systems have served primarily as tools for humans to achieve specific outcomes, rather than as autonomous agents capable of independently formulating goals and translating them into actionable plans. In recent years, significant efforts have been directed towards evolving traditional AI into more agent-like entities, especially in domains like navigation, object manipulation, and other interactions with the physical world. Despite these advancements, the capabilities of AI still fall behind human-level intelligence. On the other hand, embodied cognition theories suggest that sensorimotor interactions with the environment are foundational for various cognitive domains. From this point of view, endowing AI with human-level abilities in physical agent-environment interactions might provide an unreplaceable missing piece for achieving Artificial General Intelligence (AGI). This development would significantly facilitate AI’s role in robotics, particularly in actions essential for survival and goal accomplishment, a promising direction for the next breakthrough in AI (Gupta et al., 2021; Smith & Gasser, 2005).

However, equipping a disembodied AI with the ability for embodied interaction planning within a specific environment remains a complex challenge. By testing the potential representationalization of action possibilities (affordances) in both humans and LLMs, the present study suggests a new approach to enhancing AI’s interaction ability with the environment. For instance, our finding of supramodal affordance representation may indicate a possible pathway for disembodied LLMs to engage in embodied physical interactions with their surroundings. From an optimistic view, these results suggest that LLM-based agents, if appropriately designed, may leverage affordance representations embedded in language to interact with the physical world. Indeed, by clarifying and aligning such representations with the physical constitutes of LLM-based agents, and even by explicitly constructing an agent-specific object space, we may foster the sensorimotor interaction abilities of LLM-based agents. This progression could lead to achieving animal-level interaction abilities with the world, potentially sparking new developments in the field of embodied cognition theories.”

(2) The authors include a clever manipulation in which participants are asked to judge action-object pairs, having first adopted the imagined size of either a cat or an elephant, showing that the discontinuity in similarity judgments effectively moved to a new boundary closer to the imagined scale than the veridical human scale. The dynamic nature of the discontinuity suggests a different interpretation of the authors' main findings. It may be that action affordance is not a dimension that stably characterises the long-term representation of object kinds, as suggested by the authors' interpretation of their brain findings, for example. Rather these may be computed more dynamically, "on the fly" in response to direct questions (as here) or perhaps during actual action behaviours with objects in the real world.

R2: We thank the reviewer for pointing out the dynamic nature of affordance perception in our study. This feature indeed reinforces our attribution of the boundary to an affordance-based process rather than a conceptual or semantic process; the latter would predict that action possibilities are a fixed belief about the objects, rather than being dynamically determined by the features of the agent-object dyad. In addition, this dynamic nature does not contradict our interpretation of the observed boundary in affordance perception. Based on this observation, we speculated that continuous input was abstracted or representationalized into discontinuous categories, and that the boundary between these categories was drawn according to the motor capacity of the agent. The finding that the boundary adapts to the manipulation of body schema suggests that the abstraction/representationalization is dynamically updated according to the current belief about the motor capacity and body schema of the animal. We also agree that future studies are needed to examine the dynamics of the abstraction/representationalization of affordance, probably by investigating how affordance representation evolves during ongoing actual interactions with novel objects or under manipulated motor capability. These points are now addressed in the revision:

(p 17, ln 380) “Therefore, this finding suggests that the affordance boundary is cognitively penetrable, arguing against the directness of affordance perception (e.g., Gibson, 1979; Greeno, 1994; Prindle et al., 1980) or the exclusive sensorimotor origin of affordances (e.g., Gallagher, 2017; Thompson, 2010; Hutto & Myin, 2012; Chemero, 2013). Further, this finding that the boundary adapted to manipulation on body schema suggests that the abstraction/representationalization may be dynamically updated in response to the current motor capacity and body schema of the agent, suggesting that the affordance-based process is probably determined dynamically by the nature of the agent-object dyads, rather than being a fixed belief about objects. Future studies could explore the dynamics of affordance representationalization, probably by investigating how affordance representations evolve during active interactions with novel objects or under conditions of altered motor capabilities. Finally, our findings also suggest that disembodied conceptual knowledge pertinent to action likely modulates affordance perception.”

Reviewer #2 (Recommendations For The Authors):

a) As described, I think the authors could improve their discussion of the LLM work and consider more deeply possible different interpretations of their findings with those models. Are they really providing an independent data point about how objects may be represented, or instead is this a different, indirect way of asking humans the same questions (given the way in which these models are trained)?

Ra: Please see R1.

b) Some of the decisions behind the design of the fMRI experiment, and some of the logic of its interpretation, could be made clearer. Why those four objects per se? What kinds of confounds, such as familiarity, or the range of possible relevant actions per object, might need to be considered? Is there the possibility that relative performance on the in-scanner behavioural task may be in part responsible for the findings? Why were those specific regions of interest chosen and not others? The authors find that the dorsal and ventral regions make a univariate distinction between congruent and incongruent trials, but only for human-scale objects, but it was not clear from the framework that the authors adopted why that distinction should go in that direction (e.g. congruent > incongruent) nor why there shouldn't also be a distinction for the "beyond" objects? Finally, might some of these brain questions better be approached with an RSA or similar approach, as that would seem to better map onto the behavioural studies?

Rb: We thank the reviewer for the detailed suggestions.

Regarding the fMRI study, we have provided further justification on its rationale in the revised manuscript:

(p 11, ln 231) “The distinct categories of reported affordances demarcated by the boundary imply that the objects on either side of the boundary may be represented differently in the brain. We thus speculated that the observed behavioral discontinuity is likely underpinned by distinct neural activities, which give rise to these discrete ‘representations’ separated by the boundary.”

The objects used in the fMRI study were selected with its objective in mind, which was to identify the neural basis of the affordance discontinuity found in the behaviour experiments. In other words, the fMRI study is not an exploratory experiment, but a validation experiment. To this end, we deliberately selected a small range of common objects to ensure that participants were sufficiently familiar with them, as confirmed through their oral reports. Furthermore, to ensure a fair comparison between the two categories of objects in terms of action possibility range, we predetermined an equal number of congruent and incongruent actions for each category. This arrangement was intended to eliminate any bias that might arise from differing numbers of action choices associated with each category. Therefore, the present object and action sets in the fMRI study, which were based on the behavior experiments, are sufficient for its purpose.

Regarding the possibility that the performance of the in-scanner behavioural task may be in part responsible for the findings, we analysed participants’ performance. Not surprisingly, participants demonstrated high consistency and accuracy in their responses:

Mean (Congruent, Object Within) = 0.991, SD = 0.018;

Mean (Incongruent, Object Within) = 0.996, SD = 0.007;

Mean (Congruent, Object Beyond) = 0.996, SD = 0.004;

Mean (Incongruent, Object Beyond) = 0.998, SD = 0.002

in all conditions, suggesting constant active engagement with the task. Thus, in-scanner behaviour is unlikely to account for the lack of a congruency effect for the ‘beyond’ objects observed in the brain.

Regarding the selection of ROIs, our decision to focus on these specific sensory and motor regions was based on existing literature highlighting their distinct contributions to affordance perception (Borghi, 2005; Sakreida et al., 2016). The pFs was chosen for its role in object identification and classification, while the SPL was chosen for its involvement in object manipulation. Additionally, the primary motor cortex (M1), which is known to be engaged in affordance processing (e.g., McDannald et al., 2018), was included to investigate the affordance congruency effect during the motor execution stage of the sense-think-act pathway. These considerations are detailed in the revised manuscript:

(p 14, ln 280) “In addition to the pFs and SPL, we also examined the congruency effect in the lateral occipital cortex (LO), which is involved in object representation (e.g., Grill-Spector et al., 2000; Konkle & Caramazza, 2013) and provides inputs to both the pFs and SPL (Hebart et al., 2018). Meanwhile, the primary motor cortex (M1), which receives inputs from the dorsal stream (Vainio & Ellis, 2020), is involved in affordance processing (e.g., McDannald et al., 2018) and action executions (Binkofski et al., 2002).”

(p 29, ln 684) “We chose the pFs, LO, SPL, and M1 as ROIs based on existing literature highlighting their distinct contributions to affordance perception (Borghi, 2005; Sakreida et al., 2016).”

Regarding the congruency effect, we followed the established fMRI research paradigm of employing the congruency effect as a measure of affordance processing (e.g., Kourtis et al., 2018). The rationale behind the directionality of the distinction in our framework (congruent > incongruent) is grounded in the concept of affordance, in which the mere perception of a graspable object facilitates motor responses that are congruent with certain qualities of the object (e.g., Ellis & Tucker, 2000). In the interaction of congruency by object type, we observed a congruency effect only for objects within, not for objects beyond. We speculate that objects beyond the affordance boundary are generally beyond the motor capacities of the animal in question, being too large for the animal to manipulate, and thus no congruency effect was found. We have added these clarifications in the revised manuscript:

(p 11, ln 244) “The congruency effect, derived from the contrast of Congruent versus Incongruent conditions, is a well-established measure of affordance processing (e.g., Kourtis et al., 2018).”

(p 16, ln 340) “In contrast, objects larger than that range typically surpass the animal’s motor capabilities, rendering them too cumbersome for effective manipulation. Consequently, these larger objects are less likely to be considered as typical targets for manipulation by the animal, as opposed to the smaller objects. That is, they are perceived not as the “objects” in the animal’s eye, but as part of the background environment, due to their impracticality for direct interactions.”

Regarding the RSA analysis, we agree with the reviewer that RSA may offer a more direct comparison with similarities among objects. However, our primary objective in this fMRI study was to explore the neural basis of the affordance boundary observed in the behavioural study, rather than explaining the similarities in neural responses between different objects. For this reason, we did not conduct RSA analysis.

c) Page 4 Re statistical evaluation of the discontinuity in judgments, the authors might consider a Bayesian approach, which would be stronger than using "all ps > 0.05" to argue that within-boundary similarities are consistent and high.

Rc: We thank the reviewer for the suggestion on the Bayesian approach for significance tests, which has been now added in the revised manuscript:

In the results (p 4, ln 105) “This trough suggested an affordance boundary between size rank 4 and 5, while affordance similarities between neighboring ranks remained high (rs > 0.45) and did not significantly differ from each other (ps > 0.05, all BF10 < 10) on either side of the boundary (Fig. 1d, left panel, green lines).”

In the methods (p 25, ln 597) “Pearson and Filon’s (1898) Z, implemented in R package “cocor” (Diedenhofen & Musch, 2015) was used to evaluate the significance of these similarities (alpha level = .05, one-tail test). For significance tests, Bayesian statistical analyses were conducted using the web version of the “bayesplay” R package (Colling, 2021). Specifically, the data (likelihood) model was specified as a normal distribution, where the correlation coefficients were transformed to Fisher’s z. The null hypothesis was specified as a standard normal distribution centred at zero. Conversely, the alternative hypothesis was specified as a normal distribution centred at 2. Bayes factors (BF10) were calculated and interpreted using the classification scheme suggested by Wagenmakers et al. (2011), wherein a Bayes factor greater than 10 is considered strong evidence for accepting H1 over H0.”
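The Bayes factor described in the methods passage above can be sketched as follows. This is a minimal illustration, not the bayesplay code used in the study: the function names are hypothetical, the standard deviation of the alternative prior is not given in the text and is assumed here to be 1, and the input values in the usage lines are made up for demonstration. With a normal likelihood and a normal prior, the marginal likelihood of the observed statistic is itself normal, with variance equal to the sum of the squared standard error and the prior variance.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher z-transform of a correlation coefficient (-1 < r < 1)."""
    return math.atanh(r)


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def bf10_normal(obs: float, se: float,
                h0_mean: float = 0.0, h0_sd: float = 1.0,
                h1_mean: float = 2.0, h1_sd: float = 1.0) -> float:
    """Bayes factor BF10 for a normal likelihood with normal priors.

    obs   : observed statistic on the Fisher z scale
    se    : standard error of obs
    h0_*  : null prior (standard normal centred at zero, as in the text)
    h1_*  : alternative prior (centred at 2 as in the text; h1_sd = 1 is
            an ASSUMPTION, since the text does not state it)
    """
    # Marginal likelihood under each prior: N(obs | prior_mean, se^2 + prior_sd^2)
    m1 = normal_pdf(obs, h1_mean, math.hypot(se, h1_sd))
    m0 = normal_pdf(obs, h0_mean, math.hypot(se, h0_sd))
    return m1 / m0


# Hypothetical usage: the correlations and standard error are illustrative only.
z_diff = fisher_z(0.60) - fisher_z(0.50)
bf = bf10_normal(z_diff, se=0.4)
print(f"BF10 = {bf:.3f}  (values above 10 would count as strong evidence for H1)")
```

Under the classification scheme cited in the text, a BF10 below 10 would not count as strong evidence that two neighbouring similarities differ.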

d) Page 4 One question I had about the big objects is whether their internal similarity and dissimilarity to smaller objects, might largely arise if most of the answers about actions for those larger objects are just "no"? This depends on the set of possible actions that were considered: the authors chose 14 from a previous study but did not describe these further or consider possible strengths/limitations of this selection. This is a very important point that needs addressing - to what extent are these findings "fragile" in that they relate only to that specific selection of 14 action kinds?

Rd: The action judgements for objects beyond body size were not mostly “no”; in fact, there was no significant difference between the average action possibilities related to objects beyond (25%) and within (26%). Rather, the dissimilarity between objects within and those beyond likely arose from the difference in the most plausible action sets associated with them. For example, the top three actions related to objects within are “grasp”, “hold” and “throw”, while those related to objects beyond are “sit”, “lift” and “stand”, as stated in our original manuscript: “A further analysis on the affordances separated by the boundary revealed that objects within human body size range were primarily subjected to hand-related actions such as grasping, holding and throwing. These affordances typically involve object manipulation with humans’ effectors. In contrast, objects beyond the size range of human body predominantly afforded actions such as sitting and standing, which typically require locomotion or posture change of the whole body around or within the objects (p 11 ln 229)”.

Regarding the validity of action selection, the selection of the objects and affordances in this study was guided by two key criteria. First, the objects were selected from the dataset published in Konkle and Oliva's study (2011), which systematically investigated the effect of object size on object recognition. Therefore, the range of object sizes, from 14 cm to 7,618 cm, is well-calibrated and represents a typical array of object sizes found in the real world. Second, the actions were selected to cover a wide range of daily human-object/environment interactions, from single-point movements (e.g., hand, foot) to whole-body movements (e.g., lying, standing), based on the Kinetics Human Action Video Dataset (Kay et al., 2017). Thus, this set of objects and actions is sufficiently representative of typical human experience. In the revision, we have clarified these two criteria in the methods section:

(p 22, ln 517) “The full list of objects, their diagonal size, and size rankings were provided in Supplementary Table S6. The objects were selected from the dataset in Konkle and Oliva’s study (2011) to cover typic object sizes in the world (ranging from 14 cm to 7,618 cm), and actions related to these objects were selected to span a spectrum of daily humans-objects/environments interactions, from single-point movements (e.g., hand, foot) to whole-body movements (e.g., lying, standing), based on the Kinetics Human Action Video Dataset (Kay et al., 2017).”

Having said this, we agree with the reviewer that a larger set of objects and actions would allow finer localization of the representational discontinuity, which can be addressed in future studies:

(p 16, ln 344): “…, due to their impracticality for direct interactions. Future studies should incorporate a broader range of objects and a more comprehensive set of affordances for finer delineation of the representational discontinuity between objects and the environment.”

e) Page 12 "no region showed the congruency effect for objects beyond the body size" in a whole brain analysis. What about a similar analysis for the human-scale objects? We must also keep in mind that with N=12 there may be relatively little power to detect such effects at the random-effects level, so this null finding may not be very informative.

Re: We thank the reviewer for this advice. The whole-brain analysis of the congruency effect for human-scale objects (objects within) has now been included in the supplementary materials (please see Author response image 1d (New Supplementary Fig. S4d) and Author response table 1 (New Supplementary Table S5) below).

Author response table 1. Cortical regions showing significant congruency effect (congruent versus incongruent) for objects within, whole-brain analysis (R = right hemisphere, L = left hemisphere; Z > 2.3, p = 0.05, cluster corrected).

Contrast: (OW_Congruent – OW_Incongruent) > (OB_Congruent – OB_Incongruent)

| Cluster | Region | Number of voxels in region | MNI x | MNI y | MNI z | Peak Z value |
|---|---|---|---|---|---|---|
| 1 | L Middle Occipital Gyrus | 831 | 22 | –94 | 10 | 4.25 |
| | R Middle Occipital Gyrus | 187 | | | | |
| | L Fusiform Gyrus | 113 | | | | |
| | R Fusiform Gyrus | 376 | | | | |
| | L Inferior Occipital Gyrus | 293 | | | | |
| | R Inferior Occipital Gyrus | 276 | | | | |
| | L Lingual Gyrus | 215 | | | | |
| | R Lingual Gyrus | 345 | | | | |
| | L Superior Occipital Gyrus | 123 | | | | |
| 2 | R Supplementary Motor Area | 383 | 14 | 14 | 60 | 3.39 |
| 3 | R Superior Parietal Lobule | 191 | 36 | –62 | 56 | 3.18 |
| | R Inferior Parietal Lobule | 114 | | | | |
| 4 | R Insula | 175 | 32 | 18 | 8 | 3.41 |

Contrast: (OB_Congruent – OB_Incongruent) > (OW_Congruent – OW_Incongruent)

No significant cluster.

Author response image 1. Significant brain activations of different contrasts in the whole-brain level analysis.

(a) the effect of object type, positive values (warm color) indicated higher activation for objects within than objects beyond and negative values (cold color) indicated the opposite. (b) the effect of congruency, positive values indicated higher activation in congruent than incongruent condition. (c) the effect of interaction between object type and congruency, positive values indicated the larger congruency effect for objects within than beyond. (d) the congruency effect for objects within. All contrasts were corrected with cluster-level correction at p < .05. The detailed cluster-level results for each contrast map can be found in Supplementary Table S2 to S5.

Regarding the power of the fMRI study, we would like to clarify that the critical test of this fMRI study is the two-way interaction of congruency by object size, not the (null) congruency effect for objects beyond. Having said this, we agree that the sample size is small, which might lead to a lack of power in the fMRI study. In the revision, we have now acknowledged this issue explicitly:

(p 16, ln 354) “…supporting the idea that affordance is typically represented only for objects within the body size range. While it is acknowledged that the sample size of the fMRI study was small (12 participants), necessitating cautious interpretation of its results, the observed neural-level affordance discontinuity is notable. That is, qualitative differences in neural activity between objects within the affordance boundary and those beyond replicated our behavior findings. This convergent evidence reinforced our claim that objects were discretized into two broad categories along the continuous size axis, with affordance only being manifested for objects within the boundary.”

f) Page 14 [the fMRI findings] "suggest that affordance perception likely requires perceptual processing and is not necessarily reflected in motor execution". This seems a large leap to make from a relatively basic experiment that tests only a small set of (arbitrarily chosen) objects and actions. It's important to keep in mind too that none of the studies here actually asked participants to interact with objects; that objects were shown as 2D images; and that the differences between real-world sizes of objects were greatly condensed by the way they are scaled for presentation on a computer screen (and such scaling is probably greater for the larger-than-human objects).

Rf: The action-congruency judgement task is widely used in studies of affordance processing (e.g., Kourtis et al., 2018; Peelen & Caramazza, 2012), as is the practice of omitting actual interaction with the objects and using 2D instead of 3D objects (e.g., Peelen & Caramazza, 2012; Matić et al., 2020). However, we are aware that alternative practices exist in the field, and we agree that it would be interesting for future studies to test whether actual interactions and 3D object presentation change the affordance boundary observed in our study.

Our inference “affordance perception likely requires perceptual processing and is not necessarily reflected in motor execution” was based on the fMRI finding that the congruency effect appeared only in cortical regions thought to be engaged in perceptual processing, but not in M1, which is associated with motor execution. This significant two-way interaction pointed to the possibility that affordance processing may not necessarily manifest in motor execution.

We acknowledge the scaling issue inherent in all laboratory experiments, but we doubt that it significantly influenced our results. In fact, it is a common practice in studies on object size to present objects of different physical sizes as constantly sized images on a screen (e.g., Konkle & Oliva, 2012; Huang et al., 2022). Moreover, scaling does not change the smoothness of object sizes, whereas the affordance boundary represents a singularity point that disrupts this smoothness. Finally, regarding the limited variety of objects and actions, please see Rd.

g) Page 15 Why are larger objects "less interesting"? They have important implications for navigation, for example?

Rg: We are sorry for the confusion. Our intention was to express that objects beyond the affordance boundary are generally beyond motor capacities of the animal in question. As such, compared to smaller objects within the environment, these larger objects may not typically be considered as potential targets for manipulation. We have now corrected the wording in the revised text:

(p 16, ln 340) “In contrast, objects larger than that range typically surpass the animal’s motor capabilities, rendering them too cumbersome for effective manipulation. Consequently, these larger objects are less likely to be considered as typical targets for manipulation by the animal, as opposed to smaller objects in the environment. That is, they are perceived not as the “objects” in the animal’s eye, but as part of the background environment, due to their impracticality for direct interactions.”

h) Page 15 At several places I wondered whether the authors were arguing against a straw man. E.g. "existing psychological studies...define objects in a disembodied manner..." but no citations are given on this point, nor do the authors describe previous theoretical positions that would make a strong counter-claim to the one advocated here.

Rh: We are sorry for not presenting our argument clearly. Previous studies often define the object space based on object features alone, such as absolute size or function, without reference to the knowledge and the abilities of the agent (e.g., de Beeck et al., 2008; Konkle & Oliva, 2011). This perspective overlooks the importance of the features of the animal-object pairs. Gibson (1979) highlighted that an object’s affordance, which includes all action possibilities it offers to an animal, is determined by the object’s size relative to the animal’s size, rather than its real-world size. Under this embodied view, we argue that the object space is better defined by the features of the agent-object system, and this is the primary assumption and motivation of the present study. We have now clarified this point and added the references in the revision:

(p 2, ln 35) “A contemporary interpretation of this statement is the embodied theory of cognition (e.g., Chemero, 2013; Gallagher, 2017; Gibbs, 2005; Wilson, 2002; Varela et al., 2017), which, diverging from the belief that size and shape are inherent object features (e.g., de Beeck et al., 2008; Konkle & Oliva, 2011), posits that human body scale (e.g., size) constrains the perception of objects and the generation of motor responses.”

(p 17, ln 365) “Existing psychological studies, especially in the field of vision, define objects in a disembodied manner, primarily relying on their physical properties such as shape (e.g., de Beeck et al., 2008) and absolute size (e.g., Konkle & Oliva, 2011).”

Reviewer #3 (Public Review):

(1) Even after several readings, it is not entirely clear to me what the authors are proposing and to what extent the conducted work actually speaks to this. In the introduction, the authors write that they seek to test if body size serves not merely as a reference for object manipulation but also "plays a pivotal role in shaping the representation of objects." This motivation seems rather vague, and it is not clear to me how it could be falsified.

Similarly, in the discussion, the authors write that large objects do not receive "proper affordance representation," and are "not the range of objects with which the animal is intrinsically inclined to interact, but probably considered a less interesting component of the environment." This statement seems similarly vague and completely beyond the collected data, which did not assess object discriminability or motivational values.

Overall, the lack of theoretical precision makes it difficult to judge the appropriateness of the approaches and the persuasiveness of the obtained results. This is partly due to the fact that the authors do not spell out all of their theoretical assumptions in the introduction but insert new "speculations" to motivate the corresponding parts of the results section. I would strongly suggest clarifying the theoretical rationale and explaining in more detail how the chosen experiments allow them to test falsifiable predictions.

R1: We are sorry for the confusion about the theoretical motivation and rationale. Our motivation stems from the long-standing debate regarding the representation versus direct perception of affordance. That is, we tested whether object affordance would simply covary with its continuous constraints such as object size, in line with the representation-free view, or whether affordance would be ‘representationalized’ under the constraint of body size, in line with the representation-based view. In revision, we have clarified the motivation and its relation to our approach:

In the introduction (p 2, ln 45): “However, the question of how object perception is influenced by the relative size of objects in relation to the human body remains open. Specifically, it is unclear whether this relative size simply acts as a continuous variable for locomotion reference, or if it affects differentiating and organizing object representations based on their ensued affordances.”

In the discussion (p 14, ln 295): “One long-lasting debate on affordance centers on the distinction between representational and direct perception of affordance. An outstanding theme shared by many embodied theories of cognition is the replacement hypothesis (e.g., Van Gelder, 1998), which challenges the necessity of representation as posited by computationalism’s cognitive theories (e.g., Fodor, 1975). This hypothesis suggests that input is discretized/categorized and subjected to abstraction or symbolization, creating discrete stand-ins for the input (e.g., representations/states). Such representationalization would lead to a categorization between the affordable (the objects) and those beyond affordance (the environment). Accordingly, computational theories propose the emergence of affordance perception, in contrast to the perspective offered by embodied theories. The present study probed this ‘representationalization’ of affordance by examining whether affordance perception introduces discontinuity and qualitative dissociation in response to continuous action-related physical features (such as object size relative to the agents), which allows sensorimotor input to be assigned into discrete states/kinds, in line with the representation-based view under the constraints of body size. Alternatively, it assessed whether activity directly mirrors the input, free from discretization/categorization/abstraction, in line with the representation-free view.

First, our study found evidence demonstrating discretization in affordance perception. Then, through the body imagination experiment, we provided causal evidence suggesting that this discretization originates from sensorimotor interactions with objects rather than amodal sources, such as abstract object concepts independent of agent motor capability. Finally, we demonstrated the supramodality of this embodied discontinuity by leveraging the recent advances in AI. We showed that the discretization in affordance perception is supramodally accessible to disembodied agents such as large language models (LLMs), which lack sensorimotor input but can access linguistic materials built upon discretized representations. These results collectively suggest that sensorimotor input undergoes discretization, as implied in the computationalism’s idea of representation. Note that, these results are not contradictory to the claim of the embodied theories, as these representations do shape processes beyond the sensorimotor domain but after discretization.

The observed boundary in affordance perception extends the understanding of the discontinuity in perception in response to the continuity of physical inputs (Harnad, 1987; Young et al., 1997).”

We are also sorry for the confusion about the expression “proper affordance representation”. We intended to express that the neural responses to objects beyond the boundary in the whole brain failed to reflect affordance congruency, and therefore did not show evidence of affordance processing. We have clarified this expression in the revised manuscript:

(p 12, ln 265) “Taken together, the affordance boundary not only separated the objects into two categories based on their relative size to human body, but also delineated the range of objects that evoked neural representations associated with affordance processing.”

Finally, we agree with the reviewer that expressions such as “not…inclined to interact” and “probably considered a less interesting component of the environment” may be misleading. Rather, we intended to express that objects beyond the affordance boundary are generally beyond the motor capacities of the animal in question: they are too large for the animal to manipulate and, compared to the smaller objects in the environment, may not be typical targets for manipulation. We have revised these expressions in the manuscript and clarified their speculative nature:

(p 16, ln 340) “In contrast, objects larger than that range typically surpass the animal’s motor capabilities, rendering them too cumbersome for effective manipulation. Consequently, these larger objects are less likely to be considered as typical targets for manipulation by the animal, as opposed to the smaller objects. That is, they are perceived not as the “objects” in the animal’s eye, but as part of the background environment, due to their impracticality for direct interactions.”

(2) The authors used only a very small set of objects and affordances in their study and they do not describe in sufficient detail how these stimuli were selected. This renders the results rather exploratory and clearly limits their potential to discover general principles of human perception. Much larger sets of objects and affordances and explicit data-driven approaches for their selection would provide a far more convincing approach and allow the authors to rule out that their results are just a consequence of the selected set of objects and actions.

R2: The selection of the objects and affordances in this study was guided by two key criteria. First, the objects were selected from the dataset published in Konkle and Oliva's study (2011), which systematically investigated the effect of object size on object recognition. Therefore, the range of object sizes, from 14 cm to 7,618 cm, is well-calibrated and represents a typical array of object sizes found in the real world. Second, the actions were selected to cover a wide range of daily human interactions with objects and environments, from single-point movements (e.g., hand, foot) to whole-body movements (e.g., lying, standing), based on the Kinetics human action video dataset (Kay et al., 2017). Thus, this set of objects and actions is sufficiently representative of typical human experiences. In revision, we have clarified these two criteria in the methods section:

(p 22, ln 517) “The full list of objects, their diagonal sizes, and size rankings were provided in Supplementary Table S6. The objects were selected from the dataset in Konkle and Oliva’s study (2011) to cover typic object sizes in the world (ranging from 14 cm to 7,618 cm), and actions related to these objects were selected to span a spectrum of daily humans-objects/environments interactions, from single-point movements (e.g., hand, foot) to whole-body movements (e.g., lying, standing), based on the Kinetics Human Action Video Dataset (Kay et al.,2017).”

Having said this, we agree with the reviewer that a larger set of objects and actions will facilitate finer localization of the representational discontinuity, which can be addressed in future studies:

(p 16, ln 344): “…, due to their impracticality for direct interactions. Future studies should incorporate a broader range of objects and a more comprehensive set of affordances for finer delineation of the representational discontinuity between objects and the environment.”

(3) Relatedly, the authors could be more thorough in ruling out potential alternative explanations. Object size likely correlates with other variables that could shape human similarity judgments and the estimated boundary is quite broad (depending on the method, either between 80 and 150 cm or between 105 to 130 cm). More precise estimates of the boundary and more rigorous tests of alternative explanations would add a lot to strengthen the authors' interpretation.

R3: We agree with the reviewer that correlation analyses alone cannot rule out alternative explanations, as any variable co-varying with object size might also affect affordance perception. Therefore, our study experimentally manipulated the imagined body size while keeping other variables constant across conditions. This approach provided evidence of a causal connection between body size and affordance perception, effectively ruling out alternative explanations. In revision, the rationale for the experimental manipulation of imagined body size has been clarified:

(p 7, ln 152): “One may argue that the location of the affordance boundary coincidentally fell within the range of human body size, rather than being directly influenced by it. To rule out this possibility, we directly manipulated participants’ body schema, referring to an experiential and dynamic functioning of the living body within its environment (Merleau-Ponty & Smith, 1962). This allowed us to examine whether the affordance boundary would shift in response to changes in the imagined body size. This experimental approach was able to establish a causal link between body size and affordance boundary, as other potential factors remained constant. Specifically, we instructed a new group of participants to imagine themselves as small as a cat (typical diagonal size: 77cm, size rank 4, referred to as the “cat condition”), and another new group to envision themselves as large as an elephant (typical diagonal size: 577 cm, size rank 7, referred to as the “elephant condition”) throughout the task (Fig. 2a).”

Meanwhile, in a correlational analysis, a more precise location of the boundary would not help rule out alternative explanations. However, we agree that future studies are needed to incorporate a broader range of objects and a more comprehensive set of affordances. For details, please see R2.

(4) Even though the division of the set of objects into two homogenous clusters appears defensible, based on visual inspection of the results, the authors should consider using more formal analysis to justify their interpretation of the data. A variety of metrics exist for cluster analysis (e.g., variation of information, silhouette values) and solutions are typically justified by convergent evidence across different metrics. I would recommend the authors consider using a more formal approach to their cluster definition using some of those metrics.

R4: We thank the reviewer for the suggestion. We performed three analyses on this point, all of which consistently indicated the division of objects into two distinct groups along the object size axis.

First, a hierarchical clustering analysis of the heatmaps revealed a two-main-cluster structure, which is now detailed in the revised methods section (p 25, ln 589) “A hierarchical clustering analysis was performed, employing the seaborn clustermap method with Euclidean distance and Complete linkage (Waskom, 2021).”
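As an illustration of what complete-linkage clustering with Euclidean distance does, the following is a minimal plain-Python sketch. It is not the seaborn clustermap call used in the study; the function names and the toy rating matrix (rows as objects ordered by size, columns as actions) are hypothetical and chosen only for illustration.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length rating vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def complete_linkage_clusters(rows, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    farthest members are closest (complete linkage)."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance
                d = max(euclidean(rows[i], rows[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return sorted(clusters)

# Hypothetical toy data (not from the study): 5 objects rated on 4 actions
toy_ratings = [
    [0.9, 0.8, 0.1, 0.0],
    [0.8, 0.9, 0.2, 0.1],
    [0.7, 0.8, 0.1, 0.1],
    [0.1, 0.0, 0.9, 0.8],
    [0.0, 0.1, 0.8, 0.9],
]
two_clusters = complete_linkage_clusters(toy_ratings, 2)
print(two_clusters)
```

With these toy values, the three smaller objects end up in one cluster and the two larger objects in the other, which is the kind of two-main-cluster structure described above.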

Second, the similarity in affordances between neighbouring size ranks revealed the same two-main-cluster structure. In this analysis, each object was assigned a real-world size rank, and then Pearson’s correlation was calculated as the affordance similarity index for each pair of neighbouring size ranks to assess how similar the perceived affordances were between these ranks. Our results showed a clear trough in affordance similarity, with the lowest point approaching zero, while affordance similarities between neighbouring ranks on either side of the boundary remained high, confirming the observation that objects formed two groups based on affordance similarity.
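The neighbouring-rank analysis can be sketched as follows, assuming one affordance profile (a vector of action ratings) per size rank. The profiles and helper names are hypothetical, and the toy values are not meant to reproduce the reported similarities; the sketch only shows how a trough between two adjacent ranks would be located.

```python
import math

def pearson(x, y):
    """Pearson's correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def neighbour_similarity(profiles):
    """Affordance similarity for each pair of neighbouring size ranks."""
    return [pearson(profiles[i], profiles[i + 1])
            for i in range(len(profiles) - 1)]

# Hypothetical affordance profiles for five consecutive size ranks
toy_profiles = [
    [0.9, 0.8, 0.1, 0.0],
    [0.8, 0.9, 0.2, 0.1],
    [0.7, 0.8, 0.1, 0.1],
    [0.1, 0.0, 0.9, 0.8],
    [0.0, 0.1, 0.8, 0.9],
]
similarities = neighbour_similarity(toy_profiles)
# Index k refers to the pair (rank k+1, rank k+2); the minimum marks the trough
trough = min(range(len(similarities)), key=similarities.__getitem__)
print(similarities, trough)
```

In this toy example the trough falls between the third and fourth ranks, while similarity on either side stays high.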

Finally, we analysed silhouette values for this clustering analysis, where $a_i$ represents the mean intra-cluster distance and $b_i$ represents the mean nearest-cluster distance for each data point $i$. The silhouette coefficient is calculated as (Rousseeuw, 1987):

$$S_i = \frac{b_i - a_i}{\max(a_i, b_i)}$$

The silhouette analysis revealed that the maximum silhouette coefficient corresponded to a cluster number of two, further confirming the two-cluster structure (please see Author response table 2 below).

Author response table 2. The silhouette values of a k-means clustering when k (number of clusters) = 2 to 10.

| Number of clusters | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|
| Average silhouette values | 0.362 | 0.275 | 0.299 | 0.194 | 0.171 | 0.263 | 0.176 | 0.164 | 0.137 |

(5) While I appreciate the manipulation of imagined body size, as a way to solidify the link between body size and affordance perception, I find it unfortunate that this is implemented in a between-subjects design, as this clearly leaves open the possibility of pre-existing differences between groups. I certainly disagree with the authors' statement that their findings suggest "a causal link between body size and affordance perception."

R5: The between-subjects design in the imagination experiment was employed to prevent contamination between conditions. Specifically, after imagining oneself as a particular size, it can be challenging to immediately transition to envisioning a different body size. In addition, participating sequentially in two conditions that differ only in imagined body size may lead to undesirable response strategies, such as deliberately altering responses to the same objects in the different conditions. The reason for employing the between-subjects design is now clarified in the revised text (p 7, ln 161): “A between-subject design was adopted to minimize contamination between conditions. This manipulation was effective, as evidenced by the participants’ reported imagined heights in the cat condition being 42 cm (SD = 25.6) and 450 cm (SD = 426.8) in the elephant condition on average, respectively, when debriefed at the end of the task.”

Further, to address the concern that “pre-existing differences between groups” would generate this very result, we adhered to standard protocols such as random assignment of participants to different conditions (cat-size versus elephant-size). Moreover, experimentally manipulating one variable (i.e., body schema) to observe its effect on another variable (i.e., affordance boundary) is the standard method for establishing causal relationships between variables. We are not aware of a better way to achieve this objective.

(6) The use of LLMs in the current study is not clearly motivated and I find it hard to understand what exactly the authors are trying to test through their inclusion. As noted above, I think that the authors should discuss the putative roles of conceptual knowledge, language, and sensorimotor experience already in the introduction to avoid ambiguity about the derived predictions and the chosen methodology. As it currently stands, I find it hard to discern how the presence of perceptual boundaries in LLMs could constitute evidence for affordance-based perception.

R6: The motivation for including LLMs was to test the supramodality of the embodied discontinuity found in the behavioral experiments, that is, whether this discontinuity is accessible beyond the sensorimotor domain. To do this, we leveraged recent advances in AI and tested whether the discretization observed in affordance perception is supramodally accessible to disembodied agents, such as large language models (LLMs), which lack sensorimotor input and have access only to linguistic materials built upon discretized representations. The theoretical motivation and rationale regarding the LLM study are now included in the introduction and discussion:

In the introduction (p 2, ln 59) “…, and the body may serve as a metric that facilitates meaningful engagement with the environment by differentiating objects that are accessible for interactions from those not. Further, grounded cognition theory (see Barsalou, 2008 for a review) suggests that the outputs of such differentiation might transcend sensorimotor processes and integrate into supramodal concepts and language. From this perspective, we proposed two hypotheses...”

In the introduction (p 3, ln 70) “Notably, the affordance boundary varied in response to the imagined body sizes and showed supramodality. It could also be attained solely through language, as evidenced by the large language model (LLM), ChatGPT (OpenAI, 2023).”

For details in the discussion, please see R1.

(7) Along the same lines, the fMRI study also provides very limited evidence to support the authors' claims. The use of congruency effects as a way of probing affordance perception is not well motivated. What exactly can we infer from the fact a region may be more active when an object is paired with an activity that the object doesn't afford? The claim that "only the affordances of objects within the range of body size were represented in the brain" certainly seems far beyond the data.

R7: In our study, we followed the established fMRI research paradigm of employing the congruency effect as a measure of affordance processing (e.g., Kourtis et al., 2018). The choice of this paradigm has now been clarified in the revised manuscript (p 11, ln 244): “The congruency effect, derived from the contrast of Congruent versus Incongruent conditions, is a well-established measure of affordance processing (e.g., Kourtis et al., 2018).”

The statement that “only the affordances of objects within the range of body size were represented in the brain” is based on the observed interaction of congruency by object size. In the revised text, we have weakened this statement to better align with the direct implications of the interaction effect (p 1 ln 22): “A subsequent fMRI experiment revealed evidence of affordance processing exclusively for objects within the body size range, but not for those beyond. This suggests that only objects capable of being manipulated are the objects capable of offering affordance in the eyes of an organism.”

(8) Importantly (related to my comments under (2) above), the very small set of objects and affordances in this experiment heavily complicates any conclusions about object size being the crucial variable determining the occurrence of congruency effects.

R8: The objective of the fMRI study was to provide the neural basis for the affordance discontinuity found in behaviour experiments. In other words, the fMRI study is not an exploratory experiment, and therefore, the present object and action sets, which are based on the behaviour experiments, are sufficient.

(9) I would also suggest providing a more comprehensive illustration of the results (including the effects of CONGRUENCY, OBJECT SIZE, and their interaction at the whole-brain level).

R9: We agree and in revision, we have now included these analyses in the supplementary material (p 30, ln 711): “For the whole-brain analyses on the congruency effect, the object size effect, and their interaction, see Supplementary Fig. S4 and Table S2 to S5.” Please see Author response image 2 (New Supplementary Fig. S4) and Author response tables 3–5 (New Supplementary Table S2 to S4) below.

Author response table 3. Cortical regions reaching significance in the contrasts of (A) objects within versus object beyond and (B) objects beyond versus objects within, whole-brain analysis (R = right hemisphere, L = left hemisphere; Z > 2.3, p = 0.05, cluster corrected).

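| Cluster | Region | Number of voxels in region | MNI x | MNI y | MNI z | Peak Z value |
|---|---|---|---|---|---|---|
| Objects within > Objects beyond | | | | | | |
| 1 | L Cuneus | 363 | 10 | -90 | 20 | 4.24 |
| | R Cuneus | 309 | | | | |
| | L Lingual Gyrus | 176 | | | | |
| | R Lingual Gyrus | 293 | | | | |
| Objects beyond > Objects within | | | | | | |
| 1 | L Middle Occipital Gyrus | 1831 | -16 | -92 | -8 | 5.72 |
| | L Fusiform Gyrus | 994 | | | | |
| | L Superior Parietal Lobule | 584 | | | | |
| | L Lingual Gyrus | 504 | | | | |
| | L Inferior Occipital Gyrus | 304 | | | | |
| | L Superior Occipital Gyrus | 211 | | | | |
| | L Parahippocampal Gyrus | 210 | | | | |
| | L Precuneus | 205 | | | | |
| 2 | R Middle Occipital Gyrus | 1358 | 20 | -86 | -2 | 5.4 |
| | R Lingual Gyrus | 340 | | | | |
| | R Superior Parietal Lobule | 328 | | | | |
| | R Superior Occipital Gyrus | 320 | | | | |
| | R Inferior Occipital Gyrus | 276 | | | | |
| 3 | R Fusiform Gyrus | 483 | 34 | -38 | -16 | 4.73 |
| | R Parahippocampal Gyrus | 316 | | | | |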

Author response table 4. Cortical regions reaching significance in contrasts of (A) congruent versus incongruent and (B) incongruent versus congruent, whole-brain analysis (R = right hemisphere, L = left hemisphere; Z > 2.3, p = 0.05, cluster corrected).

| Cluster | Region | Number of voxels in region | MNI x | MNI y | MNI z | Peak Z value |
|---|---|---|---|---|---|---|
| Congruent > Incongruent | | | | | | |
| 1 | L Inferior Parietal Lobule | 322 | -44 | -50 | 64 | 3.47 |
| 2 | R Superior Parietal Lobule | 339 | 36 | -66 | 52 | 3.31 |
| | R Inferior Parietal Lobule | 165 | | | | |
| Incongruent > Congruent | | | | | | |
| - | No significant cluster | - | - | - | - | - |

Author response table 5 (New Supplementary Table S4). Cortical regions showing significant interaction between object type and congruency, whole-brain analysis (OW = Objects within, OB = Objects beyond; R = right hemisphere, L = left hemisphere; Z > 2.3, p = 0.05, cluster corrected).

| Cluster | Region | Number of voxels in region | MNI x | MNI y | MNI z | Peak Z value |
|---|---|---|---|---|---|---|
| (OW_Congruent – OW_Incongruent) > (OB_Congruent – OB_Incongruent) | | | | | | |
| 1 | L Middle Occipital Gyrus | 831 | 22 | -94 | 10 | 4.25 |
| | R Middle Occipital Gyrus | 187 | | | | |
| | L Fusiform Gyrus | 113 | | | | |
| | R Fusiform Gyrus | 376 | | | | |
| | L Inferior Occipital Gyrus | 293 | | | | |
| | R Inferior Occipital Gyrus | 276 | | | | |
| | L Lingual Gyrus | 215 | | | | |
| | R Lingual Gyrus | 345 | | | | |
| | L Superior Occipital Gyrus | 123 | | | | |
| 2 | R Supplementary Motor Area | 383 | 14 | 14 | 60 | 3.39 |
| 3 | R Superior Parietal Lobule | 191 | 36 | -62 | 56 | 3.18 |
| | R Inferior Parietal Lobule | 114 | | | | |
| 4 | R Insula | 175 | 32 | 18 | 8 | 3.41 |
| (OB_Congruent – OB_Incongruent) > (OW_Congruent – OW_Incongruent) | | | | | | |
| - | No significant cluster | - | - | - | - | - |

Author response image 2. Significant brain activations of different contrasts in the whole-brain level analysis.

a, the effect of object type, positive values (warm color) indicated higher activation for objects within than objects beyond and negative values (cold color) indicated the opposite. b, the effect of congruency, positive values indicated higher activation in congruent than incongruent condition. c, the effect of interaction between object type and congruency, positive values indicated the larger congruency effect for objects within than beyond. d, the congruency effect for objects within. All contrasts were corrected with cluster-level correction at p < .05. The detailed cluster-level results for each contrast map can be found in Supplementary Table S2 to S5.

Reviewer #3 (Recommendations For The Authors):

(a) Clarify all theoretical assumptions already within the introduction and specify how the predictions are tested (and how they could be falsified).

Ra: Please see R1.

(b) Explain how the chosen experimental approach relates to the theoretical questions under investigation (e.g., it is not clear to me how affordance similarity ratings can inform inference about which part of the environment is perceived as more or less manipulable).

Rb: We thank the reviewer for the suggestion, and the theoretical motivation and rationale are now clarified. For details, please see R1.

(c) Include a much larger set of objects and affordances in the behavioural experiments (that is more generalizable and also permits a more precise estimation of the boundary), and use a more rigorous methodology to justify a particular cluster solution.

Rc: Please see R2 for the limited variety of objects and actions, and R4 for more analyses on the boundary.

(d) Clearly motivate what the use of LLMs can contribute to the study of affordance perception.

Rd: Please see R6.

(e) Clearly motivate why congruency effects are thought to index "affordance representation in the brain"

Re: Please see R7.

(e) Include a much larger set of objects and affordances in the fMRI study.

Re: Please see R7.

(f) Consider toning down the main conclusions based on the limitations outlined above.

Rf: We have toned down the main conclusions accordingly.

We are profoundly grateful for the insightful comments and suggestions provided by the three reviewers, which have greatly improved the quality of this manuscript.

References

Barsalou, L. W. (1999). Perceptual symbol systems. Behavioral and Brain Sciences, 22(4), 637-660.

de Beeck, H. P. O., Torfs, K., & Wagemans, J. (2008). Perceived shape similarity among unfamiliar objects and the organization of the human object vision pathway. Journal of Neuroscience, 28(40), 10111-10123.

Borghi, A. M. (2005). Object concepts and action. Grounding cognition: The role of perception and action in memory, language, and thinking, 8-34.

Colling, L. J. (2021). ljcolling/go-bayesfactor: (Version v0.9.0). Zenodo. doi:10.5281/zenodo.4642331

Crawley, J. A. H., Mumby, H. S., Chapman, S. N., Lahdenperä, M., Mar, K. U., Htut, W., ... & Lummaa, V. (2017). Is bigger better? The relationship between size and reproduction in female Asian elephants. Journal of Evolutionary Biology, 30(10), 1836-1845.

Ellis, R., & Tucker, M. (2000). Micro‐affordance: The potentiation of components of action by seen objects. British Journal of Psychology, 91(4), 451-471.

Fan, L., Li, H., Zhuo, J., Zhang, Y., Wang, J., Chen, L., ... & Jiang, T. (2016). The human brainnetome atlas: a new brain atlas based on connectional architecture. Cerebral Cortex, 26(8), 3508-3526.

Fodor, J. A. (1975). The Language of Thought (Vol. 5). Harvard University Press.

Gibson, J. J. (1979). The ecological approach to visual perception: Classic edition.

Hertrich, I., Dietrich, S., & Ackermann, H. (2016). The role of the supplementary motor area for speech and language processing. Neuroscience & Biobehavioral Reviews, 68, 602-610.

Huang, T., Song, Y., & Liu, J. (2022). Real-world size of objects serves as an axis of object space. Communications Biology, 5(1), 1-12.

Kantak, S. S., Stinear, J. W., Buch, E. R., & Cohen, L. G. (2012). Rewiring the brain: potential role of the premotor cortex in motor control, learning, and recovery of function following brain injury. Neurorehabilitation and Neural Repair, 26(3), 282-292.

Kay, W., Carreira, J., Simonyan, K., Zhang, B., Hillier, C., Vijayanarasimhan, S., ... & Zisserman, A. (2017). The kinetics human action video dataset. arXiv preprint arXiv:1705.06950.

Konkle, T., & Oliva, A. (2011). Canonical visual size for real-world objects. Journal of Experimental Psychology: human perception and performance, 37(1), 23.

Kourtis, D., Vandemaele, P., & Vingerhoets, G. (2018). Concurrent cortical representations of function-and size-related object affordances: an fMRI study. Cognitive, Affective, & Behavioral Neuroscience, 18, 1221-1232.

Matić, K., de Beeck, H. O., & Bracci, S. (2020). It's not all about looks: The role of object shape in parietal representations of manual tools. Cortex, 133, 358-370.

McDannald, D. W., Mansour, M., Rydalch, G., & Bolton, D. A. (2018). Motor affordance for grasping a safety handle. Neuroscience Letters, 683, 131-137.

NCD Risk Factor Collaboration (NCD-RisC). (2016). A century of trends in adult human height. Elife, 5, e13410.

Peelen, M. V., & Caramazza, A. (2012). Conceptual object representations in human anterior temporal cortex. Journal of Neuroscience, 32(45), 15728-15736.

Rousseeuw, P. J. (1987). Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53-65.

Sakreida, K., Effnert, I., Thill, S., Menz, M. M., Jirak, D., Eickhoff, C. R., ... & Binkofski, F. (2016). Affordance processing in segregated parieto-frontal dorsal stream sub-pathways. Neuroscience & Biobehavioral Reviews, 69, 89-112.

Van Gelder, T. (1998). The dynamical hypothesis in cognitive science. Behavioral and Brain Sciences, 21(5), 615-628.

Wagenmakers, E.-J., Wetzels, R., Borsboom, D., & van der Maas, H. L. J. (2011). Why psychologists must change the way they analyze their data: the case of psi: Comment on Bem (2011). Journal of Personality and Social Psychology, 100(3), 426-432.

Zhen, Z., Yang, Z., Huang, L., Kong, X. Z., Wang, X., Dang, X., ... & Liu, J. (2015). Quantifying interindividual variability and asymmetry of face-selective regions: a probabilistic functional atlas. NeuroImage, 113, 13-25.

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Data Citations

    1. Feng XR, Xu S, Li Y, Liu J. 2024. Data for 'Body size as a metric for the affordable world'. figshare.

    Supplementary Materials

    Supplementary file 1. Cortical regions showing significant results in whole-brain analyses (R=right hemisphere, L=left hemisphere; Z>2.3, P=0.05, cluster corrected).
    elife-90583-supp1.docx (37.5KB, docx)
    Supplementary file 2. The full list of inanimate objects used in the behavioral study, with the corresponding size rank noted according to Konkle and Oliva, 2011.
    elife-90583-supp2.docx (22.8KB, docx)
    MDAR checklist

    Data Availability Statement

    All analyses are included in the manuscript. The data are freely available from Figshare.

    The following dataset was generated:

    Feng XR, Xu S, Li Y, Liu J. 2024. Data for 'Body size as a metric for the affordable world'. figshare.


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
