Author manuscript; available in PMC: 2020 Feb 1.
Published in final edited form as: J Exp Psychol Gen. 2019 Feb;148(2):252–271. doi: 10.1037/xge0000557

Perception in dynamic scenes: What is your Heider capacity?

Farahnaz A Wick 1,2, Abla Alaoui Soce 2, Sahaj Garg 3, River Grace 4, Jeremy M Wolfe 1,2
PMCID: PMC6396302  NIHMSID: NIHMS1002458  PMID: 30667269

Abstract

The classic animation experiment by Heider and Simmel (1944) revealed that humans have a strong tendency to impose narrative even on displays showing interactions between simple geometric shapes. In their most famous animation with three simple shapes, observers almost inevitably interpreted them as rational agents with intentions, desires and beliefs (“That nasty big triangle!”). Much work on dynamic scenes has identified basic visual properties that can make shapes seem animate. Here, we investigate the limits on the ability to use narrative to share information about animated scenes. We created 30-second Heider-style cartoons with 3–9 items. Item trajectories were generated automatically by a simple set of rules, but without a script. In Experiments 1 and 2, ten observers wrote short narratives for each cartoon. Next, new observers were shown a cartoon and then presented with a narrative generated for that specific cartoon or one generated for a different cartoon having the same items. Observers rated the fit of the narrative to the cartoon on a scale from 1 (clearly does not fit) to 5 (clearly fits). Performance declined markedly when the number of items was larger than three. Experiment 3 had observers determine if a short clip of a cartoon came from a longer clip. Experiment 4 had observers determine which of two narratives fit a cartoon. Finally, in Experiment 5, narratives always mentioned every item in a display. In all cases of matching narrative to cartoon, performance dropped most dramatically between 3 and 4 items.

Keywords: visual working memory, dynamic scene understanding, tracking capacity, social cognition


We have the compelling impression that we can perceive the entire environment surrounding us in rich detail. However, it is well known that we cannot actually fully process all the local information in our environment at once. As a consequence, we selectively attend to some objects or regions and, as a result, those regions are more fully processed than other aspects of the current scene. This selection can be based on our current goals and/or lower level features (Wolfe & Horowitz, 2017; Theeuwes, 2018). One factor that can shape deployment of attention is known as the animate monitoring bias (New, Cosmides & Tooby, 2007; Simion, Regolin & Bulf, 2008). Attention is preferentially deployed to animate agents such as people or animals compared to inanimate objects such as plants and vehicles. Motion also influences the prioritization of inanimate items in our field of view (Buren, Uddenburg & Scholl, 2016). Thus, if a ball rolls towards you, it is more likely to capture your attention than if it is moving away or if it is stationary.

We deploy attention to motion because, in the real world, objects in motion are more likely to be important. All the more so, if the motion is intentional. One evolutionary key to successful social relationships is the ability to understand the mental states or intentions of other people. Motion is one clue to those intentions. For instance, if you see someone shaking their fists at another person, you automatically think that the person moving their hands might be “angry”. Using visual input to understand and recognize that people have beliefs and desires different from our own is an aspect of theory of mind (Premack & Woodruff, 1978).

Inferring intentions of other people is so important for our survival that it has been suggested that neural mechanisms to detect biological motion might have evolved for this very reason (Frith & Frith, 1999). The rules for inferring animacy and intentions on the basis of motion have been the subject of considerable psychophysical research (Pantelis et al., 2014; Gao, McCarthy & Scholl, 2010; Barrett, Todd, Miller & Blythe, 2005; Tremoulet & Feldman, 2006, 2000; Leslie, Friedman & German, 2004; Blythe, Todd & Miller, 1999; Heider & Simmel, 1944) and computational modeling (Baker, Jara-Ettinger, Saxe & Tenenbaum, 2017; Pantelis et al., 2016). Others have considered the role of these phenomena in the context of mindreading (Kuhlmeier, Wynn & Bloom, 2003; Gallese & Goldman, 1998).

In this paper, our interests are focused on the limits of the ability to infer the goals and intentions of others from their motion. It seems intuitively clear that there must be a limit. For instance, it might be difficult to simultaneously consider the separate intentions of 100 agents. To bring this problem to a controlled lab setting we used animations with simple geometric shapes. The tendency to ‘anthropomorphize’ simple shapes was famously demonstrated by Heider and Simmel (1944). In their study, observers were asked to view a short, simple animation of two triangles, a circle and a rectangular frame with a “door” (see Figure 1). When observers were asked to describe the animation, they produced narratives involving the three shapes instead of a literal description of the movement. For present purposes, a narrative can be defined as an organized interpretation of a sequence of events. Given the classic Heider and Simmel cartoon, most narratives involved a romantic relationship between the small triangle and the circle who were trying to escape the aggressive triangle. In producing such stories, observers are adopting what Dennett (1989) called the “intentional stance”, imposing intentions on the shapes rather than simply offering a report of the visual features and motions. This mechanism of remembering events in the form of a story seems to be reflexive and automatic (Scholl & Tremoulet, 2000). It could be thought of as a way to ‘parse’ a dynamic stimulus into meaningful units (Tse, Cavanagh & Nakayama, 1998; Zacks, 2004). Turning motion into a story is reminiscent of ‘chunking’ processes in memory (Miller, 1956), where the imposition of meaning onto a stimulus allows more of that stimulus to be coded into memory. Thus, the string “14921776” becomes two dates rather than eight digits. In a similar way, a string of motions might be recoded as the square avoiding the circle. 
This narrative recoding differs from chunking of digits because, while it could be seen as a more compact representation of the input, narrative recoding in this manner would not allow the observer to recover the precise details of the unchunked stimulus in the way that those two dates can be restored to a string of eight digits. In this paper, we examine the limits on the use of narrative in perceiving and remembering motion. Specifically, suppose that an observer generates a narrative based on a cartoon with N moving shapes each with its own motion. Will a second observer be able to look at a cartoon and determine if that cartoon inspired that narrative? The motions we used could interact (e.g. object A ‘chases’ object B) but we did not use group motion (e.g. a school of fish) because we assume that, for present purposes, the group would become a single object.

Figure 1.


A single frame, redrawn from the animation used to study perception of intention in Heider and Simmel’s (1944) experiment.

Our question about processing limits or capacity is different than the questions that have been the usual focus of interest in studies of the perceived animacy of simple shapes. Most work has focused on the stimulus properties that produce a perception of animacy. Historically, Michotte (1963) did the classic work describing when motions and interactions produce a causal interpretation (Did this object cause the motion of that object?). More recently, Scholl and Gao (2013) produced a taxonomy of motions that look animate (Did that spot ‘decide’ to move or did it simply move when hit by some other item?). Similarly, Tremoulet and Feldman (2000, 2006) asked observers to assess the animacy of a small moving dot that changed direction and speed of movement. Dots that underwent larger direction or speed changes were perceived to be more animate than dots that underwent smaller changes. More complex states like “chasing” can be modulated by subtle cues such as whether an object is oriented toward another object. You do not tend to chase things that you are not looking at (Gao, McCarthy & Scholl, 2009, see review: Scholl & Tremoulet, 2000; Gao, Newman & Scholl, 2010). Pantelis and Feldman performed a series of studies exploring how human observers attribute mental states to autonomous virtual agents and how they categorize those mental states in dynamic displays where agents forage for food and each agent can explore, gather, attack or flee from other agents (Pantelis et al., 2016; Pantelis et al, 2012; Pantelis & Feldman, 2012).

We are asking about the stories that arise when people apply these rules to moving objects and attempt to tell others about what they have seen. Assuming that different observers follow similar rules of inference, observers seem likely to agree about the interpretation of simple events or behaviors. For example, if one object makes contact with another and the second, previously stationary object moves, an observer might say that object A hit object B and caused it to move. A second observer is likely to agree. Multiple objects can generate similarly simple stories if objects can be grouped. For instance, a single “wolf” object can be perceived as “chasing” a collection of “sheep” objects (Gao, McCarthy & Scholl, 2009). But what happens when the number of agents or groups of agents gets larger? If we view 3, 4, or more items, each performing its own rule-governed behavior, what do we perceive? If one observer creates an account of the activity in a multi-item display, would another observer recognize that account as clearly referring to that display? What is the limit on the number of agents that we can integrate into a coherent, generally agreed upon narrative? The purpose of this paper is to make an estimate of that limit – a limit we will call our ‘Heider capacity’ in honor of the classic Heider and Simmel work.

We define ‘Heider capacity’ as the ability to infer intentionality of agents in a dynamic scene and communicate that intentionality to others in the form of a narrative. It can be thought of as “shared capacity” based on communication and agreement between observers. The term “capacity” should not be overstressed here. We are attempting to characterize a limit on the ability to perceive and/or communicate the contents of simple, artificial scenes. To anticipate, our data will show that this shared capacity appears to be quite limited. Heider capacity varies, to some degree, as a function of how it is probed. However, over a series of experiments, performance falls as the number of agents rises. The general trend is that when the number of items in a display exceeds three, observers are markedly less likely to agree about what they have seen.

General Method

We generated cartoons composed of simple geometric shapes in motion, as in the original Heider animation described above. However, rather than being scripted, the movements of objects in our cartoons were governed by a set of stochastic rules, described below. Therefore, the story line was much less predetermined in our cartoons than it was in Heider’s. In Part 1 of the experiment, we asked a group of observers to watch the cartoons and produce narratives. In Part 2, a separate group of observers saw a cartoon and read a narrative that was derived from that cartoon or from another cartoon with the same ‘characters’ but with different motions. We measured the ability of these new observers to determine if the narrative matched the cartoon. From the change in performance as a function of the number of characters on screen, we derive an estimate of what we are calling the “Heider capacity”.

Experiment 1: Measuring Heider capacity with simple shapes

Method

Participants

Part 1: Collect narratives

Twenty observers were recruited from Amazon Mechanical Turk to view cartoons and produce a narrative for each cartoon viewed. All observers were from the United States, gave informed consent and were paid $8.00 for approximately 45–60 minutes of their time. The informed consent procedures were approved by Brigham and Women’s Hospital IRB.

Part 2: Measure Heider capacity

Ninety-six observers were recruited through Amazon Mechanical Turk to measure Heider capacity from the narratives and cartoons. All observers were from the United States, gave informed consent and were paid $2.00 for approximately 10–15 minutes of their time. The informed consent procedures were approved by Brigham and Women’s Hospital IRB. Because this was a novel experiment for which we did not have an estimate of the effect size, we tested a large sample in the hope of achieving a robust effect (see power discussion below). We used the same sample size in most of the subsequent experiments.

Stimuli

Heider-style cartoons were created using Matlab with Psychtoolbox (Brainard, 1997; Pelli, 1997; Kleiner et al, 2007). Each cartoon contained moving circles, squares and triangles of uniform size and a larger, unmoving black rectangle representing a wall or obstacle. We generated a total of 20 cartoons, populated by 3, 4, 5, 7 or 9 moving shapes or ‘characters’. Four cartoons were created for each set size, each populated by the same characters, but with movements dictated by a unique combination of rules. Each cartoon was 30 seconds in length. The length was chosen based on pilot experiments: shorter length cartoons (15–20 seconds) produced impoverished/uninteresting narratives whereas narratives from longer cartoons (45–60 seconds) showed quite marked primacy-recency effects. Systematic manipulation of the length could be an interesting follow-up study. The cartoons were generated and stored offline. The cartoons are available at https://osf.io/atc9x/.

The numbers of circles, squares and triangles were distributed as evenly as possible; i.e. if the set size was seven, there would be two of each shape, plus an extra instance of one shape. Each shape was randomly assigned a unique color and a behavior from the following list: chasing, repulsion, attraction, moving to a specific location on the screen (e.g. one corner), jittering, and avoiding the stationary rectangle. Behaviors were selected with replacement. Thus, two items might be ‘chasing’ in the same cartoon. The cartoons selected for the experiments were visually inspected to ensure that shapes exhibited different mixes of behaviors. For instance, it would not be permitted for two, five-element cartoons to have two chasers and an item moving to a corner, even if the motion paths and the shapes assigned to these motions were not identical.
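The assignment step described above can be illustrated with a minimal Python sketch. It is not the authors' Matlab code: the function name `build_cast`, the behavior labels, and the placeholder palette are hypothetical, and the sketch simplifies by giving every item exactly one behavior (it does not pair up partners for two-shape rules).

```python
import random

SHAPES = ["circle", "square", "triangle"]
# Behavior labels paraphrase the list in the text; the identifiers are hypothetical.
BEHAVIORS = ["chasing", "repulsion", "attraction",
             "move_to_location", "jittering", "avoid_rectangle"]

def build_cast(set_size, palette, rng):
    """Return a list of (shape, color, behavior) tuples for one cartoon."""
    if set_size > len(palette):
        raise ValueError("need at least one distinct color per item")
    base, extra = divmod(set_size, len(SHAPES))
    # Every shape appears `base` times; `extra` randomly chosen shapes get one more,
    # so counts are as even as possible (e.g. 3, 2, 2 for a set size of seven).
    bonus = set(rng.sample(SHAPES, extra))
    shapes = [s for s in SHAPES for _ in range(base + (s in bonus))]
    colors = rng.sample(palette, set_size)          # without replacement: unique colors
    behaviors = rng.choices(BEHAVIORS, k=set_size)  # with replacement: repeats allowed
    return list(zip(shapes, colors, behaviors))

if __name__ == "__main__":
    palette = ["color%02d" % i for i in range(28)]  # stand-in for the 28 distinct colors
    for item in build_cast(7, palette, random.Random(1)):
        print(item)
```

Sampling colors without replacement and behaviors with replacement mirrors the two constraints in the text: no two items share a color, but two items may both be chasers.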

Colors.

Twenty-eight visually distinguishable colors were generated and selected from the following website: http://phrogz.net/css/distinct-colors.html using the default settings. Therefore, every shape or character in the cartoons had a ‘visually distinct’ color. Different colors were assigned to shapes of the same type (for instance, there were no two ‘blue’ triangles in a cartoon). Colors will have some range of variation because the stimuli were viewed on many different screens. However, if a narrative declares that the red square was following the blue circle, we have no reason to believe that normal variation in color across platforms would interfere in understanding of the narrative.

Movement rules.

Each cartoon consisted of 1800 static frames presented one after the other for approximately 17 ms each. Depending on the motion rule, the items could move a minimum of 1 and a maximum of 6 pixels per frame. If we assume a display subtending approximately 53° by 35° (Dell 28-inch monitor with a resolution 1920 × 1280 pixels, viewing distance of 50 cm), individual items were 2° in diameter. The black rectangle would subtend a visual angle of 2° x 10° (see Figure 2). The resulting item velocities were in the average range from 1° to 4°/s. The velocities of the items were not constant and could increase or decrease by 1 pixel per frame on average. Of course, viewing conditions for Amazon Turk observers will vary. However, over a reasonable range, it seems unlikely that this variation will be a critical variable in this task. Consider, for example, that the “narrative” unfolding in a movie is not radically different if the observer is in the first row of the theater or viewing the film on the screen on the back of the airline seat in row 32.

Figure 2.


Screenshots from cartoons of different set sizes (indicated by the number on the top left corner) in (a) Experiment 1 and (b) Experiment 2.

Initial positions and directions for each shape were generated at random. There were many possibilities for collisions between items, in which case, the shapes bounced off each other conserving momentum. Objects also bounced off the edges of the display and the stationary rectangle. The specific motion rules are described below:

Chasing:

Two shapes in the cartoon were selected whenever this rule was used. The chaser would move towards the direction of the chased shape and the chased item would pick a random direction and move in that direction away from the chaser when that chaser approached within a ~200 pixel radius.
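As a rough illustration of the chasing rule, the following Python sketch computes a chaser velocity and a new heading for the chased shape. Only the 200-pixel radius comes from the text; the function names are hypothetical, and drawing the escape direction uniformly within 90° of "directly away" is an assumption, since the text says only that the chased item picks a random direction away from the chaser.

```python
import math
import random

FLEE_RADIUS = 200.0  # pixels, from the "~200 pixel radius" in the text

def chaser_velocity(chaser, chased, speed):
    """Velocity (pixels/frame) pointing from the chaser toward the chased shape."""
    dx, dy = chased[0] - chaser[0], chased[1] - chaser[1]
    dist = math.hypot(dx, dy)
    if dist == 0:
        return (0.0, 0.0)
    return (speed * dx / dist, speed * dy / dist)

def flee_heading(chaser, chased, current_heading, rng):
    """Heading (radians) of the chased shape for the next frame."""
    dx, dy = chased[0] - chaser[0], chased[1] - chaser[1]
    if math.hypot(dx, dy) > FLEE_RADIUS:
        return current_heading  # chaser is far away: keep the current course
    away = math.atan2(dy, dx)   # direction pointing directly away from the chaser
    # Assumption: a random direction within +/- 90 degrees of "directly away".
    return away + rng.uniform(-math.pi / 2, math.pi / 2)

if __name__ == "__main__":
    rng = random.Random(0)
    print(chaser_velocity((0, 0), (30, 40), speed=5))
    print(flee_heading((0, 0), (100, 0), current_heading=1.0, rng=rng))
```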

Attraction:

Again, two shapes were selected whenever this rule was used. Each shape would move towards each other in a straight line. When they were within 50 pixels of each other, they would move along a random but conjoint trajectory, circling and bumping into each other. Note that during collisions with other shapes, obstacles or edges of the display, these shapes could separate and subsequently rejoin each other.

Repulsion:

Two shapes were assigned to this rule. The shapes would move toward each other on the screen but if they were within ~100 pixels of each other, they would exert a force (calculated from the distance between the shapes with an added constant) on each other so it would seem like they are ‘pushing’ each other back. After repelling each other, there was a 30% probability that the shapes would wander away from each other and a 70% chance that they would, again, move toward each other for another round of repulsion.
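The repulsion rule can be sketched in Python as follows. The 100-pixel radius and the 30%/70% split come from the text. The force law itself is not specified beyond "calculated from the distance between the shapes with an added constant", so the form `gain / distance + constant` and the parameter values below are placeholders, not the authors' formula.

```python
import math
import random

REPEL_RADIUS = 100.0  # pixels, from the text
P_WANDER = 0.30       # probability of wandering off after a repulsion, from the text

def repulsion_push(a, b, gain=50.0, constant=1.0):
    """Push on shape `a` (pixels/frame), directed away from shape `b`.

    Returns zero outside the repulsion radius. The magnitude
    gain / distance + constant is a hypothetical stand-in for the
    unspecified force law.
    """
    dx, dy = a[0] - b[0], a[1] - b[1]
    dist = math.hypot(dx, dy)
    if dist == 0 or dist >= REPEL_RADIUS:
        return (0.0, 0.0)
    magnitude = gain / dist + constant
    return (magnitude * dx / dist, magnitude * dy / dist)

def next_phase(rng):
    """After a repulsion: wander away (30%) or approach again (70%)."""
    return "wander" if rng.random() < P_WANDER else "approach"

if __name__ == "__main__":
    rng = random.Random(0)
    print(repulsion_push((0, 0), (50, 0)))
    print([next_phase(rng) for _ in range(5)])
```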

Moving to a corner of the screen or to a random location:

A shape moved to a corner or other specific location from its current location. Since the velocities never dropped to zero, the shape would bounce around that corner of the display or move around randomly in the vicinity of its specific goal location.

Jittering:

The shape would change its heading in the range of +/− 20° per frame. This would give the impression of jittering or ‘shaking’ along a trajectory. This property could be assigned to any item on the screen with a probability of 30%. Therefore, there could be multiple items, involved in other behaviors, that could have this property.
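A minimal Python sketch of the jitter property is given below. The ±20° range and the 30% assignment probability come from the text; the uniform distribution of the heading change and the function names are assumptions.

```python
import random

JITTER_DEG = 20.0  # maximum heading change per frame, from the text
P_JITTER = 0.30    # probability that any given item jitters, from the text

def assign_jitter(n_items, rng):
    """Decide independently for each item whether it has the jitter property."""
    return [rng.random() < P_JITTER for _ in range(n_items)]

def jitter_heading(heading_deg, rng):
    """Perturb a heading by a random amount in [-20, +20] degrees (assumed uniform)."""
    return (heading_deg + rng.uniform(-JITTER_DEG, JITTER_DEG)) % 360.0

if __name__ == "__main__":
    rng = random.Random(0)
    print(assign_jitter(9, rng))
    heading = 90.0
    for _ in range(3):  # three successive frames
        heading = jitter_heading(heading, rng)
        print(round(heading, 1))
```

Because the flag is drawn independently per item, several items in one cartoon can jitter while also following another rule, as the text notes.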

Avoiding the stationary rectangle:

These items moved on straight-line trajectories. They would slow down and change direction whenever they came within ~100 pixels of the stationary rectangle.

Procedure

Part 1: Collecting narratives

Each observer completed 12 trials, including 2 practice trials. They viewed two cartoon versions for each of the five set sizes. On each trial, observers pressed a key to start the cartoon. They were encouraged to take notes as they viewed each cartoon, but could not pause or replay them. After the cartoon, a textbox appeared and observers were asked to write a story about the animation that was at least 25 words long. If the textbox contained fewer than 25 words, a pop-up display would alert the observer to the minimum word requirement and prevent them from proceeding to the next trial. We did not give any specific guidelines for the stories, except to state that it was not useful to give purely physically descriptive accounts (e.g. It would not be useful to say “There were 5 shapes. They were red, green, blue, yellow, and purple. They moved around.”).

There were five set sizes: 3, 4, 5, 7 and 9. Each experiment consisted of 12 cartoons: 2 practice trials and 2 cartoons for each of five set sizes. The practice trials consisted of cartoons from set sizes 3 and 9 and these cartoons used in the practice trials were not repeated during the experiment. Narratives were collected from ten observers for each cartoon in the stimulus set. These narratives were rated independently by three lab assistants (all naïve to the purpose of the study) for their accuracy and fit with the corresponding cartoon. The guidelines provided to assess a fit were to assign a point for each of the following 6 criteria met: shapes could be identified based on the physical description, behaviors could be identified, narrative was “entertaining” (to avoid purely physical descriptive accounts), and the narrative did not contain specialized references (e.g. to pop culture). Points were given if narrative attributed emotions to shape and if the spatial structure and locations of shapes were described accurately. The five highest rated narratives per cartoon were selected for Part 2 of the study. Narratives from practice trials were discarded. The narratives were corrected for spelling and grammatical mistakes.
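The selection procedure can be sketched in Python as below. The six criteria paraphrase the text. How the three raters' scores were combined is not stated, so averaging them is an assumption, and all identifiers are hypothetical.

```python
from statistics import mean

# One point per criterion met; labels paraphrase the six criteria in the text.
CRITERIA = (
    "shapes_identifiable",
    "behaviors_identifiable",
    "entertaining",
    "no_specialized_references",
    "emotions_attributed",
    "spatial_layout_accurate",
)

def score(checks):
    """Score one narrative from one rater: `checks` maps criterion -> bool."""
    return sum(1 for c in CRITERIA if checks.get(c, False))

def select_top(ratings, k=5):
    """Pick the k narratives with the highest mean score across raters.

    `ratings` maps narrative id -> list of per-rater scores (0 to 6).
    Averaging across raters is an assumption; ties keep input order.
    """
    ranked = sorted(ratings, key=lambda nid: mean(ratings[nid]), reverse=True)
    return ranked[:k]

if __name__ == "__main__":
    example = {"a": [6, 5, 6], "b": [1, 2, 1], "c": [4, 4, 4], "d": [3, 3, 2],
               "e": [5, 5, 5], "f": [0, 1, 0], "g": [2, 2, 2]}  # made-up scores
    print(select_top(example))
```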

Part 2: Measuring Heider capacity

On each trial in Part 2, an observer viewed a cartoon, read a narrative, and then rated how well the narrative matched the cartoon. No observers from Part 1 participated in Part 2. There were two conditions (described below) with 48 observers in each condition. If we use a chi-sq test to compare correct responses between set sizes, 117 observations are adequate to detect an effect of Cohen’s medium size (.3) with a power of .9 at a significance level of .01. Our 480 observations in each condition (48 observers, 10 observations per observer) should be more than adequate to see any interesting effects of set size. We use similar sample sizes in all our experiments.

Observers completed 12 trials including 2 practice trials, viewing two cartoon versions from each set size. The two practice trials consisted of cartoons with set sizes 3 and 9 and these cartoons along with the corresponding narratives were not repeated during the experiment. In each trial, observers pressed a key to start the cartoon. They were encouraged to take notes but could not pause or replay the cartoon. After the cartoon ended, a narrative was shown that either matched or mismatched the cartoon just viewed (cartoon-first). In a separate condition with 48 new observers, the narrative was presented first followed by the cartoon (narrative-first). After the observers had been exposed to both a cartoon and a narrative, they rated whether the narrative fit the cartoon on a scale from 1 (clearly does not fit) to 5 (clearly fits). An equal number of matched or mismatched narratives were shown for each set size. Data from practice trials were not included in the analysis. Each cartoon was shown only once during the experiment to avoid repetition or learning effects.

Results

Part 1

On average, the selected narratives produced in Part 1 were 43.6 words long (SD = 20.5). When the set size shown in the cartoon was less than 5, the narratives usually described some behaviors of all items in the cartoons (see Fig 3, left). As set size increased however, narratives tended to be focused on a subset of the items (approximately 3 items), which generally revolved around shapes that moved in and out of the two spaces separated by the black rectangle. Here are two sample narratives:

Figure 3.


Left: Average number of shapes mentioned in narratives used in Experiments 1 and 2. Recall there were four cartoons for each set size involving the same actors. The number of items was averaged across 20 narratives per set size. Right: Average number of actions mentioned in the narratives used in Experiments 1–5. The error bars represent standard error.

Set size 3

The triangle and circle played together, chasing each other all over the field and having a great time. The poor square was left out, and was timidly trying to join in and get them to notice but to no avail. The square was left out and ignored by the other two.

Set size 9

The triangles had been friends for a while now. One of them convinced the other to go to the party. The green one was super hyper, but generally stayed with his triangle friends. The circles, meanwhile, were very extroverted and talked to everyone at the party.

As shown in Figure 3, we counted the number of shapes mentioned in narratives from Experiment 1. Only shapes that were explicitly mentioned (‘that blue triangle’) were counted. Some narratives contained shapes (such as ‘the nervous square’) entering or leaving a group of other items and in these cases, only the shapes that were described performing some action were counted. The group would not be counted in this case. In narratives where group behavior was clearly described, rather than the behavior of individual shapes, the total numbers of items in the groups were used. Thus, if a narrative stated ‘the triangles attacked the circles’, all the triangles and circles would be considered to have been mentioned, though one could argue that multiple objects had been reduced to two groups. As can be seen in Figure 3, the average number of items described in narratives is around three regardless of the number of shapes in the scene. As the set sizes in the cartoon increased, observers did not or could not increase the numbers of shapes in the narrative. Note that they did not, for example, report on shapes A, B, and C for the first 10 seconds of the cartoon and C, D, and E for some later portion. Stories tended to be about three shapes. An important implication here is that two observers watching a cartoon with 9 shapes, for example, might not recognize each other’s narratives because each might be paying attention to a different three-item subset. This three-item limit is suggestive of the limits on visual working memory (Luck & Vogel, 1997; Oksama & Hyona, 2004; Wolfe, Reinecke & Brawn, 2006) and on working memory more generally (Cowan, 2017), though the similarity could be coincidental.

We counted the number of action words used in the narratives used in Experiment 1. We counted words that described interactions between two items or behaviors of single items. Compound descriptions like ‘pushing and kicking’ were considered to be a single action word. Behaviors describing emotion such as ‘jittery square’, ‘moved angrily’ and position of items such as ‘close to red square’ were also considered action words. If the action of an item was repeated in the story, the repeats were counted separately. As can be seen in Figure 3 (right), the average number of action words (~ 4 words) used to describe behavior in narratives is similar across set sizes.

Part 2

As shown in Figure 4, the rating scale data from Part 2 can be used to generate Receiver Operating Characteristic (ROC) curves for each set size for the entire group of observers. Points on the ROC are determined by shifting a decision criterion. Thus, all ratings above 3 might be taken as “match” responses and all other responses as “mismatch”. The match responses give rise to true positive and false positive proportions and, thus, to a point on the ROC. Moving the criterion to a rating of 2 gives a different set of proportions and a different point, and so on. Area under the curve (AUC) or d′ values can be derived from these curves (Macmillan & Creelman, 1996). If observers were guessing, the points of the ROC should fall along the diagonal (dotted lines in Figure 4). If observers can discriminate between matched and mismatched narratives, then the ROC curve will lie above the chance diagonal, curved towards the top left-hand corner.
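The criterion-shifting construction can be made concrete with a short Python sketch. The ratings in the usage example are invented for illustration, and using the trapezoidal rule for the area is an assumption; the text does not say how the area was computed.

```python
def roc_points(match_ratings, mismatch_ratings, scale=(1, 2, 3, 4, 5)):
    """Return (false-positive rate, true-positive rate) pairs.

    Each criterion c treats ratings >= c as "match" responses. Criteria run
    from strictest (5) to most lenient (1), so the curve goes from (0, 0)
    to (1, 1).
    """
    points = [(0.0, 0.0)]
    for criterion in sorted(scale, reverse=True):
        tpr = sum(r >= criterion for r in match_ratings) / len(match_ratings)
        fpr = sum(r >= criterion for r in mismatch_ratings) / len(mismatch_ratings)
        points.append((fpr, tpr))
    return points

def auc(points):
    """Area under the ROC by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

if __name__ == "__main__":
    # Invented ratings: trials with a matching narrative vs. a mismatching one.
    match = [5, 4, 4, 3, 5, 2]
    mismatch = [1, 2, 3, 2, 4, 1]
    pts = roc_points(match, mismatch)
    print(pts)
    print(auc(pts))
```

An area of 0.5 corresponds to the chance diagonal, and an area of 1.0 to perfect discrimination between matched and mismatched narratives.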

Figure 4.


ROC curves generated from the rating scale data in Experiment 1. Each curve represents cartoons with a different set size of shapes.

Figure 4 shows a straightforward result. In both conditions (cartoon-first, narrative-first), when there are three items in the cartoon, the second observer can recognize whether the cartoon she is seeing fits with the story that another observer is telling. When the set size is greater than 3, performance deteriorates markedly. This can be quantified by calculating d′ values from the area under the ROC curves (see Table 1). Thus, we would say that the ‘Heider capacity’, as measured in Experiment 1, appears to be about 3.
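One common way to convert an ROC area into d′ is sketched below in Python. It assumes the equal-variance Gaussian signal detection model, under which AUC = Φ(d′/√2); the text does not state which conversion was used, so this is an illustration rather than the authors' procedure.

```python
from math import sqrt
from statistics import NormalDist

def d_prime_from_auc(auc):
    """Convert an ROC area into d' under the equal-variance Gaussian model.

    Under that model AUC = Phi(d' / sqrt(2)), so d' = sqrt(2) * z(AUC),
    where z is the inverse of the standard normal CDF.
    """
    if not 0.0 < auc < 1.0:
        raise ValueError("AUC must lie strictly between 0 and 1")
    return sqrt(2) * NormalDist().inv_cdf(auc)

def auc_from_d_prime(d_prime):
    """Inverse mapping: the ROC area implied by a given d'."""
    return NormalDist().cdf(d_prime / sqrt(2))

if __name__ == "__main__":
    for area in (0.5, 0.6, 0.76, 0.9):
        print(area, round(d_prime_from_auc(area), 2))
```

Under this assumed model, a d′ of 1.73 (set size 3, cartoon-first, in Table 1) would correspond to an ROC area of roughly 0.89, whereas a d′ near 0.4 would correspond to an area only slightly above the chance value of 0.5.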

Table 1:

d′ values for each set size in each experiment.

| Set size | Exp. 1: Cartoon first | Exp. 1: Narrative first | Exp. 2: Cartoon first | Exp. 2: Narrative first | Exp. 3: Cartoon first | Exp. 3: Clip first |
|---|---|---|---|---|---|---|
| 3 | 1.73 | 1.61 | 1.27 | 1.67 | 1.11 | 0.93 |
| 4 | 0.57 | 0.59 | 0.55 | 0.93 | 0.97 | 0.22 |
| 5 | 0.38 | 0.67 | 0.29 | 1.14 | 0.64 | 0.70 |
| 7 | 0.37 | 0.89 | 0.02 | 0.92 | 0.46 | 0.32 |
| 9 | 0.43 | 0.65 | 0.40 | 0.91 | 0.77 | 0.59 |

To understand the performance differences between set sizes within and across the two conditions, we counted the number of ‘correct’, ‘neutral’ and ‘incorrect’ responses. A response to a matching narrative was coded as correct if the observer’s rating was greater than 3 and as incorrect if it was less than 3 (recall that a rating of 5 meant that the observer agreed that the narrative clearly fits the cartoon); for a mismatching narrative, the coding was reversed. A rating of 3 was coded as ‘neutral’. We compared these coded ratings for successive set sizes and found that performance on set size 3 is significantly different from set size 4 for both cartoon-first (χ²(2) = 13.48, p < 0.005, ϕ = 0.27) and narrative-first (χ²(2) = 10.52, p < 0.005, ϕ = 0.23) conditions. No other pairwise comparisons between set sizes greater than 3 were significant in either condition (all χ²(2) < 1.5, p > 0.4, after alpha correction of p < 0.025 for multiple comparisons). We performed a two-way repeated measures ANOVA with Set size and Condition as the independent variables and used the average coded responses for each set size as the dependent variable. As would be expected, this shows a main effect of Set size, F(4, 376) = 10.31, p < 0.001, partial η² = 0.09. There was no significant effect of Condition (cartoon-first vs narrative-first: F(1, 94) = 3.43, p = 0.07). The interaction was not significant, F(4, 376) = 0.88, p = 0.470 after Greenhouse-Geisser correction. We used a repeated-measures ANOVA as it is equivalent to, and theoretically more powerful than, the non-parametric Friedman’s test for comparisons across two-classifiers when the ANOVA’s assumptions are met (Demšar, 2006).
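The response coding and the set-size comparison can be illustrated with a short Python sketch. The treatment of mismatching narratives (a low rating counts as correct) is our reading of the coding rule, and the counts in the usage example are invented; they are not the study's data.

```python
def code_response(rating, narrative_matches):
    """Code a 1-5 rating as 'correct', 'neutral' or 'incorrect'.

    Assumption: for a mismatching narrative, a rating below 3 is correct.
    """
    if rating == 3:
        return "neutral"
    said_match = rating > 3
    return "correct" if said_match == narrative_matches else "incorrect"

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts.

    `table` is a list of rows (e.g. one row per set size, one column per
    response code). Every row and column total must be non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

if __name__ == "__main__":
    # Invented counts: rows = set size 3 and 4; columns = correct, neutral, incorrect.
    counts = [[70, 10, 16],
              [45, 15, 36]]
    print(chi_square(counts))
```

A table with two set sizes and three response codes has (2 − 1) × (3 − 1) = 2 degrees of freedom, which matches the χ²(2) statistics reported above.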

Since these experiments were conducted on Amazon Turk, response time is not an interesting measure. Observers finished the experiments at their leisure within the allotted one hour. Average completion time was 21 minutes.

Discussion

Why is the Heider limit approximately 3 items?

Though our intuition might suggest that two observers should be able to agree about what happens in a scene, even when it involves more than three actors, this appears not to be the case in Experiment 1. A limit of about 3 items is similar to and might be related to limits on working memory (Cowan, 2001) and/or motion tracking, though this experiment does not prove that connection. In multiple object tracking (MOT), 3–4 items is a typical limit on the number of items that can be tracked. Some MOT studies have shown that we can track anywhere from 4 up to 8 identical objects at once under the right conditions (Alvarez & Franconeri, 2007). Tracking performance depends on the crowding of items in the display, whether items fall within or across visual hemifields, and the speed at which objects travel (see Scimeca & Franconeri, 2015, for a review). Our displays use parameters similar to those that produce MOT capacities around 4 to 6 items.

Even if 4+ items were tracked, the apparent capacity could be depressed if the basic features of color and shape are not firmly tied to their items. Feature binding failures certainly occur when observers are asked about stimuli that are defined by conjunctions of two or more features (Treisman & Schmidt, 1982). For instance, a cartoon containing a blue square chasing a red circle could yield a narrative describing a red square and a blue circle (“binding errors” or “illusory conjunctions”). Scholl, Pylyshyn and Franconeri (1999) found that in MOT displays, observers may successfully report the targets’ location and motion direction while failing to report accurate shape or color. Even when all objects have unique shapes and/or colors, these unique identities are not remembered well in tracking tasks. In a multiple identity tracking (MIT) study, observers were asked to track the locations of unique moving objects. At the end of the trial, they were asked to report the identity of a probed target, and it was found that limits were at least as severe as those seen in MOT tasks (Oksama & Hyona, 2008). In a related MIT task, Horowitz et al. (2007) used unique cartoon animals as stimuli. Their observers watched these animals move about the screen. At test time, all animals were occluded and the observers were asked to locate a specific animal. Observers could typically locate only 1 or 2 such animals in a display. If these limits are relevant to performance in our Heider task, it is fairly easy to see why it would be difficult to agree on a story once the number of actors got much beyond 3 or 4. One might have expected somewhat better performance, given that observers were allowed to take notes, but apparently this did not help a great deal. Unfortunately, we did not collect these notes.

Beyond tracking limits, performance in these displays might also be limited by visual crowding (Whitney & Levi, 2011). As set size increases, crowding will increase in our displays. Crowding could exacerbate the binding errors mentioned above (Treisman, 1996; Cave & Wolfe, 1999). Features like color, shape, and motion might be transposed between objects. Crowding and binding problems would be lessened if the items were more distinctive. Accordingly, in Experiment 2, we replicate Experiment 1 with a set of shapes that are intended to be harder to confuse with each other.

Experiment 2: Measuring Heider capacity with distinct shapes

Since distinct shapes are known to reduce the effects of crowding, because their features and locations are represented more efficiently (Whitney & Levi, 2011), we repeated Experiment 1 with the more distinctive set of stimuli shown in Figure 5.

Method

Participants

Part 1: Collecting narratives

Twenty observers were recruited from Amazon Mechanical Turk and asked to view cartoons and to produce a narrative for each cartoon viewed. All observers were from the United States, gave informed consent and were paid $8.00 for approximately 45–60 minutes of their time. These observers did not take part in Experiment 1. The informed consent procedures were approved by Brigham and Women’s Hospital IRB.

Part 2: Measuring Heider capacity

Ninety-six observers were recruited through Amazon Mechanical Turk to measure Heider capacity from the narratives and cartoons. All observers were from the United States, gave informed consent and were paid $2.00 for approximately 10–15 minutes of their time. The informed consent procedures were approved by Brigham and Women’s Hospital IRB. These observers did not take part in Experiment 1.

Stimuli

Heider-style cartoons with distinct shapes were created using Matlab with Psychtoolbox using the same behavioral rules, set sizes and length described in Experiment 1. Each set size was made up of a unique subset of shapes shown in Figure 5. These shapes were generated in Photoshop and the color/texture choices for the shapes were arbitrary. Additionally, all shapes were outlined in black to maximize contrast with the white background. If we assume a display of 53° x 35°, the shapes would subtend a visual angle of 2° x 1.5°-2.5°. The cartoons were generated and stored offline. They are available at https://osf.io/atc9x/.

Figure 5.

Distinct shapes used in Experiment 2

Procedure

The procedure and instructions were identical to those of Experiment 1. In Part 1, we collected ten narratives per cartoon with distinct shapes. Five narratives were selected for Part 2 on the basis of ratings by three lab assistants, who checked accuracy and fit and ensured that the narratives would be intelligible to a common reader. For instance, if a narrative mentioned a ‘red thing’, it was corrected to reflect the correct shape, such as a ‘red arrow’. Narratives from practice trials were discarded.

A separate group of observers was recruited for Part 2. We measured their Heider capacity in cartoon-first and narrative-first conditions. There were 48 observers in the cartoon-first condition and 48 new observers in the narrative-first condition. Data from one participant was discarded in the narrative-first condition because their responses were not recorded during the experiment due to a technical error.

Results

Part 1

On average, the narratives used in Part 2 contained 54.1 words (SD = 23.2) after they were corrected for spelling and grammar errors. The average number of shapes mentioned in the narratives was 3 across set sizes and the average number of behaviors mentioned in the narratives was about 4 as shown in Figure 3.

Part 2

Results are shown in Figure 6. These look similar to the results of Experiment 1 (Figure 4). To understand the performance differences between set sizes within and across the two conditions, we again counted the number of ‘correct’, ‘neutral’ and ‘incorrect’ responses based on dividing the data at a rating of 3. We compared these coded ratings for successive set sizes and found that performance for set size 3 is significantly different from that for set size 4 in the narrative-first condition (χ2 (2) = 9.09, p < 0.011, ϕ = 0.22). The difference was not significant in the cartoon-first condition (χ2 (2) = 5.83, p = 0.054). No pairwise comparisons between set sizes greater than 3 were significant in either condition (all χ2 (2) < 4.7, p > 0.04; these fail to meet the p < 0.025 criterion for significance after alpha correction for multiple pairwise comparisons). A two-way repeated measures ANOVA with Set Size and Condition as the independent variables and the average coded responses for each set size as the dependent variable shows main effects of Set Size (F(4, 372) = 6.26, p < 0.001, partial η2 = 0.06) and Condition (F(1, 93) = 27.3, p < 0.001, partial η2 = 0.23). The interaction between Set Size and Condition was not significant, F(4, 372) = 1.01, p = 0.404.
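For readers who wish to check the pairwise statistics, a chi-square statistic with 2 degrees of freedom has a closed-form upper-tail probability, p = exp(−χ2/2). The short Python sketch below applies that identity to the two set size 3 versus set size 4 statistics reported above. The function name `chi2_sf_df2` is hypothetical, and the sketch is a consistency check on the reported values rather than the analysis code used in the study.

```python
import math

def chi2_sf_df2(stat):
    """Upper-tail p-value of a chi-square statistic with exactly 2 df.

    For 2 degrees of freedom the survival function reduces to exp(-x / 2).
    """
    return math.exp(-stat / 2.0)

# Statistics reported in the text for set size 3 versus set size 4.
for label, stat in [("narrative-first", 9.09), ("cartoon-first", 5.83)]:
    print(f"{label}: chi2(2) = {stat}, p = {chi2_sf_df2(stat):.3f}")
```

The resulting values, roughly 0.011 and 0.054, agree with the reported p-values: only the narrative-first comparison falls below the corrected criterion of 0.025.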

Figure 6.

ROC curves generated from the rating scale data in Experiment 2. The points on the graphs represent the rating points of the scale.

Discussion

Even with distinct shapes, we find that ‘Heider capacity’ is comparable to that in Experiment 1 and falls off sharply after 3 items, even when the narrative is presented first. The narrative-first condition produced higher performance overall, even at larger set sizes. This is not surprising; when the narrative comes first, observers can attempt to restrict their attention to a subset of the shapes on the screen. Even with this advantage, however, performance falls off between 3 and 4 shapes. For set sizes over 3, d′ is higher at each set size in Experiment 2, though those d′ values of about 1.0 are lower than the d′ of 1.67 at set size 3.

Experiment 3: Are the narratives the problem?

Perhaps our observers’ written narratives failed to capture the stories that they extracted from the cartoons. It could be that observers in both parts of the experiment were good at understanding the story line of an N-shape cartoon. Perhaps the problem is in the communication channel when observers in Part 1 produce narratives with only three agents. Maybe those observers just couldn’t or wouldn’t write down a more comprehensive account. In an effort to assess this possibility, we repeated Experiment 2 but in place of a narrative account, we showed observers a 5-second video segment of the 30-second video. The segment came either from the correct cartoon (a match) or from a different cartoon involving the same shapes (a mismatch). The five-second duration is an admittedly somewhat arbitrary choice. It is comparable to the amount of time required to read one of the narratives in Experiments 1 and 2 and is intended to be akin to being asked about some episode in a longer event: “Did you see when that blue guy bumped into the green, shiny guy?” If the limiting step in Experiments 1 and 2 was the verbal narrative, this matching task should markedly improve performance at higher set sizes.

Method

Participants

Ninety-six observers were recruited through Amazon Mechanical Turk to measure Heider capacity from the clips and cartoons. All observers were from the United States, gave informed consent and were paid $2.00 for approximately 10–15 minutes of their time. The informed consent procedures were approved by Brigham and Women’s Hospital IRB. These observers did not take part in Experiments 1 and 2.

Stimuli

The cartoons and set sizes from Experiment 2 were used.

Procedure

The procedure was similar to Part 2 of Experiment 2. The major difference was that observers rated whether 5-sec video segments (instead of narratives) matched the cartoons. The wording of the rating scale was changed to 1 (does not match) to 5 (clearly matches). We used a multi-category rating approach instead of a binary rating to simultaneously measure both performance and level of confidence in the ratings given by the observers and in order to derive ROC curves in the manner of Experiments 1 and 2.

On each trial in the cartoon-first condition, a cartoon was shown and afterwards a 5-second clip was played that was either from the cartoon previewed (match) or from a different version of the cartoon (mismatch) involving the same items. The clip segment was randomly selected from the cartoon and, in the match case, the first and last 5 seconds were excluded to avoid primacy and recency effects (Kool, Conway & Turk-Browne, 2014; Nairne, 1992). In the clip-first condition, analogous to the narrative-first conditions of Experiments 1 and 2, the clip was viewed before the cartoon. There were 48 observers in the cartoon-first condition and 48 new observers in the clip-first condition. Data from 2 observers were discarded in the clip-first condition due to missing data.
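The clip selection described above can be illustrated with a minimal Python sketch. The durations (30-second cartoon, 5-second clip, 5 seconds excluded at each end) come from the text. The assumptions are that the start time is drawn uniformly and that the whole clip, not just its starting point, must lie inside the middle 20 seconds. The function name `pick_clip` and the seed are hypothetical.

```python
import random

CARTOON_S = 30.0  # cartoon length in seconds (from the text)
CLIP_S = 5.0      # clip length in seconds (from the text)
MARGIN_S = 5.0    # span excluded at the start and at the end (from the text)

def pick_clip(rng):
    """Return (start, end) of a clip lying wholly inside the allowed window.

    Assumption: a uniform draw over all start times that keep the clip
    between MARGIN_S and CARTOON_S - MARGIN_S.
    """
    latest_start = CARTOON_S - MARGIN_S - CLIP_S
    start = rng.uniform(MARGIN_S, latest_start)
    return start, start + CLIP_S

rng = random.Random(0)  # fixed seed so the sketch is repeatable
print(pick_clip(rng))
```

Under these assumptions, clip start times range from 5 to 20 seconds into the cartoon.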

Results

Results are shown in Figure 7. Set size 3 produces the best performance, with d′, calculated from the AUC, being the highest for set size 3 (see Table 1) in both conditions. However, unlike Experiments 1 and 2, there was not a large drop in performance between set size 3 and other set sizes in either condition. Note, however, that performance (d′) at set size 3 appears to be somewhat lower in Experiment 3 than in either Experiment 1 or 2. Performance for larger set sizes is somewhat better in Experiment 3 than in Experiments 1 or 2 when the full cartoon comes first and somewhat worse when the cartoon comes second (see Table 1).

Figure 7.

ROC curves generated from rating scales in Experiment 3.

A two-way repeated measures ANOVA with Set Size and Condition as the independent variables and the average coded responses for each set size as the dependent variable shows a main effect of Set Size, F(4, 368) = 3.89, p < 0.01, partial η2 = 0.04, and no significant effect of Condition, F(1, 92) = 3.61, p = 0.061. The interaction between Set Size and Condition was not significant, F(4, 368) = 1.62, p = 0.170. Most of the pairwise comparisons were not significant in the cartoon-first and clip-first conditions. There were significant differences in the coded ratings between set sizes 3 and 7 (χ2 (2) > 8.36, p < 0.015, ϕ > 0.23). There were no other significant differences in pairwise comparisons between set sizes in either condition (all χ2 (2) < 4.80, p > 0.03; these fail to meet the p < 0.025 criterion for significance after alpha correction for multiple pairwise comparisons).
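The degrees of freedom reported for the ANOVA can be recovered from the design. The sketch below assumes, as an inference from the reported values rather than a statement in the text, that Condition is treated as a between-observer factor (different observers served in each condition) and Set Size as a within-observer factor with five levels. The function name `mixed_anova_df` is hypothetical.

```python
def mixed_anova_df(n_observers, n_groups, n_levels):
    """Degrees of freedom for one between- and one within-observer factor.

    Returns (df_within, df_between, df_error_between, df_error_within).
    """
    df_within = n_levels - 1                      # Set Size (also the interaction df here)
    df_between = n_groups - 1                     # Condition
    df_error_between = n_observers - n_groups     # error term for Condition
    df_error_within = df_within * df_error_between  # error term for Set Size
    return df_within, df_between, df_error_between, df_error_within

# Experiment 3: 96 observers recruited, 2 discarded, 2 conditions, 5 set sizes.
print(mixed_anova_df(n_observers=94, n_groups=2, n_levels=5))
```

With 94 retained observers this gives 4, 1, 92 and 368, matching F(4, 368) and F(1, 92) above.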

Discussion

Recognizing a 5-second video segment as belonging to a longer cartoon seems to be different from recognizing a 5-second narrative of the same cartoon. Most notably, while the smallest set size continues to produce the best performance, the large drop between set sizes 3 and 4 is not evident here. In Experiment 3, the decline is noisier and more gradual. From the d′ values in Table 1, it can be seen that observers are better at deciding whether a short clip is part of a previously viewed longer cartoon (cartoon-first) than at deciding whether a long cartoon contains a previously viewed short clip. The results of Experiment 3 indicate that the results of Experiments 1 and 2 are tied to the processes of coding the image in and out of a narrative and not solely to the visual properties of a cartoon. Pure visual matching produced a different pattern of results, but it does not seem to produce a marked improvement in performance.

Experiment 4: Explicitly storing dynamic scenes in working memory

The results from Experiments 1 and 2 indicate that, when narratives are used, performance drops sharply when cartoon set size is greater than three. Is this apparent limit an artifact of the experimental design? Might we be able to perform more accurately if correct and incorrect narratives about the cartoon were presented together and observers merely needed to pick the more likely account? Perhaps it would be obvious which of two stories is more plausible even if that story seems imperfect in isolation. In Experiment 4, observers viewed a cartoon as before. In this experiment, we added a step to the method in an effort to improve performance. Observers were asked to write their own narrative about the cartoon in an effort to encourage them to store details about the cartoon. Finally, in the test phase, observers were presented with two narratives, written by others. One corresponded to the cartoon that they had just seen and written about. The other came from a control cartoon having the same set size and the same cast of distinctive shapes. Observers were asked to decide which of the two narratives better matched the cartoon that they had just seen and described. We hypothesized that this series of events would give observers a chance to demonstrate a higher Heider capacity.

Method

Participants

Forty-eight observers were recruited through Amazon Mechanical Turk for this experiment. All observers were from the United States, gave informed consent and were paid $8.00 for approximately 30–45 minutes of their time. The informed consent procedures were approved by Brigham and Women’s Hospital IRB. These observers did not take part in Experiments 1 to 3. Note that we recruited 48 rather than 96 observers because this experiment has only a single condition.

Stimuli

The cartoons and set sizes from Experiment 2 were used.

Procedure

There were 12 trials including two practice trials. The practice trials consisted of cartoons from set sizes 3 and 9, and these cartoons were not repeated during the experiment. In the experimental trials, two cartoons from each of five set sizes were shown. In each trial, observers viewed a cartoon and were instructed to write a narrative once the cartoon was finished. The cartoon could not be paused or replayed, but observers were encouraged to take notes. After the cartoon ended, a textbox appeared, and observers were asked to write a story about the cartoon that was at least 25 words long. If the textbox contained fewer than 25 words, a pop-up display would alert the participant to the minimum word limit and prevent them from proceeding. The instructions for the story writing process were similar to those of Experiments 1 and 2. After observers wrote the story, they were shown two narratives written by other observers. These were drawn from Part 1 of Experiment 2. One narrative was based on the cartoon viewed in the trial while the other was based on a cartoon with the same characters but different actions. The observers selected the narrative that best matched the cartoon. Thus, unlike the previous experiments, this task involved a two-alternative forced choice instead of a rating scale. Data from one participant was discarded because of incomplete trials.

Results

The narratives generated by observers in this experiment were simply a device to encourage them to store the details of the cartoons explicitly. However, we did check the narratives for accuracy in order to confirm that the observers completed the task correctly.

The structure of this task does not produce separate true positive and false positive rates. Thus, we cannot calculate d′ or plot ROCs as in the other experiments. Here, as shown in Figure 8, we report the true positive rate, defined as the percentage of correct narrative-cartoon matches. To compare the results with the cartoon-first versions of Experiments 1 and 2, we take the d′ values from Table 1 and derive true positive rates by assuming a neutral criterion and a symmetrical ROC. Thus, the derived true positive rate is the value of the standard cumulative normal distribution at half of d′. Modest deviations from the assumptions (e.g. if the ROCs are somewhat asymmetrical) would not change the overall pattern of results. It can be seen in Figure 8 that the general pattern of results is similar in all three experiments. Performance drops off quite dramatically between set sizes 3 and 4 and more slowly thereafter.
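As an illustration of this conversion, the sketch below reads the derived true positive rate as the standard cumulative normal distribution evaluated at d′/2, which is what a neutral criterion and a symmetrical (equal-variance) ROC imply. That reading is an assumption about the intended computation, and the function names `phi` and `tpr_from_dprime` are hypothetical. The example d′ values of 1.67 and about 1.0 are those mentioned in the discussion of Experiment 2.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tpr_from_dprime(d_prime):
    """True positive rate implied by d' with a neutral criterion.

    Assumes an equal-variance model, so the criterion sits midway
    between the two distributions and TPR = Phi(d' / 2).
    """
    return phi(d_prime / 2.0)

for d_prime in (1.67, 1.0):
    print(d_prime, round(tpr_from_dprime(d_prime), 3))
```

On this reading, a d′ of 1.67 corresponds to a true positive rate of roughly 0.80 and a d′ of 1.0 to roughly 0.69, while a d′ of 0 gives chance performance of 0.5.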

Figure 8.

True positive rates from Experiments 4 and 5, where observers were given a choice between two narratives. These are compared to true positive rates derived from Experiments 1 and 2.

Pairwise Fisher’s exact tests showed a significant difference between set size 3 and set sizes 4, 5, 7 and 9 (p < 0.025, after alpha correction for multiple comparisons, odds-ratio > 2.3). There were no significant differences for pairwise comparisons between other set sizes (all p > 0.05, odds-ratio < 1.26).

We can also test whether performance at a given set size is above chance. A binomial test indicated that the number of hits for set size 3 was greater than the expected number of hits, p < 0.0001 (1-sided). Similarly, binomial tests for set sizes 4 and 5 showed that the proportion of hits was greater than expected, p < 0.05 (1-sided), but this was not true for set sizes 7 and 9, p > 0.05.
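A one-sided exact binomial test against chance (0.5 in a two-alternative task) can be sketched as follows. The counts are hypothetical: 94 trials per set size assumes that each of the 47 retained observers contributed two cartoons per set size, and 77 hits is simply a count close to the reported rate of 0.82. Neither number is taken from the raw data, and the function name `binom_upper_tail` is an illustration.

```python
from math import comb

def binom_upper_tail(hits, n, p=0.5):
    """One-sided exact binomial p-value: P(X >= hits) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(hits, n + 1))

n_trials = 94  # hypothetical: 47 observers x 2 cartoons per set size
hits = 77      # hypothetical: close to a true positive rate of 0.82
p_value = binom_upper_tail(hits, n_trials)
print(f"P(X >= {hits}) = {p_value:.2e}")
```

With counts of this size, the tail probability is far below 0.0001.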

The true positive rate (TPR) for set size 3 was 0.82 (SD = 0.32), and the TPRs for the other set sizes were 0.66 or less. The TPR was highest for set size 3 and fell off as the number of items in the cartoons increased, a pattern similar to that of Experiments 1 and 2. Note that the exact accuracy values are less important than the pattern of change across set sizes. In Experiments 1, 2 and 4, performance is quite good for a set size of 3 and declines dramatically when set size is greater than 3. It is this pattern that indicates that the Heider capacity might be about 3.

Experiment 5: Information content of narratives

In Experiments 1-4, the narratives were constructed after a single viewing of the cartoon (as would be the case if one were narrating a real-world event). As shown in Figure 3 (left), this produces narratives having a roughly constant number of shapes mentioned. Since this number averages less than three, it means that as the set size increases, there is a decline in the proportion of items in the cartoons that are mentioned in the narratives. Could the number of shapes mentioned in the narratives be a limiting factor in one’s performance on this task? With a larger set size, the author of the narrative might have attended to shapes A and B while the subsequent observer might attend to C and D. This might explain why observers were poor at matching narratives with cartoons when the set size was greater than 3.

Accordingly, in Experiment 5, we generated a new set of narratives in which the behavior of every shape in the cartoon was mentioned. We tested the hypothesis that the Heider capacity would be greater if we increased the information content of the narrative by mentioning the behavior of all items in the cartoon. We also continued to use the method from Experiment 4, in which observers chose between two narratives, one correct and one incorrect, after each trial.

Method

Participants

Ninety-six observers were recruited through Amazon Mechanical Turk to measure the Heider capacity using a new set of narratives generated for this experiment. All observers were from the United States, gave informed consent and were paid $2.00 for approximately 10–15 minutes of their time. The informed consent procedures were approved by Brigham and Women’s Hospital IRB. These observers did not take part in Experiments 1 to 4.

Stimuli and procedure

Narratives

The cartoons and set sizes from Experiments 1 and 2 were used. The narratives were written by five lab assistants (including 3 authors). Each assistant viewed all four versions of cartoons from each set size. Therefore, each assistant generated 40 narratives (20 narratives for cartoons in Experiment 1 and 20 for Experiment 2). Since the behavior of every shape was required to be described in the narrative, the assistants could replay the cartoons as many times as they wished. The minimum word limit was increased to 50 words, with a maximum of 180 words. We did not give any specific guidelines for the stories, except that it was not useful to give purely physically descriptive accounts. The first author checked the narratives for clarity, grammar and spelling, and verified that every shape within a cartoon was mentioned in the narrative.

Measuring Heider capacity

The cartoons and set sizes from Experiments 1 and 2 were used. Ninety-six observers viewed a cartoon and were shown two narratives after the cartoon ended. Their task was to select the narrative that best matched the cartoon viewed. Data from 8 observers were excluded due to incomplete trials. Observers were assigned to two conditions: one group (44 observers) viewed cartoons and narratives from Experiment 1 (simple shapes) and the other group (44 observers) viewed cartoons and narratives from Experiment 2 (unique shapes).

There were 12 trials for each participant including two practice trials. The practice trials consisted of cartoons from set sizes 3 and 9 and these cartoons (and the respective narratives) were not repeated during the experiment. In the experimental trials, two cartoons from each of five set sizes were shown. In each trial, observers viewed a cartoon that could not be paused or replayed and they were encouraged to take notes. After the cartoon ended, they were shown two narratives about the characters in that cartoon. One narrative matched the cartoon viewed in the trial. The other, incorrect narrative described a cartoon with the same ‘characters’ but moving with different motion rules. As in Experiment 4, the two-alternative forced-choice task was to select the correct narrative.

Results

The average word count for the new narratives used for cartoons in Experiment 1 is 84.9 (SD = 24.5) and for cartoons in Experiment 2 is 81.4 (SD = 24.4). We report the hit rate (true positive rate) from the two conditions (Exp 5: simple shapes and Exp 5: unique shapes) in Figure 8. The general pattern is similar to, if noisier than, the pattern from the earlier experiments. Performance with the simple shapes at set size 3 is mysteriously poor.

A binomial test indicated that the number of hits for all set sizes in both conditions was greater than the expected number of hits, p < 0.01 (1-sided). For simple shapes, pairwise Fisher’s exact tests using the number of correct and incorrect responses showed no significant differences between any set sizes (p > 0.03 after alpha correction of p < 0.025 for multiple comparisons, odds-ratio < 2.1). The TPR for set size 3 was .72 (SD = 0.25), set size 4 was .76 (SD = 0.29), set size 5 was .68 (SD = 0.34), set size 7 was .60 (SD = 0.38) and set size 9 was .69 (SD = 0.34). For unique shapes, pairwise Fisher’s exact tests showed significant differences between set size 3 and set sizes 4, 7 and 9 (p < 0.006 after alpha correction of p < 0.025, odds-ratio > 3.65) but not between set sizes 3 and 5 (p = 0.026 after alpha correction of p < 0.025, odds-ratio = 2.66). There were no differences between any other set sizes (p > 0.03, odds-ratio < 1.9). The TPR for set size 3 was .90 (SD = 0.20), set size 4 was .72 (SD = 0.33), set size 5 was .78 (SD = 0.29), set size 7 was .63 (SD = 0.31) and set size 9 was .70 (SD = 0.33).
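For concreteness, a pairwise comparison of this kind can be sketched as a two-sided Fisher's exact test on a 2 x 2 table of correct and incorrect responses at two set sizes. The counts below are hypothetical: they assume 88 responses per set size (44 observers, two cartoons each) and are chosen to be close to the reported rates of .90 and .72, so they will not reproduce the reported statistics exactly. The function names are illustrations, and the two-sided rule used here (summing all tables no more probable than the observed one) is one common convention.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small tolerance so ties with the observed table are included.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

def odds_ratio(a, b, c, d):
    """Sample odds ratio; undefined (division by zero) if b or c is 0."""
    return (a * d) / (b * c)

# Hypothetical counts: rows are set sizes 3 and 4, columns are correct/incorrect.
table = (79, 9, 63, 25)
print(fisher_exact_two_sided(*table), odds_ratio(*table))
```

Each p-value obtained this way would then be compared with the corrected criterion of 0.025 used throughout.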

Discussion

The pattern of results is similar to (if perhaps noisier than) that of Experiment 4, even when the narratives contain descriptions of all shapes. The longer narratives provide more chances for a piece of a narrative to match with a remembered piece of a cartoon. Performance still declines as the set size increases. Note that, though performance declines, it is well above chance for cartoons containing both simple and unique shapes at the higher set sizes. There is no doubt that observers know details about a cartoon with 9 items but, clearly, something is lost as set size increases. One could have imagined the result coming out differently. With 8 or 9 items in the cartoon and 9 items in the narratives, it could have been that the chance of finding a definitive piece of information would increase over having only 3 or 4 (e.g. “I know that triangle wasn’t chasing anything. This must be the wrong story”). However, that is not the pattern in the data. A cartoon with few agents is more readily matched to its narrative than one with more agents.

General discussion

To briefly review: in five experiments, we demonstrated that the ability of humans to match narratives to abstract scenes declined as a function of the number of agents in the scene. Performance was best when the scenes contained three actors. In Experiments 1 and 2, observers were asked if a specific story matched a specific cartoon. In Experiments 4 and 5, they were asked which of two stories was the correct match to a specific cartoon. Experiment 5 used longer narratives naming every agent in the cartoon. This improved performance but did not eliminate the advantage for small set sizes, with a particularly sharp drop in performance from set size 3 to set size 4, except for the simple shapes version of Experiment 5 with its mysteriously poor performance at set size 3. In Experiment 3, no narratives were involved. Observers were asked if a short segment of a cartoon came from the whole cartoon, shown either immediately before or after the segment. The results show something more like a continuous decline (see Figure 8).

Why are our observers so limited in their abilities? This doesn’t seem to fit with our experience of encoding and recalling the narratives of our lives nor does it seem to fit with our ability to understand the stories that we encounter in the world, either visually or linguistically. Storytellers since Homer were not limited to three characters. Movies do not need to limit their casts to three or four (Zacks, 2015). Our stimuli and the original Heider stimuli are akin to Ebbinghaus’ nonsense syllables in the study of memory (Ebbinghaus, 1885, 1913, reprinted 1964; see also Roediger, 1985). Ebbinghaus wanted stimuli that would measure memory in the absence of the effects of experience and structure. Heider stimuli are the narrative equivalent of Ebbinghaus nonsense syllables. Heider and Simmel noted that our minds make an effort to turn these minimal stimuli into stories. However, we lack most of the usual ‘chunking’ mechanisms that make a collection of semi-random events into a memorable story. The terminology is not consistent across different literatures but it may be useful to use Keven’s (2016) distinction between event memory and episodic memory. In his usage, episodic memories “bind event memories into a retrievable whole that is temporally and causally organized around subject’s goals”. Event memories “provide a short-term record of progress in current goal processing and learning for routine events ... They are only retained in a durable form if … they become linked to other events within a narrative. Otherwise they are rapidly forgotten”. Ebbinghaus’ stimuli allowed us to see the limits imposed by working memory processes on the encoding of lists. Our data appear to show similar limits imposed on the encoding of nearly meaningless events. It is, thus, perhaps not surprising that the limit appears to be on the order of other working memory limits (Cowan, 2001).

Given these results, in what sense is it meaningful to talk about a “Heider Capacity”? Further work would be needed to determine if we are looking at a different manifestation of a working memory limit or a capacity (or resource) limit specific to memory for narrative. Regardless, thinking about a Heider capacity may be useful in thinking about communication between individuals. If two people see the same scene, their agreement about what they have seen will decline as a function of the number of actors in that scene. In the case of an abstract scene or a scene that was otherwise hard to encode, the biggest decline would be between scenes with 3 and 4 actors. One can imagine that this limit, whatever its cause, could have a significant impact on the effectiveness of surveillance or the reliability of eye witness testimony. Think about a witness in court, providing the jury with her descriptive narrative. It may be useful to remember that the value of that narrative may decline rapidly if there were more than three actors present. If there are two such witnesses, the data presented here suggest that one should not be surprised to find that their narratives disagree, especially if the scene involves more than about 3 actors and the narratives contain descriptions of a subset of those actors. In our contentious times, when there is, for instance, a confrontation on the streets, many people can be very sure that they know and remember what they saw. These data suggest that one should be a little less sure. We are certainly not arguing that eye witness reports are not useful, but we argue that it is important to understand the limits on the cognitive systems that allow us to create meaning from our visual perceptions.

Even if performance falls off when the set size grows beyond three, it seems extremely unlikely that observers restricted their attention to only three items in the cartoon. Attention undoubtedly switched between items/groups in the cartoon. However, with these nonsense narratives, many of the attended events fail to make it into explicit long-term memory and, thus, fail to make it out into the narrative. It is possible that some material is encoded into a more implicit form of long-term memory. If so, it might be available to be matched to a narrative if that narrative contained more items. In Appendix A, we looked at how the number of items mentioned in the narratives affects the ratings. It is not clear from the narratives generated by observers for Experiments 1 and 2 whether the ratings for these narratives were more accurate if more items were mentioned in the narratives.

Experiment 5 represents a more determined effort to look for such an effect. We generated a new set of narratives that always contained descriptions of all shapes in each set size, i.e. cartoons with 9 shapes contained descriptions of behavior of those 9 shapes. This did not eliminate the decline in narrative recognition with increasing set size. There does not seem to be a substantial hidden memory for the contents of the cartoon that could be brought out with a richer narrative for comparison. Still, the added information in the narratives may have made the decline more gradual; at least, in the version that used the simple shapes of Experiment 1. It may also have simply increased the noisiness of the data. When a narrative contains information about every item in the cartoon, there may be some greater chance that observers will find agreement between the narrative and some aspect of their experience of the cartoon. Perhaps, this benefit is diluted by the difficulty of holding a 5, 7, or 9 character narrative in working memory. In all cases, performance declines with set size and the bulk of the evidence suggests that the decline is the most dramatic between set sizes 3 and 4.

In addition to considering the number of agents in the narratives, we can ask if performance is tied to the number of behaviors described in the narratives. As can be seen on the right side of Figure 3, like the number of actors, the number of action words used in narratives does not differ across set sizes for the free narratives generated in Experiments 1 and 2. The number of action words is higher in Experiment 5 and that number is correlated with set size. This is unsurprising, given that the authors of those narratives were required to mention every object. In Appendix B, we analyzed whether certain action words in the narratives led to better performance. As can be seen from the graphs in Appendix B, no clear pattern is consistent across set sizes and experiments but, generally speaking, narratives containing words such as ‘avoid’ and ‘hide’ seem to have lower performance, indicating that these behaviors might be harder to detect or recall. On the other hand, narratives with words such as ‘follow’, ‘chase’ or ‘attack’ are associated with higher performance. This could indicate that these behaviors are easily recognized, as proposed in previous studies (Gao et al., 2009).

Might our cartoon-generating algorithm have produced different distributions of agent behaviors for different set sizes and might this have influenced the results? For all set sizes, behaviors were sampled with replacement by the algorithm. The four versions of cartoons for a set size were then chosen so that the cartoons contained a mix of behaviors. As a check, we plotted the distribution of behaviors as a function of set size. The cartoons of different set sizes had a similar composition and had no statistically significant variation in distribution of behaviors between set sizes (one-way ANOVA for behaviors classified into individual, chasing, attraction, repulsion between set sizes, all p > 0.05 for Experiments 1 and 2).

Displays and narratives become more complex as set size increases. How did visual ‘complexity’ of the display affect performance? As one measure of this complexity, we have calculated the entropy (see Footnote 1) of each cartoon. Entropy can be understood as a measure of the extent to which the shapes scatter throughout the course of the cartoon. Intuitively, low entropy indicates a cartoon having items in only a few predictable regions of the scene while high entropy indicates a scene in which items are more unpredictable and scattered. Formally, entropy is defined as:

\[ \text{Entropy} = -\sum_i p_i \log p_i \tag{1} \]

where i indexes the regions in the cartoon. A region is a cell in an n × m grid which partitions the screen into equally sized cells. The choice of that grid is a free parameter. We show results for 25 (5 × 5) and 100 (10 × 10) regions. Our cartoon-generating algorithm produces 30-second cartoons. To calculate entropy, we sample the position of all shapes in the cartoon every 1 second, tabulating the number of shapes in each cell in each 1-second frame. At the end, we get a count across the cells of the scene, which is normalized so that the distribution sums to 1. This gives us the probability distribution (p) over regions for the chosen set size, which is used to calculate the entropy by summing across all cells. The raw entropy value calculated in this manner is hard to interpret, so we divide the entropy by the maximum entropy possible in that scene, which gives:

\[ \%\ \text{max entropy} = \frac{\sum_i p_i \log p_i}{\log(1/\text{number of cells})} \tag{2} \]

To calculate maximum entropy, every p is set equal to 1/number of cells, i.e. every region has equal probability. When we sum over all cells and calculate the percentage max entropy, we get equation (2). Unfortunately, we did not store the movement data of shapes or their initial starting positions in the cartoons used in the experiments, but we used the behaviors of the shapes in Experiments 1 and 2 to compute average max entropy across set sizes. For each set size and its corresponding behaviors, we ran the algorithm generating the Heider cartoons 100 times, each time with random initial starting positions for the shapes. Figure 9 (left) shows that the percentage of max entropy shown by our cartoons increases significantly across set sizes (F(4,792) = 916.4, p < .001, partial η² = .82) for both 25 and 100 regions. The other panels of Figure 9 plot percentage max entropy against performance in Experiments 1, 2, 4, and 5. In general, it can be seen that performance and entropy are negatively correlated, but notice that this is largely driven by the difference in behavior between set size 3 and other cartoons (except Experiment 5: simple shapes).
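As an illustration only, the entropy computation described above might be sketched in Python as follows. This is not the authors' code: the function name, the input format (a list of per-second frames, each a list of (x, y) positions), and the screen dimensions are assumptions made for the example. The value returned is the ratio in equation (2) as a fraction between 0 and 1; multiplying by 100 would express it as a percentage.

```python
import math
from collections import Counter


def fraction_of_max_entropy(frames, width, height, n_cols, n_rows):
    """Position entropy of one cartoon divided by the maximum possible entropy.

    frames: list of frames sampled once per second; each frame is a list of
            (x, y) shape positions (hypothetical input format).
    The grid must contain more than one cell, otherwise log(1/cells) is 0.
    """
    counts = Counter()
    for frame in frames:
        for x, y in frame:
            # Map the position to a grid cell, clamping points on the far edge.
            col = min(int(x / width * n_cols), n_cols - 1)
            row = min(int(y / height * n_rows), n_rows - 1)
            counts[(row, col)] += 1

    total = sum(counts.values())
    # Normalize counts to probabilities; cells never visited contribute 0.
    sum_p_log_p = sum((c / total) * math.log(c / total) for c in counts.values())

    n_cells = n_cols * n_rows
    # Equation (2): sum(p log p) / log(1 / number of cells).
    return sum_p_log_p / math.log(1 / n_cells)
```

With this sketch, a cartoon whose shapes never leave a single cell gives 0, and a cartoon whose shapes are spread evenly over all cells gives 1.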

Figure 9.

Left: The average percentage max entropy of 100 trials across each set size for the algorithm used to generate the Heider cartoons in Experiments 1 and 2. Right: True positive rate of Experiments 1-5 plotted against the mean percentage entropy for the cartoons used in our experiments (top row: 25 cells, bottom row: 100 cells). The error bars show standard error.

The current experiments have limits that point toward future research. One could systematically study the movement rules assigned to items. The choice of the movement rules in our cartoons was somewhat arbitrary. Is ‘jittering’ or moving to a location in isolation recognized as a ‘social’ motion like chasing? Since we know that predatory behavior and chasing motion tend to capture our attention (Gao et al., 2009), it could be interesting to study Heider capacity in the absence of these ‘pop-out’ motions. Another iteration of this experiment could use more scripted or ‘less random’ cartoons, more like the movements of children playing in a playground or people shopping in a grocery store. Our Heider capacity could be higher under these circumstances, where the motions of items in the display are more predictable and observers might compress and store more details of the dynamic scene than reported in this paper. Other variables like timing and object identity could also be varied. The wording in the narratives could be systematically biased in favor of some behaviors. Even simple visual factors like the size of the display might be important. Given that we do not know the effects of these variables, quantitative statements about capacity should be taken with caution. It seems likely that any version of this experiment will show that this capacity is limited, but the precise nature of that limitation may change with different stimuli.

It could be valuable to study rule learning in this task. For instance, if observers knew that the red circle typically chased the green square, how would that change the narratives and, perhaps, the Heider capacity? Returning to the real-world implications of this work, how would the narratives and capacity change if observers were biased to think that red circles typically chased green squares? Would our biases, implicit or otherwise, warp our narratives about red circles? This could be interesting in a world where some people are seen as threatening based on their group membership.

In summary, the principal conclusion to be drawn from the results presented here is that there are limitations on the processes that allow us to code, remember and communicate the actions and intentions of moving objects in dynamic scenes. Further research would be needed to determine if there is an independent “Heider capacity” or if the limits reflect limits on known processes like motion tracking or working memory. In either case, an understanding of our limits should color our expectations concerning the fidelity of reports about dynamic real-world scenes.

Context

This work brings together two classic, dynamic visual displays. In the 1940s, Heider and Simmel showed observers a few geometric items, moving around on a screen, to show how easy it is for us to impute intentions to these very abstract shapes. In the 1980s, Pylyshyn showed observers a few identical items, moving around on a screen, and demonstrated how hard it was to keep track of more than 3 or 4 of them. Now, another 40 years later, we combine these two classes of experiments to ask about capacity limits in Heider-style displays. How many items can you incorporate into your story about the scene in front of your eyes such that another observer is likely to agree with your story? The answer is ‘not many’; an answer that has implications for those situations where people are asked to monitor and/or recall activity in complex scenes.

Supplementary Material

fgure exp 1
figure B exp 2
figure B exp 5
figure B.exp 1
figure exp 2

Acknowledgements

This research was supported by the Army Research Office (ARO) R00000000000588 and NIH EY017001 to Jeremy M. Wolfe. We would like to thank Hayden Schill and Makaela S. Nartker for help with generating narratives for Experiment 5 and Dr. Melissa Kibbe for her suggestion for Experiment 5. The data from Experiments 1 and 2 were presented in a talk at the Vision Sciences Society Annual Meeting (2016) and in a poster at Psychonomics (2016).

Appendix A

True positive rates against the number of shapes mentioned in narratives

Are true positive rates higher for cartoons that contained descriptions of more items? We analyzed the average true positive rates for narratives containing descriptions of items in the cartoon grouped by the number of items mentioned. In Experiments 1, 2 and 4, we did not control the number of items that was mentioned in the narratives. Therefore, we created a hash table of the number of shapes mentioned in a narrative and the corresponding rating given. Then the average true positive rates were calculated from these ratings. The error bars represent the standard error (missing error bars indicate that either only one rating was available for that narrative or the ratings were identical).
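A minimal sketch of this grouping step, in Python, is given here for illustration. It is not the analysis code used for the paper: the function name and the input format (pairs of the number of items mentioned and a 0/1 hit score) are assumptions. The standard error is left undefined when only one rating is available, which corresponds to a missing error bar.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def tpr_by_items_mentioned(records):
    """Average hit rate and standard error per number of items mentioned.

    records: iterable of (n_items_mentioned, hit) pairs, with hit equal to
             1 for a correct rating and 0 otherwise (hypothetical format).
    """
    # Hash table: number of items mentioned -> list of hit scores.
    table = defaultdict(list)
    for n_items, hit in records:
        table[n_items].append(hit)

    summary = {}
    for n_items, hits in sorted(table.items()):
        # A standard error needs at least two ratings; otherwise leave it undefined.
        se = stdev(hits) / math.sqrt(len(hits)) if len(hits) > 1 else None
        summary[n_items] = (mean(hits), se)
    return summary
```

Identical ratings within a group give a standard error of 0, which would also appear as a missing error bar in a plot.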

There is no clear pattern consistent across the graphs below: TPR is not necessarily greater if more items were mentioned in the narrative.

graphic file with name nihms-1002458-f0001.jpg

graphic file with name nihms-1002458-f0002.jpg

Appendix B

True positive rates for action words used in narratives

We calculated the true positive rates for action words appearing in narratives. The point of this exercise was to check whether certain words describing behaviors made the narratives difficult or easy to parse and, in turn, affected true positive rates across experiments. In our experimental design, each cartoon had 5 narratives, one of which was randomly selected during a trial for each participant. As a result, we do not have an even number of ratings for all 5 narratives for each cartoon, and values of d′ calculated from these sparse data are difficult to interpret. Therefore, we calculated true positive rates for Experiments 1 and 2 in the following manner: for each narrative, if the cartoon matched the narrative and the rating was greater than 3, or if the cartoon mismatched the narrative and the rating was less than or equal to 3, then the answer was counted as a hit.
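The scoring rule just described can be written out as a short Python sketch. The function names and the trial format (a rating from 1 to 5 paired with a flag for whether the narrative was written for the cartoon shown) are assumptions made for illustration, not the authors' implementation.

```python
def is_hit(rating, narrative_matches_cartoon):
    """Score one 1-5 fit rating as a hit (True) or a miss (False)."""
    if narrative_matches_cartoon:
        # Matched pair: a rating above the midpoint counts as a hit.
        return rating > 3
    # Mismatched pair: a rating at or below the midpoint counts as a hit.
    return rating <= 3


def true_positive_rate(trials):
    """trials: non-empty list of (rating, narrative_matches_cartoon) pairs."""
    hits = [is_hit(rating, matches) for rating, matches in trials]
    return sum(hits) / len(hits)
```

Note that under this rule a rating of exactly 3 is a miss for a matching narrative and a hit for a mismatching one.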

We took the action words from narratives and created a hash table of these words and the narratives in which they appear. We used a language stemmer so that words such as ‘moving’, ‘moved’, ‘move’, ‘moves’ are hashed under the stem ‘mov’. Only stems which appeared more than 2 times in the experiment were considered in the analysis. For every stem, we extracted the ratings of the corresponding narratives in which they appear and calculated the average true positive rates from these ratings. In the graphs below, the number appearing next to the stem indicates the number of times a narrative containing that stem was rated. The TPR is arranged from the lowest to the highest on the x-axis.
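The stem-based grouping might look roughly like the following Python sketch. The stemmer used in the analysis is not named in the text, so `crude_stem` below is a hypothetical stand-in that only strips a few common suffixes; the function names and the input format (each narrative as a list of action words plus its list of 0/1 hit scores) are likewise assumptions.

```python
from collections import defaultdict

SUFFIXES = ("ing", "ed", "es", "e", "s")  # checked in this order


def crude_stem(word):
    """Hypothetical stand-in for a real stemmer: strip one common suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tpr_by_stem(narratives, min_occurrences=3):
    """narratives: list of (action_words, hits); hits is a list of 0/1 scores."""
    ratings = defaultdict(list)    # stem -> hit scores of narratives containing it
    occurrences = defaultdict(int)  # stem -> number of times the stem appears
    for action_words, hits in narratives:
        stems = [crude_stem(w) for w in action_words]
        for stem in stems:
            occurrences[stem] += 1
        # Add each narrative's ratings once per distinct stem it contains.
        for stem in dict.fromkeys(stems):
            ratings[stem].extend(hits)

    rows = [
        (stem, sum(h) / len(h), len(h))
        for stem, h in ratings.items()
        if occurrences[stem] >= min_occurrences and h
    ]
    # Order from lowest to highest rate, as on the x-axis of the graphs.
    return sorted(rows, key=lambda row: row[1])
```

Each returned row holds the stem, its average hit rate, and the number of ratings behind that average; the default threshold of 3 occurrences corresponds to "more than 2 times".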

There is no clear pattern of certain words producing higher or lower ratings consistently across experiments. Generally speaking, words such as ‘avoid’ and ‘hid’ produced lower TPR than words such as ‘attack’, ‘follow’, ‘chase’, and ‘bump’ across set sizes. This could be because these behaviors are easier/harder to detect as the number of items on the displays increased.

graphic file with name nihms-1002458-f0003.jpg

graphic file with name nihms-1002458-f0004.jpg

graphic file with name nihms-1002458-f0005.jpg

Footnotes

1. Although we employ entropy, there are other possible ways to measure the “complexity” of a dynamic scene such as Kolmogorov complexity or minimum description length. The actual choice of complexity is somewhat immaterial as they are highly related (see Grunwald & Vitanyi, 2008), all capable of incorporating model-specific phenomena that might be relevant to the compression of a scene, such as color, shape, and behavior. Our experimental design explicitly controls for color and shape by ensuring every object has a unique color-shape combination (e.g., there can be two triangles, but only if they are different in color and this somewhat ensures that a participant would not always be able to reduce a scene into, say, “5 red triangles”). The remaining complexity is mostly due to behavior that governs the movement of the objects which we quantify with a position-based notion of entropy.

References

  1. Alvarez GA, Franconeri SL (2007) How many objects can you track? Evidence for a resource-limited attentive tracking mechanism. Journal of Vision, 7:1–10.
  2. Baker CL, Jara-Ettinger J, Saxe R, & Tenenbaum JB (2017). Rational quantitative attribution of beliefs, desires and percepts in human mentalizing. Nature Human Behaviour, 1(4), 0064.
  3. Barrett HC, Todd PM, Miller GF, & Blythe PW (2005). Accurate judgments of intention from motion cues alone: A cross-cultural study. Evolution and Human Behavior, 26, 313–331.
  4. Blythe P, Todd PM, & Miller GF (1999). How motion reveals intention: Categorizing social interactions. In Gigerenzer G, Todd PM, Miller GF, & the ABC Research Group (Eds.), Simple heuristics that make us smart (pp. 257–286). New York, NY: Oxford University Press.
  5. Brainard DH (1997). The psychophysics toolbox. Spatial Vision, 10, 433–436.
  6. Bock K, Irwin D, and Davidson D. 2004. Putting first things first. In The Interface between Language, Vision and Action: Eye Movements and the Visual World, ed. Henderson J and Ferreira F, 249–317. New York; Hove, UK: Psychology Press.
  7. Buren B, Uddenberg S, & Scholl BJ (2016). The automaticity of perceiving animacy: Goal-directed motion in simple shapes influences visuomotor behavior even when task-irrelevant. Psychonomic Bulletin & Review, 23(3), 797–802.
  8. Cave KR, & Wolfe JM (1999) The Psychophysical Evidence for a Binding Problem in Human Vision. Neuron, 24(1): 11–17.
  9. Chun M, Golomb J, and Turk-Browne N (2011). A taxonomy of external and internal attention. Annual Review of Psychology, 62, 73–101.
  10. Cosman JD, & Vecera SP (2010). Attentional capture by motion onsets is modulated by perceptual load. Attention, Perception, & Psychophysics.
  11. Cowan N (2001). The magical number 4 in short-term memory: a reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24(1), 87–114; discussion 114–185.
  12. Cowan N (2010). The magical mystery four: How is working memory capacity limited, and why?. Current Directions in Psychological Science, 19(1), 51–57.
  13. Cowan N (2017). The many faces of working memory and short-term storage. Psychonomic Bulletin & Review, 24(4), 1158–1170. doi: 10.3758/s13423-016-1191-6
  14. Csibra G (2008). Goal attribution to inanimate agents by 6.5-month-old infants. Cognition, 107, 705–717.
  15. Dennett DC (1989). The intentional stance. MIT press.
  16. Ebbinghaus H (1885, 1913, 1964). Memory: A contribution to experimental psychology. (Ruger HA & Bussenius CE, Trans. Dover reprint of 1964 ed.). New York: Dover.
  17. Franconeri SL, & Simons DJ (2003). Moving and looming stimuli capture attention. Perception & Psychophysics, 65, 999–1010.
  18. Frith CD & Frith U (1999) Interacting minds--a biological basis. Science, 286, 1692–1695.
  19. Gallese V, & Goldman A (1998). Mirror neurons and the simulation theory of mind-reading. Trends in Cognitive Sciences, 2(12), 493–501.
  20. Gao T, Newman GE, and Scholl BJ (2009). The psychophysics of chasing: A case study in the perception of animacy. Cognitive Psychology, 59, 154–179.
  21. Gao T, McCarthy G, and Scholl BJ (2010). The wolfpack effect: Perception of animacy irresistibly influences interactive behavior. Psychological Science, 21, 1845–1853.
  22. Grunwald P and Vitanyi P (2008). Shannon Information and Kolmogorov Complexity. arXiv:cs/0410002v1 [cs.IT]
  23. Heider F, & Simmel M (1944). An experimental study of apparent behavior. The American Journal of Psychology, 57(2), 243–259.
  24. Horowitz TS, Klieger SB, Fencsik DE, Yang KK, Alvarez GA, & Wolfe JM (2007). Tracking unique objects. Attention, Perception, & Psychophysics, 69(2), 172–184.
  25. Keven N (2016). Events, narratives and memory. Synthese, 193(8), 2497–2517. doi: 10.1007/s11229-015-0862-6
  26. Kirchner H, & Thorpe SJ (2006). Ultra-rapid object detection with saccadic eye movements: Visual processing speed revisited. Vision Research, 46, 1762–1776.
  27. Kool W, Conway ARA and Turk-Browne NB (2014). Sequential dynamics in visual short-term memory. Attention, Perception, & Psychophysics, 76(7):1885–1901.
  28. Kleiner M, Brainard D, Pelli D, Ingling A, Murray R, & Broussard C (2007). What’s new in Psychtoolbox-3. Perception, 36(14), 1.
  29. Kuhlmeier V, Wynn K, & Bloom P (2003). Attribution of dispositional states by 12-month-olds. Psychological Science, 14(5), 402–408.
  30. Lavie N (1995). Perceptual load as a necessary condition for selective attention. Journal of Experimental Psychology: Human Perception & Performance, 21, 451–468.
  31. Leslie AM, Friedman O, & German TP (2004). Core mechanisms in ‘theory of mind’. Trends in Cognitive Sciences, 8(12), 528–533.
  32. Luck SJ, Vogel EK (1997). The capacity of visual working memory for features and conjunctions. Nature, 390:279–281.
  33. Macmillan NA, & Creelman CD (1996). Triangles in ROC space: History and theory of “nonparametric” measures of sensitivity and response bias. Psychonomic Bulletin & Review, 3, 164–170.
  34. Makovski T, & Jiang Y (2009). Feature binding in attentive tracking of distinct objects. Visual Cognition, 17, 180–194.
  35. Makovski T, Vazquez GA, & Jiang YV (2008). Visual learning in multiple-object tracking. PLoS ONE, 3, e2228.
  36. Michotte A (2017). The perception of causality (Vol. 21). Routledge.
  37. Miller GA (1956). The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychological Review, 63(2), 81.
  38. Nairne JS (1992). The loss of positional certainty in long-term memory. Psychological Science, 3(3):199–202.
  39. New JJ, Cosmides L, and Tooby J (2007). Category-specific attention for animals reflects ancestral priorities not expertise. Proceedings of the National Academy of Sciences, 104, 16598–16603.
  40. Oksama L, Hyönä J. (2008) Dynamic binding of identity and location information: A serial model of multiple identity tracking. Cognitive Psychology, 56, 237–283.
  41. Pantelis PC, & Feldman J (2012). Exploring the mental space of autonomous intentional agents. Attention, Perception, & Psychophysics, 74(1), 239–249.
  42. Pantelis PC, Baker CL, Cholewiak SA, Sanik K, Weinstein A, Wu CC, ... & Feldman J (2014). Inferring the intentional states of autonomous virtual agents. Cognition, 130(3), 360–379.
  43. Pantelis PC, Gerstner T, Sanik K, Weinstein A, Cholewiak SA, Kharkwal G, ... & Feldman J (2016). Agency and rationality: Adopting the intentional stance toward evolved virtual agents. Decision, 3(1), 40.
  44. Pearson J, & Kosslyn SM (2015). The heterogeneity of mental representation: ending the imagery debate. Proceedings of the National Academy of Sciences, 112(33), 10089–10092.
  45. Pelli DG (1997). The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 10(4), 437–442.
  46. Pratt J, Radulescu P, Guo RM, and Abrams RA (2010). It’s alive! Animate motion captures visual attention. Psychological Science, 21, 1724–1730.
  47. Premack D and Woodruff G (1978). Does the chimpanzee have a theory of mind? Behavioral and Brain Sciences, 1, 515–526.
  48. Roediger HL (1985). Remembering Ebbinghaus. Contemporary Psychology, 30(7), 519–523.
  49. Scholl BJ, & Gao T (2013). Perceiving animacy and intentionality: Visual processing or higher-level judgment? In Rutherford MD & Kuhlmeier VA (Eds.), Social perception: Detection and interpretation of animacy, agency, and intention (pp. 197–230). Cambridge, MA: MIT Press.
  50. Scholl BJ, Pylyshyn ZW, Franconeri S (1999). When are featural and spatiotemporal properties encoded as a result of attentional allocation? Investigative Ophthalmology & Visual Science, 40(4), S797.
  51. Scholl BJ and Tremoulet PD (2000). Perceptual causality and animacy. Trends in Cognitive Sciences, 4:299–309.
  52. Scimeca JM, & Franconeri SL (2015). Selecting and tracking multiple objects. Wiley Interdisciplinary Reviews: Cognitive Science, 6(2), 109–118.
  53. Simion F, Regolin L, & Bulf H (2008). A predisposition for biological motion in the newborn baby. Proceedings of the National Academy of Sciences, 105(2), 809–813.
  54. Theeuwes J (2018). Visual Selection: Usually fast and automatic; seldom slow and volitional. J. of Cognition, in press.
  55. Treisman AM, & Schmidt H (1982). Illusory conjunctions in the perception of objects. Cognitive Psychology, 14, 107–141.
  56. Treisman A (1996). The binding problem. Current Opinion in Neurobiology, 6(2), 171–178.
  57. Tremoulet PD, & Feldman J (2000). Perception of animacy from the motion of a single object. Perception, 29, 943–951.
  58. Tremoulet PD, & Feldman J (2006). The influence of spatial context and the role of intentionality in the interpretation of animacy from motion. Perception & Psychophysics, 68, 1047–1058.
  59. Tse P, Cavanagh P, & Nakayama K (1998). The role of parsing in high-level motion processing. High-level motion processing: Computational, neurobiological, and psychophysical perspectives, 249–266.
  60. Whitney D, & Levi DM (2011). Visual crowding: A fundamental limit on conscious perception and object recognition. Trends in Cognitive Sciences, 15(4), 160–168.
  61. Wolfe JM, & Horowitz TS (2017). Five factors that guide attention in visual search. [Review Article]. Nature Human Behaviour, 1, 0058. doi: 10.1038/s41562-017-0058
  62. Wolfe JM, Place SS, & Horowitz TS (2007). Multiple Object Juggling: Changing what is tracked during extended multiple object tracking. Psych Bulletin & Review, 14(2), 344–349.
  63. Wolfe JM, Reinecke A, & Brawn P (2006). Why don’t we see changes? The role of attentional bottlenecks and limited visual memory. Visual Cognition, 14(4–8), 749–780.
  64. Wundt W [1900] 1970. The psychology of the sentence. In Language and Psychology: Historical Aspects of Psycholinguistics, ed. Blumenthal AL, 20–31. New York: Wiley.
  65. Zacks JM (2015). Précis of Flicker: Your Brain on Movies. Projections, 9(1), 1–22.
  66. Zacks JM (2004). Using movement and intentions to understand simple events. Cognitive Science, 28(6), 979–1008.
  67. Zacks JM, Braver TS, Sheridan MA, Donaldson DI, Snyder AZ, Ollinger JM, & Raichle ME (2001). Human brain activity time-locked to perceptual event boundaries. Nature Neuroscience, 4(6), 651–655.
