Action starring narratives and events: Structure and inference in visual narrative comprehension

Neil Cohn; Eva Wittenberg

doi:10.1080/20445911.2015.1051535

Author manuscript; available in PMC: 2016 Jul 14.

Published in final edited form as: J Cogn Psychol (Hove). 2015 Jul 14;27(7):812–828. doi: 10.1080/20445911.2015.1051535

PMCID: PMC4689435 NIHMSID: NIHMS698880 PMID: 26709362

Abstract

Studies of discourse have long placed focus on the inference generated by information that is not overtly expressed, and theories of visual narrative comprehension have similarly focused on the inference generated between juxtaposed panels. Within the visual language of comics, star-shaped “flashes” commonly signify impacts, but can be enlarged to the size of a whole panel that can omit all other representational information. These “action star” panels depict a narrative culmination (a “Peak”), but have content which readers must infer, thereby posing a challenge to theories of inference generation in visual narratives that focus only on the semantic changes between juxtaposed images. This paper shows that action stars demand more inference than depicted events, and that they are more coherent in narrative sequences than scrambled sequences (Experiment 1). In addition, action stars play a felicitous narrative role in the sequence (Experiment 2). Together, these results suggest that visual narratives use conventionalized depictions that demand the generation of inferences while retaining narrative coherence of a visual sequence.

Keywords: discourse, narrative, inference, comics, visual language

Introduction

How do we make sense of information in a narrative that is not overtly provided? The generation of inferences—the information that a reader understands despite being unstated in a discourse—has long been a primary focus in the study of discourse (Graesser, Millis, & Zwaan, 1997; Keenan, Potts, Golding, & Jennings, 1990; van den Broek, 1994; Zwaan & Rapp, 2006). Because inferences allow a reader to make sense of unexpressed material, they contribute towards building a “situation model” of the discourse in memory (van Dijk & Kintsch, 1983). This emphasis on inference generation has also been a hallmark of film theory (Bordwell & Thompson, 1997; Eisenstein, 1942; Kuleshov, 1974), and studies of film comprehension support that viewers are consciously able to identify changes in time, characters, and spatial locations (Magliano, Miller, & Zwaan, 2001; Magliano & Zacks, 2011; Zacks, Speer, & Reynolds, 2009).

Theories of visual narrative comprehension have also emphasized inference (Bordwell, 1985, 2007; Branigan, 1992; Chatman, 1978; Eisenstein, 1942; Magliano, Dijkstra, & Zwaan, 1996; McCloud, 1993; Saraceni, 2001; Yus, 2008), especially the bridging inferences where readers “fill in” the information left unstated between “panels”—the encapsulated image units of a static visual narrative sequence. Similar to the linear coherence relationships between sentences (Halliday & Hasan, 1976; Hobbs, 1985; Kehler, 2002; Mann & Thompson, 1987; Zwaan & Radvansky, 1998), theories of visual narratives emphasize the linear semantic changes between panels across dimensions of time, causation, characters, environments, and scenes (McCloud, 1993; Saraceni, 2000, 2001; Stainbrook, 2003). More inference is hypothesized to be demanded by greater discontinuities between panels, such as when incoming panels do not repeat information in prior panels or do not share elements related to a broader semantic field (Saraceni, 2000, 2001).

To create these bridging inferences, two panels must provide bottom-up content for a reader to infer the link between them. However, some panels in the visual language of comics have such impoverished semantic content that inference is necessary to understand what they mean, let alone how they unite with other information. Consider Figure 1a, where a dog curiously chases a ball until unexpectedly getting scared off by people playing soccer: We never actually see the players interacting with the dog, though we know this event occurs in panel 3. This panel only depicts an “action star,” a “visual morpheme” used to represent impacts (Cohn, 2013a; Potsch & Williams, 2012; Walker, 1980). In panel 3, which only shows the action star, the events are inferred via the preceding and final panels. This inference does not occur between panels 2 and 3 or between panels 3 and 4. Rather, inference is necessary to comprehend the events omitted within the action star itself, not just to understand the relations between panels.

Figure 1. Simple visual narrative sequences. (a) shows a sequence where an “action star” omits crucial event information in the culminating Peak of the sequence, while (b) shows a sequence with a canonical mapping of event structure to narrative structure.

The structure of visual narratives

Action stars are notable not only because they require inferences, but because they seem to play a narrative function in visual sequences. We can understand this role by drawing on the theory of Visual Narrative Grammar (VNG), which posits that individual panels play categorical roles in a narrative sequence, which then become structured into hierarchic constituents (Cohn, 2013b) analogous to the way that words play categorical roles in the hierarchic structure of sentences. This comparison between syntax and visual narrative is one of function—the units of sentences (words) and visual narratives (images) convey information in different ways and levels of meaning. Functionally though, a narrative grammar packages meaning into a sequence using similar architectural constraints (categories, hierarchy, etc.) as how syntax packages meaning in sentences, only operating at a discourse level of information. While VNG has some similarities with previous “grammatical” approaches to narrative (e.g., Mandler & Johnson, 1977; Rumelhart, 1975; Stein & Glenn, 1979; Thorndyke, 1977), VNG uses simpler structures (Cohn, 2013b), makes an explicit separation of structure and meaning (Cohn, Paczynski, Jackendoff, Holcomb, & Kuperberg, 2012), and incorporates modifiers beyond a canonical narrative arc (Cohn, 2013a, 2013b). Also, VNG is not incompatible with most models of discourse, which tend to focus on semantic aspects of comprehension like coherence relationships and inference generation (for review, see McNamara & Magliano, 2009), while VNG outlines the “grammatical” relationships that interface with those semantic processes. For example, although VNG extends beyond linear coherence relations (e.g., McCloud, 1993; Zwaan & Radvansky, 1998), such semantic changes should interface with the narrative grammar in predictable ways, like linear coherence relationships correlating with breaks between constituents (Cohn, 2013b).

Narrative categories in VNG are assigned through an interaction of the bottom-up semantic content of panels and their top-down context in the broader narrative (Cohn, 2013b, 2014). Consider Figure 1b, which progresses in a canonical narrative arc. An “Establisher” opens the sequence with a woman sitting angrily next to a man, which functions to set up the characters and situations of a sequence without acting upon them. Next, an Initial begins the events of the sequence, prototypically with a preparatory action, like the woman reaching back to smack the man. A sequence climaxes at a Peak, where completed events or actions typically occur (like smacking the man). The aftermath occurs in the Release, as in the final panel where the man humorously is not affected by the woman's actions. Figure 1b shows a prototypical interface between structure and semantics, where the narrative categories directly correspond to the event structure. In addition, though it will not be dealt with here, these narrative categories apply both to individual panels and to whole constituents, recursively extending to visual narratives of greater lengths (see Cohn, 2013b; Cohn, Jackendoff, Holcomb, & Kuperberg, 2014). Such constituents also allow for surface patterns to violate a canonical arc, though the individual constituents that make up that sequence may not (for example, a sequence with the surface structure Initial-Peak-Initial-Peak could be felicitous if segmented [[_II-P]-[_PI-P]]).

Now, reconsider Figure 1a. The Establisher here starts with a lot of action: the soccer ball flies into the frame and the dog is excited by it. Despite not depicting a passive state, this panel still introduces the reader to the characters involved in the situation (reinforced by the characters also meeting each other). The Initial shows the dog chasing the ball, but no preparatory action (the dog is already engaged in this action). The action star does act as a culminating Peak, but without depicting any completed actions (a point returned to below), nor is a completed action inferred (the dog's action is interrupted). Finally, the Release shows the aftermath of the (unseen and now inferred) prior event. Thus, prototypical and non-prototypical mappings can occur between narrative structure and meaning, and their assignment involves both bottom-up semantic features and top-down context in a sequence.

Let us now return to action stars, which appear within American comic books and strips to stand in for events, both related and unrelated to impacts. Action stars vary in appearance both between and within authors, sometimes appearing just as a star or sometimes with text that disambiguates the actions (like “Pow!” or “Zap!”). Thus, as a conventionalized aspect of a broader “lexicon” of visual narratives, action stars have allomorphic representations (Cohn, 2013a); for additional examples of action stars, see supplementary material available at http://www.visuallanguagelab.com/A/AS_Supplement.pdf.

Though action stars show minimal event information, thereby demanding inference for their meaning, their morpho-semantics implies a “culminating event”, which provides enough information for them to act as Peaks of the sequence. They thereby provide a way to elide information about events but retain narrative felicity (Cohn, 2013b). This would be functionally analogous to a “pro-form” in the syntactic structure of sentences, which plays a grammatical role as a noun (he, she, it) or preposition (there, here, then), yet provides fairly minimal semantic information. Similarly, action stars act as narrative Peaks, but physically convey only an unspecified event, with no properties about what that event is or who is involved. Granted, visual narratives contain far more information per unit than individual words, but again this analogy between action stars and pro-forms is made purely at the functional level related to structure, not the level of conceptualized information (just like the broader analogy between syntax and narrative structure in VNG).

We therefore have two hypotheses about action stars: First, they should require inference to be understood because of their impoverished semantic structure (e.g., McKoon & Ratcliff, 1992), not only for inferences between panels, but to understand action stars themselves. Second, they should play a role in the narrative grammar as Peaks. This is what makes them interesting and unique: they are an impoverished conventionalized depiction that demands inference, yet maintains a felicitous role in a narrative structure.

Processing of visual narratives

Despite many theories stressing the importance of inference in visual narrative comprehension, thus far no studies have explicitly examined the image-by-image processing of bridging inferences in visual narratives. Yet, some work has shown that accuracy for inferring omitted panels from visual narratives correlates with age and experience reading comics (Nakazawa, 2005). In addition, narrative categories seem to differ in their inferential demands. In a previous study, participants were more accurate at recognizing the ellipsis of Peaks from sequences than other elided categories, and strips with missing Peaks were rated lower than those omitting other categories (Cohn, 2014). Given that Peaks contain the apex of many causal relations, these findings are consistent with research emphasizing that the locus of causal relations in a discourse may be more important than units with more peripheral information (Trabasso, Secco, & van den Broek, 1984; Trabasso & Sperry, 1985).

Additional research using sequential images has begun to construct a view of image-by-image processing of visual narrative sequences. First, comprehenders use both bottom-up content and top-down context to make expectations about subsequent information in a sequence. For example, comprehenders may assume that bottom-up semantic referential information like characters, locations, and/or semantic associative fields will repeat across images (Cohn et al., 2012; Magliano & Zacks, 2011; Saraceni, 2001). Semantic information also involves expectations about events (Reid & Striano, 2008; Sitnikova, Holcomb, & Kuperberg, 2008), like that a completed action will be presumed to follow a preparatory action (Cohn & Paczynski, 2013). Interfacing with this semantic information, comprehenders may also anticipate top-down narrative structural information, such as that a Peak will follow an Initial (Cohn et al., 2014). Disconfirmation of these structural and semantic predictions incurs processing costs both at an unexpected or anomalous image itself (Cohn, 2014; Cohn et al., 2014; Cohn et al., 2012; West & Holcomb, 2002), and potentially at subsequent panels where further context leads to (re)assessment of the prior information (Cohn et al., 2014; Cohn & Paczynski, 2013). However, the coherent combination of both structure and meaning allows for a facilitation of semantic comprehension with each subsequent image in a sequence (Cohn et al., 2012).

Given this framework, an action star would thus satisfy a structural prediction that a Peak would follow a preceding Initial. However, its content would remain ambiguous, and the bottom-up content would only inform that an “event” takes place. While no continuity would be maintained for low-level referential information (e.g., Magliano & Zacks, 2011; Saraceni, 2001), event information may influence inferences about its content given the semantic constraints of the prior panel (McKoon & Ratcliff, 1986, 1992), allowing for at least some sense of causal cohesion (Magliano, Baggett, Johnson, & Graesser, 1993; Singer, Halldorson, Lear, & Andrusiak, 1992; Trabasso et al., 1984). For example, a preparatory action (e.g., the runner going towards the catcher in Figure 2) may allow for the predicted inference of the subsequent action star containing a completion (i.e., the collision) since preparations generate expectations about subsequent actions (Cohn & Paczynski, 2013). No matter the preceding content, full inference may only be totally accessible once the image after the action star is reached, where reanalysis and/or confirmation can be made given the subsequent context of the sequence. Bridging inferences related to action stars must therefore involve information in both the prior and subsequent panels (e.g., Kintsch, 1988, 1998). Thus, while structural felicity may be assessed at the action star panel itself, evidence of inference should appear at the subsequent panel, where the contents of an action star must be integrated/analyzed in light of additional context.

Figure 2. Experimental stimuli contrasting Coherent visual sequences with a felicitous narrative and Scrambled sequences, where the order of panels was rearranged. Within these sequences, critical panels either maintained the original depicted scene of a Peak panel (the culmination of the narrative) or substituted it with an “action star.” The critical panel was placed anywhere from the third to the sixth position in the sequence.

We investigated these semantic and structural traits of action star comprehension using two experiments in a “self-paced viewing” paradigm (Cohn, 2012, 2014; Cohn & Paczynski, 2013). Experiment 1 compared action stars and normal Peaks in the context of coherent and scrambled narrative sequences, while Experiment 2 compared the comprehension of action stars to other types of panels in coherent sequence frames.

Experiment 1: Scrambling

If action stars play a narrative role that requires the generation of inferences through their relationships with surrounding panels, then such effects should disappear in sequences lacking a coherent narrative grammar, such as when discourse units are rearranged to not make sense. Participants are better able to recall verbal narratives that follow a canonical structure than those where temporal order is changed (Mandler & Johnson, 1977), where sentences are inverted (Mandler, 1978, 1984; Mandler & DeForest, 1979), or where sentences are fully scrambled (Mandler, 1984). Cross-modal comparisons of narratives have supported that scrambling the order of discourse units inhibits comprehension across domains, be it in the verbal, written, or visual-graphic modality (Gernsbacher, Varner, & Faust, 1990; Robertson, 2000). Consistent with this, target panels from random sequences of images elicit slower response times than panels from sequences with narrative grammar and/or semantic associations between panels (Cohn et al., 2012). Furthermore, panels in scrambled sequences evoked larger amplitude N400 effects than those in normal sequences with coherent narrative structure (Cohn et al., 2012). The N400 effect is a neural response elicited by both words (Kutas & Hillyard, 1980) and images (Barrett & Rugg, 1990; Barrett, Rugg, & Perrett, 1988), and it is modulated by the degree to which the semantic features of an input match or mismatch its prior context (Kutas & Federmeier, 2011; Kutas & Hillyard, 1980).

Action stars should demand little inference within scrambled sequences that do not allow for a coherent narrative structure, because relations between panels would lack the narrative and causal information necessary to comprehend sequential events. In Experiment 1, we used a “self-paced viewing” paradigm to explore the hypothesis that action stars would evoke inferences at the subsequent panel by comparing normal Peaks with action stars within both coherent and scrambled visual narratives. Here, participants controlled the pace of viewing each panel in a sequence while we measured how long each panel stayed on the screen. Self-paced viewing paradigms have long been used in the study of inference in verbal discourse comprehension (Haviland & Clark, 1974; Keenan et al., 1990; McKoon & Ratcliff, 1986), and have proven to be a successful technique for measuring comprehension in visual narratives (Cohn, 2012, 2014; Cohn & Paczynski, 2013).

In discourse studies, longer viewing times have typically appeared to sentences that require inference to understand prior information relative to non-inference generating controls (Haviland & Clark, 1974; Keenan et al., 1990; Sanford & Garrod, 1981; van den Broek, 1994). Analogously, if action stars force a reader to infer the unseen content, we reasoned that slower viewing times should appear to the panel following action stars than to corresponding panels following normal Peak panels. However, little or no difference should appear between panels following normal Peaks and action stars in scrambled sequences, because the context would create little demand for inference generation. Additionally, we would expect that action stars playing a structural narrative role would be viewed shorter in coherent narratives than scrambled sequences, just as we would expect normal Peaks to be viewed shorter in coherent narratives than in scrambled sequences.

Methods

Stimuli

We used 60 coherent 6-panel long visual narrative sequences from an existing corpus with coherency confirmed in a prior rating study (Cohn et al., 2012). Sequences maintained panels of a similar size and had no text to eliminate any influence of written language on comprehension. These sequences were then manipulated in two ways.

First, “scrambled” sequences rearranged panels into an incomprehensible order that would contrast with the “coherent” normal sequences. Panels were rearranged such that their order would not create alternate coherent sequences (e.g., moving an Establisher such that it would act as a Release (Cohn, 2014)), particularly by reversing the order of Initials and Peaks, both locally and across constituents, among other rearrangements. Critical Peak panels remained in the same position for both Coherent and Scrambled versions of a given sequence, distributed throughout ordinal sequence positions 3 through 6. Peak panels were able to fall between positions 3 and 6 because these Coherent 6-panel long sequences often consisted of multiple constituents, where Peak panels could vary from the penultimate position.

Second, within these sequence types (Coherent, Scrambled) critical panels used either the original Peak panel of the sequence, or were replaced by an action star. This yielded a 2 (Sequence Type: Coherent/Scrambled) × 2 (Peak Type: Depicted Scene/Action Star) design, as in Figure 2. These sequences were divided into four counterbalanced lists such that lists included each strip only once, and no sequence appeared in the same list twice. 30 fillers of coherent sequences were added to balance the number of coherent scenes (45 total: 15 experimental, 30 fillers) with scrambled and action star sequences (45 total: 15 Coherent Action Stars, 15 Scrambled Depicted Scenes, 15 Scrambled Action Stars) viewed by each participant. Fillers also added variability in the length of sequences (fillers ranged from 6 to 12 panels long), such that not all sequences ended after 6 panels. Each list presented sequences in a randomized order.
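
The counterbalancing described above can be illustrated with a minimal sketch. The text does not say how strips were assigned to lists, so the Latin-square rotation below is an assumption, and the names `CONDITIONS` and `build_lists` are hypothetical. It only shows that 60 strips crossed with four conditions yield four lists in which each strip appears once and each condition contributes 15 strips.

```python
from collections import Counter

# The four cells of the 2 (Sequence Type) x 2 (Peak Type) design.
CONDITIONS = [
    ("Coherent", "Depicted Scene"),
    ("Coherent", "Action Star"),
    ("Scrambled", "Depicted Scene"),
    ("Scrambled", "Action Star"),
]


def build_lists(n_items=60, conditions=CONDITIONS):
    """Assign every strip to one condition per list by rotating conditions.

    Assumption: a simple Latin-square rotation; the original study may have
    used a different assignment scheme.
    """
    n_lists = len(conditions)
    lists = []
    for list_idx in range(n_lists):
        # Each strip (item) appears exactly once in this list.
        assignment = {
            item: conditions[(item + list_idx) % n_lists]
            for item in range(n_items)
        }
        lists.append(assignment)
    return lists


lists = build_lists()
# Number of strips per condition within each list.
per_list_counts = [Counter(lst.values()) for lst in lists]
```

Under this rotation, every strip is seen in a different condition in each of the four lists, so condition effects are not confounded with particular strips.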

Procedure

Participants viewed each strip frame-by-frame on a computer screen with a pace under their own control. Viewing times were measured to each button press for how long each frame stayed on the screen. Trials began with a screen reading READY, followed by a fixation cross (+). Each panel then appeared one at a time centered on an otherwise black screen. A 300ms ISI prevented panels from overlapping to appear like a flipbook style animation. A question mark appeared after each sequence, where participants rated how easy the strip was to understand (1=difficult, 7=easy). A practice list with ten stimuli oriented participants to the procedure.

Participants

Twenty-eight comic readers from the Tufts University population (19 male, 9 female, mean age: 21.04) were compensated for their participation in the study. All participants gave their informed written consent according to Tufts University Human Subjects Review Board guidelines.

Previous studies have shown that comprehension of sequential images differs based on comic reading ability (Cohn et al., 2012), including inferences drawn from sequences with omitted information (Nakazawa, 2005), so fluent comic readers were recruited to ensure fluency in this “visual language.” This expertise was assessed using the “Visual Language Fluency Index” (VLFI) questionnaire (Cohn et al., 2012) asking how often participants read various types of comics on a scale of 1 (never) to 7 (always) including comic books, comic strips, graphic novels, and Japanese comics. These ratings assessed both current reading habits as well as when they were growing up. A “VLFI score” was then computed using the following formula:

\[
\text{VLFI} = \left(\text{Mean Comic Reading Freq.} \times \text{Comic Reading Expertise}\right) + \left(\frac{\text{Comic Drawing Freq.} \times \text{Drawing Ability}}{2}\right)
\]

This formula weights fluency towards comic reading comprehension, while giving an additional “bonus” for fluency in comic production. Previous research has shown that the score derived from this metric provides a strong predictor of both behavioral and neurophysiological effects in online comprehension of visual narratives (Cohn & Maher, 2015; Cohn et al., 2012). Within this metric, an idealized average would be a score of 12, with low being below 7 and high above 20. Participants had an “average” fluency, with a mean of 15.13 (SD = 8.48; range = 1.5 - 38.12).
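
As a worked illustration of the formula, the following sketch computes a VLFI score and places it in the bands mentioned above (low below 7, high above 20). The function and parameter names are hypothetical, and the example input values are invented for illustration only; they are not data from the study.

```python
def vlfi_score(mean_reading_freq, reading_expertise, drawing_freq, drawing_ability):
    """VLFI = (reading frequency x reading expertise) + (drawing frequency x drawing ability) / 2."""
    reading_term = mean_reading_freq * reading_expertise
    drawing_bonus = (drawing_freq * drawing_ability) / 2
    return reading_term + drawing_bonus


def fluency_band(score, low=7, high=20):
    """Label a score using the thresholds stated in the text (low < 7, high > 20)."""
    if score < low:
        return "low"
    if score > high:
        return "high"
    return "average"


# Invented example values: 4 * 3 + (2 * 1) / 2 = 13.0
example = vlfi_score(4, 3, 2, 1)
example_band = fluency_band(example)
```

Because the drawing term is halved, reading habits dominate the score, which matches the stated intent of weighting comprehension over production.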

Data Analysis

Outlier viewing times for each participant were discarded if they fell below a threshold of 300ms, or above 8000ms. This lower limit was set below half the fastest mean panel viewing times seen in our previous studies of visual narrative (Cohn, 2012, 2014; Cohn & Paczynski, 2013), while the upper limit was roughly four times the longest viewing times. This amounted to few rejected trials, with 99% (SD = 1.4%) of trials retained across all participants.
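
The trimming rule can be sketched as follows, together with the log transformation applied to viewing times in the analysis. This is a minimal illustration, not the authors' script: the function name `trim_and_log` is hypothetical, values exactly at 300 ms or 8000 ms are assumed to be retained, and the natural logarithm is assumed because the text does not state a base.

```python
import math


def trim_and_log(times_ms, lower=300, upper=8000):
    """Drop viewing times below `lower` or above `upper`, then log-transform.

    Returns the log-transformed retained times and the proportion retained.
    Assumptions: boundary values are kept; natural log is used.
    """
    kept = [t for t in times_ms if lower <= t <= upper]
    retained = len(kept) / len(times_ms) if times_ms else 0.0
    return [math.log(t) for t in kept], retained


# Invented example: two of five viewing times fall outside the thresholds.
log_times, proportion_retained = trim_and_log([250, 300, 1000, 8000, 9000])
```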

We analyzed all data using mixed-effects regression models (Baayen, Davidson, & Bates, 2008) with maximal random effects structure, including Peak Type (Scene or Star) and Sequence Type (Coherent or Scrambled) and their interaction as fixed effects, and random slopes for both participants and items (Barr, Levy, Scheepers, & Tily, 2013), using the lme4 package (Bates, Maechler, Bolker, & Walker, 2014). Viewing times were log-transformed and analyzed at the critical panel (CP) and the immediately subsequent panel (CP+1), as well as non-critical panels. Finally, correlations were used to compare the VLFI fluency scores with viewing times.
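
One way to write out a model of this kind is sketched below. The notation is an assumption made for illustration: $p$ indexes participants, $i$ indexes items, $\mathrm{Peak}$ and $\mathrm{Seq}$ are coded predictors for Peak Type and Sequence Type (the coding scheme is not given in the text), and the $P$ and $I$ terms are by-participant and by-item random intercepts and slopes, as a maximal random effects structure would imply.

```latex
\[
\begin{aligned}
\log(\mathrm{VT}_{pi}) ={} & (\beta_0 + P_{0p} + I_{0i})
  + (\beta_1 + P_{1p} + I_{1i})\,\mathrm{Peak}_{pi} \\
& + (\beta_2 + P_{2p} + I_{2i})\,\mathrm{Seq}_{pi}
  + (\beta_3 + P_{3p} + I_{3i})\,\mathrm{Peak}_{pi}\,\mathrm{Seq}_{pi}
  + \varepsilon_{pi}
\end{aligned}
\]
```

Here $\beta_1$ and $\beta_2$ correspond to the fixed effects of Peak Type and Sequence Type, and $\beta_3$ to their interaction.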

Results

Ratings

Figure 3 shows how participants rated the strips, according to condition. Coherent Scenes were rated highest (5.99, SD:1.57), followed by Coherent Stars (5.14, SD: 1.95), Scrambled Scenes (3.98, SD: 1.88), and finally, Scrambled Stars (3.78, SD: 2.06). Peak type had a significant influence on ratings (β=-.83, t=-5.73, p<.0001), indicating that sequences with Depicted Scenes received a significantly higher rating than those with action stars. Sequence Type also significantly influenced ratings (β=-2.01, t=-8.67, p<.0001), with Coherent sequences receiving better ratings than Scrambled sequences. There was also a significant interaction between Peak Type and Sequence Type (β=.63, t=3.09, p<.001).

Viewing times

Viewing times to non-critical panels across ordinal sequence position showed that panels in Scrambled sequences were consistently viewed more slowly than those in Coherent sequences. While the particular panel position had no influence on viewing times overall (β=.01, t=-0.5, p>.61), Sequence Type did (β=0.21, t=4.33, p<.0001), and the interaction was also significant (β=-.04, t=-2.87, p<.005). In addition, as depicted in Figure 4, final panels of Coherent sequences were slower than those in Scrambled sequences, t(27)=4.4, p<.001, d=.28. The first panel of each sequence was viewed longer than other ordinal positions in both sequence types (all ts > 4.4, all ps < .001, all ds >.65), whereas the final panel of only Coherent sequences was viewed longer than the preceding panels in positions 2 through 5 (all ts > 3.3, all ps < .005, all ds > .41).

Viewing times for critical panels are depicted in Figure 5, and listed along with standard deviation and standard error in Table 1. At the critical panel (CP), Peak Type had a significant influence on viewing times (β=.3, t=-4.2, p<.0001), with Depicted Scenes being viewed slower than Action Stars (Figure 5). Sequence Type also significantly influenced how long participants viewed a panel (β=.18, t=2.28, p<.02). There was also a significant interaction between Peak Type and Sequence Type (β=-.09, t=-2.16, p<.02). At the panel following the critical panel (CP+1), again Peak Type had a significant influence on viewing times (β=.36, t=4.38, p<.0001), with Depicted Scenes being viewed faster than Action Stars (Figure 5). Sequence Type also significantly influenced how long participants viewed a panel (β=.18, t=2.24, p<.03). Again, a significant interaction appeared between Peak Type and Sequence Type (β=-.16, t=-3.13, p<.001).

Figure 5. Viewing times at the critical Peak panel and the panel following it (CP+1) for Scrambled and Coherent sequences with either Depicted Scenes or Action Stars. Error bars depict standard error.

Table 1. Mean viewing times, standard deviations, and standard errors for experimental results in Experiments 1 and 2.

Experiment 1
Panel	Condition	Viewing time (ms)	SD	SE
Critical Panel	Coherent Scene	1162.4	795.7	39.0
Critical Panel	Coherent Star	731.8	366.1	18.0
Critical Panel	Scrambled Scene	1272.6	854.8	41.8
Critical Panel	Scrambled Star	738.6	395.0	19.4
Critical Panel + 1	Coherent Scene	1132.0	768.0	37.7
Critical Panel + 1	Coherent Star	1435.2	1099.7	54.1
Critical Panel + 1	Scrambled Scene	1142.6	714.2	34.9
Critical Panel + 1	Scrambled Star	1256.2	968.6	47.5

Experiment 2
Panel	Condition	Viewing time (ms)	SD	SE
Critical Panel	Coherent Peak	1031.8	574.8	35.2
Critical Panel	Action Star	643.8	238.6	14.6
Critical Panel	Blank Panel	762.6	258.7	15.7
Critical Panel	Anomalous Peak	1480.2	800.2	49.1
Critical Panel + 1	Coherent Peak	1206.3	905.1	55.5
Critical Panel + 1	Action Star	1420.9	960.8	58.8
Critical Panel + 1	Blank Panel	1432.8	986.5	60.0
Critical Panel + 1	Anomalous Peak	1525.9	1064.3	65.3
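
The Experiment 1 means in Table 1 can be used for a simple descriptive check of the pattern at the panel after the critical panel (CP+1). The sketch below computes the raw slowdown after an action star within each sequence type. It works on raw means, whereas the reported models used log-transformed times, so it illustrates the direction of the interaction rather than reproducing the statistics. The helper `implied_n` assumes that each SE was computed as SD divided by the square root of the number of trials, which the text does not state.

```python
# Experiment 1 mean viewing times at CP+1, copied from Table 1.
cp1_means = {
    ("Coherent", "Scene"): 1132.0,
    ("Coherent", "Star"): 1435.2,
    ("Scrambled", "Scene"): 1142.6,
    ("Scrambled", "Star"): 1256.2,
}


def star_cost(seq_type):
    """Raw slowdown at CP+1 after an action star relative to a depicted scene."""
    return cp1_means[(seq_type, "Star")] - cp1_means[(seq_type, "Scene")]


coherent_cost = star_cost("Coherent")    # roughly 303 in the table's units
scrambled_cost = star_cost("Scrambled")  # roughly 114 in the table's units
difference_of_differences = coherent_cost - scrambled_cost


def implied_n(sd, se):
    """Observation count implied by SE = SD / sqrt(N) (an assumed formula)."""
    return (sd / se) ** 2


# Coherent Scene at the critical panel: SD = 795.7, SE = 39.0.
n_coherent_scene = implied_n(795.7, 39.0)
```

The slowdown after action stars is larger in Coherent than in Scrambled sequences, in line with the reported interaction. Under the assumed SE formula, the implied count is close to the 28 participants × 15 trials per condition described in the Methods, less the small share of rejected trials.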


Discussion

This experiment compared the viewing times of depicted scenes and action stars in coherent narrative sequences and scrambled sequences. Coherent sequences were easier to understand than scrambled sequences, with slower viewing times appearing across the ordinal position of non-critical panels in sequences for scrambled sequences, and coherent sequences rated as more comprehensible than scrambled ones, regardless of Peak type. These results replicate established findings across domains that scrambling a narrative impairs comprehension (Cohn et al., 2012; Gernsbacher et al., 1990; Mandler, 1978, 1984; Mandler & DeForest, 1979; Stein & Nezworski, 1978), and are consistent with findings that narrative events with functional relations and/or continuity with preceding information are read faster than those that are not causally related (Radvansky & Copeland, 2000; Zwaan, Magliano, & Graesser, 1995).

In addition, slower viewing times appeared for starting panels for both sequence types, again consistent with previous findings in verbal discourse (Glanzer, Fischer, & Dorfman, 1984; Haberlandt, 1984) and visual narratives (Cohn, 2014; Cohn & Paczynski, 2013; Gernsbacher, 1983) where the starting unit “lays a foundation” of information for the subsequent narrative (Gernsbacher, 1990). That viewing times were slightly longer for starting panels of scrambled sequences, above and beyond this process, implies that these panels were not prototypical for beginning a sequence (Cohn, 2014), since no prior context would have impacted their processing. In contrast, the slowing of viewing times to the final panel of the coherent sequences suggests a wrap-up effect (Cohn, 2014) consistent with those observed at the end of sentences (e.g., Rayner, Kambe, & Duffy, 2000). However, this slowing appeared only for coherent sequences and not for scrambled sequences, suggesting that participants responded to a feature of the narrative (e.g., a Release panel) rather than ordinal position alone. This interpretation is further supported by the fact that filler sequences varied the length of the stimuli, making it harder for participants to anticipate that the sixth panel would be the final panel in these experimental sequences.

Like non-critical panels, critical panels with depicted scenes appeared to be slower in scrambled than coherent sequences. However, action stars in coherent and scrambled sequences were viewed at nearly the same pace, despite having very different contexts. We therefore cannot confirm that action stars play a narrative role in coherent sequences that is ameliorated in scrambled sequences. However, action stars were viewed almost twice as fast as depicted scenes. This rapid viewing can perhaps be attributed to the physical differences between these panels: Without representational information (i.e., characters, objects) action stars contain far less visual information than normal Peaks, and thus can be viewed more rapidly because they lack the need to process basic scenes (e.g., Oliva, 2005). The significantly shorter viewing times to action stars than depicted scenes may thus reflect a floor effect for action stars, which may have been reached regardless of sequence context because of their impoverished representation.

At the panel following the critical panel, longer viewing times appeared to panels following action stars than to the same panels following depicted scenes within each of the sequence types. This supports the idea that comprehenders generate inferences at a panel following an action star, because comprehenders must infer the contents of the prior panel, unlike when such information is provided overtly in a depicted Peak. Viewing times alone cannot confirm that inferences were made, because participants' actual inferences were not directly tested (such as with a think-aloud task). Nevertheless, these results are consistent with studies of discourse that have interpreted longer viewing times to sentences eliciting inferences than controls as evidence of inference generation (Haviland & Clark, 1974; Keenan et al., 1990; Sanford & Garrod, 1981; van den Broek, 1994).

Despite not finding evidence for a narrative role of action stars, these results may be indicative of action stars maintaining the causal coherence of a sequence (Magliano et al., 1993; Singer et al., 1992; Trabasso et al., 1984), possibly through inference (Magliano et al., 1993). While processing is generally faster for elements that retain strong causal relations than those with weaker causal connections (Keenan, Baillet, & Brown, 1984; Myers, Shinjo, & Duffy, 1987; Radvansky & Copeland, 2000), the shorter viewing times to panels after action stars in scrambled sequences relative to coherent sequences may provide the reverse interpretation: The lack of an influence of action stars in scrambled sequences for generating inferences may indicate a lack of causal structure in which action stars are embedded. Action stars therefore maintain causal coherence in the sequence, despite not sustaining referential continuity with the prior context (e.g., Magliano & Zacks, 2011; Saraceni, 2001). Although we find this explanation appealing, we cannot be certain that viewing times across sequence types reflect an amelioration of inference generation, given that CP+1 panels were likely different images between coherent and scrambled sequences. Thus, we offer this analysis only tentatively.

An alternative interpretation may attribute the slowing caused by action stars not to inference generation, but to their being “surprising” panels given the context. Under this view, action stars may not play a narrative role at all, and the slowing to subsequent panels is a reaction to their disruption of the semantic and/or narrative structure of a sequence (Zwaan et al., 1995), with or without inference. This slowing would be consistent with prior research showing that longer viewing times appear to panels following narrative and/or semantic violations than those following normal Peak panels (Cohn, 2012). Indeed, slightly slower viewing times also arose to panels following action stars than following depicted scenes in scrambled sequences, where no inference should be possible. Under this view, action stars are an unexpected panel that requires a “recovery” following their appearance, no matter the sequence's context or the inference required.

The longer viewing times appearing to panels after coherent action stars than after scrambled action stars could be accounted for by the “surprise” interpretation: Since coherent narratives are disrupted more by an incongruity than scrambled sequences are (as evident in viewing times at and after normal Peaks), longer viewing times appear after action stars in coherent sequences than after those in scrambled sequences. Alternatively, the relative difference between viewing times following action stars may be attributed to inference, with action stars in coherent sequences sponsoring the generation of inference above and beyond a reaction to their incongruity.

Experiment 2: Action stars and violations

Altogether, Experiment 1 suggested that action stars require comprehenders to generate inferences. However, such results do not provide evidence that action stars play a narrative role in the sequence, and, further, leave open the possibility that action stars may be incongruous to a sequence—regardless of inference generation. If action stars are surprising, they should evoke the same increase in viewing times at a subsequent panel as fully anomalous panels, such as a Peak from another unrelated sequence. Nevertheless, the impoverished graphic structure of action stars should still be viewed faster than panels with more graphic content. Thus, an additional contrast would be to compare them to an empty, blank panel lacking content entirely, which would be closer in physical appearance. An empty panel, devoid of content, should indeed be anomalous to a sequence, but should also evoke inferences for missing information, similar to action stars. Experiment 2 therefore contrasted viewing times and ratings of normal, coherent Peaks with action stars, blank panels, and anomalous Peaks.