Abstract
A central aspect of human experience and communication is understanding events in terms of agent (“doer”) and patient (“undergoer” of action) roles. These event roles are rooted in general cognition and prominently encoded in language, with agents appearing as more salient and preferred over patients. An unresolved question is whether this preference for agents already operates during apprehension, that is, the earliest stage of event processing, and if so, whether the effect persists across different animacy configurations and task demands. Here we contrast event apprehension in two tasks and two languages that encode agents differently: Basque, a language that explicitly case-marks agents (‘ergative’), and Spanish, which does not mark agents. In two brief exposure experiments, native Basque and Spanish speakers saw pictures for only 300 ms, and subsequently described them or answered probe questions about them. We compared eye fixations and behavioral correlates of event role extraction with Bayesian regression. Agents received more attention and were recognized better across languages and tasks. At the same time, language and task demands affected the attention to agents. Our findings show that a general preference for agents exists in event apprehension, but it can be modulated by task and language demands.
Keywords: event apprehension, eye tracking, event roles, agents, patients, case marking, Basque, Spanish, brief exposure paradigm
INTRODUCTION
To understand the complex reality of everyday life, we need to attend to the events unfolding around us. Events are dynamic interactions that develop over space and time (Altmann & Ekves, 2019; Richmond & Zacks, 2017). A crucial component of events is their participants or event roles, as well as the interaction that binds them. The most basic event roles are the doer of the action (“agent”) and the undergoer to whom the action is done (“patient”). The action in the event is defined by the specific relationship between these two roles (the event type, e.g., seeing or kicking).
Humans have been proposed to categorize critical information in events using event models (Zacks, 2020). These models are mental representations in memory used to segment the perceived ongoing activity into structured events (Radvansky & Zacks, 2011; Zacks et al., 2007), probably using specialized neural mechanisms (Baldassano et al., 2018; Stawarczyk et al., 2021). Event models are believed to store information on the abstract structure of events, including the event roles.
When exposed to events, humans can extract event role information in a quick and effortless way (Hafri et al., 2018). Hafri et al. (2013) presented participants with events for durations as short as 37 or 73 ms, and then asked a probe question about the event type, the agent, or the patient. They found that the event type and event roles could be recognized even with the shortest presentation time. Rissman and Majid (2019) review a range of experimental studies with adults, children and infants, and conclude that there is a universal bias to distinguish agent and patient roles.
The agent and patient event roles are asymmetric in their cognitive status, and so far the evidence suggests that agents are more salient. The agent role is characterized by distinguishing perceptual features, such as outstretched limbs (Hafri et al., 2013; Rissman & Majid, 2019). In contrast, the patient role is defined by the lack of these features, resulting in a more diffuse category (Dowty, 1991). When looking at pictures of events, humans tend to inspect agents more thoroughly than patients or other event elements (Cohn & Paczynski, 2013). Furthermore, the agent role is preferentially attended to in all stages of development in humans (Cohn & Paczynski, 2013; Galazka & Nyström, 2016; New et al., 2007), a preference shared with other animals (V. A. D. Wilson et al., 2022). Taken together, the reported evidence suggests that agents take a privileged position in the basic mechanisms of event processing (Dobel et al., 2007; Gerwien & Flecken, 2016; F. Wilson et al., 2011).
This is consistent with how humans attend to scenes in general: In the inspection of real-world scenes, conceptually relevant information guides attention (Henderson et al., 2009, 2018; Rehrig et al., 2020). This happens in a top-down fashion, that is, by higher-order cognitive representations affecting the information uptake. In the inspection of events specifically, the agent is arguably the conceptually most relevant or salient element, and would therefore guide visual attention in a top-down fashion. We use the term “agent preference” to refer to the presumably privileged status of agents in event cognition.
The agent preference finds parallels in other domains, too. When communicating about events, agents occupy privileged positions in how they are expressed. Semantic role categories in language are organized hierarchically, and theories converge on ranking agents the highest in this hierarchy for predicting the morphosyntactic properties of event role expressions (e.g., Bresnan, 1982; Fillmore, 1968; Gruber, 1965; Van Valin, 2006). Agents also play an important role in generating predictions in incremental sentence processing since they tend to be the expected default interpretation for noun phrases (Bickel et al., 2015; Demiral et al., 2008; Haupt et al., 2008; Kamide et al., 2003; Matzke et al., 2002; Sauppe, 2016). When gesturing about events, naïve participants across cultures tend to place the agent first, independently of the word order of their language (Gibson et al., 2013; Goldin-Meadow et al., 2008; Hall et al., 2013; Schouwstra & de Swart, 2014). This mirrors the tendency for placing agents first across languages (Dryer, 2013; Napoli & Sutton-Spence, 2014). In sum, this evidence supports the idea that there is a general preference for agents in cognition.
However, many studies investigating attention to events were not designed to specifically address this cognitive bias. Their findings can therefore only be interpreted indirectly and might be at least partially confounded by other visual properties. Most studies on event cognition have used non-human and smaller-sized patients (Dobel et al., 2007; Gerwien & Flecken, 2016; Ünal, Richards, Trueswell, & Papafragou, 2021), for example, an event in which a woman cuts a potato. Size, animate motion, and human features attract visual attention (Frank et al., 2009; Pratt et al., 2010; Wolfe & Horowitz, 2017), and this might have biased attention toward agents in these studies.
So far, two studies have attempted to account for animacy when testing attention allocation in events (Cohn & Paczynski, 2013; Hafri et al., 2018). In these experiments, both agents and patients were human and of similar size. Cohn and Paczynski (2013) measured looking times to the event roles in cartoon strips that were presented frame by frame. They found longer looking times for agents than for patients and argued that the advantage for agents stemmed from them being the initiators of the action. Hafri et al. (2018) also used human agents and patients in their stimuli, but unlike Cohn and Paczynski (2013), they did not find a preference for agents. When viewing event photographs for short durations, participants responded faster to low-level features in patients than in agents. This is potential counter-evidence against the agent preference, opening the possibility that this preference is not a cognitive bias per se, but rather a side effect of an animacy bias, in line with findings from emergent sign languages (Meir et al., 2017). An agent preference could also emerge from conscious decision-making and only in a later time frame when attending to human-human interactions. This would explain why Cohn and Paczynski (2013) found an agent preference in a self-paced task, while Hafri et al. (2018) did not find such a preference when using brief stimulus presentation times and high time pressure. Hence, it remains unknown whether the agent preference operates independently of animacy, and whether it arises in the earliest stages of attending to events.
In the present work, we investigate whether an agent preference in event cognition is detectable in early visual attention. We include both human and non-human patients in the stimuli, as well as patients of different sizes. Following Dobel et al. (2007) and Hafri et al. (2018), we focus on the apprehension phase of processing events. We define event apprehension as the phase in which the gist of an event is obtained, covering approximately up to the first 400 ms after seeing an event picture (Griffin & Bock, 2000). We chose the apprehension phase specifically because it captures the earliest and most spontaneous allocation of visual attention. During apprehension, agent and patient roles are extracted spontaneously and independently of an explicit goal, that is, also when the task requires only the extraction of low-level features (such as color) and does not encourage the processing of event roles (Hafri et al., 2013, 2018). If there is a general agent preference in event cognition, it should be detectable already in this phase.
To target event apprehension, we adapted the brief exposure paradigm from Dobel et al. (2007) and Greene and Oliva (2009). In this paradigm, pictures of events are presented for only very short periods of time, typically between 30 and 300 ms, depending on the screen position in which the picture appears (Dobel et al., 2007, 2011). Because planning and launching a saccade already takes between 150 and 200 ms (R. H. S. Carpenter & Williams, 1995; Duchowski, 2007; Pierce et al., 2019), viewers need to make quick decisions about what to look at. These decisions are arguably based on prior information and task-related knowledge (Gerwien & Flecken, 2016).
As well as probing for an agent preference in the earliest time window of attention, we also test whether this preference persists across different languages and task configurations. Indeed, the agent preference is likely to interact with other top-down cues that guide visual attention, such as knowledge of the event, prior experiences, and task demands (e.g., Summerfield & de Lange, 2014). An important task is producing sentences in a specific language, because language can exert a top-down influence on the way events are inspected (Norcliffe & Konopka, 2015). For example, Norcliffe et al. (2015) used a picture description experiment with speakers of languages with different word order (subject-initial vs. verb-initial), and showed that speakers of verb-initial sentences in Tzeltal (Mayan) prioritized attending to verb- or action-related information over agents (cf. also Sauppe et al., 2013).
In our study, we test whether an agent preference persists across two languages that mark agents differently, namely Basque and Spanish. Basque has a case marker (conventionally known as ‘ergative’) specifically for agent noun phrases, while Spanish does not have an agent-specific case marker. This means that in Basque, agentive subjects are overtly marked (-k in the examples in 1), and non-agentive subjects are left unmarked. In Spanish, by contrast, both agent and patient subjects are treated alike, independently of their event role (carrying the unmarked nominative case, similar to English and German). This difference is illustrated in the following sentences in Basque and Spanish. In Basque, only agentive subjects (1a–b in contrast to 1c) receive the ergative case marker. In Spanish (2), instead, all subjects are unmarked (no ergative case marking).
(1) Basque1

a. Emma-k mahaia altxatu du
   Emma-erg table lift aux
   ‘Emma lifted the table.’

b. Emma-k borrokatu du
   Emma-erg fight aux
   ‘Emma fought.’

c. Emma-∅ iritsi da
   Emma-nom arrive aux
   ‘Emma arrived.’

(2) Spanish

a. Emma-∅ ha levantado la mesa
   Emma-nom aux lift det table
   ‘Emma lifted the table.’

b. Emma-∅ ha luchado
   Emma-nom aux fight
   ‘Emma fought.’

c. Emma-∅ ha llegado
   Emma-nom aux arrive
   ‘Emma arrived.’
Given this case marking system, speakers must commit to the agentivity of the subject noun phrase early on when planning a sentence in Basque, because they need this information to decide on its case marker (Egurtzegi et al., 2022; Sauppe et al., 2021). This may increase the need to search for agents in events, especially in languages such as Basque, where agency is the critical feature. Hence, the tendency to inspect agents might increase when planning sentences in Basque due to the demands imposed by case marking. In contrast, for Spanish speakers, agent-related information is not necessary to plan the subject argument of the sentence, because the case marking will not be affected. This means that they could defer making a decision for building a description of event roles to a later point in time and thus maintain more flexibility (Bock & Ferreira, 2014; F. Ferreira & Swets, 2002; V. S. Ferreira, 1996).
In our experiments, we tested native Basque and Spanish speakers in an event description task to investigate whether the agent preference persists across these two different case marking settings. This way, we explored whether an agent preference arises in a language production task, independently of language-specific grammatical features.
In addition, we also tested Basque and Spanish participants in a task that does not require sentence planning. Most previous studies that provide evidence for an agent preference involved a task that required participants to describe events (Gerwien & Flecken, 2016; Sauppe & Flecken, 2021). Given that most languages are agent-initial, it is possible that general sentence planning mechanisms give rise to the agent preference. In our experiments, we introduced a task manipulation to explore whether an agent preference emerges also in the absence of sentence planning demands.
Hence, participants in our experiments undertook two tasks: after being presented an event photograph for 300 ms, they either produced a sentence to describe the event (the event description task) or decided whether a probe picture matched an event participant from the briefly presented target picture (the probe recognition task). The event description task required the linguistic encoding of the event with the corresponding language-specific differences between the Basque- and Spanish-speaking groups. By contrast, the probe recognition task only demanded selective attention to event roles, with no linguistic response. It is still possible that participants covertly recruited or activated language in the probe recognition task (Ivanova et al., 2021), but this task did not require any sentence planning, which probably decreased the activation or use of language.
In Experiment 1, Basque and Spanish speakers participated on the internet and we measured their accuracy and reaction times for each event role in both tasks. In Experiment 2, participants were tested in a laboratory setting, and we used eye tracking to record the eye gaze to the event pictures during the brief exposure period. First fixations have been argued to closely reflect the processes underlying event apprehension (Gerwien & Flecken, 2016), as viewers collect parafoveal information on the event structure and use it to decide on the location of their first fixation to the picture. In comparison, the accuracy and reaction time of the response (the verbal description or the decision on the probe) reflect the outcome of the apprehension stage together with additional cognitive processes, such as memory, post-hoc reasoning, and judgment processes demanded by the task (Firestone & Scholl, 2016). Nevertheless, accuracy and reaction times contain valuable information on the apprehension phase (Hafri et al., 2013, 2018). Thus, we used fixations to pictures, accuracy, and reaction times as three different measures of participants’ attention allocation and information uptake during event apprehension. This procedure allows us to tackle two research questions: Is there an agent preference in attention patterns in visual apprehension? Does this preference persist across different language and task configurations? Based on the findings presented by Cohn and Paczynski (2013), we predicted that if there is a general agent preference in cognition, it should already be detectable in the earliest and most spontaneous stage of attention allocation. In the current experiments, agents thus should receive more visual attention than other event elements across both languages and tasks tested.
EXPERIMENT 1
Methods
Participants.
Native speakers of Basque (N = 90) and Spanish (N = 88) participated in an online study.2 Social media were used to advertise the study and recruit participants, and monetary prizes were raffled for participation. All participants reported that their native language was still the most common or one of the most common languages in their daily life at the time of participation. The experiment was approved by the ethics committees of the Faculty of Arts and Social Sciences of the University of Zurich (Approval Nr. 19.8.11) and the University of the Basque Country (Approval Nr. M10/2020/007), and all participants gave their informed written consent. All procedures were performed following the ethical standards of the 1964 Helsinki declaration and its later amendments.
Materials and procedure.
Stimuli consisted of 48 photographic gray-scale images depicting transitive, two-participant events with a human agent (see Figure 1 for an example). In half of the events, the patient was human (e.g., with actions such as “hit” or “greet”) and in the other half, an inanimate object was the patient (e.g., with actions such as “wipe” or “hammer”). Ten intransitive events featured a sole participant performing an action and were included as fillers. Within intransitive events, half of them featured an agent-like participant (e.g., “jump”), and the other half a patient-like participant (e.g., “fall down”). A full list of events is presented in Table A1. Photographs depicted the midpoints of events. Static images have been found to spontaneously convey motion information (Guterstam & Graziano, 2020; Kourtzi & Kanwisher, 2000; Krekelberg et al., 2005) so that participants could automatically represent the depicted event sequences as a whole.
The events were portrayed by four different actors (two males, two females). Four versions of each event were photographed, one with each of the four actors as the agent. In the case of events with human patients, the respective actor of the patient was counterbalanced between events, so that the identity of the patient was independent of the identity of the agent. A horizontally mirrored version of each picture was also created to counterbalance the agent’s position across experimental participants. This led to a total of 232 stimulus pictures of 58 events.
The stimulus pictures were distributed over four lists. The set of events was identical across the lists, but with different combinations of actors in agent and patient roles across lists. The lists were used as blocks in the experiment, and participants were randomly assigned two out of the four blocks, with a total of 116 event pictures. The order of events within the blocks was randomized for each participant.
Participants responded to a short demographic questionnaire before the experiment, which included information on gender, age, language acquired from each of their parents, and their most commonly used language. The experiment was programmed in PsychoPy and exported to PsychoJS (Peirce et al., 2019). The experimental sessions were run in full-screen mode on Pavlovia (https://pavlovia.org). Pavlovia offers high temporal resolution (Anwyl-Irvine et al., 2021; Bridges et al., 2020), which ensured that the duration of the stimulus presentation was approximately 300 ms. Mobile phones and tablets were not allowed; participants were directed to the Pavlovia experiment through Psytoolkit (Stoet, 2010, 2017), which enabled blocking access from mobile devices.
Trials began with a fixation cross (with a jittered duration between 800 ms and 1200 ms), followed by the target event picture, which was displayed for 300 ms in one of the four corners of the screen. The orientation (agent left or right) and the picture’s screen position were counterbalanced within the same event types, so that each combination occurred equally often. A mask image appeared immediately after the event picture for a duration of 500 ms (cf. Figure 2). The mask was used to deprive participants of the ability to use their visuospatial sketchpad memory to reconstruct the image (Baddeley, 2007). Following the mask display, participants were prompted to perform an event description or a probe recognition task. The tasks were administered in separate blocks and the order in which the tasks were presented was counterbalanced across participants.
The event description blocks started with the presentation of the four actors and their names. Participants were instructed to learn the four names and use them to describe the events they would see (e.g., “Emma has kicked Tim”). Participants in both languages were instructed to describe events as if they were just completed to elicit sentences in the perfective aspect. This ensured that Basque speakers produced sentences with ergative case marking because there is no ergative marking in the progressive aspect (Laka, 2006). Six example trials were also provided, where sample written descriptions were displayed after the event pictures. The participants also completed six practice trials before proceeding to critical trials.
In the probe recognition task, participants indicated by a button press whether the actor or object in a probe picture had been present in the target event picture or not. The probes showed the agent of the event, the patient of the event, or another actor or object not present in the target picture. Participants were required to respond within 2000 ms; six practice trials with longer time-outs (decreasing from 7000 ms to 3000 ms) were included prior to the critical trials.
No feedback was provided for the practice trials in either of the tasks. Figure 2 provides a graphical representation of the trial structure. The experiment sessions lasted approximately 30 minutes. All instructions were provided in Basque or Spanish, respectively.
Data processing and analyses.
Nine participants who reported not speaking the same language with both parents were excluded. Data from two participants were lost due to technical errors in data saving. In total, data from 84 native Basque speakers (age range = 18–66 years, mean age = 31.4 years, SD = 11.4 years, 55 female) and 82 native Spanish speakers (age range = 18–68 years, mean age = 33.1 years, SD = 11.7 years, 48 female) were available for analysis.
Following the previous literature on the brief exposure paradigm (Dobel et al., 2007; Hafri et al., 2013), we relied on behavioral measures (response specificity, accuracy, and reaction time) as possible windows to event apprehension. In the event description task, written responses largely followed the agent-patient word order patterns, canonical in both languages (see details in Table C1). A native speaker of Basque (A.I.-I.) coded agent, patient, and action specificity for each response, specifying whether the description was specific, general, or inaccurate. We considered answers as specific if the name of the event role participant (“Emma”) or object (“bowl”) was correct and as general if the description contained correct general features, such as gender or category (e.g., “Lisa” or “One girl” for “Emma”, “pan” for “bowl”). The descriptions that were incorrect (“Tim” instead of “Emma”) or uninformative (“someone”) were coded as inaccurate. Event description trials were excluded from analysis if the intended target verb and event roles were inverted (e.g., “Emma has listened to Lisa” instead of “Lisa has shouted to Emma”), if the description was reciprocal (e.g., “Emma and Lisa have shouted to each other”) or if the sentence was not described in perfective aspect (in total, 8% of all event description trials).
For the probe recognition task, we analyzed the trials in which the probes matched either the agent or the patient of the previous event picture (half of all probe recognition trials). The other half of the probe recognition trials showed a foil that was not present in the event picture. We included these foils to ensure that the number of trials requiring a “true” or a “false” answer was balanced, but they were not informative about event role-related accuracy and hence were not included in the analyses. In addition, trials without responses and trials with response times shorter than 200 ms or 2.5 standard deviations longer than the mean were excluded (in total, 6.5% of all probe recognition trials).
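As a rough illustration, the trial-level reaction-time exclusion described above (no response, responses faster than 200 ms, or responses more than 2.5 standard deviations above the mean) can be sketched as follows. This is a minimal Python sketch with names of our own choosing, not the original analysis code:

```python
# Sketch of the reaction-time exclusion criteria (illustrative, not the
# original analysis code): drop missing responses, responses faster than
# 200 ms, and responses more than 2.5 SDs above the mean.
from statistics import mean, stdev

def trim_rts(rts, floor_ms=200.0, sd_cutoff=2.5):
    """Return the reaction times kept after applying the exclusion criteria."""
    observed = [rt for rt in rts if rt is not None]  # drop missing responses
    m, sd = mean(observed), stdev(observed)
    ceiling = m + sd_cutoff * sd
    return [rt for rt in observed if floor_ms <= rt <= ceiling]

# Toy data: a slow outlier, an anticipatory response, and a missing response
rts = [500] * 19 + [5000, 150, None]
kept = trim_rts(rts)  # only the 19 typical responses remain
```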
Participants were additionally excluded from analyses if they performed with overall low accuracy, separately for each task. For the event description analyses, three participants who had less than 60% specific agent answers and five participants who had less than 50% trials remaining after applying the other exclusion criteria were excluded. For the probe recognition analyses, we excluded four participants who had less than 50% trials left after applying the other exclusion criteria or whose overall accuracy was below 60%. We applied these exclusion criteria to ensure that participants in the analyses understood and followed task instructions, and were not performing at chance. For the probe recognition task, we additionally checked that the participants had above-chance accuracy also for the trials with foils (“false” trials). We found that participants correctly rejected foil trials on average in 86% of trials (SE = 0.7%). This ensures that the results from the critical trials (“true” trials) were informative and not driven by a bias to simply answer positively. Figure B1 shows that all participants had above-chance accuracy for the whole set of trials in probe recognition.
Altogether, data from 157 participants for the event description task (7071 trials) and from 163 participants (3739 trials) for the probe recognition task were included in statistical analyses.
Statistical analyses were conducted in R (R Core Team, 2022) using hierarchical Bayesian regression through the brms (Bürkner, 2017, 2018) interface to Stan (B. Carpenter et al., 2017). Post-hoc contrasts between the predictor factor levels were extracted with the emmeans package (Lenth, 2020). A cumulative ordinal model with a logit link function was fit to jointly model agent, patient, and action specificity for event description trials (ranking specific > general > inaccurate). For probe recognition analyses, a Bernoulli model with a logit link function was fit to model accuracy in response to agent and patient probes. In both models, event role, language, and their interaction were predictors of interest; identity of the agent role actor, animacy of the patient, and task order were included as nuisance predictors (Sassenhagen & Alday, 2016). Animacy is known to attract visual attention (Frank et al., 2009), and therefore we included it as a covariate to capture its potentially large effects. This ensured that any evidence in favor of the agent preference was not driven by differences between agent and patient animacy in events depicting human-object interactions. We modeled log-transformed reaction times in the probe recognition task with a Gaussian model with an identity link function. Language, event role, and their interaction were the critical predictors, and animacy of the patient, trial accuracy, and task order were included as nuisance predictors. We included random intercepts and slopes for language and event role by participant and by event type in all models. Student-t distributed priors (df = 5, μ = 0, σ = 2) were used for the intercept and all population-level predictors in all models. Default priors (Student-t, df = 3, μ = 0, σ = 2.5) were used for group-level predictors. In all models, the block number and task order were standardized (z-transformed) and all other predictors were sum-coded (−1, 1).
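The predictor coding described above (sum coding for two-level factors, z-transformation of block number and task order) can be illustrated with a small sketch. The actual analyses were run in R with brms; the Python function names below are ours and purely illustrative:

```python
# Illustrative sketch of the predictor coding described above:
# sum coding (-1, 1) for two-level factors and z-transformation
# (standardization) for continuous predictors. Names are illustrative.
from statistics import mean, stdev

def sum_code(levels, baseline):
    """Map a two-level factor to -1 (baseline level) and 1 (other level)."""
    return [-1 if x == baseline else 1 for x in levels]

def z_transform(values):
    """Center values at 0 and scale to unit standard deviation."""
    m, sd = mean(values), stdev(values)
    return [(v - m) / sd for v in values]

role = sum_code(["agent", "patient", "agent"], baseline="patient")
blocks = z_transform([1, 2, 1, 2])  # standardized block numbers
```

With sum coding, the model intercept corresponds to the grand mean across factor levels rather than to a reference level, which is why main effects and interactions can be interpreted independently.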
When reporting the parameter estimates of interest (β), we provide the mean and standard error of the posterior draws. We additionally include the posterior probability of the hypothesis that the estimate is smaller than or larger than 0. This is equal to the proportion of draws from the posterior distribution that fall on the same side of 0 as the mean of the posterior distribution, which is a direct indication of the strength of the evidence (Kruschke, 2015). We visualize this information with posterior density plots for each parameter of interest (Figure 3B and Figure 4C–D).
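This posterior probability amounts to a simple computation over the posterior draws, sketched below with toy numbers (illustrative only, not actual model output):

```python
# Sketch of the reported posterior probability: the proportion of posterior
# draws falling on the same side of 0 as the posterior mean. Toy draws only.
from statistics import mean

def posterior_prob(draws):
    """P(beta > 0) if the posterior mean is positive, else P(beta < 0)."""
    side = 1 if mean(draws) > 0 else -1
    return sum(1 for d in draws if d * side > 0) / len(draws)

draws = [0.2, 0.5, -0.1, 0.3, 0.4]  # toy posterior draws for one parameter
p = posterior_prob(draws)  # 4 of 5 draws are positive, like the mean
```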
Results
The results from the event description task are shown in Figure 3 and from the probe recognition task in Figure 4; regression model summaries are presented in Tables D1, D2, and D3. Compared to patients, agents were described with greater specificity (β_Agent: mean = 1.13, SE = 0.15, P(β > 0) = 0.99; see Table 1 and Figure 3). Participants also recognized agents with greater accuracy than patients in probe recognition (β_Agent: mean = 0.24, SE = 0.13, P(β > 0) = 0.99; see Table 1 and Figure 4A).
Table 1. Proportions of specific responses (event description), proportions of correct responses, and reaction times (probe recognition) by event role and language; standard deviations in parentheses.

| Event Role | Language | Event description: specific responses | Probe recognition: correct responses | Probe recognition: reaction time (ms) |
|---|---|---|---|---|
| Agent | Basque | 0.90 (0.08) | 0.89 (0.10) | 939 (183) |
| Agent | Spanish | 0.91 (0.09) | 0.86 (0.13) | 728 (113) |
| Patient | Basque | 0.72 (0.09) | 0.79 (0.14) | 1057 (189) |
| Patient | Spanish | 0.69 (0.10) | 0.81 (0.13) | 1030 (168) |
| Action | Basque | 0.64 (0.11) | — | — |
| Action | Spanish | 0.65 (0.12) | — | — |
In line with the accuracy results, participants in both languages responded faster to agent probes compared to patient probes (β_EventRole = −0.055, SE = 0.008, P(β < 0) = 0.99; Table 1, Figure 4B).
The higher specificity, accuracy, and faster reaction times for agents were detectable despite the high variability between participants and events (for event description, σ_Participant: mean = 0.63, SE = 0.05, P(σ > 0) = 0.99; σ_Event: mean = 0.91, SE = 0.10, P(σ > 0) = 0.99; for probe recognition, σ_Participant: mean = 0.53, SE = 0.08, P(σ > 0) = 0.99; σ_Event: mean = 0.69, SE = 0.10, P(σ > 0) = 0.99). This high variability was expected due to the online setting, which allows only limited control of participants’ behavior.
On top of these effects for event roles, we also found an interaction between language and event role in both tasks. In event description, Basque speakers described patients with higher specificity than Spanish speakers (β_Language×Patient = 0.09, SE = 0.03, P(β > 0) = 0.99; see Figure 3). By contrast, Spanish speakers described action verbs more precisely than Basque speakers (β_Language×Action = −0.08, SE = 0.04, P(β < 0) = 0.99). There were no notable differences in the specificity of agent descriptions between languages (β_Language×Agent = −0.02, SE = 0.05, P(β < 0) = 0.64).
In the probe recognition task (see Figure 4A; Table D2), the interaction between language and event role showed that the Basque participants were more accurate when responding to agent probes than the Spanish participants (β_Language×EventRole: mean = 0.08, SE = 0.06, P(β > 0) = 0.91). In contrast, there was no substantial evidence for an interaction between language and event role in the reaction times (β_Language×EventRole = 0.002, SE = 0.004, P(β > 0) = 0.75; Table D3).
Discussion
Speakers of Spanish and Basque both showed a general preference for agents in both tasks, reflected in higher specificity and accuracy for agents, as well as faster reaction times. This effect was consistent across languages and participants, even when the animacy of the patient was controlled for in the statistical analysis. This matches the results reported by Cohn and Paczynski (2013) and provides evidence for an agent preference in human cognition (New et al., 2007; Rissman & Majid, 2019). We will return to this finding in the General Discussion.
In addition to the general preference for agents over patients, there were differences between speakers of the two languages in their attention to event roles. In the probe recognition task, the Basque participants were more accurate than the Spanish participants in responding to agent probes, and less accurate than the Spanish participants in responding to patient probes. By contrast, we did not find any effect on the specificity of agent descriptions in the event description task. However, Basque speakers were more specific than Spanish speakers in describing patients.
Hence, we found effects of language on how speakers apprehended the event roles, although these were not consistent across tasks. A possible explanation for the divergent language effects between tasks could be the different time frames each task measured and the ability of the tasks to reflect event apprehension more or less directly (Firestone & Scholl, 2016). For event descriptions, the time required to type the responses could have led to memory decay (Gold et al., 2005; Hesse & Franz, 2010) and a deteriorated ability to reflect the apprehended information. Producing a sentence is a complex task and could also have contributed to memory distortion (Baddeley et al., 2011; Vandenbroucke et al., 2011). When producing descriptions, the words (lexical forms) of the agent and other parts of the sentence are usually retrieved in order of mention (Griffin & Bock, 2000; Meyer et al., 1998; Roeser et al., 2019). This potentially leaves the later elements of the sentence with a less clear memory trace. In Spanish, the patient is mentioned last (SVO order), while patients usually occupy the sentence-medial position in Basque (SOV order). This difference in word order could have interfered with the specificity of responses, because Basque speakers were able to “offload” patient information earlier (Baddeley et al., 2009, 2011). In fact, word order is known to influence the time course of sentence planning (Norcliffe et al., 2015; Nordlinger et al., 2022; Santesteban et al., 2015). In our experiment, all descriptions were agent-initial, and variations in word order were present only later in the sentence. Therefore, it does not appear likely that word order had an effect as early as in the apprehension phase, but we suggest that it affected how participants recalled and linearized the patient and action information when formulating their responses.
It is also possible that additional post-hoc processes influenced the results because participants were not under time pressure to provide their answers.
By contrast, the time pressure was high in the probe recognition task because of the time-out. Participants responded by pressing the button on average 974 ms after the onset of the event picture (cf. Figure 2). This much shorter time between stimulus presentation and completed response may have reduced the effect of post-hoc cognitive processes, suggesting that the accuracy in this task better reflects the attention patterns during the event apprehension phase.
Therefore, the results from each task may reflect different stages of processing events: probe recognition accuracy would represent the outcome of event apprehension, and specificity in event descriptions would reflect a combination of event apprehension and subsequent processes, possibly influenced by task demands and word order differences. These inherent differences between the tasks and the proneness of behavioral responses to be influenced by post-hoc processes make it necessary to explore event apprehension further and to move beyond behavioral measures in doing so.
A way to bypass these problems and obtain a more direct measure of apprehension is to use eye tracking. Gaze allocation patterns to the briefly presented event pictures are considered direct reflexes of the event apprehension process (Gerwien & Flecken, 2016). When presented with an event picture (as in Figure 2), viewers collect parafoveal information on the event structure and use this coarse representation to decide where on the event picture to fixate first. Therefore, first fixations to event pictures can be reliably linked to the event apprehension process and provide an alternative to relying solely on offline measures. Measuring gaze allocation also provides a direct way of comparing the agent preference across tasks (in contrast to the behavioral task-specific measures in Experiment 1).
EXPERIMENT 2
We adapted the design from Experiment 1 for the laboratory and introduced eye tracking to measure how participants directed their overt visual attention during event apprehension. Consequently, the main measure in Experiment 2 was the location of the first fixation in the stimulus pictures. We adapted the response modalities in the tasks to elicit faster responses, by requiring oral responses in the event description task and by reducing the time-out in the probe recognition task to 1500 ms. We predicted that the agent preference would be detectable in the fixation patterns and behavioral correlates, replicating and further characterizing the agent preference found in Experiment 1.
Methods
Participants.
Native speakers of Basque (N = 38) and Spanish (N = 36) were recruited (age range = 18–40, mean age = 29, 49 female) and received monetary compensation for their participation. All Basque speakers and 29 Spanish speakers were tested in Arrasate (Basque Country); the other Spanish speakers participated in Zurich (Switzerland), due to constraints induced by the COVID-19 pandemic. In both locations, laboratories were set up ad hoc in school or university facilities and the same technical equipment was used. The experiment was approved by the ethics committees of the Faculty of Arts and Social Sciences of the University of Zurich (Approval Nr. 19.8.11) and the University of the Basque Country (Approval Nr. M10/2020/007). All participants gave written informed consent. All procedures were performed in accordance with the ethical standards of the 1964 Helsinki Declaration and its later amendments.
Materials and procedure.
The stimuli were the same as in Experiment 1. The procedure mainly followed that of Experiment 1 but was adapted to the on-site setting and the eye tracking methodology. The experiment featured two consecutive blocks per task (instead of only one block per task in Experiment 1) with a self-timed pause between the tasks, i.e., after the second block. To characterize the two groups of participants (Basque speakers and Spanish speakers), individual differences measures were administered (see Table 2). Participants completed the Digit-Symbol Substitution Task from the Wechsler Adult Intelligence Scale (Wechsler, 1997) as a measure of perceptual and processing speed (Hoyer et al., 2004; Huettig & Janse, 2016; Salthouse, 2000). Participants also completed a lexical decision task to assess their lexical knowledge of Basque and Spanish (de Bruin et al., 2017) and completed a detailed questionnaire that collected information on their demographic profile, educational background, and linguistic habits.
Table 2. Individual differences measures by language group (means, with standard deviations in parentheses).
Measure | Basque group | Spanish group |
---|---|---|
Self-reported proficiency in Basque | 9.08 (0.86) | 3.72 (3.61) |
Self-reported proficiency in Spanish | 7.64 (1.14) | 9.65 (0.47) |
Basque lexical decision task accuracy mean | 8.97 (0.47) | 5.88 (2.2) |
Spanish lexical decision task accuracy mean | 8.88 (0.75) | 8.88 (0.83) |
Digit Symbol Substitution Test results | 68.5 (10.34) | 64.5 (10.32) |
Apparatus and data recording.
The experiment was programmed in E-Prime 2.0 (Schneider et al., 2002) and displayed on a 15.6″ computer screen with a resolution of 1920 × 1080 pixels. The participants placed their heads on a chin rest so that their eyes were at a distance of approximately 65 cm from the screen. The stimulus pictures subtended a visual angle of 11.73° horizontally (560 pixels) and 7.48° vertically (349 pixels). The center of each picture was 12.06° away from the central fixation cross on which the participants fixated at stimulus onset. Eye movements were recorded with a SMI RED250 mobile eye tracker (Sensomotoric Instruments, Teltow, Germany) sampling at 250 Hz. Button presses in the probe recognition task were recorded with a RB-844 response box (Cedrus, San Pedro, USA).
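The visual angles reported above follow the standard relation θ = 2·arctan(s / 2d) between a stimulus of physical size s and a viewing distance d. The sketch below only illustrates this computation; the pixel pitch is an assumed value for illustration, not a specification of the monitor that was used:

```python
import math

def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (in degrees) subtended by a stimulus of a given
    physical size viewed from a given distance."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

# Hypothetical pixel pitch of 0.025 cm per pixel, for illustration only.
PIXEL_PITCH_CM = 0.025
stimulus_width_cm = 560 * PIXEL_PITCH_CM      # 560-pixel-wide event picture
angle = visual_angle_deg(stimulus_width_cm, 65)  # viewing distance of 65 cm
```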
Data processing and analysis.
Data from six participants were lost due to technical errors or their inability to complete the experiment session. Twelve additional participants were excluded from the analysis due to potentially strong influence from the respective other language. This was determined to be the case when they reported using the respective other language frequently or preferentially (i.e., Spanish for native speakers of Basque or Basque for native speakers of Spanish), or when they scored equal to or higher in the Basque lexical decision task than in the Spanish lexical decision task. The latter criterion was only applied to Spanish participants, because Basque speakers usually perform very close to or at the same level as Spanish speakers due to diglossia. This exclusion criterion was applied to reduce the influence of possibly balanced bilinguals (Morales et al., 2015; Olguin et al., 2019; Yow & Li, 2015). The processing speed measures were similar in both language groups (cf. Table 2). In total, 52 participants were included in the analyses (N = 28 Basque, N = 24 Spanish).
For each event picture, areas of interest for agents and patients were manually defined in the eye tracker manufacturer’s SMI BeGaze software (version 3.4). The areas of interest covered the face and upper part of the body for human characters and the whole object for inanimate patients (see Figure 5). We also defined an action area that encompassed the extended limbs of the agent (usually their hands) or the instruments involved in the action.3 The areas of interest for agents and patients were at least 30 pixels apart (mean = 74 pixels, SD = 38 pixels, corresponding to a visual angle of 1.85°) to prevent fixations from being assigned to the wrong area of interest due to measurement error.4 Fixations were detected using the algorithm implemented in SMI BeGaze.
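As a minimal sketch of the AOI assignment step, a fixation can be mapped to the rectangular area of interest that contains it. The AOI names and coordinates below are hypothetical and only illustrate the procedure, not the actual BeGaze annotations:

```python
from typing import Optional

# Hypothetical AOI rectangles in picture coordinates: (x_min, y_min, x_max, y_max).
AOIS = {
    "agent": (100, 50, 260, 220),
    "patient": (340, 60, 480, 230),  # kept well clear of the agent AOI
    "action": (270, 150, 330, 210),  # extended limbs / instrument region
}

def assign_fixation(x: float, y: float) -> Optional[str]:
    """Return the name of the AOI containing the fixation, or None if it
    falls outside all areas of interest."""
    for name, (x0, y0, x1, y1) in AOIS.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return name
    return None
```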
For the eye tracking analyses, trials were excluded if participants’ fixations landed more than 100 pixels (visual angle of 2.5°) away from the edges of the target picture. Trials were also excluded if no response was given or, in the event description task, if the target verb and event roles were inverted (“Lisa heard Tim” instead of “Tim shouted to Lisa”). In total, 8821 trials were included in the first fixations analyses (93% of all trials).
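The off-picture exclusion criterion can be sketched as a simple geometric filter. The picture placement below is hypothetical, while the 100-pixel margin matches the threshold reported above:

```python
# Hypothetical on-screen bounds of a 560 x 349 px target picture.
PIC_X0, PIC_Y0, PIC_X1, PIC_Y1 = 680, 366, 1240, 715
MARGIN = 100  # exclusion threshold in pixels (approx. 2.5 deg visual angle)

def keep_trial(fix_x: float, fix_y: float) -> bool:
    """Keep a trial only if the fixation lands within MARGIN pixels
    of the picture edges."""
    return (PIC_X0 - MARGIN <= fix_x <= PIC_X1 + MARGIN and
            PIC_Y0 - MARGIN <= fix_y <= PIC_Y1 + MARGIN)
```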
The analysis of first fixations was the main measurement in Experiment 2, given that these fixations provide the most direct window into the visual event apprehension process (Gerwien & Flecken, 2016; Sauppe & Flecken, 2021). In addition, we also conducted an exploratory analysis of second fixations on the event pictures. Participants’ first fixations landed on the picture on average 192 ms (SD = 22 ms) after stimulus onset. In some trials, participants subsequently launched another saccade, on average 318 ms after stimulus onset (SD = 29 ms), with a mean fixation duration of 283 ms (SD = 129 ms). Because these saccades were launched when the brief exposure time was almost over, second fixations generally landed on the subsequently presented mask image. However, programming and executing a saccade takes between 100 and 200 ms (R. H. S. Carpenter & Williams, 1995; Duchowski, 2007; Pierce et al., 2019), which means that the second fixations were planned while the event picture was still visible, and often even during the execution of the first saccade, i.e., before the eyes landed on the picture for the first fixation. Thus, it seems likely that the second fixations are generated by mechanisms similar to those generating the first fixations. This suggests that the second fixations, although not landing on the briefly exposed picture, may provide additional information on the event apprehension process (cf. Altmann, 2004; F. Ferreira et al., 2008, for additional discussion of the usefulness of “looking at nothing”). Analyses of second fixations were conducted on the subset of trials in which the second fixation landed at least 50 pixels away from the first fixation’s location. This distance threshold ensures that the second fixations represented genuine new fixations and not just measurement and classification errors. In sum, 4956 trials were analyzed for second fixations.
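The 50-pixel displacement criterion used to separate genuine second fixations from measurement noise can be sketched as follows (coordinates are hypothetical):

```python
import math

def is_genuine_second_fixation(first_xy, second_xy, threshold_px=50.0):
    """Count a second fixation only if it lands at least threshold_px
    pixels (Euclidean distance) away from the first fixation."""
    dx = second_xy[0] - first_xy[0]
    dy = second_xy[1] - first_xy[1]
    return math.hypot(dx, dy) >= threshold_px
```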
For specificity, accuracy, and response time analyses, the exclusion criteria and statistical modeling were identical to those in Experiment 1. In the event description analysis, 4663 trials were included (75% of all trials); in the probe recognition analyses, 2387 trials were included (95% of all trials).
As in Experiment 1, statistical analyses were conducted in R (R Core Team, 2022) using hierarchical Bayesian regression models through the brms (Bürkner, 2017, 2018) interface to Stan (B. Carpenter et al., 2017). Eye tracking analyses modeled the likelihood of fixating on the agent, patient, or action areas of interest. We fitted Bernoulli models for agent, patient, and action fixations separately. Models for fixations were fitted jointly for both tasks. Language, task, and their interaction were the predictors of interest. We included the following as nuisance predictors: the size of the target area of interest in pixels, the animacy of the patient, the task order, the closeness of the agent to the fixation cross, and the block order within each task. For the second fixations analyses, we also included the area of interest to which the first fixation was directed as a nuisance predictor to account for the correlation between fixation locations on such short time scales (Barr, 2008). The size of the areas of interest, the order of tasks, and the number of blocks were standardized (z-transformed) and categorical predictors were sum-coded (−1, 1). We included random intercepts and slopes for language and task by participant and item. We used Student-t distributed priors (df = 5, μ = 0, σ = 2) for the intercept and all population-level predictors. For random intercepts and slopes, we used a default prior (Student-t, df = 3, μ = 0, σ = 2.5). For accuracy and response time analyses, statistical models were specified like those in Experiment 1.
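The predictor coding described above can be illustrated with a minimal sketch of z-transforming a continuous nuisance predictor and sum-coding a two-level categorical predictor. The actual analyses were run with brms/Stan in R; the values below are hypothetical:

```python
from statistics import mean, stdev

def z_transform(values):
    """Standardize a continuous predictor to mean 0 and (sample) SD 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def sum_code(levels, reference):
    """Sum-code a two-level categorical predictor as -1 (reference) / 1."""
    return [-1 if lev == reference else 1 for lev in levels]

aoi_sizes = [5200, 6100, 4800, 5900]                   # hypothetical AOI sizes in pixels
language = ["Basque", "Spanish", "Basque", "Spanish"]

z_sizes = z_transform(aoi_sizes)
lang_coded = sum_code(language, reference="Spanish")   # Spanish = -1, Basque = 1
```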
Results
Fixations.
Overall, first fixations were directed primarily to agents (50.2% versus 25.2% to patients and 13.8% to actions). This effect was present in both language groups in both tasks (see Figure 6A–B). The animacy of the patient had a large effect on first fixations, but we still found more agent than patient fixations in the events where patients were animate (see Figure E1). For the second fixations, we again found a higher proportion of fixations to agents across language groups and tasks, although to a lesser degree than in first fixations (see Figure 6C–D).
Beyond a general preference to fixate on agents first, we also found effects of language and task on first fixations. The proportions of first fixations to the three areas of interest varied between the language groups (Figure 6A–B, Table 3). Basque speakers fixated more on agents than the Spanish speakers in both the event description and the probe recognition task (Language = 0.09, SE = 0.05, P( > 0) = 0.96; Table D4). In turn, Spanish participants seemed to fixate more on patients (Language = −0.08, SE = 0.08, P( < 0) = 0.84; Table D5) and actions (Language = −0.07, SE = 0.08, P( < 0) = 0.81; Table D6), but these effects and the evidence for them were weaker.
Table 3. Proportions of first and second fixations to each area of interest by event role and language group (means, with standard deviations in parentheses).
Event Role | Language | First fixations | Second fixations |
---|---|---|---|
Agent | Basque | 0.51 (0.03) | 0.55 (0.10) |
Agent | Spanish | 0.49 (0.06) | 0.51 (0.10) |
Patient | Basque | 0.24 (0.04) | 0.25 (0.05) |
Patient | Spanish | 0.26 (0.05) | 0.29 (0.07) |
Action | Basque | 0.13 (0.05) | 0.11 (0.08) |
Action | Spanish | 0.14 (0.05) | 0.16 (0.07) |
For second fixations, language-related effects were similar to those found for the first fixations (Figure 6C–D), but more consistent across all event roles. Basque speakers looked more towards agents (Language = 0.11, SE = 0.08, P( > 0) = 0.93; Table D4), while the Spanish speakers were more likely to fixate on the patient (Language = −0.11, SE = 0.07, P( < 0) = 0.94; Table D5) and the action (Language = −0.19, SE = 0.10, P( < 0) = 0.97; Table D6).
Fixation patterns were also affected by the task. Participants in both language groups launched more first fixations to the action (Task = 0.10, SE = 0.05, P( > 0) = 0.97) and fewer fixations to patients (Task = −0.06, SE = 0.05, P( < 0) = 0.89) in the event description task compared to probe recognition. In second fixations, participants in both languages were more likely to fixate on actions (Task = 0.32, SE = 0.07, P( > 0) = 0.99) and less likely to fixate on agents (Task = −0.18, SE = 0.06, P( < 0) = 0.99) in the event description task compared to probe recognition.
We did not find that language effects substantially differed by task in first fixations for any of the event roles (Language×Task < 0.02, P( < 0) < 0.76 in all models). In contrast, language effects in second fixations more clearly varied by task: language differences were present in the event description task but not in the probe recognition task for agent fixations (Language×Task = 0.07, SE = 0.06, P( > 0) = 0.89) and patient fixations (Language×Task = −0.08, SE = 0.05, P( < 0) = 0.97). We did not find sufficient evidence for an interaction between task and language for action fixations (Language×Task = 0.04, SE = 0.06, P( < 0) = 0.72).
Additionally, we ran a supplementary analysis of first and second fixations to check whether the order of tasks (having the event description task first or second) affected fixation patterns. For that, we fitted a model including a three-way interaction between language, task, and task order, keeping the rest of the model structure the same as in previous models. For first fixations to agents, we found evidence that task order affected fixations (Language×Task×Task−Order = 0.09, SE = 0.06, P( > 0) = 0.92); in Basque, there were more fixations to agents during probe recognition when it was the second task. In contrast, there was no such effect for Spanish speakers, i.e., fixations to agents were not affected in probe recognition when this task followed the event description task (see Figure G1 in Appendix G). For first fixations to actions, we found some evidence for an effect of task order in the opposite direction (Language×Task×Task−Order = −0.08, SE = 0.08, P( < 0) = 0.84). This suggests that Basque speakers fixated more on the action area in probe recognition when this task was first, with no such effect of task order in Spanish. However, the evidence for this effect was rather weak. We did not find evidence that task order affected first fixations to patients (Language×Task×Task−Order = −0.02, SE = 0.09, P( > 0) = 0.58). Similarly, we did not find evidence that task order affected second fixations for any of the event roles (P( > 0) ≤ 0.80 for all models).
Specificity, accuracy and reaction times.
In both tasks, participants were overall more specific and accurate when describing agents or responding to agent probes compared to patients (event description specificity, Agent: mean = 0.89, SE = 0.16, P( > 0) = 0.99; probe recognition accuracy, Agent: mean = 0.16, SE = 0.12, P( > 0) = 0.90). Participants were also faster overall in responding to agent probes, compared to patient probes (Agent: mean = −0.04, SE = 0.01, P( < 0) = 0.99; see Figure 8, Table 4).
Table 4. Specificity, accuracy, and reaction times in Experiment 2 by event role and language group (means, with standard deviations in parentheses; reaction times in ms).
Event Role | Language | Event description: proportion of specific responses | Probe recognition: proportion of correct responses | Probe recognition: reaction time (ms) |
---|---|---|---|---|
Agent | Basque | 0.92 (0.06) | 0.91 (0.06) | 701 (98) |
Agent | Spanish | 0.88 (0.08) | 0.84 (0.14) | 728 (113) |
Patient | Basque | 0.74 (0.09) | 0.84 (0.09) | 792 (109) |
Patient | Spanish | 0.71 (0.09) | 0.82 (0.09) | 765 (112) |
Action | Basque | 0.66 (0.09) | ||
Action | Spanish | 0.68 (0.10) |
We additionally found interactions between event roles and language in behavioral measures. In event description (Figure 7, Table 4), Basque speakers were more specific than Spanish speakers when describing agents (Language×Agent = 0.13, SE = 0.07, P( > 0) = 0.98; Table D7), while less specific when describing actions (Language×Action = −0.13, SE = 0.07, P( < 0) = 0.97). We did not find differences in the specificity of the patient descriptions (Language×Patient = 0.00, SE = 0.04, P( < 0) = 0.53).
In probe recognition (Figure 8, Table 4), Basque participants were overall more accurate than Spanish participants (Language = 0.19, SE = 0.12, P( > 0) = 0.94; Table D8). Furthermore, the Basque participants were more accurate than Spanish participants in responding to agent probes compared to patient probes (Language×EventRole: mean = 0.14, SE = 0.08, P( > 0) = 0.95).
This pattern was mirrored in the reaction times. Basque speakers were on average 27 ms faster in responding to agent trials and 26 ms slower in responding to patient trials than participants in the Spanish group (Language×EventRole: mean = −0.016, SE = 0.006, P( < 0) = 0.99, Figure 8B, Tables 4 and D9).
Discussion
The preference for agents observed in behavioral measures in Experiment 1 was replicated in Experiment 2. Fixation data provided additional evidence and extended it to a measure that more closely reflects event apprehension processes. Eye movements (and especially first fixations) directly reflect event apprehension and are therefore assumed to be less susceptible to post-hoc processes than the other behavioral measures. Hence, our results point to an agent preference in the earliest stage of event processing, across both languages and tasks tested.
Experiment 2 additionally provided evidence that language and task demands can modulate attention to event roles. Effects were small in size, but seemed to show that Basque speakers devoted more overt visual attention to agents than Spanish speakers, and, in turn, that Spanish speakers inspected patients and actions more often than Basque speakers. The effects were clearest in the measurements for the agent role. We also found a trend for patients and actions, although the effects for these roles were less strong and clear. The presence of effects between language groups suggests that the case marking of agents in language can modulate visual attention during event apprehension.
As for task effects, fixation data in Experiment 2 allowed a direct comparison of the effect of the upcoming task on event apprehension (unlike in Experiment 1, which relied solely on task-specific behavioral measures). Task effects showed that participants inspected action areas more in the event description task than in probe recognition, while the opposite was true for agents.
In fact, the gaze allocation patterns in Experiment 2 seem to provide detailed information on the subprocesses underlying event apprehension and how task demands would affect them. The fixation distributions suggest that there is an initial phase of apprehending the depicted event that is less susceptible to task demands. This is followed by a second phase that is more malleable and more strategically tuned to the task. When presented with the picture, the participants had access to coarse-grained, parafoveally perceived information to launch their first fixation. The coarse-grained nature of the information available before fixating on the picture may have limited the influence of task-specific demands for choosing the specific fixation location. By contrast, to decide on the location of the second fixation, participants benefited from a closer view of the event. This means that they had the opportunity to fine-tune their gaze to task-relevant areas. Indeed, the second fixations showed larger effects of task (and also language-task interactions). Participants in both language groups made more second fixations on action areas in the event description task than in the probe recognition task. This was most likely because the action information was crucial for the description task but not necessary for the recognition task. This suggests that participants were able to monitor their gaze more strategically and adapt to the task demands for the second fixations.
Thus, it is likely that the second fixations reflect a more flexible and refined process compared to the first fixations, with an increased influence of top-down factors such as the task and language requirements. A first fixation might suffice to extract coarse event-level information, including general information about the identity of event roles (cf. Dobel et al., 2011; Flecken et al., 2015). A second fixation, in turn, serves to further inspect the most relevant event areas in a strategic fashion.
An open question in this context is the benefit of launching the second fixations given their timing. In the majority of the trials, the brief exposure picture had already disappeared by the time this fixation landed on its target so that participants did not profit from additional foveally obtained visual information. Nonetheless, directing the gaze towards an “empty” area might have helped participants retrieve valuable information stored in their visual working memory about the event element previously located in that space (Altmann, 2004; F. Ferreira et al., 2008; Staudte & Altmann, 2017). Second fixations could thus reflect a memory retrieval process aimed at maximizing the visual representation of the event to perform more accurately in the task. Alternatively, second fixations might have been planned in the hope of still obtaining more visual input from the picture, as these were often planned during the execution of the first saccade.
GENERAL DISCUSSION
Our two experiments show evidence of an agent preference in visual attention across tasks and languages. This is reflected in early fixations to event participants, as well as in accuracy and reaction times. This indicates that event apprehension is driven by a deeply rooted cognitive bias to inspect agents first. In addition to an agent preference, we also found that language-specific and general task demands modulated overt visual attention allocation. Basque speakers directed slightly more attention to agents than Spanish speakers in the event apprehension process. Overall, there were fewer fixations to agents and more fixations to actions in the event description task than in the probe recognition task. Together, these findings suggest that the agent preference is an early and robust mechanism in event cognition and that it interacts with other top-down mechanisms. In the following sections, we discuss these findings and their implications.
The Agent Preference as a Cognitive Universal
Speakers of Basque and Spanish allocated more attention to and were more accurate with agents than with patients or actions. This effect was robust and consistent across all measures in both experiments. Fixation data showed that this preference already emerged in the first fixations. The decision for the first saccade is made within the first 50–100 ms after the event picture is available and thus before the influence of any potential post-hoc reasoning. This early timing indicates that the preference for agents is spontaneous, as expected if it is a general cognitive bias.
The preference for agents also persisted when the patients were human, demonstrating that animacy (or humanness) was not the only factor driving this effect. Previous studies used events involving inanimate and smaller-sized patients, such as a woman cutting a potato (Gerwien & Flecken, 2016; Sauppe & Flecken, 2021). Character size (Wolfe & Horowitz, 2017) and animate motion (Pratt et al., 2010) are known to drive visual search and probably biased visual attention towards the agents in these previous studies. In the current study, we included larger-sized inanimate patients (e.g., a large plant) and animate patients (e.g., in kicking or greeting events), and still found a preference for agents. Our findings thus show that the agent preference is critically sensitive to event role information, and not to animacy or salience alone.
The preference for agents was also consistent across tasks. In the event description task, this was expected because participants had to attend to the event to produce a written or oral description. Given that the canonical word order of Basque and Spanish is agent-initial, it is likely that the early attention to agents was driven by sentence planning demands. In picture description studies, agents are fixated more during early planning for the preparation of agent-initial sentences compared to patient-initial sentences (Griffin & Bock, 2000; Norcliffe et al., 2015; Nordlinger et al., 2022; Sauppe, 2017). Early agent fixations in the event description task are thus in line with what Slobin (1987) termed “thinking for speaking” because they constitute a preparatory step for sentence production.
However, the results of the probe recognition task also showed a preference for agents. This was the case even though the participants were not required to produce any linguistic output and could solve the task (answering by a button press whether a person or object had appeared in the previously shown target picture) without employing language. Thus, the preference for agents when visually scanning the picture in this task cannot be explained by the preparatory demands of sentence planning. The probe pictures were balanced with respect to whether they showed the agent or the patient, so neither event role was privileged in the experimental design. The finding that agents were still preferred thus points to a general top-down bias to search for agents, independent of task demands.
This result is consistent with the findings in Cohn and Paczynski (2013), who presented participants with depictions of agents and patients in cartoon strips. They measured looking times to each role while participants performed a non-linguistic task (rating how easy the event was to understand). Cohn and Paczynski found longer viewing times for agents than for patients and interpreted this as evidence that agents were initiating the building of event structures.
By contrast, our results are inconsistent with the findings by Hafri et al. (2018), who asked participants to indicate by button press whether a target actor appeared on the right or left side. This elicited longer response times for agents than for patients. In our study, however, response times were shorter for agents than for patients. The difference in the results patterns could be related to the level of role encoding encouraged by each task. In Hafri et al.’s study, the target actor that the participants had to identify was defined by a low-level feature (the gender or color of clothing). Instead, our probe recognition task required participants to recognize whether a character or object was present in the previously presented event. This means that participants could not only rely on low-level visual features but had to encode character information as a whole to answer the task, especially in events with human agents and human patients. This may have required participants to pay more attention to the event roles and their relations. Nonetheless, our results and the results from Hafri et al. (2018) converge in showing that event roles are detected and processed spontaneously, even when this is not necessary for the task.
On another level, agency is likely composed of multiple factors or cues (such as animacy or body posture) that each attract attention individually (Cohn & Paczynski, 2013; Gervais et al., 2010; Hafri et al., 2013; Verfaillie & Daems, 1996). The prominence of agents could also be related to their role as the initiator of the action. Cohn and Paczynski (2013) argue that agents initiate the building of event representations and provide anticipatory information on the rest of the event. Similarly, it has been proposed that agents are at the head of the causal chain that affects patients (cf. also Dowty, 1991; Kemmerer, 2012; Langacker, 2008). Active body postures are also known to attract attention and cue the processing of agents (Gervais et al., 2010). Our findings do not provide information on these individual mechanisms, but do show that agents on the whole are preferentially attended to already at the apprehension stage. Future research should test the degree to which the agent preference is based on each of its underlying factors.
The preference for agents in our experiments is, furthermore, consistent with the findings of an agent preference in grammar (Dryer, 2013; Fillmore, 1968; Napoli & Sutton-Spence, 2014; V. A. D. Wilson et al., 2022) and sentence comprehension (Bickel et al., 2015; Demiral et al., 2008; Haupt et al., 2008; Wang et al., 2009). V. A. D. Wilson et al. (2022) propose that grammar arises from general principles of event cognition. Based on evidence from primates and other species, V. A. D. Wilson et al. suggest that agent-based event representations are phylogenetically old and were already present in the ancestors of modern humans. From this it follows that an agent preference should be a likely candidate for a genuinely universal trait in human cognition (Henrich et al., 2010; Rad et al., 2018; Spelke & Kinzler, 2007). While our results are consistent with this, future studies should expand the sample and explore whether the agent preference not only generalizes across different languages but also across different cultural and social traditions (Henrich et al., 2010). This is crucial to ensure that this preference exists in the diversity of human populations and is not the by-product of Western education or cultural practices. Future research should also test the domain generality of the agent preference, for example, in lower-level perceptual tasks, and probe its neurobiological underpinning, with regard to the time courses and spatial distributions of neural activity (Kemmerer, 2012).
The Agent Preference as Modulated by General and Language-Specific Task Demands
In addition to a general preference for agents, our results also showed that attention to event roles was modulated by language and task differences. These effects were smaller and less consistent than the main effects for agents. Albeit weaker, this evidence is still valuable and points to a more nuanced picture of the processes underlying visual attention in event apprehension, in which the agent preference interacts with other top-down mechanisms.
On the level of general task effects, we found that fixations to actions increased and fixations to agents decreased in event description compared to probe recognition, in both Basque and Spanish. These differences in visual attention likely reflect the diverging demands of each task. In event description, participants needed to obtain information on the relationship between event roles to plan the verb and describe the event as a whole (Griffin & Bock, 2000; Norcliffe & Konopka, 2015; Sauppe, 2017). This probably biased participants toward the action area in the event and attenuated the general agent preference. In contrast, the probe recognition task only required identifying one of the event roles, and the relationship between the agent and the patient was irrelevant to the task. This likely explains the increased attention to agents in this task. Hence, general task demands exert a top-down influence on event apprehension: knowing what kind of information a task requires allows viewers to adapt their uptake of visual information. This supports the idea that event perception is guided by prior expectations and context (Gilbert & Li, 2013; Henderson et al., 2007).
Producing language is also a task demand, and it impacted how participants inspected events during apprehension. Basque speakers paid more overt visual attention to agents than Spanish speakers. A general tendency to first look at agents was also observed among Spanish speakers, but to a lesser extent than among Basque speakers. The language effects were small, although their size is within the range observed in previous brief exposure studies (Gerwien & Flecken, 2016; Hafri et al., 2018; Sauppe & Flecken, 2021).
Language effects were consistent across fixations and the behavioral measures in Experiment 2. Although event apprehension has been argued to be a prelinguistic process (i.e., to take place before language-related processes start, Griffin & Bock, 2000), our evidence seems to show that it is impacted by grammatical differences between languages.
When describing event pictures, Basque speakers need to decide early on which role the subject has, presumably based on information about agency in the picture (such as body posture). Obtaining information about agency would be necessary to commit to a sentence structure and to prepare the first noun phrase, i.e., to decide whether to use the ergative case or not at the beginning of the planning process (Griffin & Bock, 2000; Norcliffe & Konopka, 2015). In comparison, Spanish speakers could plan the subject noun phrase without committing to whether it is an agent or not because the form of the noun phrase remains the same in either situation. This might be especially so when time pressure is high and speakers plan sentences highly incrementally, i.e., in small units (F. Ferreira & Swets, 2002). Spanish allows greater flexibility in incremental sentence planning (at least for this particular grammatical feature), and so speakers of this language can defer their decision on which sentence structure to produce and the semantic role of the first noun phrase (Norcliffe et al., 2015; Stallings et al., 1998). This may allow Spanish speakers to attend more to other aspects of events during early visual inspection (van de Velde et al., 2014; Wagner et al., 2010).
This might explain the higher proportion of fixations to agents in Basque than in Spanish. This interpretation is consistent with previous picture description studies with ergative-marked sentences. In eye-tracked picture description studies on Hindi and Basque, Sauppe et al. (2021) and Egurtzegi et al. (2022) showed that speakers looked more towards agent referents in the relational-structural encoding phase (approximately the first 800 ms) of sentence planning when preparing sentences with unmarked agents as first noun phrases. In comparison, when planning sentences with ergative-marked agent noun phrases, speakers distributed their visual attention more between agents and the rest of the scene. Sauppe et al. and Egurtzegi et al. propose that this difference in gaze behavior arises from speakers’ need to commit earlier to the agentivity, and thereby the case marking, of first noun phrases when planning ergative-marked sentences. In our experiments, participants would obtain rough information on event roles from parafoveal vision, and they would be guided by the agent preference for launching their first fixation. In this process, Basque speakers would be influenced even more by this cognitive bias, given their sentence planning demands (Egurtzegi et al., 2022). Hence, language demands would exert a pressure and modulate the attention to events via rapid cognition.
However, the effects of language on attention to event roles in the current experiments need to be interpreted cautiously, for several reasons. First, the behavioral results in the event description task in Experiment 1 yielded a different pattern of language effects than those in Experiment 2: we found higher agent specificity for Basque speakers in Experiment 2, but not in Experiment 1. Although this might be due to the change in task modality (written vs. oral responses), it is not clear whether this can fully explain the divergence in language effects. Second, the language manipulation we employed was between groups. Hence, the differences between Basque and Spanish speakers in our experiments could have been caused by cultural, educational, or other differences between the groups. Our participants came from the same general population and had largely similar educational levels, which reduces this possibility; however, only a within-group manipulation with highly proficient, fully balanced bilingual participants could rule it out.
Finally, the probe recognition task in the studies reported here also showed differences between Basque and Spanish speakers in their attention to event roles. The effects were in the same direction as the ones in the event description task and were also reflected in first fixations and behavioral measures. Given that this task did not require overt linguistic output, there are several possible sources for the differences between the language groups. Event apprehension might be affected by language even when performing tasks that do not explicitly require overt linguistic responses. This could happen either through a long-lasting impact of linguistic experience or through the covert recruitment of language in the probe recognition task. The interaction we found between task order and language effects possibly indicates carry-over effects of language from the event description to the probe recognition task, which would support the account of covert recruitment of language. Another way for future research to investigate the origin of language effects would be to replicate the current study using the progressive aspect, where the ergative marker is not used in Basque. In this setup, the planning demands would be the same across language groups, and hence any differences between language groups would be related to long-lasting effects of language.
Other grammatical differences between languages have also been found to modulate attention to events, both in linguistic and non-linguistic tasks (e.g., Athanasopoulos & Bylund, 2013). Flecken et al. (2015), for example, found different neural responses for English and German speakers when they were presented with a picture that did not match a previous event sequence, and differences between speakers were correlated with language-specific characteristics. Consistent with this literature, our experiments provide tentative evidence that language might modulate attention in the event apprehension stage. However, future studies with within-group manipulations should clarify whether and how language can modulate attention to events under different task demands.
CONCLUSIONS
The current behavioral and eye tracking data show evidence of a general tendency to attend to agents during the processing of events from early on, already at the event apprehension phase, even for events with animate patients. This supports the hypothesis that there is a general preference for agents in human cognition (Rissman & Majid, 2019; V. A. D. Wilson et al., 2022). We additionally find that this agent preference persists across two typologically different languages as well as across two task modalities. Our findings are therefore consistent with evidence that event role hierarchies are similar across different languages (Ünal, Ji, & Papafragou, 2021; Ünal, Richards, et al., 2021), despite substantial variation in grammatical encoding (Bickel, 2010; Bickel & Nichols, 2009).
In the current study, attention to event roles was also modulated by language and task differences. The evidence for these effects suggests that general and language-specific task demands can modulate (but not override) the way speakers visually inspect agent and patient roles. Future studies investigating universal event role processing mechanisms (Dobel et al., 2007; Hafri et al., 2018) should expand the empirical basis by including more diverse stimuli and task demands, as well as more diverse languages (Blasi et al., 2022; Henrich et al., 2010; Majid & Levinson, 2010).
ACKNOWLEDGMENTS
Balthasar Bickel and Sebastian Sauppe share the senior authorship of this article. We thank Alexandra Bosshard, Noelia Falcón García, Ruben Mögel, and André Müller for help with stimulus creation; Giuachin Kreiliger for support with statistical analyses; and Oskar Elizburu and Alfabetatze Euskalduntze Koordinakundea for providing access to a laboratory space. We also thank Ted Gibson, Alon Hafri, and two anonymous reviewers for valuable comments on an earlier version of this paper.
AUTHOR CONTRIBUTIONS
Arrate Isasi-Isasmendi: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Resources; Software; Visualization; Writing – Original draft; Writing – Review & editing. Caroline Andrews: Conceptualization; Formal analysis; Methodology; Software; Supervision; Writing – Original draft; Writing – Review & editing. Monique Flecken: Conceptualization; Writing – Review & editing. Itziar Laka: Conceptualization; Resources; Writing – Review & editing. Moritz Daum: Conceptualization; Writing – Review & editing. Martin Meyer: Conceptualization; Funding acquisition. Balthasar Bickel: Conceptualization; Formal analysis; Funding acquisition; Resources; Supervision; Writing – Review & editing. Sebastian Sauppe: Conceptualization; Formal analysis; Funding acquisition; Methodology; Project administration; Supervision; Writing – Original draft; Writing – Review & editing.
FUNDING INFORMATION
This work was funded by the Swiss National Science Foundation (SNSF project grant number 100015_182845, BB and MM) and the National Center for Competence in Research “Evolving Language” (SNSF agreement number 51NF40_180888, BB, MM, and MMD) and a travel grant from the Graduate Research Campus, University of Zurich (AII). IL was supported by a grant from the Basque Government (IT1439-22).
DATA AVAILABILITY STATEMENT
Raw data, annotations of responses, analysis scripts, and more example stimuli can be found online at https://osf.io/c5ubv/.
Notes
Abbreviations: ERG: ergative case; NOM: nominative/unmarked case; AUX: auxiliary verb; DET: determiner.
In the present study, “Basque” and “Spanish” refer only to the language groups in this experiment, and do not aim to convey information about participants’ nationality or identity.
For two events (“shout at” and “scare”) no action areas of interest were defined because it was not possible to locate an area in the picture that solely belonged to the action. These events were only analyzed for agent and patient areas.
The eye tracker’s gaze position accuracy is given as 0.4° by the manufacturer.
APPENDIX A: EVENTS DEPICTED IN THE STIMULI (N = 58)
Table A1.
Human Agent, Inanimate Patient | Human Agent, Human Patient | Intransitive
---|---|---
Light candle | Poke | Lean |
Crack egg | Grab | Yawn |
Clean blackboard | Scratch | Sleep |
Read book | Strangle | Jump |
Cut potato | Kick | Kneel |
Trim plant | Pull | Sneeze |
Peel mandarin | Push | Trip |
Play drum | Greet | Sit down |
Fix bike | Hit | Stretch |
Put on glove | Tread on | Crouch |
Open box | Drag | |
Water plant | Help get up | |
Lift table | Shout at | |
Push shelf | Tie up | |
Drag bag | Pinch ear | |
Mix flour | Beckon | |
Pick up basket | Brush | |
Pour water | Scold | |
Tear paper | Tickle | |
Open umbrella | Bandage | |
Hang up laundry | Scare | |
Wipe bowl | Feed | |
Saw wood | Guide | |
Hammer nail | Fan |
APPENDIX B: ACCURACY IN THE PROBE RECOGNITION TASK
APPENDIX C: SENTENCE TYPES IN THE EVENT DESCRIPTION TASK
Table C1.

Description type | Experiment 1 (Basque) | Experiment 1 (Spanish) | Experiment 2 (Basque) | Experiment 2 (Spanish)
---|---|---|---|---
Canonical (Agent–Patient), e.g., “Lisa has kicked Tim” | 3679 (97.2%) | 3613 (94.9%) | 3040 (95.8%) | 3139 (96.1%)
Deviating (total) | 105 (2.8%) | 191 (5.1%) | 130 (4.2%) | 126 (3.9%)
Reciprocal, e.g., “Lisa and Tim have danced” | 37 | 90 | 89 | 76
Other expressions, e.g., “Lisa appeared in front of Tim” | 65 | 95 | 36 | 47
Patient-initial, e.g., “Tim has been dragged by Lisa” | 3 | 6 | 5 | 3
APPENDIX D: REGRESSION MODEL SUMMARIES
D1. Experiment 1
Table D1.
Table D2.
Table D3.
D2. Experiment 2
Table D4.
Table D5.
Table D6.
Table D7.
Table D8.
Table D9.
APPENDIX E: FIRST FIXATIONS BY ANIMACY OF THE PATIENT
APPENDIX F: PLOTS OF FITTED VALUES FROM REGRESSION MODELS
APPENDIX G: CONDITIONAL EFFECTS FROM MODEL WITH THREE-WAY INTERACTION
REFERENCES
- Altmann, G. T. M. (2004). Language-mediated eye movements in the absence of a visual world: The ‘blank screen paradigm’. Cognition, 93(2), B79–B87. 10.1016/j.cognition.2004.02.005
- Altmann, G. T. M., & Ekves, Z. (2019). Events as intersecting object histories: A new theory of event representation. Psychological Review, 126(6), 817–840. 10.1037/rev0000154
- Anwyl-Irvine, A., Dalmaijer, E. S., Hodges, N., & Evershed, J. K. (2021). Realistic precision and accuracy of online experiment platforms, web browsers, and devices. Behavior Research Methods, 53(4), 1407–1425. 10.3758/s13428-020-01501-5
- Athanasopoulos, P., & Bylund, E. (2013). Does grammatical aspect affect motion event cognition? A cross-linguistic comparison of English and Swedish speakers. Cognitive Science, 37(2), 286–309. 10.1111/cogs.12006
- Baddeley, A. (2007). Working memory, thought, and action. Oxford University Press. 10.1093/acprof:oso/9780198528012.001.0001
- Baddeley, A., Allen, R. J., & Hitch, G. J. (2011). Binding in visual working memory: The role of the episodic buffer. Neuropsychologia, 49(6), 1393–1400. 10.1016/j.neuropsychologia.2010.12.042
- Baddeley, A., Hitch, G., & Allen, R. (2009). Working memory and binding in sentence recall. Journal of Memory and Language, 61(3), 438–456. 10.1016/j.jml.2009.05.004
- Baldassano, C., Hasson, U., & Norman, K. A. (2018). Representation of real-world event schemas during narrative perception. The Journal of Neuroscience, 38(45), 9689–9699. 10.1523/JNEUROSCI.0251-18.2018
- Barr, D. J. (2008). Analyzing ‘visual world’ eyetracking data using multilevel logistic regression. Journal of Memory and Language, 59(4), 457–474. 10.1016/j.jml.2007.09.002
- Bickel, B. (2010). Grammatical relations typology. In Song J. J. (Ed.), The Oxford handbook of linguistic typology (pp. 399–444). Oxford University Press. 10.1093/oxfordhb/9780199281251.013.0020
- Bickel, B., & Nichols, J. (2009). Case marking and alignment. In Malchukov A. L. & Spencer A. (Eds.), The Oxford handbook of case (pp. 304–321). Oxford University Press. 10.1093/oxfordhb/9780199206476.013.0021
- Bickel, B., Witzlack-Makarevich, A., Choudhary, K. K., Schlesewsky, M., & Bornkessel-Schlesewsky, I. (2015). The neurophysiology of language processing shapes the evolution of grammar: Evidence from case marking. PLOS ONE, 10(8), Article e0132819. 10.1371/journal.pone.0132819
- Blasi, D. E., Henrich, J., Adamou, E., Kemmerer, D., & Majid, A. (2022). Over-reliance on English hinders cognitive science. Trends in Cognitive Sciences, 26(12), 1153–1170. 10.1016/j.tics.2022.09.015
- Bock, K., & Ferreira, V. S. (2014). Syntactically speaking. In Goldrick M., Ferreira V. S., & Miozzo M. (Eds.), The Oxford handbook of language production (pp. 21–46). Oxford University Press. 10.1093/oxfordhb/9780199735471.013.008
- Bresnan, J. (1982). The mental representation of grammatical relations. MIT Press.
- Bridges, D., Pitiot, A., MacAskill, M. R., & Peirce, J. W. (2020). The timing mega-study: Comparing a range of experiment generators, both lab-based and online. PeerJ, 8, Article e9414. 10.7717/peerj.9414
- Bürkner, P.-C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28. 10.18637/jss.v080.i01
- Bürkner, P.-C. (2018). Advanced Bayesian multilevel modeling with the R package brms. The R Journal, 10(1), 395–411. 10.32614/RJ-2018-017
- Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P., & Riddell, A. (2017). Stan: A probabilistic programming language. Journal of Statistical Software, 76(1), 1–32. 10.18637/jss.v076.i01
- Carpenter, R. H. S., & Williams, M. L. L. (1995). Neural computation of log likelihood in control of saccadic eye movements. Nature, 377(6544), 59–62. 10.1038/377059a0
- Cohn, N., & Paczynski, M. (2013). Prediction, events, and the advantage of Agents: The processing of semantic roles in visual narrative. Cognitive Psychology, 67(3), 73–97. 10.1016/j.cogpsych.2013.07.002
- de Bruin, A., Carreiras, M., & Duñabeitia, J. A. (2017). The BEST dataset of language proficiency. Frontiers in Psychology, 8, Article 522. 10.3389/fpsyg.2017.00522
- Demiral, Ş. B., Schlesewsky, M., & Bornkessel-Schlesewsky, I. (2008). On the universality of language comprehension strategies: Evidence from Turkish. Cognition, 106(1), 484–500. 10.1016/j.cognition.2007.01.008
- Dobel, C., Glanemann, R., Kreysa, H., Zwitserlood, P., & Eisenbeiß, S. (2011). Visual encoding of coherent and non-coherent scenes. In Bohnemeyer J. & Pederson E. (Eds.), Event representation in language and cognition (Vol. 11, pp. 189–215). Cambridge University Press. 10.1017/CBO9780511782039.009
- Dobel, C., Gumnior, H., Bölte, J., & Zwitserlood, P. (2007). Describing scenes hardly seen. Acta Psychologica, 125(2), 129–143. 10.1016/j.actpsy.2006.07.004
- Dowty, D. (1991). Thematic proto-roles and argument selection. Language, 67, 547–619. 10.1353/lan.1991.0021
- Dryer, M. S. (2013). Order of subject, object and verb. In Dryer M. S. & Haspelmath M. (Eds.), The world atlas of language structures online. Max Planck Institute for Evolutionary Anthropology.
- Duchowski, A. T. (2007). Eye tracking methodology: Theory and practice (2nd ed.). Springer. 10.1007/978-1-84628-609-4
- Egurtzegi, A., Blasi, D. E., Bornkessel-Schlesewsky, I., Laka, I., Meyer, M., Bickel, B., & Sauppe, S. (2022). Cross-linguistic differences in case marking shape neural power dynamics and gaze behavior during sentence planning. Brain and Language, 230, Article 105127. 10.1016/j.bandl.2022.105127
- Ferreira, F., Apel, J., & Henderson, J. M. (2008). Taking a new look at looking at nothing. Trends in Cognitive Sciences, 12(11), 405–410. 10.1016/j.tics.2008.07.007
- Ferreira, F., & Swets, B. (2002). How incremental is language production? Evidence from the production of utterances requiring the computation of arithmetic sums. Journal of Memory and Language, 46(1), 57–84. 10.1006/jmla.2001.2797
- Ferreira, V. S. (1996). Is it better to give than to donate? Syntactic flexibility in language production. Journal of Memory and Language, 35(5), 724–755. 10.1006/jmla.1996.0038
- Fillmore, C. (1968). The case for case. In Bach E. & Harms R. (Eds.), Universals in linguistic theory. Holt, Rinehart and Winston.
- Firestone, C., & Scholl, B. J. (2016). Cognition does not affect perception: Evaluating the evidence for “top-down” effects. Behavioral and Brain Sciences, 39, Article e229. 10.1017/S0140525X15000965
- Flecken, M., Gerwien, J., Carroll, M., & von Stutterheim, C. (2015). Analyzing gaze allocation during language planning: A cross-linguistic study on dynamic events. Language and Cognition, 7(1), 138–166. 10.1017/langcog.2014.20
- Frank, M. C., Vul, E., & Johnson, S. P. (2009). Development of infants’ attention to faces during the first year. Cognition, 110(2), 160–170. 10.1016/j.cognition.2008.11.010
- Galazka, M., & Nyström, P. (2016). Infants’ preference for individual agents within chasing interactions. Journal of Experimental Child Psychology, 147, 53–70. 10.1016/j.jecp.2016.02.010
- Gervais, W. M., Reed, C. L., Beall, P. M., & Roberts, R. J. (2010). Implied body action directs spatial attention. Attention, Perception, & Psychophysics, 72(6), 1437–1443. 10.3758/APP.72.6.1437
- Gerwien, J., & Flecken, M. (2016). First things first? Top-down influences on event apprehension. In Papafragou A., Grodner D., Mirman D., & Trueswell J. C. (Eds.), Proceedings of the 38th Annual Meeting of the Cognitive Science Society (pp. 2633–2638). Cognitive Science Society.
- Gibson, E., Piantadosi, S. T., Brink, K., Bergen, L., Lim, E., & Saxe, R. (2013). A noisy-channel account of crosslinguistic word-order variation. Psychological Science, 24(7), 1079–1088. 10.1177/0956797612463705
- Gilbert, C. D., & Li, W. (2013). Top-down influences on visual processing. Nature Reviews Neuroscience, 14(5), 350–363. 10.1038/nrn3476
- Gold, J. M., Murray, R. F., Sekuler, A. B., Bennett, P. J., & Sekuler, R. (2005). Visual memory decay is deterministic. Psychological Science, 16(10), 769–774. 10.1111/j.1467-9280.2005.01612.x
- Goldin-Meadow, S., Chee So, W., Özyürek, A., & Mylander, C. (2008). The natural order of events: How speakers of different languages represent events nonverbally. Proceedings of the National Academy of Sciences of the United States of America, 105(27), 9163–9168. 10.1073/pnas.0710060105
- Greene, M. R., & Oliva, A. (2009). The briefest of glances: The time course of natural scene understanding. Psychological Science, 20(4), 464–472. 10.1111/j.1467-9280.2009.02316.x
- Griffin, Z. M., & Bock, K. (2000). What the eyes say about speaking. Psychological Science, 11(4), 274–279. 10.1111/1467-9280.00255
- Gruber, J. S. (1965). Studies in lexical relations [Unpublished doctoral dissertation]. Massachusetts Institute of Technology. (Reprinted as part of Lexical Structures in Syntax and Semantics, North-Holland, Amsterdam, 1976)
- Guterstam, A., & Graziano, M. S. A. (2020). Implied motion as a possible mechanism for encoding other people’s attention. Progress in Neurobiology, 190, Article 101797. 10.1016/j.pneurobio.2020.101797
- Hafri, A., Papafragou, A., & Trueswell, J. C. (2013). Getting the gist of events: Recognition of two-participant actions from brief displays. Journal of Experimental Psychology: General, 142(3), 880–905. 10.1037/a0030045
- Hafri, A., Trueswell, J. C., & Strickland, B. (2018). Encoding of event roles from visual scenes is rapid, spontaneous, and interacts with higher-level visual processing. Cognition, 175, 36–52. 10.1016/j.cognition.2018.02.011
- Hall, M. L., Mayberry, R. I., & Ferreira, V. S. (2013). Cognitive constraints on constituent order: Evidence from elicited pantomime. Cognition, 129(1), 1–17. 10.1016/j.cognition.2013.05.004
- Haupt, F. S., Schlesewsky, M., Roehm, D., Friederici, A. D., & Bornkessel-Schlesewsky, I. (2008). The status of subject–object reanalyses in the language comprehension architecture. Journal of Memory and Language, 59(1), 54–96. 10.1016/j.jml.2008.02.003
- Henderson, J. M., Brockmole, J. R., Castelhano, M. S., & Mack, M. (2007). Visual saliency does not account for eye movements during visual search in real-world scenes. In van Gompel R. P. G., Fischer M. H., Murray W. S., & Hill R. L. (Eds.), Eye movements: A window on mind and brain (pp. 537–562). Elsevier. 10.1016/B978-008044980-7/50027-6
- Henderson, J. M., Hayes, T. R., Rehrig, G., & Ferreira, F. (2018). Meaning guides attention during real-world scene description. Scientific Reports, 8, Article 13504. 10.1038/s41598-018-31894-5
- Henderson, J. M., Malcolm, G. L., & Schandl, C. (2009). Searching in the dark: Cognitive relevance drives attention in real-world scenes. Psychonomic Bulletin & Review, 16(5), 850–856. 10.3758/PBR.16.5.850
- Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33(2–3), 61–135. 10.1017/S0140525X0999152X
- Hesse, C., & Franz, V. H. (2010). Grasping remembered objects: Exponential decay of the visual memory. Vision Research, 50(24), 2642–2650. 10.1016/j.visres.2010.07.026
- Hoyer, W. J., Stawski, R. S., Wasylyshyn, C., & Verhaeghen, P. (2004). Adult age and digit symbol substitution performance: A meta-analysis. Psychology and Aging, 19(1), 211–214. 10.1037/0882-7974.19.1.211
- Huettig, F., & Janse, E. (2016). Individual differences in working memory and processing speed predict anticipatory spoken language processing in the visual world. Language, Cognition and Neuroscience, 31(1), 80–93. 10.1080/23273798.2015.1047459
- Ivanova, A. A., Mineroff, Z., Zimmerer, V., Kanwisher, N., Varley, R., & Fedorenko, E. (2021). The language network is recruited but not required for nonverbal event semantics. Neurobiology of Language, 2(2), 176–201. 10.1162/nol_a_00030
- Kamide, Y., Scheepers, C., & Altmann, G. T. M. (2003). Integration of syntactic and semantic information in predictive processing: Cross-linguistic evidence from German and English. Journal of Psycholinguistic Research, 32(1), 37–55. 10.1023/A:1021933015362
- Kemmerer, D. (2012). The cross-linguistic prevalence of SOV and SVO word orders reflects the sequential and hierarchical representation of action in Broca’s area. Language and Linguistics Compass, 6(1), 50–66. 10.1002/lnc3.322
- Kourtzi, Z., & Kanwisher, N. (2000). Activation in human MT/MST by static images with implied motion. Journal of Cognitive Neuroscience, 12(1), 48–55. 10.1162/08989290051137594
- Krekelberg, B., Vatakis, A., & Kourtzi, Z. (2005). Implied motion from form in the human visual cortex. Journal of Neurophysiology, 94(6), 4373–4386. 10.1152/jn.00690.2005
- Kruschke, J. K. (2015). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan (2nd ed.). Academic Press.
- Laka, I. (2006). Deriving split ergativity in the progressive: The case of Basque. In Johns A., Massam D., & Ndayiragije J. (Eds.), Ergativity: Emerging issues (Vol. 65, pp. 173–195). Springer. 10.1007/1-4020-4188-8_7
- Langacker, R. W. (2008). Cognitive grammar: A basic introduction. Oxford University Press. 10.1093/acprof:oso/9780195331967.001.0001
- Lenth, R. V. (2020). emmeans: Estimated marginal means, aka least-squares means (R package version 1.5.2-1) [Computer software manual]. https://cran.r-project.org/web/packages/emmeans/index.html
- Majid, A., & Levinson, S. C. (2010). WEIRD languages have misled us, too. Behavioral and Brain Sciences, 33(2–3), 103. 10.1017/S0140525X1000018X
- Matzke, M., Mai, H., Nager, W., Rüsseler, J., & Münte, T. (2002). The costs of freedom: An ERP study of non-canonical sentences. Clinical Neurophysiology, 113(6), 844–852. 10.1016/S1388-2457(02)00059-7
- Meir, I., Aronoff, M., Börstell, C., Hwang, S.-O., Ilkbasaran, D., Kastner, I., Lepic, R., Lifshitz Ben-Basat, A., Padden, C., & Sandler, W. (2017). The effect of being human and the basis of grammatical word order: Insights from novel communication systems and young sign languages. Cognition, 158, 189–207. 10.1016/j.cognition.2016.10.011
- Meyer, A. S., Sleiderink, A. M., & Levelt, W. J. M. (1998). Viewing and naming objects: Eye movements during noun phrase production. Cognition, 66(2), B25–B33. 10.1016/S0010-0277(98)00009-2
- Morales, J., Yudes, C., Gómez-Ariza, C. J., & Bajo, M. T. (2015). Bilingualism modulates dual mechanisms of cognitive control: Evidence from ERPs. Neuropsychologia, 66, 157–169. 10.1016/j.neuropsychologia.2014.11.014
- Napoli, D. J., & Sutton-Spence, R. (2014). Order of the major constituents in sign languages: Implications for all language. Frontiers in Psychology, 5, Article 376. 10.3389/fpsyg.2014.00376
- New, J., Cosmides, L., & Tooby, J. (2007). Category-specific attention for animals reflects ancestral priorities, not expertise. Proceedings of the National Academy of Sciences of the United States of America, 104(42), 16598–16603. 10.1073/pnas.0703913104
- Norcliffe, E., & Konopka, A. E. (2015). Vision and language in cross-linguistic research on sentence production. In Mishra R. K., Srinivasan N., & Huettig F. (Eds.), Attention and vision in language processing (pp. 77–96). Springer. 10.1007/978-81-322-2443-3_5
- Norcliffe, E., Konopka, A. E., Brown, P., & Levinson, S. C. (2015). Word order affects the time course of sentence formulation in Tzeltal. Language, Cognition and Neuroscience, 30(9), 1187–1208. 10.1080/23273798.2015.1006238
- Nordlinger, R., Garrido Rodriguez, G., & Kidd, E. (2022). Sentence planning and production in Murrinhpatha, an Australian ‘free word order’ language. Language, 98(2), 187–220. 10.1353/lan.2022.0008 [DOI] [Google Scholar]
- Olguin, A., Cekic, M., Bekinschtein, T. A., Katsos, N., & Bozic, M. (2019). Bilingualism and language similarity modify the neural mechanisms of selective attention. Scientific Reports, 9(1), Article 8204. 10.1038/s41598-019-44782-3, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Peirce, J., Gray, J. R., Simpson, S., MacAskill, M., Höchenberger, R., Sogo, H., Kastman, E., & Lindeløv, J. K. (2019). PsychoPy2: Experiments in behavior made easy. Behavior Research Methods, 51(1), 195–203. 10.3758/s13428-018-01193-y, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pierce, J. E., Clementz, B. A., & McDowell, J. E. (2019). Saccades: Fundamentals and neural mechanisms. In Klein C. & Ettinger U. (Eds.), Eye movement research: An introduction to its scientific foundations and applications (pp. 11–71). Springer. 10.1007/978-3-030-20085-5_2 [DOI] [Google Scholar]
- Pratt, J., Radulescu, P. V., Guo, R. M., & Abrams, R. A. (2010). It’s alive!: Animate motion captures visual attention. Psychological Science, 21(11), 1724–1730. 10.1177/0956797610387440, [DOI] [PubMed] [Google Scholar]
- Rad, M. S., Martingano, A. J., & Ginges, J. (2018). Toward a psychology of Homo sapiens: Making psychological science more representative of the human population. Proceedings of the National Academy of Sciences of the United States of America, 115(45), 11401–11405. 10.1073/pnas.1721165115, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Radvansky, G. A., & Zacks, J. M. (2011). Event perception: Event perception. Wiley Interdisciplinary Reviews: Cognitive Science, 2(6), 608–620. 10.1002/wcs.133, [DOI] [PMC free article] [PubMed] [Google Scholar]
- R Core Team. (2022). R: A language and environment for statistical computing [Computer software manual]. R Foundation for Statistical Computing. Retrieved from https://www.R-project.org. [Google Scholar]
- Rehrig, G., Hayes, T. R., Henderson, J. M., & Ferreira, F. (2020). When scenes speak louder than words: Verbal encoding does not mediate the relationship between scene meaning and visual attention. Memory & Cognition, 48(7), 1181–1195. 10.3758/s13421-020-01050-4, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Richmond, L. L., & Zacks, J. M. (2017). Constructing experience: Event models from perception to action. Trends in Cognitive Sciences, 21(12), 962–980. 10.1016/j.tics.2017.08.005, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rissman, L., & Majid, A. (2019). Thematic roles: Core knowledge or linguistic construct? Psychonomic Bulletin & Review, 26, 1850–1869. 10.3758/s13423-019-01634-5, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Roeser, J., Torrance, M., & Baguley, T. (2019). Advance planning in written and spoken sentence production. Journal of Experimental Psychology: Learning, Memory, and Cognition, 45(11), 1983–2009. 10.1037/xlm0000685, [DOI] [PubMed] [Google Scholar]
- Salthouse, T. A. (2000). Aging and measures of processing speed. Biological Psychology, 54(1–3), 35–54. 10.1016/S0301-0511(00)00052-1, [DOI] [PubMed] [Google Scholar]
- Santesteban, M., Pickering, M. J., Laka, I., & Branigan, H. P. (2015). Effects of case-marking and head position on language production? Evidence from an ergative OV language. Language, Cognition and Neuroscience, 30(9), 1175–1186. 10.1080/23273798.2015.1065335 [DOI] [Google Scholar]
- Sassenhagen, J., & Alday, P. M. (2016). A common misapplication of statistical inference: Nuisance control with null-hypothesis significance tests. Brain and Language, 162, 42–45. 10.1016/j.bandl.2016.08.001, [DOI] [PubMed] [Google Scholar]
- Sauppe, S. (2016). Verbal semantics drives early anticipatory eye movements during the comprehension of verb-initial sentences. Frontiers in Psychology, 7, Article 95. 10.3389/fpsyg.2016.00095, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sauppe, S. (2017). Word order and voice influence the timing of verb planning in German sentence production. Frontiers in Psychology, 8, Article 1648. 10.3389/fpsyg.2017.01648, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sauppe, S., Choudhary, K. K., Giroud, N., Blasi, D. E., Norcliffe, E., Bhattamishra, S., Gulati, M., Egurtzegi, A., Bornkessel-Schlesewsky, I., Meyer, M., & Bickel, B. (2021). Neural signatures of syntactic variation in speech planning. PLOS Biology, 19(1), Article e3001038. 10.1371/journal.pbio.3001038, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sauppe, S., & Flecken, M. (2021). Speaking for seeing: Sentence structure guides visual event apprehension. Cognition, 206, Article 104516. 10.1016/j.cognition.2020.104516, [DOI] [PubMed] [Google Scholar]
- Sauppe, S., Norcliffe, E. J., Konopka, A. E., Van Valin, R. D., & Levinson, S. C. (2013). Dependencies first: Eye tracking evidence from sentence production in Tagalog. In Knauff M., Pauen M., Sebanz N., & Wachsmuth I. (Eds.), Proceedings of the 35th Annual Conference of the Cognitive Science Society (pp. 1265–1270). Cognitive Science Society. [Google Scholar]
- Schneider, W., Eschman, A., & Zuccolotto, A. (2002). E-Prime reference guide. Psychology Software Tools Inc. [Google Scholar]
- Schouwstra, M., & de Swart, H. (2014). The semantic origins of word order. Cognition, 131(3), 431–436. 10.1016/j.cognition.2014.03.004, [DOI] [PubMed] [Google Scholar]
- Slobin, D. I. (1987). Thinking for speaking. Proceedings of the 13th Annual Meeting of the Berkeley Linguistics Society, 13, 435–445. 10.3765/bls.v13i0.1826 [DOI] [Google Scholar]
- Spelke, E. S., & Kinzler, K. D. (2007). Core knowledge. Developmental Science, 10(1), 89–96. 10.1111/j.1467-7687.2007.00569.x, [DOI] [PubMed] [Google Scholar]
- Stallings, L. M., MacDonald, M. C., & O’Seaghdha, P. G. (1998). Phrasal ordering constraints in sentence production: Phrase length and verb disposition in heavy-NP shift. Journal of Memory and Language, 39(3), 392–417. 10.1006/jmla.1998.2586 [DOI] [Google Scholar]
- Staudte, M., & Altmann, G. T. M. (2017). Recalling what was where when seeing nothing there. Psychonomic Bulletin & Review, 24(2), 400–407. 10.3758/s13423-016-1104-8, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Stawarczyk, D., Bezdek, M. A., & Zacks, J. M. (2021). Event representations and predictive processing: The role of the midline default network core. Topics in Cognitive Science, 13(1), 164–186. 10.1111/tops.12450, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Stoet, G. (2010). PsyToolkit: A software package for programming psychological experiments using Linux. Behavior Research Methods, 42(4), 1096–1104. 10.3758/BRM.42.4.1096, [DOI] [PubMed] [Google Scholar]
- Stoet, G. (2017). PsyToolkit: A novel web-based method for running online questionnaires and reaction-time experiments. Teaching of Psychology, 44(1), 24–31. 10.1177/0098628316677643 [DOI] [Google Scholar]
- Summerfield, C., & de Lange, F. P. (2014). Expectation in perceptual decision making: Neural and computational mechanisms. Nature Reviews Neuroscience, 15(11), 745–756. 10.1038/nrn3838, [DOI] [PubMed] [Google Scholar]
- Vandenbroucke, A. R., Sligte, I. G., & Lamme, V. A. (2011). Manipulations of attention dissociate fragile visual short-term memory from visual working memory. Neuropsychologia, 49(6), 1559–1568. 10.1016/j.neuropsychologia.2010.12.044, [DOI] [PubMed] [Google Scholar]
- van de Velde, M., Meyer, A. S., & Konopka, A. E. (2014). Message formulation and structural assembly: Describing “easy” and “hard” events with preferred and dispreferred syntactic structures. Journal of Memory and Language, 71(1), 124–144. 10.1016/j.jml.2013.11.001 [DOI] [Google Scholar]
- Van Valin, R. D. (2006). Semantic macroroles and language processing. In Bornkessel I., Schlesewsky M., Comrie B., & Friederici A. D. (Eds.), Semantic role universals and argument linking: Theoretical, typological, and psycholinguistic perspectives (pp. 263–301). Mouton de Gruyter. 10.1515/9783110219272.263 [DOI] [Google Scholar]
- Verfaillie, K., & Daems, A. (1996). The priority of the agent in visual event perception: On the cognitive basis of grammatical agent-patient asymmetries. Cognitive Linguistics, 7(2), 131–148. 10.1515/cogl.1996.7.2.131 [DOI] [Google Scholar]
- Wagner, V., Jescheniak, J. D., & Schriefers, H. (2010). On the flexibility of grammatical advance planning during sentence production: Effects of cognitive load on multiple lexical access. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36(2), 423–440. 10.1037/a0018619, [DOI] [PubMed] [Google Scholar]
- Wang, L., Schlesewsky, M., Bickel, B., & Bornkessel-Schlesewsky, I. (2009). Exploring the nature of the ‘subject’-preference: Evidence from the online comprehension of simple sentences in Mandarin Chinese. Language and Cognitive Processes, 24, 1180–1226. 10.1080/01690960802159937 [DOI] [Google Scholar]
- Wechsler, D. A. (1997). Wechsler adult intelligence scale—3rd edition (No. 9). Psychological Corporation. 10.1037/t49755-000 [DOI] [Google Scholar]
- Wilson, F., Papafragou, A., Bunger, A., & Trueswell, J. (2011). Rapid extraction of event participants in caused motion events. In Proceedings of the 33rd Annual Conference of the Cognitive Science Society (pp. 1206–1211). Cognitive Science Society. [Google Scholar]
- Wilson, V. A. D., Zuberbühler, K., & Bickel, B. (2022). The evolutionary origins of syntax: Event cognition in nonhuman primates. Science Advances, 8(25), Article eabn8464. 10.1126/sciadv.abn8464, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wolfe, J. M., & Horowitz, T. S. (2017). Five factors that guide attention in visual search. Nature Human Behaviour, 1(3), Article 0058. 10.1038/s41562-017-0058, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yow, W. Q., & Li, X. (2015). Balanced bilingualism and early age of second language acquisition as the underlying mechanisms of a bilingual executive control advantage: Why variations in bilingual experiences matter. Frontiers in Psychology, 6, Article 164. 10.3389/fpsyg.2015.00164, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zacks, J. M. (2020). Event perception and memory. Annual Review of Psychology, 71(1), 165–191. 10.1146/annurev-psych-010419-051101, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zacks, J. M., Speer, N. K., Swallow, K. M., Braver, T. S., & Reynolds, J. R. (2007). Event perception: A mind-brain perspective. Psychological Bulletin, 133(2), 273–293. 10.1037/0033-2909.133.2.273, [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ünal, E., Ji, Y., & Papafragou, A. (2021). From event representation to linguistic meaning. Topics in Cognitive Science, 13(1), 224–242. 10.1111/tops.12475, [DOI] [PubMed] [Google Scholar]
- Ünal, E., Richards, C., Trueswell, J. C., & Papafragou, A. (2021). Representing agents, patients, goals and instruments in causative events: A cross-linguistic investigation of early language and cognition. Developmental Science, 24(6), Article e13116. 10.1111/desc.13116, [DOI] [PubMed] [Google Scholar]
Data Availability Statement
Raw data, annotations of responses, analysis scripts, and additional example stimuli are available online at https://osf.io/c5ubv/.