PLOS One. 2022 Dec 21;17(12):e0278604. doi: 10.1371/journal.pone.0278604

Boys don’t cry (or kiss or dance): A computational linguistic lens into gendered actions in film

Victor R Martinez 1,*, Krishna Somandepalli 2, Shrikanth Narayanan 1,2
Editor: Natalia Grabar 3
PMCID: PMC9770346  PMID: 36542600

Abstract

Contemporary media is full of images that reflect traditional gender notions and stereotypes, some of which may perpetuate harmful gender representations. In an effort to highlight the occurrence of these adverse portrayals, researchers have proposed machine-learning methods to identify stereotypes in the language patterns found in character dialogues. However, not all harmful stereotypes are communicated through dialogue alone. As a complementary approach, we present a large-scale machine-learning framework that automatically identifies characters’ actions from scene descriptions found in movie scripts. For this work, we collected 1.2+ million scene descriptions from 912 movie scripts, with more than 50 thousand actions and 20 thousand movie characters. Our framework allows us to study systematic gender differences in movie portrayals at scale. We show this through a series of statistical analyses that highlight differences in gender portrayals. Our findings provide further evidence for claims from prior media studies, including: (i) male characters display higher agency than female characters; (ii) female actors are more frequently the subject of gaze; and (iii) male characters are less likely to display affection. We hope that these data resources and findings help raise awareness of portrayals of character actions that reflect harmful gender stereotypes, and demonstrate novel possibilities for computational approaches in media analysis.

Introduction

TV and film media often reflect the views of a society and have a considerable impact on how gender stereotypes are created or reinforced [1]. Through assumed behaviors and social roles, media representations and portrayals are a major influence on the way we construct our beliefs and ideas around gender-appropriate behaviors and norms, particularly during the formative years of childhood and youth [2–5]. Even throughout adulthood, these portrayals can still guide the way we think [6], how we create our worldview and perceptions of others [7], our fashion choices [8], and the perception of our self-identity [9].

Currently, there is a clear disparity in the way characters are portrayed in TV and film media, particularly with respect to the characters’ assumed or perceived gender expression. For example, over the past two decades, the media industry has been highlighted as one where women tend to be both underrepresented [10, 11] and depicted in a stereotypical manner [12, 13]. The underrepresentation is so severe that male leads outnumber female leads two-to-one, with male characters speaking and appearing on the screen twice as often as their female counterparts [11]. Female characters are often presented in decorative (e.g., for their body and beauty), family-oriented, and demure roles [14–18]; male characters are typically shown as independent, authoritarian and professional agents and, unlike their female counterparts, these representations do not depend on the male actor’s age or physical appearance [12, 19].

One of the main limitations of most of these efforts is their largely qualitative nature, which requires immense manual work in the form of human annotations and/or surveys [20]. Such manual efforts cannot match the scale at which media content is currently being produced or consumed; in fact, they have been unable to produce systematic data for both science and media scholarship at scale. To provide supporting evidence for systematic differences in gender portrayals at a larger scale, researchers have recently turned to machine learning models. These applications range from works that automatically detect actors’ faces and voices in TV and film [21–24] to works on film narrative understanding through the analysis of character dialogues [25, 26]. A number of these are applied directly to movie scripts to gain insight into the early stages of content creation, where suggested modifications could be implemented at lower cost. For example, a linguistic analysis of gender ladenness (i.e., the degree to which language may be perceived as feminine or masculine [27]) in movie scripts found that romantic movies tend to include language with a higher degree of feminine association, whereas action movies tend to include language with a higher degree of masculine association [28]. With respect to characters’ dialogues, linguistic analysis studies have found that male characters are associated with a higher number of words related to achievement, whereas female characters are usually written with more positive language, lower agency, and less power than their male counterparts [26, 29]. Other types of studies center around the social network inferred from scene sharing among characters. These studies demonstrate that, with a few exceptions, men play almost all central roles across all genres, and for every three characters interacting in a movie, at least two are men [25, 26].

These automatic approaches come with their own limitations. For example, even though there is a general consensus among researchers that gender lies on a spectrum, most automated media content tools are limited to identifying only the female–male dyad, a reduction that some might argue is overly simplistic [30]. Another limiting factor is that gender stereotypes are not bound to dialogue or scene co-appearance; they can also be communicated through the actions and behaviors of the characters [31]. For instance, consider the following three common stereotypes. First, the tomboy, a stereotype embodied by girls who are interested in science, vehicle mechanics, sports or other gender non-conforming behaviors or appearances. Second, the spinster or crazy cat lady, an unmarried woman of a certain age whose sole narrative arc centers around her quest to find a partner. Finally, the scary angry man, which often portrays persons of color as innately savage, animalistic, destructive, and criminal. These stereotypes are rarely described explicitly through the characters’ dialogues, and similarly one would be hard pressed to infer them from the characters’ scene co-appearance networks. Instead, audiences tend to infer the implied stereotypes through a complex understanding of the characters’ appearance, as well as their actions and behaviors [32, 33]. With this in mind, we believe there is an open opportunity for a complementary analysis based on characters’ actions to better understand the pervasiveness of harmful gender representations in media. This work, to the best of our knowledge, is the first to address this gap.

Current research

In this work, we present a large-scale media content analysis framework to uncover differences in how characters engage in actions depending on their assumed gender expression. To this end, we perform three steps: (i) data collection and annotation; (ii) machine-learning modeling; and (iii) statistical analysis. Fig 1 presents an overview of the complete process. In the following, we briefly describe each of these steps.

Fig 1. Computational linguistic lens into gendered actions in film.


Our framework starts with a dataset of over 1.2 million movie descriptors from 912 movies and then implements three steps. In the first step, the annotation process, we collect manual annotations for 9,613 descriptions and over 1.5 million gender expression labels for characters. In the second step, we develop a machine learning model, trained on a large set of general-domain documents and fine-tuned on a manually labeled description set, that identifies actions, agents and patients from the natural language descriptions found in movie scripts. We use this model to automatically label the complete dataset for analysis. In the third and final step, we perform a series of statistical analyses to uncover portrayal differences along characters’ portrayed attributes.

Data collection and annotation

We start from a collection of 912 movie scripts from which we extracted 131,954 scenes with 1,242,107 action descriptors (i.e., sentences that describe an action occurring during a scene). We design a task to collect manual annotations for a small sample of these action descriptions. This provides us with a manually labeled dataset for actions and their constituents (i.e., the agents and patients engaged in the action). Additionally, to identify the assumed gender expression of each character in the movie scripts, we design a heuristic method that relies on proper names, gendered pronouns and manually labeled examples.

Machine-learning modeling

While the previous step can provide reliable labels, it can only do so for a small subset of the action descriptions. To scale our approach, we develop a state-of-the-art machine-learning model that identifies characters engaged in actions from the natural language descriptors found in the movie scripts. Our approach leverages transformers, particularly the recent developments in large-scale contextual language models (BERT) [34]. We fine-tune the pre-trained general-knowledge BERT model to allow the model to learn the different ways in which scriptwriters describe character actions and behaviors. We use the model to automatically label the remainder of the descriptions for actions, as well as the characters acting as agents and patients of the action. This process yields a set of 1.2M+ automatically labeled descriptors, containing over 50,000 different actions with over 20,000 different participants.

Statistical analysis

As a final part of this work, we show how this framework can uncover portrayal differences along character attributes such as gender. To this end, we perform a series of statistical analyses over the 1.2+ million action descriptions to estimate the frequency of portrayal as a function of role and gender. As part of our results, we provide insights into some of the nuanced ways in which certain stereotypes are communicated through character actions and behaviors. Based on previous literature, we categorize our findings on the differences in portrayals into three main groups: (i) how female characters are often depicted with less agency than male characters [29]; (ii) the emphasis placed on female appearance and the sexual objectification of women actors [35, 36]; and (iii) how gender plays a role in the frequency of affective portrayals, either by frequently casting women into overly emotional roles [12, 37, 38] or by a clear absence of male portrayals of affection [39, 40].

These analyses, and the insights obtained, exemplify how the computational framework can be beneficial for film makers, scriptwriters, and the broader society in terms of creating awareness about media portrayals from an equity lens.

In summary, the contributions of this work include:

  1. the design, construction and public sharing of an action description dataset with over 1.2 million descriptions obtained from 912 movie scripts,

  2. the proposal of a machine learning method to identify actions, agents and patients from linguistic cues found in the action descriptions of a screenplay, and

  3. the implementation of a statistical framework over the 1.2 million action descriptions to highlight biases in the gendered portrayals of actions, agents and patients in media.

To provide other researchers with the opportunity to explore additional hypotheses, we have made the dataset and labeling models available for download at https://sail.usc.edu/~ccmi/actions-agents-and-patients/.

Data collection and annotation

In this section we describe the process used to obtain manual annotations for actions and the characters engaged in these actions as agents or patients. Furthermore, we describe the heuristics used to identify the character’s assumed gender expressions from the language used throughout the movie script.

Our analysis is performed over a dataset of 1.2 million action descriptions obtained from a publicly available collection of 912 movie scripts covering over 31 genres and 104 years of movie productions (1909–2013) [41]. This collection of movie scripts has been widely accepted and validated by the research community, particularly in the analysis of character portrayals [29, 42]. Thus, we believe this corpus provides a representative sample of the typical character actions and behaviors on screen. Moreover, it is one of the few resources publicly available to provide human-validated named entity recognition, co-reference resolution, and gender information for each of the main characters in each movie script.

Action description dataset

From each movie script, we collect all of its action descriptions. Action descriptions are paragraphs within a scene that tell the reader what is going to happen on the screen and describe the characters and their actions. We focus on these descriptions as a complementary approach to prior works based on character dialogue [25, 26, 29]. We split each action paragraph into sentences and identify the actions (verbs) using spaCy [43]. This process yields a total of 1,242,107 sentences (μ = 1363.45, σ = 560.03, M = 1313 per movie), 1,634,230 predicates and 84,513 unique actions. Table 1 presents descriptive statistics of the constructed dataset.

Table 1. Descriptive statistics for the action description dataset.

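| | Total | Min | Mean | Median | Max | Std. |
|---|---|---|---|---|---|---|
| Movies | 912 | | | | | |
| Genres | 31 | | | | | |
| Genres per movie | | 2 | 3.00 | 3.00 | 7 | 0.95 |
| Scenes | 131,954 | | | | | |
| Scenes per movie | | 1 | 1363.45 | 136.00 | 646 | 68.52 |
| Action descriptions | 1,242,107 | | | | | |
| Action descriptions per movie | | 1 | 1363.45 | 1313.00 | 4901 | 560.03 |
| Actions | 1,634,230 | | | | | |
| Actions per movie | | 1 | 1797.83 | 1765.00 | 4057 | 633.98 |
| Actions per action description | | 0 | 1.32 | 1.00 | 16 | 1.06 |
| Agents | 1,175,237 | | | | | |
| Agents per movie | | 1 | 1008.93 | 977.00 | 3061 | 361.33 |
| Agents per action description | | 0 | 0.95 | 1.00 | 11 | 0.56 |
| Patients | 960,970 | | | | | |
| Patients per movie | | 2 | 1056.01 | 1000.50 | 4286 | 464.94 |
| Patients per action description | | 0 | 0.77 | 1.00 | 14 | 0.89 |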

Manual annotation

We select a sample of 12,500 sentences to be coded by human annotators. This sample is constructed by rejection sampling, where each sentence must contain at least one verb (action). Our annotation procedure consisted of two tasks: labeling and verification. The following sections describe each task in detail.

A total of 981 annotators were hired through Mechanical Turk. Of these, 602 were assigned to the labeling task and 379 to the verification task. On average, annotators took about a minute to complete a labeling task (μ = 50.21 s; σ = 173.69 s) and roughly half that time to verify an annotated result (μ = 28.49 s; σ = 30.78 s). The remuneration scheme was devised to ensure that annotators received at least the hourly minimum wage in the U.S. The payment per task was calculated by prorating the current (2021) minimum wage per hour ($7.25) over the expected time needed to complete a single annotation (∼1 minute).

Labeling

For the labeling task, we present non-expert annotators with a sentence and ask them to identify the agents (or patients) for a particular action (verb). To ensure that annotators label only characters (and not inanimate objects), our annotation style follows the work of [44], where the syntactic heads of the constituents are annotated instead of their full extent. Moreover, we simplify the annotators’ task by providing two pieces of information: a pre-selected action and a list of possible entities (see Fig 2). The pre-selected action is obtained from the part-of-speech tags provided by the corpus. For each verb, we created a separate annotation task in which annotators identify the agents and patients for that particular action. For the list of entities, however, we were not able to provide all possible characters, as accurately identifying the literary figures in a text is still an open research problem [45]. Instead, we provided our best attempt at a reduced list of possible characters by identifying entities that are more likely to be used as character names. To construct this list, we start by filtering out words that are not pronouns, proper nouns, nouns, or noun phrases. From the remainder, we remove most of the common words, since these are not normally used to refer to a character (e.g., door, eyes, room, hand, car, head, floor). For the special case of honorifics (such as Mr. or Miss), we follow [45] in considering these tokens as part of an entire maximal span (e.g., [Mr. Collins] or [Miss Havisham]). We include this maximal span as a single entry in our select box. For cases where the agent (or patient) is not explicitly stated in the sentence, annotators have the option to check the ‘Does not say’ box.
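The construction of the candidate list can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tag names (`PRON`, `PROPN`, `NOUN`), the `HONORIFICS` set, the short `COMMON_WORDS` list, and the function `candidate_entities` are all hypothetical, and noun-phrase handling is omitted.

```python
# Hypothetical inputs: (token, POS tag) pairs. The tag names and the two
# word lists below are illustrative assumptions, not the paper's resources.
COMMON_WORDS = {"door", "eyes", "room", "hand", "car", "head", "floor"}
HONORIFICS = {"Mr.", "Mrs.", "Miss", "Ms.", "Dr."}
CANDIDATE_TAGS = {"PRON", "PROPN", "NOUN"}


def candidate_entities(tagged_tokens):
    """Return a reduced list of entities that may refer to characters."""
    candidates = []
    i = 0
    while i < len(tagged_tokens):
        token, tag = tagged_tokens[i]
        # An honorific plus the proper noun(s) after it form one maximal span.
        if token in HONORIFICS:
            span = [token]
            j = i + 1
            while j < len(tagged_tokens) and tagged_tokens[j][1] == "PROPN":
                span.append(tagged_tokens[j][0])
                j += 1
            if len(span) > 1:
                candidates.append(" ".join(span))
                i = j
                continue
        # Keep pronouns, proper nouns and nouns that are not common words.
        if tag in CANDIDATE_TAGS and token.lower() not in COMMON_WORDS:
            candidates.append(token)
        i += 1
    return candidates


sentence = [("Miss", "PROPN"), ("Havisham", "PROPN"), ("opens", "VERB"),
            ("the", "DET"), ("door", "NOUN"), ("for", "ADP"), ("him", "PRON")]
print(candidate_entities(sentence))
```

In this toy sentence the honorific span is kept as a single entry, while "door" is dropped as a common word.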

Fig 2. Labeling task.


Annotators are presented with a sentence and an action. They are asked to select the agent (source) and patient (target) of the action. For cases where one of these is missing, the annotator has the option to check the ‘Does not say’ box.

Each sentence was annotated by 3 non-experts and their agreement is used as the presumptive label for the next stage. If there was no agreement between the annotators, we discarded the sentence from the sample.
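A minimal sketch of this aggregation step is shown below. The text does not say whether "agreement" means a majority or a unanimous vote; the sketch assumes a majority of at least two of the three annotators, and the function name `presumptive_label`, the sentence identifiers, and the votes are hypothetical.

```python
from collections import Counter


def presumptive_label(annotations, min_votes=2):
    """Return the label chosen by at least `min_votes` annotators, else None."""
    label, votes = Counter(annotations).most_common(1)[0]
    return label if votes >= min_votes else None


# Hypothetical agent annotations from three annotators per sentence.
agent_votes = {
    "s1": ["Boromir", "Boromir", "Gandalf"],   # two annotators agree
    "s2": ["Frodo", "Sam", "Does not say"],    # no agreement: discarded
}

kept = {}
for sentence_id, votes in agent_votes.items():
    label = presumptive_label(votes)
    if label is not None:
        kept[sentence_id] = label

print(kept)
```

Setting `min_votes=3` would model the stricter, unanimous reading of the same rule.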

Verification

In this stage, we ask another annotator to verify whether these labels are correct. We present a single sentence, the pre-selected action, and the presumptive labels for agents and patients where applicable, or a ‘Does not say’ string otherwise. The annotator is prompted with a single question: “Is [AGENT] doing the [ACTION] to [PATIENT]?” and radio buttons for Yes, No or Does not say. If the annotator does not agree with the result, or cannot say whether it is correct, we discard that sentence from consideration.

As a final quality control step, one of the authors checked all the sentences and verified the labels for consistency. We decided to follow this multi-tiered approach to ensure a high level of annotation quality, one that guarantees a marked distinction between agents and patients. This distinction becomes paramount if one considers that the models we will be working with do not have an inherent notion of what a character is. Hence, if the annotation results in low-quality labels, we run the risk of having a model that picks any word from the predicate as the patient for a given action. As we will discuss in further sections, our two-step verification was needed to ensure that most of the labels were correct and to identify unreliable annotators.

Inferring assumed gender for characters

To obtain a character’s assumed gender expression, we follow a hierarchical heuristic approach. This approach is heavily informed by prior work in the same domain [25, 26, 29]. Our gender estimation method proceeds as follows: first, for movie scripts of an already produced film, we obtain the character’s gender from the casting of that role as listed on IMDb (http://imdb.com). For the remainder of the characters, we rely on the following heuristics:

  1. For proper names, we estimate gender using historical U.S. census data [46].

  2. We use gendered pronouns as markers of that character’s gender. The set of pronouns was selected from Twenge et al. [47], and includes the following words: female pronouns (e.g., she, hers, her, and herself); male pronouns (e.g., he, his, him, and himself), and neutral (or plural) pronouns (e.g., we, they, them).

  3. We look up the reference in a manually collected word list containing the 3,000 most frequent words and their gender. This list includes gendered words for family and relationships (e.g., uncle, aunt, wife, husband), common gendered nouns and other words where gender is presumed evident (e.g., boy, gal, policeman, congresswoman).
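The hierarchy above can be sketched as an ordered sequence of lookups. The tables below are tiny hypothetical stand-ins for the resources named in the text (cast information, census name statistics, pronouns, and the word list), and `infer_gender` is an illustrative name, so this should be read as an outline of the control flow rather than the authors' code.

```python
# Tiny hypothetical lookup tables standing in for the resources named in the
# text: cast information, census name statistics, pronouns, and a word list.
CAST_GENDER = {"jane doe": "female"}
NAME_GENDER = {"john": "male", "mary": "female"}
PRONOUN_GENDER = {
    "she": "female", "hers": "female", "her": "female", "herself": "female",
    "he": "male", "his": "male", "him": "male", "himself": "male",
    "we": "neutral", "they": "neutral", "them": "neutral",
}
WORD_GENDER = {"uncle": "male", "aunt": "female",
               "policeman": "male", "congresswoman": "female"}

# The order encodes the hierarchy: earlier tables take precedence.
HIERARCHY = [CAST_GENDER, NAME_GENDER, PRONOUN_GENDER, WORD_GENDER]


def infer_gender(reference):
    """Return the first gender found while walking down the hierarchy."""
    key = reference.strip().lower()
    for table in HIERARCHY:
        if key in table:
            return table[key]
    return "unknown"


for ref in ["Jane Doe", "Mary", "they", "policeman", "lamp"]:
    print(ref, "->", infer_gender(ref))
```

References that match none of the tables fall through to "unknown", which corresponds to the unresolved cases discussed in the text.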

From the total of n = 2,136,207 character instances, our hierarchical heuristic method is able to infer the gender for 71.64% of instances (n = 1,530,587). Of these, 917,114 correspond to agents and 613,473 to patients (see Table 2). In line with previous research, male characters outnumber female characters by about 2 to 1 in our sample [10, 48]. We manually identify 49,885 cases of character references that should have a gender but for which our heuristics were not able to determine it. A majority of these cases (56.4%) are unresolved co-references (e.g., I, you, me), and could be addressed in future work when appropriate literary co-reference systems become widely available.

Table 2. Gender distribution of agents and patients.

Our sample displays a 2-to-1 ratio of male to female characters. Proportion tests reveal an unequal proportion of Male agents to Female agents, and Female patients to Male patients (p < 0.0001).

| | Female | Neutral | Male | Unknown | Total |
|---|---|---|---|---|---|
| Agent | 227,196 | 176,596 | 479,136 | 34,186 | 917,114 |
| Patient | 144,890 | 168,092 | 284,757 | 15,734 | 613,473 |
| Total | 372,086 | 344,688 | 763,893 | 49,920 | 1,530,587 |
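The caption of Table 2 does not specify which proportion test was used. As one plausible reading, the sketch below runs a two-sided one-sample z-test of whether the female share among gendered agents (and, separately, patients) differs from one half. The function name `proportion_z_test` and the choice to leave out the Neutral and Unknown columns are assumptions made for illustration.

```python
import math


def proportion_z_test(successes, total, p0=0.5):
    """Two-sided one-sample z-test of H0: proportion == p0."""
    p_hat = successes / total
    standard_error = math.sqrt(p0 * (1 - p0) / total)
    z = (p_hat - p0) / standard_error
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_hat, z, p_value


# Female and male counts from Table 2 (Neutral and Unknown are left out).
counts = {"Agent": (227_196, 479_136), "Patient": (144_890, 284_757)}

results = {}
for role, (female, male) in counts.items():
    results[role] = proportion_z_test(female, female + male)
    p_hat, z, p_value = results[role]
    print(f"{role}: female share = {p_hat:.3f}, z = {z:.1f}, "
          f"p < 0.0001: {p_value < 1e-4}")
```

With samples this large, even small departures from equal proportions yield very small p-values, so the effect size (the share itself) is the more informative quantity.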

Gender estimation performance

We estimate the performance of our gender estimation through a manual verification process. This process involved a manual inspection of the dataset to collect 400 character names alongside their gender (i.e., non-gendered, male, female, and neutral). We then verify that our heuristics are able to infer the correct gender for each of these instances as a classification task. We report classification performance as part of our results.

Machine-learning model

In this section we provide an overview of the computational model that identifies the action and its constituents (agents and patients). This model is based on the current state-of-the-art BERT-based models for semantic role labeling (SRL) [49]. Additionally, we present the steps performed for domain-adaptation, which results in a significant improvement in performance over competitive baselines.

Automatic identification of character actions

We frame the problem of automatically identifying the set of actions and their participating characters as a Semantic Role Labeling (SRL) task, with a few differences. Given a sentence, the SRL task consists of analyzing the propositions expressed by some target verbs of the sentence. In particular, for each target verb, all constituents in the sentence that fill a semantic role of the verb have to be recognized. Typical semantic arguments include Agent, Patient, Instrument, etc., as well as adjuncts such as Locative, Temporal, Manner, Cause, etc. [50].

According to Shi et al. [49], a typical formulation of the SRL task is split into four subtasks: predicate detection, predicate sense disambiguation, argument identification, and argument classification. We start from the assumption that there are models that support the first two subtasks in a reliable manner. Specifically, in our experiments, we use spaCy [43] for predicate identification, which allows us to focus entirely on the argument identification and classification subtasks. Furthermore, in contrast to the traditional SRL task [50], we are only interested in the characters performing the action and those that are the object of an action. Hence, our label set can be restricted to actions, agents, and patients only. Another point of contrast is that we explicitly make the distinction between objects (inanimate) and patients (characters).

Proposed model

We follow Shi et al. [49] in applying a simple yet powerful recipe for SRL: obtain word vector representations from a pretrained BERT-based architecture to train a Recurrent Neural Network for sequence labeling. Our proposed model (see Fig 3) learns to map the sequence of tokens from an input sentence to a sequence of labels for actions, agents, and patients. Inputs to our model are sentences, which are tokenized and fed into a BERT model to obtain highly contextualized word representations. These representations are then used as input to a Recurrent Neural Network to produce a sequence of token-level labels.

Fig 3. Proposed SRL system.


Starting at the bottom, the input to the system is an action description in natural language. The output, shown at the top of the figure, is a sequence of labels (one per word). Labels indicate whether this word depicts an action, agent, patient or none. From its inputs, our model obtains a highly-contextualized representation for each word using the BERT transformer [53]. Each representation corresponds to a high dimensional dense vector that encodes the semantics of that word and the context it plays within the sentence. The sequence of vector representations is then fed into a recurrent neural network and a softmax layer for sequence labeling. As a post-processing step, a set of heuristics aggregate conjunctions to handle the case of groups of agents or patients.

In contrast to Shi et al. [49], who input the predicate as an additional feature to the model, our current setup restricts role labeling to a single predicate per sentence. If a sentence has more than one predicate, we create a separate copy for each predicate; this same setting was applied by Daza and Frank [51] and Zhou et al. [52]. In the following sections, we provide further details on the steps taken by our model.
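The one-copy-per-predicate setup can be sketched as follows. The part-of-speech tags are assumed to come from an upstream tagger, and the helper name `expand_by_predicate`, the `VERB` tag, and the example sentence are hypothetical. How the selected predicate is signaled to the model is not specified in the text, so the sketch only records its position.

```python
def expand_by_predicate(tokens, pos_tags):
    """Return one (tokens, predicate_index) instance per verb in the sentence."""
    return [(tokens, i) for i, tag in enumerate(pos_tags) if tag == "VERB"]


# Hypothetical sentence with two predicates and pre-computed POS tags.
tokens = ["Anna", "grabs", "the", "keys", "and", "kisses", "Tom"]
pos_tags = ["PROPN", "VERB", "DET", "NOUN", "CCONJ", "VERB", "PROPN"]

instances = expand_by_predicate(tokens, pos_tags)
for sentence, predicate_index in instances:
    print(sentence[predicate_index], "<-", " ".join(sentence))
```

Each instance is then labeled independently, so the agents and patients of "grabs" and "kisses" are predicted in separate passes.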

Input representation

We selected a BERT model for our input representation because of its remarkable success on a variety of NLP tasks, such as question answering, dialogue systems, and information extraction [53]. In this work, we start from the original BERT model (https://github.com/google-research/bert) trained for the general domain on a large unlabeled plain-text corpus, namely the complete English Wikipedia and BookCorpus.

We follow the traditional format used for sentence encoding in the BERT transformer [53]. To obtain an input representation, we feed an action description into the model as an input sequence. This sequence starts with a sentence delimiter ([CLS]) and ends in a separator delimiter ([SEP]), as follows:

$$[\mathrm{CLS}]\ \mathrm{word}_1\ \mathrm{word}_2\ \mathrm{word}_3\ \ldots\ \mathrm{word}_k\ [\mathrm{SEP}]$$

Our first step is to tokenize the sentence elements using WordPiece [54]. The WordPiece tokens are fed into pre-trained BERT models from which we obtain one vector representation for each of the tokens. Formally, the sentence representation step maps a delimited sequence of k words into n WordPiece tokens, as follows

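$$[\mathrm{CLS}],\ w_1,\ w_2,\ \ldots,\ w_k,\ [\mathrm{SEP}] \;\rightarrow\; t_{[\mathrm{CLS}]},\ t_1,\ t_2,\ \ldots,\ t_n,\ t_{[\mathrm{SEP}]} \tag{1}$$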

Note that the length of these sequences might not be the same as the tokenizer might split a word into multiple sub-word tokens. The token sequence is then mapped into the representation sequence by the BERT model. Let H denote the sequence of BERT representations, given by

$$H = \{h_{[\mathrm{CLS}]},\ h_1,\ h_2,\ \ldots,\ h_n,\ h_{[\mathrm{SEP}]}\} \tag{2}$$

The dimension of each $h_i$ is given by the BERT model, and corresponds to 768 and 1024 for the bert-base and bert-large models, respectively. The sequence of highly-contextualized vector representations is then fed into an RNN for token-level prediction.
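To illustrate why the number of tokens n can exceed the number of words k, the sketch below implements the greedy longest-match-first splitting used by WordPiece over a toy vocabulary. The vocabulary, the example sentence, and the function names are hypothetical; a real BERT vocabulary is far larger, and the pre-trained tokenizer also handles punctuation and casing rules that are omitted here.

```python
# Toy vocabulary (hypothetical). "##" marks a piece that continues a word.
VOCAB = {"[CLS]", "[SEP]", "[UNK]", "boromir", "look", "##s", "at",
         "gan", "##dal", "##f"}


def wordpiece(word, vocab=VOCAB):
    """Split one word into sub-word pieces by greedy longest match."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:          # no piece matches: the word is unknown
            return ["[UNK]"]
        pieces.append(piece)
        start = end
    return pieces


def encode(sentence):
    """Lowercase, split on whitespace, and wrap in [CLS] ... [SEP]."""
    tokens = ["[CLS]"]
    for word in sentence.lower().split():
        tokens.extend(wordpiece(word))
    tokens.append("[SEP]")
    return tokens


tokens = encode("Boromir looks at Gandalf")
print(tokens)
```

Here k = 4 words become n = 7 WordPiece tokens (excluding the delimiters), and BERT produces one vector per token.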

Token-level prediction

We use a bidirectional recurrent neural network (RNN) to obtain the semantic role labels for each word in a sentence. RNNs are a class of neural networks specialized in processing sequences of inputs. With each input, the RNN updates its internal state (memory) and produces a probability distribution over the labels for that input. Here, we feed the output of BERT (Eq 2) into the RNN layer as a sequence of tokens, for which we obtain a sequence of probability distributions. To obtain the sequence of SRL labels (i.e., Verb, Agent, and Patient), each token is assigned the label with the maximum (posterior) probability. For the RNN, we explore two popular configurations: long short-term memory cells (LSTM) [55] and gated recurrent units (GRU) [56].

Formally, the sequence obtained from BERT, $H$, is then fed into a bidirectional RNN to learn a mapping between the tokens and the SRL labels of interest. The RNN takes the sequence of representations $H$ and outputs a sequence of hidden vectors $\{v_{[\mathrm{CLS}]}, v_1, \ldots, v_n, v_{[\mathrm{SEP}]}\}$. Each $v_i$ is constructed as the concatenation of the left-side and right-side context, $v_i = [\overrightarrow{v_i}; \overleftarrow{v_i}]$. To predict a label, we use a fully connected dense layer and a softmax function over all labels:

$$s_i = \phi(v_i), \qquad \hat{y} = \mathrm{softmax}(s_i) \tag{3}$$

where ϕ is the activation function. In our experiments, this function corresponds to a linear activation function ϕ(x) = xA + b, where A and b are learnable parameters of the model. Finally, the complete model is trained using a weighted cross-entropy loss, where $w_C$ is the weight associated with the class C:

$$L(y, \hat{y}) = w_C\left(-y[C] + \log\left(\sum_{j} e^{\hat{y}[j]}\right)\right)$$
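A small numerical sketch of the prediction layer and the loss is given below, using only the Python standard library. The dimensions, parameter values, and class weights are invented for illustration. The sketch applies the log-sum-exp form of the loss to the unnormalized scores $s_i$, which is how common deep learning libraries implement weighted cross-entropy; treating the loss this way is an assumption about the authors' setup.

```python
import math

LABELS = ["None", "Action", "Agent", "Patient"]


def linear(v, A, b):
    """phi(v) = vA + b for a row vector v and a matrix A of shape d x |labels|."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) + b[j]
            for j in range(len(b))]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(scores, target, weights):
    """w_C * (-scores[C] + log(sum_j exp(scores[j]))) for target class C."""
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scores))
    return weights[target] * (-scores[target] + log_sum_exp)


# Hypothetical 2-dimensional hidden vector and layer parameters.
v = [1.0, -1.0]
A = [[0.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
b = [0.0, 0.0, 0.0, 0.0]
class_weights = [0.5, 2.0, 2.0, 2.0]   # illustrative: down-weight "None"

scores = linear(v, A, b)
probs = softmax(scores)
predicted = LABELS[probs.index(max(probs))]
loss = weighted_cross_entropy(scores, LABELS.index("Action"), class_weights)
print(predicted, round(loss, 4))
```

The loss equals the class weight times the negative log of the softmax probability of the target class, which is why the two functions share the same stabilizing shift by the maximum score.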

Post-processing

From the SRL system, we collect two outputs: the subword WordPiece token representation of the sentence, and the sequence of token-level labels. To recover the word-level sentence, we post-process the outputs of the RNN (Eq 3) by removing the special tokens introduced by BERT (i.e., [CLS], [SEP], [PAD] and [MASK]) and merging WordPiece tokens back into words. The word-level label is calculated as the mode of the labels of its corresponding tokens. Additionally, we post-process the data further to accommodate cases where agents and patients are composed of more than one word. Examples of this include honorifics (e.g., ‘Mr. Anderson’, ‘Captain Crunch’) and expressions with more than one character (e.g., ‘Mr. Smith and his wife’). We restructure our word sequence by (i) concatenating consecutive words with the same label into a single term, and (ii) merging consecutive terms that are joined by a conjunction. The label for a word group is set equal to the label of its left-most word. For example, after applying our model and post-processing procedure to the sentence presented in Fig 3, we are able to discern that “Boromir” is the agent of the action “look”, and that the patients correspond to “Elrond and Gandalf”.
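The post-processing steps can be sketched as three small functions. The token labels in the example are invented, the conjunction list is illustrative, and two details are assumptions: ties in the mode are broken in favor of the label seen first, and two terms are merged across a conjunction whenever both carry a role label.

```python
from collections import Counter

SPECIAL = {"[CLS]", "[SEP]", "[PAD]", "[MASK]"}
CONJUNCTIONS = {"and", "or"}   # illustrative list


def merge_wordpieces(tokens, labels):
    """Drop special tokens, rebuild words, and label each word by the mode."""
    words = []
    for token, label in zip(tokens, labels):
        if token in SPECIAL:
            continue
        if token.startswith("##") and words:
            words[-1][0] += token[2:]
            words[-1][1].append(label)
        else:
            words.append([token, [label]])
    # most_common keeps first-seen order among ties (assumed tie-break).
    return [(w, Counter(labs).most_common(1)[0][0]) for w, labs in words]


def group_terms(words):
    """Join consecutive words that share the same role label into one term."""
    terms = []
    for word, label in words:
        if terms and label != "None" and terms[-1][1] == label:
            terms[-1] = (terms[-1][0] + " " + word, label)
        else:
            terms.append((word, label))
    return terms


def merge_conjunctions(terms):
    """Merge 'X and Y' into one term that keeps the left-most label."""
    merged, i = [], 0
    while i < len(terms):
        term, label = terms[i]
        while (i + 2 < len(terms) and label != "None"
               and terms[i + 1][0] in CONJUNCTIONS
               and terms[i + 2][1] != "None"):
            term = f"{term} {terms[i + 1][0]} {terms[i + 2][0]}"
            i += 2
        merged.append((term, label))
        i += 1
    return merged


tokens = ["[CLS]", "boromir", "look", "##s", "at", "el", "##rond", "and",
          "gan", "##dal", "##f", "[SEP]"]
labels = ["None", "Agent", "Action", "Action", "None", "Patient", "Patient",
          "None", "Patient", "Patient", "None", "None"]

result = merge_conjunctions(group_terms(merge_wordpieces(tokens, labels)))
print(result)
```

In the example, the stray "None" label on the last piece of "gandalf" is outvoted by the mode, and the two patients are merged into a single group.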

Fine-tuning

Fine-tuning aims to adjust the BERT language model and its vocabulary to the way language is used in a particular domain, without offsetting the generalization power of the original model. This procedure typically results in models that achieve state-of-the-art performance on domain-related tasks. In this work, we fine-tune the BERT language models by continuing to train the model end-to-end on the action description dataset described in the previous section. In end-to-end training, we back-propagate the errors from the sequence labeling task through the network, all the way through the BERT transformer layers (yellow box in Fig 3). This updates the BERT layer parameters, adapting them to the patterns in the language of the movie scripts.

Model performance comparison

Model performance was estimated over the manually annotated dataset of n = 9,613 action descriptions. For training, we used 75% of the available data (n = 7,209). Performance was estimated on 15% of the data, held back as a test set (n = 1,419). A development set (10%, n = 985) was used for parameter optimization and early stopping. Our experiments measure the model’s ability to correctly categorize each token by its corresponding semantic role label. Hence, this can be seen as a 4-way classification task (i.e., Action, Agent, Patient, and None). We report the average accuracy and micro-average F-score for this classification task.
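The two metrics can be sketched for token-level labels as follows. The gold and predicted sequences are invented. Note that when every class is included, the micro-averaged F-score of a single-label task equals accuracy, so the sketch also shows a variant restricted to the role labels; whether the reported scores exclude the None class is not stated in the text and is an assumption here.

```python
def accuracy(gold, pred):
    """Fraction of tokens whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def micro_f1(gold, pred, labels):
    """Micro-averaged F1: pool TP, FP and FN over the given labels."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if g == p:
            if p in labels:
                tp += 1
        else:
            if p in labels:
                fp += 1   # predicted a counted label that was wrong
            if g in labels:
                fn += 1   # missed a counted gold label
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


ALL_LABELS = {"Action", "Agent", "Patient", "None"}
ROLE_LABELS = {"Action", "Agent", "Patient"}

# Hypothetical token-level labels for one short sentence.
gold = ["Agent", "Action", "None", "Patient", "None", "Patient"]
pred = ["Agent", "Action", "None", "None", "Patient", "Patient"]

acc = accuracy(gold, pred)
f1_all = micro_f1(gold, pred, ALL_LABELS)
f1_roles = micro_f1(gold, pred, ROLE_LABELS)
print(acc, f1_all, f1_roles)
```

Restricting the average to role labels keeps the frequent None class from dominating the score.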

Baselines

We compare the performance of our proposed approach to three baseline models, selected based on previous literature on the SRL task. Our first baseline corresponds to a named-entity based approach proposed by Sap et al. [29], where we leverage part-of-speech tags and syntactic dependency trees to identify actions, agents and patients. The second and third baselines follow state-of-the-art BERT-based models for semantic role labeling (SRL): SimpleBERT [49] and AllenNLP SRL [57]. Even though these architectures are quite similar to ours, there are a few differences between the approaches, particularly with respect to the inputs and the sequence labeling layer. First, AllenNLP uses a time-distributed linear dense layer to classify the sequence of outputs from the BERT system into the sequence of semantic role labels. In contrast, SimpleBERT and our method use a single RNN layer, as this might be better adapted to handling sequence data. Second, both SimpleBERT and AllenNLP extend the sentence representation of BERT to include the current predicate, as a way of informing the model which action to attend to. Instead, we restrict our inputs to a single predicate per sentence. Finally, we note that SimpleBERT and AllenNLP are trained on non-literary data. Hence, these models serve as a direct comparison to the performance of out-of-the-box state-of-the-art SRL systems when applied to a novel real-world application.

We control for predicate identification as a possible source of noise by presenting two versions of each baseline. In the first version, the models are only provided with the action description and have to perform both predicate identification and semantic role labeling. In the second version, we provide the sentence and the gold standard predicate, so the models only need to do the semantic role labeling task. We refer to the second version as the oracle predicate version.

Statistical analysis

We propose a statistical model to identify significant differences in the frequency of the action portrayals due to the role and gender of its participants. We do this through a series of four studies. These studies explain the response variable (action frequency) as a function of different independent variables. The first and second studies examine the relation between action frequency and agent’s or patient’s gender independently. We denote the results from these studies by (S1: agent-only) and (S2: patient-only), respectively. The third study investigates the effects of the agent’s and patient’s gender simultaneously by including an interaction term as part of the co-variates. This model highlights actions that occur more frequently only when the agent and the patient each have a particular gender. We denote this model by (S3: agent & patient). Finally, our fourth study examines whether action portrayals depend on film trends over the years. We denote this study by (S4: agent & patient + time).

In each study, a generalized mixed-effects linear regression model is used to relate the genders and roles to the frequency of portrayals. We describe our statistical formulation in the following sections.

Regression models

We start from the assumption that if there is no difference between the number of times an action is portrayed with respect to any particular gender, then this action does not reflect a gender stereotype. This gives us a natural framework for posing the problem of identifying stereotypes as a regression over the frequency of actions and the gender of its participants.

To uncover this relation, we use a Poisson-regression generalized linear mixed model (GLMM; [58]). GLMMs are an extension of generalized linear models (e.g., logistic regression) to include both fixed and random effects. A mixed-effects approach was chosen over a fixed-effects-only approach because our data contain repeated measurements (i.e., a single character can participate in multiple actions, multiple times). Furthermore, we use a Poisson regression as it is particularly useful for response variables that represent counts or frequencies [59].

The formal specification of the GLMM is as follows. Let $Y = [y_1, y_2, \ldots, y_N]$ be our response variable. Each $y_i$ corresponds to the number of times we see action $i$ in our dataset. We assume that $Y$ follows a multi-variate Poisson distribution, and we model the expected value as a linear combination of unknown parameters,

$$\mathbb{E}[Y \mid \gamma] = \eta, \qquad g(\eta) = X\beta + Z\gamma \tag{4}$$

where X and β are the fixed effects design matrix and the fixed effects, and Z and γ are the random effects design matrix and the random effects. The link function g corresponds to the log-link function log. We vary our predictor variables X and random effects matrix Z according to each study, as we describe in the forthcoming sections.
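Because g is the log link, Eq 4 can be unpacked for a single observation as shown below. This is a standard property of log-linear count models rather than anything specific to this work; the row vectors $x_i$ and $z_i$ (the i-th rows of X and Z) are notation assumed here for illustration.

```latex
\log \mathbb{E}[y_i \mid \gamma] = x_i^{\top}\beta + z_i^{\top}\gamma
\quad\Longleftrightarrow\quad
\mathbb{E}[y_i \mid \gamma] = \exp\!\left(x_i^{\top}\beta\right)\,\exp\!\left(z_i^{\top}\gamma\right)
```

Holding all other terms fixed, a one-unit change in the j-th predictor multiplies the expected count by $\exp(\beta_j)$. For instance, a coefficient of $\beta_j = -1.47$ corresponds to a factor of $\exp(-1.47) \approx 0.23$ relative to the reference level, while positive coefficients correspond to factors above 1. How a given coefficient should be read still depends on the encoding of the categorical predictors and on any interaction terms in the model.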

Parameter estimation

Regression parameters are estimated by deviance minimization, a generalization of the idea of using the sum of squares of residuals in ordinary least squares to cases where model-fitting is achieved by maximum likelihood [58]. Additionally, we identify statistically significant coefficients through a series of hypothesis tests on the coefficients. For each coefficient, we perform a Z-test and correct for multiple comparisons using the Holm-Bonferroni method.
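The testing step can be illustrated with a short sketch. It assumes a Wald-type statistic (coefficient divided by its standard error) with a two-sided p-value, which the text does not spell out; the function names and the example coefficients are hypothetical.

```python
import math
from typing import List, Sequence


def z_test_p_value(z: float) -> float:
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def holm_bonferroni(p_values: Sequence[float]) -> List[float]:
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is scaled by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted p-values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Hypothetical (coefficient, standard error) pairs, not values from the paper.
    estimates = [(-1.5, 0.4), (0.3, 0.25), (2.0, 0.9)]
    raw = [z_test_p_value(beta / se) for beta, se in estimates]
    for (beta, _), p_adj in zip(estimates, holm_bonferroni(raw)):
        print(f"beta={beta:+.2f} adjusted p={p_adj:.4f} significant={p_adj < 0.05}")
```

Holm's procedure controls the family-wise error rate like the plain Bonferroni correction but is uniformly at least as powerful, which matters when thousands of action coefficients are tested at once.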

Model validation

For each of our presented studies, we validate its corresponding regression models by comparing the performance with two reduced models: null and the no-interaction models. In the null model, no additional explanatory variables are used. This model seeks to explain frequency of portrayal as a function solely of the control variables Z. In the no-interaction model, we remove the random effect–cross–gender interaction variable. By comparing against these two reduced models, we can provide statistical evidence for the existence of a relation between the gender of the characters (X) and the frequency of portrayals (Y). Furthermore, by comparing against the no-interaction model, we provide evidence for the hypothesis that the interaction between the agent’s (or patient’s) gender and actions makes for a better predictor than just the action and gender on their own. All model comparisons were done using a likelihood ratio test (χ2 test) at a significance level of α = 0.05.
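The model comparison can be sketched as below, assuming the maximized log-likelihoods of two nested models are already available from whatever software fitted them. The helper names are hypothetical, and the chi-square tail probability is computed with a standard-library recurrence only to keep the example self-contained.

```python
import math
from typing import Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    h = x / 2.0
    # Start from a closed form: Q(1, h) = exp(-h) for even df, Q(1/2, h) = erfc(sqrt(h)) for odd df.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float,
                          df_diff: int, alpha: float = 0.05) -> Tuple[float, float, bool]:
    """Return (statistic, p-value, reject?) for two nested models."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    p_value = chi2_sf(statistic, df_diff)
    return statistic, p_value, p_value < alpha


if __name__ == "__main__":
    # Hypothetical log-likelihoods; df_diff is the number of extra parameters in the full model.
    print(likelihood_ratio_test(loglik_reduced=-105.0, loglik_full=-100.0, df_diff=2))
```

The degrees of freedom equal the number of parameters the fuller model adds, which is why the interaction models, with one extra term per action and gender level, are tested against a chi-square distribution with thousands of degrees of freedom.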

Studies

Study 1: Agent-only

For our first study, the fixed effect matrix, X, corresponds to an indicator variable of the gender of the agent. Genders are encoded as M, F and N for male, female and neutral, respectively.

Additionally, we control for two sources of variability as part of the random effect matrix, Z. The first effect controls for the distribution of actions, that is, the fact that some actions naturally occur more often than others. For this, Z incorporates actions as a categorical co-variate ($C_a$). The second effect we control for is the movie genre. This comes from the fact that certain actions are more common than others for a particular type of movie. For example, we can expect the action ‘run’ to be more common in action movies than in dramas and romantic comedies. We obtain the genres for all of the movies in our data set from IMDb, and transform them into a binary matrix, $m_g \in \{0, 1\}^{N \times G}$, where each entry indicates whether that movie has that particular genre. We include $m_g$ as an additional co-variate in our model. Formally, Z is given by

$$Z = [C_a; m_g] \tag{5}$$
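The genre indicator matrix can be built as in the sketch below. The function name and the toy genre lists are hypothetical, and for brevity the example uses one row per movie, whereas in the regression each row would be aligned with an observation of the response variable.

```python
from typing import List, Sequence, Tuple


def genre_indicator_matrix(
    movie_genres: Sequence[Sequence[str]],
) -> Tuple[List[str], List[List[int]]]:
    """Return (sorted genre names, binary matrix with one row per input entry)."""
    genres = sorted({g for genre_list in movie_genres for g in genre_list})
    column = {g: j for j, g in enumerate(genres)}
    matrix = []
    for genre_list in movie_genres:
        row = [0] * len(genres)
        for g in genre_list:
            row[column[g]] = 1  # a movie may carry several genres at once
        matrix.append(row)
    return genres, matrix


if __name__ == "__main__":
    toy = [["Action", "Thriller"], ["Drama", "Romance"], ["Action", "Comedy"]]
    names, m_g = genre_indicator_matrix(toy)
    print(names)
    for row in m_g:
        print(row)
```

A multi-hot encoding, rather than a single categorical genre, lets a movie tagged as both an action film and a comedy contribute to both genre effects.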

To fit this study’s regression models, we sub-sample the action description dataset to include only those records with a known agent. We observe that the gender distribution of this sub-sample still follows the gender distribution of the full dataset. Moreover, it is also in line with previous reports in the literature, that is, it remains at a 2-to-1 male to female ratio [10, 48].

Study 2: Patient-only

Similarly to our first study, in the second study, the fixed effect matrix, X, encodes an indicator variable of the gender of the patient. It also uses the same random effect matrix as the previous study (Eq 5).

To fit the regression model, we sub-sample the action description dataset to include only those records with a known patient. This sub-sample also follows the gender distribution of the full dataset and the 2-to-1 male to female ratio.

Study 3: Agent & patient

For the third study, the fixed effect matrix, X, is designed to reflect the gender group of agents and patients. This gender group is constructed as a categorical value for the combinations of male-to-male, male-to-female, female-to-male, and female-to-female. We use the same random effect matrix, Z, as in the first study (Eq 5).

For this study, the regression model is fitted to those records for which we know the gender of both agent and patient.
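The construction of the gender groups and of the per-group action counts can be sketched as follows. The record type, field names and group labels are hypothetical, and skipping records whose agent or patient is neither M nor F is an assumption about how "known gender" is applied.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional, Tuple


class ActionRecord(NamedTuple):
    action: str
    agent_gender: Optional[str]    # "M", "F", "N" or None when unknown
    patient_gender: Optional[str]


# The four agent-to-patient combinations named in the text.
GROUPS = {
    ("M", "M"): "M→M",
    ("M", "F"): "M→F",
    ("F", "M"): "F→M",
    ("F", "F"): "F→F",
}


def count_by_gender_group(records: Iterable[ActionRecord]) -> Counter:
    """Count how often each action occurs within each gender group."""
    counts: Counter = Counter()
    for rec in records:
        group = GROUPS.get((rec.agent_gender, rec.patient_gender))
        if group is None:
            continue  # assumption: drop records without a male/female agent and patient
        counts[(rec.action, group)] += 1
    return counts


if __name__ == "__main__":
    toy = [
        ActionRecord("kiss", "F", "M"),
        ActionRecord("kiss", "F", "M"),
        ActionRecord("shoot", "M", "M"),
        ActionRecord("look", "M", None),
    ]
    for key, n in sorted(count_by_gender_group(toy).items()):
        print(key, n)
```

The resulting counts play the role of the response variable, with the group label as the categorical fixed effect.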

Study 4: Agent & patient + time

It could be argued that the frequency of actions can be explained by changes in film trends over the years. Since our sample already provides a resource that covers a large span of production years (1909–2013), in our final study, we investigate if the frequency of portrayals can be explained as a function of both time and character demographics. To this end, we include an additional control variable $m_y \in \{1909, \ldots, 2013\}^N$, the year the movie was released, as part of the control variables Z. By following this approach (as opposed to just including the variable as an additional co-factor), we ensure that the model learns that different years might follow different trends. The final control effect variable $\tilde{Z}$ is given by

$$\tilde{Z} = [C_a; m_g; m_y]$$

Results

Annotation verification

Manual annotations

From the 12,500 sampled sentences, we found 14,344 actions to be annotated. Annotators agreed on both the agents and patients in a large majority of the cases (n = 11,775; 82.09%). However, in roughly one out of five of these cases, the verification annotation considered their answers to be incorrect (n = 2,162; 18.36%). A posterior analysis of the samples that were deemed incorrect reveals that most (90%) of the errors were caused by a small subset of annotators. We believe these annotators skipped the task completely by marking the ‘Does not say’ box and submitting. After discarding the incorrectly labeled samples, we are left with a dataset of 9,613 sentences with at least one action.

Character’s gender estimation performance

Table 2 presents the gender distribution of our dataset. Previous works argue that male characters are generally given more agency than female characters [11, 29]. We are able to corroborate this finding through proportion tests. Our results reveal that the proportion of male agents is significantly higher than that of female agents (Z = 294.12, p < 0.0001). Additionally, the proportion of female patients is larger than that of male patients (Z = 294.12, p < 0.0001).
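A proportion test of this kind can be sketched as below. The exact test configuration is not stated in the text, so this example assumes a pooled two-sample z-test with a two-sided p-value; the function name and the counts are hypothetical and are not taken from Table 2.

```python
import math
from typing import Tuple


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Pooled two-sample z-test for H0: p1 == p2. Returns (z, two-sided p-value)."""
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sample sizes must be positive")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: agent roles out of all role mentions, per gender.
    z, p = two_proportion_z_test(x1=60, n1=100, x2=40, n2=100)
    print(f"Z = {z:.2f}, p = {p:.4f}")
```

With samples in the hundreds of thousands, even small differences in proportion yield very large Z values, so the effect size is worth reporting alongside the p-value.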

The results of the gender estimation heuristics on a set of 400 manually selected samples are shown in Table 3. This table reports precision, recall and F1 for each one of the gender groups, as well as average metrics across all groups. Our method achieves 75% accuracy and macro-F1 score of 77.75%, with individual F1 scores ranging from 72.0% (non-gendered) to 87.0% (neutral). From the class-level results, we observe that our gender classification method is fairly precise when labeling female and male characters as well as gender-neutral terms (e.g., they, them).

Table 3. Gender classification report for the proposed heuristics over 400 manually annotated samples.

Group      Precision   Recall   F1-score
Female     91.0        67.0     77.0
Male       91.0        63.0     75.0
Neutral    95.0        80.0     87.0
Average    92.33       70.0     79.66

Machine-learning model performance

The complete system-level performance results are presented as part of the S1 Table. With respect to identifying agents and patients, the baseline models achieve high precision but suffer from low recall. While a high precision is important, we must consider that this model is to be used in the context of a large-scale analysis of characters and their actions. To obtain the most representative sample for such analysis, we would want our model to retrieve as many instances of actions and their participants as possible. Hence, models with a higher recall ought to be preferred.

Even though there is a clear domain mismatch in the way the baselines were trained (news wires vs. movie scripts), both baselines can still recover some of the signal present in the dataset. In contrast, our naïve approach of relying on a pre-trained BERT-base (uncased) model resulted in no patient label being produced, and thus a 0.00% F1 score for that category. This suggests that movie scripts, and character names specifically, do not always follow a proper grammar.

Our results show that the domain adaptation of the BERT language model resulted in the biggest improvement overall. For example, domain adapting SimpleBERT [49] resulted in a 6 percent increase in action identification, and about 3 to 4% (absolute) gains in agent and patient classification. Furthermore, our proposed model, trained end-to-end, achieved over 30% (absolute) above the baseline performance. The best performing model was our proposed conjunction of transformer and RNN (GRU), where the transformer was initialized with a BERT-base cased pre-trained model, and the complete set of parameters was updated end-to-end. This model achieves F1 scores of 96.80, 89.78 and 73.00 percent for action, agent and patient respectively. Compared to the baseline models, the difference in these performances was found to be significant (permutation test, $n = 10^5$, all p < 0.05). Surprisingly, even though we did not precondition BERT with the current predicate (as both SimpleBERT and AllenNLP do), our model was able to correctly infer the action for most of the sentences.
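A permutation test for comparing two systems can be sketched as follows. The text does not describe the test statistic or the resampling scheme, so this example assumes a paired sign-flip test on per-sentence scores with the mean difference as the statistic; the function name, scores and default number of permutations are hypothetical.

```python
import random
from typing import Sequence


def paired_permutation_test(scores_a: Sequence[float], scores_b: Sequence[float],
                            n_permutations: int = 10_000, seed: int = 0) -> float:
    """Two-sided p-value for H0: systems A and B have the same mean score."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    hits = 0
    for _ in range(n_permutations):
        # Under H0 the two system labels are exchangeable within each pair,
        # so each paired difference keeps or flips its sign with equal probability.
        total = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(total / n) >= observed - 1e-12:
            hits += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    system_a = [0.91, 0.88, 0.95, 0.90, 0.93, 0.89, 0.94, 0.92]
    system_b = [0.85, 0.86, 0.90, 0.84, 0.88, 0.87, 0.89, 0.86]
    print(paired_permutation_test(system_a, system_b))
```

The smallest p-value such a test can report is 1 / (n_permutations + 1), so the number of permutations bounds the resolution of the result.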

Finally, we investigated changes in the performance of the proposed model due to different RNN dimensions (see S1 Table). The model seems to saturate around a dimension of 300, with higher dimensions not performing particularly differently from the current size. This result also suggests that the poor performance of the SRL baselines could be due to their larger size.

Statistical models results

Across all four studies, models that include gender predictors performed significantly better than the null models (χ2 tests, all p < 0.0001). Including interactions between gender and action improved the performance of all models (χ2 tests, all p < 0.0001). These results support our assumption that gender plays an important role in defining the frequency of portrayals for a particular set of actions.

Study 1: Agent-only

The regression model with gender as a co-variate achieved significantly lower AIC than the null model (χ2(3) = 17,375, p < 2.2e−16). Furthermore, including the interaction variable between action and agent’s gender reduces AIC significantly further (χ2(5723) = 15,311, p < 2.2e−16). From the fitted regression models, we identify 378 actions for which an agent’s gender plays a significant role in the frequency of the action portrayal (t-test, α = 0.05). S2 Table shows the list of significant regressors. It is important to note that some of the identified actions appear to contain errors. These include errors in the lemmatizer or parsing modules (e.g., confusing the past tense ‘lie’ with ‘lay’), intransitive verbs (e.g., ‘dress’, ‘wear’), and verbs whose subject is a thing and not a movie character. We have manually identified these errors and color coded them in our results.

Study 2: Patient-only

Once again, the regression model with gender as a co-variate showed a significantly lower AIC than the null model (χ2(3) = 17543, p < 2.2e−16). Including the interaction term between patient’s gender and action led to an even lower AIC (χ2(5723) = 6456, p < 0.0001). Similarly, we identify 60 actions for which a patient’s gender plays a significant role in the frequency of the action portrayal (t-test, α = 0.05). S3 Table shows the list of significant regressors.

Study 3: Agent & patient

The regression model with gender groups achieved a significantly lower AIC than the null model (χ2(3) = 16789, p < 2.2e−16). Furthermore, we found that a model that considers action-gender interactions provides a significantly better explanation of the frequency of portrayals than the model without interactions (χ2(5723) = 8951, p < 2.2e−16). We identify 135 instances where the frequency of an action is significantly impacted by the genders of the agents and patients portraying those actions. S4 Table shows these significant relations.

Study 4: Agent & patient + time

For our last study we see a similar trend: even when controlling for year, gender co-variates provide a significant reduction in AIC (χ2(3) = 18751, p < 2.2e−16), which is further decreased by incorporating the action–gender interaction term (χ2(5723) = 6933, p < 2.2e−16). However, out of the four studies, this model only produces a handful of significant results. The reduced number of results we obtained for this study could be due to only a few actions happening consistently across the span of several decades. If an action only occurs a few times in a given year, our model might not be able to pick this difference up due to issues with statistical power. This motivates the need for the aggregated results from studies 1 to 3. From the actions that do happen consistently across decades, we identify 23 actions for which a character’s gender plays a significant role in the frequency of the action portrayal even when controlling for the yearly trends. These results are shown in S5 Table.

In the following section, we discuss the significance of these findings in the context of film theory.

Discussion

TV and film are among the most universal mass media in history [60]. They have a tremendous power to shape the ways in which people think and behave. When character depictions are perceived by the viewer as similar to their standard everyday reality, the media message is amplified, creating a more powerful and influential suggestion [60, 61].

Studies conducted with small annotated samples of movies and TV shows have suggested that female characters are constantly portrayed to be less dominant, more emotional, less technical, and more nurturing. In contrast, male characters are shown to be assertive, competitive, and independent, avoiding weakness, insecurities and emotional outbursts [29, 31, 37]. In the following sections, we discuss how our findings on the portrayals of actions can help provide large-scale empirical evidence to corroborate these results. We do so with a focus on character agency and portrayals of emotion.

Additionally, we explore how some of the actions we found to be significantly dependent on the patient’s gender can be explained as part of the ‘male gaze’ theory. The ‘male gaze’ theory [35] posits three different looks associated with a film: one of the camera (usually controlled by a man, either a staffer or director), one for the characters looking at each other, and one originating from the spectators or audiences. In all of these, the woman is the passive receiver of the gaze and the man is the active spectator of the woman; the woman is taken as an object, subjected to a controlling and curious gaze of the man [36]. The results obtained through the presented studies (S1–S4) corroborate that certain actions often associated with gaze typically originate from a male agent, and are mostly directed toward a female patient.

While we only present a handful of the results, we make the larger set readily available as a way for researchers across other disciplines to look through it and corroborate with their expertise.

Characters’ agency

Previous works argue that male characters are generally given more agency than female characters [11, 29]. Proportion tests on the difference between the number of portrayals of male agents vs. female agents corroborate this finding. In our dataset, the number of male agents’ portrayals is significantly higher than that of female agents (Z = 294.12, p < 0.0001). Moreover, the proportion of female patients is larger than that of male patients (Z = 294.12, p < 0.0001).

With respect to the actions that significantly depend on the gender of the actor, our results parallel those collected by Sap et al. [29]. Male characters are less likely to be shown ‘letting’ other male characters do something to them (S3: β = −2.27).

With respect to portrayals across the decades, we see that ‘call meeting’ is an action that has been consistently portrayed mostly between two male characters (S4: β = 8.83). We hypothesize that this trend reflects the fact that male characters are being shown in professional settings more often than their female counterparts.

The male gaze theory

Our results highlight differences in the emphasis placed on the female appearance and sexual objectification of women actors. For example, our estimations of actions on the patients (Study 2) suggest that female characters are more likely to be ‘gawked at’ or ‘looked at’ by other characters (β = −2.76 and β = −2.88, respectively). From Study 1, we also found that the action ‘stare fascinated’ is more frequently portrayed by male agents (S1: β = 2.28). Furthermore, our results from Study 4 (Agent & Patient + Time) suggest that some of these actions are consistent even across decades. After controlling for year of production, male agents are still shown ‘looking’ significantly more frequently than female agents (S4: β = 73.62), and female patients are significantly more often portrayed as being ‘looked at’ or ‘watched’ (S4: β = −50.26 and β = −13.87, respectively).

Portrayals of emotion

Previous media studies suggested that female characters are typically stereotyped through portrayals of emotional outbursts [37]. In a similar vein, our results highlight that male characters are statistically less likely to be shown portraying certain actions that convey emotions. For example, S1: Agent-only highlights that male characters are less likely to be agents of ‘sobbing’ and ‘crying’ (S1: β = −1.47 and β = −1.01 respectively). Moreover, male characters are not often portrayed as baffled or concerned (S1: β = −3.72 and β = −3.27), nor as agents of snuggling and giggling (S1: β = −1.49 and β = −1.03). Likewise, our results from S3: Agent & patient suggest that male characters rarely ‘scream’ at other male characters (S3: β = −2.59); nor do male agents ‘laugh’ or ‘smile’ at other male characters (S3: β = −2.46, and β = −2.19, respectively).

Violence and aggression

Media studies have found that women are typically portrayed in distress or need of protection [12, 38, 62, 63]. We have identified particular actions that help corroborate these findings. Across the decades, male agents have been shown to ‘get pissed’ at other male characters more often (S4: β = 6.45). Other results show that female characters are more likely to be patients of aggressive actions. We see this in examples of actions such as ‘kidnap’, ‘drug’, and ‘hassle’ where the patient is more likely to be a female character (S2: β = −2.21, β = −2.89, and β = −2.60 respectively). Similarly, female characters are more likely to be ‘lured’ by other characters (S2: β = −2.41).

We found an interesting exception to this trend in the action ‘shoot’. There is a higher frequency when the target is a male character than when the target is female (S2: β = 3.19). One possible explanation for this result is that male characters just happen to be portrayed more often in action scenes involving a gun. As previous literature suggested, most of the perpetrators are portrayed by middle-aged white male actors [64–66], in action-driven male-dominated narratives where the conflict is resolved with the villain’s demise.

Displays of affection

Additional results highlight gender differences in the way affective interactions happen on the screen, specifically with respect to the male-to-male pairs. Male characters are often the patient of the ‘kiss’ (S2: β = 3.63) but not often the initiators of the action (S1: ‘kiss’ β = −2.98, adjusted p-value >0.05). Yet, two male characters are rarely shown on screen ‘kissing’, ‘wrapping [arms]’, ‘dancing’, or ‘hugging’ (S3: β = −3.18, β = −2.91, β = −2.91, and β = −2.78, respectively).

These results seem to reflect a societal conception of mixed-sex affection portrayals (what Tillmann [67] calls the ‘invisibly ordinary’), where heterosexuality is the predominant assumption for characters in movies. In contrast, the historical notion of same-sex displays of affection is that they evoke disgust, ‘cultural squeamishness’, and even sometimes, real-life violence [39, 40, 67, 68]. If same-sex shows of affection induce such a response, why is it that our results only capture the male-to-male dyad and not the female-to-female dyad? To uncover possible differences in the same-sex dyads, we perform an additional comparison using regression models fitted on the subset where the agent and patient are assigned the same assumed gender (i.e., male-to-male vs. female-to-female). The relation between the frequency of the affective actions ‘kiss’, ‘hug’ and ‘wrap’ and the genders of the agent and patient remains significant (t-test, all p < 0.05). In all cases, the affective actions were less likely to be portrayed by two males than two females (same-sex: β = −3.22, β = −2.96, and β = −2.93 respectively). Thus, our results seem to point towards the sexualization of female shows of affection, specifically in how an on-screen lesbian kiss can be perceived as a form of sexualized entertainment for the heterosexual male viewer [69].

Conclusion

This work presents a novel large-scale analysis on the actions taken by the characters, and how these actions are related to gender-based stereotypes in media. It is part of an overarching initiative to go beyond simple frequency statistics and assess the quality of character portrayals in media stories told in film and television.

Our results uncover linguistic patterns from the action descriptions in movie scripts where male characters are portrayed with higher agency than female characters; female characters are often cast in emotional supporting roles; male affection is rarely presented on screen, especially when the patient of affection is also a male, and where female characters are often the patients of actions that draw attention to their appearance and looks. Thus, our work complements previous literature and provides large-scale empirical evidence to support their claims.

Limitations and future work

Some of the limiting factors of our analysis originate from the limitations of the automated processing pipeline. We are bound by the capabilities of the parser to identify words, their part-of-speech tags, as well as the semantic role they play.

Another limitation is due to the nuanced societal and linguistic context that is lost to our analysis. Even when certain words are being used inside the scripts, they might communicate a different meaning given what is happening on-screen and the overall narrative of the story. While our large sample of movies, use of highly-contextual embeddings, and aggregation studies aim to coalesce the differences, we can only be certain that we capture but a small sample of all the possible different linguistic contexts.

Finally, we would like to extend our current framework to incorporate notions of representation intersectionality (e.g., gender, age and race). Even though our current framework provides a way to incorporate these variables, there are still several limitations in automating the labeling of these constructs at scale (e.g., defining an appropriate ontology).

Supporting information

S1 Table. Machine learning model performance.

Complete table of performance results for the SRL systems. Legend: Oracle action denotes models with no automatic action identification. Uncased / Cased refers to the type of pre-trained BERT model used. +fine-tune denotes models that were fine-tuned using the manually labeled action description dataset. For LSTM and GRU, the size of the hidden dimension is given by the number in parentheses. Best performance is highlighted in bold.

(PDF)

S2 Table. Results for Study 1: Agent-only.

Regression model results for agent’s actions. We test the significance of the coefficients through Z-test, and correct for multiple comparisons using the Holm-Bonferroni method. Table shows only significant coefficients with adjusted-p < 0.05. Rows are ordered by the magnitude of their coefficient (β). The direction of the relationship is determined by the sign of the coefficient, with positive coefficients corresponding to actions which are more likely portrayed by a male character. Likewise, negative coefficients present actions that are less likely to be portrayed by male characters. Manually identified errors are color coded (blush—errors due to parsing and lemmatization; gray—errors due to SRL).

(PDF)

S3 Table. Results for Study 2: Patient-only.

Regression model results for actions done to patients. We test the significance of the coefficients through Z-test, and correct for multiple comparisons using the Holm-Bonferroni method. Table shows only significant coefficients with adjusted-p < 0.05. Rows are ordered by the magnitude of the coefficients (β). The direction of the relationship is determined by the sign of the coefficient, with positive coefficients corresponding to actions more likely done towards male characters. Manually identified errors are color coded (blush—errors due to parsing and lemmatization; gray—errors due to SRL).

(PDF)

S4 Table. Results for Study 3: Agent & patient.

Regression model results for the agent–patient interactions. Group encodes gender dynamics (e.g., M→M identifies actions done by male characters towards other male characters). We test the significance of the coefficients through Z-test, and correct for multiple comparisons using the Holm-Bonferroni method. Table shows only significant coefficients with adjusted-p < 0.05. Rows are ordered by the magnitude of the coefficients (β). Direction of the relationship is given by the sign and magnitude of β with positive values indicating actions more likely portrayed by that group of characters. Manually identified errors are color coded: blush for errors propagated from outside the SRL system (e.g., parsing, lemmatization), and gray for errors due to mislabels coming from our SRL system.

(PDF)

S5 Table. Results for Study 4: Agent & patient + time.

Regression model results for the agent–patient interactions controlling for year of production. Group encodes gender dynamics (e.g., M→M identify actions done by male characters towards other male characters). A star (*) is used as short-hand for either group (e.g., *→F labels actions where the patient is Female and the agent’s gender can be any value). We test the significance of the coefficients through Z-test, and correct for multiple comparisons using the Holm-Bonferroni method. Table shows only significant coefficients with adjusted-p < 0.05. Rows are ordered by the magnitude of the coefficient (β). The direction of the relationship is given by the coefficient’s sign, with positive coefficients corresponding to actions more likely portrayed by their encoding group. Manually identified errors are color coded gray for errors due to mislabels coming from our SRL system.

(PDF)

Acknowledgments

We thank the reviewers for their insightful comments and feedback towards improving this work. VRM would like to acknowledge Prof Jesus Arroyo Relion for his guidance while developing the statistical models. Finally, the authors gratefully acknowledge the members and advisors of USC Center for Computational Media Intelligence.

Data Availability

We have made the dataset and labeling models available for download at https://sail.usc.edu/~ccmi/actions-agents-and-patients/.

Funding Statement

The author(s) received no specific funding for this work.

References

  • 1. Holtzman L, Sharpe L. Media messages: What film, television, and popular music teach us about race, class, gender, and sexual orientation. Routledge; 2014.
  • 2. Gerbner G, Gross L, Morgan M, Signorielli N. Growing up with television: The cultivation perspective. In: Bryant J, Zillmann D, editors. Media effects: Advances in theory and research. New Jersey, NJ: Lawrence Erlbaum Associates, Inc; 1994. p. 17–41.
  • 3. Browne Graves S. Television and prejudice reduction: When does television as a vicarious experience make a difference? Journal of Social Issues. 1999;55(4):707–727. doi: 10.1111/0022-4537.00143
  • 4. Cheryan S, Drury BJ, Vichayapai M. Enduring influence of stereotypical computer science role models on women’s academic aspirations. Psychology of Women Quarterly. 2013;37(1):72–79. doi: 10.1177/0361684312459328
  • 5. Steinke J. Adolescent girls’ STEM identity formation and media images of STEM professionals: Considering the influence of contextual cues. Frontiers in psychology. 2017;8:716. doi: 10.3389/fpsyg.2017.00716
  • 6. Entman RM. How the media affect what people think: An information processing approach. The journal of Politics. 1989;51(2):347–370. doi: 10.2307/2131346
  • 7. Widdershoven G, Josselson R, Lieblich A. The narrative study of lives; 1993.
  • 8. Wilson JD, MacGillivray MS. Self-perceived influences of family, friends, and media on adolescent clothing choice. Family and Consumer Sciences Research Journal. 1998;26(4):425–443. doi: 10.1177/1077727X980264003
  • 9. Polce-Lynch M, Myers BJ, Kliewer W, Kilmartin C. Adolescent self-esteem and gender: Exploring relations to sexual harassment, body image, media influence, and emotional expression. Journal of Youth and Adolescence. 2001;30(2):225–244. doi: 10.1023/A:1010397809136
  • 10. Smith SL, Choueiti M, Pieper K. Inequality in 900 Popular Films; 2017.
  • 11. Geena Davis Institute on Gender in Media. The Geena Benchmark Report 2007–2017; 2019.
  • 12. Gauntlett D. Media, gender and identity: An introduction. Routledge; 2008.
  • 13. Wood JT. Gendered media: The influence of media on views of gender. Gendered lives: Communication, gender, and culture. 1994;9:231–244.
  • 14. Uray N, Burnaz S. An analysis of the portrayal of gender roles in Turkish television advertisements. Sex roles. 2003;48(1-2):77–87. doi: 10.1023/A:1022348813469
  • 15. Fouts G, Burggraf K. Television situation comedies: Female body images and verbal reinforcements. Sex roles. 1999;40(5-6):473–481. doi: 10.1023/A:1018875711082
  • 16. Zhang Y, Dixon TL, Conrad K. Female body image as a function of themes in rap music videos: A content analysis. Sex roles. 2010;62(11-12):787–797. doi: 10.1007/s11199-009-9656-y
  • 17. Paek HJ, Nelson MR, Vilela AM. Examination of gender-role portrayals in television advertising across seven countries. Sex roles. 2011;64(3):192–207. doi: 10.1007/s11199-010-9850-y
  • 18. Sink A, Mastro D. Depictions of gender on primetime television: A quantitative content analysis. Mass Communication and Society. 2017;20(1):3–22. doi: 10.1080/15205436.2016.1212243
  • 19. Reichert T, Carpenter C. An update on sex in magazine advertising: 1983 to 2003. Journalism & Mass Communication Quarterly. 2004;81(4):823–837. doi: 10.1177/107769900408100407
  • 20. Somandepalli K, Guha T, Martinez VR, Kumar N, Adam H, Narayanan S. Computational Media Intelligence: Human-Centered Machine Analysis of Media. Proceedings of the IEEE. 2021; p. 1–20.
  • 21. Guha T, Huang CW, Kumar N, Zhu Y, Narayanan SS. Gender representation in cinematic content: A multimodal approach. In: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction; 2015. p. 31–34.
  • 22. Kumar N, Nasir M, Georgiou PG, Narayanan SS. Robust Multichannel Gender Classification from Speech in Movie Audio. In: Interspeech; 2016. p. 2233–2237.
  • 23. Hebbar R, Somandepalli K, Narayanan S. Robust speech activity detection in movie audio: Data resources and experimental evaluation. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE; 2019. p. 4105–4109.
  • 24. Hebbar R, Somandepalli K, Narayanan SS. Improving Gender Identification in Movie Audio Using Cross-Domain Data. In: INTERSPEECH; 2018. p. 282–286.
  • 25. Kagan D, Chesney T, Fire M. Using data science to understand the film industry’s gender gap. Palgrave Communications. 2020;6(1):1–16. doi: 10.1057/s41599-020-0436-1
  • 26. Ramakrishna A, Martínez VR, Malandrakis N, Singla K, Narayanan S. Linguistic analysis of differences in portrayal of movie characters. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vancouver, Canada: Association for Computational Linguistics; 2017. p. 1669–1678. Available from: https://www.aclweb.org/anthology/P17-1153.
  • 27. Clark JM, Paivio A. Extensions of the Paivio, Yuille, and Madigan (1968) norms. Behavior Research Methods, Instruments, & Computers. 2004;36(3):371–383. doi: 10.3758/BF03195584
  • 28. Ramakrishna A, Malandrakis N, Staruk E, Narayanan S. A quantitative analysis of gender differences in movies using psycholinguistic normatives. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Lisbon, Portugal: Association for Computational Linguistics; 2015. p. 1996–2001. Available from: https://www.aclweb.org/anthology/D15-1234.
  • 29. Sap M, Prasettio MC, Holtzman A, Rashkin H, Choi Y. Connotation Frames of Power and Agency in Modern Films. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen, Denmark: Association for Computational Linguistics; 2017. p. 2329–2334. Available from: https://www.aclweb.org/anthology/D17-1247.
  • 30. Ainsworth C. Sex redefined. Nature. 2015;518(7539):288. doi: 10.1038/518288a
  • 31. England DE, Descartes L, Collier-Meek MA. Gender role portrayal and the Disney princesses. Sex roles. 2011;64(7):555–567. doi: 10.1007/s11199-011-9930-7
  • 32. Baldick C. The concise Oxford dictionary of literary terms. Oxford University Press; 1996.
  • 33. Niemiec RM, Wedding D. Positive psychology at the movies: Using films to build virtues and character strengths. Hogrefe Publishing; 2013.
  • 34. Devlin J, Cheng H, Fang H, Gupta S, Deng L, He X, et al. Language Models for Image Captioning: The Quirks and What Works. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Beijing, China: Association for Computational Linguistics; 2015. p. 100–105. Available from: https://www.aclweb.org/anthology/P15-2017.
  • 35. Mulvey L. Visual pleasure and narrative cinema. In: Visual and other pleasures. Springer; 1989. p. 14–26.
  • 36. Jacobsson EM. A Female Gaze?; 1999.
  • 37. Fast E, Vachovsky T, Bernstein M. Shirtless and dangerous: Quantifying linguistic signals of gender bias in an online fiction writing community. In: Proceedings of the International AAAI Conference on Web and Social Media. vol. 10; 2016.
  • 38. Stabile CA. “Sweetheart, This Ain’t Gender Studies”: Sexism and Superheroes. Communication and Critical/cultural studies. 2009;6(1):86–92. doi: 10.1080/14791420802663686
  • 39. Dixon WW. Straight: Constructions of heterosexuality in the cinema. SUNY Press; 2012.
  • 40. McKinnon S. Watching men kissing men: the Australian reception of the gay male kiss on-screen. Journal of the History of Sexuality. 2015;24(2):262–287. doi: 10.7560/JHS24204
  • 41. Gorinski PJ, Lapata M. Movie Script Summarization as Graph-based Scene Extraction. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Denver, Colorado: Association for Computational Linguistics; 2015. p. 1066–1076. Available from: https://www.aclweb.org/anthology/N15-1113.
  • 42. Martinez V, Somandepalli K, Tehranian-Uhls Y, Narayanan S. Joint Estimation and Analysis of Risk Behavior Ratings in Movie Scripts. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Online: Association for Computational Linguistics; 2020. p. 4780–4790.
  • 43. Honnibal M, Montani I, Van Landeghem S, Boyd A. spaCy: Industrial-strength Natural Language Processing in Python; 2020. Available from: https://spacy.io/.
  • 44. Li Z, He S, Cai J, Zhang Z, Zhao H, Liu G, et al. A Unified Syntax-aware Framework for Semantic Role Labeling. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium: Association for Computational Linguistics; 2018. p. 2401–2411. Available from: https://www.aclweb.org/anthology/D18-1262.
  • 45. Bamman D, Popat S, Shen S. An annotated dataset of literary entities. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis, Minnesota: Association for Computational Linguistics; 2019. p. 2138–2144. Available from: https://www.aclweb.org/anthology/N19-1220.
  • 46. Social Security Administration. Name Distributions in the Social Security Area; 1998. Available from: https://www.ssa.gov/oact/babynames/.
  • 47. Twenge JM, Campbell WK, Gentile B. Male and female pronoun use in US books reflects women’s status, 1900–2008. Sex roles. 2012;67(9):488–493. doi: 10.1007/s11199-012-0194-7
  • 48. Radford W, Gallé M. Roles for the boys?: Mining cast lists for gender and role distributions over time. In: Proceedings of the 24th International Conference on World Wide Web. ACM; 2015.
  • 49. Shi P, Lin J. Simple BERT Models for Relation Extraction and Semantic Role Labeling. (Tech Report). CoRR. 2019;abs/1904.05255.
  • 50. Carreras X, Màrquez L. Introduction to the CoNLL-2005 shared task: Semantic role labeling. In: Proceedings of the ninth conference on computational natural language learning (CoNLL-2005); 2005. p. 152–164.
  • 51. Daza A, Frank A. A Sequence-to-Sequence Model for Semantic Role Labeling. In: Proceedings of The Third Workshop on Representation Learning for NLP. Melbourne, Australia: Association for Computational Linguistics; 2018. p. 207–216. Available from: https://www.aclweb.org/anthology/W18-3027.
  • 52. Zhou J, Xu W. End-to-end learning of semantic role labeling using recurrent neural networks. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Beijing, China: Association for Computational Linguistics; 2015. p. 1127–1137. Available from: https://www.aclweb.org/anthology/P15-1109.
  • 53. Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis, Minnesota: Association for Computational Linguistics; 2019. p. 4171–4186. Available from: https://www.aclweb.org/anthology/N19-1423.
  • 54. Wu Y, Schuster M, Chen Z, Le QV, Norouzi M, Macherey W, et al. Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. (Tech Report). CoRR. 2016;abs/1609.08144.
  • 55. Hochreiter S, Schmidhuber J. Long Short-Term Memory. Neural Computation. 1997;9(8):1735–1780. doi: 10.1162/neco.1997.9.8.1735
  • 56. Chung J, Gülçehre Ç, Cho K, Bengio Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. (Tech Report). CoRR. 2014;abs/1412.3555.
  • 57. Gardner M, Grus J, Neumann M, Tafjord O, Dasigi P, Liu NF, et al. AllenNLP: A Deep Semantic Natural Language Processing Platform; 2017.
  • 58. Breslow NE, Clayton DG. Approximate Inference in Generalized Linear Mixed Models. Journal of the American Statistical Association. 1993;88(421):9–25. doi: 10.2307/2290687
  • 59. Coxe S, West SG, Aiken LS. The analysis of count data: A gentle introduction to Poisson regression and its alternatives. Journal of personality assessment. 2009;91(2):121–136. doi: 10.1080/00223890802634175
  • 60. Gerbner G, Gross L, Morgan M, Signorielli N. The “mainstreaming” of America: violence profile number 11. Journal of communication. 1980;30(3):10–29. doi: 10.1111/j.1460-2466.1980.tb01987.x
  • 61. Gerbner G, Gross L, Morgan M, Signorielli N, Shanahan J. Growing up with television: Cultivation processes. In: Bryant J, Zillmann D, editors. Media effects: Advances in theory and research (2nd ed.). vol. 2. New Jersey, NJ: Lawrence Erlbaum Associates Publishers; 2002. p. 43–67.
  • 62. Clifford JE, Jensen CJ III, Petee TA. Does gender make a difference? Women, Violence, and the Media: Readings in Feminist Criminology. 2009; p. 124.
  • 63. De Ceunynck T, De Smedt J, Daniels S, Wouters R, Baets M. “Crashing the gates”–selection criteria for television news reporting of traffic crashes. Accident Analysis & Prevention. 2015;80:142–152. doi: 10.1016/j.aap.2015.04.010
  • 64. Potter WJ, Vaughan MW, Warren R, Howley K, Land A, Hagemeyer JC. How real is the portrayal of aggression in television entertainment programming? Journal of Broadcasting & Electronic Media. 1995;39(4):496–516. doi: 10.1080/08838159509364322
  • 65. Bleakley A, Jamieson PE, Romer D. Trends of sexual and violent content by gender in top-grossing US films, 1950–2006. Journal of Adolescent Health. 2012;51(1). doi: 10.1016/j.jadohealth.2012.02.006
  • 66. Smith SL, Wilson BJ, Kunkel D, Linz D, Potter WJ, Colvin CM, et al. Violence in television programming overall: University of California, Santa Barbara study. National Television Violence Study. 1998;3:5–220.
  • 67. Tillmann-Healy LM. Men kissing. Ethnographically speaking: Autoethnography, literature, and aesthetics. 2002; p. 336–343.
  • 68. Warner M. Publics and counterpublics. Public culture. 2002;14(1):49–90. doi: 10.1215/08992363-14-1-49
  • 69. Morris CE, Sloop JM. “What lips these Lips have kissed”: Refiguring the politics of queer public kissing. Communication and Critical/Cultural Studies. 2006;3(1):1–26. doi: 10.1080/14791420500505585

Decision Letter 0

Natalia Grabar

13 May 2022

PONE-D-22-04236

Boys don’t cry (or kiss or dance): A computational linguistic lens into gendered actions in film

PLOS ONE

Dear Dr. Martinez,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jun 27 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Natalia Grabar

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please update your submission to use the PLOS LaTeX template. The template and more information on our requirements for LaTeX submissions can be found at http://journals.plos.org/plosone/s/latex

3. Please note that PLOS journals require authors to make all data necessary to replicate their study’s findings publicly available without restriction at the time of publication. Please see our Data Availability policy at https://journals.plos.org/plosone/s/data-availability. As such, please make your full dataset available by either A) uploading the full dataset as supplementary information files, or B) including a URL link in your Data Availability Statement and Methods section to where the full dataset can be accessed

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This article proposal presents a large-scale experiment aiming at analyzing the way actions in movies are stereotyped according to genders. In order to do so, a simplified version of SRL is performed on a large corpus of scripts, with the identification of actions, agents and patients using BERT+RNN. A thorough statistical analysis is also performed on the results.

The experiment is well-described and sound, the paper is well-written and the results are useful (although not surprising). However, the paper could be improved and I'll try to provide some help with that.

The main design issue with this research is related to the binary classification of gender, which is addressed in a note: "[...] we are limited by the content analysis procedures". This should be more developed and detailed in the limitations and perspectives, so that somebody who would like to do this in a more continuum-oriented way knows what to do.

I'm also concerned by the variables being maybe not independent of others which are not taken into account here, like the socio-economic status of the characters. This is highlighted in the limitations concerning race and age, but not the socio-economic status.

The experiment implies a number of layers, it is difficult to assess the quality of the intermediary results (of Spacy, for example). I suggest to add a table showing the different layers and the performance for each of them, if space allows.

Figure 2 is unreadable to me: what is the meaning of the colors? where is the information about female/male characters?

Also, in the huge tables in the Appendix, the best results should be in bold.

Some parts of the paper are a bit too optimistic and should be more nuanced. In particular, Table 2 results are described in the caption as " high precision and recall.", with a recall ranging from 0.63 to 0.8 (precision over 0.9). I suggest to correct this, in order to reflect the results, as the recall cannot be considered "high" here.

The Appendix is longer than the paper itself and contains a lot of interesting information, some of which should appear in the paper itself, in particular concerning the manual annotation process. Concerning the usage of MTurk, the appendix states that the annotators received at least an hourly US minimum wage. Can you explain how and when it was ensured/computed? Which minimum wage (the (very low) federal one?)? Can you also state how many workers (if any) were rejected and for which reason?

"more feminine language": can you be more precise? What does "feminine language" mean in this paper?

Legal issue: the raw corpus is not freely available, instead it is only available for "fair use". This should be cleared in the paper.

I don't think the term "MWE" can be used for expressions like "Elron and Galdalf". These can be considered as named entities, which you do not want to split, but probably not as MWE. Criteria for MWE in UD can be found here: https://universaldependencies.org/workgroups/mwe.html. More details: https://hal.archives-ouvertes.fr/hal-03016721/document

Justification for the answer NO to "Have the authors made all data underlying the findings in their manuscript fully available?":

the details of the results are available in the supplementary materials, but not the manually annotated data and the status of the dataset is not specified in the paper. This should be made clearer.

Finally, 4 references are in fact pre-prints, not peer-reviewed, papers. I suggest to clearly state that they are preprints in the bibliography and to name them "technical reports".

Ref to consider (maybe):

- BUSSO, Lucia ; VIGNOZZI, Gianmarco. Gender Stereotypes in Film Language: A Corpus-Assisted Analysis In: Proceedings of the Fourth Italian Conference on Computational Linguistics CLiC-it 2017: 11-12 December 2017, Rome (https://books.openedition.org/aaccademia/2367?lang=en)

Reviewer #2: First, the "general analysis topic of gender-based portrayals in media" is still essential for organizations' effective and socially responsible communication activities. Thus, the author(s) undertook the task of exploring a complementary analysis through machine learning, based on the character’s actions rather than only dialogue or scene co-appearance, to better understand the pervasiveness of harmful gender representations in media. The Manuscript (MS) is interesting and valuable regarding its aim, methodology, and analysis. However, I would like to suggest some revisions:

1. I believe that if the structure and flow of MS are revised in an integrative manner, the reader will quickly grasp the idea and follow the process. The current appearance of the MS is an entirely technical paper, whereas the idea/research problem is based on communication and even social problem. The authors also confirm this comment by emphasizing only several analytical/technical contributions of their study on p.4. At the same time, the MS needs to contribute to the literature regarding methodological aspects and conceptual/theoretical aspects. The MS has some potential in this respect.

2. I suggest the author(s) revise the MS by considering the similar headlines below;

- A flow diagram depicting the stages of data collection/ processing stages and analyses step might be handy to catch the reader’s attention and make the readers understand the whole data processing and analysis stage.

- The subtitles under the Conclusion section could be revised by considering their compatibility with the findings in general. (For instance, 4.2 "The male-gaze theory" does not seem to be a compatible title. The authors use this base or approach just to explain their findings.)

- All of the information in the Appendices is valuable. However, this section seems to be too long as an Appendix. I suggest the authors summarize the quotes from "Materials" to "Experiments" and integrate this information into the main body of the MS. Some of the sections in the Appendices, excluding Tables, are too long and detailed. The authors can add appropriate subtitles if they prefer or find it necessary. Furthermore, the flow diagram would visually complete the meaning of the whole process.

- The authors state that neglecting a time-trend factor in the models is one of the limitations of the study. Unfortunately, this might be an important analytical weakness rather than a limitation. At this stage, I would like to ask: why not consider a time-series-based analysis?

I wish the authors good luck with their research.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Dec 21;17(12):e0278604. doi: 10.1371/journal.pone.0278604.r002

Author response to Decision Letter 0


18 Jul 2022

General Response

We thank the reviewers for their thoughtful feedback. We are encouraged by their recognition that the topic is “essential for organization’s effective and socially responsible communication” (R2) and that they found the manuscript and experiments well-written and sound with useful results (R1, R2).

A common concern among the reviewers was that the appendices contained valuable information that should have been included as part of the main body (R1, R2). We have since summarized the appendices into the paper itself, with a particular focus on the manual annotation process (as suggested by R1). Furthermore, we have followed R2’s suggestion on including a flow diagram in the introduction that we hope better explains the whole data processing and analysis stage.

Furthermore, we have followed the style and grammar suggestions provided by the reviewers. In the following, we address the remainder of the reviewers' points.

Response to Reviewer 1:

Q1: The main design issue with this research is related to the binary classification of gender, which is addressed in a note: "[...] we are limited by the content analysis procedures". This should be more developed and detailed in the limitations and perspectives, so that somebody who would like to do this in a more continuum-oriented way knows what to do.

Author’s response: We deeply appreciate the reviewer’s comment. At this time, however, we have reservations about suggesting what a holistic continuum solution would even look like. One possibility is to represent a character’s expressed gender as a continuous variable, but this is still limited by its inherent reliance on a (hetero-)normative ontology that places female–male as polar opposites. We believe that there is still a long way to go in discussions and implementations before any automatic content analysis procedure can fully capture the nuances and inter-complexity of a character’s (and ultimately a person’s) identity in any comprehensive manner. We believe that these discussions, while still important to have, are outside the scope of our current work.

Q2: I'm also concerned by the variables being maybe not independent of others which are not taken into account here, like the socio-economic status of the characters. This is highlighted in the limitations concerning race and age, but not the socio-economic status.

Author’s response: There might be many hidden factors yet to be considered as explanations for action frequency, but many of these are unattainable due to a lack of data. Furthermore, since our analysis is performed over a large sample of action descriptions, spans four decades, and controls for the factors we do have data for, we suspect that these hidden relationships are marginalized out in the aggregate results.

Q3: The experiment implies a number of layers, it is difficult to assess the quality of the intermediary results (of Spacy, for example). I suggest to add a table showing the different layers and the performance for each of them, if space allows.

Author’s response: Our model relies on spaCy only to find the verbs inside the action descriptions. spaCy provides public performance benchmarks on its website (https://spacy.io/usage/facts-figures). Given that we have no reason to believe that the performance of our system would be greatly impacted had we used a different verb identification system, we decided not to include these. We do provide extensive comparisons against other SRL systems (with and without spaCy) in Table S1.

Q4: Figure 2 is unreadable to me: what is the meaning of the colors? where is the information about female/male characters? Also, in the huge tables in the Appendix, the best results should be in bold.

Author’s response: We agree that Figure 2 failed to communicate our original intent and confused the discussion. We have since removed Figure 2 from the updated manuscript. We have also updated the table in the appendix to highlight the best model’s performance in bold.

Q5: Some parts of the paper are a bit too optimistic and should be more nuanced. In particular, Table 2 results are described in the caption as " high precision and recall.", with a recall ranging from 0.63 to 0.8 (precision over 0.9). I suggest to correct this, in order to reflect the results, as the recall cannot be considered "high" here.

Author’s response: We have updated the language used throughout the manuscript in an effort to avoid sounding overly optimistic.
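
The concern in Q5 can be made concrete with a small sketch that combines precision and recall into an F1 score (their harmonic mean) at the two ends of the recall range quoted by the reviewer. The precision value of 0.9 is an assumption, taken as the lower bound of "over 0.9", and the function name `f1_score` is illustrative rather than taken from the manuscript.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Assumed precision (lower bound of "over 0.9") and the recall range from Q5.
ASSUMED_PRECISION = 0.9
f1_low_recall = f1_score(ASSUMED_PRECISION, 0.63)
f1_high_recall = f1_score(ASSUMED_PRECISION, 0.80)

print(f"F1 at recall 0.63: {f1_low_recall:.3f}")
print(f"F1 at recall 0.80: {f1_high_recall:.3f}")
```

Because the harmonic mean is pulled towards the smaller of the two values, the lower recall limits the combined score even when precision is above 0.9, which is the nuance the reviewer asked for.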

Q6: The Appendix is longer than the paper itself and contains a lot of interesting information, some of which should appear in the paper itself, in particular concerning the manual annotation process. Concerning the usage of MTurk, the appendix states that the annotators received at least an hourly US minimum wage. Can you explain how and when it was ensured/computed? Which minimum wage (the (very low) federal one?)? Can you also state how many workers (if any) were rejected and for which reason?

Author’s response: We have re-structured the manuscript to reflect your suggestions. We have now incorporated the information on dataset creation, methods and experiment (previously found in the appendix) as part of the main body.

We set the remuneration scheme for our task to ensure annotators receive at least minimum wage for their effort. This was calculated for the U.S. as $7.25/hr divided by the expected time it would take to complete a single annotation. Before submitting our task, we did a few dry runs where we estimated the time required to complete a single annotation task. The labeling task (i.e., given an action description, selecting an agent and a patient from a dropdown box) took us about a minute per sample, while the verification task (i.e., deciding whether a proposed label is correct) took half a minute per sample. We have incorporated this response as part of the footnote in page 5.
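The arithmetic behind this floor can be sketched as follows. This is a minimal illustration, assuming the calculation is read as the hourly wage prorated over the expected time per annotation; the function name `min_pay_per_item` is hypothetical, and the actual amounts paid per task are not stated in this response.

```python
# Figures taken from the response above: $7.25/hr, about 60 s per labeling
# sample, and about 30 s per verification sample.
FEDERAL_MIN_WAGE_PER_HOUR = 7.25
SECONDS_PER_HOUR = 3600.0


def min_pay_per_item(hourly_wage: float, seconds_per_item: float) -> float:
    """Smallest per-item payment (hypothetical helper) that keeps the
    effective hourly rate at or above `hourly_wage`."""
    return hourly_wage * seconds_per_item / SECONDS_PER_HOUR


labeling_floor = min_pay_per_item(FEDERAL_MIN_WAGE_PER_HOUR, 60.0)
verification_floor = min_pay_per_item(FEDERAL_MIN_WAGE_PER_HOUR, 30.0)

print(f"labeling floor:     ${labeling_floor:.4f} per sample")
print(f"verification floor: ${verification_floor:.4f} per sample")
```

Under these assumptions the floor comes to roughly 12 cents per labeling sample and 6 cents per verification sample.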

With respect to the rejection rate, we found that some of the annotators submitted empty results (i.e., did not select a character from the drop-down). We marked their annotations as moot and rejected their tasks. This totalled to 2,162 rejected samples. Additional information can be found in lines 416–423.

Q7: "more feminine language": can you be more precise? What does "feminine language" mean in this paper?

Author’s response: In this context we were referring to gender ladenness. Gender ladenness, as defined in (Clark and Paivio, 2004), represents the degree of perceived “feminine or masculine association” on a numerical scale ranging from very masculine to very feminine. We have updated the manuscript to make the language clearer.

James M Clark and Allan Paivio. 2004. Extensions of the Paivio, Yuille, and Madigan (1968) norms. Behavior Research Methods, Instruments, & Computers, 36(3):371–383.

Q8: Legal issue: the raw corpus is not freely available, instead it is only available for "fair use". This should be cleared in the paper.

Author’s response: We appreciate the reviewer’s concern about the legal status of our work. However, assuming that "raw corpus" refers to ScriptBase, we can assure the reviewer that our use falls well within the limits established by U.S. Copyright Law, Section 107 (fair use for non-commercial education and research purposes). If our assumption about what the reviewer meant is wrong, we will be happy to revisit this point.

Q9: I don't think the term "MWE" can be used for expressions like "Elrond and Gandalf". These can be considered as named entities, which you do not want to split, but probably not as MWE. Criteria for MWE in UD can be found here: https://universaldependencies.org/workgroups/mwe.html. More details: https://hal.archives-ouvertes.fr/hal-03016721/document

Author’s response: The reviewer is correct in pointing this out. We have updated the manuscript to use the more appropriate term of conjunction.

Q10: Justification for the answer NO to "Have the authors made all data underlying the findings in their manuscript fully available?": the details of the results are available in the supplementary materials, but not the manually annotated data and the status of the dataset is not specified in the paper. This should be made clearer.

Author’s response: We have made all the annotated data and trained models available for anyone to download and use at https://sail.usc.edu/~ccmi/actions-agents-and-patients/ (lines 109–111).

Q11: Finally, 4 references are in fact pre-prints, not peer-reviewed, papers. I suggest to clearly state that they are preprints in the bibliography and to name them "technical reports".

Author’s response: Thank you for pointing this out. We have updated the manuscript to reflect that these are not peer-reviewed publications.

Response to Reviewer 2:

Q1: A flow diagram depicting the stages of data collection/ processing stages and analyses step might be handy to catch the reader’s attention and make the readers understand the whole data processing and analysis stage.

Author’s response: We thank the reviewer for this suggestion. We have updated the manuscript by including the suggested diagram as Fig 1.

Q2: The subtitles under the Conclusion section could be revised for compatibility with the findings in general. For instance, 4.2 ("The male-gaze theory") does not seem to be a compatible title; the authors use this base or approach just to explain their findings.

Author’s response:

Q3: All of the information in the Appendices are valuable. However, this section seems to be too long as an Appendix. I suggest the authors summarize the quotes from "Materials" to "Experiments" and integrate this information into the main body of the MS. Some of the sections in the Appendices, excluding Tables, are too long and detailed. The authors can put appropriate subtitles If they prefer or necessary. Furthermore, the flow diagram would visually complete the meaning of the whole process.

Author’s response: We have re-structured the manuscript to reflect your suggestions. We have now incorporated the information on dataset creation, methods and experiment (previously found in the appendix) as part of the main body.

Q4: The authors state that neglecting a time-trend factor in the models is one of the limitations of the study. Unfortunately, this might be an important analytical weakness rather than a limitation. At this stage, I would like to ask: why not consider a time-series-based analysis?

Author’s response: The reasons for not considering a time trend as a factor were twofold. First, we did not have complete information on the years of production. Second, as stated, our original models did not converge due to technical limitations, mainly because they were developed using an older statistical package (based on R) that did not allow for parallel processing. We are happy to let the reviewers know that we have worked out these limitations by collecting additional year-of-release information and rewriting our statistical models in the more modern Julia programming language. This allowed us to include an additional study (Study 4: Agent & Patients + Time) in which we include a movie’s year of release as one of the control variables. The idea is that this model captures the notion that different years have different film trends, resulting in different action-frequency distributions. We have updated the manuscript to incorporate the details of this time-based study (lines 404–412) and its results (lines 503–517), and we have updated the discussion to reflect what we learned about how certain actions remain significant even when controlling for yearly trends.

Attachment

Submitted filename: Response to reviewers.pdf

Decision Letter 1

Natalia Grabar

17 Oct 2022

PONE-D-22-04236R1

Boys don’t cry (or kiss or dance): A computational linguistic lens into gendered actions in film

PLOS ONE

Dear Dr. Martinez,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

Victor, I am again sorry about the delays with the rereviewing.

Thank you for taking into account the previous comments. You have now some additional minor comments to consider.

Thank you for your work.

 ==============================

Please submit your revised manuscript by Dec 01 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Natalia Grabar

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #3: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #3: The topic is extremely interesting. This is a good paper, and the results are now better explained than in the original submission.

But tables 5 to 8 in the supplementary material are important, and they are still not easy to understand. Why are the rows ordered by Z? Wouldn't ordering by Estimate make more sense? It's not clear what the column "significance" is based on, or what information it adds. I see entries with two stars with a larger Estimate than entries with three stars. For tables 5 and 6 (agent-only and patient-only) it would be nice to have an easy way to see the top actions for males and the top actions for females separately. Either color code them differently, or put them in different tables, or sort the rows to have all the male terms together and all the female terms together...

Other, more minor points, that are not an obstacle to accept the paper:

Figure 1: Great figure, it helps a lot to guide the reader. Small layout issue for the lines connected to the boxes agents/actions/patients : The lines don't bend where you meant it to bend, so it looks like actions and patients are connected to each other, but not to annotation on the left.

Lines 172–181, verification steps of the annotations: So you have 3 annotators per action, then a fourth annotator to verify the results, then one of the authors looks at it one more time. Why all these different steps? Did you have a lot of bad annotations using just the agreement of the 3 initial annotators? It would be a plus to have this explained in a few words.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #3: Yes: Sam Bigeard

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Dec 21;17(12):e0278604. doi: 10.1371/journal.pone.0278604.r004

Author response to Decision Letter 1


14 Nov 2022

November 12th, 2022

PLOS One Editorial Team,

Dear Editorial Team,

We would like to thank the reviewer for their effort in providing insightful feedback to improve our work. Throughout this process, we continue to be encouraged by the reviewer’s recognition of the interesting topic, and the value that our work provides to the discussion on gender bias in the media.

While working on this revision, we have identified and addressed a limitation in our statistical analysis that resulted in an overestimation of the number of significant coefficients. In summary, our Z-tests for coefficient significance (H0: β = 0 vs. H1: β ≠ 0) were not controlled for multiple comparison errors (also known as familywise errors). To address this limitation, we incorporated a post-hoc Holm-Bonferroni correction for all GLME models. In other words, we are employing a higher standard for what we consider to be a significant result. The presented revision collects only the results that are found to be significant after correction (i.e., adj-p < 0.05).
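The correction described above can be sketched in a few lines. This is a minimal, standard-library illustration of the Holm-Bonferroni step-down procedure applied to two-sided Z-test p-values, not the code used for the analysis (which, per the earlier response, was written in Julia). The function names and the example z-scores are hypothetical.

```python
import math
from typing import List


def two_sided_p(z: float) -> float:
    """Two-sided p-value of a Z statistic under H0: beta = 0."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def holm_bonferroni(p_values: List[float]) -> List[float]:
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Visit p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is scaled by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical z-scores for four coefficients (not taken from the paper).
z_scores = [4.5, 2.1, -2.3, 0.8]
raw_p = [two_sided_p(z) for z in z_scores]
adj_p = holm_bonferroni(raw_p)
significant = [p < 0.05 for p in adj_p]

for z, p, q, s in zip(z_scores, raw_p, adj_p, significant):
    print(f"z={z:+.2f}  p={p:.4g}  adj-p={q:.4g}  significant={s}")
```

In this made-up example, the coefficients with |z| of 2.1 and 2.3 pass an uncorrected 0.05 threshold but not the corrected one, which is the kind of reduction in significant coefficients the paragraph above describes.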

We have updated the results, tables and discussion section to contextualize only those results that meet the new (and more appropriate) standard. This resulted in the removal of certain citations, as these are no longer needed.

Response to Reviewer 3:

Comment regarding tables 5 to 8: We agree with the reviewer’s comment that the tables are hard to parse and not easy to understand. In addition to raising the standard for what we consider a significant coefficient, we improved the readability of the tables in the supplementary materials as follows:

* We removed the significance column. This column aimed to provide an easy way to discern the power of the test (i.e., how likely is this coefficient to be different from zero?). However, after correcting for multiple comparisons, the coefficients deemed significant were those with the smallest p-values (most of which had adj-p = 0) which resulted in all having the same significance level.

* We followed R3’s suggestion on presenting the coefficients ordered by their magnitude.

* Additionally, we followed R3’s suggestion to split the tables into sub-tables according to the direction of the relationship (i.e., less or more likely to be portrayed by X gender). This direction is given by the coefficient sign. We caption sub-tables with a brief explanation of the interpretation for each of the results presented (e.g., “Actions where the agent is more likely to be male”).
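The re-organization of the tables listed above can be sketched as follows. This is only an illustration of splitting coefficients by sign and ordering each sub-table by magnitude; the function name, the placeholder action labels, and the coefficient values are hypothetical, and descending order is an assumption.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, float]


def split_by_direction(coefs: Dict[str, float]) -> Tuple[List[Row], List[Row]]:
    """Split coefficients by sign and sort each group by |beta|, largest first."""
    positive = [(action, beta) for action, beta in coefs.items() if beta > 0]
    negative = [(action, beta) for action, beta in coefs.items() if beta < 0]

    def by_magnitude(row: Row) -> float:
        return abs(row[1])

    positive.sort(key=by_magnitude, reverse=True)
    negative.sort(key=by_magnitude, reverse=True)
    return positive, negative


# Placeholder labels and values, not results from the paper.
example = {"action_a": 0.8, "action_b": -1.1, "action_c": 0.3, "action_d": -0.4}
more_likely_male, less_likely_male = split_by_direction(example)

print("Positive coefficients:", more_likely_male)
print("Negative coefficients:", less_likely_male)
```

Each returned list corresponds to one sub-table, with the caption determined by the sign of the coefficients it contains.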

Figure 1: We updated the figure to fix the layout issue. (Thanks for pointing this out).

Verification steps of the annotations: We have included a paragraph with a brief explanation of why we decided on a two-step verification process. In summary, we had to be certain that the labels met a minimum quality standard, since the models we are using make no distinction between subjects (i.e., a person, animal or thing that is on the receiving end of the action) and patients (i.e., characters receiving actions).

In addition to the above comments, we have corrected additional spelling and grammatical errors.

We look forward to hearing from you in due time regarding our submission and to respond to any further questions or comments you may have.

Sincerely,

Dr. Victor R Martinez

Corresponding Author

Attachment

Submitted filename: PLOS Response to reviewers 2.pdf

Decision Letter 2

Natalia Grabar

21 Nov 2022

Boys don’t cry (or kiss or dance): A computational linguistic lens into gendered actions in film

PONE-D-22-04236R2

Dear Dr. Martinez,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Natalia Grabar

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

The comments of the reviewers have been taken into account. The Authors also corrected some previous limitations of their methodology. This improved the overall quality of the submission, which is acceptable for publication now.

Acceptance letter

Natalia Grabar

23 Nov 2022

PONE-D-22-04236R2

Boys don’t cry (or kiss or dance): A computational linguistic lens into gendered actions in film

Dear Dr. Martinez:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Natalia Grabar

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Table. Machine learning model performance.

    Complete table of performance results for the SRL systems. Legend: Oracle action denotes models with no automatic action identification. Uncased / Cased refers to the type of pre-trained BERT model used. +fine-tune notes models that were fine-tuned using the manually labeled action description dataset. For LSTM and GRU, size of hidden dimension is given by the number in parenthesis. Best performance highlighted in bold.

    (PDF)

    S2 Table. Results for Study 1: Agent-only.

    Regression model results for agent’s actions. We test the significance of the coefficients through Z-test, and correct for multiple comparisons using the Holm-Bonferroni method. Table shows only significant coefficients with adjusted-p < 0.05. Rows are ordered by the magnitude of their coefficient (β). The direction of the relationship is determined by the sign of the coefficient, with positive coefficients corresponding to actions which are more likely portrayed by a male character. Likewise, negative coefficients present actions that are less likely to be portrayed by male characters. Manually identified errors are color coded (blush—errors due to parsing and lemmatization; gray—errors due to SRL).

    (PDF)

    S3 Table. Results for Study 2: Patient-only.

    Regression model results for actions done to patients. We test the significance of the coefficients through Z-test, and correct for multiple comparisons using the Holm-Bonferroni method. Table shows only significant coefficients with adjusted-p < 0.05. Rows are ordered by the magnitude of the coefficients (β). The direction of the relationship is determined by the sign of the coefficient, with positive coefficients corresponding to actions more likely done towards male characters. Manually identified errors are color coded (blush—errors due to parsing and lemmatization; gray—errors due to SRL).

    (PDF)

    S4 Table. Results for Study 3: Agent & patient.

    Regression model results for the agent–patient interactions. Group encodes gender dynamics (e.g., M→M identifies actions done by male characters towards other male characters). We test the significance of the coefficients through Z-test, and correct for multiple comparisons using the Holm-Bonferroni method. Table shows only significant coefficients with adjusted-p < 0.05. Rows are ordered by the magnitude of the coefficients (β). Direction of the relationship is given by the sign and magnitude of β, with positive values indicating actions more likely portrayed by that group of characters. Manually identified errors are color coded: blush for errors propagated from outside the SRL system (e.g., parsing, lemmatization), and gray for errors due to mislabels coming from our SRL system.

    (PDF)

    S5 Table. Results for Study 4: Agent & patient + time.

    Regression model results for the agent–patient interactions controlling for year of production. Group encodes gender dynamics (e.g., M→M identify actions done by male characters towards other male characters). A star (*) is used as short-hand for either group (e.g., *→F labels actions where the patient is Female and the agent’s gender can be any value). We test the significance of the coefficients through Z-test, and correct for multiple comparisons using the Holm-Bonferroni method. Table shows only significant coefficients with adjusted-p < 0.05. Rows are ordered by the magnitude of the coefficient (β). The direction of the relationship is given by the coefficient’s sign, with positive coefficients corresponding to actions more likely portrayed by their encoding group. Manually identified errors are color coded gray for errors due to mislabels coming from our SRL system.

    (PDF)

    Attachment

    Submitted filename: Response to reviewers.pdf

    Attachment

    Submitted filename: PLOS Response to reviewers 2.pdf

    Data Availability Statement

    We have made the dataset and labeling models available for download at https://sail.usc.edu/~ccmi/actions-agents-and-patients/.


    Articles from PLOS ONE are provided here courtesy of PLOS
