PLOS One. 2020 Sep 8;15(9):e0238230. doi: 10.1371/journal.pone.0238230

Math and language gender stereotypes: Age and gender differences in implicit biases and explicit beliefs

Heidi A Vuletich 1,*, Beth Kurtz-Costes 2, Erin Cooley 3, B Keith Payne 2
Editor: Jennifer Steele 4
PMCID: PMC7478909  PMID: 32898854

Abstract

In a cross-sectional study of youth ages 8–15, we examined implicit and explicit gender stereotypes regarding math and language abilities. We investigated how implicit and explicit stereotypes differ across age and gender groups and whether they are consistent with cultural stereotypes. Participants (N = 270) completed the Affect Misattribution Procedure (AMP) and a survey of explicit beliefs. Across all ages, boys showed neither math nor language implicit gender biases, whereas girls implicitly favored girls in both domains. These findings are counter to cultural stereotypes, which favor boys in math. On the explicit measure, both boys’ and girls’ primary tendency was to favor girls in math and language ability, with the exception of elementary school boys, who rated genders equally. We conclude that objective gender differences in academic success guide differences in children’s explicit reports and implicit biases.

Introduction

Children’s perceptions of gender differences in cognitive abilities are important because they may lead boys and girls to develop different interests and different areas of achievement [1–3]. For example, stereotypic perceptions of academic abilities have been implicated in gender differences in course selections and career trajectories, contributing to the underrepresentation of women in science, technology, engineering and math (STEM) [4,5]. Much of the research devoted to this topic has examined children’s self-reported explicit beliefs about gender differences in abilities. As summarized below, explicit beliefs may differ in systematic ways from implicit gender biases, which are automatically activated associations to gender categories. In this study, we examined children’s implicit biases and their explicit beliefs regarding gender differences in math and language abilities.

Math is directly relevant to students’ interest in and preparation for STEM careers, but biases favoring girls in language could also contribute to STEM disparities by disproportionately attracting girls to, and deterring boys from, the humanities [6]. To identify if, and when, changes in explicit beliefs and implicit biases are taking place, we examined age differences in these phenomena, using a sample that ranged in age from 8 to 15 years. We also examined relations between youth’s implicit biases and their explicit beliefs.

Possible influences on children’s perceptions of gender differences in academic abilities

According to social identity theory, identifying with a social category leads to in-group preferences because doing so protects people’s self-esteem [7]. Gender is one of the first social categories to develop, and children as young as 3 to 5 years old express preferences for their own gender [8,9]. Same-gender preferences typically result in more generalized positive evaluations of one’s gender group, particularly among children [10]. For these reasons, one would predict that children’s explicit reports of competence in academic domains would be biased by their gender, resulting in in-group preferences. Other factors that may influence children’s perceptions of academic abilities, particularly among older children, are the need to be fair [11], the development of more accurate evaluations [12], and the need to present themselves in socially desirable ways [13].

As children age, they also become aware of cultural stereotypes that contradict their positive evaluations of the in-group. Two traditional gender stereotypes are that boys are more talented than girls are in math, whereas girls are more talented than boys in language domains [14]. Sociocultural views of knowledge construction suggest that children’s awareness of these kinds of cultural stereotypes increases with maturation [15], and empirical studies have shown support for this idea [16–18]. Therefore, as children grow older and have more experiences in which stereotypes are discussed or endorsed by others, these stereotypes may become a more salient source informing children’s own attitudes. In particular, when children encounter negative stereotypes about their social group, those stereotypes may either temper their in-group favoritism, yielding more egalitarian attitudes, or guide their attitudes such that they endorse the negative stereotypes about their group. Research suggests that the way children respond often depends on where their group is positioned within the larger social hierarchy [19]. Children belonging to groups of high status are more likely to endorse negative stereotypes about their in-group, presumably because they do not have a strong need to self-enhance. Children belonging to low-status groups are more resistant to endorsing negative stereotypes and either downplay them or opt for egalitarian views instead. For instance, in two studies, boys and Whites endorsed traditional gender and race stereotypes regardless of whether those stereotypes favored their in-group, but girls and Black children were more selective, endorsing stereotypes that favored their in-group while denying stereotypes that portrayed their in-group negatively [18,19].

Another factor influencing children and adolescents’ beliefs is their personal experience. Although in most countries worldwide men are more likely than women to pursue careers in STEM domains [20], girls tend to receive higher grades than boys throughout primary and secondary school across academic subjects [21,22]. Particularly as youth enter middle school, where achievement becomes more salient because of academic tracking and public posting of honor rolls, they are likely to become aware of differences in academic performance favoring girls. Indeed, using a photo-identification task in which youth were asked to pair photos with verbal descriptions of individuals, middle school youth were more likely to choose photos of girls than of boys for depictions of high-achieving youth [23].

Girls’ relative superiority over boys in academic settings is robust, as reflected by school grades, high school and college completion rates, and teachers’ ratings of school engagement [21,24,25]. Therefore, although gender differences persist in career choices, and stereotypes are still widely endorsed that favor boys in math, youths’ awareness of gender differences in school success might lead to implicit biases and/or explicit reports that are either egalitarian or that favor girls across domains. For instance, one research study with children ages 4–10 found that, with age, boys increasingly endorse personal beliefs about girls’ academic superiority and also beliefs that adults see girls as academically superior [26].

Explicit beliefs regarding gender differences in academic abilities

As outlined in the previous section, several factors potentially influence children’s perceptions of gender differences in academic abilities. Unfortunately, research studies investigating children’s beliefs through explicit reports have not demonstrated consistent findings in support of any one explanation. Regarding math ability, several studies have shown that young boys (approximately ages 6–11) show in-group preferences [27,28]. In two different studies, Italian boys in first grade were more likely to point to a picture of a boy, rather than a girl, when asked who is especially good at math [28,29]. Italian boys in third and fifth grade were also more likely to say that boys, rather than girls, were better at math [30]. In U.S., German, and French samples, boys in fourth grade rated boys as more gifted than girls in math [31–33]. Singaporean boys in first, third, and fifth grade were more likely to point to a picture of a boy, rather than a girl, when asked who liked math more [27]. These results are consistent with the idea that in-group preferences dominate in early and middle childhood. However, this explanation does not hold for girls. Although some of the studies just discussed reported in-group math preferences among young girls (ages 6–11) [28–30,32,33], others found that young girls hold egalitarian views [27] or even endorse stereotypes that boys are more able than girls in math [31]. The findings also do not uniformly support other explanations such as social status, cultural stereotypes, or actual differences in performance as factors influencing children’s beliefs. Findings with older children are even less straightforward. Some studies show older boys and girls (ages ranging from 12–15) favor their own gender, some find youth are egalitarian, and yet others find that they favor the outgroup, with no clear age trends [30–32,34–37].

Research results regarding boys’ and girls’ gendered beliefs about language ability are also mixed. To our knowledge, no studies have reported girls favoring boys in language domains across any age range, but a few studies have found that young girls (ages ranging from 6–11) reported no differences between boys and girls in their liking of, or competence in, language [27,38]. Italian first grade girls did not show a gender preference when choosing between a picture of a boy and that of a girl when selecting who was better at language [38]. Singaporean girls in first, third, and fifth grade also did not show a gender preference when rating boys’ and girls’ liking of language [27]. Boys, on the other hand, more consistently endorse the female-language stereotype, but exceptions have been found whereby boys report egalitarian beliefs or even favor boys in language. For instance, first grade Italian boys did not show a gender preference when choosing between a picture of a boy and one of a girl when selecting who was better at language [29,38]. In a U.S. sample, fourth grade boys were egalitarian regarding their beliefs about who was better at reading and writing [32]. Therefore, the influences of in-group preferences versus cultural stereotypes, social status, or other factors on children’s explicit beliefs about math and language abilities remain unclear.

A problematic aspect of research on academic stereotypes focused on explicit beliefs is that it relies on self-report measures, which allow participants to control their responses. Children may report beliefs that match perceived social expectations or self-presentation goals, regardless of their own beliefs. For instance, 6- to 8-year-old children who thought they were being videotaped suppressed their explicit in-group preference and outgroup prejudice, whereas those who thought the camera was off did not [39]. The mixed findings in explicit attitudes, then, might reflect variability in children’s personal beliefs, but they could also be an artifact of children’s reactance to being questioned about a socially sensitive topic.

Examining children’s implicit biases circumvents the self-presentation problems associated with explicit measures. Implicit biases are automatic associations that are measured using cognitive tests that capitalize on reaction times or priming procedures. Responses to these tests are difficult to control and thus are often independent of intent [40–42]. Such associations are not necessarily indicative of explicit beliefs, and may even be inconsistent with them [43]. Although explicit measures are informative in their own right, implicit measures can provide additional information regarding the impact of socially sensitive topics, such as stereotypes, that may not be readily accessible through explicit reports. In this study, we explored both children’s explicit beliefs and their implicit biases regarding math and language ability among boys and girls. Our hypotheses regarding explicit beliefs were that in-group preferences would be evident in the youngest age group (elementary school children) regarding math ability. We did not have specific hypotheses regarding older children, as multiple factors can affect their responses on explicit tests, including societal stereotypes, social status differences, and social desirability, among others. Regarding language ability, we hypothesized that both boys and girls would favor girls because those beliefs are congruent with multiple factors, including societal stereotypes, actual performance, and social acceptability of endorsing the stereotype (i.e., the stereotype is less condemned).

Children’s implicit biases about academic abilities

The majority of studies examining children’s gender implicit biases regarding academic abilities have measured implicit bias using the Implicit Association Test (IAT). This test measures reaction times in stereotype-congruent versus stereotype-incongruent conditions, the theoretical principle being that children (and adults) respond more quickly to paired categories that are cognitively associated. More specifically, in the stereotype-congruent condition, children press one computer key if they see words associated with boy names or math concepts and another key if they see words associated with girl names or language concepts. Faster reaction times within this stereotype-congruent condition, compared to the stereotype-incongruent condition in which paired categories are boys/language and girls/math, are typically interpreted as indicating math-male implicit bias. Using this procedure, several studies have found that girls show stronger stereotypic implicit biases than boys. In a sample of Italian children in first grade, boys did not show an implicit bias whereas girls showed stereotype-consistent implicit bias [28]. Using a paper-and-pencil IAT in which participants had 30 seconds to classify words into categories, Italian girls in Grades 3, 5, and 8 showed an implicit bias, such that they categorized more words in the stereotype-congruent condition compared to the stereotype-incongruent condition [30]. Boys, on the other hand, showed stereotypic associations in eighth grade, but not in third or fifth grade. In a sample of German children in Grades 4, 7 and 9, boys did not show any stereotypic associations across any age group, whereas girls in Grades 4 and 9 showed significant stereotyping [31]. In a rare exception for this literature, girls in seventh grade did not show any bias. These are just a few examples, but other studies show similar patterns: Girls demonstrate a stereotypic implicit bias, whereas boys either demonstrate a stereotypic implicit bias or show no bias at all [27,29,35,38,44,45]. These findings span ages 5 to 15 and come from samples in multiple nations, including Canada, Chile, Italy, Singapore, and the United States. Based on the results of these IAT studies alone, one conclusion is that children (girls in particular) assimilate societal stereotypes about gender differences in math ability favoring boys from an early age [45]. This interpretation implies that research efforts and interventions ought to be focused on children’s math associations and beliefs. Yet, an outstanding question is whether these biases are truly about math rather than language, even though they are typically labeled as math-male biases.

Due to the paired nature of IAT categories, it is impossible to disambiguate whether children who show a “male-math” bias do so because they associate math with boys more strongly than they associate math with girls, or because they associate language with girls more strongly than they associate language with boys, or both. In some cases, studies have found that IAT scores predict math outcomes, but even then what is labeled as a “math outcome” is a relative difference between math and language outcomes. For instance, in a German sample of 4th, 7th, and 9th graders, stereotype-consistent IAT scores were correlated positively for girls and negatively for boys with intentions to enroll in language versus math courses, grades in language versus math, and self-concepts in language versus math [31]. One experimental study did find that IAT scores were positively correlated with math performance on a test, explaining part of the relation between exposure to the math-male stereotype and math performance [28]. However, the relation to math outcomes is not always consistent, as some studies have found no relation [30,35].

A recent study further challenges the assumption that math associations drive previous findings. Critically, the study used an implicit bias measure that did not confound math and language bias [36]. Instead, participants pressed one key if they saw positive adjectives related to “doing good work in mathematics” and another key if they saw negative adjectives related to “doing poor work in mathematics.” In the stereotype-congruent condition, the key corresponding to doing good work in math was on the same side as a smiley face and a picture of a male doll on the screen. The key for doing poorly in math was paired with a sad face and a picture of a female doll. The pairings were reversed in the stereotype-incongruent condition. In a different set of blocks altogether, language implicit biases were assessed using an identical procedure, but the adjectives were described as being related to doing good or poor work in reading. Implicit biases were scored as the difference in reaction times between stereotype-congruent and stereotype-incongruent trials. Thus, the researchers were able to obtain implicit bias scores for math separately from those for language. Using this procedure in a sample of Canadian children in Grades 4–6, researchers found—in stark contrast to previous results using the IAT—that girls held a counter-stereotypical implicit bias favoring girls over boys in math. Boys demonstrated no math implicit bias. Language biases, on the other hand, were consistent with previous IAT findings; girls showed a stereotypical language-female bias, whereas boys demonstrated no language implicit bias.
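A reaction-time difference score of the kind described above can be sketched in a few lines. This is an illustration only: the function name, the sign convention (incongruent minus congruent, so that positive values indicate a stereotype-consistent bias), and the sample latencies are assumptions and are not taken from the cited study [36].

```python
from statistics import mean


def rt_difference_score(congruent_ms, incongruent_ms):
    """Mean latency on incongruent trials minus mean latency on congruent trials.

    Sign convention (an assumption for this sketch): a positive score means
    faster responding when the pairings match the stereotype.
    """
    return mean(incongruent_ms) - mean(congruent_ms)


# Hypothetical latencies in milliseconds, scored separately for each domain.
math_score = rt_difference_score([700, 720, 680], [690, 700, 650])
reading_score = rt_difference_score([640, 660, 650], [720, 700, 710])
```

Because each domain receives its own score, a math score and a reading score can differ in sign, which a single relative IAT score cannot show.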

In the only other study (to our knowledge) that used an implicit bias measure that disambiguated math and language implicit biases, girls also did not demonstrate stereotypical math-male bias [46]. In that study, German students in Grade 9 completed a go/no go association task as the measure of implicit bias. This task required participants in the stereotype-congruent condition to press the space bar if they saw words on the computer screen associated with math or boys and to ignore other words (i.e., words associated with girls and neutral stimuli). In the stereotype-incongruent condition they responded to words associated with math or girls and ignored all other words. The same procedure was applied to measure language bias. Scores were based on reaction time differences between the stereotype-congruent and stereotype-incongruent blocks. The results were that girls did not show a math-male bias, whereas boys did. On the other hand, girls showed an implicit language-female bias, whereas boys showed a counter-stereotypical bias favoring boys in language.

To summarize, all studies examining math/language implicit biases using the IAT have found that girls demonstrate stereotypical implicit biases (with the exception of one seventh grade subsample). Although these results might reflect math-male biases, language-female biases, or both biases, they have been labeled as “math-male” biases. In contrast, the only two published studies that have employed implicit bias measures that disambiguate math and language biases have not supported the notion that girls hold math-male biases. We base these statements on a literature search for peer-reviewed articles on Google Scholar and PsycINFO using different combinations of keywords: “IAT,” “implicit bias,” “implicit stereotype,” “math,” “language,” “children,” and “girls” (up-to-date as of March 2020). This pattern of findings suggests that measurement differences could explain the seemingly conflicting results. Further, when understood from the perspective of language gender biases, all the findings align.

In our study, we also used a measure that disambiguates math and language implicit biases. We tested two different predictions based on two theoretical accounts. The first was that girls would hold math-male implicit biases, consistent with the idea that they assimilate cultural stereotypes favoring boys in math (in line with the interpretation of IAT results). The alternative hypothesis was that girls would show a math-female counter-stereotypical bias, consistent with national gender differences in overall academic performance. Because performance and stereotypes about language are consistent with each other, we expected girls to show implicit biases favoring girls in language. We were agnostic about the implicit biases of boys, as previous findings have been inconsistent and do not clearly favor one theoretical account over another. The dissociative processes by which girls and boys form automatic associations are in themselves interesting, but they are not the subject of this report.

Affect misattribution procedure

In the current study, we used the Affect Misattribution Procedure (AMP) [47] to measure implicit biases about math and language ability among girls and boys. The AMP has several strengths for studying implicit biases [48]. First, it tests automatic associations to single domains, meaning it can assess implicit biases regarding math and language separately. Second, it has a simple structure, which is ideal for use with children. Participants follow simple instructions to make binary judgements about ambiguous stimuli across several trials. In the present study, the binary judgements were “good at math” versus “bad at math” (or language arts), and the stimuli were Chinese symbols. Each ambiguous stimulus (i.e., Chinese symbol) is preceded by a prime that participants are instructed to ignore. In the present study, the prime was a picture of a boy or a girl. The AMP measures participants’ unintended misattribution of affect or semantic content (e.g., [49,50]) from the prime to the ambiguous stimulus. For example, when asked to judge whether a Chinese symbol means “good at math” or “bad at math,” participants unintentionally use their judgements about the preceding prime (e.g., photo of a girl/boy) to make a response.

One potential concern about the AMP is that participants could ignore instructions and directly rate the primes, making the measure more akin to an explicit measure. To address this concern, one study tested two different AMP conditions assessing race implicit bias, one where adult participants were instructed to directly rate the primes and another one where they were instructed to ignore the primes and rate the target stimuli [51]. These two conditions yielded divergent results. The traditional AMP predicted racial bias in an impression formation task, whereas the “explicit” AMP did not, presumably because individuals were motivated to control expressions of prejudice in the explicit condition. Indeed, motivation to control prejudice was related to the explicit AMP but not the traditional one. These results suggest that, under normal conditions (i.e., when people are instructed to ignore the photo primes and evaluate the symbols), participants are not intentionally rating the photo primes. Though these findings were based on adult samples, they likely extend to children, as children tend to have fewer self-regulatory skills than adults to control their responses [52]. Another study addressing this concern found that adults are not able to accurately introspect on what influenced their response pattern [51], rendering self-reports of this information unreliable. Taken together, these findings suggest that any systematic influence of the photo primes on the ratings of the ambiguous stimuli reflects an unintended/automatic, and thus implicit, bias.

The potential weaknesses of the AMP have been tempered by empirical evidence, and its strengths have been documented extensively. Another strength of the AMP is its reliability. Meta-analytic procedures have shown the AMP to be more reliable than most reaction-time-based measures [48]. There is also evidence that it is a reliable measure for use with children, with good predictive validity [53].

Use of the AMP in this study allowed us to examine gender and age differences in implicit gender biases regarding math and language ability separately. Given the potential methodological specificity of previous findings regarding math implicit biases in children, it was important to select a measure that not only disambiguates biases by domain, but is also simple enough to use with children, reliable, and has proven to be valid.

Relations between implicit biases and explicit beliefs

In addition to measuring age and gender differences in reports, we also measured relations between implicit biases and explicit beliefs. Explicit reports are presumed to reflect personally endorsed attitudes, which may be shaped by motivated reasoning such as social desirability or the need to protect one’s social identity. Recent theoretical perspectives on implicit biases, in contrast, suggest that situational effects are a strong determinant of implicit biases [54]. Examples of these situational effects are cultural stereotypes cued by media or perceptible environmental inequalities, such as differences in classroom performance. Thus, for example, an adolescent who is aware of the math-male stereotype might show implicit biases favoring boys in math even if she does not personally endorse the stereotype, simply because it has been activated by something in the environment. Similarly, a child who observes differences in classroom performance might automatically associate girls with academic success, but still assert that boys are better than girls at a given subject in order to protect his gender self-esteem.

Thus, implicit and explicit measures might yield unrelated results because of motivational biases operating in explicit reports, or because implicit bias responses reflect activation of cultural knowledge or environmental cues not endorsed or acknowledged by the individual. Meta-analyses examining relations between the two have shown small, positive relations, with mean effect sizes often ranging between .20 and .24 [43,55]. Results of individual studies vary widely, however, with the strength of relations shaped by moderators such as conceptual correspondence between the two measures and other task characteristics. Investigations focusing specifically on children’s academic biases and beliefs have also found small to no correlations. In their study of Singaporean children in first, third, and fifth grades, Cvencek et al. [27] found very low to no correlations between children’s implicit and explicit reports of gender differences in math ability. Implicit attitudes were unrelated to explicit math stereotypes in Passolunghi et al.’s [30] study of third, fifth, and eighth grade Italian children. Though these results could be indicative of a dissociation between implicit biases and explicit beliefs among children, they could also be indicative of the lack of correspondence between the IAT, which confounds math and language and was the implicit measure used in those studies, and their explicit measures, which focused on math.

Current study

The aim of this paper was to examine age and gender differences in implicit biases and explicit beliefs regarding gender differences in math and language abilities. By using the Affect Misattribution Procedure (AMP) [47,48] to measure implicit biases, we were able to obtain independent measures of gender biases regarding math as opposed to language abilities. We used a cross-sectional sample of youth in elementary school, middle school, and high school to test age differences.

With regard to explicit stereotypes, we expected age differences with the youngest age group most likely to favor their own gender in math. We did not have specific predictions about older children, as we could envision multiple factors influencing their beliefs, including the substantial efforts in recent decades to encourage the idea that girls can excel in math, increasing sensitivity to such social norms with age, cultural stereotypes, social status differences, and actual differences in performance. In contrast, given cultural stereotypes emphasizing girls’ success in language domains, combined with gender differences in academic performance, we expected that youth of both genders would favor girls in their explicit reports of language abilities.

With regard to youth’s implicit biases, we envisioned two potential outcomes based on two different theoretical accounts. We expected girls would show traditional math-male biases if they have assimilated cultural stereotypes that favor boys in math. In contrast, girls would favor girls in math if pervasive differences in academic performance are the primary factor shaping automatic associations about gender and math ability. We expected girls to show implicit biases favoring girls in language, regardless, as both cultural stereotypes and differences in academic performance favor girls in language domains. We were agnostic about trends for boys.

With regard to relations between implicit and explicit measures, given developmental differences across this age range and results of prior studies, we expected weak positive relations between explicit and implicit measures in each domain, with the possibility that the strength of relation might decrease with age. Whereas younger youth may be more transparent and explicitly report their automatic associations, older youth might control their responses so that their implicit biases are more dissociated from their explicit reports.

Method

Participants

Participants were 270 youth (141 girls) ages 8 to 15. Youth were recruited from public libraries and schools in the southeastern region of the United States. A sensitivity analysis conducted in G*Power (Version 3.1.9.2) [56] indicates that this sample is sufficient to detect main effects as small as η² = .03 (f = .17) and interaction effects as small as η² = .04 (f = .19) at .80 power.
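G*Power expresses ANOVA effect sizes as Cohen's f, which relates to eta squared through f = sqrt(η² / (1 − η²)). The short sketch below applies that standard conversion in both directions. The function names are illustrative, and because the values reported above are rounded to two decimals, the conversion reproduces them only approximately.

```python
import math


def f_from_eta_squared(eta_sq):
    """Cohen's f from eta squared: f = sqrt(eta^2 / (1 - eta^2))."""
    return math.sqrt(eta_sq / (1.0 - eta_sq))


def eta_squared_from_f(f):
    """Inverse conversion: eta^2 = f^2 / (1 + f^2)."""
    return f ** 2 / (1.0 + f ** 2)


# Convert the two reported values of f back to eta squared.
for f in (0.17, 0.19):
    print(f"f = {f:.2f} -> eta^2 = {eta_squared_from_f(f):.3f}")
```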

Youth ages 8–10 were grouped into an elementary school category (n = 101, 53 girls and 48 boys, mean age = 8.9, SD = 0.7). Those ages 11–13 were grouped into a middle school category (n = 67, 25 girls and 42 boys, mean age = 11.5, SD = 0.7), and youth ages 14–15 were grouped into a high school category (n = 99, 63 girls and 36 boys, mean age = 14.4, SD = 0.8). Our sample was 49.1% White, 29.9% Black, 12.0% Hispanic, 5.2% mixed race/ethnicity, 2.2% Asian, and 0.37% other (1.23% did not report their race).

Procedures

All procedures were consistent with ethical standards of the American Psychological Association and were approved by the Institutional Review Board at the University of North Carolina at Chapel Hill. After parents provided informed consent, youth gave verbal and written assent to participate. Next, youth completed the Affect Misattribution Procedure (AMP), an implicit measure of academic stereotypes. A researcher read the initial instructions aloud and gave the participant intermittent reminders. The AMP is a computerized task that was administered on a laptop computer. Finally, participants completed a paper survey that included an explicit measure of academic stereotypes, along with other measures not included in the current report. If needed, a researcher assisted younger participants in reading the instructions and questions, but to prevent social desirability effects, the researcher read from a different survey, facing away from the child in order not to look at his or her responses. The research team included both men and women, African Americans, non-Hispanic Whites, and Hispanics. All stimulus materials reported here can be found in S1 File.

Children and adolescents were recruited from schools and from the community through announcements posted in public locations. Data collection was conducted in public libraries, at a local YMCA, and in four participating schools. In each of those settings, testing took place individually in a quiet and secluded area, facing away from other people.

Measures

Implicit bias

We used the Affect Misattribution Procedure (AMP) [48] to measure implicit bias. The AMP has been validated as a measure of implicit biases in adults [47] and also in children ranging from 4 to 12 years old [57–59]. Each trial of the AMP began with a brief presentation (200 ms) of a photo on the computer screen. The photographs used as primes included 40 images of early adolescents—20 girls and 20 boys, balanced in race (Black and White). The photos were selected based on a pilot study to ensure that photos of the two genders did not differ on perceived attractiveness, age, or mood. Internal consistency for the measure was α = .40 (procedure outlined in [47], Experiment 1).

Following the randomly selected photograph, a black and white pattern (125 ms) and a Chinese symbol were presented (150 ms). A black and white pattern then appeared until the participant made a response. Participants were instructed to ignore the photograph and make a judgment about the meaning of the Chinese symbol. They made these judgments in two blocks of trials, one for math and one for language. Each block consisted of 40 trials. For example, in one block participants guessed whether each symbol was a word related to the ideas of “good at math” versus “bad at math.” In the next block, they guessed whether each symbol was related to being “good at language arts” versus “bad at language arts.” The keys on the keyboard were clearly labeled “good” or “bad” (in place of the “F” and “J” keys). Participants were told that each symbol was “a word from the Chinese alphabet,” and that we were interested in how people make guesses about the meaning of words. The instructions further specified that the participant should rate about half of the symbols as good at math (or language arts) and half as bad at math (or language arts). Two other school domains (sports and science) were included in separate blocks, but are not the focus of this report. Block presentation was counterbalanced across participants. We do not have information about whether any participants were familiar with the Chinese symbols in our study or about their thoughts on the cover story. However, we should note that the believability of the cover story is superfluous to the task’s objective, which was to rate ambiguous stimuli preceded by a prime. The mechanism by which the AMP functions (i.e., misattribution of affect/semantic content from prime to target) does not depend on the construal of the task. Familiarity with the Chinese symbols does present a problem, which is that the symbols would no longer be neutral, weakening priming effects. Thus, though unlikely given our sample’s demographics, we should note that the effects reported here might be underestimates.

Participants were reminded between blocks that they should ignore the photo of the person and just focus on the symbol. The sequence of photographs was randomized within each block, and the sequence of domains (language, math) was randomized across participants.

Each participant had two implicit bias scores for each domain, one representing their implicit bias regarding girls (i.e., the proportion of times the student designated “good in math” [or language] after seeing a photo of a girl) and one representing their implicit bias regarding boys (i.e., the proportion of times the student selected “good in math” [or language] after seeing a photo of a boy). Because these scores were proportions of the total number of times they viewed stimuli of each gender, scores had the possible range of zero to 1.00, with higher scores indicating greater bias favoring the gender of the prime photograph, which we refer to as the prime gender.
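The scoring rule described above can be sketched as follows. This is an illustration under assumed conventions, not the authors' scoring script: the data layout (a list of prime-gender/response pairs for one participant in one domain block), the labels, and the function name are all hypothetical.

```python
from collections import defaultdict


def amp_bias_scores(trials):
    """Return the proportion of "good" responses for each prime gender.

    `trials` is an iterable of (prime_gender, response) pairs for one
    participant in one domain block (hypothetical data layout).
    """
    good = defaultdict(int)
    total = defaultdict(int)
    for prime_gender, response in trials:
        total[prime_gender] += 1
        if response == "good":
            good[prime_gender] += 1
    return {gender: good[gender] / total[gender] for gender in total}


# Toy block with 4 trials per prime gender; the actual blocks had 40 trials.
example_trials = [
    ("girl", "good"), ("girl", "good"), ("girl", "bad"), ("girl", "good"),
    ("boy", "good"), ("boy", "bad"), ("boy", "bad"), ("boy", "good"),
]
scores = amp_bias_scores(example_trials)
print(scores)  # {'girl': 0.75, 'boy': 0.5}
```

In this toy block, the difference between the two scores (.75 versus .50) would indicate a bias favoring girl primes.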

Explicit beliefs

Youth used a visual analog scale (VAS), consisting of a 100 mm horizontal line, to indicate with a vertical mark how well they thought boys or girls performed on a specific academic subject and how difficult they thought boys or girls found the subject. They could place a mark anywhere on the line, which allowed them to give very low or high ratings without having to choose the extreme option, as is the case with Likert scales. This attribute of VAS lines is important when measuring beliefs or attitudes that are sensitive to social desirability effects, such as stereotypes.

Participants answered two items regarding math and two items regarding language ability (e.g. I think that in MATH boys do this well…, and I think that boys find MATH…, with the extremes of the line labeled from “not well at all” to “very well” and from “very hard” to “very easy,” respectively). Each item was answered separately in regard to boys and girls, and each gender group was represented on a separate page. Youth rated the competence of boys and girls in other domains, both academic and non-academic, but those data are not the focus of this report. Items were scored by measuring the distance in millimeters from the left scale anchor to the line drawn by the respondent for each item. The two items corresponding to each subject and gender were averaged. Scores ranged from zero to 100, and the correlations between the two items in each measure (i.e., math and language) ranged from r = .53 to r = .61. Higher values indicate endorsement of greater competence in math/language.
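As a small illustration of the scoring described above, the two VAS items for a given subject and target gender can be averaged as in the sketch below. The function and argument names are hypothetical, and the example values are invented for illustration.

```python
def vas_competence_score(performance_mm: float, ease_mm: float) -> float:
    """Average the two 0-100 mm VAS items for one subject and target gender."""
    for value in (performance_mm, ease_mm):
        if not 0.0 <= value <= 100.0:
            raise ValueError("VAS marks must lie on the 100 mm line")
    return (performance_mm + ease_mm) / 2.0


# Invented marks: 80 mm on the "how well" item, 62 mm on the "how easy" item.
print(vas_competence_score(80.0, 62.0))  # 71.0
```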

Results

All data for this study can be found in the online repository: https://osf.io/fv5h8/. Our exclusion criteria for the implicit bias task (pressing the same key on all trials or alternating keys on all trials) did not apply to any participant.
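The two exclusion criteria can be expressed as a simple check on a participant's sequence of key presses. The sketch below assumes the check is applied to one ordered sequence of responses; the function name and the response labels are hypothetical.

```python
def meets_exclusion_criteria(responses):
    """Flag a response sequence that uses one key throughout or strictly alternates.

    `responses` is an ordered list of key labels such as "good" / "bad"
    (hypothetical representation of a participant's AMP responses).
    """
    if len(responses) < 2:
        return False
    same_key = all(r == responses[0] for r in responses)
    alternating = all(
        responses[i] != responses[i - 1] for i in range(1, len(responses))
    )
    return same_key or alternating


print(meets_exclusion_criteria(["good"] * 40))                      # True
print(meets_exclusion_criteria(["good", "bad"] * 20))               # True
print(meets_exclusion_criteria(["good", "good", "bad", "good"]))    # False
```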

Implicit gender bias

Tables 1 and 2 show the average proportion of prime photos that youth associated with “good at math” or “good at language,” respectively, split by prime gender and participant characteristics. To assess implicit biases regarding math and language ability in boys and girls, we conducted a 2(Participant Gender) x 3(Age Group) x 2(Academic Subject) x 2(Prime Gender) ANOVA, with Participant Gender and Age Group as between-subject factors, Academic Subject and Prime Gender as within-subject factors, and implicit scores as the dependent variable.

Table 1. Math implicit bias scores by age group, participant gender and prime gender.

Age Group | Participant Gender | Girl primes: Mean Proportion “Good at math” | Girl primes: SD | Boy primes: Mean Proportion “Good at math” | Boy primes: SD | n
Elementary School | Girls | .541 | .122 | .478 | .126 | 51
Elementary School | Boys | .527 | .124 | .547 | .097 | 47
Middle School | Girls | .571 | .107 | .479 | .132 | 24
Middle School | Boys | .524 | .086 | .525 | .114 | 42
High School | Girls | .558 | .129 | .527 | .130 | 62
High School | Boys | .543 | .108 | .539 | .130 | 36

SD = standard deviation. Girls at all three ages showed implicit bias favoring girls in math.

Table 2. Language implicit bias scores by age group, participant gender and prime gender.

Age Group | Participant Gender | Girl primes: Mean Proportion “Good at language arts” | Girl primes: SD | Boy primes: Mean Proportion “Good at language arts” | Boy primes: SD | n
Elementary School | Girls | .553 | .121 | .473 | .124 | 51
Elementary School | Boys | .532 | .124 | .518 | .132 | 47
Middle School | Girls | .544 | .132 | .548 | .136 | 24
Middle School | Boys | .500 | .109 | .520 | .086 | 42
High School | Girls | .569 | .126 | .527 | .123 | 62
High School | Boys | .560 | .125 | .572 | .144 | 36

SD = standard deviation.

The main effect of Prime Gender was significant, F(1, 256) = 6.79, p = .010, η2 = .03, and was qualified by a significant Participant Gender x Prime Gender interaction, F(1, 256) = 10.94, p = .001, η2 = .04. Fig 1 displays that girls of all ages showed an implicit own-gender bias. Girls rated a greater proportion of symbols preceded by photos of girls as good at math and language arts compared to symbols preceded by photos of boys, and this difference was statistically significant (see Table 3). Boys, on the other hand, did not show implicit bias. The Participant Gender x Academic Subject x Prime Gender interaction was not significant, F(1, 256) = 0.62, p = .432, failing to provide evidence that boys and girls differed in the extent to which they showed consistent gender-domain associations. None of the other interactions were significant. These results show implicit biases as being invariant across age, with girls holding an implicit in-group bias for both math and language, and boys not associating either math or language with either gender preferentially.
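The effect sizes reported with these F tests can be checked against the test statistics. The sketch below assumes that the reported η2 values are partial eta squared, which the text does not state explicitly; the function name is hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of Prime Gender: F(1, 256) = 6.79
print(round(partial_eta_squared(6.79, 1, 256), 2))   # 0.03
# Participant Gender x Prime Gender interaction: F(1, 256) = 10.94
print(round(partial_eta_squared(10.94, 1, 256), 2))  # 0.04
```

Under that assumption, both recomputed values agree with the reported effect sizes after rounding.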

Fig 1. Estimated marginal means for implicit bias scores by participant gender.

Values indicate the proportion each prime gender associated with “good at math” or “good at language arts.” Bars represent standard errors.

Table 3. Pairwise comparisons of math implicit bias scores by participant gender.

Participant Gender | Mean Diff Prime Gender (Girls − Boys) | SE | p | 95% CI | n
Girls | .051* | .012 | < .001 | [.026, .075] | 137
Boys | -.006 | .012 | .614 | [-.030, .017] | 125

Diff = difference, SE = standard error, p = probability value, CI = confidence intervals, n = number of participants. Adjustment for multiple comparisons: Bonferroni. Mean differences represent the proportion of symbols rated “good at math/language” when the prime was a girl minus the proportion when it was a boy.

An alternative interpretation of our results is that the AMP was measuring generalized gender biases (e.g., girls = good) rather than domain-specific biases (e.g., girls = good at math). To test this possibility, we examined girls’ and boys’ implicit biases regarding sports ability (one of the domains included in the study, but not the focus of this report). We report the results in our S1 File. In short, boys showed evidence of implicit bias favoring boys for the sports domain, but girls showed no implicit bias. Although these results might still reflect generalized biases (i.e., girls = good at academics / boys = good at athletics), they suggest that the AMP was sensitive to domain category and not just valence. In theory, if the instrument distinguishes between broad categories (e.g., sports versus academics), then it can distinguish between the categories specified in the actual task (sports v. math v. language), unless the effects of the specific academic subjects are so small as to be overwhelmed by the broader category. This latter point is in itself informative and contrary to the current understanding of girls’ implicit biases.

Overall, these results challenge the assumption that girls automatically associate math ability with boys rather than girls, and they suggest that measurement differences could explain ostensibly conflicting results from previous studies. Of course, more research is needed in this area to be conclusive, but the current study highlights the importance of measuring math and language implicit biases separately to better understand children’s automatic associations between gender and domain-specific abilities. Especially in light of persistent gender disparities in STEM, this issue deserves careful attention, as children’s perceptions of language domains may have implications for the interests they develop and the trajectories they pursue.

Explicit gender beliefs

Tables 4 and 5 show the average competence scores that children gave to boys and girls, split by participant characteristics. To assess gender differences in children’s explicit beliefs regarding math and language ability in boys and girls, we conducted a 2(Participant Gender) x 3(Age Group) x 2(Academic Subject) x 2(Target Gender) ANOVA, with Participant Gender and Age Group as between-subject factors, Academic Subject and Target Gender as within-subject factors, and explicit scores as the dependent variable. We found significant main effects of Academic Subject, F(1, 258) = 11.21, p = .001, η2 = .05, Target Gender, F(1, 258) = 57.64, p < .001, η2 = .18, and Age Group, F(2, 258) = 6.71, p = .001, η2 = .05. These main effects were qualified by significant two-way and three-way interactions. We only describe the three-way interactions here, as they qualify the two-way interactions (see the S1 File for full results). The Age Group x Participant Gender x Target Gender interaction was significant, F(2, 258) = 6.82, p = .001, η2 = .05. Adjusting for multiple comparisons using a Bonferroni adjustment, we found that with scores collapsed across the two academic domains, girls of all three age groups favored girls, whereas boys favored girls in middle school and high school, but not in elementary school (statistics appear in Table 6).

Table 4. Math explicit belief scores by participant gender, age group, and target gender.

Age Group | Participant Gender | Girl targets: Mean math competence | Girl targets: SD | Boy targets: Mean math competence | Boy targets: SD | n
Elementary School | Girls | 74.94 | 18.53 | 57.43 | 26.83 | 53
Elementary School | Boys | 72.32 | 16.88 | 73.10 | 18.17 | 47
Middle School | Girls | 63.05 | 17.59 | 61.66 | 16.93 | 25
Middle School | Boys | 72.48 | 17.70 | 65.21 | 14.32 | 42
High School | Girls | 67.23 | 17.90 | 63.65 | 16.65 | 62
High School | Boys | 61.87 | 9.80 | 59.95 | 10.42 | 36

SD = standard deviation, n = number of participants.

Table 5. Language explicit belief scores by age group, participant gender and target gender.

Age Group | Participant Gender | Girl targets: Mean language competence | Girl targets: SD | Boy targets: Mean language competence | Boy targets: SD | n
Elementary School | Girls | 77.30 | 20.06 | 58.96 | 27.11 | 53
Elementary School | Boys | 76.43 | 16.24 | 73.40 | 18.92 | 47
Middle School | Girls | 76.03 | 15.13 | 61.66 | 18.59 | 25
Middle School | Boys | 75.29 | 18.39 | 65.67 | 15.80 | 42
High School | Girls | 72.59 | 14.87 | 58.97 | 15.15 | 61
High School | Boys | 70.00 | 12.50 | 54.06 | 13.96 | 36

SD = standard deviation, n = number of participants.

Table 6. Pairwise comparisons of explicit scores by age group and participant gender.

Participant Gender | Age Group | Mean Diff (Girls − Boys) | SE | p | 95% CI | n
Girls | Elementary School | 17.92* | 2.49 | < .001 | [13.02, 22.82] | 53
Girls | Middle School | 7.88* | 3.62 | .030 | [0.75, 15.01] | 25
Girls | High School | 8.61* | 2.32 | < .001 | [4.04, 13.17] | 63
Boys | Elementary School | 1.13 | 2.64 | .670 | [-4.07, 6.33] | 48
Boys | Middle School | 8.45* | 2.79 | .003 | [2.94, 13.95] | 42
Boys | High School | 8.93* | 3.02 | .003 | [2.98, 14.87] | 36

Diff = difference, SE = standard error, p = probability value, CI = confidence intervals, n = number of participants. Adjustment for multiple comparisons: Bonferroni. Mean differences represent how much more competent participants rated girls to be in math and language compared to boys. Asterisks indicate that girls were rated as more capable than boys.

The Age Group x Academic Subject x Target Gender interaction was also significant, F(2, 258) = 3.51, p = .031, η2 = .03. Children favored girls in language across all three age groups. In contrast, in the case of math, only children in elementary school showed a bias, favoring girls over boys in math. Youth in middle school and high school did not show a gender bias in explicit reports of math ability. These results should be interpreted with caution, though, as the effect size is below the threshold calculated by our sensitivity analysis.

Though the four-way interaction was not significant, perhaps due to low power, we conducted pairwise comparisons to directly test our hypothesis that the youngest children would favor their own group in math. We report our results in the S1 File. Our hypothesis was partly supported; elementary-school girls reported girls as being better than boys at math. Elementary-school boys were neutral regarding gender differences, but they gave boys a significantly higher rating in math than elementary-school girls gave boys. Our hypothesis that children of all age groups would favor girls over boys in language ability was also partly supported; with the exception of elementary-school-aged boys (who were neutral), children favored girls over boys in language ability.

In summary, regardless of age group, girls explicitly endorsed an in-group preference in both math and language. Boys, on the other hand, explicitly favored girls over boys in math and language only later in development (i.e., middle school and high school, but not elementary school). These results are somewhat consistent with what we predicted. We hypothesized that younger children would show an in-group preference in math. Though that was not exactly the case for boys, younger boys were, on average, less prone than older boys to explicitly favor girls. Unexpectedly, older children of both genders rated girls’ ability in math and language as superior to that of boys. Their agreement on these gender differences could be due to their observations of academic performance within their classrooms and schools.

Correlations between explicit and implicit measures

Finally, we computed bivariate correlations between implicit bias and explicit stereotypes. Implicit math gender biases were not correlated with explicit math stereotypes, r(259) = -.03, 95% CI = [-.15, .09], p = .624. Neither were implicit language biases correlated with explicit language stereotypes, r(258) = .11, 95% CI = [-.01, .23], p = .076. Bivariate correlations of implicit and explicit stereotype scores for each domain, calculated separately for each participant gender and age group, revealed a similar pattern after adjusting the alpha level to .008 for multiple tests. Implicit biases and explicit stereotypes were not correlated for math or language across any age group for either boys or girls (all p’s > .008).
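The confidence intervals and the adjusted alpha level reported above can be approximated with a short calculation. The sketch below rests on two assumptions not stated in the text: that the intervals were obtained with the Fisher z transformation, and that the adjusted alpha reflects six tests per domain (two participant genders by three age groups). The function name is hypothetical.

```python
import math


def pearson_r_confidence_interval(r: float, n: int, z_crit: float = 1.959964):
    """Approximate 95% CI for a Pearson correlation via the Fisher z transformation."""
    z = math.atanh(r)
    standard_error = 1.0 / math.sqrt(n - 3)
    return (
        math.tanh(z - z_crit * standard_error),
        math.tanh(z + z_crit * standard_error),
    )


# r(259) = -.03 implies n = 261 (df = n - 2).
low, high = pearson_r_confidence_interval(-0.03, 261)
print(round(low, 2), round(high, 2))  # -0.15 0.09

# Bonferroni-adjusted alpha, assuming six tests (2 genders x 3 age groups).
bonferroni_alpha = 0.05 / 6
print(round(bonferroni_alpha, 3))  # 0.008
```

Under these assumptions, the recomputed interval matches the reported one for math, and the same function applied to r = .11 with n = 260 gives approximately [-.01, .23].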

Correlations between pairs of implicit scores and pairs of gender group competence scores are presented in Table 7 (correlations split by participant gender are reported in the S1 Table in S1 File, but in general, they follow the same pattern). For these correlations, for implicit scores we used the proportion of items in which [girls; boys] were associated with the “good in” prompt; for explicit group competence, we used the average of the two explicit items for each gender. All explicit gender group competence ratings were positively associated. For example, youth who rated boys as highly competent in math also tended to rate boys as highly competent in language, and youth who rated boys as competent in language tended to also rate girls as capable in language, r’s = .672 and .328, respectively. In contrast, implicit scores were positively correlated only within gender.

Table 7. Correlations among implicit bias scores (above the diagonal) and explicit gender group competence (below the diagonal).

Measure | Math-Boys | Math-Girls | Language-Boys | Language-Girls
Math-Boys | – | .021 | .281*** | .111
Math-Girls | .148* | – | .016 | .277***
Language-Boys | .672*** | .328*** | – | -.027
Language-Girls | .308*** | .499*** | .332*** | –

N = 263 for implicit bias scores and N = 267 for explicit stereotypes.

Discussion

Because of the salience of gender identity for most children and adolescents, youths’ perceptions of gender differences in academic skills are posited to shape perceptions of the self, classroom motivation and behaviors, and long-term career goals [1,2,60]. In the current study, we examined explicit beliefs and implicit biases regarding perceptions of gender differences in math and language abilities in a cross-sectional sample of youth from elementary school, middle school, and high school. In addition to testing possible age and gender differences, we also measured correlations between implicit and explicit gender stereotypes. We found that girls showed in-group preferences in math and language across both implicit and explicit measures. These findings are consistent with gender differences in academic performance at the national level, and they contradict traditional math stereotypes. Boys, on the other hand, showed no implicit biases within any of the three age groups. Explicitly, boys in middle school and high school favored girls in math and language. Boys in elementary school reported egalitarian beliefs on the explicit measure.

The results of this study, at first glance, appear to be at odds with persistent gender disparities in STEM careers. They also contradict previous implicit bias findings that have shown math implicit biases among girls. In the next sections, we discuss potential explanations for these results and their implications.

Children and adolescents’ implicit biases regarding gender differences in abilities

An important contribution of the current study is that we used a measure of implicit gender biases in which math and language abilities were not confounded. Most prior research in this area has used the Implicit Association Test, in which the two pairings (e.g., boys-math; girls-language) are not measured independently. Of note, authors of those studies have often interpreted their results as indicating an implicit bias favoring boys in math, and have used their measures to predict math-related outcomes. However, our results suggest that girls hold implicit biases favoring girls in both math and language. In supplemental analyses, we examined whether these associations might have been an artifact of the measure’s sensitivity, which may not have been granular enough to test domain-specific associations, but only broader associations (e.g., “girls = good” rather than “girls = good at math”). We examined implicit biases regarding sports ability and found a different pattern of results from academic biases. Girls were neutral whereas boys favored boys in sports ability. Although these biases may still reflect a general association between girls and good academic performance, they suggest the AMP was sensitive to domain category and challenge prior assumptions that girls implicitly favor boys over girls in math. At a minimum, our study suggests that gender-math associations do not override the positive associations that girls have about girls’ general academic performance compared to boys’. Another interpretation is that girls associate girls more than boys with good math performance, in line with their explicit reports.

It is possible that a strong girls-language association—an association that may be more likely to emerge on both implicit and explicit measures (rather than just implicit) due to its relative social acceptability—may be a particularly powerful predictor of both better language outcomes and, perhaps, worse math outcomes. To our knowledge, our study is only the third to use an implicit bias measure that disambiguates math and language biases, allowing us to better assess these nuanced research questions. Of the two earlier studies, one found a counter-stereotypical math-female bias among girls [36], and the other found no math bias among girls [46]. Measurement differences could account for these ostensibly conflicting findings in the literature, and future research should continue to disentangle gendered associations within different domains to clarify their relation to meaningful academic outcomes.

Using the AMP, we found that in contrast to explicit reports, which differed across age, gender, and academic domains, youths’ implicit biases differed primarily by gender of the respondent, with girls favoring girls in both domains and boys showing egalitarian responses. These responses to our implicit bias measure might reflect a combination of in-group preference as well as youths’ lived reality of gender differences in school performance. Beginning with school entry and continuing throughout primary and secondary education, girls receive better grades than boys, are rated by teachers and parents as more engaged in schoolwork, and have higher graduation rates [21,24,25]. In addition, some scholars have suggested there is a discord between traditional norms of masculinity and behaviors that promote academic success such as help-seeking and cooperation [61,62]. These factors have led scholars to posit that school is perceived as a feminine domain. Indeed, using an implicit measure, Heyder and Kessels [63] found that German ninth graders associated school with girls more strongly than with boys, and that boys’ tendencies to view school as feminine and to ascribe negative masculine traits to themselves were related to lower grades in German. The view of academic success as a feminine trait may have led girls in the present study to show implicit biases favoring girls in both domains, whereas for boys, those views may have been tempered by a tendency to show in-group preference, resulting in their egalitarian scores on the task.

Though, at first glance, our results might not appear consistent with gender disparities in STEM careers, they are revealing in that they support recent theoretical frameworks suggesting that girls opt out of math, not due to perceived deficit in math ability compared to boys, but due to perceived strength in language ability over math ability. For example, a large international study of 15-year-old students found that girls’ comparative advantage in reading as opposed to math can largely explain gender disparities in intentions to pursue math-related careers [64]. In that study, girls who were found to be good at math were more likely than boys to be even better at reading than at math. The gap between math and reading performance accounted completely for gender differences in math self-concept, interest in math, and attitudes towards math. Other studies have also found that intra-individual contrasts of math and language abilities predict STEM disparities. In a longitudinal study of twelfth grade students, those with high ability in both math and language (more girls than boys) were less likely to pursue STEM careers than those with high ability in math and moderate ability in language [6]. Although cultural stereotypes can still be detrimental to girls insofar as they elicit stereotype-threat effects [65] or signal lack of belonging [66], our results imply that girls hold positive associations about their gender group across both math and language ability, consistent with models that depict girls as having more choices in their pursuits, rather than being bound by real or perceived ability constraints. Research that distinguishes between math and language implicit beliefs, then, is important because it can lead to different conclusions about the type of interventions that might be effective for reducing STEM disparities.

Youths’ explicit reports of gender differences in language and math abilities

According to social identity theory [7], in-group preferences frequently emerge when individuals identify with a social category such as gender, and young children (as compared to older children and adolescents) are particularly likely to show in-group preferences [8,9]. We hypothesized that elementary school-aged youth would demonstrate such in-group preferences on explicit measures. On the other hand, because adolescents are more likely than younger children to be aware of conflicting information such as cultural stereotypes favoring boys in math, gender differences in school performance favoring girls, and campaigns in recent decades to promote girls’ math engagement, we did not have specific predictions about their explicit beliefs regarding math. In regard to language ability, we predicted that boys and girls would favor girls over boys because cultural stereotypes, gender differences in school performance, and the social acceptability of the language-female stereotype are all congruent. Youths’ explicit reports were generally consistent with those hypotheses: Elementary-school aged girls favored girls in both domains, whereas boys reported egalitarian beliefs regarding both language and math. Because of the consistently better average performance of girls in elementary school, these egalitarian reports of boys may represent a lack of calibration to actual achievement disparities and, thus, might be considered as a type of in-group preference. In contrast, for the two older age groups, both boys and girls reported explicit stereotypes favoring girls in math and language abilities.

These explicit reports of adolescents reflect to some extent youth’s performance on U.S. national standardized tests. On the National Assessment of Educational Progress (NAEP) exams, girls outperform boys in reading at every grade level, and gender differences in math disappeared in 1996 [67]. Adolescents’ beliefs about girls’ superiority in math ability might reflect both cultural impetus to encourage girls to take math courses and women to consider math-related careers as well as historical change in the number of women pursuing math, engineering, and computer science degrees. However, although the number of women in math-related fields has increased substantially in the last generation, some gender differences persist. In high school, boys are more likely than girls to take Advanced Placement exams in BC Calculus and Physics [68]. Although women are equally as likely as men to major in math in college, in 2017 approximately twice as many doctoral degrees were awarded to men as to women in the physical and earth sciences, mathematics, and computer sciences, and almost three times as many in engineering [69]. Our results are consistent with the view that although we have not yet reached gender parity in math-intensive fields, the historical landscape of gender differences in math may be changing.

Relations between implicit and explicit stereotypes

Important contributions of this study were the age comparisons and also the examination of relations between implicit biases and explicit beliefs. Though we had expected to find weak positive relations, especially for elementary-school youth, we did not find any significant correlations between implicit biases and explicit beliefs across any age or gender groups.

These results differ from the meta-analytic findings regarding implicit-explicit correlations in the adult literature [43]. Hofmann and colleagues [43] found that, although general implicit-explicit representations are associated about 0.24 on average, the correlation is lower for stereotypes. They also found that, in adults, a critical moderator of implicit-explicit correlations is the spontaneity of explicit reports. When people relied more on “gut” reactions to report their beliefs, there was a greater congruence between implicit and explicit scores. The lack of significant correlations between math and language implicit gender biases and explicit gender stereotypes in our sample suggests that children as young as 8 years old are not reporting their “gut reactions” or automatic associations to explicit questions about gender differences. Rather, they are controlling their explicit responses. It is also possible that the lack of correspondence in our study might be linked to our choice of measures, and that a different methodological approach might yield significant relations between children’s implicit associations and their explicit reports.

Limitations and recommendations for future research

A significant limitation of the current study was that we had insufficient sample size to explore our research questions within racial/ethnic subgroups. Prior research has shown that gender stereotypes about academic abilities and students’ responses to stereotype threat vary according to racial/ethnic identity [18,70,71]. However, we were unable to test whether explicit and implicit beliefs differed across groups because the racial/ethnic subgroups within our sample were too small. Because of the cultural specificity of many gender stereotypes, including academic stereotypes [72], research using either a sample that is homogeneous or samples in which racial, ethnic, or national groups are compared might further advance our understanding of children and adolescents’ academic stereotypes.

A second limitation of the study is that we did not evaluate the personal relevance of the math and verbal domains, and therefore an important caveat to our conclusions is that it is unclear to what extent responses indicate a personal connection to the academic domain versus perceptions of gender differences in ability. For example, although several studies have found that young children tend to show own-gender preferences in competence reports, Cvencek et al. [45] found that as early as second grade, boys showed stronger “me-math” associations than girls. Our measures involved judgments of the gender group in general rather than the individual child’s connection to that domain. Girls’ implicit bias favoring girls in math in the current study, despite national gender differences in high school math course selection, suggests that the implicit scores reflect perceptions of gender-group competence more than individual identification with the domain. Future research should examine the degree to which gender group competence associations differ from gender differences in individual identification, and age differences in those effects.

A third limitation of the study is that we assessed students’ implicit associations and explicit beliefs regarding gender differences in the abilities of youth targets, but not adults. Some studies have shown that children apply cultural academic stereotypes to adults more readily than they do to children [e.g., 33]. Although the explicit reports of adolescents in our sample favored girls in math, youth may have favored men over women had we used adult targets. Such results would reflect gender differences in career choices that favor men in STEM domains and would also be consistent with stereotype threat effects that show performance decrements for women when gender identity is made salient in test situations.

Finally, we note that we did not ask participants whether they were familiar with the Chinese symbols used as neutral stimuli. Although familiarity was unlikely in our sample due to the demographics and location, the effects reported here may be underestimates due to weakened priming effects.

Despite these limitations, this study advances our understanding of youth’s implicit biases and explicit beliefs regarding gender differences in academic abilities. We found no correlations between implicit biases and explicit reports, suggesting that youth as young as 8 years old are controlling their responses on explicit measures instead of reporting their automatic associations. Analyses of youth’s explicit reports suggested that youth are using information about academic performance and gender stereotype knowledge to adjust their responses, especially as they age. A major goal of our study was to analyze youth’s implicit reports across several age groups. Youth’s implicit biases were consistent with national gender differences in academic performance, especially for girls. These findings suggest that girls across a wide age range automatically associate good math performance with girls, rather than boys. A key takeaway of both our explicit and implicit findings is that girls strongly favor their in-group in both math and language. These beliefs and positive associations regarding girls’ language abilities may, in part, contribute to STEM disparities by giving girls more choices than boys, who perceive their gender group as less-qualified in non-STEM domains.

Supporting information

S1 File

(PDF)

Data Availability

All relevant data are uploaded to the Open Science Framework database and publicly accessible via the following URL: https://osf.io/fv5h8/.

Funding Statement

This research was supported by a grant from the National Institute of Child Health and Human Development (https://www.nichd.nih.gov/), #1R03HD072025-01A1, awarded to BKC and KP. The first author was supported by a National Science Foundation Graduate Fellowship and the Paul and Daisy Soros Fellowships for New Americans. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Bian L, Leslie SJ, Cimpian A. Gender stereotypes about intellectual ability emerge early and influence children’s interests. Science. 2017;355: 389–391.
  • 2. Ceci SJ, Williams WM, Barnett SM. Women’s underrepresentation in science: sociocultural and biological considerations. Psychol Bull. 2009;135: 218–261. doi:10.1037/a0014412
  • 3. Skinner OD, Perkins K, Wood DA, Kurtz-Costes B. Gender development in African American youth. J Black Psychol. 2016;42: 394–423. doi:10.1177/0095798415585217
  • 4. Nosek BA, Smyth FL, Sriram N, Lindner NM, Devos T, Ayala A, et al. National differences in gender-science stereotypes predict national sex differences in science and math achievement. PNAS. 2009;106: 10593–10597. doi:10.1073/pnas.0809921106
  • 5. Cheryan S, Master A, Meltzoff AN. Cultural stereotypes as gatekeepers: Increasing girls’ interest in computer science and engineering by diversifying stereotypes. Front Psychol. 2015;6: 49. doi:10.3389/fpsyg.2015.00049
  • 6. Wang M-T, Eccles JS, Kenny S. Not lack of ability but more choice: Individual and gender differences in choice of careers in science, technology, engineering, and mathematics. Psychol Sci. 2013;24: 770–775. doi:10.1177/0956797612458937
  • 7. Tajfel H, Turner JC. The social identity theory of intergroup behavior. In: Jost JT, Sidanius J, editors. Key readings in social psychology. Political psychology: Key readings. 1986. pp. 276–293. doi:10.4324/9780203505984-16
  • 8. Halim MLD, Ruble DN, Tamis-LeMonda CS, Shrout PE, Amodio DM. Gender attitudes in early childhood: Behavioral consequences and cognitive antecedents. Child Dev. 2017;88: 882–899. doi:10.1111/cdev.12642
  • 9. Kurtz-Costes B, DeFreitas SC, Halle T, Kinlaw CR. Gender and racial favoritism in Black and White preschool girls. Br J Dev Psychol. 2011;29: 270–287. doi:10.1111/j.2044-835X.2010.02018.x
  • 10. Abrams DB, Rutland A, Cameron L. The development of subjective group dynamics: Children’s judgments of normative and deviant in-group and out-group individuals. Child Dev. 2003;74: 1840–1856. doi:10.1046/j.1467-8624.2003.00641.x
  • 11. Fehr E, Bernhard H, Rockenbach B. Egalitarianism in young children. Nature. 2008;454: 1079–1083. doi:10.1038/nature07155
  • 12. Horn TS, Weiss MR. A developmental analysis of children’s self-ability judgements in the physical domain. Pediatr Exerc Sci. 1991;3: 310–326.
  • 13. Shaw A, Montinari N, Piovesan M, Olson KR, Gino F, Norton MI. Children develop a veil of fairness. J Exp Psychol Gen. 2014;143: 363–375. doi:10.1037/a0031247
  • 14. Blanton H, Christie C, Dye M. Social identity versus reference frame comparisons: The moderating role of stereotype endorsement. J Exp Soc Psychol. 2002;38: 252–267. doi:10.1006/jesp.2001.1510
  • 15. Ruble D, Martin CL, Berenbaum SA. Gender development. In: Eisenberg N, Damon W, Lerner RM, editors. Handbook of child psychology: Social, emotional, and personality development. 6th ed. New Jersey: Wiley; 2006. pp. 864–911.
  • 16. McKown C, Weinstein RS. The development and consequences of stereotype consciousness in middle childhood. Child Dev. 2003;74: 498–515. doi:10.1111/1467-8624.7402012
  • 17. Nasir NS, Mckinney De Royston M, O’connor K, Wischnia S. Knowing about racial stereotypes versus believing in them. Urban Educ. 2017;52: 491–524. doi:10.1177/0042085916672290
  • 18. Copping KE, Kurtz-Costes B, Rowley SJ, Wood D. Age and race differences in racial stereotype awareness and endorsement. J Appl Soc Psychol. 2013;43: 971–980. doi:10.1111/jasp.12061
  • 19. Rowley SJ, Kurtz-Costes B, Mistry R, Feagans L. Social status as a predictor of race and gender stereotypes in late childhood and early adolescence. Soc Dev. 2007;16: 150–168. doi:10.1111/j.1467-9507.2007.00376.x
  • 20. Else-Quest NM, Hyde JS, Linn MC. Cross-national patterns of gender differences in mathematics: A meta-analysis. Psychol Bull. 2010;136: 103–127. doi:10.1037/a0018053
  • 21. Voyer D, Voyer SD. Gender Differences in Scholastic Achievement: A Meta-Analysis. Psychol Bull. 2014;140. doi:10.1037/a0036620
  • 22. Riegle-Crumb C. More girls go to college: Exploring the social and academic factors behind the female postsecondary advantage among Hispanic and White students. Res High Educ. 2010;51: 573–593. doi:10.1007/s11162-010-9169-0
  • 23. Hudley C, Graham S. Stereotypes of achievement striving among early adolescents. Soc Psychol Educ. 2001;5: 201–224.
  • 24. Buchmann C, DiPrete TA, McDaniel A. Gender inequalities in education. Annu Rev Sociol. 2008;34: 319–337. doi:10.1146/annurev.soc.34.040507.134719
  • 25. Cooper KS. Eliciting engagement in the high school classroom: A mixed methods examination of teaching practices. Am Educ Res J. 2014;51: 363–402. doi:10.3102/0002831213507973
  • 26. Hartley BL, Sutton RM. A stereotype threat account of boys’ academic underachievement. Child Dev. 2013;84: 1716–1733. doi:10.1111/cdev.12079
  • 27. Cvencek D, Meltzoff AN, Kapur M. Cognitive consistency and math–gender stereotypes in Singaporean children. J Exp Child Psychol. 2014;117: 73–91. doi:10.1016/j.jecp.2013.07.018
  • 28. Galdi S, Cadinu M, Tomasetto C. The roots of stereotype threat: When automatic associations disrupt girls’ math performance. Child Dev. 2014;85: 250–263. doi:10.1111/cdev.12128
  • 29. Tomasetto C, Galdi S, Cadinu M. Quando l’implicito precede l’esplicito: gli stereotipi di genere sulla matematica in bambine e bambini di 6 anni. Psicol Soc. 2012;7: 169–185. doi:10.1482/37693
  • 30. Passolunghi MC, Rueda Ferreira TI, Tomasetto C. Math-gender stereotypes and math-related beliefs in childhood and early adolescence. Learn Individ Differ. 2014;34: 70–76. doi:10.1016/j.lindif.2014.05.005
  • 31. Steffens MC, Jelenec P, Noack P. On the leaky math pipeline: Comparing implicit math-gender stereotypes and math withdrawal in female and male children and adolescents. J Educ Psychol. 2010;102: 947–963. doi:10.1037/a0019920
  • 32. Kurtz-Costes B, Copping KE, Rowley SJ, Kinlaw CR. Gender and age differences in awareness and endorsement of gender stereotypes about academic abilities. Eur J Psychol Educ. 2014;29: 603–618. doi:10.1007/s10212-014-0216-7
  • 33. Martinot D, Désert M. Awareness of a gender stereotype, personal beliefs and self-perceptions regarding math ability: When boys do not surpass girls. Soc Psychol Educ. 2007;10: 455–471. doi:10.1007/s11218-007-9028-9
  • 34. Chatard A, Guimond S, Selimbegovic L. “How good are you in math?” The effect of gender stereotypes on students’ recollection of their school marks. J Exp Soc Psychol. 2007;43: 1017–1024. doi:10.1016/j.jesp.2006.10.024
  • 35. Morrissey K, Hallett D, Bakhtiar A, Fitzpatrick C. Implicit math-gender stereotype present in adults but not in 8th grade. J Adolesc. 2019;74: 173–182. doi:10.1016/j.adolescence.2019.06.003
  • 36. Nowicki EA, Lopata J. Children’s implicit and explicit gender stereotypes about mathematics and reading ability. Soc Psychol Educ. 2017;20: 329–345. doi:10.1007/s11218-015-9313-y
  • 37. Plante I, Theoret M, Favreau OE. Student gender stereotypes: Contrasting the perceived maleness and femaleness of mathematics and language. Educ Psychol. 2009;29: 385–405. doi:10.1080/01443410902971500
  • 38. Galdi S, Mirisola A, Tomasetto C. On the relations between parents’ and children’s implicit and explicit academic gender stereotypes. Psicol Soc. 2017;12: 215–238. doi:10.1482/87248
  • 39. Rutland A, Cameron L, Milne A, McGeorge P. Social norms and self-presentation: Children’s implicit and explicit intergroup attitudes. Child Dev. 2005;76: 451–466. doi:10.1111/j.1467-8624.2005.00856.x
  • 40. Fazio RH, Olson MA. Implicit measures in social cognition research: Their meaning and use. Annu Rev Psychol. 2003;54: 297–327. doi:10.1146/annurev.psych.54.101601.145225
  • 41. Gawronski B, Bodenhausen GV. Associative and propositional processes in evaluation: An integrative review of implicit and explicit attitude change. Psychol Bull. 2006;132: 692–731. doi:10.1037/0033-2909.132.5.692
  • 42. Petty RE, Fazio RH, Brinol P. The new implicit measures: An overview. In: Petty RE, Fazio RH, Brinol P, editors. Attitudes: Insights from the New Implicit Measures. New York: Psychology Press; 2008. pp. 3–19.
  • 43. Hofmann W, Gawronski B, Gschwendner T, Le H, Schmitt M. A meta-analysis on the correlation between the Implicit Association Test and explicit self-report measures. Personal Soc Psychol Bull. 2005;31: 1369–1385. doi:10.1177/0146167205275613
  • 44. del Río MF, Strasser K, Cvencek D, Susperreguy MI, Meltzoff AN. Chilean kindergarten children’s beliefs about mathematics: Family matters. Dev Psychol. 2019;55: 687–702. doi:10.1037/dev0000658
  • 45. Cvencek D, Meltzoff AN, Greenwald AG. Math-gender stereotypes in elementary school children. Child Dev. 2011;82: 766–779. doi:10.1111/j.1467-8624.2010.01529.x
  • 46. Steffens MC, Jelenec P. Separating implicit gender stereotypes regarding math and language: Implicit ability stereotypes are self-serving for boys and men, but not for girls and women. Sex Roles. 2011;64: 324–335. doi:10.1007/s11199-010-9924-x
  • 47. Payne BK, Cheng CM, Govorun O, Stewart BD. An inkblot for attitudes: Affect misattribution as implicit measurement. J Pers Soc Psychol. 2005;89: 277–293. doi:10.1037/0022-3514.89.3.277
  • 48. Payne BK, Lundberg K. The affect misattribution procedure: Ten years of evidence on reliability, validity, and mechanisms. Soc Personal Psychol Compass. 2014;8: 672–686. doi:10.1111/spc3.12148
  • 49. Sava FA, Maricutoiu LP, Rusu S, Macsinga I, Virga D, Cheng CM, et al. An inkblot for the implicit assessment of personality: The semantic misattribution procedure. Eur J Pers. 2012;26: 613–628. doi:10.1002/per.1861
  • 50. Blaison C, Imhoff R, Hühnel I, Hess U, Banse R. The Affect Misattribution Procedure: Hot or not? Emotion. 2012;12: 403–412. doi:10.1037/a0026907
  • 51. Payne BK, Brown-Iannuzzi J, Burkley M, Arbuckle NL, Cooley E, Cameron CD, et al. Intention invention and the Affect Misattribution Procedure: Reply to Bar-Anan and Nosek (2012). Personal Soc Psychol Bull. 2013;39: 375–386. doi:10.1177/0146167212475225
  • 52. Geldhof GJ, Fenn ML, Finders JK. A self-determination perspective on self-regulation across the life span. In: Wehmeyer ML, Shogren K, Little T, Lopez S, editors. Development of self-determination through the life-course. Dordrecht: Springer; 2017. pp. 221–235.
  • 53. Williams A, Steele JR, Lipman C. Assessing children’s implicit attitudes using the Affect Misattribution Procedure. J Cogn Dev. 2016;17: 505–525. doi:10.1080/15248372.2015.1061527
  • 54. Payne BK, Vuletich HA, Lundberg KB. The Bias of Crowds: How Implicit Bias Bridges Personal and Systemic Prejudice. Psychol Inq. 2017;28. doi:10.1080/1047840X.2017.1335568
  • 55. Cameron CD, Brown-Iannuzzi JL, Payne BK. Sequential priming measures of implicit social cognition: A meta-analysis of associations with behavior and explicit attitudes. Personal Soc Psychol Rev. 2012;16: 330–350. doi:10.1177/1088868312440047
  • 56. Faul F, Erdfelder E, Buchner A, Lang AG. Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. Behav Res Methods. 2009;41: 1149–1160. doi:10.3758/BRM.41.4.1149
  • 57. Operario D, Adler NE, Williams DR. Subjective social status: Reliability and predictive utility for global health. Psychol Heal. 2004;19: 237–246. doi:10.1080/08870440310001638098
  • 58. Perszyk DR, Lei RF, Bodenhausen GV, Richeson JA, Waxman SR. Bias at the intersection of race and gender: Evidence from preschool-aged children. Dev Sci. 2019;22: 1–8. doi:10.1111/desc.12788
  • 59. Williams A, Steele JR. Examining children’s implicit racial attitudes using exemplar and category-based measures. Child Dev. 2019;90: e322–e338. doi:10.1111/cdev.12991
  • 60. Evans AB, Copping KE, Rowley SJ, Kurtz-Costes B. Academic self-concept in Black adolescents: Do race and gender stereotypes matter? Self Identity. 2011;10: 263–277. doi:10.1080/15298868.2010.485358
  • 61. Collie RJ, Martin AJ, Nassar N, Roberts CL. Social and emotional behavioral profiles in kindergarten: A population-based latent profile analysis of links to socio-educational characteristics and later achievement. J Educ Psychol. 2019;111: 170–187. doi:10.1037/edu0000262
  • 62. Kessels U, Steinmayr R. Macho-man in school: Toward the role of gender role self-concepts and help seeking in school performance. Learn Individ Differ. 2013;23: 234–240. doi:10.1016/j.lindif.2012.09.013
  • 63. Heyder A, Kessels U. Is school feminine? Implicit gender stereotyping of school as a predictor of academic achievement. Sex Roles. 2013;69: 605–617. doi:10.1007/s11199-013-0309-9
  • 64. Breda T, Napp C. Girls’ comparative advantage in reading can largely explain the gender gap in math-related fields. PNAS. 2019;116: 15435–15440. doi:10.1073/pnas.1905779116
  • 65. Flore PC, Wicherts JM. Does stereotype threat influence performance of girls in stereotyped domains? A meta-analysis. J Sch Psychol. 2015;53: 25–44. doi:10.1016/j.jsp.2014.10.002
  • 66. Master A, Cheryan S, Meltzoff AN. Computing whether she belongs: Stereotypes undermine girls’ interest and sense of belonging in computer science. J Educ Psychol. 2016;108: 424–437.
  • 67. Coley RJ. Differences in the gender gap: Comparisons across racial/ethnic groups in education and work. In: Educational Testing Service [Internet]. 2001. Available: https://www.ets.org/Media/Research/pdf/PICGENDER.pdf
  • 68. College Board. AP data: AP report to the nation 2014. 2019 [cited 29 Jan 2019]. Available: https://research.collegeboard.org/programs/ap/data/nation/2014
  • 69. National Science Foundation. Science and engineering doctorates. Table 15: Doctorate recipients by sex and major field of study: 2008–17. 2017 [cited 29 Jan 2019]. Available: https://ncses.nsf.gov/pubs/nsf19301/data
  • 70. Armenta BE. Stereotype boost and stereotype threat effects: The moderating role of ethnic identification. Cult Divers Ethn Minor Psychol. 2010;16: 94–98.
  • 71. Cheryan S, Bodenhausen GV. When positive stereotypes threaten intellectual performance: The psychological hazards of “model minority” status. Psychol Sci. 2000;11: 399–402.
  • 72. Miller DI, Eagly AH, Linn MC. Women’s representation in science predicts national gender-science stereotypes: Evidence from 66 nations. J Educ Psychol. 2014;107: 631–644. doi:10.1037/edu0000005

Decision Letter 0

Jennifer Steele

25 Nov 2019

PONE-D-19-26304

Math and language gender stereotypes: Age and gender differences in implicit biases and explicit beliefs

PLOS ONE

Dear Mrs. Vuletich,

Thank you for submitting your manuscript to PLOS ONE. I had the benefit of receiving feedback from two experts in the field.  I have also had the opportunity to thoroughly consider your paper myself.  As you will see, both reviewers saw a great deal of merit in this work, and I certainly agree with this assessment.  Developing an implicit measure of math-gender stereotyping that can disentangle the potential influences of a competing category (reading/language, etc) is important.  However, the reviewers also raised a number of concerns. I had some related and additional questions while reading this paper and am not yet certain whether they can be adequately addressed through a revision.  As such, after careful consideration, I would like to invite you to submit a revised version of the manuscript that addresses the points raised during the review process so that I can better assess this paper's suitability for publication in PLOS ONE. 

Should you decide to embark on this revision, I will most likely send this paper back out for a second round of reviews and cannot guarantee that it will be accepted following these revisions.  However, in a field where evidence of bias and stereotyping can be more likely to be accepted for publication than evidence that there is no stereotyping, I believe that it is important for the field to be open to publishing these findings.  This is also a high-powered study examining important questions with children and I believe this has the potential to make an important contribution to this literature and to inspire new research.     

I will not reiterate the reviewers' points, but instead will note some of my own: 

  1. My main question had to do with this measure (see also Review 1, point 4).  I found that this stereotyping measure was quite cleverly designed.  It did not initially map onto my vision for this type of AMP. I therefore found myself wondering whether it has previously been validated with adults?  If so, this should be made clear in the introduction.  If not, I wonder what evidence there is that children could complete this measure successfully and that they believed the cover story.  At the risk of seeming self-serving, you might consider referencing previous research that has used the AMP with children to provide some initial evidence (see Perszyk et al., 2019; Williams et al. 2016; Williams & Steele, 2019).  However, these papers either validate the AMP or make use of a child-friendly AMP to examine racial bias.  This stereotyping measure is different in many ways.  For one, I would guess that some children might question whether a language can really have roughly 20 words for good at math, 20 words for bad at math, 20 for good at language arts and 20 for bad at language arts when our own language has no single word to describe these?  Were there any questions to assess the believability of this measure and/or did your instructions make it clear what language the symbols were from (perhaps multiple languages?)?  I agree with Reviewer 2 that the additional blocks might have influenced the effects and/or might provide additional insights. 

  2. In addition, in my own work using the AMP with children, we found that even with extensive instructions and reminders, a portion of our child participants needed to be removed either because of patterned responding (e.g., the same key or alternating keys on each trial), or because when they were questioned at the end of the measure they reported judging the primes and not the neutral stimuli (despite repeated reminders and extensive instructions/explanations that we wanted them to rate the neutral stimuli).  I noticed that you had excluded very few participants – were there any checks of this sort?  (see also Reviewer 1, minor point 1 regarding exclusions). One concern, of course, is that this is instead some type of explicit measure, at least for some portion of your child participants.  Another concern, particularly given the use of "good" or "bad", is that this is a measure of gender bias (see Baron et al. for related findings of gender bias in children using the IAT).

  3. I also found myself having questions about the analyses.  First, why is race treated as a covariate?  What happens if it is not included as a covariate?  Second, I wondered why both math and arts were not included in the same model.  This, of course, would make for a more complex design (a 2x3x2x2 design), but it seems more appropriate.  In fact, part of me wondered whether separating one of the other between-subjects variables would make more sense than separating out this within-subjects measure in order to simplify the design.  For example, looking at each age group separately or each gender group separately.  Ideally, of course, this would have all been decided a priori, but looking at your data it would seem that one might draw different conclusions depending on how these data are analyzed.

  4. One additional concern, which is noted by both reviewers (point 1 for each), is that I am not quite sure what to make of the results.  If the measures are sound, then these results contribute to what are already mixed findings in childhood regarding the development of math-gender stereotypes.  I would like to see their comments addressed and specific points in the discussion toned down.  For example, I do not agree that, based on your data, the results “are consistent with the idea that implicit linkages between girls and language domains were driving those earlier findings.”

  5. Related to my previous point, I felt that the introduction was not as comprehensive as it might have been.  There are papers that provide evidence that the math-gender IAT predicts math-related outcomes.  These are not consistent with the notion that these biases are driven by a language-girls association.  This should be acknowledged.  I also found the comment in the discussion that “very few studies have examined both implicit biases and explicit beliefs within the same sample” to be inaccurate, as most do.  What they might not do is report the correlation between the measures.

I hope that you will find these and the reviewers' comments to be helpful as you look to revise your manuscript.  Should you decide to resubmit, we would appreciate receiving your revised manuscript by Jan 09 2020 11:59PM. As this is right after the holidays, if you feel that you require more time, please feel free to request it. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as a separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as a separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as a separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Jennifer Steele

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please provide the full name of the Institutional Review Board that approved your study.

3. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This study examined the development of implicit and explicit math-gender stereotypes in 8- to 15-year-olds. Results indicate that, implicitly, girl participants favour girls (girls better at math and language vs boys), whereas there is no evidence of a preference among boy participants. Explicitly, elementary girls rated girls more highly and elementary boys rated boys more highly in math ability; ratings decreased with age. For explicit language ability, elementary girls (but not boys), middle school boys and girls, and high school boys and girls rated girl targets as having higher ability than boy targets.

Overall, this study adds to the emerging body of literature focused on the expression of math-gender stereotypes across childhood. On the whole, this literature presents a conflicting pattern of results and I don’t believe the current study adds much clarity here, particularly with the explanation of the explicit results. However, this study does make a novel contribution in methodology, as the AMP (not the IAT) was used to assess implicit stereotypes. This allows for math and language stereotypes to be examined separately, although gender is still confounded (i.e., stereotypes for girl targets vs boy targets are compared in the two domains). I have raised points below that might be useful to address.

1. In the introduction, the argument that social identity theory and ingroup gender preferences drive children’s responses fits the published literature on girl participants (i.e., girls rated as better at academics than boys) but not boys (who also seem to rate girls as better at academics than boys). Is this the best theory to use to explain the whole pattern of results? The authors do a more comprehensive job in the discussion of explaining why the pattern of results might reflect ingroup preference for boys as well. Perhaps this should be worked into the intro as well?

2. A summary of the overall pattern of results for literature on explicit and implicit stereotypes would be helpful. Overall, I get the sense that the results are not consistent. Do the results of the current study aim to clarify the field in any way?

3. I think the paper would benefit from greater clarity regarding for whom the stereotype applies. Research (e.g., Steele, 2003) suggests that children are more likely to apply math-gender stereotypes to adult (but not child) targets. It seems that the literature reviewed is focused on child targets (perhaps explaining the discrepancies in results from different studies). If cultural stereotypes are more readily applied to adults, this has implications for the hypotheses specified page 10 (line 212) as child targets were included in the AMP, and for conclusions regarding the explicit measure.

4. Has the AMP been used to measure stereotypes previously? My concern (especially with the younger children) is that the valence of the response categories trumped the stereotype component. Can this be disentangled?

5. The implicit and explicit results have a similar pattern in that girls are deemed better in language than boys. Why is this said to reflect “cultural knowledge” for the explicit results but “academic success” for the implicit results? Is it possible that implicit and explicit measures reflect the same underlying constructs, but differences in measurement variability prevent strong correlations from emerging?

On a related point, Page 9, line 186. “Children, however, may not yet have learned the cultural stereotypes and so may vary in awareness”. Or it could be that children have learned the stereotype, but have not yet internalized it to the point it can be automatically activated by attitude object. What are the implications of these possibilities for the hypotheses and results?

Minor Points

1. 2.2% of sample of Asian heritage. Did these participants have any familiarity with Chinese symbols? If so, should they be removed?

2. What were the correlations for the four prime-gender scores? (page 20)

3. Tone down language around conclusions. For example, p 21 line 328; p 23, line 345. Other studies (using the IAT) have demonstrated age-related differences in implicit biases.

Reviewer #2: This article explored the development of gender differences in stereotypes about math and language. Children were administered an implicit and explicit gender stereotype measure across age groups. The manuscript aims to tackle an important issue - distinguishing (in measurement) stereotypes about math from stereotypes about language which, with the exception of a few studies, has not been investigated much.

The study reports that girls have an implicit own group gender bias (thinking own gender is better at math and language) whereas boys do not have an implicit gender bias for either domain.

There are several areas that I think could benefit from revision.

1. It would help if the authors could make greater sense of the implicit data from boys. That is, these findings seem to contradict past published work where boys show an implicit gender stereotype. Is there a way to compare the strength of the egalitarian associations with math and language to see if the effect is stronger in one direction? There are now a number of papers by Cvencek, as well as those who have done stereotype threat work (Tomasetto, Steele etc) arguing that in some way shape or form boys have a gender stereotype in this domain. Is there something unique about how the IAT measures bias that might make it a more suitable measure in this case? Of course, the data are what they are but I think much more attention should be given to this contradictory finding both in terms of possible methodological explanations as well as conceptual. Related, can the authors report more info on average latencies with the AMP? It might help to understand how implicit these responses likely were.

2. I didn't follow the arguments the authors made about how the data on the implicit/explicit measures directly speak to the sources of these stereotypes. That is, to say that if implicit bias is more influenced by cultural messages about stereotypes then they should increase with age doesn't make clear sense to me. And, by contrast, classroom cue sensitivity would lead to no age differences (lines 210-214). First, it's odd that there wasn't a direct measure of sensitivity to cultural stereotypes or some quantification of classroom cues. It was assumed that patterns of bias uniquely are constrained by these cues when there are a multitude of factors that also uniquely shape bias (e.g., surely they interact). Further, it is not clear that being influenced by cultural messages about stereotypes should mean that the bias increases with age (there isn't strong evidence that implicit bias reflects a cumulative learning model whereby the magnitude of the bias increases with frequency of exposure); there could be sensitive periods for learning biases, etc. Similarly, what's the evidence the present classroom made available diagnostic cues to performance/ability?

3. The authors noted that other school domains were studied but not the focus of the present manuscript. Rarely do I think these additional study data are informative but for the present manuscript I especially think it's informative because it speaks to broader arguments the paper seems to be very focused on - own group bias, internalization/awareness of cultural stereotypes and classroom cues. Do any of the other data not reported shed light on children's more general sensitivities here?

4. I think it would be helpful if the authors could include more discussion on the growing stereotype threat literature in this domain as it seems to be particularly informative for our thinking and predictions about the development of gender differences in these academic stereotypes. And, in some cases, may even present contradictory findings that require some explanation.

5. The authors set up two primary views about measuring bias - importance and limitations for studying both explicit and then implicit bias. This made sense. I got lost a bit when indirect measures were then discussed because, conceptually, I didn't understand where the authors saw indirect measures fitting in the literature - is it a level of analysis like implicit/explicit or is it kinda orthogonal to the implicit/explicit distinction and more of a way to measure things explicitly while reducing some of the potential demand characteristics that can plague explicit measures?

6. Lastly, can the authors highlight/note the analyses they were likely underpowered for given the power analysis they did, as the effect sizes reported in a number of cases seemed to be below the threshold they set for the study?

I am excited and inspired by this work as it is important theoretically and methodologically to be examining these issues. Thanks!

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Sep 8;15(9):e0238230. doi: 10.1371/journal.pone.0238230.r002

Author response to Decision Letter 0


31 Mar 2020

Editor’s Comment #1: My main question had to do with this measure (see also Review 1, point 4). I found that this stereotyping measure was quite cleverly designed. It did not initially map onto my vision for this type of AMP. I therefore found myself wondering whether it has previously been validated with adults? If so, this should be made clear in the introduction. If not, I wonder what evidence there is that children could complete this measure successfully and that they believed the cover story. At the risk of seeming self-serving, you might consider referencing previous research that has used the AMP with children to provide some initial evidence (see Perszyk et al., 2019; Williams et al. 2016; Williams & Steele, 2019). However, these papers either validate the AMP or make use of a child-friendly AMP to examine racial bias. This stereotyping measure is different in many ways. For one, I would guess that some children might question whether a language can really have roughly 20 words for good at math, 20 words for bad at math, 20 for good at language arts and 20 for bad at language arts when our own language has no single word to describe these? Were there any questions to assess the believability of this measure and/or did your instructions make it clear what language the symbols were from (perhaps multiple languages?)? I agree with Reviewer 2 that the additional blocks might have influenced the effects and/or might provide additional insights.

Response: This version of the AMP, focusing specifically on academic stereotypes, has not been validated in adults or children, though we did conduct a pilot study to determine the most appropriate presentation time for each stimulus. We have clarified the point about validity in the introduction and also added the suggested references discussing previous use of the AMP with children (bottom of pg. 13). Thank you for this suggestion.

We did not assess the believability of the measure, in part because research with adults has shown that people have difficulty introspecting on how they judged the stimuli and tend to confabulate reasons (Payne et al., 2013). Moreover, the instructions were ambiguous as to whether the symbols actually meant “good at math” or “bad at math” (same for language arts). Our instructions stated that we were interested in how people make guesses about the meaning of words, and that they should guess that about half the symbols meant “good at math” and about half meant “bad at math.” We said that they should use their “feelings” to make a guess. We are now more explicit about the wording of these instructions in the Method section (pg. 19).

Regardless of children’s interpretation or suspicion of the instructions, the systematic differences that we found between boys’ and girls’ ratings of different gendered primes were inconsistent with their controlled responses as measured by the explicit measure. Yet, these systematic differences suggest that children did not respond completely at random either. One possibility, as raised by the editor and one of the reviewers, is that children ignored the academic subject and simply responded with their gender biases of girls versus boys as “good” or “bad.” To test this possibility, we ran additional analyses examining children’s sports implicit biases. As with math and language implicit biases, we found a significant gender x prime gender interaction. However, the pattern of results was different compared to the academic biases. For sports, boys showed an implicit bias favoring boys in sports, whereas girls did not show a gendered bias. As a reminder, for math and language, girls had shown an implicit bias favoring girls and boys had not shown a bias favoring either gender. These results suggest that the AMP was measuring domain-specific implicit biases rather than general gender biases. These results are now included in the supplemental information (and briefly mentioned in the main manuscript, bottom of pg. 24).

Editor’s Comment #2: In addition, in my own work using the AMP with children, we found that even with extensive instructions and reminders, a portion of our child participants needed to be removed either because of patterned responding (e.g., the same key or alternating keys on each trial), or because when they were questioned at the end of the measure they reported judging the primes and not the neutral stimuli (despite repeated reminders and extensive instructions/explanations that we wanted them to rate the neutral stimuli). I noticed that you had excluded very few participants – were there any checks of this sort? (see also Reviewer 1, minor point 1 regarding exclusions). One concern, of course, is that this is instead some type of explicit measure, at least for some portion of your child participants. Another concern, particularly given the use of "good" or "bad", is that this is a measure of gender bias (see Baron et al. for related findings of gender bias in children using the IAT).

Response: We have now included a statement clarifying that none of our participants met our exclusion criterion, which was pressing the same key on all trials (pg. 20). As for pressing alternating keys on each trial, we are not able to identify this type of patterned responding because the stimuli were randomly ordered for each participant and during each block. The software that we used did not record the order of stimuli presentation for each participant.
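The exclusion rule described above can be illustrated with a minimal sketch. The function names and the list-of-key-presses representation are hypothetical and are not taken from the study software. The first check needs only the set of keys a participant pressed, whereas the second needs the responses in presentation order, which is why it could not be applied when the trial order was not recorded.

```python
from typing import Sequence


def pressed_same_key_throughout(responses: Sequence[str]) -> bool:
    """Return True when every trial received the same key press.

    Order does not matter here, so the check works on unordered data.
    """
    return len(responses) > 0 and len(set(responses)) == 1


def strictly_alternates(responses_in_order: Sequence[str]) -> bool:
    """Return True when each key press differs from the one before it.

    This check is only meaningful if the responses are listed in the
    order in which the trials were presented (an assumption that does
    not hold when presentation order is not logged).
    """
    if len(responses_in_order) < 2:
        return False
    pairs = zip(responses_in_order, responses_in_order[1:])
    return all(previous != current for previous, current in pairs)


# Hypothetical participants, for illustration only.
constant_responder = ["good"] * 6
alternating_responder = ["good", "bad", "good", "bad", "good", "bad"]

print(pressed_same_key_throughout(constant_responder))     # True
print(pressed_same_key_throughout(alternating_responder))  # False
print(strictly_alternates(alternating_responder))          # True
```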

Please see our response to comment #1 above regarding the possibility that the AMP, in this context, was a type of explicit measure. First, children’s math and language implicit biases were not correlated with their explicit reports. Second, their implicit biases in sports showed a different pattern of bias by gender group, suggesting that the measure was sensitive to domain. Third, the implicit sport biases of children showed a small, but significant, correlation with their explicit beliefs about sports competence (r = .15). These results suggest that this version of the AMP was measuring systematic differences in automatic associations to academic subjects based on gender categories, and that for less socially sensitive stereotypes (like those about sports) there is a small correlation between implicit and explicit measures. We now include these robustness checks in the supplemental information. Thank you for raising this important point and encouraging us to run additional analyses.

We also included a new section in the introduction dedicated to discussing the AMP in detail (pg. 12).

Editor’s Comment #3: I also found myself having questions about the analyses. First, why is race treated as a covariate? What happens if it is not included as a covariate? Second, I wondered why both math and arts were not included in the same model. This, of course, would make for a more complex design (a 2x3x2x2 design), but it seems more appropriate. In fact, part of me wondered whether separating one of the other between subjects variables would make more sense than separating out this within-subjects measure in order to simplify the design. For example, looking at each age group separately or each gender group separately. Ideally, of course, this would have all been decided a priori, but looking at your data it would seem that one might draw different conclusions depending on how these data are analyzed.

Response: We are happy to clarify these points. Race was treated as a covariate because academic stereotypes can sometimes differ by race categories. Unfortunately, we did not have adequate numbers of youth in each racial group to test race differences, so we controlled for race instead. The results remain the same when race is removed from the analyses, except that the explicit three-way academic subject x target gender x grade interaction was reduced to a significant academic subject x prime gender interaction where children across all three grades (not just elementary school) favored girls in math. We now leave out race as a covariate.

We initially chose to analyze academic subjects in separate models to simplify the analyses, avoiding the possibility of a four-way interaction. However, we appreciate the editor’s point that it is more appropriate to keep this within-subject variable in one model. We now report our results using a 2x3x2x2 design. The findings remain the same.
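For readers who want to see the structure of the combined model, the sketch below enumerates the cells of a 2x3x2x2 design. The mapping of factors to participant gender, grade level, academic subject, and prime gender, and the factor labels themselves, are assumptions inferred from this correspondence rather than the authors' variable names.

```python
from itertools import product

# Factor levels are assumed labels, not the authors' coding scheme.
PARTICIPANT_GENDER = ("girl", "boy")            # between subjects
GRADE_LEVEL = ("elementary", "middle", "high")  # between subjects
ACADEMIC_SUBJECT = ("math", "language")         # within subjects
PRIME_GENDER = ("girl", "boy")                  # within subjects

# Each participant belongs to exactly one between-subjects group...
between_groups = list(product(PARTICIPANT_GENDER, GRADE_LEVEL))
# ...and contributes a score in every within-subjects condition.
within_conditions = list(product(ACADEMIC_SUBJECT, PRIME_GENDER))
design_cells = list(product(between_groups, within_conditions))

print(len(between_groups))     # 6
print(len(within_conditions))  # 4
print(len(design_cells))       # 24
```

Keeping academic subject inside one model is what makes terms such as the academic subject x prime gender x gender interaction testable, at the cost of a possible four-way interaction.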

Editor’s Comment #4: One additional concern, that is noted by both reviewers (point 1 for each), is that I am not quite sure what to make of the results. If the measures are sound, then these results contribute to what are already mixed findings in childhood regarding the development of math-gender stereotypes. I would like to see their comments addressed and specific points in the discussion toned down. For example, I do not agree that, based on your data, the results “are consistent with the idea that implicit linkages between girls and language domains were driving those earlier findings.”

Response: Thank you for raising this concern. We have now clarified our findings’ theoretical contributions. First, we clarify that the vast majority of previous studies examining math and language implicit biases in children or adolescents have used the IAT (pg. 8). These studies show a clear pattern of results among girls of all age-groups: girls show a male-math bias and this bias may predict math-related outcomes such as enrollment intentions for math courses and math self-concepts. However, because of the nature of paired categories in the IAT, these results could be due to girls having a strong association of math with boys, a strong association of girls with language, or both. We found only two studies that used implicit measures that did not confound math and language associations. In one, girls demonstrated an own-gender bias across all grades (4-6). In the other, 9th grade girls did not show a bias for either gender. In other words, the math-male bias among girls may only appear when math and language gender associations are confounded. The implication is that girls may not, in fact, have a negative association of their own gender to math. Rather, when math and language are directly contrasted, they might preferentially associate their own gender with language. Our study is important because it adds evidence that girls have an in-group math bias as well as an in-group language bias (summary at the end of pg. 11). Our findings highlight the need for more research that uses implicit measures that do not confound math and language biases to understand if and for what groups these findings replicate.

Editor’s Comment #5: Related to my previous point, I felt that the introduction was not as comprehensive as it might have been. There are papers that provide evidence that the math-gender IAT predicts math-related outcomes. These are not consistent with the notion that these biases are driven by a language-girls association. This should be acknowledged. I also found the comment in the discussion that “very few studies have examined both implicit biases and explicit beliefs within the same sample” to be inaccurate, as most do. What they might not do, is report the correlation between the measures.

Response: We have now expanded the introduction to describe previous explicit and implicit findings in more detail, and in particular, findings that link the math-gender IAT to math outcomes (bottom of pg. 9). Likewise, in the general discussion, we expand upon why we believe that strong language-girls associations (as compared to math-girls associations) might still predict academic outcomes in math-related domains. On page 33, we added:

Though, at first glance, our results might not appear consistent with gender disparities in STEM careers, they are revealing in that they support recent theoretical frameworks suggesting that girls opt out of math, not due to perceived deficit in math ability compared to boys, but due to perceived strength in language ability over math ability. For example, a large international study of 15-year-old students found that girls’ comparative advantage in reading as opposed to math can largely explain gender disparities in intentions to pursue math-related careers (67). In that study, girls who were found to be good at math were more likely than boys to be even better at reading than at math. The gap between math and reading performance accounted completely for gender differences in math self-concept, interest in math, and attitudes towards math. Other studies have also found that intra-individual contrasts of math and language abilities predict STEM disparities. In a longitudinal study of twelfth grade students, those with high ability in both math and language (more girls than boys) were less likely to pursue STEM careers than those with high ability in math and moderate ability in language (6). Although cultural stereotypes can still be detrimental to girls insofar as they elicit stereotype-threat effects (68) or signal lack of belonging (69), our results imply that girls hold positive associations about their gender group across both math and language ability, consistent with models that depict girls as having more choices in their pursuits, rather than being bound by real or perceived ability constraints. Research that distinguishes between math and language implicit beliefs, then, is important because it can lead to different conclusions about the type of interventions that might be effective for reducing STEM disparities.

We have also deleted our comment about other studies not measuring implicit biases and explicit beliefs within the same sample.

Reviewer 1

Reviewer 1, Comment #1: In the introduction, the argument that social identity theory and ingroup gender preferences drive children’s responses fits for published literature on girl participants (i.e., girls rated as better at academics than boys) but not boys (who also seem to rate girls as better at academics than boys). Is this the best theory to use to explain the whole pattern of results? The authors do a more comprehensive job in the discussion at explaining why the pattern of results might reflect ingroup preference for boys as well. Perhaps this should be worked into the intro as well?

Response: We have expanded and restructured much of the introduction to make the pattern of results (or lack thereof) for past studies clearer. We also revised the way we discuss specific theories for explaining children’s responses on explicit measures. More specifically, we acknowledge different theoretical perspectives (including social identity theory), but point out that published research on this topic has yielded inconsistent findings (pg.6-7). Our view is that multiple factors likely influence explicit reports due to participants’ ability to control their responses. This challenge with explicit measures highlights the advantage of using implicit bias measures.

Reviewer 1, Comment #2: A summary of the overall pattern of results for literature on explicit and implicit stereotypes would be helpful. Overall, I get the sense that the results are not consistent. Do the results of the current study aim to clarify the field in any way?

Response: Thank you for this suggestion. We have included summary statements for the explicit (pg. 7, line 138) and implicit findings (pg. 11, line 238). For explicit findings, there is no overall pattern. We clarify this point and suggest reasons for why this might be true. For implicit findings, the pattern is very consistent for studies using IAT measures; youth (girls in particular) show a stereotypical math-male bias and language-female bias. Studies that did not rely on IAT, but used measures that disambiguate math/language biases, did not find math-male biases among girls. Our study aims to reconcile these existing, conflicting findings by providing evidence that measurement differences could account for the seemingly inconsistent findings in the implicit bias literature (specific to math/language biases). We find that girls show a counter-stereotypical math-female bias. These findings are important for our understanding of how automatic associations about math and language ability influence motivation and career trajectories.

Reviewer 1, Comment #3: I think the paper would benefit from greater clarity regarding for whom the stereotype applies. Research (e.g., Steele, 2003) suggests that children are more likely to apply math-gender stereotypes to adult (but not child) targets. It seems that the literature reviewed is focused on child targets (perhaps explaining the discrepancies in results from different studies). If cultural stereotypes are more readily applied to adults, this has implications for the hypotheses specified page 10 (line 212) as child targets were included in the AMP, and for conclusions regarding the explicit measure.

Response: Thank you for pointing out this important distinction (i.e., to what age group the stereotype applies). We chose to focus on child targets for two reasons. First, as you pointed out, much of the prior literature examining youths’ academic gender stereotypes has used child targets. Second, youths’ views of the competence of social group members—in this case, their perceptions of the abilities of boys and girls who are roughly their age—are known to be related to their perceptions of their own abilities, their domain-specific interests, and other motivational variables. We have added this point as a study limitation in the Discussion as excerpted below (pg. 35):

A third limitation of the study is that we assessed students’ implicit associations and explicit beliefs regarding gender differences in the abilities of youth targets, but not adults. Some studies have shown that children apply cultural academic stereotypes to adults more readily than they do to children (e.g., 33). Although the explicit reports of adolescents in our sample favored girls in math, youth may have favored men over women had we used adult targets. Such results would reflect gender differences in career choices that favor men in STEM domains and would also be consistent with stereotype threat effects that show performance decrements for women when gender identity is made salient in test situations.

Reviewer 1, Comment #4: Has the AMP been used to measure stereotypes previously? My concern (especially with the younger children) is that the valence of the response categories trumped the stereotype component. Can this be disentangled?

Response: This is an excellent point. We now include supplemental analyses showing that boys favor boys in sports, rather than girls, whereas girls show no bias. These results suggest that the AMP was testing domain-specific gender associations. Please see our response to Editor’s Comment #1.

Reviewer 1, Comment #5: The implicit and explicit results have a similar pattern in that girls are deemed better in language than boys. Why is this said to reflect “cultural knowledge” for the explicit results but “academic success” for the implicit results? Is it possible that implicit and explicit measures reflect the same underlying constructs, but differences in measurement variability prevent strong correlations from emerging?

On a related point, Page 9, line 186. “Children, however, may not yet have learned the cultural stereotypes and so may vary in awareness”. Or it could be that children have learned the stereotype, but have not yet internalized it to the point it can be automatically activated by attitude object. What are the implications of these possibilities for the hypotheses and results?

Response: We agree with the reviewer that our results do not speak to the specific constructs underlying the observed biases. We changed the language in our discussion to reflect their speculative nature. We also added a sentence in the Discussion regarding the possibility that measurement might have led to the lack of correlation between implicit and explicit stereotypes as excerpted below:

It is also possible that the lack of correspondence in our study might be linked to our choice of measures, and that a different methodological approach might yield significant relations between children’s implicit associations and their explicit reports (p. 34, line 674).

Reviewer 1, Comment #6: 2.2% of sample of Asian heritage. Did these participants have any familiarity with Chinese symbols? If so, should they be removed?

Response: Only six participants reported their race as Asian-American. If we exclude these participants, the results remain exactly the same; therefore, we retained their observations.

Reviewer 1, Comment #7: What were the correlations for the four prime-gender scores? (page 20)

Response: We now report these correlations in Table 7.

Reviewer 1, Comment #8: Tone down language around conclusions. For example, p 21 line 328; p 23, line 345. Other studies (using the IAT) have demonstrated age-related differences in implicit biases.

Response: We have made these suggested changes.

Reviewer 2

Reviewer 2, Comment #1: It would help if the authors could make greater sense of the implicit data from boys. That is, these findings seem to contradict past published work where boys show an implicit gender stereotype. Is there a way to compare the strength of the egalitarian associations with math and language to see if the effect is stronger in one direction? There are now a number of papers by Cvencek, as well as those who have done stereotype threat work (Tomasetto, Steele etc) arguing that in some way shape or form boys have a gender stereotype in this domain. Is there something unique about how the IAT measures bias that might make it a more suitable measure in this case? Of course, the data are what they are but I think much more attention should be given to this contradictory finding both in terms of possible methodological explanations as well as conceptual. Related, can the authors report more info on average latencies with the AMP? It might help to understand how implicit these responses likely were.

Response: We now clarify that several past studies have also failed to find stereotypic implicit biases in boys. In fact, the findings among boys are much more inconsistent than those among girls. That said, we have also incorporated the references suggested above which document gender stereotypes among boys and have worked to clarify that we are unsure about why boys and girls show dissociative implicit biases (pg. 12):

We were agnostic about the implicit biases of boys, as previous findings have been inconsistent and do not clearly favor one theoretical account over another. The dissociative processes by which girls and boys form automatic associations are in themselves interesting, but not the subject of this report.

We still maintain the following:

The view of academic success as a feminine trait may have led girls in the present study to show implicit biases favoring girls in both domains, whereas for boys, those views may have been tempered by a tendency to show in-group preference, resulting in their egalitarian scores on the task. (bottom of pg. 32)

As for the associations being stronger in the direction of math or language, we can now speak to that question. We changed our analyses such that math and language are included in the same model (a 2x3x2x2 design). The three-way interaction (i.e., academic subject x prime gender x gender) that might reveal differences in the relative strength of math versus language associations among boys was not significant. Finally, we cannot report information on average latencies because the AMP does not measure reaction times. Rather, it measures the proportion of targets that the participants rate as “good at” or “bad at” math/language when preceded by a photo of either a boy or a girl.
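To make the scoring described above concrete, here is a minimal sketch of a proportion-based AMP score for one academic subject. The trial representation, the function name, and the girl-minus-boy difference used to summarize bias are illustrative assumptions; the letter does not state how the authors computed their bias scores, and the trial data are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def amp_proportions(trials: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Proportion of symbols rated "good at" the subject, per prime gender.

    Each trial is a (prime_gender, rated_good) pair, where rated_good is
    True when the symbol that followed the photo was judged "good at".
    """
    counts = defaultdict(lambda: [0, 0])  # prime_gender -> [n_good, n_total]
    for prime_gender, rated_good in trials:
        counts[prime_gender][0] += int(rated_good)
        counts[prime_gender][1] += 1
    return {gender: good / total for gender, (good, total) in counts.items()}


# Invented math-block responses for a single hypothetical participant.
math_trials = [
    ("girl", True), ("girl", True), ("girl", False), ("girl", True),
    ("boy", True), ("boy", False), ("boy", False), ("boy", True),
]

proportions = amp_proportions(math_trials)
# Assumed summary: a positive difference means girl primes drew more
# "good at math" judgments than boy primes did.
girl_minus_boy = proportions["girl"] - proportions["boy"]

print(proportions["girl"])  # 0.75
print(proportions["boy"])   # 0.5
print(girl_minus_boy)       # 0.25
```

Because the score is built from the judgments themselves rather than from response times, no latency data enter the calculation.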

Reviewer 2, Comment #2: I didn't follow the arguments the authors made about how the data on the implicit/explicit measures directly speaks to the sources of these stereotypes. That is, to say that if implicit bias is more influenced by cultural messages about stereotypes then they should increase with age doesn't make clear sense to me. And, by contrast, classroom cue sensitivity would lead to no age differences (lines 210-214). First, it's odd that there wasn't a direct measure of sensitivity to cultural stereotypes or some quantification of classroom cues. It was assumed that patterns of bias uniquely are constrained by these cues when there are a multitude of factors that also uniquely shape bias (e.g., surely they interact). Further, it is not clear that being influenced by cultural messages about stereotypes should mean that the bias increases with age (there isn't strong evidence that implicit bias reflects a cumulative learning model whereby the magnitude of the bias increases with frequency of exposure), there could be sensitive periods for learning biases, etc. Similarly, what's the evidence the present classroom made available diagnostic cues to performance/ability?

Response: We agree with the reviewer that our results do not speak to the specific sources underlying the observed biases. We changed the language in our introduction and discussion to reflect the fact that our data cannot speak to mechanisms. We are also now explicit in the introduction about the fact that multiple factors can influence beliefs and biases. Thank you for raising this point.

Reviewer 2, Comment #3: The authors noted that other school domains were studied but not the focus of the present manuscript. Rarely do I think these additional study data are informative but for the present manuscript I especially think it's informative because it speaks to broader arguments the paper seems to be very focused on - own group bias, internalization/awareness of cultural stereotypes and classroom cues. Do any of the other data not reported shed light on children's more general sensitivities here?

Response: Thank you for this suggestion. We now include supplemental analyses showing that boys and girls have different implicit biases in sports compared to math and language, suggesting that our implicit measure was sensitive to both gender and academic domain. Please see our response to Editor’s Comment #1.

Reviewer 2, Comment #4: I think it would be helpful if the authors could include more discussion on the growing stereotype threat literature in this domain as it seems to be particularly informative for our thinking and predictions about the development of gender differences in these academic stereotypes. And, in some cases, may even present contradictory findings that require some explanation.

Response: This is an excellent point. Because stereotype threat refers to performance decrements in a test situation when membership in a negatively-stereotyped group is activated, results of the current study are not directly related to stereotype threat. Nonetheless, we added that point in the Discussion as excerpted below:

Although cultural stereotypes can still be detrimental to girls insofar as they elicit stereotype-threat effects (68) or signal lack of belonging (69), our results imply that girls hold positive associations about their gender group across both math and language ability, consistent with models that depict girls as having more choices in their pursuits, rather than being bound by real or perceived ability constraints (pg. 33, line 255).

Reviewer 2, Comment #5: The authors set up two primary views about measuring bias - importance and limitations for studying both explicit and then implicit bias. This made sense. I got lost a bit when indirect measures were then discussed because, conceptually, I didn't understand where the authors saw indirect measures fitting in the literature - is it a level of analysis like implicit/explicit or is it kinda orthogonal to the implicit/explicit distinction and more of a way to measure things explicitly while reducing some of the potential demand characteristics that can plague explicit measures?

Response: Thank you for raising this point. We agree that results from indirect measures are not relevant to the current study and have removed reference to these studies.

Reviewer 2, Comment #6: Lastly, can the authors highlight/note the analyses they were likely underpowered for given the power analysis they did, as the effect sizes reported in a number of cases seemed to be below the threshold they set for the study?

Response: We have noted in the Results section the effect that fell below the threshold calculated from our sensitivity analysis. Excerpted from pg 21:

The Age Group x Academic Subject x Target Gender interaction was also significant, F(2, 258) = 3.51, p = .031, η2 = .03. Children favored girls in language across all three age groups. In contrast, in the case of math, only children in elementary school showed a bias, favoring girls over boys in math. Youth in middle school and high school did not show a gender bias in explicit reports of math ability. These results should be interpreted with caution, though, as the effect size is below the threshold calculated by our sensitivity analysis. The four-way interaction was not significant, perhaps due to low power.
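As a rough consistency check on the effect size in this excerpt, the following sketch recovers an eta squared value from the reported F statistic and degrees of freedom. It assumes the reported η2 is a partial eta squared, which the excerpt does not state, and uses the standard identity η_p² = (F · df_effect) / (F · df_effect + df_error). The function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    ss_ratio = f_value * df_effect  # equals SS_effect / MS_error
    return ss_ratio / (ss_ratio + df_error)

# Values taken from the excerpt: F(2, 258) = 3.51
eta = partial_eta_squared(3.51, 2, 258)
print(round(eta, 2))
```

Under that assumption the computed value rounds to the reported .03.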

Attachment

Submitted filename: Response to reviewers.pdf

Decision Letter 1

Jennifer Steele

3 Jun 2020

PONE-D-19-26304R1

Math and language gender stereotypes: Age and gender differences in implicit biases and explicit beliefs

PLOS ONE

Dear Dr. Vuletich,

Thank you for submitting this revised manuscript to PLOS ONE. I have had the opportunity to read through your re-submission of this manuscript and have again had the benefit of receiving feedback from the two original reviewers. As you will see in their reviews, as well as my own comments below, we continue to feel that this manuscript has a great deal of promise. I believe that there is tremendous benefit to using a range of measures to gain a deeper understanding of the early development of implicit academic stereotypes. I also appreciate the additional information that you have provided in the supporting information document, which strengthens your interpretation of the data.

You will see that the reviewers and I also continue to raise some concerns. I believe that these can be addressed in a revision, and therefore would like to invite you to submit a revised version of the manuscript that addresses the points raised below. I cannot guarantee that this revised version will be accepted for publication, but I do not plan to send this back out for another round of reviews prior to making my decision.

I would encourage you to work to address each of the reviewers' comments, with a focus on additional limitations that will need to be noted in the discussion section. In particular, Reviewer 1 raised two main points that will need to be adequately addressed in the discussion. That is, given the different nature of this particular measure, you cannot conclude that the implicit language-stereotyping effect for girls is driving the math-gender stereotyping on the IAT (more on that below). Reviewer 2 raises a number of important points, many of which should at the very least be addressed in the discussion. In particular, the relative lack of exclusion criteria should be discussed relative to other papers that make use of implicit measures with child participants, with a focus on what might have been done to ensure that your effects are not simply the result of a great deal of noisy participants in the data (more on that from me below as well).

In addition, my own comments include the following:

  1. The inclusion of the implicit sports biases measure was a great addition. But this led me to wonder what other measures were included in this study. I would request that in the current climate of replicability, these be outlined in the supporting information. Is it possible that some effects did not emerge because children were fatigued by the time they were completing the key measures? Could these effects have been influenced by other measures that they completed prior to the main measures of interest?

  2. I was unclear what the Mdiff score was for the implicit sports bias in the supporting information.  More information on that measure is needed. I would also recommend that this get integrated into the main manuscript as I believe that the measure has the potential to speak against a very clear alternative explanation. However, please see Reviewer 1 as the limitation to this possibility needs to be fully addressed in the discussion.

  3. Another way that the positivity bias could be addressed is by examining responses to your different stimuli. I wonder, if you separate the Black and White targets, do you see a race bias? That is, were participants more likely to select “good at” when presented with a White target prime as opposed to a Black target prime? Is this particularly true for the Black male primes and the non-Black participants? (see Perszyk et al., 2019). This might be worth exploring and might provide additional insight that could be included and considered in the supplement.

  4. The paragraph that starts on line 276 is meant to address my concern that many of your participants might have rated the primes instead of the neutral stimuli. However, I did not find this to really address the concern.  As noted by Reviewer 2 and by me above, the possibility exists that the lack of stereotyping is due to a small effect combined with noise in your child data due to a relative lack of exclusion criteria. Please see the other AMP papers with children (Williams et al., Williams & Steele, Perszyk et al.) for the criteria that were used. If you can apply these, wonderful – if not, this limitation needs to be more fully addressed in the discussion. I will also mention that I found the exclusion criteria description on lines 447-448 to be awkward and would recommend rewording.

  5. I am conflicted about the analyses that are presented and would suggest that you give some additional thought to them.  In the absence of the four-way interaction, you ultimately do not directly test your hypotheses that, for example, “the youngest age group…(will) favor their own gender in math”, etc. I know that one school of thought is that these direct comparisons should not be made in the absence of the interaction effects. On the other hand, given the complexity of your design and your specific hypotheses, I wondered whether including more direct tests would be helpful in making full sense of these data. I also wondered whether the conclusions that are drawn on lines 584-585 completely align with the analyses that you conduct. I do think that providing access to the data – which I don’t think were submitted, but you note the intention to link to these – might solve this issue. But I would be open to you including some additional analyses in the supplemental materials or main text.   

Some additional suggestions include:

  1. I found ‘road map’ statements to be unnecessary and would recommend that sentences like the one on line 260 be removed.

  2. The reliability of this measure is easily calculated and could be included with the measure. Please include it and remove the justification (on page 14) for not including this.

  3. What is the theoretical justification for the age grouping that was used?  Please provide some justification.

  4. I would report on the implicit measure first in the measures and results and then would report on the explicit measure for consistency.

  5. Line 456 – Prime Gender should be Target Gender.  Additional interactions could be fully reported in the supplement.

  6. Table 7 – it would be interesting to see this for boys and girls separately as well.

Overall, I think that there are some real strengths to this manuscript, and I believe that it has the potential to make an important contribution to the field.  I hope that you will decide to address each of these concerns and resubmit the paper for additional consideration.

Please submit your revised manuscript by Jul 18 2020 11:59PM. This is the revision date set by the journal, however, if you will need more time than this to complete your revisions this is not a problem. Please reply to this message or contact the journal office at plosone@plos.org. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

I hope that you and your co-authors are staying well at this strange and challenging time. I will look forward to receiving your revised manuscript and will aim to render a decision as quickly as possible after it is received.

Warmly,

Jennifer Steele

Academic Editor

PLOS ONE


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I believe that this revised manuscript is stronger than the original submission. The comments I raised in my review have been adequately addressed to the extent that the data allow for this. There continue to be many strengths to this paper (i.e., Social ID theory, sample, methods, robust analyses, etc.). Most importantly, I agree with the authors that the field would benefit from research that uses diverse implicit measures. The current paper meets that objective without doubt. However, I continue to have two conceptual concerns that may prevent this paper from being publishable in PLOS One.

1. The authors argue that implicit language associations could be driving the IAT math-gender stereotype findings. I agree. My issue is that there is no data to support this claim (pages 32-33, lines 612-618). Instead what we see is a more general implicit girls = good pattern of results. I note that there could still be a contribution to make here in that social id theory can be used to explain this pattern of results, but I am uncertain whether this is novel enough for publication in PLOS One.

2. This girls = good issue was raised in the original reviews. To address this the authors examined responses on a Sport-AMP and found that for male participants boys were more positively associated with sport, whereas girls demonstrated no bias (mirroring the academic-related AMPs where boys showed no bias). In my opinion, this analysis does not adequately address the issue. What we may be seeing here is a broader "girls = good at school / boys = good at sports" stereotype that would be consistent with input via cultural exposure. Again, social identity theory could be used as a framework to interpret this pattern of results.

Reviewer #2: I commend the effort to address so many of the reviewer comments in a thoughtful, clear and concise way. There are some issues that I do think are still quite important to tend to as they bear directly on the framing and claims.

The primary focus is to advance our understanding of implicit and explicit gender stereotypes. As such, we want to have some reasonable comfort that the measures are indeed capturing something implicit (and explicit). How can one tell if it's implicit? As we know, there are a variety of different ways to address this, some better than others. But I'm not sure what can be said here as this procedure hasn't been established with children. Although not a direct line into what's implicit, the latency data requested in an earlier reviewer comment would be quite useful - particularly if latencies were quite slow. I understand from the response letter that AMPs calculate proportions of response types. Is it the case that the software used really doesn't capture latency data for each trial? I understand the AMP analysis doesn’t incorporate these data but my question is asking whether the software itself has such data. What software was used? Most programs I know capture these data. Assuming this isn’t available, what then can we point to as evidence that this procedure with children has been shown to capture implicit (as opposed to explicit) bias?

How come the presentation stimuli times differed from the one AMP study with children to date?

I remain concerned about not checking for whether participants were familiar with the Chinese characters. Imagine, for example, this were taking place today with the rising amounts of overt racism toward China. I could imagine familiarity with the characters could present two issues. 1. Prime negative affect itself. 2. Lead participants to doubt that the characters actually stand for words meaning good/bad at x,y,z if they have a mutual exclusivity hypothesis about the Chinese language (one character per concept, similar to word learning bias in English and other Western languages). Perhaps this believability doesn't matter?

As well, I’m still confused by how we can reasonably conclude this is not an attitude measure toward the primed stimuli (vs stereotypes). Is the study powered for the Sports AMP that was used to demonstrate there isn’t just a positivity bias? Is there a correlation between the two AMPs (presumably there would not be if it were not measuring a general gender good/bad bias)? Was there an order effect with the different AMPs conducted? Is there a domain difference for boys/girls (on average or by gender) to further help us to see if they’re indeed capturing different constructs?

I’m puzzled by the very lax exclusion criteria for the AMP. With IAT, for example, exclusion criteria is around 20% errors or greater. Alternating key presses would be missed and this is not uncommon for children. Before I can really make sense of these data I’d want to see much clearer reporting of proportion of trials with one key press, what the range is, SD, etc for age groups. What software was used to run this program?
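The kind of per-participant reporting requested here could be sketched as follows. This is only an illustration under stated assumptions: the key codes, the example response sequences, and the function names are hypothetical, and no exclusion threshold is implied by the source.

```python
from statistics import stdev

def same_key_proportion(responses):
    """Proportion of trials on which the participant pressed their most-used key."""
    most_common = max(set(responses), key=responses.count)
    return responses.count(most_common) / len(responses)

def alternation_rate(responses):
    """Proportion of consecutive trial pairs on which the key switched."""
    switches = sum(a != b for a, b in zip(responses, responses[1:]))
    return switches / (len(responses) - 1)

# Hypothetical key-press sequences ("g" = "good at", "b" = "bad at").
participants = {
    "p1": list("ggggggggbg"),  # mostly one key
    "p2": list("gbgbgbgbgb"),  # strict alternation
    "p3": list("ggbgbbggbg"),  # mixed responding
}
same_key = {pid: same_key_proportion(r) for pid, r in participants.items()}
alternation = {pid: alternation_rate(r) for pid, r in participants.items()}

# Range and SD of the same-key proportion across participants.
values = list(same_key.values())
print(same_key, alternation)
print(min(values), max(values), stdev(values))
```

A same-key proportion near 1 would flag a participant who pressed one key almost throughout, while an alternation rate near 1 would flag strict key alternation, the pattern the reviewer notes would otherwise be missed.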

Do we think the mixed results for explicit gender stereotypes reported in the intro are conceptual or methodological? That is, do they reflect variability due to differences in personal and/or cultural stereotypes represented by the child or due to methodological differences employed or something else? This would be helpful to discuss perhaps somewhere (but doesn't have to be solved here).

The authors note “We expected girls would show traditional math-male biases if they have assimilated cultural stereotypes that favor boys in math. In contrast, girls would favor girls in math if pervasive differences in academic performance are the primary factor shaping automatic associations about gender and math ability.” Is it possible they could hold both stereotypes and randomly (or non-randomly as with some kind of prime) exhibit one of these stereotypes in the moment? Some children were tested in a local library. Were these more female than male? Were stereotype assessments different here (potentially because of its linkage to language/reading)? Given sensitivity on explicit measures to social desirability, was there an effect of experimenter gender on the explicit measure for older children?

I want to reiterate how important I think it is to measure stereotypes about math/reading/language separately as this confound is really apparent when thinking about the existing findings in this domain with the IAT.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Amanda Williams

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Decision Letter 2

Jennifer Steele

13 Aug 2020

Math and language gender stereotypes: Age and gender differences in implicit biases and explicit beliefs

PONE-D-19-26304R2

Dear Dr. Vuletich,

I have now had the opportunity to review your most recent submission of this paper. I feel that you did an excellent job integrating the suggestions made by both me and the reviewers. I am therefore pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. I feel confident that this paper will make an important contribution to our field.

I have one final suggestion for you to consider as you finalize the supplement for publication. I noticed that in the S1 Table the pairwise comparisons focus only on explicit language stereotypes. I feel that it would be helpful to have a comparable table containing explicit math stereotypes. This is not a requirement, but rather is a suggestion that could be integrated into the supplement (or posted on the OSF) should you agree.

Within one week, you will receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

I want to commend you on this interesting research and I look forward to seeing this paper published in PLOS ONE.

Warmly,

Jenn Steele

Academic Editor

PLOS ONE

Acceptance letter

Jennifer Steele

24 Aug 2020

PONE-D-19-26304R2

Math and language gender stereotypes: Age and gender differences in implicit biases and explicit beliefs

Dear Dr. Vuletich:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Jennifer Steele

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File

    (PDF)

    Attachment

    Submitted filename: Response to reviewers.pdf

    Attachment

    Submitted filename: Response to reviewers_7.10.20.docx

    Data Availability Statement

    All relevant data are uploaded to the Open Science Framework database and publicly accessible via the following URL: https://osf.io/fv5h8/.


    Articles from PLoS ONE are provided here courtesy of PLOS
