Author manuscript; available in PMC: 2024 Oct 1.
Published in final edited form as: Am Psychol. 2023 Jan 30;78(7):886–900. doi: 10.1037/amp0001117

The Gender Self-Report: A Multidimensional Gender Characterization Tool for Gender-Diverse and Cisgender Youth and Adults

John F Strang 1,2, Gregory L Wallace 3, Jacob J Michaelson 4, Abigail L Fischbach 1, Taylor R Thomas 4, Allison Jack 5, Jerry Shen 1, Diane Chen 6,7, Andrew Freeman 8, Megan Knauss 1, Blythe A Corbett 9, Lauren Kenworthy 1,2, Amy C Tishelman 10, Laura Willing 1,2, Goldie A McQuaid 5, Eric E Nelson 11, Russell B Toomey 12, Jenifer K McGuire 13, Jessica N Fish 14, Scott F Leibowitz 15, Leena Nahata 11,16, Laura G Anthony 17, Graciela Slesaransky-Poe 18, Lawrence D’Angelo 19, Ann Clawson 1,2, Amber D Song 1, Connor Grannis 11, Eleonora Sadikova 1, Kevin A Pelphrey 20; GENDAAR Consortium20, Michael Mancilla 19, Lucy S McClellan 1, Kelsey D Csumitta 1, Molly R Winchenbach 1, Amrita Jilla 1, Farrokh Alemi 21, Ji Seung Yang 22
PMCID: PMC10697610  NIHMSID: NIHMS1887140  PMID: 36716136

Abstract

Gender identity is a core component of human experience, critical to account for in broad health, development, psychosocial research, and clinical practice. Yet, the psychometric characterization of gender has been impeded due to challenges in modeling the myriad gender self-descriptors, statistical power limitations related to multigroup analyses, and equity-related concerns regarding the accessibility of complex gender terminology. Therefore, this initiative employed an iterative multi-community-driven process to develop the Gender Self-Report (GSR), a multidimensional gender characterization tool, accessible to youth and adults, nonautistic and autistic people, and gender diverse and cisgender individuals. In Study 1, the GSR was administered to 1,654 individuals, sampled through seven diversified recruitments to be representative across age (10–77 years), gender and sexuality diversity (~33% each gender diverse, cisgender sexual minority, cisgender heterosexual), and autism status (>33% autistic). A random half-split subsample was subjected to exploratory factor analytics, followed by confirmatory analytics in the full sample. Two stable factors emerged: Nonbinary Gender Diversity and Female–Male Continuum (FMC). FMC was transformed to Binary Gender Diversity based on designated sex at birth to reduce collinearity with designated sex at birth. Differential item functioning by age and autism status was employed to reduce item–response bias. Factors were internally reliable. Study 2 demonstrated the construct, convergent, and ecological validity of GSR factors. Of the 30 hypothesized validation comparisons, 26 were confirmed. The GSR provides a community-developed gender advocacy tool with 30 self-report items that avoid complex gender-related “insider” language and characterize diverse populations across continuous multidimensional binary and nonbinary gender traits.

Keywords: gender identity, measurement, transgender, nonbinary, autism

Introduction

Gender identity is an individual’s inner experience of gender. “Gender diversity” is used as an umbrella term to describe gender expressions and/or identities that extend beyond the norms or stereotypes for a person’s designated sex at birth. “Gender diverse” is employed more narrowly in this study as a designation to indicate when a person’s gender identity differs from their designated sex at birth (e.g., transgender, nonbinary).

The last decade has witnessed a burgeoning of gender identity terms and descriptions (Richards et al., 2016). The resulting kaleidoscope of gender self-descriptions speaks to the range of individual gender experiences, as well as the importance of common languages of identities to connect people with shared gender experiences. Examples include terms such as trans feminine, nonbinary, agender, genderqueer, demigender, gender vague, trans masculine-nonbinary, faesari. As important as these gender identity descriptors are for individuals and communities, their definitions are highly personal and variable; understanding and use of gender self-descriptors (e.g., nonbinary) can vary greatly between individuals, and there is insufficient operationalization of the terms to allow for precise and equitable characterization—across broad populations—of the inner experience of gender (Jacobsen et al., 2021). Further, the gender self-descriptor terms are often culturally-based: Use of these descriptors varies by community—and at a broad level—by race and language accessibility (Morgan, 2020). A large-scale example of comprehensibility challenges with gender self-descriptors occurred within the first wave of the ABCD study, the largest long-term study of brain development and youth health ever conducted in the United States: 40.2% of participants, who were of late elementary school age at Time Point 1, did not understand a question that asked whether they are “transgender,” one of the most common of the gender diversity terms currently used (Calzo & Blashill, 2018). Gender-diverse (GD) adolescents often describe coming upon a certain term online or within a social group and realizing the description fits their experience of gender (Garrison, 2022). For some individuals, until discovering a certain term, they have no words to express their gender or gender-related needs (Strang, Powers, et al., 2018). 
Disparities accessing gender terms and concepts may be exaggerated in some subgroups, including neurodivergent individuals (i.e., individuals who differ in neurological/neurocognitive functioning from the norm), who often describe long histories of gender diversity, but without sufficient access to gender terminology to self-advocate around identities or needs (Strang et al., 2021).

As gender identity terms may be privileged to those with access to these ideas and words, this current research initiative aimed to create a gender identity characterization tool that levels the playing field in research and service provision settings such that all individuals can describe, assert, and self-advocate around their genders using accessible language. In order to accomplish this aim, this study developed, refined, calibrated, and validated a straightforward self-report gender identity scale that captures a range of gender experiences across diverse individuals. In no way does this project intend to undermine or reduce the use of community-driven identity terms. Such terms are critical for understanding individualized experiences of gender and are a cornerstone of gender-related community building. Instead, this initiative aims to create a self-communication tool that supports the characterization of genders across the gender spectrum, for youth and adults with varying communication styles, so that gender identities might be more equitably expressed, including for those without access to sophisticated gender terminology. The intended uses of the measure are broad. In research, a multidimensional quantitative gender characterization tool, which measures gender traits continuously, could be employed to more easily (and potentially more accurately) give a psychometric “voice” to people of diverse genders, thereby including and modeling the genders of people previously ignored or excluded from analyses due to a lack of tools to characterize and include them. There may also be clinical and community advocacy uses of the measure to give voice and inclusion to the genders of youth and adults, including those without access to nuanced gender-diversity terminology. 
For example, consensus guidance recommends gender diversity screening in autism spectrum disorder (ASD) to help identify those who may benefit from gender-related consultation/support (Strang, Meagher, et al., 2018).

A range of gender identity-related self-report measures have been developed; they are catalogued according to key attributes on the study’s technical website (Open Science Framework [OSF]): https://osf.io/da3zp/?view_only=448e3116de214f10bd9cb69652080db4. Beyond their psychometric properties—and any potential limitations therein—the very conceptualization and construction of available measures presents challenges for equitable characterization of gender across broad populations. For example, most of these measures have been designed for either youth or adult populations, but do not span both. The Gender Self-Report (GSR) is a new gender characterization tool designed to span youth (age 10) through adulthood. This broad age range has been deemed critical by our diverse key stakeholder GSR development partners given the importance of a continuity of gender characterization over time, especially as adolescents enter adulthood. Of note, the recent update of the World Professional Association for Transgender Health (WPATH) Standards of Care highlighted the dearth of longitudinal studies following GD young people from childhood to adulthood (WPATH, 2022). An additional limitation of most existing measures is their focus on binary conceptualizations of gender (male vs. female), which restricts the range of measurable gender experiences, and which may be experienced as identity-dismissing by those whose identities lie outside the gender binary. Therefore, the GSR richly includes a broad range of nonbinary gender descriptions and experiences and, in accordance with reports from GD individuals, allows for the simultaneity of binary and nonbinary genders.

Further, many existing measures combine gender identity diversity- and gender dysphoria- (i.e., distress related to gender incongruence) related items in item sets (see above OSF link), and there is some psychometric support for these approaches (Deogracias et al., 2007). However, the measurement of gender dysphoria as linked to gender identity can be problematic, as members of GD communities are increasingly critical of inappropriately blurring gender identity diversity and classification of gender-related needs (Jacobsen et al., 2021). In the words of a transgender young adult from the present study, “Why should I fill out a questionnaire that determines whether I’m transgender based on how much I hate my body? Yes, my body does not fit what I want and need, but I’m working every day to be happier with who I am.” A new movement in gender self-reporting focuses instead on “gender euphoria”—gender traits, experiences, and goals that feel right and bring comfort, as opposed to despised aspects of the body (Beischel et al., 2021). Once gender needs are better addressed (e.g., through gender-affirming treatments), dysphoria may decrease; in such situations, gender diversity is clearly independent of gender dysphoria (de Vries et al., 2014). Therefore, measures that conflate GD and dysphoria may have variable psychometric properties depending on where an individual is in their process of gender affirmation and/or social or medical transition. For this reason, the GSR focuses primarily on the direct experiences, wishes, and needs of the respondent around gender. Conceptually, the GSR is designed to measure those internal aspects of gender identity, not components that are dependent on whether a person has received gender-affirming care.

Additionally, many of the most commonly used gender measures are designed specifically around a person’s designated sex at birth, with separate male and female forms (see above OSF link). However, such dual-form measures are problematic for use with transgender populations in which changes in the body from gender-affirming treatments may make the designated sex at birth form of a measure no longer relevant (e.g., if the body parts or secondary sex characteristics are no longer present). A switch to the other binary sex version of such measures is also problematic, such as for longitudinal studies where there is a need for continuity of psychometric measurement over time. Further, there is a subset of individuals, many of whom may benefit from a self-report and self-advocacy-based measure of gender, who do not have a clear designated sex at birth (e.g., a subset of individuals with variations in sex traits [VST], also known as differences of sex development or intersex variations, whose sex traits vary from binary male or female). Existing scales that present male or female forms exclude such individuals. Therefore, the GSR is developed with a single form that can be: given to anyone of any designated sex at birth or gender identity; completed regardless of which body parts are present or absent; and completed over time, regardless of any changes related to gender-affirming treatment.

Finally, there is a significant proportional overrepresentation of autism among GD youth and adults (6 times the odds of being autistic for GD as compared to cisgender individuals: Warrier et al., 2020; ~11% of GD individuals are autistic: Kallitsounaki & Williams, 2022) and GD is overrepresented in autistic populations, with reports of up to 13% of autistic people experiencing GD (Walsh et al., 2018). Autism is characterized by differences in communication and thinking style, which can impact the way autistic people respond to questions (Strang, Powers, et al., 2018). Therefore, to optimize the accessibility of the GSR for autistic people, autistic perspectives (cisgender and GD) informed the development and refinement of the item set. The inclusion of autistic people in the measure’s development aims to reduce disparities in access to gender measurement for this large subpopulation.

This study presents the 12-year community-driven development and refinement process of the GSR, including its calibration (Study 1) and validation (Study 2) in a large sample balanced across youth and adults, autistic and non-autistic people, sexuality (sexual minority [SM] and heterosexual), and gender (GD and cisgender). Differential item functioning (DIF), an equity-based psychometric method, is examined to identify and reduce bias in the measure for subgroups. Construct, convergent, and ecological validity characteristics are assessed with self-descriptors of gender, existing report measures of GD, and gender-affirming treatment status, respectively.

Study 1

Method

Study 1 aimed to: (a) calibrate the GSR item set based on the examination of underlying factor structure and latent variable distributions, (b) examine a range of psychometric properties of the final items, (c) test DIF across subpopulations (i.e., by age and ASD status) to achieve measurement invariance, and (d) produce final scale scores.

Transparency and Openness

Due to the nature of the consents, the entire data set is not publicly available. Study materials are available through the study’s technical website (see links throughout article text). The study was approved by the Children’s National Institutional Review Board (IRB) and linked IRB at University of Virginia, as well as IRBs at University of Iowa, George Mason University, Lurie Children’s Hospital, and Nationwide Children’s Hospital. Validation hypotheses were made and preregistered through Open Science Framework following Study 1, prior to Study 2 analyses.

GSR Item Creation and Revision

The GSR was developed through an iterative community-driven multi-input process. The GSR aims to give voice to an individual’s experience of their gender (i.e., gender identity), including how they think of themselves, want to be seen, want to be addressed, and want to be, physically. A guiding feature is the use of language describing gender identities, neutral or positive gender-related experiences, and gender-related wishes rather than language implying hatred of features of the self (which assumes transgender people are damaged or defined by dysphoria).

The multiphase GSR development process is described in detail, including descriptions of clinic and community contributions and psychometrics for a protoversion of the GSR, on the study’s technical website: https://osf.io/qh25d/?view_only=c0ce41d07bca4af1b792e074d51b7ded. There was an intentional centering on youth for initial item generation in order to ensure that the final measure would be linguistically and conceptually accessible to youth. Given that experiences of gender fluidity were often reported, item response choices were organized according to temporally-centered self-ratings of gender experiences (i.e., “always true,” “often true,” “sometimes true,” “never true”) to capture the fluidity of experiences at the item level. The item set was piloted in gender diversity clinics for youth and young adults for 8 years, resulting in the following sequence of content expansion and refinement: (a) development of a single GSR form agnostic to designated sex at birth, (b) expansion of nonbinary content, (c) inclusion of items that map onto gender-related medical and broader care decisions available to GD individuals, and (d) refinement of item content and vocabulary to optimize characterization of gender in youth and adults equally (i.e., by balancing and synthesizing feedback from both GD youth and adults).

Concurrent with the pilot in gender clinics, early psychometric pilot studies were conducted in samples of autistic and nonautistic youth; and cisgender and GD youth and young adults (Total N = 370; pilots described in the link above). Results (a) established basic accessibility of simple gender-related self-report items for autistic and nonautistic cisgender youth as well as for autistic and nonautistic gender-diverse youth, (b) identified two primary gender dimensions (i.e., binary gender, nonbinary gender), and (c) demonstrated appropriate internal reliability and initial validation of the factors. The item set was further refined based on community advisement from two United States National Institutes of Health-funded studies that administered the items (2R01HD082554–06; R01 HD097122), the first of which recruited youth from four U.S. gender clinics and the second of which recruited youth across the United States. The principal investigators for these studies summarized the feedback they received from youth and families. This feedback prompted the removal of (a) items that, by the nature of their wording, associate a gender with a type of clothing or behavior, perpetuating gender stereotypes and (b) items that associate a body type with gender (e.g., “a male body”), which were experienced as hurtful by some individuals whose bodies do not conform to such common patterns. Specifically, community feedback indicated that a measure of gender must not perpetuate gender stereotypes through items related to clothing, behavior, or bodies that could be hurtful to people whose clothing, behavior, or bodies do not conform to common (i.e., stereotypical) patterns observed by gender. The final item set of the GSR that was subjected to calibration was 30 items. 
The length of the item set was determined to reconcile the following: (a) appropriate item-coverage for multifactor measurement, (b) community feedback emphasizing the importance of broad item coverage such that individuals across the gender spectrum can find their gender experience sufficiently included/represented within the item set, and (c) sufficient brevity to be completable as part of a battery of report measures (i.e., shorter than common standardized self-reports [e.g., Achenbach System of Empirically Based Assessment; Behavior Rating Inventory of Executive Function]; Achenbach & Rescorla, 2013; Gioia et al., 2015).

Participants

This study employed seven separate recruitments, spanning the years of 2017–2021, to maximize the breadth of the sample across the following key characteristics: GD identities (binary and nonbinary), ASD, the intersection of GD and ASD, transition age/young adult age, and female designation at birth (within the entire sample and the ASD subset). See Table 1 for participant characteristics. Ethical imperatives driving the equity-based sampling approach are described as follows.

Table 1. Participant characteristics

Characteristic n %

Genderᵃ and Sexual Minority Status
 Binary Genders
  Binary transgenderᵇ 243 18.7
  Binary cisgender (sexual minority)ᵇ 459 35.3
  Binary cisgender (exclusively heterosexual)ᵇ 595 45.8
 Not Binary Genders
  Nonbinaryᵇ 142 48.0
  Questioningᵇ 51 17.2
  Fluidᵇ 68 23.0
  Genderqueerᵇ 82 27.7
  Agenderᵇ 41 13.9
  Demigenderᵇ 65 22.0
  Third genderᵇ 33 11.1
 Unreported 58 3.5
Designated Sex at Birth
 Designated Female at Birth (DFAB) 1222 73.9
  Autistic DFABᵇ 423 34.6
 Designated Male at Birth (DMAB) 431 26.1
  Autistic DMABᵇ 197 45.7
 Indeterminate related to variation in sex traits 1 0.1
Autism Status
 Autistic 621 37.5
  Autistic gender diverseᵇ 320 51.5
Race
 Black/African American 115 7.0
 Asian 162 9.8
 American Indian/Alaska Native/Native American 42 2.5
 Native Hawaiian/Pacific Islander 4 0.2
 White 964 58.3
 Other 40 2.4
 More than one race 128 7.7
 Unreported 199 12.1
Ethnicity
 Hispanic/Latino/a/x 144 8.7

Note. N = 1,654. Participants were on average 27.73 years old (SD = 12.15, range 10.00–77.25).

ᵃ 121 participants reported more than one gender category.

ᵇ Percentages listed are proportions within each overarching demographic category.

Several conditions are overrepresented in GD populations (e.g., autism, depression, anxiety; Warrier et al., 2020). Of the overrepresented conditions, autism is associated with the greatest variation in cognition and communication, as well as notable differences in self-report style and self-advocacy behaviors. Equitable representation of autistic youth and adults (from clinics and the community) in this current sample aims to sufficiently incorporate autism-related experience and self-report style in the psychometric development and calibration of the GSR (including through DIF). This is critical given that the GD proportional overrepresentation in autism is greater than in any other documented human subpopulation (Walsh et al., 2018; Warrier et al., 2020). Four of the seven recruitments specifically overrecruited autistic individuals.

Five recruitments sampled GD individuals, specifically. The representation of GD in the general population is estimated to be as high as 2.7% (Gower et al., 2018), insufficient to adequately represent GD characteristics without targeted sampling. GD individuals were recruited to include GD youth and adults from gender clinic and community settings, and both binary and nonbinary identities (see Table 1). Three recruitments specifically invited autistic GD individuals from gender clinic and community settings.

All but one of the recruitments spanned from youth into adulthood, and adult sampling included a robust representation of young adults. The approach is responsive to evidence that the current generation of youth and young adults increasingly acknowledges and reports gender expansive experiences, including among those with overall cisgender identities (Wilson & Meyer, 2021). To sufficiently capture the range and variability of gender experiences, youth and emerging adults are well-represented. This also facilitates the psychometric development/calibration of the GSR to allow for measurement continuity from adolescence into adulthood, rarely accomplished in standardized report measures of any psychological phenomenon.

Special focus on the inclusion of people who were designated female at birth (DFAB) addresses measurement- and equity-related factors. Autistic people who were DFAB have traditionally been underrepresented in research and measure development, and emerging evidence indicates that autistic people who were DFAB are often not identified as autistic until later in life (Begeer et al., 2013). Further, overrepresentation of gender diversity among autistic people appears to be more common among those who were DFAB (Cooper et al., 2018). Outside of ASD, people who were DFAB report significantly greater gender expansiveness (Beltz et al., 2021) and gender identity diversity (Chiniara et al., 2018). Purposeful inclusion of a broad range of individuals who were DFAB (autistic, nonautistic, GD, cisgender) aims to sufficiently capture this notable variation of gender experience for the psychometric development and calibration of the GSR.

Recruitments were as follows. The Measurement and Mechanisms of Cognition, Behavior and Gender in Youth study recruited GD, cisgender, and autistic GD youth and young adults from the DC metropolitan area and the United States broadly. The THRIVE Gender Clinic Study (TGCS) recruited gender clinic-based transgender and nonbinary youth and young adults from the Midwest United States, approximately half of whom were pursuing gender-related medical interventions. The Lurie Children’s Hospital (LCH) study recruited gender clinic-based GD youth in the Chicago, Illinois, region who were receiving gender-affirming pubertal suppression. The Children’s National Hospital Gender Clinic Study (CNHGCS) recruited gender clinic-based GD youth and young adults: Approximately half of the sample was pursuing gender-related medical intervention; independent of this, half of the sample was autistic. The Autism Center of Excellence GENDAAR Study (ACE GENDAAR) recruited autistic and nonautistic youth and young adults across five national sites (UCLA, Seattle Children’s, University of Washington, Yale/New Haven Connecticut, and UVA/Charlottesville, VA), with an intentional oversampling of individuals who were DFAB. The Simons Powering Autism Research Knowledge study (Feliciano et al., 2018) recruited autistic and nonautistic adults nationally, with a special focus on the inclusion of GD individuals. The George Mason University Study (GMUS) recruited young adult college students from the most ethnoracially diverse public university in Virginia (Moody, 2021); less than 45% of GMUS participants were White. Detailed specification regarding participant characteristics of the seven recruitments and years of data collection is included on the study’s technical website: https://osf.io/49d25/?view_only=8f58683b40344d2b90a1fe4b67234855.

Within the research protocols, participants were asked their gender as well as their designated sex at birth. Participants completed the GSR and were asked four questions regarding their sexual attraction (i.e., attraction to females, males, and/or nonbinary, no attraction), given evidence of greater heterogeneity in inner gender experience and expression among sexual minority cisgender individuals as compared to exclusively heterosexual cisgender people (Lippa, 2000; Wilson & Meyer, 2021). As described in Study 2, a subset of participants completed additional gender and body image questions for validation. Gender and sexuality groupings were as follows: cisgender exclusively heterosexual (cisgender with exclusive sexual attraction to the other binary gender), cisgender sexual minority (cisgender with sexual attraction other than exclusively heterosexual), and GD. Due to sample size of the GD group (i.e., ~one third of the sample), this study does not parse sexuality within GD for psychometric measure development. The study was approved by the Children’s National Institutional Review Board (IRB; IRB 0009948) and linked IRB at University of Virginia (for the five-site ACE GENDAAR project; IRB 00000447) as well as University of Iowa (IRB 201611784), George Mason University (IRB 1550597), LCH (IRB 2018–2240), and Nationwide Children’s Hospital (IRB 18–00741). Adult participants provided informed consent to participate. Minors (17 and younger) provided assent and their parent or guardian provided informed consent for their child to participate. Due to the nature of the consents, the entire data set is not publicly available. Study materials are available through the study’s technical website.

Statistical Analyses

An empirical data-driven approach was employed to develop the GSR. First, an exploratory factor analysis (EFA) was conducted on the item response data of a randomly half-split subsample of participants (N = 827) to decide the appropriate number of latent factors for extraction and to identify a meaningful pattern of item factor loadings. Both (a) oblique rotation, preferred for identifying simple structures, and (b) targeted rotation to test possible hierarchical factor structure (i.e., bifactor model) were employed to comprehensively examine factor loading patterns. Note, oblique rotation was employed because this rotation method is more flexible than orthogonal rotation in that both correlated and uncorrelated factors can be accommodated. The four-level ordinal nature of item responses was accounted for in the EFA and follow-up analyses, and full-information maximum likelihood (FIML) estimation was used to address item-level response missingness, which was exceptionally low (0.81% of items missing). Both absolute model fit (i.e., root mean square error of approximation [RMSEA]) and relative model fit indices (i.e., likelihood ratio test, Akaike’s information criterion [AIC], Bayesian information criterion [BIC]) were assessed for final model determination. Both AIC and BIC were examined as each could favor different models in the context of the available sample size if the fitted model is complex. flexMIRT v.3.51 was used for these analytics, as absolute model fit index M2 RMSEA is uniquely available in this package.
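The half-split and the AIC/BIC comparison described above can be illustrated with a minimal Python sketch. It is not the study's code (the analyses were run in flexMIRT). The function names, the random seed, and the log-likelihood and parameter counts in the example are hypothetical values chosen only to show how AIC and BIC can favor different models at N = 827.

```python
import math
import random


def half_split(ids, seed=0):
    """Randomly split participant IDs into two halves (seed is an arbitrary assumption)."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def aic(log_lik, n_params):
    """Akaike's information criterion: lower values indicate better relative fit."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: penalizes parameters more heavily as n_obs grows."""
    return n_params * math.log(n_obs) - 2 * log_lik


# Split the 1,654 participants into an exploratory half and a holdout half.
explore, holdout = half_split(range(1654))

# Hypothetical fit results for a simpler and a more complex model.
simple = {"log_lik": -10000.0, "n_params": 120}
complex_ = {"log_lik": -9900.0, "n_params": 180}
n = len(explore)

aic_prefers_complex = aic(**complex_) < aic(**simple)
bic_prefers_complex = bic(n_obs=n, **complex_) < bic(n_obs=n, **simple)
```

With these made-up numbers the two criteria disagree, which is the situation the text has in mind when it says both indices were examined.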

Once the underlying factor structure was determined, confirmatory factor analysis (CFA) was conducted for the whole sample (N = 1,654). For this CFA, a mixture modeling approach was adopted to address expected non-normality of the latent variable distribution of gender (i.e., expected to be weighted toward the tails of femaleness and maleness). Multiple item factor models with a normal-mixture latent density (i.e., graded response models with a normal-mixture latent density) with varying numbers of classes (i.e., 2–4) were fitted and compared to determine the final calibration model. Mixture modeling was employed not to classify participants, but rather to address the expected violation of the latent distributional assumption of conventional item factor modeling. Alternative approaches, including a semiparametric item response model, were tested, but mixture model estimation was identified to be the most effective and stable for this empirical data under varying estimation conditions. All mixture factor models were estimated in Mplus v.8.1 with FIML using 49 Gauss-Hermite quadrature points, with the number of random starts varied up to 200 to replicate the best log likelihood across starting values.
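To make the idea of a graded response model with a normal-mixture latent density concrete, the following Python sketch evaluates a four-category graded response item and integrates its likelihood over a two-component normal mixture. It is a one-dimensional illustration under stated assumptions, not the Mplus estimation itself: the item parameters, mixture weights, and the equally spaced integration grid are all hypothetical, and the plain grid stands in for whatever quadrature rule the estimation software applies.

```python
import math


def grm_category_probs(theta, slope, thresholds):
    """Category probabilities for one graded response item (ordered thresholds)."""
    # Cumulative probabilities P(X >= k), padded with 1 (k = 0) and 0 (k = K).
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-slope * (theta - b))) for b in thresholds]
    cum += [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def mixture_density(theta, components):
    """Latent density as a weighted sum of normals; components = [(weight, mean, sd), ...]."""
    return sum(w * normal_pdf(theta, m, s) for w, m, s in components)


def marginal_likelihood(responses, items, components, n_points=61, lo=-6.0, hi=6.0):
    """Integrate the response-pattern likelihood over the mixture density on a grid."""
    step = (hi - lo) / (n_points - 1)
    total = 0.0
    for q in range(n_points):
        theta = lo + q * step
        lik = 1.0
        for resp, (slope, thresholds) in zip(responses, items):
            lik *= grm_category_probs(theta, slope, thresholds)[resp]
        total += lik * mixture_density(theta, components) * step
    return total


# Hypothetical bimodal latent density, weighted toward the two tails.
bimodal = [(0.5, -2.0, 0.5), (0.5, 2.0, 0.5)]
# One hypothetical four-category item: slope 2.0, three ordered thresholds.
item = (2.0, [-1.0, 0.0, 1.0])
pattern_probs = [marginal_likelihood([k], [item], bimodal) for k in range(4)]
```

Because the hypothetical density places most of its mass near the two modes, the extreme response categories receive most of the marginal probability, which mirrors the tail-weighted distribution the text anticipates.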

After the calibration model was determined, model-based DIF tests were conducted using Wald tests, with the false discovery rate controlled at 5% via the Benjamini-Hochberg procedure (the same rate as the typical Type I error rate). Mplus v.8.1 was used to fit multiple group (two age groups [youth age 10–21; adults age ≥ 22]; and autistic vs. nonautistic groups) mixture models, and Wald tests were calculated in R after obtaining item parameter estimates and the variance–covariance matrix from Mplus. Finally, after refining the set of items based on DIF analysis, Expected A Posteriori (EAP) scores were calculated using the item parameter estimates and mixture of normal density that were obtained from the final calibration model for each subdomain. For scoring, the flexMIRT v.3.51 scoring option was used by supplying item parameter estimates (from Mplus output) and mixture of normal density (calculated in R using model parameter estimates from Mplus). flexMIRT was employed for scoring in order to obtain summed score EAP. Reliability coefficients were calculated for scores.
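The false discovery rate control mentioned above follows a standard step-up rule, sketched here in Python for illustration. The study computed its Wald tests in R; this sketch assumes a simplified case in which each test compares a single item parameter between two independently estimated groups (one degree of freedom), and the example p-values are hypothetical.

```python
import math


def wald_p_value_1df(est_a, est_b, var_a, var_b):
    """Two-sided p-value for a one-parameter Wald test between independent groups."""
    wald = (est_a - est_b) ** 2 / (var_a + var_b)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(wald / 2.0))


def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans marking which tests are flagged at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank whose p-value falls under its step-up threshold.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            cutoff_rank = rank
    flagged = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            flagged[idx] = True
    return flagged


# Hypothetical p-values from ten item-level DIF tests.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
dif_flags = benjamini_hochberg(p_vals, fdr=0.05)
```

Note the step-up behavior: every test ranked at or below the largest passing rank is flagged, even if its own p-value exceeds its own threshold.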

Results

Underlying Factor Structure

EFA revealed that a two-factor model with simple structure yielded satisfactory model fit as well as a meaningfully interpretable rotated loading solution. While the three-factor solution improved the model fit slightly, only three items loaded on the third factor (and these items also cross-loaded heavily with the second factor), producing a third factor with insufficient item coverage to achieve reliability. The two-factor solution had acceptable model fit, with an M2 RMSEA of 0.06. When the same two-factor model was estimated with limited information estimation, the comparative fit index and Tucker–Lewis index reached 0.99, again indicating a satisfactory level of model-data fit. Therefore, the two-factor solution was identified as the most parsimonious solution of the current item set. The first factor is called the Female–Male Continuum (FMC) and includes items related to the binary experiences of femaleness and maleness. The second factor is named Nonbinary Gender Diversity (NGD) and includes items describing experiences of nonbinariness: neither male nor female, both male and female, and/or a gender completely different from male or female. Eleven femaleness and seven maleness items all loaded heavily on the first factor, with opposite signs of the factor loading as expected. All 12 NGD items loaded significantly on the second factor. Note that none of the two-factor or three-factor models, under any rotation method, separated femaleness and maleness into different dimensions. The items on the third potential factor that emerged in exploratory analytics consisted of those questions related to experiencing femaleness and maleness simultaneously (“bothness”). As noted above, these items loaded solidly on the second factor as well. See Discussion section for future directions regarding the dimension of “bothness.”

Multidimensional Mixture Model

EFA results indicate the existence of two latent variables; however, the exploratory analysis of summed score distribution clearly indicates violation of the assumption of normality or multivariate-normality for latent variables in the conventional item factor models. To capture the nonnormal multivariate density of two latent variables, we fit two-dimensional mixture factor models varying the number of classes (i.e., 1–4 classes). In this sense, this mixture modeling still employs a data-driven approach to the modeling procedure. A four-class model did not converge. Both AIC and BIC indicated the best model had three classes; this finding suggests the sample size was sufficient for the complex models fitted. The estimated density was far from a bivariate normal distribution that is assumed for regular multidimensional item factor models. There was generally high concentration on the low NGD dimension; there were also high concentrations on the extreme ends of the FMC (e.g., high femaleness/low maleness and low femaleness/high maleness). For a visual depiction of the factors’ latent densities, see the study’s technical website: https://osf.io/h7stb/?view_only=d6485c2e285f4c7ca686cb9e186477d5. The CFA with mixture density analysis indicates that a simple correlation characterization (e.g., Pearson correlation) between the two latent variables would be inappropriate given the joint distribution of the two latent variables. Moreover, each item is clearly loaded on only one dimension; therefore, unidimensional calibration with respect to each factor is justifiable.
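As a rough illustration of how AIC and BIC can be compared across candidate numbers of classes, the sketch below applies the standard formulas to invented fit summaries. The log-likelihoods and parameter counts are hypothetical and are not the study's values; `n_obs` uses the Study 1 sample size from the abstract purely as an illustrative n.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values are preferred."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; the penalty grows with log(sample size)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fit summaries (NOT the study's values):
# number of classes -> (maximized log-likelihood, number of free parameters)
fits = {1: (-25000.0, 120), 2: (-24600.0, 125), 3: (-24400.0, 130)}
n_obs = 1654  # illustrative sample size

aic_by_class = {k: aic(ll, p) for k, (ll, p) in fits.items()}
bic_by_class = {k: bic(ll, p, n_obs) for k, (ll, p) in fits.items()}

# The preferred model is the one with the smallest criterion value.
best_aic = min(aic_by_class, key=aic_by_class.get)
best_bic = min(bic_by_class, key=bic_by_class.get)
print(best_aic, best_bic)
```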

Multiple Group Unidimensional Mixture Models: DIF Analysis

After determining the number of classes per group, two FMC items were flagged as uniform-DIF where the largest absolute difference between the two characteristic curves was greater than one raw scoring point (because the result would be changed at the summed score level if the items were included). The expected item scores were consistently higher across the latent variable levels for the autism group, with the largest DIF effect of 1.8. The two items were: An item capturing the level of congruence regarding having a vagina and an item assessing whether having a beard would be distressing. Accordingly, these two items were dropped for the final calibration. For detailed information regarding the determination of classes per group and minor DIF items, as well as visual depictions of the expected score curves of the two dropped items for the autism and no autism groups, see the study’s technical website (see link immediately above).
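The flagging rule described above (largest absolute gap between two groups' expected item score curves exceeding one raw score point) can be sketched as follows. This sketch assumes a graded response model in slope-intercept form with four response categories scored 0 to 3, which matches the three intercepts per item in Table 2 but is an assumption here; the group-specific parameters and function names are hypothetical.

```python
import math


def expected_item_score(theta, slope, intercepts):
    """Expected score for an item scored 0..K under an assumed graded model.

    For such an item, E[Y | theta] equals the sum over k of P(Y >= k | theta),
    with P(Y >= k | theta) = logistic(slope * theta + intercept_k).
    """
    return sum(1.0 / (1.0 + math.exp(-(slope * theta + c))) for c in intercepts)


def max_expected_score_gap(params_a, params_b, lo=-4.0, hi=4.0, steps=161):
    """Largest absolute difference between two expected score curves on a theta grid."""
    gap = 0.0
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        diff = abs(expected_item_score(theta, *params_a)
                   - expected_item_score(theta, *params_b))
        gap = max(gap, diff)
    return gap


# Hypothetical (slope, intercepts) for the same item in two groups.
reference_group = (2.0, [1.0, 0.0, -1.0])
focal_group = (2.0, [3.0, 2.0, 1.0])  # uniformly shifted intercepts

gap = max_expected_score_gap(reference_group, focal_group)
flag_item = gap > 1.0  # flag when the gap exceeds one raw score point
print(round(gap, 3), flag_item)
```

Equal slopes with shifted intercepts, as in this invented example, correspond to uniform DIF: one group's expected score is higher at every level of the latent variable.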

Unidimensional Mixture Models: Scoring

After dropping two items from the FMC domain, the final domains included a total of 28 items (9 femaleness + 7 maleness = 16 items for FMC and 12 items for NGD). Note that maleness items were reverse coded before scoring. The final calibration and scoring model for the FMC required three classes of mixture to characterize the nonnormal density; model parameter estimates are reported in Table 2. The final calibration and scoring model for NGD needed two normal mixtures to capture the bimodal distribution of density; model parameter estimates are also reported in Table 2. EAP scores were calculated for each domain (FMC and NGD), and we linearly transformed the scores to put them on a [0,1] scale to match the traditional convention of coding sex as either 0 or 1, but here allowing a continuum of gender values between 0 and 1. For the FMC, higher scores mean a higher level of femaleness and a lower level of maleness, and lower scores mean a lower level of femaleness and a higher level of maleness. EAP scores for NGD were also linearly transformed to range from 0 to 1. Binary gender diversity (BGD) values were conditioned on designated sex at birth, so as to quantify how far each individual’s FMC value is from their designated sex at birth. If designated sex at birth is male, high BGD means a reported higher level of femaleness; the inverse is true for a person DFAB. Empirical reliability coefficients for the response pattern EAP scores were as follows: 0.87 for FMC values, 0.85 for BGD values, and 0.75 for NGD values. Summed score EAP reliabilities are comparable: 0.87 for FMC values, 0.87 for BGD values, and 0.73 for NGD values. The GSR and the GSR calculator to convert raw to scaled values are available here: https://osf.io/a3emh/?view_only=a0190df0087e4e98a0d0fbebe946c94a
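The rescaling and the BGD transformation can be sketched as below. Both functions rest on assumptions, since the text does not give the exact constants or formula: `rescale_unit` uses a min-max transform as one possible linear mapping onto [0,1], and `binary_gender_diversity` assumes BGD equals the FMC value for a person designated male at birth and one minus the FMC value for a person designated female at birth. The GSR calculator linked above remains the authoritative scoring source.

```python
def rescale_unit(scores):
    """Linearly map scores onto [0, 1] (min-max scaling; an assumed choice)."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def binary_gender_diversity(fmc, designated_sex):
    """Distance of a [0, 1] FMC value from the pole matching designated sex at birth.

    Assumption: 0 is the male end and 1 is the female end of the FMC, so the
    distance is fmc for 'male' and 1 - fmc for 'female'.
    """
    if designated_sex == "male":
        return fmc
    if designated_sex == "female":
        return 1.0 - fmc
    raise ValueError("designated_sex must be 'male' or 'female' in this sketch")


# Hypothetical EAP scores on the latent scale (illustrative only).
eap_scores = [-2.0, 0.0, 2.0]
fmc_values = rescale_unit(eap_scores)
print(fmc_values)
print(binary_gender_diversity(0.9, "male"), binary_gender_diversity(0.9, "female"))
```

Under these assumptions, the same FMC value of 0.9 yields a high BGD for a person designated male at birth and a low BGD for a person designated female at birth, which mirrors the interpretation given in the text.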

Table 2.

Final item set and calibration model (mixture item response model) parameter estimates

Nonbinary Gender Diversity (NGD)
Class | Prop | M | SE | Variance | SE
Class 1 | 0.5 | 0.00 | NA | 1.00 | NA
Class 2 | 0.5 | 1.73 | 0.1 | 0.19 | 0.05

Female-Male Continuum/Binary Gender Diversity (FMC/BGD)
Class | Prop | M | SE | Variance | SE
Class 1 | 0.25 | 0.00 | NA | 1.00 | NA
Class 2 | 0.23 | −0.50 | 0.57 | 0.13 | 0.09
Class 3 | 0.52 | 0.41 | 0.06 | 0.07 | 0.10
GSR items | Factor | Slope | Intercept 1 | Intercept 2 | Intercept 3

1. I think of myself as female. | FMC/BGD | 12.09 | 2.20 | −0.42 | −2.32
2. I think of myself as male. | FMC/BGD | 12.02 | 3.29 | 1.67 | −0.92
3. I think of myself as both male and female. | NGD | 2.76 | −5.64 | −7.17 | −8.49
4. I think of myself as neither male nor female. | NGD | 7.41 | −13.77 | −16.39 | −18.61
5. I think of myself as completely different than male or female. | NGD | 7.73 | −15.56 | −18.01 | −19.50
6. I want people to see me as female. | FMC/BGD | 13.23 | 1.45 | −1.06 | −3.10
7. I want people to see me as male. | FMC/BGD | 13.82 | 3.86 | 1.71 | −1.14
8. I want people to see me as both male and female. | NGD | 3.70 | −7.83 | −9.40 | −10.79
9. I want people to see me as neither male nor female. | NGD | 6.99 | −12.96 | −15.11 | −17.27
10. I want people to see me as a gender completely different than male or female. | NGD | 10.17 | −20.93 | −23.65 | −25.42
11. Overall, I feel that deep down my true gender is female. | FMC/BGD | 10.82 | 0.63 | −1.14 | −2.48
12. Overall, I feel that deep down my true gender is male. | FMC/BGD | 11.58 | 3.57 | 1.99 | 0.05
13. Overall, I feel that deep down my true gender is both male and female. | NGD | 2.84 | −6.17 | −7.38 | −8.69
14. Overall, I feel that deep down my true gender is neither male nor female. | NGD | 10.33 | −19.78 | −22.71 | −24.85
15. Overall, I feel that deep down my true gender is completely different than male or female. | NGD | 10.21 | −20.84 | −23.62 | −25.23
16. Having a female name feels or would feel right for me. | FMC/BGD | 11.43 | 0.61 | −1.46 | −3.08
17. Having a male name feels or would feel right for me. | FMC/BGD | 11.57 | 3.27 | 1.57 | −1.03
18. Having a gender neutral name (not clearly male or female) feels or would feel right for me. | NGD | 2.63 | −3.91 | −5.41 | −6.60
19. Being called “she” or “her” feels or would feel right for me. | FMC/BGD | 13.35 | 0.95 | −0.61 | −2.35
20. Being called “he” or “him” feels or would feel right for me. | FMC/BGD | 12.38 | 3.02 | 1.77 | 0.04
21. Being called “they” or “them” or something that is gender neutral feels or would feel right for me. | NGD | 4.10 | −8.00 | −9.50 | −10.39
22. I’m fine having a penis. Or if I don’t have a penis, I wish I had been born with one. | FMC/BGD | 7.19 | 1.99 | 0.84 | −0.88
23. I’m fine with having a vagina. Or if I don’t have a vagina, I wish I had been born with one. | Not calibrated (dropped after DIF analysis) | | | |
24. I’m fine having breasts. Or if I don’t have breasts now, I would be fine having them in the future. | FMC/BGD | 7.98 | 1.43 | −0.23 | −1.64
25. I’d rather have breasts than a flat chest. | FMC/BGD | 5.73 | 0.40 | −0.94 | −2.12
26. I am upset or would be upset having a deeper (low-pitch) voice. | FMC/BGD | 3.40 | −0.52 | −1.54 | −2.45
27. Having hair on my face (like a beard or mustache) would be very upsetting. | Not calibrated (dropped after DIF analysis) | | | |
28. I want my voice to sound like most girls or women (higher-pitch). | FMC/BGD | 7.36 | 0.21 | −1.48 | −2.92
29. I want my voice to sound like most boys or men (lower-pitch). | FMC/BGD | 7.45 | 2.82 | 1.12 | −0.89
30. I want my voice to sound neither male nor female (gender neutral). | NGD | 2.55 | −3.60 | −5.08 | −6.54

Note. Prop = proportion; SE = standard error; GSR = Gender Self-Report. All slopes were statistically significant at p < 0.05 level.

Study 2

Method

To validate the GSR, three gender-related self-report measures, one gender-related parent-report measure, and one self-report measure of body image were administered to portions of the Study 1 sample, concurrent with completion of the GSR. Because none of the validation measures have been validated for all subgroups within our larger sample, either because of age range (e.g., validated for adults; validated for parents/caregivers of youth) or gender (e.g., questions appropriate for binary transgender individuals, but not nonbinary individuals), they were administered to specific subsets of the larger recruited sample. Validation metrics related to gender identity, sexual minority status, and gender-related medical care were also included.

Measures

Genderqueer Identity Scale, Challenge the Binary

The Genderqueer Identity Scale (GQI; McGuire et al., 2019) is a 22-item adult self-report measure of genderqueer identities validated in binary transgender (n = 255), genderqueer (n = 140), and cisgender sexual minority (SM; n = 115) individuals (18–74 years). The GQI Challenge the Binary factor (five items), which has consistently demonstrated strong internal reliabilities across samples (i.e., Cronbach’s αs .74–.82), has previously distinguished gender identity and SM groups. The internal reliability of the GQI Challenge the Binary factor in this sample was high: Cronbach’s α = .81.
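For readers less familiar with the internal reliability index reported for this and the following measures, Cronbach’s α can be computed from an item-by-respondent score matrix as in the sketch below. The function name and the response data are hypothetical and are not drawn from the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: three respondents, two items (illustrative only).
responses = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(responses), 3))
```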

Feminine and Masculine Slider Scales

A slider scale of feminine expression (i.e., slider with anchors of “not feminine” vs. “very feminine”) and a separate slider scale of masculine expression were completed by 818 adult participants (40.5% GD). The sliders were based on the feminine–masculine slider scales from the Genderbread Person (Killermann, 2017) gender self-assessment, capturing self-reported feminine and masculine expression independently. Gender-related sliders are acceptable to cisgender and GD individuals and show ecological validity in both GD and cisgender populations (Ho & Mussap, 2019; Kasabian, 2015).

Utrecht Gender Dysphoria Scale (UGDS)

The Utrecht Gender Dysphoria Scale (UGDS; Cohen-Kettenis & van Goozen, 1997) is a 12-item self-report measure that captures the experience of binary gender dysphoria, including physical and identity-related aspects of gender dysphoria; higher scores relate to greater gender dysphoria. Internal reliability is strong: Cronbach’s α = .80. Sensitivity is high for identifying gender-referred youth (88.3%) and distinguishing nonreferred youth (specificity = 99.5%; Steensma et al., 2013). The UGDS is an intrinsically binary measure in its absence of coverage of nonbinary experiences. Some UGDS items present a severe tone (e.g., an item that links gender dysphoria to hopelessness and a wish to not exist). For these reasons, the measure was administered to only a subset of GD youth (n = 64) who: (a) had clinician-confirmed binary transgender gender identities; (b) participated in in-person clinical research appointments, with an on-call clinician in case of psychiatric emergencies; and (c) completed consent and assent procedures that described to youth and families that some of the measures might evoke strong emotional experiences. The internal reliability of the UGDS in this sample was acceptable: Cronbach’s αs = .76 and .74 in young women and young men, respectively.

Body Image Scale

The Body Image Scale (BIS; Lindgren & Pauly, 1975) assesses body image as it relates to 30 different parts of the body. Items relate to primary and secondary sex characteristics; there are also neutral items (i.e., not related to sexual development), which produce a separate total not employed in the present study. Scores range from 1 (very satisfied) to 5 (very dissatisfied) for each item. The BIS has been used in research with transgender and GD youth; BIS scores have been shown to improve with the treatment of gender dysphoria (de Vries et al., 2014). For the validation study, the BIS was administered to 64 GD youth who met criteria for gender dysphoria, but who—unlike the UGDS subsample—were not selected based on binary identity. The internal reliability of the BIS in this sample was strong: Cronbach’s α = .92.

Gender Diversity Screening Questionnaire–Informant, Binary Gender Identity

The Gender Diversity Screening Questionnaire–Informant (GDSQ-I; Strang, 2021) has been calibrated and validated in a sample of parents/caregivers of 244 youth; internal reliability is strong (Binary Gender Identity [BGI]; α = .89): https://osf.io/kp7yg/?view_only=c5c7503b622042389ac1ca26c90ff313. The BGI factor captures the degree to which a young person’s gender identity is congruent with their designated sex at birth. This factor comprises items capturing how an individual thinks of their gender, as rated by the parent/caregiver on a 4-point scale. GDSQ-I validation in GD and cisgender young people showed significant differentiation of GDSQ-I scores by self-disclosed gender and by Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition gender dysphoria diagnostic status (see OSF link immediately above). The internal reliability of the GDSQ-I BGI factor in this sample was strong: Cronbach’s α = .98.

Youth Requested Gender-Affirming Treatment (Yes/No)

A GD youth’s request for gender-affirming medical treatment (e.g., puberty blockers, gender-affirming hormones, surgery) does not necessarily correspond to receipt of such treatments. Reasons for this include requirement of: (a) parent consent to commence treatment; (b) specialist clinician assessment, recommendation, and resulting documentation to commence treatment (e.g., WPATH letter); (c) the absence of medical or developmental contraindicators for receiving treatment; and (d) financial resources and/or insurance coverage to pay for treatment. Therefore, this study captured whether the young person requested gender-affirming medical treatments, not whether they received the treatment. Asking whether youth were receiving these treatments could have created a bias, with positive endorsements more likely by youth from more accepting families and those with more immediate access to gender-related care providers.

Adult Received Gender-Affirming Treatment (Yes/No)

Whereas GD youth are reliant on parents/caregivers to consent to gender-affirming treatment, adults can consent themselves. Therefore, for GD adults, receipt of gender-affirming treatment was the validity metric.

Validation Hypotheses and Data Analytics

Validation hypotheses for construct, convergent, and ecological validity (encompassing 30 comparisons in total; see Table 3) were made and preregistered following Study 1 prior to Study 2 analyses: https://osf.io/3u5mv/?view_only=273f49d493974ee897ca419fcefd5219. However, power analyses were not performed a priori.

Table 3.

Validation Hypotheses and Results

Construct Validity

Scale | Hypothesized GSR differences by gender identity and SM status | Actual
NGD | Nonbinary > binary trans > cisgender (cis) | Nonbinary > binary trans > Cis
BGD | Binary trans > nonbinary > cisgender (cis) | Binary trans > nonbinary > Cis
NGD | Nonbinary > binary trans > SM cis > Het cis | Nonbinary > binary trans = SM cis > Het cis
BGD | Binary trans > nonbinary > SM cis > Het cis | Binary trans > nonbinary > SM cis > Het cis

Convergent Validity

Validation measures/metrics | FMC Hyp | FMC Actual | NGD Hyp | NGD Actual | BGD Hyp | BGD Actual | Effect Size Hyp | Effect Size Actual
Genderqueer Identity Scale-CB | | | + | + | + | + | NGD>BGD | NGD>BGD
Feminine Slider Scale | + | + | | | | | |
Masculine Slider Scale | − | − | | | | | |
UGDS | | | + | − | + | + | BGD>NGD | BGD>NGD
Body Image Scale | | | + | NS | + | + | |
GDSQ-Informant BGI | | | + | NS | + | + | BGD>NGD | BGD>NGD

Ecological Validity

Validation measures/metrics | FMC Hyp | FMC Actual | NGD Hyp | NGD Actual | BGD Hyp | BGD Actual | Effect Size Hyp | Effect Size Actual
Youth request GAT | | | SSD | SSD | SSD | SSD | BGD>NGD | BGD>NGD
Adult receives GAT | | | SSD | SSD | SSD | SSD | BGD>NGD | BGD>NGD

Note. Blank table cells indicate no hypothesis or test. + = positive, significant correlation hypothesized and/or observed; − = negative, significant correlation hypothesized and/or observed; NS = nonsignificant; SSD = statistically significant difference; NGD = Nonbinary Gender Diversity Scale (GSR); BGD = Binary Gender Diversity Scale (GSR); FMC = Female–Male Continuum (GSR); SM = sexual minority (attractions other than exclusively heterosexual); Het = exclusively heterosexual attraction (attraction to the other binary gender exclusively); UGDS = Utrecht Gender Dysphoria Scale; GDSQ-Informant BGI = Gender Diversity Screening Questionnaire completed by a parent/caregiver, Binary Gender Identity factor; Genderqueer Identity Scale-CB = Genderqueer Identity Scale, Challenge the Binary factor; GAT = gender-affirming treatment; GSR = Gender Self-Report.

Construct validity, assessed as differences in GSR gender diversity factor values (NGD, BGD), was hypothesized across: (a) gender identities and (b) gender identity and sexual attraction subgroupings. These differences were tested through omnibus analyses of variance (i.e., using Kruskal–Wallis tests to accommodate nonnormality: see Study 1, Results, Multidimensional Mixture Model section) and, when indicated, multiple comparison-corrected post hoc tests (i.e., Bonferroni-adjusted Mann–Whitney U tests). Specifically, GSR NGD was predicted to differ across gender identity groups, with the highest NGD values for nonbinary participants, lower NGD values for binary transgender participants, and the lowest NGD values for cisgender participants. GSR BGD was predicted to differ across identity groups as follows: highest values for binary transgender participants, lower values for nonbinary participants, and the lowest for cisgender participants. As described above, given evidence of a greater heterogeneity of gender experience among sexual minority cisgender individuals as compared to exclusively heterosexual cisgender people, we also hypothesized that cisgender sexual minority individuals would have greater NGD and BGD values than exclusively heterosexual cisgender participants.
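The omnibus step of this analysis can be sketched with a minimal Kruskal–Wallis implementation. Several things here are assumptions rather than the authors’ procedure: the group values are invented, the p-value shortcut exp(−H/2) holds only for exactly three groups (chi-square approximation with 2 degrees of freedom), and the pairwise Mann–Whitney U tests themselves are not implemented; only the Bonferroni-adjusted threshold they would be compared against is shown.

```python
import math


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1..j+1
        for t in range(i, j + 1):
            ranks[order[t]] = average_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of groups."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = midranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction (assumes the pooled values are not all identical).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    return h / correction


# Hypothetical NGD values on the [0, 1] scale (illustrative only).
nonbinary = [0.81, 0.77, 0.90, 0.68]
binary_trans = [0.45, 0.52, 0.38, 0.60]
cisgender = [0.05, 0.12, 0.02, 0.20]

h_stat = kruskal_wallis_h([nonbinary, binary_trans, cisgender])
p_value = math.exp(-h_stat / 2)  # valid only for three groups (2 df)
bonferroni_alpha = 0.05 / 3      # threshold for the three pairwise post hoc tests
print(round(h_stat, 3), round(p_value, 4), round(bonferroni_alpha, 4))
```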

Convergent validity of GSR values was hypothesized with the six gender measures administered and tested through correlations. Effect size differences were assessed for the three gender measures for which effect size differences were hypothesized among subgroups, tested through comparisons between Spearman’s rank-order correlations. We hypothesized that NGD and BGD would be positively related to the GQI Scale Challenge the Binary, with NGD showing a greater effect size relationship. Convergent validity of the GSR FMC value was hypothesized as follows: FMC values would be positively associated with the femininity slider and negatively associated with the masculinity slider. Convergent validity between the GSR gender diversity values and the UGDS and BIS in the two subsamples of GD youth was hypothesized as follows: (a) BGD and NGD were hypothesized to be positively correlated with the UGDS, with a larger effect size for BGD given the binary nature of the UGDS and (b) BGD and NGD were also hypothesized to be positively correlated with the BIS, with a larger effect size for BGD given reports of greater levels of body-related dysphoria among binary transgender as compared to nonbinary youth, overall (Galupo & Pulice-Farrow, 2020). Also, among youth, convergent validity (i.e., positive correlations) between the GSR BGD/NGD values was hypothesized with the parent/caregiver-reported GDSQ-I BGI, with larger effect size for BGD versus NGD given the binary nature of the GDSQ-I BGI.

Last, we hypothesized that ecological validity would be demonstrated via relationships between the GSR BGD/NGD values and: (a) youth report of a request for gender-affirming medical treatments and (b) adult report of receiving gender-affirming treatments; importantly, request and/or receipt of gender-affirming medical care may be in response to gender dysphoria or incongruence. To test these relationships, logistic regressions were performed with GSR values predicting gender-affirming medical request/receipt. We also hypothesized that the GSR BGD would have stronger relationships with request/receipt of gender-affirming medical care than the GSR NGD, given findings that nonbinary versus binary transgender individuals are less likely to request medical care (Beek et al., 2015; Galupo et al., 2021; Nolan et al., 2019). In this way, we expected a differentiation of BGD and NGD by effect size. Three exploratory validation metrics were also examined (but not preregistered given their exploratory nature): The ability of GSR values to predict gender-affirming medical requests among GD youth specifically (i.e., excluding cisgender youth); the ability of GSR values to predict gender-affirming medical treatments among GD adults specifically (i.e., excluding cisgender adults); and the comparative ability of GSR BGD versus parent report of BGI (GDSQ-I) to predict gender-affirming medical requests among youth.

The ability of GSR factors to predict gender-affirming medical requests/receipt was tested (and, in the case of the within-gender-diverse sample prediction, explored) as an indicator of ecological validity. However, the authors emphasize that this validation approach should in no way be extended to a clinical use for the GSR as a determiner of a person’s appropriateness for gender-affirming care (e.g., through the use of a “cut-point”). To avoid this usage, we publicly report whether the logistic regressions were significant and whether the effect size differences of the logistic regressions were significant, but we do not publish the actual statistics of these analyses.

Results

Of the 30 hypothesized validation comparisons, 26 were confirmed. See Table 3 “Actual” columns for validation results. The study’s technical website, https://osf.io/dp5w9/?view_only=0f98396e9ad946beac57698601d5beeb, provides statistics of these results, minus those for ecological validity metrics. The four hypothesized comparisons not confirmed were as follows. Binary trans and SM cisgender did not significantly differ in NGD values. The binary UGDS (given to the subset of 64 transgender youth who underwent additional consenting related to the completion of the UGDS) was negatively related to GSR NGD values. Finally, GSR NGD values were not related to the BIS or the binary GDSQ-I BGI scores. Regarding the three exploratory validation analyses (i.e., in the GD subset), all were confirmed.

Discussion

This study presents the development, DIF analytics, psychometric calibration, and validation of a new 30-item self-report gender characterization tool: the Gender Self-Report (GSR). Designed to be accessible for youth as young as age 10 and adults, nonautistic and autistic individuals, and cisgender and GD individuals across the gender spectrum, the GSR shows strong psychometrics. Following DIF analytics, two items were found to function differently by autism status (though none by age) and were removed from the factor analytics (though they were retained as items mapping onto goals of select gender-affirming interventions). The two data-driven factors that emerged from factor analytics align ecologically with the broadest community descriptions of gender identities: nonbinary and binary gender identities. Also, in accordance with emerging community descriptions of gender, the GSR captures nonbinary and binary gender simultaneously, in keeping with self-descriptions of gender that include both characteristics concurrently (e.g., “I’m trans masc leaning toward nonbinary”). Internal reliabilities of the factors were acceptable to good, and construct, convergent, and ecological validities were established. In addition to producing reliable factor scores, the item set contains nine critical items mapping onto goals of select gender-affirming interventions. These critical items are voice-related, facial hair-related, chest-related, and genital-related.

The GSR provides a multidimensional gender characterization capturing nonbinary and binary gender experience across a continuum. The two continuous factors expand beyond discrete gender identity descriptors/groupings, which is important as such groupings may lack specificity, consistency of definition, and accessibility for many individuals (Morgan, 2020). Unlike other dimensional gender characterization methods (e.g., Beischel et al., 2022), the GSR is developed to capture the continuous spectra of nonbinary and binary genders, rather than producing a group-based classification of gender identities based on discrete categories. As described in the next paragraph, the GSR approach may facilitate more equitable inclusion of a full range of gender experiences within research, as multiple discrete categories of gender are challenging to model in analyses with sufficient statistical power, especially given the typically small numbers of individuals within some (or many) of the categories. Further, discrete categories may miss the nuance of experienced gender in the population. For example, the continuum-based values also capture meaningful gender variability for cisgender individuals, which has been linked to polygenic propensity variation (Thomas et al., 2022).

Including gender meaningfully in analytics for biomedical and psychosocial research has been a challenge given the myriad gender identity self-descriptors which parse an already small subset of self-identified gender diverse individuals into multiple subgroups, reducing statistical power. By capturing dimensional gender continuously across broad populations (gender diverse and cisgender), GSR-characterized gender may be included in analytics without multiple subgroup comparisons. Although this approach may to some degree lose the individualized vocabulary of gender experience—and is not designed as a replacement for this vocabulary—it may also provide an option to more equitably represent gender in analytics previously impossible due to power limitations. Further, it may remain robust amidst the ongoing evolution in personalized gender terminology/vocabulary. However, more investigation is necessary to understand how continuous versus categorical gender metrics may function similarly and/or differently across various biological and psychological systems and analytics. In fact, the GSR could be used to study new emerging gender self-descriptors, including across diverse populations, where open-ended gender self-descriptors are then compared to GSR profiles.

In addition to the direct validations of the GSR conducted through Study 2, the relative lack of problematic DIF may reflect the success of the multiyear community-based participatory framework that aimed to develop and refine an item set responsive to and appropriate for diverse populations. The two items that showed substantial DIF, both by autism status, relate to physical attributes commonly associated with sexual development (i.e., genitals and facial hair). In both cases, these items were less related to the Binary Gender Diversity (BGD) factor for autistic compared to nonautistic people, suggesting that physical manifestations commonly associated with sex and gender in the general population may be less salient for autistic people. In fact, previous qualitative research with autistic GD youth suggests that gender expression and gender identity may be less clearly yoked for some autistic individuals (Strang, Powers, et al., 2018).

In addition to the simultaneity of nonbinary and binary gender diversity factors, another fundamental psychometric finding emerged: Maleness and femaleness in identity did not separate as individual psychometric dimensions. Previous research regarding feminine- and masculine-associated behaviors has separated femininity and masculinity as independent characteristics (Bem, 1974). Yet, in this present study, no factor analytic approach supported a separation of female identity and male identity factors. There may be several reasons for this. Previous investigations identifying separate feminine and masculine factors focused on gender-related behaviors, and not gender identity, specifically; gender identity is not equivalent to feminine or masculine-associated behaviors, and may organize differently (Wood & Eagly, 2015). Further, the present study (and measure) included a range of nonbinary experiences in addition to experiences of female and male identities. These nonbinary items include self-descriptions of bothness (e.g., both female and male), which may better capture the experience of simultaneity of female and male gender identities than separate female and male identity factors.

Given its community-driven development and equity-based psychometric refinement (e.g., targeted recruitment; DIF analytics), the GSR may facilitate more equitable inclusion of GD individuals who do not have, understand, and/or use specialized self-descriptors for gender; this situation may be more common among neurodivergent individuals and those with less exposure to GD communities (Morgan, 2020). Capturing multidimensional gender across continuous factors may have immediate application in social science research, as well as health-related genetics, neuroimaging, and broader medical research. The GSR nonbinary (NGD) and binary (BGD) factors are designed to be employed simultaneously in analytics, and designated sex at birth may also be included with these factors as an additional variable of interest. The BGD factor (transformed based on designated sex at birth) avoids the direct collinearity of the FMC factor with designated sex at birth. Additionally, for samples that include that subset of individuals with sex trait variations that complexify binary sex designation, the FMC and NGD factors may sidestep the necessity of specifying designated sex at birth.

Constraints on Generality

Future studies should address several limitations of the development process. First, equity-based DIF should be accomplished across broader subgroupings, including ethnoracial identities. The current sample aimed to maximize diversity and balance across autism status, gender diversity, and sexual attraction. Diversity across ethnoracial identity was accomplished to an extent (i.e., < 60% self-reported White), but there was insufficient representation across individual ethnoracial identities to study DIF by ethnoracial subgroup. Further, the sample was not sufficiently inclusive of intersex people/people with VST. Our current work in this area (supported by R21MD015860–0) is identifying that experiences of gender and gender-related needs may be more complex among a subset of intersex individuals/individuals with VST, and the current GSR item set may require expansion to fully capture these experiences. Additionally, although items related to the simultaneity of more than one gender factored onto the NGD dimension, there was also some evidence that these items showed potential cross-loading onto a third underpopulated factor. This preliminary finding suggests the potential value of future item exploration/expansion to more comprehensively capture multigender experiences and gender fluidity; these experiences have yet to be well-described in the research literature. Finally, this study established a key aspect of reliability: internal consistency. Future work is needed to understand the reliability and other characteristics of the GSR in repeated administration.

Conclusions

This study has developed a new gender-affirming characterization and inclusion method. The iterative GSR development process has produced a reliable and valid self-report appropriate for youth and adults, GD and cisgender individuals, and autistic and nonautistic people.

Public Significance Statement:

Calibrated and validated in youth and adults, neurotypical and neurodivergent populations, and a spectrum of gender identities, the Gender Self-Report is a multidimensional gender characterization tool developed for use with diverse populations. Capturing gender profiles through simple, accessible language and characterizing simultaneously nonbinary and binary gender diversity across continuous spectra, the Gender Self-Report is designed for broad research, clinical, and services-based applications.

Acknowledgments

This research was supported by the Clinical and Translational Science Institute, Children’s National (UL1TR001876), a National Institutes of Health Clinical and Translational Science Award (KL2TR001877), the National Institute of Mental Health (R01MH100028), the National Human Genome Research Institute (R01HG012697), and the Fahs-Beck Fund for Research and Experimentation. The authors have no conflicts of interest. Due to the nature of the consents, the entire dataset is not publicly available. Validation hypotheses were made and preregistered through Open Science Framework following Study 1 prior to Study 2 analyses: https://osf.io/3u5mv/?view_only=273f49d493974ee897ca419fcefd5219

John F. Strang played lead role in conceptualization, data curation, funding acquisition, investigation, methodology, project administration, supervision, writing of original draft and writing of review and editing and equal role in formal analysis. Gregory L. Wallace played supporting role in conceptualization, formal analysis, methodology and writing of review and editing. Jacob J. Michaelson played supporting role in conceptualization, investigation, methodology and writing of review and editing and equal role in data curation. Abigail L. Fischbach played supporting role in formal analysis, methodology, project administration, visualization, writing of original draft and writing of review and editing. Taylor R. Thomas played supporting role in data curation, formal analysis, project administration and writing of review and editing. Allison Jack played supporting role in conceptualization, project administration and writing of review and editing and equal role in data curation. Jerry Shen played supporting role in formal analysis, resources and writing of review and editing. Diane Chen played supporting role in data curation, methodology and writing of review and editing. Andrew Freeman played supporting role in conceptualization, methodology and writing of review and editing. Megan Knauss played supporting role in conceptualization and writing of review and editing. Blythe A. Corbett played supporting role in conceptualization and writing of review and editing. Lauren Kenworthy played supporting role in conceptualization, methodology, supervision and writing of review and editing. Amy C. Tishelman played supporting role in writing of review and editing and study interpretation and contextualization. Laura Willing played supporting role in data curation, investigation and writing of review and editing. Goldie A. McQuaid played supporting role in conceptualization, data curation and writing of review and editing. Eric E. Nelson played supporting role in data curation, supervision and writing of review and editing. Russell B. Toomey played supporting role in writing of review and editing and study interpretation and contextualization. Jenifer K. McGuire played supporting role in methodology and writing of review and editing. Jessica N. Fish played supporting role in methodology and writing of review and editing. Scott F. Leibowitz played supporting role in data curation, project administration and writing of review and editing. Leena Nahata played supporting role in writing of review and editing and study interpretation and contextualization. Laura G. Anthony played supporting role in conceptualization, supervision and writing of review and editing. Graciela Slesaransky-Poe played supporting role in conceptualization and writing of review and editing. Lawrence D’Angelo played supporting role in writing of review and editing and study interpretation and contextualization. Ann Clawson played supporting role in writing of review and editing and study interpretation and contextualization. Amber D. Song played supporting role in data curation, project administration and writing of review and editing. Connor Grannis played supporting role in data curation and writing of review and editing. Eleonora Sadikova played supporting role in data curation, project administration and writing of review and editing. Kevin A. Pelphrey played supporting role in data curation, funding acquisition, project administration and writing of review and editing. The GENDAAR Consortium played supporting role in data curation and funding acquisition. Michael Mancilla played supporting role in data curation and writing of review and editing. Lucy S. McClellan played supporting role in writing of review and editing. Kelsey D. Csumitta played supporting role in project administration and writing of review and editing. Molly R. Winchenbach played supporting role in visualization and writing of review and editing. Amrita Jilla played supporting role in visualization and writing of review and editing. Farrokh Alemi played supporting role in writing of review and editing and study interpretation and contextualization. Ji Seung Yang played lead role in formal analysis, methodology and visualization and supporting role in writing of original draft and writing of review and editing. The preregistered design is available at https://osf.io/3u5mv/?view_only=273f49d493974ee897ca419fcefd5219

