Proceedings of the National Academy of Sciences of the United States of America. 2023 Apr 24;120(18):e2218700120. doi: 10.1073/pnas.2218700120

Discordance in chromosomal and self-reported sex in the UK Biobank: Implications for transgender- and intersex-inclusive data collection

Sarah F Ackley a, Scott C Zimmerman a, Jason D Flatt b, Alicia R Riley c, Jae Sevelius d,e, Kate A Duchowny f,1
PMCID: PMC10161036  PMID: 37094118

Significance

Despite recent calls to distinguish between sex and gender, these constructs are often assessed in isolation or are used interchangeably. Our study quantifies the disagreement between chromosomal and self-reported sex and identifies potential reasons for discordance using data from the UK Biobank. We show that among approximately 200 individuals with sex discordance, 71% of discordances were explained by intersex traits or transgender identity. These findings imply that health and clinical researchers have a unique opportunity to advance the rigor of scientific research as well as the health and well-being of transgender, intersex, and nonbinary people, who have long been excluded from and overlooked in clinical and survey research.

Keywords: transgender research, inclusive research methods, health surveys

Abstract

There is a growing need to distinguish between sex and gender. While sex is assigned at birth, gender is socially constructed and may not correspond to one’s assigned sex. However, in most research studies, sex or gender is assessed in isolation or the terms are used interchangeably, which has implications for research accuracy and inclusivity. We used data from the UK Biobank to quantify the prevalence of disagreement between chromosomal and self-reported sex and identify potential reasons for discordance. Among approximately 200 individuals with sex discordance, 71% of discordances were potentially explained by the presence of intersex traits or transgender identity. The findings indicate that when describing sex- and/or gender-specific differences in health, researchers may be limited in their ability to draw conclusions regarding specific sex and/or gender health information.


There is growing recognition of the importance of distinguishing between sex and gender (1). Sex is a multidimensional construct encompassing one’s reproductive/sexual anatomy, sex chromosomes, hormone levels, and secondary sex characteristics (2–4). Gender is personally and socioculturally defined and is based on one’s identity (e.g., man, woman, transgender man, transgender woman, nonbinary, gender diverse, or another identity). The terms transgender and nonbinary, for example, are often used to describe individuals whose sex assigned at birth does not correspond with their gender identity (5). Therefore, sex and gender should be measured as distinct concepts and not be used interchangeably (6, 7).

While numerous funding agencies and scientific journals call on health and clinical researchers to distinguish between sex and gender (6, 8), clinical questionnaires and health surveys rarely delineate between these two concepts, implying that sex and gender are one and the same. For example, the Centers for Disease Control and Prevention’s National Health Interview Survey inquires about sex but does not ask about gender, while the Health and Retirement Study only inquires about gender but not sex (9).

The failure to measure and distinguish between sex and gender has implications for research accuracy and inclusivity since some individuals may be misclassified or excluded due to assumed data input errors or the assumption that sex and gender are one and the same. Data on sex, gender, and their intersection are needed for accurate and inclusive clinical and survey research that incorporates the lived experiences of all people, including transgender, gender nonbinary, and intersex populations, which have been traditionally excluded from research despite known health disparities (10–12). Indeed, among transgender individuals, 30% report having attempted suicide at least once in their lives and the prevalence of lifetime suicidal ideation is almost nine times higher than that of the general population (13, 14). This is exacerbated by structural barriers to accessing gender-affirming care, with nearly one-quarter of transgender individuals reporting avoiding health-care services due to anticipated discrimination (15).

To assess the implications of collecting data on sex alone, we use data from the UK Biobank, a large, genotyped dataset of adults, to 1) quantify the prevalence of discordant sex, defined as male self-reported sex and XX chromosomal sex or female self-reported sex and XY chromosomal sex, and 2) identify potential reasons for the discordance, thereby highlighting the need for sex and gender to be measured as distinct constructs.

Methods

Data.

We used data from the UK Biobank study (https://www.ukbiobank.ac.uk/), a large population-based, prospective study for examining the risk factors for adult diseases in middle and old age. Details about the UK Biobank study have been described elsewhere (16). The University of California, San Francisco Institutional Review Board reviewed the study and deemed it exempt because we used deidentified data. Ethical approval was obtained from the North West Multi-Centre Research Ethics Committee (REC reference: 21/NW/0157), and the research was conducted in accordance with the principles of the Declaration of Helsinki. This research has been conducted using the UK Biobank Resource under Application Number 78748.

Study Participants.

The UK Biobank enrolled approximately 500,000 community-dwelling individuals from the United Kingdom (UK), aged 40 to 70 y, who attended their baseline visit during 2006 to 2010, provided biologic samples and detailed personal information, and underwent clinical measurements. Our analysis included ~487,600 individuals (see note in Statistical Analysis section) with both self-reported sex and chromosomal sex and excluded individuals with sex chromosome aneuploidy (n = ~700).

Demographic Information.

The following participant demographic information was obtained from the baseline visit: race/ethnicity (Asian, Black, mixed/other, and White; see SI Appendix), country of birth (outside of the United Kingdom vs. UK born), education at baseline, year of birth, and age at recruitment.

Self-Reported Sex.

Participant sex was obtained from a central registry at recruitment based on data from the National Health Service (NHS). In some cases, this measure was updated by the participant. Therefore, this measure contains both the sex the NHS had recorded for the participant and self-reported sex at the time of enrollment (17).

Chromosomal Sex.

Chromosomal sex was determined from genotype data by combining the X- and Y-probe intensities via the Affymetrix metric, which measures average probe intensity on a set of nonpolymorphic probes on the X and Y chromosomes, as previously described (18).

Discordance in Chromosomal Sex and Self-Reported Sex.

Discordance was defined as male self-reported sex and XX chromosomal sex or female self-reported sex and XY chromosomal sex.
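The definition above reduces to a two-case rule. A minimal Python sketch follows; the function name `is_discordant` and the string labels ("male", "female", "XX", "XY") are illustrative assumptions, not the UK Biobank field encodings.

```python
# Hypothetical labels for illustration; actual UK Biobank encodings may differ.
DISCORDANT_PAIRS = {("male", "XX"), ("female", "XY")}


def is_discordant(self_reported_sex: str, chromosomal_sex: str) -> bool:
    """Return True when the pair matches the discordance definition above."""
    return (self_reported_sex, chromosomal_sex) in DISCORDANT_PAIRS


print(is_discordant("male", "XX"))    # True
print(is_discordant("female", "XX"))  # False
```

Any other chromosomal configuration (e.g., "XXY") returns False in this sketch; in the study, individuals with sex chromosome aneuploidy were excluded before discordance was assessed.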

Medical Diagnoses and Prescription Drugs Indicative of Transgender Identity and Intersex Traits.

To ascertain reasons for discordance, we used ICD9 and ICD10 diagnostic codes from hospitalizations, as well as a combination of self-report, primary care, and hospital data from NHS records, to identify diagnoses of intersex traits and gender dysphoria, a diagnosis that is associated with transgender identity and is typically needed to receive transgender-related health care. A full list of ICD9 and ICD10 codes associated with transgender identity and/or intersex traits is provided in SI Appendix (19, 20). Information on hormone replacement therapy (HRT) and hormonal oral contraceptive (OC) use, which may be indicative of gender-affirming hormone therapy for someone with XY chromosomal sex, was obtained from assessment center visit questionnaires.

Statistical Analysis.

To test whether the prevalence of discordance differed significantly across racial/ethnic groups, education level, country of birth, year of birth, and age at enrollment, we performed Fisher’s exact and t tests. We also examined whether discordances were more common in individuals whose samples were identified as outliers in heterozygosity, defined as the fraction of nonmissing genotype calls that are heterozygous, and missingness rates, both of which could be indicators of poor-quality genotyping. That is, poor-quality genotyping could produce an apparent discordance in an individual whose self-reported sex and chromosomal sex are in fact concordant (21, 22). To protect participant anonymity, we report all counts rounded to the nearest hundred, as indicated by (~), and report only percentages, rounded to the nearest percent, where counts would yield small cell sizes.
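The rounding convention for counts can be sketched as follows. The helper names are hypothetical, and rounding halves upward is an assumption, since the text does not specify tie handling.

```python
def round_to_hundreds(count: int) -> int:
    """Round a non-negative count to the nearest hundred (halves round up)."""
    return (count + 50) // 100 * 100


def format_rounded(count: int) -> str:
    """Format a rounded count with the '~' marker and thousands separators."""
    return f"~{round_to_hundreds(count):,}"


print(format_rounded(487596))  # ~487,600
```

Integer arithmetic is used instead of Python's built-in `round`, which rounds ties to the nearest even value and would make the tie behavior less obvious.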

Results

In our sample of ~487,600 adults, ~200 individuals had discordance between chromosomal sex and self-reported sex. The distributions of race/ethnicity (P = 0.61), education level (P = 0.32), and UK vs. non-UK birth (P = 0.15) did not differ significantly between individuals with and without discordance. Individuals with discordance were, on average, born later [average year of birth 1953.8 (SD 8.0) vs. 1951.5 (SD 8.1), P < 0.001] and were younger at enrollment [average age at enrollment 54.3 y (SD 8.0) vs. 56.5 y (SD 8.1), P < 0.001]. Although the majority of the overall sample had XX chromosomal sex (n = 264,300), 73% of those with discordant sex had XY chromosomal sex (P < 0.01) (Table 1).
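As a rough plausibility check on the year-of-birth comparison, a t statistic can be recomputed from the reported summary statistics. This sketch makes three assumptions the text does not state: it uses Welch's unequal-variance form, it takes the rounded group sizes (~200 and ~487,400) as exact, and it approximates the P value with a normal distribution instead of a t distribution.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic for two groups described by mean, SD, and size."""
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / standard_error


def two_sided_p_normal(t):
    """Two-sided P value from a normal approximation to the t distribution."""
    return math.erfc(abs(t) / math.sqrt(2))


# Year of birth: discordant vs. concordant groups, using rounded group sizes.
t = welch_t(1953.8, 8.0, 200, 1951.5, 8.1, 487_400)
p = two_sided_p_normal(t)
print(f"t is about {t:.1f}; P < 0.001: {p < 0.001}")
```

Under these assumptions the statistic is close to 4, which is consistent with the reported P < 0.001; it is not a reproduction of the authors' analysis, which used individual-level data.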

Table 1.

Demographic information on participants with concordant and discordant sex in the UK Biobank (n = 487,596)

Characteristic                        Chromosomal vs. reported sex     Chromosomal vs. reported sex
                                      concordant (N = ~487,400)        discordant (N = ~200)
Chromosomal sex
 XX chromosomal sex                   264,300 (54.2%)                  (27%)
 XY chromosomal sex                   223,100 (45.8%)                  (73%)
Country of birth
 Outside of the United Kingdom        39,700 (8.1%)                    (11%)
 The United Kingdom                   447,700 (91.9%)                  (89%)
Race/ethnicity-collapsed categories
 Asian                                11,800 (2.4%)                    (2%)
 Black                                8,600 (1.8%)                     (1%)
 Mixed/other                          5,400 (1.1%)                     (1%)
 White                                459,300 (94.2%)                  (96%)
Year of birth [mean (SD)]             1951.5 (8.1)                     1953.8 (8.0)
Age at recruitment [mean years (SD)]  56.5 (8.1)                       54.3 (8.0)
High school equivalent
 Yes                                  316,800 (65.0%)                  (69%)
 No                                   169,700 (34.8%)                  (31%)

Table 2 enumerates potential reasons for discordance. At least one potential reason for discordance was identified in 71% of individuals with discordant sex, which included: intersex-related diagnosis (4%), gender dysphoria diagnosis (29%), and OC or HRT use in individuals with XY chromosomal sex (63%). We note that some individuals had multiple reasons for discordance identified. Percentages of individuals with samples identified as outliers in heterozygosity and missingness rates were higher among individuals with discordant sex (3%) than those in the remainder of the sample (0.2%), indicative of higher rates of poor-quality genotyping (23). However, none of the individuals with samples identified as outliers in heterozygosity and missingness rates were in the set of individuals with at least one documented potential reason for discordance.

Table 2.

Potential reasons for discordance among individuals with discordant chromosomal vs. reported sex*

                                                       Chromosomal vs. reported sex
                                                       discordant (N = ~200)
Reason for discordance identified?
 Yes                                                   71%
 No                                                    29%
Reason for discordance (% of total with discordance)
 Diagnosed gender dysphoria                            29%
 Potential intersex condition                          4%
 HRT or OCs & XY chromosomal sex                       63%

*Note: Some individuals had multiple reasons for discordance identified.
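Because one individual can have several potential reasons, the per-reason percentages need not sum to the share with at least one reason. The sketch below illustrates this with four hypothetical records (not study data), then applies the same arithmetic to the rounded percentages reported above. All field names are illustrative assumptions.

```python
# Hypothetical flags per individual (NOT study data): each record marks which
# potential reasons for discordance were found for that person.
records = [
    {"gender_dysphoria": True,  "intersex": False, "hrt_or_oc_xy": True},
    {"gender_dysphoria": False, "intersex": False, "hrt_or_oc_xy": True},
    {"gender_dysphoria": False, "intersex": True,  "hrt_or_oc_xy": False},
    {"gender_dysphoria": False, "intersex": False, "hrt_or_oc_xy": False},
]


def percent(count, total):
    """Express count as a percentage of total."""
    return 100 * count / total


total = len(records)
# Share of individuals with each reason, counted separately per reason.
per_reason = {key: percent(sum(r[key] for r in records), total) for key in records[0]}
# Share of individuals with at least one reason (the union of the flags).
any_reason = percent(sum(any(r.values()) for r in records), total)

print(sum(per_reason.values()), any_reason)  # 100.0 75.0

# Same arithmetic on the reported, rounded percentages.
reported = {"gender_dysphoria": 29, "intersex": 4, "hrt_or_oc_xy": 63}
reported_any = 71
overlap_excess = sum(reported.values()) - reported_any
print(overlap_excess)  # 25
```

The reported categories sum to 96% against 71% with any reason, so roughly 25 percentage points reflect individuals counted under more than one reason; the figure is approximate because the published percentages are rounded.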

Discussion

We quantified discordance between chromosomal sex and self-reported sex in the UK Biobank and identified potential reasons for discordance. Overall, we identified approximately 200 individuals with discordant chromosomal and self-reported sex among nearly half a million participants. Approximately 71% of discordances could be explained by intersex traits or transgender identity using readily available data, while only a small number of discordances (<3%) were in individuals with poor-quality genotyping. Moreover, 71% is likely a lower bound for the percentage of discordances due to intersex traits and/or transgender identity because more individuals may self-identify as transgender or gender diverse than is reflected in the medical record. That is, most discordances observed in this study may indicate medically and demographically meaningful information that is unlikely to be due to typographical or self-reporting errors.

While the prevalence of individuals who identify as transgender is estimated at 0.6% in the United States and 0.5 to 1.3% internationally (24, 25), the number of individuals identifying as transgender or nonbinary in the United States and Europe has steadily increased in recent years (26, 27). This is consistent with our finding that discordance was inversely associated with age.

Our findings also indicate that while individuals with discordant sex participate in clinical and health research, there are still major gaps in how sex and gender are assessed in research. As a result, we call on clinical and health researchers to employ a more rigorous assessment of sex and gender. A two-step method has been proposed which first inquires about gender identity terms and then asks about one’s sex assigned at birth. For example: 1) What is your current gender identity? (please select all that apply) [options: “woman,” “man,” “nonbinary,” “transgender,” “genderqueer,” “a gender identity not listed here (please specify),” “prefer not to state”]; and 2) What sex were you assigned at birth? [options: “female,” “male,” “intersex,” “a sex not listed here (please specify),” “prefer not to state”] (24, 28). We note that this ascertainment method may be imperfect since, for some intersex individuals, sex assigned at birth may be discordant with chromosomal sex. Additionally, measures of gender identity should be routinely reassessed in longitudinal clinical and research studies since, unlike chromosomal sex, gender identity may evolve over the life course (29).
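One way a study database might store responses to the two-step questions is sketched below. The option strings come from the example wording above; the class name, field names, and validation rules are hypothetical, and free-text "please specify" answers are not modeled.

```python
from dataclasses import dataclass
from typing import FrozenSet

GENDER_IDENTITY_OPTIONS = frozenset({
    "woman", "man", "nonbinary", "transgender", "genderqueer",
    "a gender identity not listed here (please specify)",
    "prefer not to state",
})
SEX_ASSIGNED_AT_BIRTH_OPTIONS = frozenset({
    "female", "male", "intersex",
    "a sex not listed here (please specify)",
    "prefer not to state",
})


@dataclass(frozen=True)
class TwoStepResponse:
    """One participant's answers to the two questions at one time point."""
    gender_identities: FrozenSet[str]  # question 1: select all that apply
    sex_assigned_at_birth: str         # question 2: single choice

    def __post_init__(self):
        # Reject empty or unrecognized answers so both items stay analyzable.
        if not self.gender_identities:
            raise ValueError("at least one gender identity option is required")
        unknown = self.gender_identities - GENDER_IDENTITY_OPTIONS
        if unknown:
            raise ValueError(f"unrecognized gender identity options: {sorted(unknown)}")
        if self.sex_assigned_at_birth not in SEX_ASSIGNED_AT_BIRTH_OPTIONS:
            raise ValueError(f"unrecognized sex assigned at birth: {self.sex_assigned_at_birth!r}")


response = TwoStepResponse(frozenset({"woman", "transgender"}), "male")
print(sorted(response.gender_identities), response.sex_assigned_at_birth)
```

Keeping the two answers as separate fields, with gender identity stored as a set, preserves the multi-select design of question 1. Storing one such record per study wave would support the routine reassessment of gender identity described above.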

There are several important implications of this work. First, when describing sex- or gender-specific differences in health, researchers may not be measuring what they intend to measure. Our findings suggest that data collected on self-reported sex may in some cases be a proxy for participants’ gender. This may be due to the fact that study participants may be more likely to report their current gender identity when asked to report their sex (30). Second, when either sex or gender is assessed in isolation, researchers cannot identify sex and/or gender diverse groups, such as intersex and transgender individuals (5), which limits the potential for clinicians and health researchers to measure, assess, and intervene on health disparities among sex and/or gender diverse groups. When performing quality control in genetics research, individuals with sex discordance are typically excluded from analysis. For example, in a recent analysis using UK Biobank data, individuals whose self-reported sex did not match inferred sex from their genetic data were excluded from the analysis (31). Such practices might, for instance, exclude transgender women in the validation of prostate cancer risk scores, which could lead to bias and yield inaccurate results.

This study has several strengths. We leverage chromosomal data and medical record data from a large population-based sample, the UK Biobank, to assess the consequence of measuring only self-reported sex and not gender in health research. The extensive data collected for each participant in UK Biobank enabled us to identify potential reasons for discordance in most individuals with discordant sex, which may not have been possible in other research studies. Despite these strengths, there are several limitations. First, we cannot rule out data entry errors as a potential reason for discordance for some individuals, but this applies to only a small minority of individuals with discordant sex. Second, we excluded individuals with sex chromosome aneuploidy, some of whom may be intersex. Of note, among the individuals with sex chromosome aneuploidy, ~300 were documented as female and ~400 were documented as male in the UKB self-reported sex field. Third, we did not have data on self-reported gender and our measure of self-reported sex contained NHS data that were updated by the participant at the time of enrollment for some individuals. As a result, self-reported sex may in fact be capturing one’s gender, and not sex per se, at the time of enrollment, underscoring the importance of employing a two-step method. Finally, we were unable to identify a potential reason for discordance for approximately 29% of participants with discordant sex (Table 2). This may be due to incomplete information from primary care or other medical settings (e.g., outside the NHS), poor-quality genotyping not captured by heterozygosity or missingness, or data input errors.

This study also highlights some of the unique ethical considerations for research that seeks to identify people with diverse gender identities and/or intersex traits. Because gender diverse and intersex people represent a small percentage of the overall population, even when working with very large datasets, the number of people identified is usually quite small, especially since some such individuals may be hesitant to participate in research. In some studies, this may introduce the potential for research participants to be identifiable. We have protected against this possibility by only presenting data in aggregate and not including details, such as specific intersex conditions identified in our sample. While all participants in this study consented to participate in the UK Biobank study and all data were deidentified, it is possible that some participants may be uncomfortable with being described by their chromosomal sex. For this reason, we do not identify XX chromosomes as “female” or XY chromosomes as “male,” as is typical of health research. We affirm that some men have XX chromosomes and some women have XY chromosomes, and that there are other sex chromosomal configurations (e.g., XXY, XXX, and XO), which further acknowledges the necessary distinction between sex and gender in our language as well as our data collection efforts.

Health and clinical researchers have a unique opportunity to advance the rigor of scientific research as well as the health and well-being of transgender, intersex, and nonbinary people, who have long been excluded from and overlooked in clinical and survey research. The two-step method of assessing sex and gender is a necessary advancement.

Supplementary Material

Appendix 01 (PDF)

Acknowledgments

Author contributions

S.F.A., S.C.Z., J.D.F., A.R.R., and K.A.D. designed research; S.F.A., S.C.Z., J.D.F., A.R.R., and K.A.D. performed research; S.F.A. and K.A.D. contributed new reagents/analytic tools; S.F.A., S.C.Z., J.S., and K.A.D. analyzed data; and S.F.A., S.C.Z., J.D.F., A.R.R., J.S., and K.A.D. wrote the paper.

Competing interests

The authors declare no competing interest.

Footnotes

This article is a PNAS Direct Submission.

Data, Materials, and Software Availability

Code has been deposited in GitHub (Will be provided upon acceptance). Some study data are available. UK Biobank encourages broad access to its data and samples for health-related research. Researchers who wish to access UKB data may do so by submitting an application through the online access management system (https://bbams.ndph.ox.ac.uk/ams/). Summary statistics for the data are made available through the data showcase (https://biobank.ndph.ox.ac.uk/showcase/).
