Table 1.
Examples of how to discuss limitations of existing data sources
Scenario | Example |
---|---|
Medical and/or insurance records that collect a binary male/female variable. | “The binary male/female categories available in our data likely reflect the legal gender marker of each patient. This is an accurate measure of neither sex assigned at birth nor gender identity, since many transgender and non-binary individuals have a legal gender marker that does not reflect their gender identity, and few states allow for a gender-neutral gender marker. We are thus unable to identify transgender and non-binary patients, who are misclassified in our source data. The direction of this misclassification is also unknown.” |
Survey or interview collects data using imprecise language that conflates sex assigned at birth and gender (e.g., “Are you male or female?”). | “The interview script does not distinguish between sex assigned at birth and gender identity; for example, it conflates individuals who are men with individuals whose sex assigned at birth is male. We assume that this measure more likely reflects a participant’s reported gender identity than their sex assigned at birth. Notably, this measure does not allow us to identify transgender and non-binary participants, who are misclassified in our source data.” |
Data source uses outdated or problematic language. | “We subsequently refer to individuals who selected ‘transgender male to female’ or who reported female gender identity and male sex assigned at birth as transgender women.” |