Journal of the American Medical Informatics Association: JAMIA. 2023 Mar 15;30(6):1047–1055. doi: 10.1093/jamia/ocad039

Validation of an administrative algorithm for transgender and gender diverse persons against self-report data in electronic health records

Carl G Streed Jr 1,2,3, Dana King 4, Chris Grasso 5, Sari L Reisner 6,7,8, Kenneth H Mayer 9,10, Guneet K Jasuja 11,12,13, Tonia Poteat 14, Monica Mukherjee 15, Ayelet Shapira-Daniels 16, Howard Cabral 17, Vin Tangpricha 18, Michael K Paasche-Orlow 19, Emelia J Benjamin 20,21,22
PMCID: PMC10198536  PMID: 36921287

Abstract

Objective

To adapt and validate an algorithm to ascertain transgender and gender diverse (TGD) patients within electronic health record (EHR) data.

Methods

We took a previously unvalidated algorithm that identifies TGD persons within administrative claims data through a multistep, hierarchical process and validated it in an EHR data set with self-reported gender identity.

Results

Within an EHR data set of 52 746 adults with self-reported gender identity (gold standard) a previously unvalidated algorithm to identify TGD persons via TGD-related diagnosis and procedure codes, and gender-affirming hormone therapy prescription data had a sensitivity of 87.3% (95% confidence interval [CI] 86.4–88.2), specificity of 98.7% (95% CI 98.6–98.8), positive predictive value (PPV) of 88.7% (95% CI 87.9–89.4), and negative predictive value (NPV) of 98.5% (95% CI 98.4–98.6). The area under the curve (AUC) was 0.930 (95% CI 0.925–0.935). Steps to further categorize patients as presumably TGD men versus women based on prescription data performed well: sensitivity of 97.6%, specificity of 92.7%, PPV of 93.2%, and NPV of 97.4%. The AUC was 0.95 (95% CI 0.94–0.96).

Conclusions

In the absence of self-reported gender identity data, an algorithm to identify TGD patients in administrative data using TGD-related diagnosis and procedure codes, and gender-affirming hormone prescriptions performs well.

Keywords: transgender, gender identity, diagnosis codes, electronic health record

INTRODUCTION

An important first step toward achieving equity in population health and well-being is the accurate identification and characterization of diverse populations.1,2 For people whose gender identity differs from their sex assigned at birth (ie, transgender and gender diverse [TGD] persons, including binary and nonbinary persons),3 the need for accurate population and individual-level health data is critical to addressing the clinical care needs and numerous hardships, within society and in healthcare in particular.4,5 TGD persons experience worse health outcomes compared to cisgender peers, and the extent and causes of these inequities are not fully understood.6–13 It has been postulated that TGD persons have higher rates of health risk behaviors (eg, tobacco, alcohol, and substance use),13–15 but there remains some controversy due to disparate data available both refuting8,16 and supporting this claim.14,17 Underlying the inequities experienced by TGD persons are the various minority stressors unique to TGD populations, notably gender nonaffirmation, stigma, interpersonal as well as policy-level discrimination, and victimization based on gender identity.18,19

Data on TGD populations have been steadily increasing, driven by large-scale surveys and reports such as the Institute of Medicine 2011 report on the health of lesbian, gay, bisexual, and transgender persons6 and the subsequent National Academies of Sciences, Engineering, and Medicine 2020 report.2 Currently, more than a half-dozen federal surveys, such as the Behavioral Risk Factor Surveillance System and National Crime Victimization Survey, include gender identity questions.20 Data generated by these surveys are invaluable in describing the health and well-being of TGD persons.4,5 Whereas these surveys allow for self-report of a respondent’s gender identity, they lack additional objective measures of health, including biometric data, prescription data, preventative health screenings, and clinical diagnoses. Consequently, alternative data sources, such as electronic health records (EHRs), and administrative data sets, which include diagnosis and procedure codes alongside additional measures of health,21 have garnered attention as potential sources of individual and population-level data,22 especially since the onset of the COVID-19 epidemic.23 Given the limitations of prospective data collection programs to describe clinical contexts, research with administrative data and EHRs could offer significant advantages. Notably, research with administrative data often benefits from large numbers of persons, allowing for more opportunities to examine the health of relatively small patient populations. Additionally, administrative data offer clinical diagnoses, treatments, and, in some instances, objective biometric data. However, current administrative data sets rarely include information on sex assigned at birth and gender identity, which are important for TGD clinical care and population health.24,25

Federal agencies, such as the Centers for Medicare & Medicaid Services (CMS) and the Office of the National Coordinator for Health Information Technology, have made efforts to incorporate information regarding sexual orientation and gender identity in EHRs. The latter has required EHR systems certified under Stage 3 of the Meaningful Use program to allow users to record, change, and access structured data on sexual orientation and gender identity.26,27 Further, in 2016, the Health Resources and Services Administration (HRSA) Bureau of Primary Health Care began requiring federally funded community health centers to collect and provide sexual orientation and gender identity data as part of their annual Uniform Data Systems report.4

These requirements from HRSA and CMS, as well as the Affordable Care Act, motivate the collection and documentation of gender identity patient information in EHRs.28–31 While many TGD patients are willing to have their gender identity included in the EHR,28,29,32 some health care organizations do not provide an opportunity to collect these data, leading to underreporting.32,33 Consequently, identifying and employing additional methods to ascertain TGD cohorts within existing data sources has gained attention.4,21,34–36 As a result, there have been efforts to identify TGD persons within such data using algorithms based on readily available data fields. These efforts are consistent with prior work to examine populations with stigmatized identities or conditions with the explicit protection of utilizing deidentified data.37–39 One of the more comprehensive algorithms to identify TGD persons in a commercial health insurance data set utilized transgender-related diagnosis codes (ie, International Classification of Diseases, Ninth and Tenth Revision, Clinical Modification [ICD-9-CM and ICD-10-CM]), Current Procedural Terminology (CPT) codes related to gender-affirming surgeries, and receipt of gender-affirming hormone therapy that appears “incongruent” with the sex variable (eg, testosterone therapy for a person with a documented “female” sex).21 However, due to the absence of self-reported gender identity in that data set to serve as the reference standard, the proposed algorithm could not be validated. While a prior study conducted in the Veterans Health Administration (VHA) validated a modified version of the algorithm against chart reviews,40 the algorithm has not been validated using the gold standard of patient-reported gender identity data.

Thus, the purpose of this study is to validate this existing algorithm that identifies TGD patients in administrative data sets using an EHR with patient-reported gender identity. The novelty of our study is that validation of this algorithm using patient self-reported gender identity, which is the gold standard, could significantly expand the opportunities for health research with TGD patients. Careful utilization of data sources lacking self-reported gender identity may enable more accurate research to define and redress inequities experienced by TGD patients and communities.32

METHODS

Data

We conducted a cross-sectional study at Fenway Health, a large federally qualified community health center in Boston, MA, specializing in care for sexual and gender minorities.41,42 The current study analyzed EHR data from adults aged 18 years and older between January 1, 2008 and December 31, 2021 who had at least 2 medical appointments within any year in the study period and reported their gender identity at patient registration. Patients were identified as transgender, including nonbinary identities, by self-report, based on a standardized 2-step EHR designation system,25 in which sex assigned at birth (coded as binary, ie, male or female) and current gender identity (“male,” “female,” or “neither exclusively male nor female”) were recorded sequentially. Based on this 2-step approach, a patient was identified as transgender, for example, if they indicated they were assigned female at birth and currently identified as male. In addition to gender identity data, further patient characteristics were extracted from the EHR, including age, race, ethnicity, sexual orientation, and insurance status. The study was approved by the Institutional Review Board at Fenway Health. All patients had been informed, in keeping with Fenway Health’s privacy policy, that their EHR information may be used for research purposes.
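The 2-step designation described above can be illustrated with a short sketch. This is not the registration system's actual logic; the function name and the output labels are hypothetical, chosen to mirror the six self-report groups reported in Table 1, and the input strings follow the response options quoted in the text.

```python
NONBINARY = "neither exclusively male nor female"


def classify_two_step(sex_at_birth: str, gender_identity: str) -> str:
    """Combine the two registration answers into one self-report group.

    Hypothetical helper: the labels returned here are illustrative only.
    """
    if sex_at_birth not in ("male", "female"):
        raise ValueError(f"unexpected sex assigned at birth: {sex_at_birth!r}")
    if gender_identity not in ("male", "female", NONBINARY):
        raise ValueError(f"unexpected gender identity: {gender_identity!r}")

    if gender_identity == NONBINARY:
        # Nonbinary patients are split by sex assigned at birth (AMAB/AFAB).
        return "nonbinary AMAB" if sex_at_birth == "male" else "nonbinary AFAB"
    if gender_identity == sex_at_birth:
        return "cisgender man" if gender_identity == "male" else "cisgender woman"
    # Binary identity that differs from sex assigned at birth.
    return "transgender man" if gender_identity == "male" else "transgender woman"


# The example from the text: assigned female at birth, currently identifies as male.
print(classify_two_step("female", "male"))
```

Raising an error on unexpected values, rather than silently assigning a default group, is a design choice of this sketch so that missing or malformed registration answers are not misclassified.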

Method of identifying TGD patients

We utilized a previously reported algorithm of identifying TGD patients within administrative claims data21 that expanded on prior algorithms which had utilized only transgender-related diagnosis codes.34–36 Specifically, this previously developed algorithm utilized: (1) ICD-9 and ICD-10 diagnosis codes specific to gender identity disorder or gender dysphoria; (2) ICD-9 and ICD-10 diagnosis codes for Endocrine Disorder Not Otherwise Specified (Endo NOS), which are frequently utilized to avoid labeling a person as transgender when providing gender-affirming services;43 (3) CPT codes for gender-affirming surgeries; or (4) receipt of sex hormones discordant with the sex recorded at the time of prescribing (“sex-discordant hormone therapy”), such as receipt of testosterone by a person with an administrative (registration) sex of female; these steps are presented in the Flowchart below. ICD-9/10 and CPT codes were reported in Jasuja et al21 and listed in Supplementary Data. For validation of this algorithm, the data available with the EHR included ICD-9 and ICD-10 diagnosis codes, documentation of gender-affirming surgeries, and prescription medications; as this single site is not the location of gender-affirming surgery, CPT codes were not utilized.

The proposed algorithm utilizes a hierarchy for identifying TGD patients and hence, each individual could only be identified as TGD by one method (Figure 1).

Figure 1.

Flow diagram of algorithm to identify TGD patients in the absence of patient-reported gender identity data. Abbreviations: Dx: diagnosis; Rx: prescription; HT: hormone therapy.

The first step in our hierarchical strategy was the use of the gender identity disorder/dysphoria codes, as these codes remain the primary method for ascertaining TGD persons within claims data. We then utilized the additional data sources described above (Endo NOS, transgender-related procedure codes, and sex-discordant hormone therapy). We selected individuals with Endo NOS and a TGD-related procedure code, followed by sex-discordant hormone therapy in conjunction with either Endo NOS or a TGD-related procedure code. A second step was then taken to classify individuals as presumably TGD men and TGD women based on receipt of sex-discordant hormone therapy. Notably, this step would likely include nonbinary and gender-diverse persons but would not be able to identify them as such. We validated the algorithm and evaluated the performance of the algorithm by comparing TGD patients identified using the above-mentioned data elements without self-reported gender identity data with their self-reported gender identity.
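The hierarchy described above can be sketched in code as follows. This is a simplified reading of the text, not the published implementation: the record fields, the reduction of hormone therapy to a "masculinizing" or "feminizing" class, and the returned labels are all assumptions made for illustration, and the actual code lists are those cited in Jasuja et al.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PatientRecord:
    # All field names are hypothetical; they are not taken from the source data.
    recorded_sex: str                    # administrative sex: "male" or "female"
    has_gender_dysphoria_dx: bool        # ICD code for gender identity disorder/dysphoria
    has_endo_nos_dx: bool                # ICD code for Endocrine Disorder NOS
    has_tgd_procedure: bool              # procedure code or documented surgery
    hormone_class: Optional[str] = None  # "masculinizing", "feminizing", or None


def has_sex_discordant_ht(p: PatientRecord) -> bool:
    """True when the hormone class is discordant with the recorded sex."""
    discordant = {("female", "masculinizing"), ("male", "feminizing")}
    return (p.recorded_sex, p.hormone_class) in discordant


def identify_tgd(p: PatientRecord) -> Optional[str]:
    """Step 1: return the first rule in the hierarchy that fires, else None."""
    if p.has_gender_dysphoria_dx:
        return "gender dysphoria diagnosis"
    if p.has_endo_nos_dx and p.has_tgd_procedure:
        return "Endo NOS with procedure"
    if has_sex_discordant_ht(p) and (p.has_endo_nos_dx or p.has_tgd_procedure):
        return "sex-discordant HT with Endo NOS or procedure"
    return None


def presumed_group(p: PatientRecord) -> Optional[str]:
    """Step 2: split identified patients by hormone class, where available."""
    if identify_tgd(p) is None or not has_sex_discordant_ht(p):
        return None
    if p.hormone_class == "masculinizing":
        return "presumably TGD man"
    return "presumably TGD woman"


example = PatientRecord("female", False, True, False, "masculinizing")
print(identify_tgd(example), "|", presumed_group(example))
```

Because the rules are checked in order and the function returns at the first match, each patient is attributed to exactly one identification method, which matches the hierarchical design. Patients identified in Step 1 without sex-discordant hormone therapy are left unclassified in Step 2.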

Statistical analysis

Descriptive statistics were generated to compare cohort characteristics by self-reported gender identity. Comparisons between groups were tested with chi-square tests, and non-normal age distributions were compared with Kruskal-Wallis and Mann-Whitney tests. The diagnostic performance of the algorithm to ascertain TGD patients in the absence of self-reported gender identity data was calculated. We calculated sensitivity, specificity, and predictive values utilizing Microsoft Excel (2018, Redmond, WA). Area under the ROC and precision-recall curves, and appropriate confidence intervals for the administrative algorithm against the self-report data from the EHR, were calculated utilizing SPSS versions 25 and 29 (Armonk, NY).44,45

RESULTS

We utilized a sample of 52 746 adults aged 18 years and older who had at least 2 medical visits in a 1-year period between January 1, 2008 and December 31, 2020 (Table 1).

Table 1.

Patient characteristics by self-reported gender identity (N = 52 746) from 2008 to 2020

Characteristic | Cisgender men | Cisgender women | Transgender men | Transgender women | Nonbinary AMAB | Nonbinary AFAB
n | 26 963 | 20 167 | 2065 | 1969 | 530 | 1052
Mean age (SD) | 38.9 (13.5) | 33.5 (12.2) | 29.0 (9.4) | 31.6 (11.3) | 31.1 (11.5) | 27.8 (7.2)
Age category, years, n (%)
 18–24 | 2568 (9.5) | 3849 (19.1) | 755 (36.6) | 597 (30.3) | 169 (31.9) | 382 (36.3)
 25–34 | 10 651 (39.5) | 10 189 (50.5) | 938 (45.4) | 847 (43.0) | 231 (43.6) | 530 (50.4)
 35–44 | 5543 (20.6) | 3264 (16.2) | 225 (10.9) | 262 (13.3) | 61 (11.5) | 107 (10.2)
 45–54 | 3726 (13.8) | 1108 (5.5) | 75 (3.6) | 144 (7.3) | 31 (5.8) | 14 (1.3)
 55–64 | 3110 (11.5) | 969 (4.8) | 55 (2.7) | 76 (3.9) | 25 (4.7) | 16 (1.5)
 65+ | 1365 (5.1) | 788 (3.9) | 17 (0.8) | 43 (2.2) | 13 (2.5) | <11
Race, n (%)
 Asian | 2374 (8.8) | 2768 (13.7) | 104 (5.0) | 98 (5.0) | 27 (5.1) | 31 (2.9)
 Black/African American | 1813 (6.7) | 1585 (7.9) | 110 (5.3) | 91 (4.6) | 27 (5.1) | 39 (3.7)
 Multiracial | 1384 (5.1) | 1058 (5.2) | 143 (6.9) | 147 (7.5) | 42 (7.9) | 108 (10.3)
 Native American/Alaskan Native/Inuit | 97 (0.4) | 74 (0.4) | 14 (0.7) | 19 (1.0) | <11 | 0 (0.0)
 Native Hawaiian/Pacific Islander | 102 (0.4) | 62 (0.3) | <11 | <11 | <11 | <11
 White/Caucasian | 18 494 (68.6) | 12 741 (63.2) | 1545 (74.8) | 1428 (72.5) | 399 (75.3) | 821 (78.0)
 Another race | 513 (1.9) | 415 (2.1) | 31 (1.5) | 25 (1.3) | <11 | 11 (1.0)
 Declined/missing | 1712 (6.3) | 1062 (5.3) | 86 (4.2) | 114 (5.8) | 18 (3.4) | 31 (2.9)
Ethnicity, n (%)
 Latinx | 1615 (6.0) | 995 (4.9) | 93 (4.5) | 98 (5.0) | 24 (4.5) | 39 (3.7)
 Non-Latinx | 7041 (26.1) | 4841 (24.0) | 534 (25.9) | 533 (27.1) | 131 (24.7) | 312 (29.7)
 Missing/declined | 18 307 (67.9) | 14 331 (71.1) | 1438 (69.6) | 1338 (68.0) | 375 (70.8) | 701 (66.6)
Sexual orientation, n (%)
 Straight or heterosexual | 8735 (32.4) | 12 688 (62.9) | 443 (21.5) | 266 (13.5) | 26 (4.9) | 15 (1.4)
 Lesbian, gay, or homosexual | 14 627 (54.2) | 2931 (14.5) | 362 (17.5) | 504 (25.5) | 91 (17.2) | 280 (26.6)
 Bisexual | 1300 (4.8) | 2130 (10.6) | 446 (21.6) | 552 (28.0) | 144 (27.2) | 259 (24.6)
 Something else | 310 (1.1) | 711 (3.5) | 504 (24.4) | 336 (17.1) | 208 (39.2) | 450 (42.8)
 Do not know | 353 (1.3) | 303 (1.5) | 115 (5.6) | 153 (7.8) | 52 (9.8) | 30 (2.9)
 Declined/missing | 1638 (6.1) | 1404 (7.0) | 195 (9.4) | 158 (8.0) | <11 | 18 (1.7)
Insurance, n (%)
 Private | 17 691 (75.5) | 12 589 (75.4) | 1288 (70.5) | 1002 (60.9) | 341 (72.1) | 692 (75.5)
 Self-pay/uninsured | 464 (2.0) | 680 (4.1) | 53 (2.9) | 56 (3.4) | 13 (2.7) | 25 (2.7)
 Medicaid | 3030 (12.9) | 1882 (11.3) | 357 (19.5) | 436 (26.5) | 90 (19.0) | 150 (16.4)
 Medicare | 803 (3.4) | 356 (2.1) | 52 (2.8) | 77 (4.7) | 13 (2.7) | 15 (1.6)
 Other public (grants, workers' comp, Veterans) | 1433 (6.1) | 1187 (7.1) | 77 (4.2) | 73 (4.4) | 16 (3.4) | 34 (3.7)
Trans Dx, n (%)
 Yes (gender dysphoria code) | 74 (0.3) | 77 (0.4) | 446 (21.6) | 416 (21.1) | 108 (20.4) | 220 (20.9)
 No | 26 889 (99.7) | 20 090 (99.6) | 1619 (78.4) | 1553 (78.9) | 422 (79.6) | 832 (79.1)
Endocrine disorder Dx, n (%)
 Yes | 528 (2.0) | 264 (1.3) | 1710 (82.8) | 1548 (78.6) | 367 (69.2) | 633 (60.2)
 No | 26 435 (98.0) | 19 903 (98.7) | 355 (17.2) | 421 (21.4) | 163 (30.8) | 419 (39.8)
GAHT prescription, n (%)
 Yes | 345 (1.3) | 225 (1.1) | 1693 (82.0) | 1714 (87.0) | 377 (71.1) | 607 (57.7)
 No | 26 618 (98.7) | 19 942 (98.9) | 372 (18.0) | 255 (13.0) | 153 (28.9) | 445 (42.3)
Documentation of GA surgery, n (%)
 Yes | 0 (0.0) | 0 (0.0) | 0 (0.0) | <11 | 0 (0.0) | <11
 No | 26 963 (100.0) | 20 167 (100.0) | 2065 (100.0) | 1969 (99.9) | 530 (100.0) | 1051 (99.9)

Abbreviations: Dx: diagnosis; GA: gender-affirming; GAHT: gender-affirming hormone therapy; AMAB: assigned male at birth; AFAB: assigned female at birth.

Using self-reported gender identity, the sample included 26 963 (51.1%) cisgender men, 20 167 (38.2%) cisgender women, 2065 (3.9%) transgender men, 1969 (3.7%) transgender women, 530 (1.0%) nonbinary adults assigned male at birth, and 1052 (2.0%) nonbinary adults assigned female at birth.

The TGD group was younger compared to cisgender adults (P < .001). With a higher percentage of TGD adults identifying as White, the TGD group was less racially diverse (P < .001) than the cisgender group. There was significant variation in the type of insurance coverage between TGD and cisgender adults (P < .001), with a consistently higher percentage of TGD patients utilizing Medicaid and cisgender adults utilizing other public forms of insurance coverage such as Veterans benefits.

When exploring the various data elements of the algorithm to ascertain transgender persons if self-reported gender identity was not available, there were significant variations across gender identities by diagnosis codes and receipt of sex-discordant hormone therapy. Notably, few TGD persons had an ICD code specific to gender identity disorder/dysphoria (ranging from 20.4% among nonbinary adults assigned male at birth to 21.6% among transgender men). Instead, many TGD individuals had a diagnosis code for an endocrine disorder not otherwise specified (ranging from 60.2% among nonbinary adults assigned female at birth to 82.8% among transgender men). As this single health care center did not perform gender-affirming surgical procedures, capture of surgery using CPT codes was not available; only documentation of gender-affirming surgical procedures by a clinician could be evaluated and was low across gender identities.

Evaluating the performance diagnostics for the algorithm to identify TGD patients (Step 1 of Figure 1) in the absence of self-reported gender identity data revealed a sensitivity of 87.3%, a specificity of 98.7%, a PPV of 88.7%, and a NPV of 98.5% (Table 2).

Table 2.

Performance of an algorithm identifying TGD patients

Algorithm result | Self-report TGD | Self-report cisgender | Total
Algorithm TGD | 4903 | 626 | 5529
Algorithm not TGD | 713 | 46 504 | 47 217
Total | 5616 | 47 130 | 52 746

Sensitivity % (95% CI) | 87.3% (86.4–88.2)
Specificity % (95% CI) | 98.7% (98.6–98.8)
PPV % (95% CI) | 88.7% (87.9–89.4)
NPV % (95% CI) | 98.5% (98.4–98.6)

Abbreviations: CI: confidence interval; TGD: transgender and gender diverse; PPV: positive predictive value; NPV: negative predictive value.
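The point estimates in Table 2 follow directly from the four cells of the 2×2 table, as the sketch below shows. The source does not state which confidence interval method was used, so the Wilson score interval here is an assumption, and its bounds may differ slightly from the published ones. The function names are illustrative.

```python
from math import sqrt


def wilson_ci(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (assumed method, not from the source)."""
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


def diagnostics(tp: int, fp: int, fn: int, tn: int) -> dict[str, tuple[float, float, float]]:
    """Return (estimate, lower, upper) for each measure, as proportions."""
    cells = {
        "sensitivity": (tp, tp + fn),  # algorithm TGD among self-report TGD
        "specificity": (tn, tn + fp),  # algorithm not TGD among self-report cisgender
        "ppv": (tp, tp + fp),          # self-report TGD among algorithm TGD
        "npv": (tn, tn + fn),          # self-report cisgender among algorithm not TGD
    }
    return {name: (k / n, *wilson_ci(k, n)) for name, (k, n) in cells.items()}


# Cell counts taken from Table 2.
table2 = diagnostics(tp=4903, fp=626, fn=713, tn=46504)
for name, (est, lo, hi) in table2.items():
    print(f"{name}: {100 * est:.1f}% ({100 * lo:.1f}-{100 * hi:.1f})")
```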

The area under the receiver operating characteristic curve was 0.93 (95% CI 0.92–0.93) (Figure 2). The precision-recall curve is also reported (Figure 3).
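For a classifier with a single binary output, the ROC curve has only one interior point, and the trapezoidal area under it reduces to the mean of sensitivity and specificity. The sketch below applies that identity to the published cell counts. It assumes the ROC curve was built from the binary algorithm result alone, which the source does not state explicitly, and the function name is illustrative.

```python
def auc_single_point(tp: int, fp: int, fn: int, tn: int) -> float:
    """Trapezoidal ROC area through (0, 0), (FPR, TPR), and (1, 1).

    Area = FPR*TPR/2 + (1 - FPR)*(TPR + 1)/2 = (TPR + 1 - FPR)/2,
    that is, the mean of sensitivity and specificity.
    """
    tpr = tp / (tp + fn)  # sensitivity
    fpr = fp / (fp + tn)  # 1 - specificity
    return (tpr + 1 - fpr) / 2


# Step 1 counts from Table 2 (TGD versus cisgender).
print(round(auc_single_point(tp=4903, fp=626, fn=713, tn=46504), 3))
# Step 2 counts from Table 3 (transgender women treated as the positive class).
print(round(auc_single_point(tp=1566, fp=115, fn=39, tn=1467), 3))
```

Under this assumption the two values round to 0.93 and 0.95, in line with the areas reported for Steps 1 and 2.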

Figure 2.

Area under the curve for algorithm to identify TGD patients in the absence of patient-reported gender identity data. Abbreviations: ROC curve: receiver operating characteristic curve analysis.

Figure 3.

Area under the precision-recall curve for algorithm to identify TGD patients in the absence of patient-reported gender identity data.

Taking additional steps (Step 2 of Figure 1) to categorize patients as presumably transgender men and transgender women based on prescriptions for sex-discordant gender-affirming hormone therapy performed well: sensitivity of 97.6%, specificity of 92.7%, PPV of 93.2%, and NPV of 97.4% (Table 3).

Table 3.

Performance of an algorithm identifying TGD patients; identifying transgender men and transgender women

Algorithm result | Self-reported TW | Self-reported TM | Total
Algorithm TW | 1566 | 115 | 1681
Algorithm TM | 39 | 1467 | 1506
Total | 1605 | 1582 | 3187

Sensitivity % (95% CI) | 97.6 (96.7–98.3)
Specificity % (95% CI) | 92.7 (91.3–94.0)
PPV % (95% CI) | 93.2 (92.0–94.2)
NPV % (95% CI) | 97.4 (96.5–94.2)

Abbreviations: CI: confidence interval; TGD: transgender and gender diverse; TW: transgender women; TM: transgender men; PPV: positive predictive value; NPV: negative predictive value.

The area under the receiver operating characteristic curve was 0.95 (95% CI 0.94–0.96) (Figure 4). The precision-recall curve is also reported (Figure 5).

Figure 4.

Area under the curve for algorithm to categorize transgender men and transgender women in the absence of patient-reported gender identity data. Abbreviations: ROC curve: receiver operating characteristic curve analysis.

Figure 5.

Area under the precision-recall curve for algorithm to categorize transgender men and transgender women in the absence of patient-reported gender identity data.

DISCUSSION

Within an EHR data set with self-reported gender identity data, an algorithm utilizing ICD, CPT, or documentation of gender-affirming surgeries, and prescription data correctly categorized 4903 out of 5616 (87.3%) TGD adults and 46 504 out of 47 130 (98.7%) cisgender adults. Variations in specific and nonspecific diagnostic codes as well as varying rates of receipt of hormone therapy accounted for the majority of misclassifications in data from Fenway Health. Further, in an additional step to dichotomize based on sex-discordant hormone therapy, this algorithm was capable of correctly categorizing 1566 out of 1605 (97.6%) self-reported TGD women and 1467 out of 1582 (92.7%) self-reported TGD men; the difference in performance reflects the varying rates of receiving gender-affirming hormone therapy between these 2 groups at this health center. Of note, in the absence of self-reported data to validate gender identity, a more accurate description of the classified groups would be TGD patients receiving either feminizing or masculinizing hormone therapy.

While available data sources to study TGD populations remain limited, our study demonstrates that our algorithm using ICD, CPT, and prescription data to identify TGD patients performs well in some respects in the absence of self-reported gender identity data. When assessed in the Veterans Health Administration (VHA) and validated via chart review, this algorithm was found to have a high PPV for the ICD codes specific to gender identity and a low PPV in the absence of such ICD codes.40 As the current data set utilizes patient self-reported gender identity collected from all patients at registration, the variable performance of the algorithm reflects the varied ways gender identity is recorded in EHRs.

Large, multisite studies utilizing EHR data have utilized similar algorithms of identifying TGD patients in the absence of self-reported data. The Study of Transition, Outcomes and Gender (STRONG) to assess health status of transgender people,46 which is a collaboration across Kaiser Permanente health plans in Georgia, Northern California, and Southern California, utilized a stepwise methodology involving computerized searches (eg, keywords) of EHR data and free-text validation. Steps include ICD-9 codes specific to transgender identity (eg, 302.85—gender identity disorder in adolescents or adults) in conjunction with internal codes specific to the Kaiser Permanente system. Additional computerized keyword searches of the patient chart (eg, vaginoplasty and phalloplasty) were utilized to expand the cohort of potentially TGD persons. Validation included 2-reviewer validation as well as the presence of at least 2 ICD codes specific to gender-affirming care. These methods, while potentially time-consuming, identified a cohort of 3475 transfeminine and 2892 transmasculine adults. The STRONG cohort is a more racially diverse group of TGD adults than the patients in the current sample, but less likely to utilize Medicaid for health care coverage compared to the TGD cohort in Boston.

Compared to the TGD cohort identified within a 2006–2017 sample of the OptumLabs Data Warehouse (OLDW), which included deidentified claims data for commercially insured and Medicare Advantage enrollees and included approximately 200 million unique individuals with the greatest representation in the Midwest and South US Census Regions, our TGD cohort in a single health care center in Boston is younger and less racially diverse. This is likely a reflection of the demographics of Boston47 versus a national data set such as OLDW.21

Notably, administrative data, like EHRs, predominantly support a binary system in which patients are only provided the option to select male or female as their sex and presumed gender identity. As such, gender nonbinary or gender nonconforming individuals are frequently overlooked or left invisible within administrative claims data and EHR data. While EHR systems can be adapted to capture such data and many TGD patients are willing to have their gender identity included in the EHR despite prior experiences of discrimination,28,29,32,48 such data typically remain severely limited, often due to institutional inertia.32 The proposed algorithm, which we have validated in this work, may be useful for quality assurance and quality improvement activities regarding the care of TGD patients, but ultimately, formally eliciting the information by routinely asking about gender identity as part of a primary care encounter will provide a more patient-centered means of ensuring that TGD patients receive needed health services.

We acknowledge several study limitations. First, our study does not reflect the reality that gender identity and reporting of gender identity can change over time. This caveat reflects the reality of gender identity, patient trust in reporting their gender identity,32,49 and appropriate documentation of gender identity. One’s understanding of their gender identity and the language to describe it can change over time, which the validated algorithm does not capture. Sharing one’s gender identity requires navigating various environments that could be threatening, including health care settings.49,50 According to the 2015 US Transgender Survey, a nonprobability survey of 27 715 TGD adults in the United States, nearly a third (31%) are not out to their health care clinicians. Misclassification could account for some patients who reported their gender identity as cisgender while having ICD codes and hormone therapy consistent with gender-affirming medical care. Next, while we were able to utilize a unique EHR data source with self-reported gender identity data, it lacked CPT codes as the Fenway clinical site does not perform gender-affirming surgeries; this was addressed by identifying documented gender-affirming surgeries in chart “problem lists.” Additionally, given how most health care systems advance a binary assumption, gender diverse patients (eg, nonbinary and gender nonconforming) who are seeking medical and surgical care may report a binary identity in order to access care.51 The validated algorithm would classify these persons with the reported binary identity. Similarly, whereas medications were used to classify persons as presumably transgender men and transgender women, medications alone were not sufficient as not all TGD persons receive or desire hormone therapy. Further, other data sources may have incomplete medication data.
In addition, the algorithm and our validation process would likely have reduced accuracy for people who, for whatever reason, do not share aspects of their gender identity. Patient gender identity nondisclosure may reflect mistrust of health care systems and society that is concordant with many people’s lived experience of bias and discrimination. Finally, accurate and comprehensive collection of self-reported gender identity is highly variable across health care settings.4,33 Fortunately, the site we utilized for validation purposes has been collecting self-reported patient data since 1997. Although the EHR data set we used to perform validation has significant strengths, such as longitudinal data and complete prescription data as well as self-reported gender identity data, application of this validated algorithm in other data sets needs to be done with caution and an understanding of the limitations of each data source. Further, as the original algorithm by Jasuja et al does not account for intersex diagnoses, we did not include those diagnosis codes in this validation process; understanding the overlap of these issues warrants dedicated research.

With regard to conducting studies in the absence of self-reported gender identity data, there are several key logistical considerations. Identifying TGD persons within administrative data has its limits, and ICD claims data related to TGD care have varied over time and across specialties. As an example, endocrinologists more frequently have utilized “endocrine disorder, unspecified” (ICD-9 259.9; ICD-10 E34.9) when providing hormone care for TGD patients, which reflects practice preference. Accordingly, any algorithmic approach may have to adapt over time. Our inclusion/exclusion criteria may account for some of this variation in ICD claims data. There is potential to miss a proportion of transgender patients—reducing sensitivity of our TGD cohort identification algorithm. Focusing on the methods described will provide a cohort that is more likely TGD—improving identification specificity. Additionally, though surgical CPT codes are an important component of the algorithm, they are not typically utilized within this health center’s EHR as patients are referred to outside facilities for surgery, including gender-affirming procedures; this accounts for the low rate of gender-affirming procedures noted in our data set. Consequently, we were not able to utilize the full capabilities of the algorithm to aggregate persons into gender-identity categories based on procedures (eg, chest reconstruction to identify transgender men and nonbinary persons assigned female at birth). We accounted for this by searching for EHR documentation of prior gender-affirming procedures. Further, the proposed algorithm identifying a TGD cohort relies on receipt of gender-affirming care; we were unable to verify the gender identity of TGD persons who were unable to access or did not desire accessing gender-affirming medical or surgical care. 
While the proportion of TGD persons who do not desire or are unable to access gender-affirming care is not known, the 2015 US Transgender Survey found that more than three-quarters (78%) of respondents wanted gender-affirming hormone therapy and nearly half (49%) were able to receive it.48 Consequently, any use of the algorithm we have validated will underestimate the number of TGD persons. Additionally, even the gold standard of self-report has limits and may have nonrandom misclassification as it requires institutions and individuals providing care to demonstrate they are trustworthy for patients to disclose their identity. Further, for clinical and research purposes, embedded EHR forms, such as an organ inventory, may provide a standardized mechanism to collect and document this information in the patient’s chart with patient approval rather than making assumptions based on sex assigned at birth or gender identity.

When conducting research regarding marginalized populations, ethical concerns and individual safety must be considered. As discussed previously, TGD persons have had reason not to trust the health care system, which has a pathologizing, disparity-focused, binary perspective of gender.32,52 Consequently, some persons may choose not to share their gender identity with the medical establishment. However, depending on the care they receive, they could be identified as TGD based on the algorithm we have validated. As such, no identifiable information should be linked to any algorithmic process to identify groups of people. In our own study, a deidentified data set was utilized and not linked to any protected health information. Similarly, the algorithm by Jasuja et al that we validated was developed in a deidentified administrative data set with appropriately strict rules on reporting any cell size of <11 persons. These are critical steps in respecting individual privacy and providing some level of safety from unwanted scrutiny and inadvertent disclosure of personal health information. Arguably, though, the lack of complete self-reported gender identity data in all population surveys and data sets has created limited opportunities to characterize, understand, and intervene on inequities in health care access and outcomes for TGD persons.2,5 We submit that developing methods utilizing existing data, and being aware of its imperfections and shortcomings, is preferable to awaiting perfection in data collection and limiting opportunities to redress inequities experienced by TGD persons and communities.
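The small-cell reporting rule mentioned above can be applied mechanically before any table is released. The sketch below is a minimal illustration, not the procedure used in either study. The function name is hypothetical, and showing zero counts unmasked is an assumption based on how Table 1 displays them.

```python
def suppress_small_cells(counts: dict[str, int], threshold: int = 11) -> dict[str, str]:
    """Mask nonzero counts below the threshold as '<threshold'.

    Assumption: zero counts stay visible and counts at or above the
    threshold are shown as-is, mirroring the display in Table 1.
    """
    masked = {}
    for label, n in counts.items():
        if n < 0:
            raise ValueError(f"negative count for {label!r}")
        masked[label] = str(n) if n == 0 or n >= threshold else f"<{threshold}"
    return masked


# Made-up counts for illustration only.
print(suppress_small_cells({"group A": 0, "group B": 7, "group C": 11, "group D": 250}))
```

Masking individual cells is only a first step: a suppressed value can sometimes be recovered from row or column totals, so complementary cells may also need masking in practice.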

Lastly, although Fenway Health follows best practices in the provision of gender-affirming health care and the documentation of that care,24,27,28,53,54 it is a single health system, limiting generalizability. Conversely, Fenway Health is unique in being a respected center for gender-affirming care and a leader in the appropriate implementation of gender identity data collection in clinical settings, thereby serving as an ideal source of data for this analysis. Ultimately, the acceptable performance of an algorithm to identify TGD persons in the absence of self-reported gender identity data expands the possible data sources available to characterize and understand the health and well-being of TGD persons.

CONCLUSION

Our study demonstrated that, in the absence of self-reported gender identity data, there are promising strategies for identifying TGD people in claims-based data sets. This algorithm creates the option to examine larger TGD samples and to expand health services and health outcomes research for TGD persons. To begin to improve the health and well-being of TGD patients, more complete data must be recorded following best practices in collecting self-reported gender identity data. In the absence of such data, the approach validated in the current study can be leveraged. By utilizing a variety of methods to identify TGD patients, large data sets become rich sources of information about an often overlooked and marginalized population. Unlocking the potential of existing data is a critical first step in assessing and addressing the health and well-being of TGD populations.

Supplementary Material

ocad039_Supplementary_Data

ACKNOWLEDGMENTS

The authors are appreciative of the support and guidance of transgender and gender diverse patients, community leaders, scholars, and clinicians.

Contributor Information

Carl G Streed, Jr, Section of General Internal Medicine, Boston University Chobanian and Avedisian School of Medicine, Boston, Massachusetts, USA; Center for Transgender Medicine and Surgery, Boston Medical Center, Boston, Massachusetts, USA; The Fenway Institute, Fenway Health, Boston, Massachusetts, USA.

Dana King, The Fenway Institute, Fenway Health, Boston, Massachusetts, USA.

Chris Grasso, The Fenway Institute, Fenway Health, Boston, Massachusetts, USA.

Sari L Reisner, The Fenway Institute, Fenway Health, Boston, Massachusetts, USA; Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.

Kenneth H Mayer, The Fenway Institute, Fenway Health, Boston, Massachusetts, USA; Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA.

Guneet K Jasuja, Section of General Internal Medicine, Boston University Chobanian and Avedisian School of Medicine, Boston, Massachusetts, USA; Center for Healthcare Organization & Implementation Research, VA Bedford Healthcare System, Bedford, Massachusetts, USA; Department of Health Law, Policy and Management, Boston University School of Public Health, Boston, Massachusetts, USA.

Tonia Poteat, Department of Social Medicine, University of North Carolina Chapel Hill, Chapel Hill, North Carolina, USA.

Monica Mukherjee, Division of Cardiology, Department of Medicine, Johns Hopkins University, Baltimore, Maryland, USA.

Ayelet Shapira-Daniels, Department of Medicine, Boston Medical Center, Boston, Massachusetts, USA.

Howard Cabral, Department of Biostatistics, School of Public Health, Boston University, Boston, Massachusetts, USA.

Vin Tangpricha, Division of Endocrinology, Metabolism & Lipids, Department of Medicine, School of Medicine, Emory University, Atlanta, Georgia, USA.

Michael K Paasche-Orlow, Department of Medicine, Tufts Medical Center, Boston, Massachusetts, USA.

Emelia J Benjamin, Section of Cardiovascular Medicine, Department of Medicine, Boston University School of Medicine and Boston Medical Center, Boston, Massachusetts, USA; Boston University's and National Heart, Lung, and Blood Institute's Framingham Heart Study, Framingham, Massachusetts, USA; Department of Epidemiology, Boston University School of Public Health, Boston, Massachusetts, USA.

FUNDING

National Heart, Lung, and Blood Institute career development grant number NHLBI 1K01HL151902-01A1, an American Heart Association career development grant AHA 20CDA35320148, Doris Duke Charitable Foundation grant number 2022061, and the Boston University Chobanian and Avedisian School of Medicine Department of Medicine Career Investment Award (to CGS); National Heart, Lung, and Blood Institute grant numbers R01HL092577, R01HL141434 and American Heart Association AHA_18SFRN34110082 (to EJB).

AUTHOR CONTRIBUTIONS

All authors provided substantial contributions to the conception and design of the work as well as the analysis and interpretation of the data for this work. CGS created the draft manuscript and all authors provided critical revisions for important intellectual content. All authors provided final approval of the revision to be published. All authors agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

SUPPLEMENTARY MATERIAL

Supplementary material is available at Journal of the American Medical Informatics Association online.

CONFLICT OF INTEREST STATEMENT

None declared.

DATA AVAILABILITY

The data underlying this article were provided by Fenway Health by permission. Data will be shared on request to the corresponding author with permission of Fenway Health.

REFERENCES

  • 1. Smedley B, Stith AY, Nelson AR. Unequal Treatment: Confronting Racial and Ethnic Disparities in Health Care. Washington, DC: National Academies Press; 2002.
  • 2. National Academies of Sciences, Engineering, and Medicine (NASEM). Understanding the Well-Being of LGBTQI+ Populations. Washington, DC: The National Academies Press; 2020.
  • 3. Ashley F. ‘Trans’ is my gender modality: a modest terminological proposal. In: Erickson-Schroth L, ed. Trans Bodies, Trans Selves. 2nd ed. New York City, NY: Oxford University Press; 2021.
  • 4. Streed CG Jr, Grasso C, Reisner SL, et al. Sexual orientation and gender identity data collection: clinical and public health importance. Am J Public Health 2020; 110 (7): 991–3.
  • 5. Baker KE, Streed CG, Durso LE. Ensuring that LGBTQI+ people count — collecting data on sexual orientation, gender identity, and intersex status. N Engl J Med 2021; 384 (13): 1184–6.
  • 6. Institute of Medicine (IOM). The Health of Lesbian, Gay, Bisexual, and Transgender People: Building a Foundation for Better Understanding. Washington, DC: National Academies Press (US); 2011.
  • 7. Streed CG Jr, McCarthy EP, Haas JS. Self-reported physical and mental health of gender nonconforming transgender adults in the United States. LGBT Health 2018; 5 (7): 443–8.
  • 8. Streed CG Jr, McCarthy EP, Haas JS. Association between gender minority status and self-reported physical and mental health in the United States. JAMA Intern Med 2017; 177 (8): 1210–2.
  • 9. Schuster MA, Reisner SL, Onorato SE. Beyond bathrooms — meeting the health needs of transgender people. N Engl J Med 2016; 375: 101–3.
  • 10. White Hughto JM, Reisner SL. A systematic review of the effects of hormone therapy on psychological functioning and quality of life in transgender individuals. Transgend Health 2016; 1 (1): 21–31.
  • 11. Meyer IH, Brown TN, Herman JL, et al. Demographic characteristics and health status of transgender adults in select US regions: behavioral risk factor surveillance system, 2014. Am J Public Health 2017; 107 (4): 582–9.
  • 12. Caceres BA, Streed CG Jr, Corliss HL, et al. Assessing and addressing cardiovascular health in LGBTQ adults: a scientific statement from the American Heart Association. Circulation 2020; 142 (19): e321–32.
  • 13. Streed CG Jr, Beach LB, Caceres BA, et al. Assessing and addressing cardiovascular health in people who are transgender and gender diverse: a scientific statement from the American Heart Association. Circulation 2021; 144 (6): e136–48.
  • 14. Baker KE. Findings from the behavioral risk factor surveillance system on health-related quality of life among US transgender adults, 2014-2017. JAMA Intern Med 2019; 179 (8): 1141–4.
  • 15. Braun HM, Jones EK, Walley AY, et al. Characterizing substance use disorders among transgender adults receiving care at a large urban safety net hospital. J Addict Med 2021; 16 (4): 407–12.
  • 16. Streed CG Jr, McCarthy EP, Haas JS. Self-reported physical and mental health of gender nonconforming transgender adults in the United States. LGBT Health 2018; 5 (7): 443–8.
  • 17. Buchting FO, Emory KT, Scout, et al. Transgender use of cigarettes, cigars, and e-cigarettes in a national study. Am J Prev Med 2017; 53 (1): e1–7.
  • 18. Testa RJ, Habarth J, Peta J, et al. Development of the gender minority stress and resilience measure. Psychol Sex Orient Gender Divers 2015; 2 (1): 65–77.
  • 19. Delozier AM, Kamody RC, Rodgers S, et al. Health disparities in transgender and gender expansive adolescents: a topical review from a minority stress framework. J Pediatr Psychol 2020; 45 (8): 842–7.
  • 20. Federal Interagency Working Group on Improving Measurement of Sexual Orientation and Gender Identity in Federal Surveys. Current Measures of Sexual Orientation and Gender Identity in Federal Surveys. 2016.
  • 21. Jasuja GK, de Groot A, Quinn EK, et al. Beyond gender identity disorder diagnoses codes: an examination of additional methods to identify transgender individuals in administrative databases. Med Care 2020; 58 (10): 903–11.
  • 22. Institute of Medicine (US) Board on the Health of Select Populations. Collecting Sexual Orientation and Gender Identity Data in Electronic Health Records: Workshop Summary. Washington, DC: National Academies Press (US); 2013.
  • 23. Phillips G II, Felt D, Ruprecht MM, et al. Addressing the disproportionate impacts of the COVID-19 pandemic on sexual and gender minority populations in the United States: actions toward equity. LGBT Health 2020; 7 (6): 279–82.
  • 24. Cahill S, Makadon H. Sexual orientation and gender identity data collection in clinical settings and in electronic health records: a key to ending LGBT health disparities. LGBT Health 2014; 1 (1): 34–41.
  • 25. The GenIUSS Group. Best Practices for Asking Questions to Identify Transgender and Other Gender Minority Respondents on Population-Based Surveys. Los Angeles, CA: The Williams Institute; 2014.
  • 26. Department of Health and Human Services and Centers for Medicare and Medicaid Services. 42 CFR Parts 412 and 495 [CMS-3310-FC and CMS-3311-FC], RINs 0938-AS26 and 0938-AS58. Medicare and Medicaid Programs; Electronic Health Record Incentive Program—Stage 3 and Modifications to Meaningful Use in 2015 through 2017. 2015.
  • 27. Cahill SR, Baker K, Deutsch MB, et al. Inclusion of sexual orientation and gender identity in stage 3 meaningful use guidelines: a huge step forward for LGBT health. LGBT Health 2016; 3 (2): 100–2.
  • 28. Cahill S, Singal R, Grasso C, et al. Do ask, do tell: high levels of acceptability by patients of routine collection of sexual orientation and gender identity data in four diverse American community health centers. PLoS One 2014; 9 (9): e107104.
  • 29. Suen LW, Lunn MR, Katuzny K, et al. What sexual and gender minority people want researchers to know about sexual orientation and gender identity questions: a qualitative study. Arch Sex Behav 2020; 49 (7): 2301–18.
  • 30. Rullo JE, Foxen JL, Griffin JM, et al. Patient acceptance of sexual orientation and gender identity questions on intake forms in outpatient clinics: a pragmatic randomized multisite trial. Health Serv Res 2018; 53 (5): 3790–808.
  • 31. Morgan R, Dragon C, Daus G, et al. Updates on terminology of sexual orientation and gender identity survey measures. FCSM 20-03. 2020. [cited October 25, 2020]. https://nces.ed.gov/fcsm/pdf/FCSM_SOGI_Terminology_FY20_Report_FINAL.pdf. Accessed September 1, 2020.
  • 32. Kronk CA, Everhart AR, Ashley F, et al. Transgender data collection in the electronic health record: current concepts and issues. J Am Med Inform Assoc 2021; 29 (2): 271–84.
  • 33. Grasso C, Goldhammer H, Funk D, et al. Required sexual orientation and gender identity reporting by US health centers: first-year data. Am J Public Health 2019; 109 (8): 1111–8.
  • 34. Proctor K, Haffer SC, Ewald E, et al. Identifying the transgender population in the Medicare program. Transgend Health 2016; 1 (1): 250–65.
  • 35. Ewald ER, Guerino P, Dragon C, et al. Identifying Medicare beneficiaries accessing transgender-related care in the era of ICD-10. LGBT Health 2019; 6 (4): 166–73.
  • 36. Dragon CN, Guerino P, Ewald E, et al. Transgender Medicare beneficiaries and chronic conditions: exploring fee-for-service claims data. LGBT Health 2017; 4 (6): 404–11.
  • 37. Sugarman J, Carrithers J. Certificates of confidentiality and unexpected complications for pragmatic clinical trials. Learn Health Syst 2021; 5 (2): e10238.
  • 38. Barnes M, Carrithers J, Sugarman J. Ethical and practical concerns about IRB restrictions on the use of research data. Ethics Hum Res 2020; 42 (6): 29–34.
  • 39. Quinn DM, Earnshaw VA. Concealable stigmatized identities and psychological well-being. Soc Personal Psychol Compass 2013; 7 (1): 40–51.
  • 40. Wolfe HL, Reisman JI, Yoon SS, et al. Validating data-driven methods for identifying transgender individuals in the Veterans Health Administration of the US Department of Veterans Affairs. Am J Epidemiol 2021; 190 (9): 1928–34.
  • 41. Mayer K, Appelbaum J, Rogers T, et al. The evolution of the Fenway Community Health model. Am J Public Health 2001; 91 (6): 892–4.
  • 42. Reisner SL, Bradford J, Hopwood R, et al. Comprehensive transgender healthcare: the gender affirming clinical and public health model of Fenway Health. J Urban Health 2015; 92 (3): 584–92.
  • 43. Thompson HM. Patient perspectives on gender identity data collection in electronic health records: an analysis of disclosure, privacy, and access to care. Transgend Health 2016; 1 (1): 205–15.
  • 44. Trevethan R. Sensitivity, specificity, and predictive values: foundations, pliabilities, and pitfalls in research and practice. Front Public Health 2017; 5: 307.
  • 45. Cho H, Matthews G, Harel O. Confidence intervals for the area under the receiver operating characteristic curve in the presence of ignorable missing data. Int Stat Rev 2019; 87 (1): 152–77.
  • 46. Quinn VP, Nash R, Hunkeler E, et al. Cohort profile: study of transition, outcomes and gender (STRONG) to assess health status of transgender people. BMJ Open 2017; 7 (12): e018121.
  • 47. US Census Bureau. QuickFacts Boston City, Massachusetts. QuickFacts 2021. [cited July 20, 2022]. https://www.census.gov/quickfacts/bostoncitymassachusetts. Accessed July 1, 2022.
  • 48. James SE, Herman JL, Rankin S, Keisling M, Mottet L, Anafi M. The Report of the 2015 U.S. Transgender Survey. Washington, DC: National Center for Transgender Equality; 2016.
  • 49. James SE, Herman JL, Rankin S, et al. The Report of the 2015 U.S. Transgender Survey. Washington, DC: National Center for Transgender Equality; 2016.
  • 50. Mirza SA, Rooney C. Discrimination Prevents LGBTQ People from Accessing Health Care. 2018. [cited March 13, 2020]. https://www.americanprogress.org/issues/lgbt/news/2018/01/18/445130/discrimination-prevents-lgbtq-people-accessing-health-care. Accessed September 1, 2018.
  • 51. Becker A. These nonbinary patients were seeking trans health care. But in a binary system, they felt ‘invalidated.’ In: The Lily. 2021.
  • 52. Eckhert E. A case for the demedicalization of queer bodies. Yale J Biol Med 2016; 89 (2): 239–46.
  • 53. Cahill S, Makadon H. Toward better care for lesbian, gay, bisexual and transgender patients. LGBT Health 2014; 1 (1): 41.
  • 54. Cahill S, Trieweiler S, Guidry J, et al. High rates of access to health care, disclosure of sexuality and gender identity to providers among House and Ball community members in New York City. J Homosex 2018; 65 (5): 600–14.



Articles from Journal of the American Medical Informatics Association : JAMIA are provided here courtesy of Oxford University Press
