AMIA Annual Symposium Proceedings. 2023 Apr 29;2022:1032–1041.

Assessing Phenotype Definitions for Algorithmic Fairness

Tony Y Sun 1, Shreyas A Bhave 1, Jaan Altosaar 1, Noémie Elhadad 1
PMCID: PMC10148336  PMID: 37128361

Abstract

Phenotyping is a core, routine activity in observational health research. Cohorts impact downstream analyses, such as how a condition is characterized, how patient risk is defined, and what treatments are studied. It is thus critical to ensure that cohorts are representative of all patients, regardless of their demographics or social determinants of health. In this paper, we propose a set of best practices to assess the fairness of phenotype definitions. We leverage established fairness metrics commonly used in predictive models and relate them to commonly used epidemiological metrics. We describe an empirical study of Crohn’s disease and type 2 diabetes, each with multiple phenotype definitions taken from the literature, and assess fairness across gender and race. We show that the different phenotype definitions exhibit widely varying and disparate performance according to the different fairness metrics and subgroups. We hope that the proposed best practices can help in constructing fair and inclusive phenotype definitions.

Introduction

When conducting an observational health study, one of the core, routine tasks researchers must address is defining the study population. If the population of interest is a set of patients with a particular manifestation of disease, this task is referred to as phenotyping. Phenotype definitions select patients into disease cohorts that are used to improve our collective knowledge about a particular condition, including fundamental epidemiological queries (e.g., quantifying incidence of disease over time)1, risk estimation and prediction questions (e.g., identifying risk factors for stroke)2, and comparative effectiveness studies (e.g., comparing diuretics vs. ACE inhibitors for treating hypertension)3. High-impact research using phenotypes eventually informs policy-making about medical treatments, and consequently the health of populations3–5. Thus, it is critical to evaluate whether phenotype definitions adequately represent all patients in a population.

Despite our best efforts as a research community to reduce bias in phenotype construction, phenotype definitions remain subject to multiple potential sources of bias; we illustrate three here. Diagnosis bias prevents (or delays) disease diagnosis, often because the initial presentation of disease differs across subgroups. For instance, a meta-analysis of the acute myocardial infarction symptom literature finds that men presenting with acute myocardial infarction are more likely to complain of chest pain, while women are more likely to complain of other forms of pain; a phenotype definition designed with chest pain as the primary presenting symptom may thus under-represent women6. Treatment bias prevents (or delays) appropriate medical treatment for individuals in certain groups. For instance, a study of patients in Veterans Affairs (VA) hospitals found that Black patients were less likely than white patients to be prescribed cardioprotective drugs (beta blockers, statins, and ACE inhibitors)7. A phenotype definition for hyperlipidemia that requires evidence of treatment might therefore under-represent Black patients. Lastly, access-to-care biases are systemic issues that prevent patients from entering the healthcare system. A phenotype definition that requires the diagnosis to occur in a particular outpatient setting, for example, could under-represent patients who rely primarily on emergency visits for their regular care.

As standards in observational health data have emerged, the research community has acknowledged the need to standardize and validate phenotype definitions across multiple research sites (for instance, in the eMERGE network8). This practice has helped avoid potential biases in patient selection arising from institution-specific documentation practices and geographic idiosyncrasies. However, to date, there is no standard approach to assess the fairness of a phenotype definition beyond institutional documentation differences.

In this paper, we develop and propose a set of best practices for evaluating phenotype definitions by bridging the evaluation measures used in epidemiological literature with those used in the algorithmic fairness and machine learning literature. We hope these practices can help make the adoption and reporting of fairness metrics in observational health studies standard practice. As illustrative scenarios, we assess the fairness of several highly-cited phenotypes for Crohn’s disease and diabetes type 2 with respect to gender (women, men) and race (Black, white) and show that phenotype definitions have varying performance characteristics, revealing real-world tradeoffs.

Related Work

Related work falls broadly into two categories: (1) evaluation and validation of phenotypes and (2) algorithmic fairness in healthcare.

Evaluation and Validation of Phenotype Definitions

Rule-based and unsupervised phenotyping algorithms are primarily evaluated via clinical adjudication of patient records. The most common approach is to take a random sample of subjects identified by the phenotype and recruit one or more clinical experts to assess whether each subject meets the criteria for inclusion. Such an analysis typically yields a single number: an estimate of the positive predictive value (or precision) of the phenotype, calculated by comparing the gold-standard clinical labels against the labels predicted by the phenotype algorithm. Related performance metrics such as sensitivity and specificity are rarely assessed, as they require clinical review of a much larger set of subjects (i.e., subjects both identified and not identified as having the disease).
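
To make the single-number output of chart review concrete, the following is a minimal Python sketch of how a PPV estimate might be computed from an adjudicated random sample of phenotype-positive patients. The function name and the choice of a Wilson score interval are our own assumptions for illustration, not the procedure of any cited study. Because only phenotype-positive patients are reviewed, such a sample supports no estimate of sensitivity or specificity.

```python
import math


def ppv_with_wilson_ci(n_confirmed: int, n_reviewed: int, z: float = 1.96) -> tuple[float, float, float]:
    """Estimate PPV from a chart-reviewed random sample of phenotype-positive patients.

    n_confirmed: sampled patients whom clinical reviewers confirmed as true cases.
    n_reviewed:  total phenotype-positive patients sampled for review.
    Returns (ppv, lower, upper). The Wilson score interval is an assumed choice
    made for this sketch; other binomial intervals are equally defensible.
    """
    if n_reviewed <= 0 or not 0 <= n_confirmed <= n_reviewed:
        raise ValueError("need n_reviewed > 0 and 0 <= n_confirmed <= n_reviewed")
    p = n_confirmed / n_reviewed
    denom = 1 + z**2 / n_reviewed
    center = (p + z**2 / (2 * n_reviewed)) / denom
    half = z * math.sqrt(p * (1 - p) / n_reviewed + z**2 / (4 * n_reviewed**2)) / denom
    # Clamp to [0, 1] to absorb floating-point error at the boundaries.
    return p, max(0.0, center - half), min(1.0, center + half)


# Hypothetical example: 100 sampled phenotype-positive patients, 90 confirmed.
ppv, lo, hi = ppv_with_wilson_ci(90, 100)
print(f"PPV = {ppv:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```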

Previous literature on phenotype evaluation has already found that phenotype validation is rare and, when it occurs, often incomplete9. This evaluation gap persists across numerous disease phenotypes. For example, a systematic review of myocardial infarction phenotypes reported that of 33 validation studies, only 11 reported sensitivity and 5 reported specificity (all provided estimates of positive predictive value)10. The same pattern of reporting precision but rarely sensitivity or specificity appears in reviews of phenotypes for stroke and atrial fibrillation11,12.

The same trends are present for the two diseases we focus on in this paper (type 2 diabetes and Crohn’s disease). For Crohn’s disease, most publications that leverage rule-based phenotypes do not mention any form of clinical validation13,14. Among the phenotypes we use in this paper, three report sensitivity and specificity along with positive predictive value15–17, while the others report only precision18. For type 2 diabetes, many papers likewise do not mention any clinical validation19,20. Among the phenotypes we use, two report sensitivity, specificity, and precision21,22.

Since most phenotypes are not evaluated even with population-wide validation metrics such as sensitivity and specificity, it follows that the vast majority of published phenotypes do not report these statistics within subgroups or stratified by protected classes. Swerdel et al. proposed a general approach for evaluating a phenotype definition and estimating these population-level metrics even in the absence of a gold standard, but it does not provide guidance for assessing fairness9.

Algorithmic Fairness in Healthcare

Recently, algorithms used to make healthcare decisions have come under greater scrutiny for potentially being biased against certain protected classes such as race and gender. For example, Obermeyer et al. demonstrated that a commercial algorithm used to identify and select at-risk patients was biased against Black patients because the algorithm was primarily focused on cost minimization23. McCradden et al. emphasize that existing performance metrics such as sensitivity and specificity might “camouflage” persistent health inequities, and suggest reporting group-specific performance metrics for algorithms trained on fundamentally biased healthcare data24.

Ethical machine learning in healthcare requires considering how biases might be introduced at all levels of the experimental design process including problem selection, data collection, outcome definition, and algorithm development25. While much research has focused on bias in downstream tasks such as clinical outcome prediction, less emphasis has been placed on upstream steps such as data collection and outcome definition. Phenotyping algorithms are one such early step in many observational studies for which there is no standard for assessing whether these biases exist. Given the broad use of phenotypes in observational health research, we argue that biases introduced at the level of phenotypes harbor the risk of exacerbating existing health disparities by influencing clinical guidelines and public policy.

Best Practices for Assessing the Algorithmic Fairness of Phenotypes

We propose a set of best practices for the research community to use when constructing and assessing phenotype definitions. These best practices are centered around the use of fairness metrics commonly cited in the fairness in machine learning literature. We bridge these fairness metrics with commonly used epidemiological measures, enabling researchers to interpret tradeoffs more easily. First, we detail the interpretation of fairness metrics as common epidemiological measures (Figure 1). Subsequently, we enumerate best practices for how to use these metrics when developing a phenotype. We hope that using these best practices will enable researchers to be transparent, intentional and explicit about the assessment and construction of their phenotypes.

Figure 1.

Algorithmic fairness metrics mapped to existing epidemiological measures stratified by a protected class such as gender. We visualize three fairness metrics, by treating phenotyping algorithms as classifiers. In taking this perspective, we can equate demographic parity to equality of predicted prevalence; equality of opportunity to equality of sensitivity; and predictive rate parity to equality of precision.

Fairness Metrics: An Epidemiological Perspective

To assess the impact of phenotype algorithms on subgroup inclusion, we rely on fairness metrics that are commonly applied to supervised models. As notation, let $A \in \{0, 1\}$ denote a binary protected attribute of a patient, such as gender (woman, man) or race (Black, white). Let $\hat{Y} \in \{0, 1\}$ be the output of a phenotyping function, indicating exclusion (0) or inclusion (1) in a disease cohort, and let $Y \in \{0, 1\}$ denote the patient's true disease status. Further, let $P_a(\cdot) = P(\cdot \mid A = a)$ denote probabilities conditioned on the protected attribute; for example, $P_0(\hat{y}) = P(\hat{y} \mid A = 0)$. Below, we define these metrics and show how they map to existing epidemiological measures.

Demographic Parity as Equality of Predicted Prevalence

Also called “independence” or “statistical parity” in the fairness literature, demographic parity requires that each protected class receive the positive (and negative) outcome in the same proportion; the demographic parity gap is the difference between these proportions. In epidemiological terms, satisfying demographic parity amounts to requiring that the phenotype's predicted prevalence (the proportion of each protected class included in the disease cohort) be equal across protected classes. Mathematically, this translates to:

$$P_0(\hat{Y} = \hat{y}) = P_1(\hat{Y} = \hat{y}) \quad \forall\, \hat{y} \in \{0, 1\}$$

Demographic parity is grounded in the US federal four-fifths rule, which states that “a selection rate for any race, sex, or ethnic group which is less than four-fifths of the rate of the group with the highest rate will generally be regarded by the Federal enforcement agencies as evidence of adverse impact”. For many diseases, however, we might a priori expect differences in prevalence across protected categories for reasons unrelated to disparate treatment; for example, the scientific literature has identified biological reasons why many autoimmune disorders are more prevalent among women than men26. Even so, demographic parity can be an appropriate criterion in some healthcare settings, or for some phenotypes where strict non-discriminatory practices are desirable27, such as phenotypes for inpatient visits or outpatient referrals for common conditions.
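
As an illustration only, the sketch below computes each group's predicted prevalence, the demographic parity gap, and a four-fifths ratio for a binary phenotype output. Treating predicted prevalence as the “selection rate” and flagging ratios below 0.8 are our assumptions; as noted above, a low ratio may reflect true prevalence differences rather than bias. The function names and toy data are hypothetical.

```python
from collections import defaultdict


def predicted_prevalence_by_group(y_pred, groups):
    """P_a(Y_hat = 1): share of each protected class that the phenotype includes."""
    included = defaultdict(int)
    total = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        total[group] += 1
        included[group] += int(pred)
    return {g: included[g] / total[g] for g in total}


def demographic_parity_report(y_pred, groups):
    """Per-group predicted prevalence, the largest absolute gap, and the
    four-fifths ratio (lowest selection rate / highest selection rate)."""
    prev = predicted_prevalence_by_group(y_pred, groups)
    lowest, highest = min(prev.values()), max(prev.values())
    ratio = lowest / highest if highest > 0 else float("nan")
    return {
        "prevalence": prev,
        "gap": highest - lowest,
        "four_fifths_ratio": ratio,
        "flag_adverse_impact": ratio < 0.8,  # assumed reading of the four-fifths rule
    }


# Hypothetical toy cohort: 1 = the phenotype includes the patient.
y_pred = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]
groups = ["women"] * 5 + ["men"] * 5
print(demographic_parity_report(y_pred, groups))
```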

Equality of Opportunity as Equality of Sensitivity

Also called “positive rate parity”, and a relaxation of “separation” in the fairness literature (separation additionally requires equal false positive rates), equality of opportunity is achieved when the true positive rates of a model are equal across protected classes. In epidemiology, satisfying equality of opportunity is equivalent to requiring that sensitivity be equal across protected classes. This translates to:

$$P_0(\hat{Y} = 1 \mid Y = 1) = P_1(\hat{Y} = 1 \mid Y = 1)$$

Equality of opportunity does not account for false positives (and therefore says nothing about precision); it captures whether the phenotype identifies the same proportion of patients who truly have the disease in each class. Computing equality of opportunity requires a “true” label. We approximate this ground-truth label with a silver standard, computed as the majority-vote label across multiple published phenotypes (see Approach: Assessing Fairness of Crohn’s Disease and Diabetes Phenotype Definitions).
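
A minimal sketch of the equality-of-opportunity computation follows, assuming per-patient silver-standard labels (`y_true`), phenotype outputs (`y_pred`), and protected-class values (`groups`) are available as parallel lists. All names and data here are hypothetical.

```python
def sensitivity_by_group(y_true, y_pred, groups):
    """Per-group sensitivity P_a(Y_hat = 1 | Y = 1).

    Groups with no (silver-standard) positive patients are omitted, because
    sensitivity is undefined for them.
    """
    true_pos, positives = {}, {}
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] = positives.get(group, 0) + 1
            true_pos[group] = true_pos.get(group, 0) + int(pred == 1)
    return {g: true_pos[g] / positives[g] for g in positives}


def equal_opportunity_gap(y_true, y_pred, groups):
    """Largest between-group difference in sensitivity, plus the per-group values."""
    sens = sensitivity_by_group(y_true, y_pred, groups)
    return max(sens.values()) - min(sens.values()), sens


# Hypothetical toy data: y_true is a silver-standard label, y_pred a phenotype's output.
y_true = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
groups = ["Black"] * 5 + ["white"] * 5
gap, sens = equal_opportunity_gap(y_true, y_pred, groups)
print(sens, gap)
```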

Predictive Rate Parity as Equality of Precision

Also called “sufficiency” in the fairness literature, predictive rate parity is achieved when the probability of the true label given the predicted label is equal across classes. In epidemiology, satisfying predictive rate parity is equivalent to ensuring that a phenotype has equal precision (and, for excluded patients, equal negative predictive value) within each protected class. This translates to:

$$P_0(Y = 1 \mid \hat{Y} = 1) = P_1(Y = 1 \mid \hat{Y} = 1) \quad \text{and} \quad P_0(Y = 0 \mid \hat{Y} = 0) = P_1(Y = 0 \mid \hat{Y} = 0)$$

Note again that computing this metric requires a “true” label, which we discuss below.
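
Under the same assumptions (hypothetical parallel lists of silver-standard labels, phenotype outputs, and protected-class values), the sketch below computes both quantities in the predictive rate parity condition for each group: precision (PPV) and negative predictive value (NPV).

```python
def predictive_values_by_group(y_true, y_pred, groups):
    """Per-group PPV = P_a(Y=1 | Y_hat=1) and NPV = P_a(Y=0 | Y_hat=0).

    A value is None when the group has no predictions of that kind.
    """
    counts = {}
    for truth, pred, group in zip(y_true, y_pred, groups):
        c = counts.setdefault(group, {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
        if pred == 1:
            c["tp" if truth == 1 else "fp"] += 1
        else:
            c["tn" if truth == 0 else "fn"] += 1
    result = {}
    for group, c in counts.items():
        pred_pos = c["tp"] + c["fp"]
        pred_neg = c["tn"] + c["fn"]
        result[group] = {
            "ppv": c["tp"] / pred_pos if pred_pos else None,
            "npv": c["tn"] / pred_neg if pred_neg else None,
        }
    return result


# Hypothetical toy data (same layout as a silver-standard evaluation).
y_true = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
groups = ["Black"] * 5 + ["white"] * 5
print(predictive_values_by_group(y_true, y_pred, groups))
```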

Using Fairness Metrics to Assess and Construct Phenotypes

We now highlight steps researchers can take when building and assessing phenotypes using these fairness metrics:

  1. Enumerate protected classes. The first step in the workflow is to simply enumerate all protected classes which are of interest in phenotyping a particular disease. Examples could include race (Black, white, etc.), gender (women, men, etc.), ethnicity (Hispanic, non-Hispanic, etc.), or age group (patients under 18, over 65, etc.).

  2. Identify priorities to optimize. Next, before constructing a phenotype definition, decide which fairness metric to prioritize. For example, in certain scenarios a broad, highly sensitive definition for all protected classes may be desirable; in other cases, a specific definition with high precision may be preferable.

  3. Construct phenotype. The phenotype can now be designed with these priorities in mind. The priorities will affect how broad the inclusion criteria are and whether to incorporate diverse data types (specific medications, treatment regimens, etc.).

  4. Acquire gold/silver standard. A gold or silver standard is required to assess fairness metrics. If a gold standard cannot be obtained on a subset of patients, a silver standard may be constructed using methods like PheValuator or majority vote across many commonly used phenotypes for the same disease9.

  5. Compute fairness metrics. Use the gold/silver standard to compute fairness metrics and assess the tradeoffs and their epidemiological interpretation. For example, under the equality of opportunity criterion, the phenotype might have lower sensitivity for Black patients than for white patients.

  6. Revise phenotype definitions. Based on the fairness metrics, revise the phenotype definition to mitigate any fairness gaps. For example, requiring continuous observation for a year or at least one inpatient stay may decrease the sensitivity of the phenotype for Black patients who have fewer interactions with the healthcare system. This inclusion criterion could then be dropped and the fairness metrics recomputed.
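
Steps 3, 5, and 6 can be sketched as a small loop that scores candidate definitions against a gold or silver standard. This is a toy illustration under stated assumptions: the `Patient` record, its fields, and the two candidate rules are hypothetical simplifications (real phenotypes are concept-set queries over OMOP data), and the prioritized metric is assumed to be the equality-of-opportunity gap. The revised candidate drops the one-year continuous-observation requirement, mirroring the example in step 6.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Patient:
    # Hypothetical, simplified patient record for illustration only.
    group: str                # protected class, e.g. "Black" or "white"
    has_dx_code: bool         # at least one diagnosis code for the condition
    continuous_obs_1yr: bool  # one year of continuous observation
    label: int                # gold/silver-standard disease status (1 = has disease)


def sensitivity_gap(patients, phenotype: Callable[[Patient], bool]) -> float:
    """Largest between-group difference in sensitivity (equality of opportunity)."""
    true_pos, positives = {}, {}
    for p in patients:
        if p.label == 1:
            positives[p.group] = positives.get(p.group, 0) + 1
            true_pos[p.group] = true_pos.get(p.group, 0) + int(phenotype(p))
    sens = [true_pos[g] / positives[g] for g in positives]
    return max(sens) - min(sens)


# Step 3: an initial definition; step 6: a revision without the observation requirement.
candidates = {
    "dx_plus_1yr_obs": lambda p: p.has_dx_code and p.continuous_obs_1yr,
    "dx_only": lambda p: p.has_dx_code,
}

# Hypothetical toy patients.
patients = [
    Patient("Black", True, False, 1),
    Patient("Black", True, True, 1),
    Patient("white", True, True, 1),
    Patient("white", True, True, 1),
    Patient("white", False, False, 0),
]

# Step 5: compute the prioritized metric for each candidate and compare.
gaps = {name: sensitivity_gap(patients, rule) for name, rule in candidates.items()}
best = min(gaps, key=gaps.get)
print(gaps, best)
```

Choosing the candidate with the smallest gap is only one possible policy; in practice, researchers would also weigh overall sensitivity, precision, and the other fairness metrics according to the priorities set in step 2.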

In (Figure 2), we provide the full workflow and an example scenario. We hope that this workflow may aid researchers in (1) making the priorities for their phenotype explicit, (2) being intentional about which protected groups are considered in phenotype design, (3) assessing fairness tradeoffs intuitively in epidemiological language, and (4) iteratively redesigning their phenotypes to optimize for their priorities. We recommend that researchers document this iterative process and disseminate all relevant performance and fairness metrics associated with their phenotype.

Figure 2.

Best Practices for Assessment and Construction of Fair Phenotypes. We enumerate a sequence of steps that researchers can take to develop phenotypes with their specific concerns regarding biases across protected groups in mind. The steps are highlighted in the top blue boxes and an example of this workflow applied in practice is shown in the orange boxes.

Approach: Assessing Fairness of Crohn’s Disease and Diabetes Phenotype Definitions

We illustrate the proposed best practices on two conditions: Crohn’s disease and type 2 diabetes. We selected these conditions because they are well-studied diseases with publicly available phenotype definitions. For each condition, we compare multiple phenotype definitions and assess their fairness across two types of patient subgroups (gender and race). Each definition is derived from widely used or highly cited publications.

Assessing the equality of opportunity and predictive rate parity metrics requires ground-truth labels indicating who has the disease. We approximate these labels with a silver standard, defined as the set of patients included by a majority of the phenotype definitions for a given disease. For example, in our Crohn’s analysis using five distinct phenotype definitions, patients included by at least three of the five definitions are labeled as positive (i.e., as having the disease).
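
The majority-vote rule can be sketched as follows, assuming each phenotype's cohort is available as a set of patient identifiers. The cohort memberships below are invented toy data (only the definition names come from the text). With five definitions the threshold is three; with the three diabetes definitions it is two.

```python
def silver_standard(memberships: dict[str, set[int]], all_patients: set[int]) -> dict[int, int]:
    """Label a patient 1 if a strict majority of phenotype cohorts include them, else 0."""
    n_defs = len(memberships)
    threshold = n_defs // 2 + 1  # 3 of 5 for Crohn's, 2 of 3 for diabetes
    votes = {pid: 0 for pid in all_patients}
    for cohort in memberships.values():
        for pid in cohort:
            if pid in votes:
                votes[pid] += 1
    return {pid: int(count >= threshold) for pid, count in votes.items()}


# Hypothetical cohort memberships (patient IDs) for five Crohn's definitions.
memberships = {
    "HERA": {1, 2, 3},
    "Thirumurthi": {1, 2},
    "Benchimol": {1, 3, 4},
    "Stepaniuk": {1, 4},
    "Ananthakrishnan": {2, 3},
}
labels = silver_standard(memberships, all_patients={1, 2, 3, 4, 5})
print(labels)
```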

Data source and phenotype source

The phenotypes are implemented on electronic health record data from NewYork-Presbyterian Hospital, a tertiary academic medical center that serves a heterogeneous patient population in New York City. The data are standardized to the Observational Medical Outcomes Partnership (OMOP) common data model28, which allows the analysis to be extended easily to other institutions using the same common data model. Phenotype definitions are taken from the Observational Health Data Sciences and Informatics (OHDSI) PhenotypeLibrary, which provides publicly available phenotype definitions for various diseases, along with citations explaining their sources.

Crohn’s phenotypes

We chose five Crohn’s disease phenotypes from the OHDSI PhenotypeLibrary: (1) a commonly used OHDSI literature definition, slightly modified based on existing literature, most recently included in the ongoing OHDSI Health Equity Research Assessment (HERA) study15; (2) the original, heavily cited Crohn’s phenotype15; (3-4) phenotype definitions from publications that have impacted clinical diagnosis or medical guidelines17,18; and (5) another heavily cited Crohn’s phenotype16. Phenotype demographics are provided in (Table 1), while phenotype descriptions and the number of included concepts are provided in (Table 3).

Table 1.

Crohn’s disease phenotype demographics: The gender, race, and age demographic information for each Crohn’s phenotype implemented.

Tables 3 and 4.

Phenotype descriptions: the description and number of included concepts for each Crohn’s disease phenotype (Table 3) and type 2 diabetes phenotype (Table 4) implemented.

Type 2 Diabetes phenotypes

We chose three type 2 diabetes phenotypes from the OHDSI PhenotypeLibrary: (1) a highly cited literature definition (Miller) that requires specific diagnoses, medications, and hemoglobin A1c measurements21; (2) a widely used OHDSI definition (LEGEND) from a study that proposed new therapeutic guidelines for antihypertensive prescriptions3; and (3) a validated diabetes phenotype from the PheKB initiative designed to be portable across sites22. Phenotype demographics are provided in (Table 2) below, while phenotype descriptions and the number of included concepts are provided in (Table 4).

Table 2.

Type 2 diabetes phenotype demographics: The gender, race, and age demographic information for each type 2 diabetes phenotype implemented.

Results

After empirically analyzing the Crohn’s and type 2 diabetes mellitus phenotype definitions across the three fairness metrics, we identify examples of significant real-world trade-offs arising from phenotype construction. In (Figure 3) and (Figure 4) below, we visualize demographic parity, equality of opportunity, and predictive rate parity across gender and race for Crohn’s and type 2 diabetes mellitus patients, respectively. We assess whether differences between subgroups are statistically significant with a two-sided two-proportion z-test on the difference in proportions, at significance level α = 0.05.
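
For reference, a minimal standard-library sketch of a two-sided two-proportion z-test with a pooled standard error is shown below. The pooled form is our assumption about the variant used, and the counts are hypothetical. For sensitivity or precision comparisons, the denominators would be each subgroup's silver-standard positives or phenotype positives, respectively. Packaged implementations also exist, for example `proportions_ztest` in statsmodels.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Two-sided z-test of H0: p1 == p2 using the pooled standard error.

    x1/n1 and x2/n2 are, e.g., the number of patients a phenotype includes over
    the number of patients in each subgroup. Returns (z, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both proportions are 0 or both are 1: no observable difference.
        return 0.0, 1.0
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 30 of 1,000 women vs. 15 of 1,000 men included by a phenotype.
z, p = two_proportion_z_test(30, 1000, 15, 1000)
print(f"z = {z:.2f}, p = {p:.3f}, significant at 0.05: {p < 0.05}")
```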

Figure 3.

Crohn’s disease phenotype performance across gender and race for the five phenotypes, measured by demographic parity, equality of opportunity, and predictive rate parity. A trade-off exists between the phenotype definitions, as measured by differences in performance across subgroups and fairness metrics.

Figure 4.

Diabetes phenotype performance across gender and race for the three phenotypes, measured by demographic parity, equality of opportunity, and predictive rate parity. A trade-off exists between the phenotype definitions, as measured by differences in performance across subgroups and fairness metrics.

Crohn’s phenotypes

We visualize demographic parity as the estimated prevalence difference between subgroups. The Crohn’s disease phenotypes show a clear, consistent difference in estimated prevalence across subgroups: estimated prevalence is systematically higher among men than women, and among white patients than Black patients. This demographic disparity is statistically significant for race across all phenotypes, and for gender across all phenotypes except HERA. Recent literature on Crohn’s prevalence suggests that although the clinical manifestations of Crohn’s disease are similar among Black and white patients29, Black patients are often underdiagnosed and under-represented in Crohn’s clinical trials30, and that the true population prevalence is approximately equal for Black and white patients31. Together, these findings suggest that our data might also contain access-to-care biases favoring the inclusion of white patients.

The estimated sensitivity of the Crohn’s phenotypes does not show a consistent pattern favoring any particular subgroup. However, sensitivity differences are noticeably affected by hospitalization status. Phenotype definitions that require inpatient hospitalization or emergency room visits prior to Crohn’s diagnosis, such as the Thirumurthi and Stepaniuk definitions, are particularly sensitive for men and Black patients, while phenotypes that include outpatient visits, such as the HERA and Benchimol definitions, are generally more sensitive for women and white patients, respectively. This trend potentially reflects an access-to-care bias, in which the healthcare setting affects cohort inclusion. These results suggest that men and Black patients at our hospital are more likely to be diagnosed with, or receive care for, Crohn’s disease during an acute inpatient hospitalization or an emergency room visit, while women and white patients are more likely to be diagnosed during an outpatient visit. Existing literature shows that Black children and adolescents with Crohn’s disease are more likely to visit the emergency room repeatedly for disease management, suggesting that phenotype definitions requiring particular care settings could be used to identify especially at-risk patients32.

The estimated precision of the Crohn’s phenotypes shows an inconsistent pattern, with significant differences across phenotypes. Among the race subgroups, the HERA and Ananthakrishnan definitions are more precise at identifying Black patients. Among the gender subgroups, an interesting pattern emerges: unlike the other definitions, the HERA and Benchimol definitions require patients to have no prior Crohn’s condition codes in their records before diagnosis, and they are more precise at identifying men. Meanwhile, the Thirumurthi definition, which has no such history requirement, is more precise at identifying women. This disparate trend suggests that women are more likely than men to have a previous (potentially relevant) diagnosis in their medical record, indicating a diagnosis bias arising from differences in initial presentation (e.g., women being more likely to have been suspected of having, or to have previously had, the disease).

Type 2 diabetes mellitus phenotypes

The estimated prevalence for the diabetes phenotypes is consistently higher among Black patients than white patients. Across genders, two of the phenotype definitions (LEGEND, PheKB) estimate a higher prevalence for men, while one definition (Miller) estimates a higher prevalence for women. A study by Danaei et al. roughly estimates that type 2 diabetes mellitus is more prevalent among men than women33. Among Black and white patients, Signorello et al. suggest that after adjusting for socioeconomic status, prevalence rates across races are approximately equal34. Our results reflect the heterogeneous patient population at our academic medical center.

The estimated sensitivity of the diabetes phenotypes shows a mixed trend: LEGEND is more sensitive for Black patients and women, while PheKB is more sensitive for white patients and men. A close examination of the phenotype definitions (Table 4) shows that the main differences lie in medication requirements. All three definitions require patients to have been prescribed some form of medication: the Miller definition includes 18,368 drug concepts for “all glycemic meds”, including insulin, while the LEGEND definition includes 18,290 drug concepts for “all type 2 diabetes mellitus medication excluding insulin”. The PheKB definition is stricter still, including only 15,433 drug concepts for “type 2 diabetes prescriptions”, which also excludes insulin. We note that the Miller definition, which includes the broadest category of medication as well as insulin, performs approximately equally across genders and races. Excluding insulin from the LEGEND and PheKB definitions may therefore explain the differences in estimated sensitivity. This empirical analysis suggests that future type 2 diabetes definitions should consider including insulin in their concept sets if the research aims to include all patients diagnosed with type 2 diabetes mellitus.

The estimated precision of the diabetes phenotypes is higher for men and Black patients under the Miller and LEGEND definitions, and higher for women and white patients under the PheKB definition. Given that the PheKB definition has the most stringent inclusion and exclusion criteria, our results suggest that women and white patients are more likely to have received the specific diabetes prescriptions that fit PheKB’s narrower definition, while men and Black patients are more likely to have received insulin or other diabetes medications.

Discussion and Conclusion

In this study, we present a workflow and best practices for assessing phenotypes using algorithmic fairness metrics, and apply the workflow to Crohn’s disease and diabetes phenotypes from the literature. Our empirical analysis demonstrates that appropriate use of a phenotype definition requires understanding the implicit biases introduced during phenotype creation. We demonstrate these trade-offs by measuring fairness metrics across protected categories, showing that even highly cited phenotype definitions that have influenced clinical guidelines can preferentially include (or exclude) certain protected subgroups. Because these trade-offs are unavoidable and researchers cannot avoid creating phenotype definitions, we advocate that researchers (1) document, (2) open-source, and (3) make transparent the process used to identify and protect or include particular subgroups of interest. By assessing phenotypes with fairness metrics, these disparate impacts can be mitigated and balanced against the commensurate benefits to health. We highlight that the present study was only possible because of the community’s support of open science: such assessments, and improvements to the fairness of phenotypes, are possible only because the phenotype authors released their definitions as open-source JSON files via GitHub. Finally, we encourage similar critique of our workflow and have released all analysis code. We encourage practitioners to extend our methods and report algorithmic fairness metrics in future observational health studies where health disparities might affect how a phenotype is defined or selected.

References

  1. Dubberke ER, Nyazee HA, Yokoe DS, Mayer J, Stevenson KB, Mangino JE, et al. Implementing Automated Surveillance for Tracking Clostridium difficile Infection at Multiple Healthcare Facilities. Infect Control Hosp Epidemiol Off J Soc Hosp Epidemiol Am. 2012;33(3):305–8. doi: 10.1086/664052.
  2. Kaelber DC, Foster W, Gilder J, Love TE, Jain AK. Patient characteristics associated with venous thromboembolic events: a cohort study using pooled electronic health record data. J Am Med Inform Assoc. 2012;19(6):965–72. doi: 10.1136/amiajnl-2011-000782.
  3. Suchard MA, Schuemie MJ, Krumholz HM, You SC, Chen R, Pratt N, et al. Comprehensive comparative effectiveness and safety of first-line antihypertensive drug classes: a systematic, multinational, large-scale analysis. The Lancet. 2019 Nov 16;394(10211):1816–26. doi: 10.1016/S0140-6736(19)32317-7.
  4. Forbes A, Escher J, Hébuterne X, Kłek S, Krznaric Z, Schneider S, et al. ESPEN guideline: Clinical nutrition in inflammatory bowel disease. Clin Nutr. 2017;36(2):321–47. doi: 10.1016/j.clnu.2016.12.027.
  5. Nguyen GC. First Do No Harm: Is It Safe to Use Immunosuppressants in Inflammatory Bowel Disease Patients With Prior Cancer? Gastroenterology. 2016;151(1):22–4. doi: 10.1053/j.gastro.2016.05.018.
  6. Coventry LL, Finn J, Bremner AP. Sex differences in symptom presentation in acute myocardial infarction: a systematic review and meta-analysis. Heart Lung J Crit Care. 2011 Dec;40(6):477–91. doi: 10.1016/j.hrtlng.2011.05.001.
  7. Mehta JL, Bursac Z, Mehta P, Bansal D, Fink L, Marsh J, et al. Racial disparities in prescriptions for cardioprotective drugs and cardiac outcomes in Veterans Affairs Hospitals. Am J Cardiol. 2010;105(7):1019–23. doi: 10.1016/j.amjcard.2009.11.031.
  8. McCarty CA, Chisholm RL, Chute CG, Kullo IJ, Jarvik GP, Larson EB, et al. The eMERGE Network: a consortium of biorepositories linked to electronic medical records data for conducting genomic studies. BMC Med Genomics. 2011;4(1):1–11. doi: 10.1186/1755-8794-4-13.
  9. Swerdel JN, Hripcsak G, Ryan PB. PheValuator: Development and evaluation of a phenotype algorithm evaluator. J Biomed Inform. 2019 Sep;97:103258. doi: 10.1016/j.jbi.2019.103258.
  10. Rubbo B, Fitzpatrick NK, Denaxas S, Daskalopoulou M, Yu N, Patel RS, et al. Use of electronic health records to ascertain, validate and phenotype acute myocardial infarction: A systematic review and recommendations. Int J Cardiol. 2015 May 6;187:705–11. doi: 10.1016/j.ijcard.2015.03.075.
  11. McCormick N, Bhole V, Lacaille D, Avina-Zubieta JA. Validity of Diagnostic Codes for Acute Stroke in Administrative Databases: A Systematic Review. PLOS ONE. 2015 Aug 20;10(8):e0135834. doi: 10.1371/journal.pone.0135834.
  12. Jensen PN, Johnson K, Floyd J, Heckbert SR, Carnahan R, Dublin S. A systematic review of validated methods for identifying atrial fibrillation using administrative data. Pharmacoepidemiol Drug Saf. 2012;21(S1):141–7. doi: 10.1002/pds.2317.
  13. Jess T, Frisch M, Simonsen J. Trends in overall and cause-specific mortality among patients with inflammatory bowel disease from 1982 to 2010. Clin Gastroenterol Hepatol. 2013;11(1):43–8. doi: 10.1016/j.cgh.2012.09.026.
  14. Long MD, Martin C, Sandler RS, Kappelman MD. Increased risk of pneumonia among patients with inflammatory bowel disease. Am J Gastroenterol. 2013;108(2):240–8. doi: 10.1038/ajg.2012.406.
  15. Thirumurthi S, Chowdhury R, Richardson P, Abraham NS. Validation of ICD-9-CM Diagnostic Codes for Inflammatory Bowel Disease Among Veterans. Dig Dis Sci. 2010;55(9):2592–8. doi: 10.1007/s10620-009-1074-z.
  16. Benchimol EI, Manuel DG, Guttmann A, Nguyen GC, Mojaverian N, Quach P, et al. Changing Age Demographics of Inflammatory Bowel Disease in Ontario, Canada: A Population-based Cohort Study of Epidemiology Trends. Inflamm Bowel Dis. 2014;20(10):1761–9. doi: 10.1097/MIB.0000000000000103.
  • 17.Stepaniuk P, Bernstein CN, Targownik LE, Singh H. Characterization of Inflammatory Bowel Disease in Elderly Patients: A Review of Epidemiology, Current Practices and Outcomes of Current Management Strategies. Can J Gastroenterol Hepatol. 2015;29(6):327–33. doi: 10.1155/2015/136960. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Ananthakrishnan AN, Cagan A, Gainer VS, Cai T, Cheng S-C, Savova G, et al. Normalization of Plasma 25-Hydroxy Vitamin D Is Associated with Reduced Risk of Surgery in Crohn’s Disease. Inflamm Bowel Dis. 2013;19(9):1921–7. doi: 10.1097/MIB.0b013e3182902ad9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Johnston SS, Conner C, Aagren M, Smith DM, Bouchard J, Brett J. Evidence linking hypoglycemic events to an increased risk of acute cardiovascular events in patients with type 2 diabetes. Diabetes Care. 2011;34(5):1164–70. doi: 10.2337/dc10-1915. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Müller N, Heller T, Freitag MH, Gerste B, Haupt CM, Wolf G, et al. Healthcare utilization of people with type 2 diabetes in Germany: an analysis based on health insurance data. Diabet Med. 2015;32(7):951–7. doi: 10.1111/dme.12747. [DOI] [PubMed] [Google Scholar]
  • 21.Miller DR, Safford MM, Pogach LM. Who has diabetes? Best estimates of diabetes prevalence in the Department of Veterans Affairs based on computerized patient data. Diabetes Care. 2004;27(Suppl 2):B10–21. doi: 10.2337/diacare.27.suppl_2.b10. [DOI] [PubMed] [Google Scholar]
  • 22.Kirby JC, Speltz P, Rasmussen LV, Basford M, Gottesman O, Peissig PL, et al. PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability. J Am Med Inform Assoc JAMIA. 2016 Nov;23(6):1046–52. doi: 10.1093/jamia/ocv202. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019 Oct 25;366(6464):447–53. doi: 10.1126/science.aax2342. [DOI] [PubMed] [Google Scholar]
  • 24.McCradden MD, Joshi S, Mazwi M, Anderson JA. Ethical limitations of algorithmic fairness solutions in health care machine learning. Lancet Digit Health. 2020;2(5):e221–3. doi: 10.1016/S2589-7500(20)30065-0. [DOI] [PubMed] [Google Scholar]
  • 25.Chen IY, Pierson E, Rose S, Joshi S, Ferryman K, Ghassemi M. Ethical Machine Learning in Healthcare. Annu Rev Biomed Data Sci. 2021;4(1):123–44. doi: 10.1146/annurev-biodatasci-092820-114757. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Whitacre CC. Sex differences in autoimmune disease. Nat Immunol. 2001 Sep 1;2(9):777–80. doi: 10.1038/ni0901-777. [DOI] [PubMed] [Google Scholar]
  • 27.Friedler SA, Scheidegger C, Venkatasubramanian S. On the (im)possibility of fairness. ArXiv160907236 Cs Stat [Internet] 2016 Sep 23 [cited 2022 Jan 15]. Available from: http://arxiv.org/abs/1609.07236.
  • 28.Stang PE, Ryan PB, Racoosin JA, Overhage JM, Hartzema AG, Reich C, et al. Advancing the Science for Active Surveillance: Rationale and Design for the Observational Medical Outcomes Partnership. Ann Intern Med. 2010;153(9):600–6. doi: 10.7326/0003-4819-153-9-201011020-00010. [DOI] [PubMed] [Google Scholar]
  • 29.Straus WL, Eisen GM, Sandler RS, Murray SC, Sessions JT. Crohn’s disease: does race matter? Am J Gastroenterol. 2000 Feb 1;95(2):479–83. doi: 10.1111/j.1572-0241.2000.t01-1-01531.x. [DOI] [PubMed] [Google Scholar]
  • 30.Jackson JF, Kornbluth A. Do Black and Hispanic Americans With Inflammatory Bowel Disease (IBD) Receive Inferior Care Compared With White Americans? Uneasy Questions and Speculations. Off J Am Coll Gastroenterol ACG. 2007 Jul;102(7):1343–9. doi: 10.1111/j.1572-0241.2007.01371.x. [DOI] [PubMed] [Google Scholar]
  • 31.Nguyen GC, Torres EA, Regueiro M, Bromfield G, Bitton A, Stempak J, et al. Inflammatory Bowel Disease Characteristics Among African Americans, Hispanics, and Non-Hispanic Whites: Characterization of a Large North American Cohort. Off J Am Coll Gastroenterol ACG. 2006 May;101(5):1012–23. doi: 10.1111/j.1572-0241.2006.00504.x. [DOI] [PubMed] [Google Scholar]
  • 32.Dotson JL, Kappelman MD, Bricker J, Andridge R, Chisolm DJ, Crandall WV. Multicenter Evaluation of Emergency Department Treatment for Children and Adolescents With Crohn’s Disease According to Race/Ethnicity and Insurance Payor Status. Inflamm Bowel Dis. 2019 Jan 1;25(1):194–203. doi: 10.1093/ibd/izy192. [DOI] [PubMed] [Google Scholar]
  • 33.Danaei G, Friedman AB, Oza S, Murray CJ, Ezzati M. Diabetes prevalence and diagnosis in US states: analysis of health surveys. Popul Health Metr. 2009 Sep 25;7(1):16. doi: 10.1186/1478-7954-7-16. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Signorello LB, Schlundt DG, Cohen SS, Steinwandel MD, Buchowski MS, McLaughlin JK, et al. Comparing Diabetes Prevalence Between African Americans and Whites of Similar Socioeconomic Status. Am J Public Health. 2007 Dec 1;97(12):2260–7. doi: 10.2105/AJPH.2006.094482. [DOI] [PMC free article] [PubMed] [Google Scholar]

Articles from AMIA Annual Symposium Proceedings are provided here courtesy of American Medical Informatics Association
