Author manuscript; available in PMC: 2022 Apr 4.
Published in final edited form as: N Engl J Med. 2021 Jan 6;384(5):474–480. doi: 10.1056/NEJMms2029562

Race and Genetic Ancestry in Medicine — A Time for Reckoning with Racism

Luisa N Borrell 1,*, Jennifer R Elhawary 2,*, Elena Fuentes‑Afflick 3, Jonathan Witonsky 4, Nirav Bhakta 5, Alan HB Wu 6, Kirsten Bibbins‑Domingo 7, José R Rodríguez‑Santana 8, Michael A Lenoir 9, James R Gavin III 10, Rick A Kittles 11, Noah A Zaitlen 12, David S Wilkes 13, Neil R Powe 14, Elad Ziv 15, Esteban G Burchard 16,*
PMCID: PMC8979367  NIHMSID: NIHMS1782318  PMID: 33406325

In the United States, race, ancestry, genetics, and medicine are inextricably linked in a complex and fraught history. Medicine is replete with examples of racial injustice inflicted by the use of race and ethnicity as biologic constructs to engender hierarchical discrimination. Race and ethnicity are dynamic, shaped by geographic, cultural, and sociopolitical forces; they can influence people’s socioeconomic position and lead to disproportionately high morbidity and mortality for racial and ethnic minorities by sustaining inequitable access to resources, including health care.1

Nevertheless, we believe that it is inappropriate to simply abandon the use of race and ethnicity in biomedical research and clinical practice, since these variables capture important epidemiologic information, including social determinants of health such as racism and discrimination, socioeconomic position, and environmental exposures. Eliminating the use of race/ethnicity, or implementing a race/ethnicity-blind approach, could enable inequitable health care systems to persist and exacerbate racial/ethnic inequities in health outcomes. Complementing the use of race/ethnicity with data on genetic ancestry, genotypes, or biomarkers might be useful, but risks and benefits should be analyzed carefully for specific clinical applications.

RACIAL CATEGORIZATIONS IN THE UNITED STATES

The 1787 U.S. Constitutional Convention adopted the “Three-Fifths Compromise,” which considered each enslaved African to be three fifths of a person, allowing increased representation for the southern states in the House of Representatives, without what they saw as overtaxation. Thus, three racial categories were defined in the first U.S. Census in 1790 and became deeply ingrained in the social fabric of the United States: White people and Native Americans each counted as one whole tax-paying person, and slaves or Black people counted as three fifths of a person.2 Although the Three-Fifths Compromise was repealed in 1868, the U.S. Census continues to classify people based on their racial identification.3

The Office of Management and Budget classifies people by ethnicity as well as racial identification.4 Ethnicity (as in Hispanic/Latino) captures the common values, cultural norms, and behaviors of people who are linked by shared culture and language, whereas race refers to one’s identification with a group or identity ascribed on the basis of physical characteristics and skin color.5 Census questions are intended to reflect self-defined membership in a social category, without anthropologic or genetic meaning,6 and census data are used to determine resource allocation and political representation.

RACE AS A MASTER STATUS VARIABLE

Race is considered a master status,7 or a primary identifying characteristic reflecting a social position ascribed to a person that may affect every aspect of their life. Race influences social interactions and access to opportunities and societal resources.8 For example, race was the driver of “redlining,” a legal form of residential segregation9 that resulted in disinvestment in education and social services, poor housing, limited community resources such as parks and grocery stores, unemployment, and poor access to health care for Black communities.

Race/ethnicity has been used to evaluate differences in clinical measures and outcomes and is used by researchers in established analytic approaches. Unfortunately, even after analysts control for socioeconomic indicators such as education and income, environmental exposures, and other established risk factors, they frequently observe a greater risk of adverse health outcomes among Black Americans than among White Americans. This increased risk is often reported without explanation or is presented as an intrinsic biologic difference between races. These “intrinsic differences” actually capture racialized expressions of biology or the embodiment of inequities related to unmeasured risk factors or exposures, including exposure to individual and structural racism.10

GENETIC ANCESTRY AND ADMIXTURE

In a society in which inequities in health care affect many disease outcomes, it may seem reasonable to assume that all racial/ethnic differences in disease incidence and outcomes derive from socioeconomic differences. However, race is also directly associated with genetic ancestry and therefore indirectly related to genetic variants that may affect disease and health outcomes. Genomewide genotyping methods and advanced computational algorithms now enable scientists to infer the geographic origins of a person’s ancestors from minute differences in the cumulative frequency of thousands of genetic variants (alleles). These methods and algorithms have been applied, without bias, to large populations worldwide. The largest genetic clusters of people correspond to geographic regions and specific populations in Africa, Europe, Asia, Oceania, and the Americas,11 suggesting that continental-level ancestry captures the greatest population differences in genetic variation. Ancestry assessment within continents can provide information on a finer scale.12
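As a rough illustration of how such inference can work, the sketch below estimates, by maximum likelihood, the fraction of one person's genome derived from each of two source populations. It is a toy model under stated assumptions rather than the method used in the studies cited here: the function names, the four marker frequencies, and the genotypes are all hypothetical, markers are treated as independent, and real analyses use many thousands of markers and more than two source populations.

```python
import math

def log_likelihood(q, genotypes, freq_a, freq_b):
    """Log-likelihood of genotypes given fraction q of ancestry from population A.

    genotypes: count (0, 1, or 2) of the reference allele at each marker.
    freq_a, freq_b: reference-allele frequencies in the two source populations
    (assumed to lie strictly between 0 and 1 so the logarithms are defined).
    """
    total = 0.0
    for g, pa, pb in zip(genotypes, freq_a, freq_b):
        # Expected allele frequency for a person with ancestry fraction q.
        p = q * pa + (1.0 - q) * pb
        # Binomial log-likelihood; the constant binomial coefficient is omitted.
        total += g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return total

def estimate_ancestry_fraction(genotypes, freq_a, freq_b, steps=100):
    """Grid-search estimate of the ancestry fraction from population A."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: log_likelihood(q, genotypes, freq_a, freq_b))

# Hypothetical allele frequencies at four markers that differ between populations.
FREQ_A = [0.9, 0.1, 0.8, 0.2]
FREQ_B = [0.1, 0.9, 0.2, 0.8]

# A person heterozygous at every marker is estimated to be an even mix.
mixed = estimate_ancestry_fraction([1, 1, 1, 1], FREQ_A, FREQ_B)
# A person whose genotypes match population A's common alleles throughout.
single_source = estimate_ancestry_fraction([2, 0, 2, 0], FREQ_A, FREQ_B)
print(mixed, single_source)
```

No single marker settles the question in this sketch; the estimate comes from accumulating small frequency differences across markers, which is why precision improves as more markers are genotyped.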

Although race/ethnicity correlates with genetic ancestry,13 it captures different information. Race and ethnicity are self-ascribed or socially ascribed identities and are often “assigned” by police, hospital staff, or others on the basis of physical characteristics. Genetic ancestry is the genetic origin of one’s population. Although race/ethnicity may capture information about the likely presence of certain genetic variants, ancestry is a better predictor.14 Genetic admixture, or genetic exchange among people from different ancestries, is an important characteristic of many populations and may correlate with individuals’ risk for certain genetic diseases.15 And there may be substantial variation in ancestry among and within populations16; U.S. Black populations, for example, have larger proportions of African than of European ancestry, which vary with the year and location in which samples are obtained.17 Latino Americans, the largest and fastest-growing U.S. minority population, are an admixed group of European, Native American, and African ancestries (Fig. 1).18

Figure 1. Genetic Admixture in the Mexican American and Puerto Rican Populations.

Data are from the Genes‑environments and Admixture in Latino Americans (GALA II) Study.

The race/ethnicity categories used in biomedical research and clinical practice are broad and less precise than ancestry. Consider a Black–White biracial male firefighter who presents with a smoke-inhalation injury. How would he be classified? He could self-identify as Black or White, but society would probably label him as Black. From a clinical perspective, he is a combination of Black and White. This ambiguity may contribute to misdiagnosis and is particularly troubling when someone’s race/ethnicity is assigned by health professionals or police. In addition, different health systems may use different racial/ethnic categories. In contrast, ancestry is a fixed characteristic of the genome.

Ancestry testing using millions of genetic markers has significantly advanced our understanding of globally and geographically diverse populations, leading to improved clinical predictions. For example, in Black and Latino people, the proportion of African ancestry predicts differences in creatinine levels and estimated glomerular filtration rate (eGFR). When 10% of Latino people initially deemed to have stage 3 chronic kidney disease had their disease reclassified as stage 2 on the basis of ancestry, their electrolyte levels were more consistent with their ancestry-adjusted stage than their race-adjusted stage.19 In addition, validation of the eGFR equations within three Asian populations yielded different adjusted predicted values,20 suggesting that GFR varies within racial/ethnic groups. We do not yet know, however, whether ancestry adjustment leads to better estimation of GFR than do race-adjusted or race/ancestry-independent methods. The alarming decision by some health care institutions to remove race from GFR calculations ignores potential population differences without considering the clinical performance characteristics or consequences for Black patients.14,21 Though it may be tempting to consider ancestry in such equations, the true cause of observed racial differences in creatinine levels is unknown.

Racial/ethnic differences in risk for disease and response to treatments are partially related to biologic factors, including genetic and epigenetic variants. Using ancestry as a variable helps to capture and explain a portion of the biologic variation between and within groups. For example, in the first large-scale epigenetic study of asthma in minority children, ancestry explained 75% of the total variance in epigenetic patterns, suggesting that race/ethnicity, as a proxy for socioenvironmental exposures, explained the remaining 25%.22 Thus, race/ethnicity may be better than ancestry as a predictor of nongenetic factors. We would argue that both variables are important and are complementary in biomedical research and clinical practice.

GENETIC ANCESTRY VERSUS INDIVIDUAL CLINICAL PREDICTORS

The National Institutes of Health has made a concerted effort to include racial/ethnic minority populations in biomedical and clinical studies. However, years of inadequate funding for research in these communities have created significant knowledge gaps regarding the generalizability of biomedical discoveries and clinical advances to non-White populations. Less than 2% of National Cancer Institute–funded clinical trials have included non-White participants.23

Still, population-specific genetic variants contributing to clinical differences between racial/ethnic groups have been identified using a limited number of racially/ethnically diverse studies. For example, genetic variants at the 6q25 locus identified in Latina women are associated with protection against breast cancer and originate from Indigenous American populations.24 APOL1 genotypes, which are more common among people with West African ancestry,25 are strongly associated with focal segmental glomerulosclerosis, nondiabetic kidney disease, and HIV nephropathy, which can lead to early-onset end-stage kidney failure.26 However, most people with the high-risk genotype do not have rapid progression to kidney failure, which suggests that additional genetic and nongenetic factors influence its effect.

Prostate cancer is more than twice as common among Black men as among White men.27 Genomewide association studies have identified variants at 8q24 that are associated with prostate-cancer risk in many populations, including variants that are more common in Black men and account for much of their excess risk of prostate cancer.28 In another example, a black-box warning added to Plavix (clopidogrel) in 2010 stated that “poor metabolizers may not receive the full benefit of Plavix treatment and may remain at risk for heart attack, stroke, and cardiovascular death.”29 Among people with no response to Plavix, as many as 75% of Asians and Pacific Islanders lack the CYP2C19 genetic polymorphism required to metabolize the prodrug into its active form.29,30 Although there are examples of genetic variants underlying racial/ethnic differences in disease occurrence or outcomes, more often the causes of such differences are unknown, either because unrecognized nongenetic factors are key or because genetic research has failed to incorporate racial/ethnic diversity.31

Globally diverse populations must be studied because genetic variation and genome architecture vary among populations. More than 80% of participants in existing genomewide association studies are of European background; Black and Latino people, who account for more than 30% of the U.S. population, are dramatically under-represented (about 2% and <0.5%, respectively).31 Less than 4.5% of federally funded pulmonary research has included minority populations, despite evidence of significant population-specific differences in the distribution of genetic risk variants for common diseases such as asthma.32,33

Such disparities perpetuate the gap in access to precision medicine for non-White populations. For example, genetic variants within known cancer risk genes are well identified in populations of European ancestry, but often the same variants are classified as “variants of uncertain significance” in people of non-European ancestry.34 As the push toward precision medicine intensifies, this worrisome deficit in genetic research will grow, leaving much of the global population behind. Unless we act now, the promise of precision medicine will be available to, and benefit, only a select few.31,35

Furthermore, genetic studies of non-European populations are important even if genetic variants are not responsible for overall differences in disease incidence or outcomes. Specifically, the frequency and effect sizes of genetic variants associated with disease risk may vary across populations.31 Polygenic risk scores derived from studies of populations with European ancestry have less predictive power when applied to non-European populations.31 For example, the polygenic risk score for breast cancer is about one third as predictive for Black women as for women of European descent,36 a disparity with clear implications for the future of precision medicine.
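To make the transferability problem concrete, a polygenic risk score can be viewed as a weighted sum of risk-allele counts, with weights taken from effect sizes estimated in a reference study. The sketch below is a minimal illustration under that assumption; the function name, the three variants, and both sets of weights are hypothetical and are not taken from any published score.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele counts (0, 1, or 2 per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))

# Hypothetical per-allele effect sizes estimated in the discovery population.
weights_discovery = [0.10, -0.05, 0.20]
# Hypothetical effect sizes for the same variants in a second population.
weights_other = [0.02, -0.05, 0.35]

# One person's hypothetical risk-allele counts at the three variants.
dosages = [2, 1, 0]

score_discovery = polygenic_risk_score(dosages, weights_discovery)
score_other = polygenic_risk_score(dosages, weights_other)
print(score_discovery, score_other)
```

The same genotype receives different scores under the two sets of weights, so a score built from one population's effect sizes can misstate risk in a population where those effect sizes, or the frequencies of the variants, differ.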

INFORMED USE OF RACE, ETHNICITY, AND ANCESTRY

Race, ethnicity, and ancestry have a complex and intertwined relationship that demands nuanced analyses. We believe that associations between race/ethnicity and disease outcomes should be interpreted carefully and that we should not assume that environmental, social, or genetic factors represent the only contributors to a given disease until causation has been proven. Conversely, we should avoid assuming that genetic causes have been ruled out, as this could undermine the discovery of genetic variants like the 8q24 variants that may partially explain increased prostate-cancer incidence among Black men.28

We believe that decisions regarding the use of race/ethnicity as a predictor in algorithms and mathematical risk models should consider whether the model’s underlying data are strongly associated with race/ethnicity and whether the inclusion or exclusion of race/ethnicity results in better health outcomes and reduced health inequities. For example, it has been claimed that race adjustment may overestimate the GFR in some Black patients and contribute to delays in referral for renal transplantation, but the nonadjusted equation may underestimate Black patients’ GFR, resulting in underdosage or denial of certain medications or foreclosed opportunities for kidney donation. An alternative approach is to calculate the eGFR using cystatin C, a biomarker of renal function, instead of creatinine, but the related testing costs are significantly higher.
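The stakes of including or omitting a race coefficient can be made concrete with a short calculation. The sketch below assumes the 2009 CKD-EPI creatinine equation, which the text does not name but which was a widely used race-adjusted equation when this article was written. The patient values and function names are hypothetical, and staging here uses eGFR cut points alone, which is a simplification of clinical staging.

```python
def egfr_ckd_epi_2009(scr_mg_dl, age_years, female, apply_race_coefficient):
    """eGFR in mL/min/1.73 m^2 from the 2009 CKD-EPI creatinine equation."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age_years)
    if female:
        egfr *= 1.018
    if apply_race_coefficient:
        # Multiplier applied in the 2009 equation for patients recorded as Black.
        egfr *= 1.159
    return egfr

def ckd_stage(egfr):
    """Stage from eGFR cut points only (ignores other markers of kidney damage)."""
    if egfr >= 90:
        return 1
    if egfr >= 60:
        return 2
    if egfr >= 30:
        return 3
    if egfr >= 15:
        return 4
    return 5

# Hypothetical patient: 60-year-old man with serum creatinine of 1.4 mg/dL.
unadjusted = egfr_ckd_epi_2009(1.4, 60, female=False, apply_race_coefficient=False)
adjusted = egfr_ckd_epi_2009(1.4, 60, female=False, apply_race_coefficient=True)
print(round(unadjusted, 1), ckd_stage(unadjusted))
print(round(adjusted, 1), ckd_stage(adjusted))
```

For this hypothetical patient the two estimates are roughly 54 and 63, which fall on opposite sides of the threshold of 60 that separates stage 3 from stage 2. The same creatinine value can therefore lead to different decisions about referral, dosing, or donor eligibility depending on whether the coefficient is applied.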

Similarly, race-specific reference equations for lung function reflect the lower average measures of normal lung function observed in non-White groups.37,38 Consequently, relative to the equations derived from White populations, those derived from Black populations will yield a higher percentage of predicted values for lung function, which could lead to underestimating the severity of lung disease, with clinical implications including delayed detection, missed opportunities for medical management of symptoms, denial of disability claims, and delayed access to lifesaving treatments such as lung transplantation. On the flip side, using an equation derived from White populations in other racial/ethnic groups may lead to overdiagnosis, excessive follow-up testing, anxiety for patients, and compromised eligibility for treatments such as stem-cell transplantation for cancer.39 Moreover, the application of White-derived lung- and kidney-function equations to Black patients ignores long-recognized racial/ethnic differences in normal physiological function or biomarkers and is itself a form of racial discrimination.
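The arithmetic behind this concern is simple. Percent predicted is the observed value divided by the reference (predicted) value, so a lower reference value raises the result for the same measurement. The numbers below are hypothetical and are not drawn from any published reference equation; the 80% cutoff is likewise an assumed threshold chosen only for illustration.

```python
def percent_predicted(observed, predicted):
    """Observed lung function as a percentage of the reference value."""
    if predicted <= 0:
        raise ValueError("predicted value must be positive")
    return 100.0 * observed / predicted

# Hypothetical measurement and reference values, in liters (FEV1).
observed_fev1 = 2.80
predicted_white_equation = 3.60  # assumed output of a White-derived equation
predicted_black_equation = 3.10  # assumed output of a Black-derived equation

ASSUMED_CUTOFF = 80.0  # illustrative threshold, not a clinical recommendation

for label, predicted in [("White-derived", predicted_white_equation),
                         ("Black-derived", predicted_black_equation)]:
    pct = percent_predicted(observed_fev1, predicted)
    status = "below cutoff" if pct < ASSUMED_CUTOFF else "at or above cutoff"
    print(f"{label}: {pct:.1f}% predicted ({status})")
```

With these assumed values the same measurement falls below the cutoff under one reference equation and above it under the other, which is how the choice of equation can move a patient between underdiagnosis and overdiagnosis.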

As noted above, adjusting eGFR for ancestry rather than race could result in reclassification of patients’ kidney disease. However, before ancestry adjustment is widely adopted, it is important to demonstrate that it provides results at least as accurate as those of race adjustment. Ideally, ancestry-adjusted results should be evaluated on the basis of prediction of disease or clinically significant outcomes. In several diverse cohorts, for example, mathematical risk models of lung function that included ancestry plus self-identified race/ethnicity yielded more strongly predictive results than models including only self-identified race/ethnicity.40 Data from longitudinal clinical studies of diverse populations evaluated for kidney and lung disease are needed to determine whether race-based equations, ancestry-adjusted equations, or equations that ignore both variables better predict clinically significant outcomes such as diagnosis, disease severity, prognosis, risk of surgical complications, and eligibility for lung transplantation. This debate calls for the National Institutes of Health and its disease-focused and organ-based institutes (that is, the National Institute of Diabetes and Digestive and Kidney Diseases and the National Heart, Lung, and Blood Institute) to challenge researchers to determine which prediction equation is the most clinically accurate.

Even where there is known genetic variation related to specific diseases, the use of race/ethnicity may be important in measuring and addressing nongenetic causes of health inequities. Although the higher incidence of prostate cancer among Black men, for example, may be partially explained by genetic variants,28 ancestry may be less important than race/ethnicity in determining clinical outcomes: among men with prostate cancer, race/ethnicity is associated with disparities in access and treatment.41,42

Although some such disparities may be partially captured by careful attention to socioeconomic factors, others may be more deeply rooted in racial stratification, which drives access to care, bias, and racial discrimination or racism. For example, access to organ transplantation is systematically lower for Black patients with end-stage renal disease than for their White counterparts,43 possibly owing in part to physician bias.44 Attention to race/ethnicity is important not only for documenting disparities; interventions designed to reduce disparities have been demonstrated to improve outcomes.45

CONCLUSIONS

Considering genetic ancestry in addition to self-identified race/ethnicity has improved our understanding of disease and facilitated the development of interventions. But for many conditions, the relative importance of bias, racial discrimination, culture, socioeconomic status, access to care, environmental factors, and genetics to racial/ethnic differences in disease has not been adequately studied. The combination of these influential correlates of health is captured, albeit imperfectly, by the variable of race/ethnicity, and ignoring it would be counterproductive.

Indeed, we contend that the epidemiologic importance of race/ethnicity will never disappear. Genetic research has advanced our understanding of human disease and therapies that, if made available equitably, could advance care and promote health equity in all groups. But we also recognize that financial, privacy, and societal costs associated with advances in genetics and medicine could exacerbate racial/ethnic health inequities. Therefore, ignoring race and ethnicity in biomedical research and medicine is not the answer to the health-inequity epidemic. Instead, scientists and clinicians should continue to use racial/ethnic categories to address and eliminate health inequities until better predictors are available.

By attending to these issues, we can further elucidate variations in disease onset, progression, and severity among and within racial/ethnic groups. Furthermore, given the emergence of precision medicine and the persistent salience of overt racism, abandoning race/ethnicity without substituting better disease predictors not only is irresponsible but also ignores the reality of U.S. social stratification and its implications for population health.

Footnotes

Disclosure forms provided by the authors are available at NEJM.org.

Contributor Information

Luisa N. Borrell, Department of Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York.

Jennifer R. Elhawary, Department of Medicine, California.

Elena Fuentes‑Afflick, Department of Pediatrics, California

Jonathan Witonsky, Department of Medicine, Department of Pediatrics, California

Nirav Bhakta, Department of Medicine, California

Alan H.B. Wu, Department of Laboratory Medicine, California

Kirsten Bibbins‑Domingo, Department of Epidemiology and Biostatistics, Priscilla Chan and Mark Zuckerberg San Francisco General Hospital, California

José R. Rodríguez‑Santana, Centro de Neumología Pediátrica, San Juan, PR

Michael A. Lenoir, University of California, San Francisco, San Francisco, Bay Area Pediatrics, Oakland, California

James R. Gavin, III, Emory University School of Medicine, Atlanta

Rick A. Kittles, Department of Population Sciences, City of Hope Comprehensive Cancer Center, Duarte, California

Noah A. Zaitlen, Department of Neurogenetics, University of California, Los Angeles, Los Angeles, California

David S. Wilkes, School of Medicine, University of Virginia, Charlottesville

Neil R. Powe, Department of Medicine, Priscilla Chan and Mark Zuckerberg San Francisco General Hospital, California

Elad Ziv, Department of Medicine, Division of General Internal Medicine and the Institute of Human Genetics, Helen Diller Family Comprehensive Cancer Center, California

Esteban G. Burchard, Department of Medicine, Department of Bioengineering and Therapeutic Sciences, California.
