Author manuscript; available in PMC: 2023 Jun 15.
Published in final edited form as: Biol Psychiatry. 2022 Apr 26;91(12):e51–e52. doi: 10.1016/j.biopsych.2022.02.953

From Evolutionary History to the Concepts of Race and Ancestry: Shifting our Perspective in Clinical Research

Angela M Haeny 1, Renato Polimanti 1,2
PMCID: PMC9527646  NIHMSID: NIHMS1838177  PMID: 35483984

Biomedical studies continue to confound race with ancestry and treat the former as a biological indicator of risk as opposed to a social construct (1-4). Thus, research guidelines are needed to ensure researchers understand the difference between racial categories defined by historical events and ancestry reflecting the continuum of human diversity (5).

Race in the U.S. is a social caste system that groups people based on shared physical or social characteristics, privileges White people, and is ever-evolving (6). According to the U.S. Census Bureau, race can be classified into five major categories: Black/African American, American Indian/Alaska Native, Asian, Pacific Islander/Native Hawaiian, and White/European American (8). Further, the U.S. Census Bureau recognizes only one ethnicity: Hispanic/Latine,i though Latine was considered a racial group in the past (9). The Bureau offers an approach for re-categorizing multiracial people into a single category in the following order: Black, Indigenous, Native Hawaiian/Pacific Islander, or Asian.ii That is, if a person identifies as Black and Indigenous, they would be categorized as Black. If someone identifies as White and Asian, they would be categorized as Asian. A person would only be considered White if they do not endorse any other racial category.
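The re-categorization order described in this paragraph can be sketched as a simple priority lookup. This is only an illustration of the rule as summarized here, not the Bureau's or NESARC-III's actual code; the function name `recategorize` and the category labels are hypothetical.

```python
# Priority order as described in the text. "White" comes last because it is
# assigned only when no other category is endorsed.
# Labels are illustrative strings, not official codes.
PRIORITY = [
    "Black",
    "Indigenous",
    "Native Hawaiian/Pacific Islander",
    "Asian",
    "White",
]


def recategorize(endorsed):
    """Collapse a collection of endorsed categories into a single category."""
    endorsed = set(endorsed)
    unknown = endorsed - set(PRIORITY)
    if unknown:
        raise ValueError(f"unrecognized categories: {sorted(unknown)}")
    # Return the first category in priority order that the person endorsed.
    for category in PRIORITY:
        if category in endorsed:
            return category
    return None  # nothing endorsed


if __name__ == "__main__":
    print(recategorize({"Black", "Indigenous"}))  # Black
    print(recategorize({"White", "Asian"}))       # Asian
    print(recategorize({"White"}))                # White
```

Written this way, the rule visibly discards every endorsed category except the one highest in the priority order.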

There are multiple issues with these categorizations. First, the approach to re-categorizing multiracial people is consistent with the “one-drop rule” that anyone with a single relative who is Black would be considered Black (10), which stems from racism. Second, ethnicity represents shared food, music, language, history, and cultural traditions that distinguish groups of people (11). Thus, everyone holds an ethnic identity (e.g., Nigerian, Puerto Rican, Italian, Filipino, German, Chinese), not only people who identify as Hispanic/Latine. For Indigenous peoples, ethnic identity may be better described as tribal identity (e.g., Karuk Tribe, Penobscot Nation, Shawnee Tribe). Third, these racial/ethnic categorizations do not apply across countries, and they are fluid. How a person is racially categorized may not represent how they self-identify. Notably, although some White people may self-identify as Caucasian, this is an archaic term referring only to people from the area between the Black and the Caspian seas and should not continue to be used (11,12).

The diversity of human populations represents a continuum reflecting evolutionary history. Homo sapiens originated in Africa around 200,000-300,000 years ago. Early humans migrated across landmasses, and these migrations created bottlenecks as human genetic variation moved to different continents. As a result, populations with the same continental origin (ancestry) have higher molecular similarity to each other than to populations from other continental groups. The difference between ancestry groups accounts for ~9% of human genetic variation, while the remainder represents differences among individuals within the same population (~89%) and among populations within the same continental group (~2%) (13). Similar distributions of variation are also observed in other molecular domains, including the epigenome, transcriptome, proteome, metabolome, and microbiome (14-16).

Ancestry categories reflect a small portion of the molecular diversity of populations, which is mostly accounted for by inter-individual variability. The assignment of ancestry categories becomes more complicated for population groups affected by recent admixture events, such as those that occurred on the North American continent in the last 500 years. Individuals who self-identify as Black or Latine carry genomic segments inherited from different ancestral groups, with a variable degree of admixture. Black people inherited genomic segments from African and European ancestries with variation ranging from 10% to 90% (17). Latine individuals carry genomic segments from three ancestries: African (5% to 15%), European (45% to 80%), and Native American (10% to 50%) (18).

Thus, molecular studies should not use racial categories because they do not adequately reflect biological diversity. For example, terms related to skin color categories reflect very poorly the spectrum of skin pigmentation, which is only one of the many products of human adaptation. For these reasons, we strongly recommend that investigators focused on human biology use geographical origin (ancestry) to refer to the diversity of their participants. To adequately model the patterns observed within ancestry groups and among recently admixed populations, investigators should use statistical methods that account for human molecular variation, so that results are not biased by unaccounted population structure in the cohorts investigated.

Racially minoritized people are more likely to experience inequitable access to opportunities (e.g., quality education), goods (e.g., nutritious foods), and services (e.g., quality healthcare) (6,19), which contribute to racial disparities in health (6,20). Instead of interpreting race as a risk factor for health outcomes, race should be understood as a proxy for structural inequities. The structural factors that contribute to differences across race categories should be incorporated in studies aiming to address inequities.

We recommend that the use of race be limited to ascertaining racial disparities and ensuring representation in studies. We suggest researchers specify how race was operationalized (e.g., an open-ended self-report question or a list of options; specify the list if used). For child studies, specify whether the child or the caregiver reported the child’s race, given that these reports may differ. Longitudinal studies might re-assess race at each time point to capture the fluidity of race, given that racial identity development theories (21-23) suggest racial identity can change over time, particularly among multiracial people.
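As one way to make these reporting suggestions concrete, the sketch below stores how race was assessed alongside each response. It is a hypothetical illustration only: the `RaceAssessment` record, its field names, and the `waves_with_change` helper are assumptions introduced here, not part of any standard or of the studies cited.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RaceAssessment:
    """One race assessment for one participant at one time point (hypothetical schema)."""
    participant_id: str
    wave: int        # time point in a longitudinal study
    method: str      # "open_ended" or "list"
    reporter: str    # e.g., "self", "child", or "caregiver"
    response: str    # verbatim answer or selected option(s)
    options: Optional[Tuple[str, ...]] = None  # list shown to the respondent, if any

    def __post_init__(self):
        if self.method not in ("open_ended", "list"):
            raise ValueError("method must be 'open_ended' or 'list'")
        # If a list was used, the list itself must be recorded.
        if self.method == "list" and not self.options:
            raise ValueError("specify the list of options when method is 'list'")


def waves_with_change(assessments):
    """Return the waves at which the response differs from the previous wave."""
    ordered = sorted(assessments, key=lambda a: a.wave)
    return [cur.wave for prev, cur in zip(ordered, ordered[1:])
            if cur.response != prev.response]


if __name__ == "__main__":
    history = [
        RaceAssessment("p1", 1, "open_ended", "self", "Black"),
        RaceAssessment("p1", 2, "open_ended", "self", "Black and Indigenous"),
    ]
    print(waves_with_change(history))
```

Keeping the method, reporter, and option list with every response means the operationalization can be reported exactly, and changes across time points remain visible rather than being overwritten.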

Researchers should explain how race and ethnicity were categorized in their analyses and note the limitations. Consider the pros and cons of combining everyone who identifies as Latine, and of asking a follow-up question to determine which group multiracial people identify with most vs. using the re-categorization algorithm described above vs. maintaining a multiracial subgroup. Specifying how race and ethnicity were assessed and analyzed will inform comparisons of findings across studies.

In summary, race should be used to characterize the sample and identify disparities, whereas biological studies should use genetically informed ancestry classifications (Figure 1). A greater understanding of the implications of race- and ancestry-related terms will better inform biomedical research questions that use these constructs. Appropriately using race and ancestry in research will further the science and contribute to ending scientific racism.

Figure 1. Differentiating between race and ancestry. MeSH, Medical Subject Headings.

Acknowledgements:

Supported by R25DA035163, K23AA028515, L30DA049246, R33DA047527, R21DC018098 from the National Institutes of Health.

Footnotes

Disclosures: None

i We used the term Latine as a gender-neutral descriptor that is more widely accepted and used in the Spanish-speaking community than the term Latinx.

ii See NESARC-III Data Notes: https://www.niaaa.nih.gov/sites/default/files/NESARC-III Data Notesfinal_12_1_14.pdf

References
