2022 Mar 1;17(3):e0264270. doi: 10.1371/journal.pone.0264270

Table 4. General recommendations for implementing a name-based inference of race for U.S. authors.

Given names
Do’s: Use only family names from the U.S. Census to avoid bias.
Don’ts: Do not use given names, except when the underlying distribution of your dataset matches that of the mortgage data.

Thresholding
Do’s: Treat each person in your data as a probability distribution over racial groups, and adapt your summary statistics accordingly.
Don’ts: Do not use a threshold to classify each person into a single category. This under-represents the Black population, because name informativeness is correlated with racial group.

Imputation
Do’s: If needed, first calculate the aggregate distribution over your own dataset, and use it to impute missing cases. Acknowledge the potential bias introduced by imputation.
Don’ts: Do not use the census aggregate distribution for imputation, except when your target population matches the U.S. population.
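The thresholding recommendation can be illustrated with a minimal Python sketch. It assumes each author already has a probability vector over four groups (for example, from a family-name lookup). The group labels, the function names `threshold_counts` and `expected_counts`, the 0.9 cutoff, and the probability values are all hypothetical and chosen only to show the mechanism; they are not census figures.

```python
# Hypothetical group labels used for illustration.
GROUPS = ("asian", "black", "hispanic", "white")

# Hypothetical per-author probability vectors, e.g. from a family-name lookup.
# Illustrative values only; NOT taken from the U.S. Census.
authors = [
    {"asian": 0.01, "black": 0.02, "hispanic": 0.02, "white": 0.95},
    {"asian": 0.01, "black": 0.02, "hispanic": 0.02, "white": 0.95},
    {"asian": 0.01, "black": 0.45, "hispanic": 0.04, "white": 0.50},
    {"asian": 0.01, "black": 0.55, "hispanic": 0.04, "white": 0.40},
]


def threshold_counts(dists, threshold=0.9):
    """Categorical approach (discouraged): count a person in their most likely
    group only if that probability reaches the threshold; else 'unassigned'."""
    counts = {g: 0 for g in GROUPS}
    counts["unassigned"] = 0
    for d in dists:
        group = max(GROUPS, key=lambda g: d[g])
        if d[group] >= threshold:
            counts[group] += 1
        else:
            counts["unassigned"] += 1
    return counts


def expected_counts(dists):
    """Distributional approach: sum each person's probabilities per group,
    giving the expected number of people in each group."""
    return {g: sum(d[g] for d in dists) for g in GROUPS}


if __name__ == "__main__":
    print(threshold_counts(authors))
    print({g: round(v, 2) for g, v in expected_counts(authors).items()})
```

With these toy inputs, the thresholded count places no one in the Black group, because the two authors whose names are least informative fall below the cutoff, while the summed probabilities still retain about one expected Black author. Shares for summary statistics can then be taken as each expected count divided by the number of authors.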
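The imputation recommendation can be sketched in Python as follows. The sketch assumes that authors whose names could not be matched are represented as `None`, and that matched authors carry a probability vector over four groups. The function names `dataset_aggregate` and `impute_missing`, the group labels, and the example values are hypothetical. As the table advises, the aggregate comes from the matched authors in the dataset itself rather than from national census totals.

```python
from typing import Dict, List, Optional

# Hypothetical group labels used for illustration.
GROUPS = ("asian", "black", "hispanic", "white")

Dist = Dict[str, float]  # maps group label -> probability


def dataset_aggregate(dists: List[Dist]) -> Dist:
    """Mean probability vector over the matched authors in *this* dataset
    (not the national census aggregate)."""
    n = len(dists)
    return {g: sum(d[g] for d in dists) / n for g in GROUPS}


def impute_missing(dists: List[Optional[Dist]]) -> List[Dist]:
    """Give every unmatched author (None) a copy of the dataset's aggregate
    distribution; matched authors are returned unchanged."""
    observed = [d for d in dists if d is not None]
    if not observed:
        raise ValueError("no matched names to build an aggregate from")
    agg = dataset_aggregate(observed)
    return [dict(agg) if d is None else d for d in dists]


# Hypothetical example: two matched authors and one whose family name was not found.
authors = [
    {"asian": 0.01, "black": 0.02, "hispanic": 0.02, "white": 0.95},
    {"asian": 0.01, "black": 0.55, "hispanic": 0.04, "white": 0.40},
    None,
]
imputed = impute_missing(authors)
```

Because every unmatched author receives the same averaged vector, the imputed cases simply mirror the matched ones. If names go unmatched more often in some groups than in others, the imputed totals will be skewed, which is the potential bias the table asks you to acknowledge.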