Given Names | Use only family names from the U.S. Census to avoid bias. | Do not use given names unless the underlying distribution of your dataset matches that of mortgage data. |
Thresholding | Treat each person in your data as a probability distribution over racial groups and adapt your summary statistics accordingly. | Do not threshold each person into a single category: this under-represents the Black population, because name informativeness is correlated with racial group. |
Imputation | If imputation is needed, first calculate the aggregate distribution over your own dataset and use it to impute missing cases. Acknowledge the potential bias that imputation introduces. | Do not use the census aggregate distribution for imputation unless your target population matches the U.S. population. |
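
The thresholding and imputation recommendations above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the group labels, the probability values, the 0.5 cutoff, and the helper names `impute_missing`, `soft_shares`, and `thresholded_shares` are hypothetical and do not come from any real dataset or library. The sketch fills a missing case with the aggregate distribution of the dataset itself, then compares group shares computed from the full per-person distributions with shares computed after hard thresholding.

```python
import numpy as np

# Hypothetical per-person probabilities over three illustrative groups.
# These numbers are made up for the example; they are not real data.
GROUPS = ["A", "B", "C"]
probs = np.array([
    [0.90, 0.05, 0.05],
    [0.55, 0.40, 0.05],
    [0.60, 0.35, 0.05],
    [0.30, 0.65, 0.05],
    [np.nan, np.nan, np.nan],  # missing case, e.g. name not found in the lookup
])


def impute_missing(p):
    """Fill all-NaN rows with the mean distribution of the observed rows."""
    p = p.copy()
    missing = np.isnan(p).all(axis=1)
    # Aggregate over this dataset, not over an external census table.
    dataset_aggregate = p[~missing].mean(axis=0)
    p[missing] = dataset_aggregate
    return p, dataset_aggregate


def soft_shares(p):
    """Group shares obtained by averaging the per-person distributions."""
    return p.mean(axis=0)


def thresholded_shares(p, cutoff=0.5):
    """Discouraged approach: hard-assign each person to their top group
    when its probability reaches the cutoff, then count assignments."""
    top = p.argmax(axis=1)
    confident = p.max(axis=1) >= cutoff
    counts = np.bincount(top[confident], minlength=p.shape[1])
    return counts / confident.sum()


filled, aggregate = impute_missing(probs)
soft = soft_shares(filled)
hard = thresholded_shares(filled)

for name, s, h in zip(GROUPS, soft, hard):
    print(f"{name}: distribution-based share {s:.4f}, thresholded share {h:.4f}")
```

With these made-up numbers, the share of group B falls from about 0.36 under the distribution-based summary to 0.20 after thresholding, because most of its probability mass sits in rows whose top category is a different group. The imputed row is only as good as the dataset aggregate it copies, which is why the potential bias of imputation should still be acknowledged.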