Abstract
Black, Hispanic, and Indigenous persons in the United States have an increased risk of SARS-CoV-2 infection and death from COVID-19, due to persistent social inequities. Yet the magnitude of the disparity is unclear because race/ethnicity information is often missing in surveillance data. We quantified the burden of SARS-CoV-2 notification, hospitalization, and case fatality rates in an urban county by racial/ethnic group using combined race/ethnicity imputation and quantitative bias analysis for misclassification. Compared with the complete case analysis, the absolute racial/ethnic disparity in notification rates after bias adjustment increased 1.3-fold and 1.6-fold for classified Black and Hispanic persons, respectively, in reference to classified White persons. These results highlight that complete case analyses may underestimate absolute disparities in notification rates. Complete reporting of race/ethnicity information is necessary for health equity. When data are missing, quantitative bias analysis methods may improve estimates of racial/ethnic disparities in the COVID-19 burden.
Keywords: SARS-CoV-2, COVID-19, missing data, bias analysis, race/ethnicity disparities, surveillance
Introduction
In the United States, early surveillance reports highlight that persons of Hispanic, Black, and American Indigenous race and ethnicity are disproportionately affected by the COVID-19 pandemic.1 These disparities arise from historical and contemporary social and health inequities that result from structural racism, including racial capitalism—the systemic exploitation of Black, Indigenous, and People of Color by predominantly White institutions for social and economic gain.2–5 In the COVID-19 pandemic, racial capitalism produces structurally unequal exposure to (and protection from) SARS-CoV-2 infection.3
The role of systemic racism in the pandemic motivates the need for accurate surveillance of racial/ethnic disparities in SARS-CoV-2 infection and death. However, there are challenges in estimating COVID-19 racial/ethnic disparities.6,7 Although reports highlight the unequal burden across racial/ethnic groups, the magnitude of disparities is uncertain due to missing race/ethnicity information in surveillance data. In recent reports, race/ethnicity was missing in 56% of confirmed infections nationally, and in 36% in Georgia.8,9 Current surveillance estimates are reported as complete case analyses, which exclude cases with missing race/ethnicity.1,6,9,10 Complete case analyses may bias racial/ethnic disparity estimates if race/ethnicity information is not missing completely at random.11
Beginning in August 2020, the Department of Health and Human Services issued COVID-19 reporting guidelines requiring all labs to report race/ethnicity.12 These guidelines seek to address missing data moving forward, but fail to address missing information for case-patients identified before August. Collecting race/ethnicity information at the time of testing is essential for improving our understanding of, and ultimately addressing, racial/ethnic health disparities. Until complete reporting becomes routine, imputation of missing race/ethnicity, combined with quantitative bias analysis to account for misclassification of the imputed race/ethnicity, can improve estimates of the COVID-19 burden among racial/ethnic groups.13 In this study, we calculate SARS-CoV-2 notification, hospitalization, and case fatality rates by race/ethnicity group and report the absolute racial/ethnic disparities in SARS-CoV-2 notification rates in Fulton County, Georgia, accounting for missing race/ethnicity information.
Methods
Fulton County, Georgia includes the city of Atlanta, and its residents identify as Black (44%), White (40%), Hispanic (7%), Asian (7%), and Other races/ethnicities (2%).14 Between 2 March 2020 and 18 August 2020, 19,623 cases of SARS-CoV-2 infection were reported among Fulton County residents. Case reports included the case-patient’s residential address, full name, race/ethnicity, hospitalization (yes/no/unknown), and death (yes/no/unknown). We use the term “case-patient” to capture the definitions of both case (an occurrence of a clinical condition) and patient (an individual with a clinical condition); the term is commonly used in surveillance of disease outbreaks by public health organizations.15 Fulton County Board of Health staff geocoded case-patients’ addresses to census block groups. For this analysis, we categorized reported race/ethnicity as Hispanic (any race), and non-Hispanic Black, Asian, White, or Other. The Other race/ethnicity category included Indigenous Americans (1.3%), Native Hawaiian/Other Pacific Islanders (1.5%), and those who reported their race as “Other” (97%).16
We used quantitative bias analysis to account for missing race/ethnicity. Quantitative bias analysis entails imputation of race/ethnicity for case-patients who were missing this information, and then bias-adjusting estimates to account for the imputation algorithm’s misclassification of race/ethnicity. Hereafter, we refer to race/ethnicity as reported when provided in case-patient records, imputed when referring to the imputed case-patient race/ethnicity, and classified when referring to the combined reported and imputed race/ethnicity after bias adjustment.
First, for all case-patients with complete race/ethnicity information (n=12,492, 64%), we imputed race/ethnicity using the Bayesian Improved Surname Geocoding method to validate this method in our study population and to generate estimated values for bias parameters to be used in the quantitative bias analysis.14 The Bayesian Improved Surname Geocoding method is the current standard method for race/ethnicity prediction.17,18 This method estimates the probability that a person belongs to one of five racial/ethnic groups (Black, Hispanic, Asian, White, or Other) based on the person’s surname and residential census block group, the population distribution of race/ethnicity in that census block group, and the race/ethnicity associated with a national list of surnames. The approach was previously validated with data from nearly 2 million individuals, and imputed race/ethnicity was correlated (0.76) with self-reported race/ethnicity.17,18 However, replication has been inconsistent across other studies.19 We addressed imperfect imputation with probabilistic bias analysis. Imputation was performed using the R package “wru,” which includes the 2010 surname census distribution. The geographic distribution of race/ethnicity came from the 2018 5-year American Community Survey.20,21 We calculated predictive values (PV) for each imputed racial/ethnic group using reported race/ethnicity as the gold standard. The PV is the probability that a person imputed to a given race/ethnicity group was reported as belonging to that group.13
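The Bayes update at the core of Bayesian Improved Surname Geocoding can be illustrated with a minimal Python sketch. It is not the wru implementation (the analysis itself used R), and every probability below is a hypothetical placeholder rather than a value from the census surname list or the American Community Survey. The function name `bisg_posterior` and the rule of assigning the most probable group are likewise illustrative assumptions.

```python
GROUPS = ["Black", "Hispanic", "Asian", "White", "Other"]


def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Return P(race | surname, geography) via Bayes' rule.

    Assumes surname and geography are conditionally independent given race,
    which is the usual simplifying assumption behind this kind of update.
    """
    unnormalized = {
        g: p_race_given_surname[g] * p_geo_given_race[g] for g in GROUPS
    }
    total = sum(unnormalized.values())
    return {g: value / total for g, value in unnormalized.items()}


# Hypothetical inputs for one case-patient (placeholders, not real data).
# P(race | surname): share of people with this surname in each group.
surname_probs = {"Black": 0.30, "Hispanic": 0.05, "Asian": 0.02,
                 "White": 0.60, "Other": 0.03}
# P(block group | race): share of each group's population living in the
# case-patient's census block group.
geo_probs = {"Black": 0.0008, "Hispanic": 0.0002, "Asian": 0.0001,
             "White": 0.0002, "Other": 0.0003}

posterior = bisg_posterior(surname_probs, geo_probs)
# Assumption: impute the group with the highest posterior probability.
imputed_group = max(posterior, key=posterior.get)
print(imputed_group, round(posterior[imputed_group], 2))
```

In this made-up example the surname alone favors White, but the block-group evidence shifts the posterior toward Black, which is the kind of combination the method relies on.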
Second, among case-patients with missing race/ethnicity, we imputed the race/ethnicity category and used the PVs from the validation study to bias-adjust for the expected misclassification of the imputed race/ethnicity groups. We assigned each race/ethnicity group PV from the validation study to a Dirichlet distribution (Table 1).13,22,23 Among those with imputed race/ethnicity, we reclassified individuals over 100,000 iterations using probabilistic bias analysis.13 The approach uses Monte Carlo sampling techniques to generate frequency distributions of the bias-adjusted estimates to account for inaccurate assignment of case-patients to a race/ethnicity group by the Bayesian Improved Surname Geocoding method. Sampling error was incorporated into the estimates using bootstrap approximation from a standard normal distribution.13
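One plausible reading of this reclassification step is sketched below in Python (the analysis itself used R). For each imputed group, the validation counts from Table 1 serve as Dirichlet parameters; each iteration draws a set of reclassification probabilities and then reassigns every imputed case-patient. The imputed counts among case-patients with missing race/ethnicity are hypothetical placeholders (they only sum to the 7,131 missing records), the iteration count is reduced for speed, and the separate sampling-error step is omitted.

```python
import random
import statistics

GROUPS = ["Black", "Hispanic", "Asian", "White", "Other"]

# Table 1 columns: for each IMPUTED group, counts by REPORTED group
# in the order of GROUPS. Used here as Dirichlet parameters (assumption).
VALIDATION = {
    "Black":    [5106, 77, 16, 192, 135],
    "Hispanic": [68, 1288, 15, 103, 69],
    "Asian":    [13, 16, 145, 28, 12],
    "White":    [1754, 230, 80, 2818, 303],
    "Other":    [11, 6, 4, 2, 1],
}

# HYPOTHETICAL imputed counts among the 7,131 records with missing data.
HYPOTHETICAL_IMPUTED = {"Black": 3000, "Hispanic": 1000, "Asian": 150,
                        "White": 2950, "Other": 31}


def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution via normalized gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def reclassify_once(imputed_counts, rng):
    """One Monte Carlo iteration: reassign every imputed case-patient."""
    classified = dict.fromkeys(GROUPS, 0)
    for imputed_group, n in imputed_counts.items():
        probs = sample_dirichlet(VALIDATION[imputed_group], rng)
        for group in rng.choices(GROUPS, weights=probs, k=n):
            classified[group] += 1
    return classified


def simulation_interval(values):
    """Median with approximate 2.5th and 97.5th percentiles."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return (statistics.median(ordered),
            ordered[int(0.025 * last)],
            ordered[int(0.975 * last)])


N_ITERATIONS = 200  # the paper reports 100,000; reduced for illustration
rng = random.Random(2020)
results = [reclassify_once(HYPOTHETICAL_IMPUTED, rng)
           for _ in range(N_ITERATIONS)]
black_summary = simulation_interval([r["Black"] for r in results])
print("Classified Black among imputed (median, 2.5%, 97.5%):", black_summary)
```

Adding these reclassified counts to the reported counts would give the classified totals per iteration, from which medians and simulation intervals of the rates could be taken.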
Table 1:
Reported Race/Ethnicity | Imputed Black | Imputed Hispanic | Imputed Asian | Imputed White | Imputed Other |
---|---|---|---|---|---|
Black | 5106 | 68 | 13 | 1754 | 11 |
Hispanic | 77 | 1288 | 16 | 230 | 6 |
Asian | 16 | 15 | 145 | 80 | 4 |
White | 192 | 103 | 28 | 2818 | 2 |
Other | 135 | 69 | 12 | 303 | 1 |
Total | 5,526 | 1,543 | 214 | 5,185 | 24 |
PV % (95% CI) | 92% (92%, 93%) | 83% (82%, 85%) | 68% (61%, 74%) | 54% (53%, 56%) | 3.0% (0.1%, 15%) |
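As a check on how the predictive values relate to the validation counts, the following Python sketch recomputes each PV as the diagonal cell divided by its imputed-column total, using the counts from Table 1. The function name is illustrative, and no confidence intervals are computed because the interval method is not stated in the text.

```python
GROUPS = ["Black", "Hispanic", "Asian", "White", "Other"]

# Table 1 counts. Keys are REPORTED groups; list positions follow GROUPS
# and give the IMPUTED group.
TABLE1 = {
    "Black":    [5106, 68, 13, 1754, 11],
    "Hispanic": [77, 1288, 16, 230, 6],
    "Asian":    [16, 15, 145, 80, 4],
    "White":    [192, 103, 28, 2818, 2],
    "Other":    [135, 69, 12, 303, 1],
}


def predictive_values(table):
    """PV per imputed group: correctly imputed / all imputed to that group."""
    pvs = {}
    for j, imputed in enumerate(GROUPS):
        column_total = sum(table[reported][j] for reported in GROUPS)
        pvs[imputed] = table[imputed][j] / column_total
    return pvs


pv = predictive_values(TABLE1)
for group in GROUPS:
    print(f"{group}: {100 * pv[group]:.1f}%")
```

The recomputed values for Black, Hispanic, Asian, and White round to the tabulated 92%, 83%, 68%, and 54%. For the Other column the raw ratio is 1/24, so the tabulated 3.0% presumably reflects a different estimator; the sketch does not attempt to reproduce it.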
For both the complete case and bias-adjusted analyses, we calculated the SARS-CoV-2 notification rates (per 1,000 persons), hospitalization proportions (hospitalized cases/reported cases), and case fatality rates (deaths/reported cases) by race/ethnicity group. We reported 95% confidence intervals (CI) for the complete case analysis. For the bias-adjusted estimates, we reported the median with 95% simulation intervals (SI), which account for the potential misclassification of imputed race/ethnicity and sampling error. We calculated the differences in SARS-CoV-2 notification rates in each race/ethnicity group compared with persons of White race/ethnicity, among case-patients with reported race/ethnicity information, and among all case-patients after bias adjustment. To estimate the magnitude of the change in the absolute disparity after accounting for missing race/ethnicity information, we computed the relative change: the absolute disparity accounting for missing race/ethnicity divided by the absolute disparity from the complete case analysis. All analyses used R v3.6 (R Foundation for Statistical Computing, Vienna, Austria). The Georgia Department of Public Health determined this activity to be consistent with public health surveillance, which does not require informed consent or IRB approval.
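The three measures and the rate difference can be illustrated with a short Python sketch that uses the complete case counts reported in Table 2a. The function and variable names are illustrative, and confidence intervals are left out because the interval method is not specified in the text.

```python
# Complete case counts from Table 2a:
# group -> (infections, hospitalized, died, population at risk)
TABLE2A = {
    "Asian":    (260, 25, 5, 69987),
    "Hispanic": (1617, 214, 15, 74328),
    "Black":    (6952, 1192, 320, 445992),
    "White":    (3143, 312, 112, 406755),
    "Other":    (520, 30, 4, 6056),
}


def summarize(infections, hospitalized, died, at_risk):
    """Compute the three burden measures for one race/ethnicity group."""
    return {
        "notification_per_1000": 1000 * infections / at_risk,
        "hospitalized_pct": 100 * hospitalized / infections,
        "case_fatality_pct": 100 * died / infections,
    }


summary = {group: summarize(*counts) for group, counts in TABLE2A.items()}

# Absolute disparity: rate difference relative to the White reference group.
reference_rate = summary["White"]["notification_per_1000"]
rate_difference = {
    group: values["notification_per_1000"] - reference_rate
    for group, values in summary.items()
    if group != "White"
}

for group, values in summary.items():
    print(group, {name: round(value, 1) for name, value in values.items()})
print({group: round(rd, 1) for group, rd in rate_difference.items()})
```

The same functions would apply to the bias-adjusted analysis by substituting the classified infection counts from each Monte Carlo iteration for the reported counts.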
Results
Among the 19,623 cases reported in Fulton County from 2 March to 18 August 2020, 7,131 (36%) were missing race/ethnicity information in the case report. Data were more complete among the 1,776 hospitalized case-patients, where only 14 (3.5%) were missing race/ethnicity information. All deceased case-patients (n=456) had complete information on race/ethnicity.
Comparison of reported versus imputed race/ethnicity group showed that the algorithm’s imputation accuracy varied by race/ethnicity group (Table 1). Of the 5,526 persons who were imputed as Black race/ethnicity, 92% (95%CI: 92%, 93%) were reported as Black in case reports. Among persons imputed as Hispanic ethnicity, 83% (95%CI: 82%, 85%) were reported as Hispanic. The algorithm was less accurate for case-patients with race/ethnicity imputed as Asian (PV=68%, 95%CI: 61%, 74%) and as White (PV=54%, 95%CI: 53%, 56%). The PV estimates for racial/ethnic groups changed over time, likely due to changes in the prevalence of demographic groups affected by the pandemic over time (Supplemental Table 1).
In both the complete case and bias-adjusted analyses, the SARS-CoV-2 notification rates were highest among those classified as Other, followed by Hispanic, Black, White, and Asian (Tables 2a and 2b). Imputation and bias adjustment yielded higher estimates of notification rates for each racial/ethnic group than complete case analysis because more case-patients were included in the numerator. Estimated notification rates increased 1.8-fold for persons classified as Asian, 1.7-fold for White, 1.7-fold for Hispanic, 1.6-fold for Other, and 1.5-fold for Black. Hospitalization proportions and case fatality rates decreased across all race/ethnicity groups with bias adjustment compared with the complete case analyses, because more cases were included in the denominator. In both the complete case and bias-adjusted analyses, case-patients who were classified as Black race/ethnicity had the highest hospitalization proportions (complete case: 17%, 95%CI: 16%, 18%; bias-adjusted: 12%, 95%SI: 11%, 12%) and case fatality rates (complete case: 4.6%, 95%CI: 4.1%, 5.1%; bias-adjusted: 3.1%, 95%SI: 2.8%, 3.4%).
Table 2a:
Race/Ethnicity | Total infections | Hospitalized | Died | At Risk^a | Notification rate per 1,000 (95%CI) | Hospitalized proportion, % (95%CI) | Case fatality rate, % (95%CI) |
---|---|---|---|---|---|---|---|
Asian | 260 | 25 | 5 | 69987 | 3.7 (3.3, 4.2) | 9.6 (6.2, 14) | 1.9 (0.4, 3.8) |
Hispanic | 1617 | 214 | 15 | 74328 | 22 (21, 23) | 13 (12, 15) | 0.9 (0.5, 1.4) |
Black | 6952 | 1192 | 320 | 445992 | 16 (15, 16) | 17 (16, 18) | 4.6 (4.1, 5.1) |
White | 3143 | 312 | 112 | 406755 | 7.7 (7.5, 8.0) | 9.9 (8.9, 11) | 3.6 (2.9, 4.2) |
Other | 520 | 30 | 4 | 6056 | 86 (79, 93) | 5.8 (3.8, 7.9) | 0.8 (0.2, 1.5) |
Table 2b:
Race/Ethnicity | Total infections (95%SI) | Hospitalized | Died | At Risk^a | Notification rate per 1,000 (95%SI) | Hospitalized proportion, % (95%SI) | Case fatality rate, % (95%SI) |
---|---|---|---|---|---|---|---|
Asian | 456 (438, 474) | 25 | 5 | 69987 | 6.5 (5.9, 7.2) | 5.5 (3.4, 7.6) | 1.1 (0.1, 2.1) |
Hispanic | 2,687 (2,657, 2,717) | 214 | 15 | 74328 | 36 (35, 38) | 8.0 (6.9, 9.0) | 0.6 (0.3, 0.8) |
Black | 10,351 (10,301, 10,402) | 1195 | 320 | 445992 | 23 (23, 24) | 12 (11, 12) | 3.1 (2.8, 3.4) |
White | 5,284 (5,232, 5,337) | 312 | 112 | 406755 | 13 (13, 13) | 5.9 (5.3, 6.5) | 2.1 (1.7, 2.5) |
Other | 844 (817, 873) | 30 | 4 | 6056 | 139 (130, 149) | 3.6 (2.3, 4.8) | 0.5 (0.0, 0.9) |
^a American Community Survey 5-year 2018 estimates
The magnitude of the absolute disparity—difference in SARS-CoV-2 notification rates for case-patients classified in each race/ethnicity group compared with case-patients classified as White—increased in the bias-adjusted analysis relative to the complete case analysis for nearly all race/ethnicity groups (Table 3). When comparing bias-adjusted with complete case results, the absolute disparity in notification rates increased 1.3-fold among classified Black and 1.6-fold among classified Hispanic race/ethnicity groups in reference to case-patients classified as White.
Table 3:
Race/Ethnicity | Complete case: notification rate per 1,000 (95%CI) | Complete case: RD per 1,000 (95%CI) | Bias-adjusted: notification rate per 1,000 (95%SI) | Bias-adjusted: RD per 1,000 (95%SI) | Relative change in magnitude of disparity^a |
---|---|---|---|---|---|
Asian | 3.7 (3.3, 4.2) | −4.0 (−4.5, −3.5) | 6.5 (5.9, 7.2) | −6.5 (−6.8, −6.2) | 0.6 |
Hispanic | 22 (21, 23) | 14 (13, 15) | 36 (35, 38) | 23 (23, 23) | 1.7 |
Black | 16 (15, 16) | 7.9 (7.4, 8.3) | 23 (23, 24) | 10 (10, 10) | 1.3 |
White | 7.7 (7.5, 8.0) | Reference | 13 (13, 13) | Reference | |
Other | 86 (79, 93) | 78 (71, 85) | 139 (130, 149) | 126 (122, 131) | 1.6 |
^a Estimated as the ratio of the bias-adjusted absolute disparity to the complete case absolute disparity
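The relative change is a simple ratio of two rate differences. The Python sketch below applies it to the rounded rate differences shown in Table 3 for the Black and Hispanic groups; the function name is illustrative. Because the inputs are already rounded, a recomputed ratio can differ in the last digit from a value computed on unrounded estimates.

```python
def relative_change(bias_adjusted_rd, complete_case_rd):
    """Ratio of the bias-adjusted to the complete case absolute disparity."""
    return bias_adjusted_rd / complete_case_rd


# Rounded rate differences per 1,000 from Table 3:
# group -> (complete case RD, bias-adjusted RD)
TABLE3_RD = {
    "Black": (7.9, 10),
    "Hispanic": (14, 23),
}

relative = {
    group: relative_change(adjusted, complete)
    for group, (complete, adjusted) in TABLE3_RD.items()
}
for group, value in relative.items():
    print(f"{group}: {value:.1f}-fold")
```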
Discussion
In this study, accounting for missing race/ethnicity information revealed greater differences in SARS-CoV-2 notification rates comparing most racial/ethnic groups with case-patients classified as White race. These results suggest that national estimates, which exclude case-patients with missing race/ethnicity information, may underestimate the magnitude of absolute racial/ethnic disparities in COVID-19 morbidity and mortality.7,9
Our results underscore the need for imputation combined with bias adjustment. In our study population, the PV estimates indicated that imputation without bias adjustment overestimated infections among case-patients classified as White and underestimated infections among case-patients classified as Black (Table 1). Since race/ethnicity information is relatively complete for hospitalized and deceased cases, an analysis based on imputed race/ethnicity without bias adjustment would underestimate the hospitalized proportions and case fatality rates in classified White case-patients and overestimate these measures in classified Black case-patients. Our bias-adjusted estimates account for this expected misclassification.
Notably, both the complete case analysis and the bias-adjusted estimates demonstrate important absolute racial/ethnic disparities in the notification rates. The bias-adjusted estimates do not change our understanding of the direction of racial/ethnic disparities in the COVID-19 pandemic; however, the magnitude of racial/ethnic disparities changed meaningfully after bias adjustment. In contrast, the hospitalization proportions and case fatality rates decreased across all classified race/ethnicity groups after accounting for missing race/ethnicity information because few hospitalized or deceased case-patients were missing race/ethnicity information. These results highlight the need for more complete reporting so that health equity and racial justice efforts aimed at addressing these disparities operate on the most accurate data possible.
The imputation of race/ethnicity has limitations. The Bayesian Improved Surname Geocoding algorithm limits the racial/ethnic groups that can be imputed to Black, Hispanic, Asian, White, or Other.16–18 The reliance on an “Other” category is problematic for identifying and addressing disparities in other racial/ethnic populations (e.g., Indigenous populations). Future studies should explore how accounting for missing race/ethnicity affects other disease burden measures. Additionally, we assumed that the Bayesian Improved Surname Geocoding algorithm performs equally well among those with reported race/ethnicity as among those with missing race/ethnicity. Given that the data used to inform the imputed race/ethnicity are external to the study population, this is a reasonable assumption.16–18 Last, our results are conditioned on being tested. Although testing capacity has increased across most states, it was difficult to receive testing at the beginning of the pandemic. Therefore, our estimates of disparities in SARS-CoV-2 notification rates may not fully capture the underlying disparities in SARS-CoV-2 infection rates.
Our findings emphasize the importance of collecting complete race/ethnicity data at the time of testing, for the current pandemic and future outbreaks. When data are missing, Bayesian Improved Surname Geocoding combined with quantitative bias analysis may provide better estimates of the racial/ethnic disparities in SARS-CoV-2 notification rates, hospitalization proportions, and case fatality rates.
Supplementary Material
Financial Support:
This work was supported in part by the US National Institutes of Health F31CA239566 (PI L. J. Collin), R01LM013049 (PI T. L. Lash), and K24AI114444 (PI N. R. Gandhi). It was also supported by a grant from the Robert W. Woodruff Foundation (PI A. Chamberlain). K. Labgold is supported in part by the Center for Reproductive Health Research in the Southeast (RISE) Doctoral Fellowship and an ARCS Foundation Award. S. Hamid was supported in part by the HAPIN trial, which is funded by the U.S. National Institutes of Health (cooperative agreement 1UM1HL134590) in collaboration with the Bill & Melinda Gates Foundation (OPP1131279). L. J. Collin was also supported in part by TL1TR002540 from the National Center for Advancing Translational Sciences of the National Institutes of Health.
Footnotes
Conflicts of Interest: The authors have no conflicts of interest to declare.
Data Access: Due to patient confidentiality, data are only available upon request from the Fulton County Board of Health and with IRB approval from the Georgia Department of Public Health. Example code used to perform the imputation and bias adjustment is available on GitHub (https://github.com/lcolli5/Adaptive-Validation).
References
- 1. Stokes EK, Zambrano LD, Anderson KN, et al. Coronavirus Disease 2019 Case Surveillance - United States, January 22-May 30, 2020. MMWR Morb Mortal Wkly Rep. 2020;69(24):759–765. doi: 10.15585/mmwr.mm6924e2
- 2. Health Equity Considerations & Racial & Ethnic Minority Groups. National Center for Immunization and Respiratory Diseases (NCIRD), Division of Viral Diseases. https://www.cdc.gov/coronavirus/2019-ncov/need-extra-precautions/racial-ethnic-minorities.html. Published 2020. Accessed July 17, 2020.
- 3. McClure ES, Vasudevan P, Bailey Z, Patel S, Robinson WR. Racial Capitalism within Public Health: How Occupational Settings Drive COVID-19 Disparities. Am J Epidemiol. 2020:113–120.
- 4. Egede LE, Walker RJ. Structural Racism, Social Risk Factors, and Covid-19 — A Dangerous Convergence for Black Americans. N Engl J Med. 2020;383(12):e77(1)–e77(3). doi: 10.1056/NEJMp2023616
- 5. Laster Pirtle WN. Racial Capitalism: A Fundamental Cause of Novel Coronavirus (COVID-19) Pandemic Inequities in the United States. Heal Educ Behav. 2020;47(4):504–508. doi: 10.1177/1090198120922942
- 6. Servik K. ‘Huge hole’ in COVID-19 testing data makes it harder to study racial disparities. Science. July 2020. doi: 10.1126/science.abd7715
- 7. Cowger TL, Davis BA, Etkins OS, et al. Comparison of Weighted and Unweighted Population Data to Assess Inequities in Coronavirus Disease 2019 Deaths by Race/Ethnicity Reported by the US Centers for Disease Control and Prevention. JAMA Netw open. 2020;3(7):e2016933. doi: 10.1001/jamanetworkopen.2020.16933
- 8. Georgia Department of Public Health. COVID-19 Daily Status Report. https://dph.georgia.gov/covid-19-daily-status-report. Published 2020. Accessed July 18, 2020.
- 9. Oppel R, Gebelhoff R, Lai K, Wright W, Smith M. The Fullest Look Yet at the Racial Inequity of Coronavirus. New York Times. https://www.nytimes.com/interactive/2020/07/05/us/coronavirus-latinos-african-americans-cdc-data.html?campaign_id=2&emc=edit_th_20200706&instance_id=20039&nl=todaysheadlines&regi_id=71026656&segment_id=32674&user_id=c99fb3a6b3b754c. Published 2020. Accessed July 18, 2020.
- 10. Wu SL, Mertens AN, Crider YS, et al. Substantial underestimation of SARS-CoV-2 infection in the United States. Nat Commun. 2020. doi: 10.1038/s41467-020-18272-4
- 11. Perkins NJ, Cole SR, Harel O, et al. Principled Approaches to Missing Data in Epidemiologic Studies. Am J Epidemiol. 2017;187(3):568–575. doi: 10.1093/aje/kwx348
- 12. The Coronavirus Aid, Relief, and Economic Security (CARES) Act. United States; 2020.
- 13. Lash TL, Fox MP, Fink AK. Applying Quantitative Bias Analysis to Epidemiologic Data. (Gail M, Krickeberg K, Samet J, Tsiatis A, Wong W, eds.). New York: Springer; 2009. doi: 10.1007/978-0-387-87959-8
- 14. American Community Survey: Hispanic or Latino Origin by Race. The United States Census Bureau. https://data.census.gov/cedsci/table?t=RaceandEthnicity&g=0500000US13121&tid=ACSDT5Y2018.B03002&moe=false&hidePreview=true. Published 2020. Accessed August 19, 2020.
- 15. Panter M. Case Reports: Terminology and Phrasing.
- 16. The United States Census Bureau. American Community Survey and Puerto Rico Community Survey: 2018 Subject Definitions. 2018. https://www2.census.gov/programs-surveys/acs/tech_docs/subject_definitions/2018_ACSSubjectDefinitions.pdf.
- 17. Elliott MN, Fremont A, Morrison PA, Pantoja P, Lurie N. A new method for estimating race/ethnicity and associated disparities where administrative records lack self-reported race/ethnicity. Health Serv Res. 2008;43(5 P1):1722–1736. doi: 10.1111/j.1475-6773.2008.00854.x
- 18. Elliott M, Morrison PA, Fremont A, McCaffrey D, Pantoja P, Lurie N. Using the Census Bureau’s Surname List to Improve Estimates of Race/Ethnicity and Associated Disparities. Heal Serv Outcomes Res Methodol. 2009;9(2):69–83. https://www.rand.org/pubs/external_publications/EP20090611.html.
- 19. Adjaye-Gbewonyo D, Bednarczyk RA, Davis RL, Omer SB. Using the bayesian improved surname geocoding method (BISG) to create a working classification of race and ethnicity in a diverse managed care population: A validation study. Health Serv Res. 2014;49(1):268–283. doi: 10.1111/1475-6773.12089
- 20. About the American Community Survey. US Census Bureau. https://www.census.gov/programs-surveys/acs/about.html. Published 2020. Accessed July 8, 2020.
- 21. Khanna K, Imai K. Package ‘wru’: Who are You? Bayesian Prediction of Racial Category Using Surname and Geolocation. 2019. https://cran.r-project.org/web/packages/wru/wru.pdf.
- 22. Lash TL, Fox MP, Thwin SS, et al. Using probabilistic corrections to account for abstractor agreement in medical record reviews. Am J Epidemiol. 2007;165(12):1454–1461. doi: 10.1093/aje/kwm034
- 23. Maclehose RF, Bodnar LM, Meyer C, Chu H, Lash TL. Hierarchical semi-Bayes methods for misclassification in perinatal epidemiology. Epidemiology. 2018;29(2):183–190. doi: 10.1097/EDE.0000000000000789