BMJ Health & Care Informatics. 2023 Jul 24;30(1):e100703. doi: 10.1136/bmjhci-2022-100703

Social vulnerability and initial COVID-19 community spread in the US South: a machine learning approach

Moosa Tatar 1, Mohammad Reza Faraji 2, Fernando A Wilson 3,4,5
PMCID: PMC10373713  PMID: 37487688

Abstract

Background and objectives

More than 93 million COVID-19 cases and more than 1 million COVID-19 deaths had been reported in the USA by August 2022. The disproportionate effect of the pandemic and its severe impact on vulnerable communities raised concerns. This research aimed to identify and rank the Social Vulnerability Index (SVI) factors that are highly predictive of the spread of COVID-19 in the US South at the beginning of the pandemic.

Methods

We applied the Extreme Gradient Boosting (XGBoost) machine learning method to SVI data and the number of COVID-19 cases across all counties in the US South to predict the number of positive cases within 30 days of a county’s first case.

Results

Our results showed that the percentage of mobile homes is the most important feature in predicting the increase in COVID-19. Population density per square mile, per capita income, percentage of housing in structures with 10+ units, percentage of people below poverty and percentage of people with no high school diploma are, in that order, the next most important predictors of COVID-19 community spread.

Conclusions

SVI can help assess the vulnerability or resilience of communities to the spread of COVID-19 and can help identify communities at high risk of COVID-19 spread.

Keywords: COVID-19


WHAT IS ALREADY KNOWN ON THIS TOPIC

  • Social and economic factors influence vulnerability to infection and health outcomes, and COVID-19 has had a severe impact on vulnerable communities.

WHAT THIS STUDY ADDS

  • Percentage of mobile homes within a county, population density per square mile and per capita income are important predictors of community spread of COVID-19.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

  • The Social Vulnerability Index can help assess the resilience of communities to the spread of COVID-19 and can help identify communities at high risk of COVID-19 spread.

Introduction

More than 93 million COVID-19 cases and more than 1 million COVID-19 deaths had been reported in the USA by August 2022.1 The pandemic has disproportionately affected minority communities at the local level.2 Even at the early stages of the pandemic, the severe impact of COVID-19 on vulnerable communities raised concerns.3 Historically, poverty, inequalities and social determinants of health have facilitated the spread of infectious diseases.4 There is evidence that socioeconomic factors may influence the spatial spread of COVID-19 at the county level.5 Past pandemics have also shown that social and economic factors influence vulnerability to infection and health outcomes.6 Further, individuals residing in deprived neighbourhoods (ie, neighbourhoods with higher poverty, lower education, low housing quality and low employment rates) had a higher risk of COVID-19 infection.7 Also, a recent study analysed the association of social, economic and demographic factors with the initial spread of COVID-19 and reported that social and economic factors are strongly and positively associated with COVID-19.8

Many communities in the US South have substantial social vulnerabilities that may worsen the impact of COVID-19. In recent weeks, the US South has become a major region of community spread, ranging from Florida to Texas (figure 1). While studies suggest that policies including lockdowns and mandatory mask use are effective for controlling the spread of COVID-19 in communities,9 10 in several of these states the lack of consistent and effective public policies to mitigate infection spread has been a source of debate. In Georgia, for example, the governor filed a lawsuit (later dropped) against the mayor of Atlanta to prevent the latter’s enforcement of a mask mandate.11 The city of Atlanta is racially diverse, and its minority communities have experienced high rates of poverty and other socioeconomic vulnerabilities as well as COVID-19 community spread.

Figure 1.


County-level distribution of COVID-19 cases in the US South (August 2020). US South region includes the states of Alabama, Arkansas, Florida, Georgia, Louisiana, Mississippi, North Carolina, Oklahoma, South Carolina, Tennessee and Texas.

Social vulnerability describes how resilient communities are when confronted by disease outbreaks and natural or human-caused disasters.12 It can be used to identify the communities most at risk when faced with adverse events that may impact health (eg, disease outbreaks). Social vulnerability refers to socioeconomic and demographic factors that affect a community’s ability and power to prevent human suffering in the event of disasters or outbreaks. The Centers for Disease Control and Prevention (CDC) categorises these socioeconomic and demographic factors into four overall vulnerability domains: socioeconomic status; household composition and disability; minority status and language; and housing type and transportation.13 The Social Vulnerability Index (SVI) provides social and spatial information to help public health officials and local emergency response planners identify communities at high risk of being adversely affected during a crisis.13 This information helps communities to prepare a better response to emergency events, especially disease outbreaks.12 13 SVI was associated with increased rates of COVID-19.14 Also, counties with the highest SVI had a greater risk of COVID-19 infection and death,3 and the most vulnerable counties had higher death rates, especially at the beginning of the pandemic.15

Although racial/ethnic minority communities have been disproportionately impacted by COVID-19,3 6 16 17 it is unclear how specific social vulnerabilities faced in these communities, such as poverty and housing insecurity, contributed to the spread of infection at the beginning of the pandemic. To address this gap in knowledge, we use machine learning-based analyses of the SVI data to identify and rank SVI factors that are highly predictive of the spread of COVID-19 cases at the county level across 11 states in the US South.

Methods

Study setting and design

This machine learning-based study included COVID-19 cases and 16 social vulnerability features for all counties across 11 US states located in the South: Alabama, Arkansas, Florida, Georgia, Louisiana, Mississippi, North Carolina, Oklahoma, South Carolina, Tennessee and Texas (online supplemental figures A1, A2). To investigate the association between social vulnerability factors and the spread of COVID-19 at the county level, we use a regression-based prediction algorithm. We regress the number of COVID-19 cases 30 days after the first confirmed COVID-19 case in each county against social vulnerability features (detailed below). We chose to examine the US South because of the number of major COVID-19 ‘hot spots’ located in that region as well as the region’s long-standing historical socioeconomic inequities across minority and non-minority communities.18

Supplementary data

bmjhci-2022-100703supp001.pdf (434.2KB, pdf)

Study sample and data

We used daily COVID-19 cases from January 2020 to August 2020 from the official website of Johns Hopkins University’s Coronavirus Resource Center.1 For each county in the US South (1086 counties), we identified the number of COVID-19 cases 30 days after their first COVID-19 case was confirmed.
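As an illustration of how this outcome could be derived, the sketch below assumes that a county's data is a list of cumulative confirmed-case counts, one entry per day. The function name, the toy series and the convention that day 30 is counted from the day of the first case are all assumptions made for the example, not details taken from the study.

```python
from typing import Optional, Sequence


def cases_after_window(cumulative: Sequence[int], window_days: int = 30) -> Optional[int]:
    """Return the cumulative case count `window_days` days after the first case.

    Returns None if the county never reports a case, or if the series ends
    before the window closes.
    """
    # Index of the first day with at least one confirmed case.
    first_day = next((i for i, count in enumerate(cumulative) if count > 0), None)
    if first_day is None:
        return None
    target_day = first_day + window_days
    if target_day >= len(cumulative):
        return None
    return cumulative[target_day]


# Hypothetical county: no cases for 5 days, then 2 new cases per day.
toy_series = [0] * 5 + [1 + 2 * day for day in range(40)]
outcome = cases_after_window(toy_series)
```

Passing `window_days=60` to the same hypothetical function gives the outcome used in a 60-day sensitivity analysis.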

We also used the latest SVI data available from the CDC released in 2018.13 We used 16 social vulnerability features as independent variables: percentage of people below poverty, unemployment rate, per capita income, percentage of people with no high school diploma, percentage of people aged 65 and older, percentage of people aged 17 and younger, percentage of non-institutionalised people with a disability, percentage of single-parent households with children, percentage of minority people (except white, non-Hispanic), percentage of people aged 5+ who speak limited English, percentage of housing in structures with 10+ units, percentage of mobile homes, percentage of overoccupied housing units, percentage of households with no vehicle available, percentage of institutionalised group quarters (eg, correctional institutions, nursing homes) and population density per square mile (see online supplemental table A1 for definitions). All data used in the manuscript are publicly available.

Statistical analysis

We used Extreme Gradient Boosting (XGBoost) to predict the number of positive cases within 30 days of a county’s first case. XGBoost is a scalable machine learning system based on gradient tree boosting, and it is available as an open source software package.19 Chen and Guestrin presented the XGBoost algorithm in 2016.20 XGBoost is a highly effective and widely used machine learning method that can be used for regression, classification and prediction.20 Gradient boosted decision trees (GBDT) are an ensemble learning method (ie, a method that aggregates the predictions of a group of predictors) that uses decision trees as its base predictors and adds them to the ensemble sequentially, with each added tree correcting the errors left by its predecessors.21 XGBoost benefits from several innovations and optimisation techniques that add scalability to GBDT, making it faster and yielding better performance. In this study, the XGBoost algorithm is used to predict COVID-19 cases as the sum of predictions from thousands of individual decision trees, with each tree trained on the residual of all previous trees and making marginal improvements to the overall model prediction.19 21
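The residual-fitting idea can be sketched in a few lines. The example below is plain gradient boosting with squared-error loss and one-split trees (stumps) on a single feature; it is not XGBoost itself, which adds regularisation, second-order gradient information and many engineering optimisations. The data, function names, number of trees and learning rate are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple

# (threshold, prediction if x <= threshold, prediction otherwise)
Stump = Tuple[float, float, float]


def fit_stump(x: Sequence[float], residuals: Sequence[float]) -> Stump:
    """Fit a one-split regression tree to the residuals by least squares."""
    best_sse, best_stump = float("inf"), (0.0, 0.0, 0.0)
    for threshold in sorted(set(x))[:-1]:  # the largest value cannot split the data
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_value, right_value = mean(left), mean(right)
        sse = sum((r - left_value) ** 2 for r in left)
        sse += sum((r - right_value) ** 2 for r in right)
        if sse < best_sse:
            best_sse, best_stump = sse, (threshold, left_value, right_value)
    return best_stump


def stump_predict(stump: Stump, xi: float) -> float:
    threshold, left_value, right_value = stump
    return left_value if xi <= threshold else right_value


def fit_boosted_stumps(x: Sequence[float], y: Sequence[float],
                       n_trees: int = 50, learning_rate: float = 0.3
                       ) -> Tuple[float, List[Stump]]:
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = mean(y)
    predictions = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        predictions = [pi + learning_rate * stump_predict(stump, xi)
                       for pi, xi in zip(predictions, x)]
    return base, stumps


def boosted_predict(base: float, stumps: Sequence[Stump], xi: float,
                    learning_rate: float = 0.3) -> float:
    """The ensemble prediction is the base value plus the sum of all tree outputs."""
    return base + learning_rate * sum(stump_predict(s, xi) for s in stumps)


# Hypothetical data: one feature and an outcome that rises in steps.
density = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
cases = [5.0, 7.0, 9.0, 30.0, 32.0, 35.0, 60.0, 64.0]
base, stumps = fit_boosted_stumps(density, cases)
fitted = [boosted_predict(base, stumps, xi) for xi in density]
mse_mean_only = mean([(yi - base) ** 2 for yi in cases])
mse_boosted = mean([(yi - fi) ** 2 for yi, fi in zip(cases, fitted)])
```

On this toy data the training error of the boosted ensemble (`mse_boosted`) falls below that of the mean-only baseline (`mse_mean_only`), because every added stump removes part of the remaining residual.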

While XGBoost learns from the training data and makes predictions with the testing data, it also produces an importance matrix that contains the gain, cover and frequency of the features that were actually used in the boosted trees. The interpretation of prediction results and of how features contribute to the prediction is based on these three importance metrics. Gain is the most relevant attribute for interpreting the relative importance of each feature and denotes the relative contribution of a feature in explaining variation in outcomes within the model; that is, a higher feature gain implies that the feature is more important for generating the prediction. Cover denotes the average coverage (the relative number of counties affected) of the splits that use a specific feature. It corresponds to the percentage of counties for which the feature is used to decide the leaf node. Frequency is the percentage representing the relative number of times a specific feature occurs across all the trees estimated within the model.22 All three measures are reported as relative amounts, so each sums to 1 across features.
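To make the normalisation concrete, the following sketch computes the three relative measures from a list of tree splits. The `Split` record, the feature names and the numbers are hypothetical; the calculation (sum per feature, then divide by the total over all features) is an assumption about how such an importance matrix is typically built, not a reproduction of the authors' output.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Split(NamedTuple):
    feature: str
    gain: float   # loss reduction achieved by this split
    cover: float  # number of observations (counties) reaching this split


def importance_matrix(splits: List[Split]) -> Dict[str, Dict[str, float]]:
    """Aggregate gain, cover and frequency per feature and rescale each to sum to 1."""
    totals: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"gain": 0.0, "cover": 0.0, "frequency": 0.0}
    )
    for split in splits:
        totals[split.feature]["gain"] += split.gain
        totals[split.feature]["cover"] += split.cover
        totals[split.feature]["frequency"] += 1.0  # one more split using this feature
    grand_totals = {
        metric: sum(per_feature[metric] for per_feature in totals.values())
        for metric in ("gain", "cover", "frequency")
    }
    return {
        feature: {metric: per_feature[metric] / grand_totals[metric]
                  for metric in grand_totals}
        for feature, per_feature in totals.items()
    }


# Hypothetical splits collected from all trees of a fitted model.
toy_splits = [
    Split("mobile_homes", gain=60.0, cover=100.0),
    Split("population_density", gain=25.0, cover=60.0),
    Split("mobile_homes", gain=10.0, cover=40.0),
    Split("per_capita_income", gain=5.0, cover=50.0),
]
importance = importance_matrix(toy_splits)
```

In this toy example `mobile_homes` has a relative gain of 0.7 (70 of 100), a relative cover of 0.56 (140 of 250) and a relative frequency of 0.5 (2 of 4 splits), which shows that a feature can rank differently on each measure.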

A subset of 869 counties (80% of the total 1086 counties) was used as our training data set, and 217 counties (20% of all counties) were used as our testing data set. We used 10-fold cross-validation, a commonly used statistical method in applied machine learning, to tune the model’s hyperparameters. Cross-validation assesses how the results of a statistical analysis will generalise to an independent data set and tests the model’s ability to predict with a new data set. It also points out problems like overfitting or selection bias.23 Tenfold cross-validation divides the training sample into 10 parts; the model is trained on nine parts (90% of the 869 counties), and performance is measured by how accurately it predicts COVID-19 cases in the remaining part (the other 10% of the 869 counties). Once the hyperparameters of the XGBoost model are tuned, the model is trained with the tuned parameters on all 869 counties. Finally, the model is used to predict the outcomes (ie, number of positive COVID-19 cases 30 days after the county’s first confirmed case) for the test data (ie, the 217 counties). We also conducted a SHapley Additive exPlanations (SHAP) analysis to explain the predictions of the machine learning model. A positive SHAP value means that the feature increases the prediction. Finally, for the sensitivity analysis, the model was used to predict the number of positive COVID-19 cases 60 days after the county’s first confirmed case. We used R V.4.0.2 (R Core Team, 2020), run in RStudio, for all analyses.
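The partitioning described above can be sketched as follows. The random seed and the round-robin assignment of counties to folds are assumptions for the example; the study does not state how its split was drawn.

```python
import random
from typing import List, Tuple


def train_test_split_indices(n: int, train_fraction: float,
                             seed: int) -> Tuple[List[int], List[int]]:
    """Shuffle county indices and cut them into training and testing sets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train_fraction)
    return indices[:n_train], indices[n_train:]


def k_fold(indices: List[int], k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (training, validation) pairs; each index is validated exactly once."""
    folds = [indices[i::k] for i in range(k)]
    pairs = []
    for i in range(k):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        pairs.append((training, folds[i]))
    return pairs


# 1086 counties, 80% for training, then 10-fold cross-validation on the training set.
train_idx, test_idx = train_test_split_indices(1086, 0.8, seed=0)
cv_folds = k_fold(train_idx, 10)
```

With 1086 counties this yields 869 training and 217 testing counties, and each validation fold holds 86 or 87 counties. Hyperparameters would be chosen by averaging a metric over the 10 validation folds before refitting on all 869 training counties.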

Results

Table 1 provides sample characteristics of the 16 SVIs and COVID-19 cases and COVID-19 rates per 100 000 population after 30 days of the first COVID-19-positive cases in all counties in the 11 states of the US South (1086 counties). On average, 85.3 COVID-19 cases were reported after 30 days of the first reported case in a county, and a maximum of 6119 COVID-19 cases after 30 days of the first case in a county. Also, on average, 139.5 COVID-19 cases per 100 000 population were reported after 30 days of the first reported case, and a maximum of 4026.8 COVID-19 cases per 100 000 population after 30 days of the first case in a county.

Table 1.

Descriptive statistics of the 16 SVIs and COVID-19 cases and COVID-19 rates per 100 000 population after 30 days of the first COVID-19-positive cases in all counties in the US South (1086 counties)

Feature                                              Min      Median   Mean     Max      SD
Below poverty, %                                     2.6      17.9     18.8     49.7     6.4
Unemployment rate, %                                 0        6.4      6.8      25.8     2.8
Per capita income                                    12 292   23 540   24 183   50 931   5078
No high school diploma, %                            4.4      16.9     17.5     66.3     6.1
Aged 17 and younger, %                               7.3      22.9     22.8     36.6     3.2
Non-institutionalised with a disability, %           5.3      17.2     17.3     31       4.1
Single-parent households with children, %            0        9.1      9.3      22.7     2.8
Minority (except white, non-Hispanic), %             1.1      34.1     35.4     99.3     19.7
Aged 5+ who speak limited English, %                 0        1.1      2.2      30.4     3.4
Housing in structures with 10+ units, %              0        1.8      3.5      38.5     4.7
Mobile homes, %                                      0.5      18.0     18.9     59.3     9.9
Overoccupied housing units, %                        0        2.4      2.8      21.2     1.8
Households with no vehicle available, %              0        5.9      6.4      20.3     2.9
Institutionalised group quarters, %                  0        1.8      3.9      36.5     5.3
Population density per square mile                   0.2      49.2     147.1    3499.1   318.9
COVID-19 cases after 30 days                         0        23       85.3     6119     317.7
COVID-19 cases per 100 000 population after 30 days  0        63.6     139.5    4026.8   250.9

SVI, Social Vulnerability Index.

To evaluate the accuracy of our model, we tested the reliability of our predictions on the 217 counties in the test data set. The goodness-of-fit and prediction measures (adjusted R-squared=0.59, root mean square error (RMSE)=92.36) indicate that the model was robust (online supplemental table A2). Online supplemental figure A5 also shows a calibration plot of the predicted versus observed COVID-19 rates. Figure 2 shows the XGBoost relative importance by gain. The percentage of mobile homes in counties is the most important feature in predicting the growth of COVID-19 within 30 days of the first case, followed by population density per square mile and per capita income. The relative contributions of percentage of mobile homes, population density per square mile and per capita income to the model for generating predictions are 0.35, 0.12 and 0.12, respectively. Percentage of housing in structures with 10+ units, percentage of people below poverty and percentage of people with no high school diploma have relative contributions of 0.10, 0.08 and 0.04, respectively. The percentage of overoccupied housing units and the percentage of institutionalised group quarters are the least important features in the model, with relative gains of 0.003 and 0.002, respectively.
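For readers unfamiliar with the two evaluation measures, the sketch below computes RMSE and adjusted R-squared on a small, made-up set of observed and predicted values. The adjustment formula shown, 1 - (1 - R²)(n - 1)/(n - p - 1) for n observations and p features, is the standard textbook definition and is assumed here; the paper does not spell out the formula it used.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Share of the variance in the observed values explained by the predictions."""
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_residual / ss_total


def adjusted_r_squared(r2: float, n_observations: int, n_features: int) -> float:
    """Penalise R-squared for the number of features used by the model."""
    return 1.0 - (1.0 - r2) * (n_observations - 1) / (n_observations - n_features - 1)


# Hypothetical case counts for five counties.
observed = [10.0, 20.0, 30.0, 40.0, 50.0]
predicted = [12.0, 18.0, 33.0, 37.0, 52.0]

toy_rmse = rmse(observed, predicted)                      # sqrt(30 / 5)
toy_r2 = r_squared(observed, predicted)                   # 1 - 30 / 1000
toy_adjusted = adjusted_r_squared(toy_r2, n_observations=5, n_features=1)
```

RMSE is expressed in the units of the outcome (here, cases), so a value such as 92.36 is best read against the spread of the outcome reported in table 1.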

Figure 2.


Extreme Gradient Boosting (XGBoost) gain relative importance. The measures are all reported as relative amounts and all sum up to 1.0.

The relative cover for percentage of mobile homes, population density per square mile and per capita income is 0.09, 0.12 and 0.07, respectively, which shows the relative proportion of counties in our sample that are affected by splits on these features across all the decision trees (online supplemental figure A3). Also, the relative cover for percentage of housing in structures with 10+ units, percentage of people below poverty and percentage of people with no high school diploma is 0.7, 0.06 and 0.06, respectively. Relative frequency is calculated as the proportion of decision tree nodes that include a specific feature. The relative frequency results show that percentage of mobile homes, population density per square mile and per capita income occurred in 0.069, 0.093 and 0.079 of nodes within the trees of the model, respectively (online supplemental table A4). In addition, percentage of housing in structures with 10+ units, percentage of people below poverty and percentage of people with no high school diploma accounted for 0.059, 0.085 and 0.061 of nodes in the trees of the model, respectively. Additional XGBoost feature importance matrix details can be found in online supplemental table A3. Figure 3 shows the results of the SHAP analysis. Population density per square mile, percentage of housing in structures with 10+ units and percentage of people below poverty had the largest positive impact on the predicted number of COVID-19 cases in a county. Per capita income and the percentage of people aged 17 and younger had the largest negative impact on the predicted number of COVID-19 cases in a county.

Figure 3.


Shapley additive explanations (SHAP) analysis results.
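To make the sign convention of SHAP values concrete, the sketch below computes exact Shapley values by brute force for a hypothetical two-feature model. This is not the tree-specific algorithm that SHAP implementations normally use with XGBoost, and the model, its coefficients, the background rows and the county being explained are all invented for illustration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List, Sequence


def shapley_values(model: Callable[[Sequence[float]], float],
                   instance: Sequence[float],
                   background: Sequence[Sequence[float]]) -> List[float]:
    """Exact Shapley value of each feature for one prediction.

    Features outside a coalition take their values from the background rows,
    and the model output is averaged over those rows.
    """
    n = len(instance)

    def value(coalition: FrozenSet[int]) -> float:
        total = 0.0
        for row in background:
            mixed = [instance[j] if j in coalition else row[j] for j in range(n)]
            total += model(mixed)
        return total / len(background)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                phi += weight * (value(coalition | {i}) - value(coalition))
        contributions.append(phi)
    return contributions


def toy_model(row: Sequence[float]) -> float:
    """Hypothetical model: cases rise with density and fall with income."""
    density, income = row
    return 3.0 * density - 2.0 * income


background_rows = [[1.0, 1.0], [3.0, 5.0]]   # hypothetical reference counties
county = [5.0, 4.0]                          # hypothetical county to explain
phi_density, phi_income = shapley_values(toy_model, county, background_rows)
baseline = sum(toy_model(row) for row in background_rows) / len(background_rows)
```

Here the density feature receives a positive value (it pushes the prediction above the baseline) and the income feature a negative one, and the two values add up to the difference between the county's prediction and the average prediction over the background rows.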

Online supplemental table A4 shows the result of XGBoost gain relative importance after 60 days of the county’s first COVID-19 case. The population density per square mile in counties is the most important feature in predicting the growth of COVID-19 within 60 days of the first case with a relative gain of 31.8%. This is followed by percentage of housing in structures with 10+ units and percentage of mobile homes, with relative gains of 30.4% and 11.2%, respectively. Also, percentage of people aged 65 and older, per capita income and percentage of people aged 5+ who speak limited English have relative contributions of 5.5%, 4.9% and 2.6%, respectively. Additional XGBoost feature importance matrix details can be found in online supplemental table A4.

Discussion

Our machine learning study used SVI data and the number of COVID-19 cases across all counties in the US South to analyse how well social vulnerability features predict the community spread of infection. Our analysis suggests that the percentage of mobile homes within a county is the most important feature in predicting the increase in COVID-19. This was followed by population density per square mile and per capita income. Percentage of housing in structures with 10+ units, percentage of people below poverty and percentage of people with no high school diploma were also important predictors of community spread. However, the percentage of overoccupied housing units and the percentage of institutionalised group quarters were the least important features in predicting COVID-19 spread at the county level.

Our findings are consistent with the results from prior studies that investigated COVID-19 cases and socioeconomic factors and considered the impact of the pandemic on racial and ethnic minorities.2 3 16 24 25 Studies report a disproportionate rate of infections and deaths among non-Hispanic Blacks and Hispanics.2 25 For example, a recent study found that minority status and language, household composition and transportation, and housing and disability were associated with the number of COVID-19 cases in the USA.25 Poverty, crowded housing and lack of vehicle ownership were reported to be associated with increased COVID-19 cases and deaths in urban areas. Also, high population densities catalyse the spread of COVID-19; therefore, avoiding situations with higher population densities will limit the spread of COVID-19.26 In addition, in rural communities, minority status and language are associated with increases in COVID-19 cases.3 Another study reported that counties with a higher percentage of minority, high-density housing structures and crowded housing units were at higher risk of becoming a COVID-19 hot spot.27 A study of urban-rural differences in COVID-19 exposures and outcomes in South Carolina has shown a positive correlation between the case rates, mortality rates and pre-existing social vulnerability. Also, a negative correlation between mortality rates and county resilience patterns suggests that counties with higher levels of inherent resilience had lower death rates.28

Although the US South has numerous hot spots of community spread of COVID-19, few prior studies have systematically investigated the initial spread of COVID-19 in relation to social vulnerabilities across counties in the region. A recent study investigated the spatial association of social vulnerability with COVID-19 prevalence and reported a spatially varying relationship between SVI and COVID-19 cases and deaths.29 Further, our use of a machine learning approach helped determine the specific community vulnerabilities that are most salient in determining the rapid spread of COVID-19. One study reported that mobility habits (eg, number of citizens who make at least one trip per day; transport accessibility; distance from the main city clusters) are positively associated with the spread of COVID-19.30 A recent study also forecasted the geographic spread of COVID-19 as a communicable disease by using the social structure of networks.31 Aggregated data from Facebook also showed that COVID-19 cases were more likely to spread between regions that had stronger social network connections.32 Google COVID-19 Community Mobility Reports also provide a new tool to assess the role of policies to mitigate community spread (eg, to work from home, shelter in place and other recommendations) in flattening the curve of the COVID-19 pandemic.33

This study is subject to limitations. The results of this study should not be interpreted as causal. Various state and local policies (eg, lockdowns, business closures and face mask mandates) may have affected our findings. Hence, residual confounding due to the omission of important covariates should be considered. Also, the number of COVID-19 cases in a county might affect the number of cases in neighbouring counties through the connections between counties. Finally, our results are regional and may not generalise to other regions of the USA. Despite the availability of various free COVID-19 vaccines, the USA still struggles to fight the pandemic, and new waves of COVID-19 are an ongoing threat to public health in the USA. More studies are needed to investigate the resilience of vulnerable counties against COVID-19.

Conclusions

Our findings showed that SVI can help assess the vulnerability or resilience of communities to the spread of COVID-19. Thus, our results can help identify communities at high risk of spread and aid in policy efforts tailored to addressing these communities’ specific vulnerabilities to COVID-19. An understanding of the role social vulnerabilities have in determining the spread of COVID-19 is critical for forecasting the trajectory of this disease and designing effective mitigation interventions at the community level.

Footnotes

Contributors: MT as the corresponding author accepts full responsibility for the work and/or the conduct of the study, had access to the data, and controlled the decision to publish. MT and MRF performed the statistical analyses and had full access to all study data. All authors (MT, MRF and FAW) contributed to concept and design, provided data acquisition and interpretation, drafted the manuscript and critically revised the manuscript for important intellectual content. All authors approved the final version of the manuscript and are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. MT, MRF and FAW jointly supervised this work.

Funding: The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Map disclaimer: The inclusion of any map (including the depiction of any boundaries therein), or of any geographic or locational reference, does not imply the expression of any opinion whatsoever on the part of BMJ concerning the legal status of any country, territory, jurisdiction or area or of its authorities. Any such expression remains solely that of the relevant source and is not endorsed by BMJ. Maps are provided without any warranty of any kind, either express or implied.

Competing interests: None declared.

Provenance and peer review: Not commissioned; externally peer reviewed.

Supplemental material: This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.

Data availability statement

Data are available in a public, open access repository. The COVID-19 case data are available on GitHub through Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE: https://github.com/CSSEGISandData/COVID-19.

Ethics statements

Patient consent for publication

Not applicable.

References

  • 1. Center for Systems Science and Engineering (CSSE). Global cases by the center for systems science and engineering (CSSE) at Johns Hopkins University (JHU). Johns Hopkins CSSE. Available: https://github.com/CSSEGISandData/COVID-19 [Accessed 24 Aug 2022].
  • 2. Kim SJ, Bostwick W. Social vulnerability and racial inequality in COVID-19 deaths in Chicago. Health Educ Behav 2020;47:509–13. doi: 10.1177/1090198120929677
  • 3. Khazanchi R, Beiter ER, Gondi S, et al. County-level association of social vulnerability with COVID-19 cases and deaths in the USA. J Gen Intern Med 2020;35:2784–7. doi: 10.1007/s11606-020-05882-3
  • 4. Singu S, Acharya A, Challagundla K, et al. Impact of social determinants of health on the emerging COVID-19 pandemic in the United States. Front Public Health 2020;8:406. doi: 10.3389/fpubh.2020.00406
  • 5. Baum CF, Henry M. Socioeconomic factors influencing the spatial spread of COVID-19 in the United States. SSRN Journal 2020. doi: 10.2139/ssrn.3614877
  • 6. Clark E, Fredricks K, Woc-Colburn L, et al. Disproportionate impact of the COVID-19 pandemic on immigrant communities in the United States. PLoS Negl Trop Dis 2020;14:e0008484. doi: 10.1371/journal.pntd.0008484
  • 7. K C M, Oral E, Straif-Bourgeois S, et al. The effect of area deprivation on COVID-19 risk in Louisiana. PLoS ONE 2020;15:e0243028. doi: 10.1371/journal.pone.0243028
  • 8. Mogi R, Spijker J. The influence of social and economic ties to the spread of COVID-19 in Europe. J Popul Res (Canberra) 2022;39:495–511. doi: 10.1007/s12546-021-09257-1
  • 9. Ayouni I, Maatoug J, Dhouib W, et al. Effective public health measures to mitigate the spread of COVID-19: a systematic review. BMC Public Health 2021;21:1015. doi: 10.1186/s12889-021-11111-1
  • 10. Huang X, Shao X, Xing L, et al. The impact of lockdown timing on COVID-19 transmission across US counties. EClinicalMedicine 2021;38:101035. doi: 10.1016/j.eclinm.2021.101035
  • 11. Reuters Staff. Georgia governor to drop mask lawsuit against Atlanta Mayor and city. Reuters; 2020. Available: https://www.reuters.com/article/us-health-coronavirus-usa-georgia/georgia-governor-to-drop-mask-lawsuit-against-atlanta-mayor-and-city-idUSKCN2592VV [Accessed 13 Aug 2020].
  • 12. Flanagan BE, Gregory EW, Hallisey EJ, et al. A social vulnerability index for disaster management. J Homel Secur Emerg Manag 2011;8. doi: 10.2202/1547-7355.1792
  • 13. Centers for Disease Control and Prevention. CDC’s social vulnerability index (SVI). 2018. Available: https://svi.cdc.gov [Accessed 22 Jun 2020].
  • 14. Karaye IM, Horney JA. The impact of social vulnerability on COVID-19 in the US: an analysis of spatially varying relationships. Am J Prev Med 2020;59:317–25. doi: 10.1016/j.amepre.2020.06.006
  • 15. Neelon B, Mutiso F, Mueller NT, et al. Spatial and temporal trends in social vulnerability and COVID-19 incidence and death rates in the United States. PLoS One 2021;16:e0248702. doi: 10.1371/journal.pone.0248702
  • 16. Alcendor DJ. Racial disparities-associated COVID-19 mortality among minority populations in the US. J Clin Med 2020;9:2442. doi: 10.3390/jcm9082442
  • 17. Tai DBG, Shah A, Doubeni CA, et al. The disproportionate impact of COVID-19 on racial and ethnic minorities in the United States. Clin Infect Dis 2021;72:703–6. doi: 10.1093/cid/ciaa815
  • 18. Whyte LE. Unpublicized recommendations say states should return to stringent control measures exclusive: White House document. The Center for Public Integrity; 2020.
  • 19. Chen T, He T, Benesty M, et al. Xgboost: extreme gradient boosting. R package version 04-2; 2015. 1–4.
  • 20. Chen T, Guestrin C. Xgboost: A Scalable tree boosting system. Proceedings of the 22nd Acm Sigkdd International Conference on Knowledge Discovery and Data Mining 2016. doi: 10.1145/2939672.2939785
  • 21. Géron A. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O’Reilly Media, 2019.
  • 22. Achiron A, Gur Z, Aviv U, et al. Predicting refractive surgery outcome: machine learning approach with big data. J Refract Surg 2017;33:592–7. doi: 10.3928/1081597X-20170616-03
  • 23. Cawley GC, Talbot NL. On over-fitting in model selection and subsequent selection bias in performance evaluation. J Mach Learn Res 2010;11:2079–107.
  • 24. Gaynor TS, Wilson ME. Social vulnerability and equity: the disproportionate impact of COVID-19. Public Adm Rev 2020;80:832–8. doi: 10.1111/puar.13264
  • 25. Karaye IM, Horney JA. The impact of social vulnerability on COVID-19 in the U.S.: an analysis of spatially varying relationships. Am J Prev Med 2020;59:317–25. doi: 10.1016/j.amepre.2020.06.006
  • 26. Rocklöv J, Sjödin H. High population densities catalyse the spread of COVID-19. J Travel Med 2020;27:taaa038. doi: 10.1093/jtm/taaa038
  • 27. Dasgupta S, Bowen VB, Leidner A, et al. Association between social vulnerability and a county's risk for becoming a COVID-19 hotspot - United States, June 1-July 25, 2020. MMWR Morb Mortal Wkly Rep 2020;69:1535–41. doi: 10.15585/mmwr.mm6942a3
  • 28. Huang Q, Jackson S, Derakhshan S, et al. Urban-rural differences in COVID-19 exposures and outcomes in the south: a preliminary analysis of South Carolina. PLoS ONE 2021;16:e0246548. doi: 10.1371/journal.pone.0246548
  • 29. Wang C, Li Z, Clay Mathews M, et al. The spatial association of social vulnerability with COVID-19 prevalence in the contiguous United States. Int J Environ Health Res 2022;32:1147–54. doi: 10.1080/09603123.2020.1847258
  • 30. Cartenì A, Di Francesco L, Martino M. How mobility habits influenced the spread of the COVID-19 pandemic: results from the Italian case study. Sci Total Environ 2020;741:140489. doi: 10.1016/j.scitotenv.2020.140489
  • 31. y Piontti AP, Perra N, Rossi L, et al. Charting the Next Pandemic: Modeling Infectious Disease Spreading in the Data Science Age. Springer, 2018.
  • 32. Kuchler T, Russel D, Stroebel J. The geographic spread of COVID-19 correlates with structure of social networks as measured by Facebook. National Bureau of economic; 2020. 0898–2937.
  • 33. Aktay A, Bavadekar S, Cossoul G, et al. Google COVID-19 community mobility reports: anonymization process description. arXiv 2020. doi: 10.48550/arXiv.2004.04145
