Author manuscript; available in PMC: 2023 Mar 1.
Published in final edited form as: Proceedings (IEEE Int Conf Bioinformatics Biomed). 2023 Jan 2;2022:2934–2939. doi: 10.1109/bibm55620.2022.9995662

Identification of Social and Racial Disparities in Risk of HIV Infection in Florida using Causal AI Methods

Mattia Prosperi 1, Jie Xu 2, Jingchuan (Serena) Guo 3, Jiang Bian 4, Wei-Han (William) Chen 5, Shantrel Canidate 6, Simone Marini 7, Mo Wang 8
PMCID: PMC9977319  NIHMSID: NIHMS1865882  PMID: 36865610

Abstract

Florida, the third most populous state in the USA, has the highest rates of Human Immunodeficiency Virus (HIV) infections and of unfavorable HIV outcomes, with marked social and racial disparities. In this work, we leveraged large-scale, real-world data, i.e., statewide surveillance records and publicly available data resources encoding social determinants of health (SDoH), to identify social and racial disparities contributing to individuals’ risk of HIV infection. We used the Florida Department of Health’s Syndromic Tracking and Reporting System (STARS) database (including 100,000+ individuals screened for HIV infection and their partners), and a novel algorithmic fairness assessment method, the Fairness-Aware Causal paThs decompoSition (FACTS), which merges causal inference and artificial intelligence. FACTS deconstructs disparities based on SDoH and individuals’ characteristics, and can discover novel mechanisms of inequity, quantifying to what extent they could be reduced by interventions. We paired the deidentified demographic information (age, gender, drug use) of 44,350 individuals in STARS (those with non-missing data on interview year, county of residence, and infection status) with eight SDoH, including access to healthcare facilities, % uninsured, median household income, and violent crime rate. Using an expert-reviewed causal graph, we found that the risk of HIV infection for African Americans was higher than for non-African Americans (both in terms of direct and total effect), although a null effect could not be ruled out. FACTS identified several paths leading to racial disparity in HIV risk, including multiple SDoH: education, income, violent crime, drinking, smoking, and rurality.

Keywords: artificial intelligence, causal inference, disparity, epidemiology, human immunodeficiency virus, machine learning, real-world data, social determinants of health, surveillance

I. Introduction

Incidence of human immunodeficiency virus (HIV) infections has remained relatively stable in the USA in the last five years (https://www.cdc.gov/hiv/library/reports/hiv-surveillance.html). Nevertheless, new diagnoses are not homogeneously distributed across the USA, and some regions are disproportionately affected (https://www.cdc.gov/hiv/pdf/statistics/overview/cdc-hiv-us-ataglance.pdf). Population rates of new HIV diagnoses in the USA are highest in the South, where the state of Florida has the largest number of new diagnoses. In 2019, the US Department of Health and Human Services (DHHS) released the federal plan for “Ending the HIV Epidemic” (EHE) within ten years [1], identifying 48 counties with high incidence of HIV diagnoses. Seven Floridian counties (Broward, Duval, Hillsborough, Miami-Dade, Orange, Palm Beach, and Pinellas) have been prioritized (Fig. 1).

Fig. 1.

Prevalence of HIV in Florida by county, highlighting those prioritized by DHHS for “Ending the HIV Epidemic”.

The HIV epidemic disproportionately affects the underserved and racial and ethnic minorities. Not accessing care, going out-of-care, and antiretroviral therapy (ART) discontinuation / failure are seen at high rates in underserved populations across all states with high HIV prevalence [2]–[7]. Delivering health care to people with HIV has been particularly challenging in Florida due to the diverse population, political/structural factors, e.g., lack of Medicaid expansion, high poverty rates, high rates of substance use, lack of a statewide supported syringe exchange program, HIV-related stigma and discrimination, and other systemic, complex factors [8], [9].

In 2008, the Florida Department of Health (FDoH) started a precision intervention initiative by creating and maintaining a comprehensive in-house data system for sexually transmitted disease case reporting and network behavioral data (https://www.floridahealth.gov/diseases-and-conditions/aids/surveillance/), named the Syndromic Tracking and Reporting System (STARS). To date, data on 100,000+ participants are available including people with HIV who named at least one partner, and partners who were tested within 60 days after being named (with 73% of partners notified by follow up as a result of service intervention).

In this work, we leveraged the STARS database and public resources encoding social determinants of health (SDoH) to identify social and racial disparities contributing to an individual’s risk of HIV infection. We applied a novel causal decomposition method that merges causal inference and artificial intelligence, called the Fairness-Aware Causal paThs decompoSition (FACTS) [10].

II. Methods

A. Ethics statement and data availability

The authors abide by the Declaration of Helsinki. The study protocol was approved as exempt by the University of Florida’s Institutional Review Board (IRB) and by FDoH’s IRB (protocol #IRB201901041 and #2020-069, respectively). We received data extracts from FDoH’s STARS in a fully de-identified format according to the Health Insurance Portability and Accountability Act (HIPAA). For replication purposes, a STARS data request can be made to the FDoH (Research@flhealth.gov) in accordance with state and federal regulations and in compliance with the required ethical and privacy policies, including IRB approval by FDoH and execution of a data use agreement. Requests are independently reviewed by FDoH.

B. Study design, variables and causal assumptions

We included individuals and their partners in STARS who had been interviewed at least once, with complete information about interview year and county of residence. For these individuals, we considered gender (female vs. male or unknown), sexual orientation/risk (men who have sex with men vs. other), race (African American vs. other), age (35 and older vs. younger), ethnicity (Hispanic-Latinx vs. non-Hispanic-Latinx), history of illicit drug use (no vs. yes), and HIV infection status (positive vs. negative).

Each county and year pair was matched to the Agency for Healthcare Research and Quality (AHRQ) SDoH database (https://www.ahrq.gov/sdoh/data-analytics/sdoh-data.html) and the County Health Rankings and Roadmap (https://www.countyhealthrankings.org/), and we retrieved the following SDoH at the county level: percentage of adults that report currently smoking; percentage of adults that report excessive drinking; median household income (in US dollars, inflation-adjusted to file data year); percentage of population with less than high school education (ages 25 and over); violent crimes per 100,000 population; number of federally qualified health centers; percentage of population with no health insurance (ages 64 and under); rural-urban continuum codes (2013).
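The county-year matching described above can be sketched as a keyed lookup. This is a minimal illustration, not the authors' code: the individual records and field names (`county`, `year`, `median_income`, `pct_uninsured`) are hypothetical, while the two county rows reuse the 2015 values reported in Table 1. Records without a county-year match are set aside, mirroring the exclusion of individuals with no available county-year SDoH.

```python
# Hypothetical individual-level records (not STARS data).
individuals = [
    {"id": 1, "county": "Miami-Dade", "year": 2015},
    {"id": 2, "county": "Duval", "year": 2015},
    {"id": 3, "county": "Out of jurisdiction", "year": 2016},
]

# County-level SDoH keyed by (county, year); values from Table 1 (year 2015).
sdoh = {
    ("Miami-Dade", 2015): {"median_income": 43129, "pct_uninsured": 29.4},
    ("Duval", 2015): {"median_income": 47690, "pct_uninsured": 17.0},
}


def attach_sdoh(people, sdoh_table):
    """Join each person to the SDoH row of their (county, year) pair.

    Returns (matched, unmatched): matched rows carry the SDoH fields,
    unmatched rows are returned unchanged so they can be excluded later.
    """
    matched, unmatched = [], []
    for person in people:
        key = (person["county"], person["year"])
        if key in sdoh_table:
            matched.append({**person, **sdoh_table[key]})
        else:
            unmatched.append(person)
    return matched, unmatched


matched, unmatched = attach_sdoh(individuals, sdoh)
```

Keeping the unmatched records in a separate list, rather than silently dropping them, makes the size of the excluded group easy to report.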

We developed a partially directed acyclic graph (pDAG) to encode the causal relationships among SDoH, individual demographics, and HIV diagnosis (Fig. 2). The pDAG was refined iteratively by co-authors until consensus was reached.

Fig. 2.

Partially directed acyclic graph encoding the causal relationships among individual demographics, social determinants of health, and HIV diagnosis. The magenta arrows are biasing paths.

C. Causal inference, AI modelling and software

Using do-calculus [11], we identified the adjustment sets [12] to infer the total and direct effects of race with respect to infection. The effects were estimated using propensity score matching, and the matched data were used to fit a doubly robust logistic regression model that included both adjustment covariates and propensity weights [13]. Since the SDoH were aggregated at the county level, we also employed random effects [14].
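To make the matching step concrete, the following is a minimal sketch of greedy 1:1 nearest-neighbor matching on the propensity score. It is an illustration under stated assumptions and not the authors' pipeline, which used the R packages MatchIt and optmatch with settings that are not reported here. The propensity scores, unit identifiers, and the caliper of 0.05 are all hypothetical.

```python
def greedy_match(treated, controls, caliper=0.05):
    """Pair each treated unit with the nearest unused control.

    treated, controls: dicts mapping unit id -> propensity score in [0, 1].
    Returns a list of (treated_id, control_id) pairs whose score
    difference does not exceed the caliper. Each control is used once.
    """
    available = dict(controls)
    pairs = []
    # Visit treated units from highest to lowest score (one possible order).
    for t_id, t_score in sorted(treated.items(), key=lambda kv: -kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # matching without replacement
    return pairs


# Hypothetical propensity scores.
treated = {"t1": 0.80, "t2": 0.55, "t3": 0.30}
controls = {"c1": 0.78, "c2": 0.52, "c3": 0.50, "c4": 0.10}
pairs = greedy_match(treated, controls)
```

In this toy input, `t3` has no control within the caliper and is left unmatched, which is how a caliper trades sample size for closer covariate balance.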

We then estimated a non-causal prediction model of infection using all the study covariates, comparing boosted logistic regression, decision trees, and random forests. The prediction performance was assessed through repeated holdout set validation (1/3 of the data, 25 times) and area under the receiver operating characteristic (AUROC) [15].
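The validation scheme above (25 random holdout splits with one third of the data held out, scored by AUROC) can be sketched as follows. This is a simplified illustration, not the study code: the data are synthetic, the "model" is just the outcome rate within each level of one binary feature, and AUROC is computed directly as the probability that a positive case scores higher than a negative one (ties count one half).

```python
import random
import statistics


def auroc(labels, scores):
    """AUROC as P(score of a positive > score of a negative), ties = 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def holdout_splits(n, test_fraction=1 / 3, repeats=25, seed=0):
    """Yield (train_idx, test_idx) for repeated random holdout validation."""
    rng = random.Random(seed)
    n_test = round(n * test_fraction)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]


# Synthetic data (hypothetical): outcome rate is 0.6 when x == 1, 0.4 when x == 0.
X = [i % 2 for i in range(300)]
y = [1 if (i % 10) < (7 if i % 2 else 4) else 0 for i in range(300)]


def fit_stratum_rates(train_idx):
    """Toy model: training-set outcome rate within each level of x."""
    counts = {}
    for i in train_idx:
        n, k = counts.get(X[i], (0, 0))
        counts[X[i]] = (n + 1, k + y[i])
    return {x: k / n for x, (n, k) in counts.items()}


aucs = []
for train_idx, test_idx in holdout_splits(len(y)):
    rates = fit_stratum_rates(train_idx)
    scores = [rates.get(X[i], 0.5) for i in test_idx]
    aucs.append(auroc([y[i] for i in test_idx], scores))

mean_auc, sd_auc = statistics.mean(aucs), statistics.stdev(aucs)
```

Reporting the mean and standard deviation of the per-split AUROC values corresponds to the "average (st.dev.)" summary format used for the prediction results.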

The FACTS algorithm was run on the study dataset using the expert-agreed pDAG, focusing on racial disparity. In brief, FACTS first develops a prediction model for the outcome using all the other variables, then calculates all the causal paths that incorporate a variable of interest for disparity (i.e., a sensitive attribute, here, race), and finally ranks the paths by their contribution to the outcome disparity associated with the variable of interest.
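The path-enumeration step can be illustrated with a short depth-first search. This sketch covers only the listing of directed paths from a sensitive attribute to the outcome; it does not implement the FACTS scoring of each path, and it is not taken from the FACTS repository. The toy graph and its node names are hypothetical and much smaller than the pDAG in Fig. 2.

```python
def directed_paths(graph, source, target, prefix=None):
    """Return every directed path from source to target as a list of nodes.

    graph: dict mapping a node to the list of its children.
    """
    prefix = (prefix or []) + [source]
    if source == target:
        return [prefix]
    paths = []
    for child in graph.get(source, []):
        if child not in prefix:  # safety guard; a true DAG has no cycles
            paths.extend(directed_paths(graph, child, target, prefix))
    return paths


# Hypothetical toy DAG: race is the sensitive attribute, hiv the outcome.
toy_dag = {
    "race": ["education", "income", "hiv"],
    "education": ["income", "hiv"],
    "income": ["uninsured", "hiv"],
    "uninsured": ["hiv"],
}

paths = directed_paths(toy_dag, "race", "hiv")
for path in paths:
    print(" -> ".join(path))
```

Even this four-variable graph yields six distinct paths, which suggests why a ranking step is needed once the graph includes all individual-level and county-level variables.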

The pDAG was drawn using the web version of dagitty (http://www.dagitty.net/). The FACTS software is written in Python and is available at: https://github.com/weishenpan15/FACTS. For all other analyses we used R (https://www.r-project.org/), including packages: mboost, rpart, ranger, ROCR, dagitty, MatchIt, optmatch, lme4.

III. Results

A total of 44,350 individuals in the STARS database met the study inclusion criteria. Twenty counties included 90% of the study population, and the top seven (67.3%) coincided with the EHE’s Floridian priority counties: Miami-Dade (17.4%); Broward (12.9%); Orange (11.2%); Duval (7.9%); Hillsborough (7.5%); Palm Beach (6.4%); Pinellas (4.2%). Of note, 1,128 (2.5%) individuals were in Florida at the time of testing and interview, but their residence was out of jurisdiction: these individuals and 18 others with no available county-year SDoH were excluded from inferential analyses including SDoH. The median (interquartile range) calendar year when individuals were interviewed and tested for HIV was 2015 (2013-2017). Of the population sample, 75% were diagnosed with HIV (note that this is an at-risk population), 24.6% reported female gender, 51.1% were men who have sex with men, 41.8% were aged 35 years or older, 48.4% were African American, 22.2% were Hispanic-Latinx, and 22.2% had no history of drug abuse. Among county-level SDoH (year 2015), the median household income was $43,852, the population with less than high school education was 13.7%, the population with no health insurance was 20.9%, and the prevalence of adults reporting excessive drinking and smoking was 17.7% and 17.3%, respectively. Table 1 shows population characteristics overall and stratified by the seven EHE priority counties.

TABLE 1.

CHARACTERISTICS OF THE STUDY POPULATION (N=44,350).

Values are median (IQR) or prevalence %.

Variable | All counties | Miami-Dade | Broward | Orange | Duval | Hillsborough | Palm Beach | Pinellas
Individual-level demographics and behavioral factors
Proportion over total | 100% | 17.4% | 12.9% | 11.2% | 7.9% | 7.5% | 6.4% | 4.2%
HIV-infected | 75% | 76.9% | 77% | 76.7% | 79.3% | 76% | 74.6% | 73.8%
Year | 2015 (2013-2017) | 2015 (2013-2017) | 2015 (2013-2017) | 2015 (2013-2017) | 2014 (2012-2017) | 2015 (2013-2017) | 2015 (2013-2017) | 2015 (2013-2017)
Female | 24.6% | 20% | 25% | 20.3% | 30.3% | 22.8% | 35.1% | 22.8%
Men who have sex with men | 51.1% | 60.3% | 53.3% | 55% | 45.8% | 57.1% | 36.4% | 57.1%
Age 35 and older | 41.8% | 43.2% | 44.2% | 37.6% | 37.6% | 41.2% | 47.6% | 46.9%
Black African American | 48.4% | 35.3% | 50.7% | 48% | 72.4% | 49.1% | 62.5% | 46.2%
Hispanic-Latinx | 22.2% | 53.8% | 20.2% | 22.7% | 4.5% | 20.5% | 14.2% | 8.9%
No history of illicit drug use | 24.8% | 32% | 25.9% | 25.5% | 4.4% | 25.3% | 19.1% | 25.8%
County-level social determinants of health (year 2015)
Median household income ($) | 43852 | 43129 | 51968 | 47943 | 47690 | 50579 | 53363 | 45819
Population with less than high school education | 13.7% | 19.9% | 11.8% | 12.4% | 11.4% | 12.5% | 12.2% | 10%
Population with no health insurance | 20.9% | 29.4% | 22.7% | 21.8% | 17% | 18.5% | 22.1% | 19.3%
Rural-urban continuum codes | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1
Number of federally qualified health centers | 2 | 65 | 8 | 15 | 7 | 24 | 13 | 8
Adults that report excessive drinking | 17.7% | 16.7% | 18.9% | 20.7% | 20% | 19.6% | 17.8% | 20.7%
Violent crimes per 100,000 population | 388.5 | 650.3 | 440.9 | 681.4 | 644.7 | 339.1 | 462.9 | 535.9
Adults that report currently smoking | 17.3% | 13.5% | 14.2% | 14.5% | 18.2% | 16.6% | 13.4% | 17.4%

The crude effect of being African American on infection risk yielded an odds ratio (OR) of 0.97 (95% conf. int. 0.92-1.01, p=0.149). Through do-calculus, three adjustment sets were identified for the total effect, including the following covariates and corresponding ORs:

  1. {Percentage adults that report excessive drinking, No history of illicit drug use, Population with less than high school education, Female gender, Prevalence of population with no health insurance, Number of federally qualified health centers, Median household income, Percentage adults that report currently smoking, Violent crimes per 100,000 population}, OR 1.04 (95% conf. int. 0.99-1.09, p=0.140);

  2. {Percentage adults that report excessive drinking, No history of illicit drug use, Population with less than high school education, Female gender, Prevalence of population with no health insurance, Median household income, Metropolitan vs. rural area, Percentage adults that report currently smoking, Violent crimes per 100,000 population}, OR 1.04 (95% conf. int. 0.99-1.09, p=0.111);

  3. {Percentage adults that report excessive drinking, No history of illicit drug use, Population with less than high school education, Hispanic-Latinx ethnicity, Female gender, Median household income, Metropolitan vs. rural area, Percentage adults that report currently smoking, Violent crimes per 100,000 population}, OR 1.05 (95% conf. int. 0.999-1.10, p=0.060).

The adjustment sets for the direct effect were the same. Of note, by removing the bidirectional arrows between race and education, crime, income, and rurality, the adjustment set for the total effect reduced to ethnicity alone, and the estimated OR was 1.02 (95% conf. int. 0.9-1.08, p=0.376).
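The ORs above are read through their confidence intervals: when the 95% interval contains 1, a null effect cannot be ruled out. As a minimal sketch of that reading, the code below computes an odds ratio with a Wald interval from a 2x2 table and checks whether an interval contains 1. The counts are hypothetical and are not STARS data, and the Wald formula is a swapped-in textbook method: the adjusted ORs in this paper came from matched, doubly robust regression models.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


def includes_null(lower, upper):
    """True when the interval for an odds ratio contains 1 (no effect)."""
    return lower <= 1.0 <= upper


# Hypothetical counts.
or_, lower, upper = odds_ratio_wald(30, 10, 20, 20)
print(round(or_, 2), includes_null(lower, upper))

# Intervals reported in the text for adjustment sets 1 and 3.
print(includes_null(0.99, 1.09), includes_null(0.999, 1.10))
```

The interval is built on the log-odds scale because the sampling distribution of log(OR) is closer to normal than that of the OR itself.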

When fitting the prediction model, the three methods exhibited similar performance on the validation sets: boosted logistic regression yielded an average (st.dev.) AUROC of 79.2% (0.4%), the decision tree AUROC of 77.5% (0.4%), and the random forest AUROC of 80.8% (0.4%). Fig. 3 shows the ROC curves for each method and each validation run.

Fig. 3.

Receiver operating characteristic (ROC) curves for predicting risk of HIV infection using individual-level variables and social determinants of health, comparing boosted logistic regression, decision tree, and random forest (25 resampled test sets with 1/3 of the data).

The FACTS analysis was performed using the boosting prediction algorithm, and revealed multiple pathways associated with racial disparity in HIV infection risk. Table 2 shows the quantification of disparity in the paths that include race for the full pDAG and for a reduced DAG, where we simplified a number of arcs and assigned a direction to all undirected edges. From the table, we observe that multiple paths contribute to disparity with similar absolute weight, and they include a variety of SDoH: education, income, violent crime, drinking, smoking, healthcare coverage, and rurality, among others. No single path is responsible for most of the disparity. Of note, Hispanic-Latinx ethnicity seems to reduce disparity for African Americans in the risk of HIV infection.

TABLE 2.

Fairness-Aware Causal paThs decompoSition (FACTS) quantifying racial disparity in HIV risk.

Causal path | Contribution to Disparity | Contribution to Accuracy
Directed acyclic graph (reduced from partially directed acyclic graph)
Population with less than high school education -> Median household income | 0.010316041 | 0.00166641
Hispanic-Latinx | −0.007036374 | 0.00108008
Violent crimes per 100,000 population | 0.004859869 | 0.002021293
Hispanic-Latinx -> Median household income -> Prevalence of population with no health insurance | 0.004710793 | −0.000617189
Population with less than high school education | 0.003488372 | 0.000756056
Population with less than high school education -> Percentage adults that report excessive drinking | −0.003458557 | −0.000894924
Population with less than high school education -> Violent crimes per 100,000 population | −0.003041145 | −0.001265237
Hispanic-Latinx -> Median household income | −0.001699463 | −0.000138867
Population with less than high school education -> Median household income -> Prevalence of population with no health insurance | 0.000715564 | 0.000123438
Median household income -> Prevalence of population with no health insurance | 0.000596303 | 0.000308594
Median household income | 0.000119261 | −6.17E-05
Metropolitan vs. rural area -> Number of federally qualified health centers | 8.94E-05 | 4.63E-05
Population with less than high school education -> Median household income -> Prevalence of population with no health insurance -> Number of federally qualified health centers | 5.96E-05 | 3.09E-05
Hispanic-Latinx -> Median household income -> Prevalence of population with no health insurance -> Number of federally qualified health centers | −2.98E-05 | −4.63E-05
Partially directed acyclic graph (see Fig. 2)
Percentage adults that report excessive drinking <-> No history of illicit drug use <-> Population with less than high school education <-> Hispanic-Latinx <-> Median household income <-> Violent crimes per 100,000 population <-> Metropolitan vs. rural area <-> Percentage adults that report currently smoking -> Female gender | −0.038073942 | −0.008254899
Percentage adults that report excessive drinking <-> No history of illicit drug use <-> Population with less than high school education <-> Hispanic-Latinx <-> Median household income <-> Violent crimes per 100,000 population <-> Metropolitan vs. rural area <-> Percentage adults that report currently smoking -> Female gender -> men who have sex with men | 0.033929636 | 0.012868385
Percentage adults that report excessive drinking <-> No history of illicit drug use <-> Population with less than high school education <-> Hispanic-Latinx <-> Median household income <-> Violent crimes per 100,000 population <-> Metropolitan vs. rural area <-> Percentage adults that report currently smoking -> Number of federally qualified health centers | −0.010435301 | 0.001357815
Percentage adults that report excessive drinking <-> No history of illicit drug use <-> Population with less than high school education <-> Hispanic-Latinx <-> Median household income <-> Violent crimes per 100,000 population <-> Metropolitan vs. rural area <-> Percentage adults that report currently smoking | 0.008199165 | −0.00054004
Percentage adults that report excessive drinking <-> No history of illicit drug use <-> Population with less than high school education <-> Hispanic-Latinx <-> Median household income <-> Violent crimes per 100,000 population <-> Metropolitan vs. rural area <-> Percentage adults that report currently smoking -> Prevalence of population with no health insurance | 0.001311866 | 0.000740626
Percentage adults that report excessive drinking <-> No history of illicit drug use <-> Population with less than high school education <-> Hispanic-Latinx <-> Median household income <-> Violent crimes per 100,000 population <-> Metropolitan vs. rural area <-> Percentage adults that report currently smoking -> Prevalence of population with no health insurance -> Number of federally qualified health centers | 0.000596303 | 0
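To make the "similar absolute weight" reading of Table 2 easier to check, the reduced-DAG rows can be ranked by the absolute value of their contribution to disparity. The sketch below uses the disparity values from Table 2; the short path labels are abbreviations introduced here for readability and are not the paper's variable names.

```python
# Contribution to disparity for the reduced DAG (values from Table 2).
# Labels are abbreviations chosen for this sketch.
disparity = {
    "edu -> income": 0.010316041,
    "hispanic": -0.007036374,
    "crime": 0.004859869,
    "hispanic -> income -> uninsured": 0.004710793,
    "edu": 0.003488372,
    "edu -> drinking": -0.003458557,
    "edu -> crime": -0.003041145,
    "hispanic -> income": -0.001699463,
    "edu -> income -> uninsured": 0.000715564,
    "income -> uninsured": 0.000596303,
    "income": 0.000119261,
    "rural -> fqhc": 8.94e-05,
    "edu -> income -> uninsured -> fqhc": 5.96e-05,
    "hispanic -> income -> uninsured -> fqhc": -2.98e-05,
}

# Rank paths by the magnitude of their contribution, largest first.
ranked = sorted(disparity.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Count paths that widen (positive) or narrow (negative) the disparity.
n_positive = sum(1 for v in disparity.values() if v > 0)
n_negative = sum(1 for v in disparity.values() if v < 0)

for label, value in ranked[:5]:
    print(f"{label}: {value:+.6f}")
```

Sorting by magnitude rather than by signed value keeps disparity-reducing paths, such as the Hispanic-Latinx path, visible near the top of the ranking.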

IV. Discussion

We deconstructed racial and social disparity in HIV infection risk in Florida using a large statewide surveillance database and causal AI methods. First, using traditional causal inference, we found that both the total and direct effects of African American race on HIV infection risk, after removing confounding, are slightly positive (i.e., OR of ~1.04), although the confidence intervals include a null effect.

From the fairness analysis, we found that racial disparity in HIV infection risk is not driven by one major factor, but rather is a product of multiple SDoH. Intervention programs might therefore need a more complex design that addresses several factors at once.

One limitation of our study is that gender did not include non-binary categories. Another is that the causal assumptions might not be all correct; furthermore, we did not consider unmeasured confounders, although we set a number of undirected edges in the causal graph.

In conclusion, this study demonstrates that causal AI can enable the design of equitable public health interventions apt to reduce health risks and benefit more vulnerable populations.

Acknowledgment

The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the FDoH.

Contributor Information

Mattia Prosperi, Department of Epidemiology, College of Public Health and Health Professions, University of Florida, Gainesville, FL, USA.

Jie Xu, Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA.

Jingchuan (Serena) Guo, Department of Pharmaceutical Outcomes and Policy, College of Pharmacy, University of Florida, Gainesville, FL, USA.

Jiang Bian, Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA.

Wei-Han (William) Chen, Department of Pharmaceutical Outcomes and Policy, College of Pharmacy, University of Florida, Gainesville, FL, USA.

Shantrel Canidate, Department of Epidemiology, College of Public Health and Health Professions, University of Florida, Gainesville, FL, USA.

Simone Marini, Department of Epidemiology, College of Public Health and Health Professions, University of Florida, Gainesville, FL, USA.

Mo Wang, Department of Management, Warrington College of Business, University of Florida, Gainesville, FL, USA.

References

  • [1] Eisinger RW, Dieffenbach CW, and Fauci AS, “HIV Viral Load and Transmissibility of HIV Infection: Undetectable Equals Untransmittable,” JAMA, vol. 321, no. 5, pp. 451–452, Feb. 2019, doi: 10.1001/jama.2018.21167.
  • [2] Wiginton JM, Eaton LA, Watson RJ, Maksut JL, Earnshaw VA, and Berman M, “Sex-Positivity, Medical Mistrust, and PrEP Conspiracy Beliefs Among HIV-Negative Cisgender Black Sexual Minority Men in Atlanta, Georgia,” Arch Sex Behav, Nov. 2021, doi: 10.1007/s10508-021-02174-7.
  • [3] Meyers-Pantele SA et al., “Examining HIV Stigma, Depression, Stress, and Recent Stimulant Use in a Sample of Sexual Minority Men Living with HIV: An Application of the Stigma and Substance Use Process Model,” AIDS Behav, Nov. 2021, doi: 10.1007/s10461-021-03517-0.
  • [4] Meyers-Pantele SA, Sullivan P, Mansergh G, Hirshfield S, Stephenson R, and Horvath KJ, “Race-Based Medical Mistrust, HIV-Related Stigma, and ART Adherence in a Diverse Sample of Men Who Have Sex with Men with HIV,” AIDS Behav, Oct. 2021, doi: 10.1007/s10461-021-03500-9.
  • [5] Cherenack EM, Enders K, Rupp BM, Seña AC, and Psioda M, “Daily Predictors of ART Adherence Among Young Men Living with HIV Who Have Sex with Men: A Longitudinal Daily Diary Study,” AIDS Behav, Nov. 2021, doi: 10.1007/s10461-021-03523-2.
  • [6] McFall AM et al., “Understanding the disparity: Predictors of virologic failure in women using highly active antiretroviral therapy vary by race and/or ethnicity,” J Acquir Immune Defic Syndr, vol. 64, no. 3, p. 10.1097/QAI.0b013e3182a095e9, Nov. 2013, doi: 10.1097/QAI.0b013e3182a095e9.
  • [7] Dawit R et al., “Neighborhood Factors Associated with Racial/Ethnic Disparities in Achieving Sustained HIV Viral Suppression Among Miami-Dade County Ryan White Program Clients,” AIDS Patient Care and STDs, vol. 35, no. 10, pp. 401–410, Oct. 2021, doi: 10.1089/apc.2021.0067.
  • [8] Trepka MJ, Niyonsenga T, Fennie KP, McKelvey K, Lieb S, and Maddox LM, “Sex and Racial/Ethnic Differences in Premature Mortality Due to HIV: Florida, 2000-2009,” Public Health Rep, vol. 130, no. 5, pp. 505–513, Oct. 2015, doi: 10.1177/003335491513000513.
  • [9] Lieb S, White S, Grigg BL, Thompson DR, Liberti TM, and Fallon SJ, “Estimated HIV incidence, prevalence, and mortality rates among racial/ethnic populations of men who have sex with men, Florida,” J Acquir Immune Defic Syndr, vol. 54, no. 4, pp. 398–405, Aug. 2010, doi: 10.1097/QAI.0b013e3181d0c165.
  • [10] Pan W, Cui S, Bian J, Zhang C, and Wang F, “Explaining Algorithmic Fairness Through Fairness-Aware Causal Path Decomposition,” arXiv:2108.05335 [cs], Aug. 2021, Accessed: Dec. 02, 2021. [Online]. Available: http://arxiv.org/abs/2108.05335
  • [11] Pearl J, “The Do-Calculus Revisited.” arXiv, Oct. 16, 2012. doi: 10.48550/arXiv.1210.4852.
  • [12] Perković E, Textor J, Kalisch M, and Maathuis MH, “A Complete Generalized Adjustment Criterion.” arXiv, Jul. 06, 2015. doi: 10.48550/arXiv.1507.01524.
  • [13] Rosenbaum PR and Rubin DB, “The central role of the propensity score in observational studies for causal effects,” Biometrika, vol. 70, no. 1, pp. 41–55, Apr. 1983, doi: 10.1093/biomet/70.1.41.
  • [14] Linear Mixed-Effects Models Using R. Accessed: Sep. 30, 2022. [Online]. Available: 10.1007/978-1-4614-3900-4
  • [15] Hastie T, Tibshirani R, and Friedman J, The Elements of Statistical Learning. New York, NY: Springer, 2009. doi: 10.1007/978-0-387-84858-7.
