Author manuscript; available in PMC: 2022 May 1.
Published in final edited form as: AIDS. 2021 May 1;35(Suppl 1):S19–S28. doi: 10.1097/QAD.0000000000002814

Application of Machine Learning Techniques in Classification of HIV Medical Care Status for People Living with HIV (PLWH) in South Carolina

Bankole OLATOSI 1, Xiaowen SUN 2, Shujie CHEN 2, Jiajia ZHANG 2, Chen LIANG 1, Sharon WEISSMAN 4, Xiaoming LI 3
PMCID: PMC8162887  NIHMSID: NIHMS1669689  PMID: 33867486

Abstract

Objectives

Ending the HIV Epidemic requires innovative use of data for intelligent decision-making from surveillance through treatment. This study sought to (1) examine the usefulness of linked integrated PLWH health data for predicting PLWH's future HIV care status and (2) model and compare the performance of machine learning methods for predicting future HIV care status for South Carolina (SC) PLWH.

Design

We employed supervised machine learning for its ability to predict PLWH's future care status by synthesizing and learning from PLWH's existing health data. This method suits the nature of integrated PLWH data, given their high volume and dimensionality.

Methods

A dataset of 8,888 distinct PLWH health records was retrieved from an integrated PLWH data repository. We trained and scored seven representative machine learning models, including Bayesian Network, Automated Neural Network, Support Vector Machine, Logistic Regression, LASSO, Decision Trees, and Random Forest, to best predict PLWH's care status. We further identified principal factors that predict retention in care based on the champion model.

Results

Bayesian Network (F=0.87, AUC=0.94, precision=0.87, recall=0.86) was the best predictive model, followed by Random Forest (F=0.78, AUC=0.81, precision=0.72, recall=0.85), Decision Tree (F=0.76, AUC=0.75, precision=0.70, recall=0.82) and Neural Network (cluster) (F=0.75, AUC=0.71, precision=0.69, recall=0.81).

Conclusions

These algorithmic applications of Bayesian Networks and other machine learning algorithms hold promise for predicting future PLWH HIV care status at the individual level. Prediction of future care patterns for SC PLWH can help optimize health service resources for effective interventions. Predictions can also help improve retention across the HIV continuum.

Keywords: AIDS, HIV, Retention in Care, Machine Learning, Big Data, Champion Model

INTRODUCTION

Efforts towards ending the Human Immunodeficiency Virus (HIV) Epidemic can benefit from the innovative use of data for intelligent decision-making from surveillance through treatment. The goal of HIV treatment for people living with HIV (PLWH) is to achieve viral suppression. PLWH who achieve and maintain viral suppression have effectively little to no risk of transmitting HIV to others and live an improved quality of life. [1,2] To achieve this goal, the current approach used for HIV prevention in the United States has shifted to engaging PLWH across the HIV care continuum. [3] The HIV care continuum depicts discrete steps from initial HIV diagnosis, linkage to care via an HIV healthcare provider within one month after diagnosis, receipt of HIV medical care (antiretroviral therapy), engagement in care (active PLWH participation), and retention in care, through viral suppression. [4]

However, the HIV care continuum in the United States continues to face challenges. The Centers for Disease Control and Prevention (CDC) estimated that about 64% of all PLWH in the United States received HIV medical care, but only 49% were retained in care, of whom the viral suppression rate was 62%. [3,4] Upwards of 50% of PLWH diagnosed with HIV infection did not remain in HIV care across the care continuum, including those who failed linkage to care, those who received care but subsequently fell out of care, and those who reengaged in care or transferred from other settings. [4] An ongoing iEngage trial showed that among those who linked to HIV care, 86% of individuals maintained viral suppression 48 weeks after diagnosis. [5] Disparities in linkage to and retention in care persist, with studies indicating that a significant proportion of African Americans were not in care or, for those in care, had not achieved viral suppression. [3,6,7] These findings collectively suggest that high volumes of drop-offs during HIV care drive the low overall rates of viral suppression. Statistics from South Carolina (SC) support this suggestion. Cumulatively, only 66% of SC PLWH received any care across the treatment continuum; of these, 54% received continuous HIV care and 57% were virally suppressed. [6,8] Accordingly, in the 2020 update of the National HIV/AIDS Strategy for the United States, "prevention through linkage and retention" was included as an imperative objective. [9]

To improve the HIV care continuum, research has centered on the factors that influence retention in care as well as effective clinical, behavioral, and structural interventions to help PLWH adhere to care. Studies suggest that the factors influencing retention in care can be summarized into an adapted socioecological framework spanning the individual, relationship, community, health care system, and healthcare policy levels. [4] This framework outlines numerous findings of poor engagement across the care continuum. For example, poor linkage and retention have been reported in key populations, including youth and adolescents [10], heterosexual men [11,12], transgender women [13], African Americans [11,14], and Hispanics/Latinos [11,12,15] in the United States. A number of factors are associated with poor linkage and retention, including mental illness [16-19], stigma and fear [20], place of residence [21-23], substance use [15,18,24], and access to health insurance [25]. These findings have informed interventions for improving retention in care with various foci, including linkage care management, medical case management, intensive outreach, peer and paraprofessional patient navigation, and clinic-wide messaging and culture. [4] However, many of these studies focused on a few contributing factors and/or population samples without considering integrated evidence from clinical, administrative, and community-based datasets. This limitation has led to several missed opportunities for identifying those at risk for not engaging in care and those at risk for disengaging from care. The reasons for these missed opportunities may be many. First, surveillance data across the HIV care continuum often fail to capture a complete portrait of PLWH's health status because of the lack of interoperability among datasets provided by different health service agencies. Second, it is often difficult to systematically identify missed opportunities for HIV testing and treatment as well as vulnerabilities of HIV care services across the care continuum. Third, it is difficult to proactively intervene with PLWH who are at high risk of dropping out of HIV care using limited data and predictive models built only on linear combinations. This is important given studies showing the reasonable costs and cost savings associated with retention in care for PLWH. [2,26] Recent literature suggests that the plan for ending the HIV epidemic in the United States could improve when Big Data science is used to advance data integration and treatment, address gaps in transmission risks/behavior, and improve HIV testing. [27]

To fill these knowledge gaps, our effort has led to an integrated data repository that accumulates individual-level data of all types (i.e., clinical, administrative, claims, and community-based health services data) for PLWH in South Carolina, as reported elsewhere. [28] A unique advantage of this data repository is that it enables the prediction of future care status for PLWH through rich, integrated, high-volume individual-level data. Despite sparse applications of predictive modeling in epidemiology and infectious diseases [29,30], no existing research has reported the development of computational methods for predicting retention in care using a comprehensive data repository.

In the present study, we sought to develop and identify the best machine learning-based predictive model for PLWH's HIV medical care status by learning from historical data in the established data repository. The best-performing model was selected by benchmark comparison among seven supervised machine learning algorithms on their ability to accurately predict PLWH's retention in care. This model was further analyzed to identify the variables that are most effective in prediction and clinically meaningful for HIV medical care. The expected outcomes of this study hold potential to establish benchmark evidence for predictive analysis of PLWH's care continuum, which further extends to evidence-based interventions and HIV health services underpinned by integrated clinical, administrative, claims, and community-based health data.

METHODS

Data repository

The complete architecture of the South Carolina PLWH Big Data repository is described elsewhere. [28] Below we highlight information important to the present study. The data repository received data from (1) the South Carolina HIV/AIDS electronic reporting system (e-HARS), a laboratory-based reporting system that has collected statewide CD4 and viral load (VL) tests since January 2004; (2) Ryan White HIV/AIDS Program Service Reports (RSR), which collect clinical data provided by Ryan White-funded entities; and (3) the Health Sciences South Carolina (HSSC) clinical data warehouse, which integrates clinical records from six of the state's largest health systems (AnMed Health, McLeod Health, Medical University of South Carolina Hospital Authority, Palmetto Health, Self Regional Healthcare, and Spartanburg Regional Healthcare System). The South Carolina Office of Revenue and Fiscal Affairs (RFA) integrated PLWH data from these sources with an all-payer healthcare inpatient database, Medicaid claims data (including demographic, visit, and pharmacy files), the state employee health services plan, Department of Corrections data (crime rates, prison history, etc.), Department of Mental Health data, and others.

Study population

The data repository includes PLWH diagnosed since 2005 whose residence at diagnosis was in South Carolina and who were aged 13 years or older. We selected 2005 because it was the first year South Carolina state law mandated the reporting of all CD4 and VL tests to e-HARS. We identified data qualifying for the machine learning experiments based on several criteria. First, we defined care status for this experiment using CD4 or VL tests as a proxy measure. Second, we used linkage to care as an inclusion criterion for all PLWH in the database. As a result, a total of 8,888 distinct PLWH records with 3,670,845 observations were identified. To prepare for modeling, we cleaned the data, addressed missing-value issues, and generated a total of 3,640,102 observations in the final study sample.

Machine learning experiments

There were 40 input variables and 1 output (target) variable included in the machine learning experiments. Table 1 shows a summary of the selected variables. The target variable is PLWH care status. We defined care status using the CDC's recommended definition of HIV medical care: "documentation of at least two CD4 cell counts or viral load tests performed at least three months apart during the year of evaluation". [23] All data records were annotated for care status (in care vs. not in care) per this definition.
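As a concrete illustration, the CDC proxy definition above can be applied programmatically. The sketch below uses hypothetical test-date inputs and approximates "three months" as 90 days (an assumption; the study's exact operationalization in SAS is not shown):

```python
from datetime import date

def in_care(test_dates, year):
    """Apply the CDC proxy definition: at least two CD4/VL test dates
    at least three months (approximated here as 90 days) apart within
    the evaluation year."""
    dates = sorted(d for d in test_dates if d.year == year)
    return any((b - a).days >= 90
               for i, a in enumerate(dates)
               for b in dates[i + 1:])

# Two tests about four months apart within 2010 -> annotated "in care"
assert in_care([date(2010, 1, 15), date(2010, 5, 20)], 2010)
# Only one qualifying test in the evaluation year -> "not in care"
assert not in_care([date(2010, 1, 15), date(2011, 2, 1)], 2010)
```

In a full pipeline this flag would be computed per person-year to produce the binary target used by all seven algorithms.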

Table 1:

Selected Machine Learning Variable Attributes for adult PLWHs in South Carolina

Variable Name Role Level Level (N)
Sociodemographics
Age INPUT INTERVAL
Gender INPUT NOMINAL 2
Race and ethnicity (White, African Americans, Other, Hispanic) INPUT NOMINAL 5
Education INPUT NOMINAL 7
Marital Status INPUT NOMINAL 6
County of Residence at HIV Diagnosis REJECTED UNARY 0
Type of Primary Caregiver INPUT NOMINAL 8
South Carolina Resident (Yes/No) INPUT BINARY 2
Clinical Factors
Retention in Care Status (In care vs. Not in Care) TARGET BINARY 2
Age at HIV Diagnosis INPUT INTERVAL
Age at AIDS Diagnosis REJECTED INTERVAL
AIDS Category (Yes/No) INPUT NOMINAL 3
HIV Transmission Risk Categories INPUT NOMINAL 6
Group (Care Groups) INPUT NOMINAL 4
HIV Viral Load INPUT INTERVAL
CD4 count INPUT INTERVAL
Year (Calendar year of HIV Diagnosis) INPUT NOMINAL
Supply (Days of drug supply) REJECTED INTERVAL
Years of retention in care (count) INPUT BINARY 2
Retention in Care Sequence for follow-up year INPUT INTERVAL
Health System Utilization
Zero Income Eligibility (Yes/No) REJECTED BINARY 2
Poverty (Percent) INPUT INTERVAL
Medicare Dual Eligibility REJECTED BINARY 2
Ever on AIDS Drug Assistance Program INPUT BINARY 2
# of Prescription Drug Refills used REJECTED INTERVAL
Emergency Room Flag (Yes/No) INPUT BINARY 2
DMH Payor Event 1 INPUT NOMINAL 6
DMH Payor Event 2 INPUT NOMINAL 5
DMH Payor Event 3 INPUT BINARY 2
DMH Payor Event 4 INPUT NOMINAL 4
DMH Payor Event 5 INPUT BINARY 2
Time (Time from diagnosis date to linkage) INPUT INTERVAL
Hospital Length of Stay (Days) INPUT INTERVAL
Hospital Payor INPUT NOMINAL 6
# of comorbidities INPUT NOMINAL 7
# of diagnoses INPUT INTERVAL
Therapeutic Drug Class REJECTED INTERVAL
# of Prescription Drugs INPUT INTERVAL
Corrections Information
Jail (Yes/No) INPUT BINARY 2
Jail (Length of Stay in Days) INPUT INTERVAL

All data cleaning/management and machine learning experiments were performed using SAS 9.4 and SAS Viya 3.4. Since missing values can create bias during predictive modeling, we imputed values for variables with 10% or less of missing data using the tree surrogate method [31] for categorical variables and median for interval variables. We employed seven supervised machine learning algorithms that represent a variety of approaches including Bayesian Network, Neural Network, Support Vector Machine, Logistic Regression, least absolute shrinkage and selection operator (LASSO), Decision Trees and Random Forest.
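To make the imputation rule concrete, the sketch below applies the 10%-missingness threshold with pandas. The median is used for interval variables as in the study; the mode stands in for SAS's tree-surrogate method for categorical variables (an assumption, since that method is specific to SAS):

```python
import pandas as pd

def impute(df, interval_cols, categorical_cols, max_missing=0.10):
    """Impute variables with 10% or less missing data: median for
    interval variables (as in the study), mode for categoricals
    (a simple stand-in for the SAS tree-surrogate method).
    Columns exceeding the threshold are left untouched."""
    out = df.copy()
    for col in interval_cols:
        if out[col].isna().mean() <= max_missing:
            out[col] = out[col].fillna(out[col].median())
    for col in categorical_cols:
        if out[col].isna().mean() <= max_missing:
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out
```

A column such as CD4 count with one missing value in ten records (10%) would be median-filled, while a column with 20% missing would be left as-is under this rule.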

Data Partitioning

The 10-fold cross-validation approach was employed as the framework for the machine learning experiments to measure unbiased accuracy. Figure 1 illustrates the diagram of the machine learning experiments used for this study. Specifically, we randomly generated ten equal splits of the data. We chose to partition our data into three groups, namely training, validation, and testing, to avoid the overfitting that could occur using only training and validation data (Figure 1). For each algorithm, six splits (N=2,184,061 [60%]) were used for training, three splits (N=1,092,031 [30%]) were used for validation, and one split (N=364,010 [10%]) was randomly selected for testing. We used the F1 measure, precision, recall, and the area under the receiver operating characteristic curve (AUC) to assess model performance. These measures were generated based on a contingency table that specifies whether or not the model prediction is correct based on the true values in the dataset (see Figure 2). The AUC shows how true positive rates and false positive rates interact with each other.
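The 60/30/10 partitioning described above can be sketched as follows. This is a simplified illustration (the actual splits were produced within the SAS cross-validation framework); any remainder after forming ten equal folds is dropped here:

```python
import random

def split_60_30_10(records, seed=0):
    """Shuffle the records into ten equal folds, then use six folds for
    training, three for validation, and one for testing, mirroring the
    study's 60/30/10 partitioning. Records beyond 10 * fold_size are
    dropped in this simplified sketch."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    k = len(shuffled) // 10
    folds = [shuffled[i * k:(i + 1) * k] for i in range(10)]
    train = [x for f in folds[:6] for x in f]
    valid = [x for f in folds[6:9] for x in f]
    test = folds[9]
    return train, valid, test
```

Applied to 100 records, this yields 60 training, 30 validation, and 10 test records with no overlap.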

Figure 1. Diagram of machine learning experiments.

Figure 2. Measures used for algorithm performance evaluation.

Algorithms

Bayesian Network is a supervised learning approach that fits a Bayesian Network model of the inputs to our nominal target (care status). A Bayesian Network is a directed, acyclic graphical model whose structure is based on conditional dependencies between random variables. Details about its recent utility and promise in personalized medicine and healthcare are documented in several studies. [37-40] Neural Network is a supervised learning method we used to assess the connections between input variables, hidden layers, and an output layer (care status), similar to the biologic structure of a human brain. Its utility is also well documented in areas such as cancer, diabetes, and electronic health record-based studies. [41-45] For the Neural Network, prior variable selection is preferred, so we used the fast-supervised selection method [46,47] to identify the input variables (cluster) that jointly explained the largest variance for care status, and compared it to the Neural Network without selection. We used decision trees (classification and regression trees [CART]) as a non-parametric supervised learning method to create a tree model for the relevant input variables associated with our outcome (care status) using a series of rules; each rule assigns an observation to a segment based on the value of one input. [48,49] Random Forest is an extension of decision trees consisting of multiple decision trees built on different samples, with its use also reported in healthcare. [50] Support vector machine is a well-known optimization-based supervised learning method for binary classification problems. [51-54] It identifies a decision boundary with the maximum possible margin between the data points and has also been applied to healthcare. Logistic regression is a member of the discriminative models in machine learning [33] and was used here to predict care status as a function of the input variables.
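As an illustrative benchmark in the spirit of the comparison above, the sketch below fits several of the listed algorithms with scikit-learn on synthetic data. This is a stand-in, not the study's pipeline: the real repository is not public and the study used SAS Viya; the Bayesian Network is omitted because scikit-learn has no direct equivalent, and L1-penalized logistic regression stands in for LASSO on a binary target:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the integrated PLWH data.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "LASSO": LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(probability=True, random_state=0),
    "Neural Network": MLPClassifier(max_iter=500, random_state=0),
}

# Score each algorithm on the held-out split with F1 and AUC,
# as in the study's benchmark comparison.
results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    results[name] = (f1_score(y_te, pred),
                     roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```

The `results` dictionary then supports the kind of champion-model selection reported in Table 2.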

Feature analysis

To determine the relative importance of variables in predicting care status, we employed different measures for each algorithm. Consistent with our data-driven approach, we ran the machine learning algorithms with the variable selection option (fast selection) and compared them to the same algorithms run on all variables without prior selection. We ranked the importance of the input variables by the performance decrease observed when each input variable was taken out of the model. Top-ranked input variables were candidate principal factors for predicting PLWH's care status.

RESULTS

We present selected results of these algorithms and compare model performance in predicting care status for PLWH using the selected input variables. The results presented differ based on the type of algorithm. Figures 3A and 3B show decision tree models for the data-driven input variable selection (cluster) and for all input variables, respectively. Notable tree splits differed in the highlighted branches based on the misclassification rate by variable. Both trees had varying inputs of relative importance; both included years of retention in care but differed at the main split between hospital payor and age at HIV diagnosis. Other split differences are annotated in the figures below. For ease of understanding, we display a feed-forward single-layer neural network showing the weights associated with each input variable level for our target (care status) (see Figure 4A). Input variables of importance included years of retention in care, age at HIV diagnosis, hospital payor, type of transmission risk, marital status, and education. Similarly, Figure 4B displays the Bayesian Network diagram specifying the conditional relationships/dependencies between the input and target variables. The dependencies show intricate but important relationships between the input variables and the target (care status), highlighted in yellow. This result indicates that predictive relationships are often codependent on other relevant input variables. The variable selection by relative importance also differed for the other algorithms.

Figure 3. Visualization of trained Decision Trees.

Figure 4. Visualization of trained Neural Network.

Machine learning performance

Consistent with the primary aims of this study, we compared and scored each algorithm for performance. Bayesian Network (F=0.87, AUC=0.94, precision=0.87, recall=0.86) outperformed the other algorithms, followed by Random Forest (F=0.78, AUC=0.81, precision=0.72, recall=0.85), Decision Tree (F=0.76, AUC=0.75, precision=0.70, recall=0.82), and Neural Network (cluster) (F=0.75, AUC=0.71, precision=0.69, recall=0.81). Table 2 shows the results for each algorithm. The AUC values across the experimented algorithms are consistent with the F scores. Figure 5 illustrates a diagrammatic comparison of AUC among the algorithms for the training, validation, and test data.

Table 2.

Prediction performance for all the algorithms

Algorithms Precision Recall F score AUC
Bayesian Network 0.87 0.86 0.87 0.94
Random Forest 0.72 0.85 0.78 0.81
Decision Tree 0.70 0.82 0.76 0.75
Neural Network (cluster) 1 0.69 0.81 0.75 0.71
Logistic Regression 0.69 0.82 0.75 0.71
Support Vector Machine 0.69 0.80 0.74 0.67
LASSO 0.56 1.00 0.72 0.50
Neural Network 2 0.56 1.00 0.72 0.50

1 Neural Network with fast variable selection;

2 Neural Network without fast selection

Figure 5. Areas under Receiver Operating Characteristic (ROC) curves.

Recall scores are consistently higher than precision across all algorithms except the Bayesian Network, suggesting that these algorithms generally performed well at capturing PLWH who would actually remain in care (few false negatives) but less well at ensuring that PLWH flagged as retained in care actually remained in care (more false positives). Notably, for LASSO and Neural Network, the recall scores reached 1.00 while the precision scores were low, which severely impeded balanced performance as indicated by the F scores and AUC; a considerably large portion of the in-care PLWH identified by LASSO and Neural Network do not actually remain in care. Figure 5 shows ROC curves for all the algorithms and data splits (training, validation, and testing). The ROC curve is a plot of sensitivity (the true positive rate) against 1-specificity (the false positive rate), both of which are classification measures based on the confusion matrix and are calculated at various cutoff values. A ROC curve that rapidly approaches the upper-left corner of the graph, where the difference between sensitivity and 1-specificity is greatest, indicates a more accurate model. Consistently, the Bayesian Network outperformed the other models. Figure 6 illustrates the input variables by relative importance for the champion model, the Bayesian Network. This plot shows the 23 most important input variables as determined by relative importance, which is calculated using a one-level decision tree for each input to predict the champion model's predicted value as a global surrogate model. The most important input variable for this model is years of retention in care, with a relative importance score of 1, followed by a number of variables with relative importance scores ranging from 0.1 to 0.3, including CD4 count, calendar year of HIV diagnosis, number of diagnoses, marital status, gender, number of prescription drugs, and zero income eligibility (yes/no).

Figure 6: Relative importance for the champion model (Bayesian Network).
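The relative-importance computation described for Figure 6 (a one-level decision tree per input, fit as a global surrogate to the champion model's predictions) can be approximated as in the sketch below. The exact SAS formula is not given in the text, so the accuracy gain of each stump over the majority class is used here as the raw score, rescaled so the top variable scores 1:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def surrogate_importance(X, predicted, normalize=True):
    """For each input, fit a depth-1 decision tree (a stump) on that
    input alone to reproduce the champion model's predictions. The
    stump's accuracy gain over the majority-class baseline serves as a
    relative-importance score (an assumed scoring rule), rescaled so
    the most important variable scores 1."""
    base = max(np.mean(predicted), 1 - np.mean(predicted))
    scores = []
    for j in range(X.shape[1]):
        stump = DecisionTreeClassifier(max_depth=1).fit(X[:, [j]], predicted)
        scores.append(stump.score(X[:, [j]], predicted) - base)
    scores = np.clip(scores, 0, None)
    if normalize and scores.max() > 0:
        scores = scores / scores.max()
    return scores
```

A variable that fully determines the champion model's prediction (like years of retention in care in Figure 6) receives a score of 1, while weakly related variables score near 0.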

DISCUSSION

To help end the HIV epidemic, predictive healthcare decisions and interventions for PLWH must be driven by evidence and data. Studies suggest that decision-making in healthcare can benefit from predictive models and artificial intelligence. [47,55,56] In this study, we compared different machine learning algorithms and demonstrated important differences in model performance when predicting PLWH care status. Important predictive factors for care status are typically not linearly associated with it, suggesting that more complex relationships (dependencies) may exist. Our study assessed both traditional and novel statistical methods of prediction that hold implications for the field, particularly with the Bayesian Network performing best. The slight differences between the ranked variables of importance hold implications for designing future interventions.

Principal findings

The present study is among the first to leverage machine learning and Big Data analytics to predict the HIV medical care status of PLWH using their previous health records, consisting of clinical, health administrative, and community-based health data. The study demonstrated that PLWH's future state of HIV care utilization is predictable using supervised machine learning and integrated health data. Additionally, we identified factors, including years of retention in care, CD4 count, calendar year of HIV diagnosis, number of diagnoses, marital status, gender, number of prescription drugs, and zero income eligibility (yes/no), as important inputs that play a key role in indicating whether or not a PLWH is likely to remain in care. These key predictive factors demonstrate that nuanced information from PLWH's clinical care, social determinants of health, and activities in social care jointly contributes to a precise predictive capability for PLWH's future HIV medical care status.

Supervised machine learning algorithms are feasible for predicting retention in care. Algorithms such as Bayesian Network, Random Forest, Decision Trees, and Neural Network (cluster) showed superior predictive performance. However, algorithms such as LASSO and Neural Network were not effective in the prediction task. One possible reason for the Neural Network's substandard performance is that we only tested a basic three-layer model with default parameters. We believe there may be deeper temporal relationships that need to be detected to further improve our predictive models. [57] To systematically investigate algorithms that rely heavily on initial parameters and architecture (i.e., Neural Networks), follow-up studies should develop tailored Neural Network models. Overall, the comparatively lower precision relative to recall is a major source of error affecting balanced performance. Considering that we selected only a limited subset of the input variables available in the integrated PLWH data repository, the problem of low precision could be improved if follow-up studies include input variables that contribute to accurate identification of the negative outcome (i.e., not in care).

Clinical and policy implications

Several of these predictive factors have been identified in the literature as important for retention in care. [58] However, the challenge of ending the HIV epidemic goes beyond linking PLWH into care to retaining them in HIV medical care. Identifying and predicting PLWH at high risk of being not in care after linkage allows for the reallocation of resources to keep such high-risk populations in HIV medical care. This leads to cost savings from reduced transmissibility, improved service for PLWH, and cost savings from preventing individuals from dropping out of care. From evidence-driven healthcare and precision-medicine perspectives, our study shows that the application of powerful machine learning techniques (in this case, Bayesian Network and Random Forest) makes enough of a difference to justify exploring their potential for predicting future HIV care status for any PLWH population using integrated data sources. It also holds promise for prediction at the individual level, which could aid in preventing dropouts from HIV medical care. To decipher the complex probability relationships observed in the Decision Trees, Neural Network, and Bayesian Network and to improve the prediction of future care status, Deep Learning is needed. This work provides evidence for the increased use of machine learning techniques when large-scale integrated datasets exist. The observed performance differences also justify applying machine learning methods to large-scale integrated data from multiple sources.

Limitations and future direction

This study has a few limitations. First, as a pilot study, we selected only a limited number of input variables for the machine learning experiments, guided by domain experts. There may be important variables that could improve prediction but were not included. Second, other stages in the HIV care continuum, such as linkage to care and receipt of care, are equally important for HIV care adherence but were not tested in the machine learning experiments. Third, there might be intricate relations (e.g., temporal relations) among input variables across the years in care; the present study did not aim to investigate these relations. Fourth, the identified predictive factors need further evaluation to confirm their clinical value. To address these limitations, future studies should (1) expand the input variables and the stages of the HIV care continuum examined as the PLWH data repository continues to include and link data, (2) develop advanced predictive models (e.g., Deep Learning) to handle the computational complexity raised by the expanded dataset, and (3) conduct a comprehensive evaluation of the identified predictive factors.

ACKNOWLEDGMENTS

Research reported in this publication was supported by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under Award Number R01AI127203. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. We thank the South Carolina Department of Health and Environmental Control, the Office of Revenue and Fiscal Affairs, and various state agencies for the data provided. The content provided is solely the responsibility of the authors and does not represent the views of these organizations.

Funding received for this work: This work is sponsored by a grant (5R01AI127203) from the National Institute of Allergy and Infectious Diseases.

REFERENCES

  • 1.Cohen MS, Chen YQ, McCauley M, Gamble T, Hosseinipour MC, Kumarasamy N, … & Godbole SV (2011). Prevention of HIV-1 infection with early antiretroviral therapy. New England Journal of Medicine, 365(6), 493–505.
  • 2.Shrestha RK, Gardner L, Marks G, Craw J, Malitz F, Giordano TP, … & Mugavero M (2015). Estimating the cost of increasing retention in care for HIV-infected patients: results of the CDC/HRSA retention in care trial. Journal of Acquired Immune Deficiency Syndromes, 68(3), 345.
  • 3.Centers for Disease Control and Prevention. Understanding the HIV care continuum. 2019. Retrieved from https://www.cdc.gov/hiv/policies/continuum.html
  • 4.Mugavero MJ, Amico KR, Horn T, et al. The state of engagement in HIV care in the United States: from cascade to continuum to control. Clin Infect Dis 2013;57:1164–71.
  • 5.Amico KR, Modi R, Westfall AO, Willig J, Keruly JC, Napravnik S, … & Long DM. Viral suppression among people initiating HIV care: outcomes from iENGAGE trial. Age (years), 36, 12.
  • 6.Crepaz N, Dong X, Wang X, Hernandez AL, & Hall HI (2018). Racial and ethnic disparities in sustained viral suppression and transmission risk potential among persons receiving HIV care—United States, 2014. Morbidity and Mortality Weekly Report, 67(4), 113.
  • 7.SC Department of Health and Environmental Control (DHEC). An Epidemiologic Profile of HIV and AIDS in South Carolina 2019. Retrieved from https://scdhec.gov/sites/default/files/media/document/2019-Epi-Profile.pdf
  • 8.Edun B, Iyer M, Albrecht H, et al. The South Carolina HIV Cascade of Care. South Med J 2015;108:670–4.
  • 9.White House Office of National AIDS Policy. National HIV/AIDS strategy for the United States: updated to 2020. Washington, DC: White House Office of National AIDS Policy; 2015.
  • 10.Lall P, Lim SH, Khairuddin N, et al. An urgent need for research on factors impacting adherence to and retention in care among HIV-positive youth and adolescents from key populations. J Int AIDS Soc 2015;18:19393.
  • 11.Tripathi A, Youmans E, Gibson JJ, et al. The impact of retention in early HIV medical care on viro-immunological parameters and survival: a statewide study. AIDS Res Hum Retroviruses 2011;27:751–8.
  • 12.Hall HI, Gray KM, Tang T, et al. Retention in care of adults and adolescents living with HIV in 13 US areas. JAIDS J Acquir Immune Defic Syndr 2012;60:77–82.
  • 13.Poteat T, Hanna DB, Rebeiro PF, et al. Characterizing the Human Immunodeficiency Virus Care Continuum Among Transgender Women and Cisgender Women and Men in Clinical Care: A Retrospective Time-series Analysis. Clin Infect Dis 2020;70:1131–8.
  • 14.Dailey AF, Johnson AS, Wu B. HIV care outcomes among blacks with diagnosed HIV—United States, 2014. MMWR Morb Mortal Wkly Rep 2017;66:97.
  • 15.Dasgupta S, Oster AM, Li J, et al. Disparities in consistent retention in HIV care—11 states and the District of Columbia, 2011–2013. MMWR Morb Mortal Wkly Rep 2016;65:77–82.
  • 16.Schranz AJ, Barrett J, Hurt CB, et al. Challenges facing a rural opioid epidemic: treatment and prevention of HIV and hepatitis C. Curr HIV/AIDS Rep 2018;15:245–54.
  • 17.Thompson MA, Mugavero MJ, Amico KR, et al. Guidelines for improving entry into and retention in care and antiretroviral adherence for persons with HIV: evidence-based recommendations from an International Association of Physicians in AIDS Care panel. Ann Intern Med 2012;156:817–33.
  • 18.Dombrowski JC, Simoni JM, Katz DA, et al. Barriers to HIV care and treatment among participants in a public health HIV care relinkage program. AIDS Patient Care STDS 2015;29:279–87.
  • 19.Coyle RP, Schneck CD, Morrow M, et al. Engagement in Mental Health Care is Associated with Higher Cumulative Drug Exposure and Adherence to Antiretroviral Therapy. AIDS Behav 2019;23:3493–502.
  • 19.Coyle RP, Schneck CD, Morrow M, et al. Engagement in Mental Health Care is Associated with Higher Cumulative Drug Exposure and Adherence to Antiretroviral Therapy. AIDS Behav 2019;23:3493–502. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Giordano TP, Gifford AL, White AC Jr, et al. Retention in care: a challenge to survival with HIV infection. Clin Infect Dis 2007;44:1493–9. [DOI] [PubMed] [Google Scholar]
  • 21.Nelson JA, Kinder A, Johnson AS, et al. Differences in selected HIV care continuum outcomes among people residing in rural, urban, and metropolitan areas—28 US jurisdictions. J Rural Heal 2018;34:63–70. [DOI] [PubMed] [Google Scholar]
  • 22.Philbin MM, Feaster DJ, Gooden L, et al. The North-South Divide: Substance Use Risk, Care Engagement, and Viral Suppression Among Hospitalized Human Immunodeficiency Virus--Infected Patients in 11 US Cities. Clin Infect Dis 2019;68:146–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Rebeiro PF, Gange SJ, Horberg MA, et al. Geographic variations in retention in care among HIV-infected adults in the United States. PLoS One 2016;11. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Hartzler B, Dombrowski JC, Williams JR, et al. Influence of substance use disorders on 2-year HIV care retention in the United States. AIDS Behav 2018;22:742–51. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Mugavero MJ, Lin H-Y, Willig JH, et al. Missed visits and mortality among patients establishing initial outpatient HIV treatment. Clin Infect Dis 2009;48:248–56. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. Jain KM, Maulsby C, Brantley M, Kim JJ, Zulliger R, et al.; SIF Intervention Team. Cost and cost threshold analyses for 12 innovative US HIV linkage and retention in care programs. AIDS Care 2016;28:1199–204.
  • 27. Rana AI, Mugavero MJ. How big data science can improve linkage and retention in care. Infect Dis Clin North Am 2019;33:807–15.
  • 28. Olatosi B, Zhang J, Weissman S, et al. Using Big Data analytics to improve HIV medical care utilisation in South Carolina: a study protocol. BMJ Open 2019;9:e027688.
  • 29. Wiens J, Shenoy ES. Machine learning for healthcare: on the verge of a major shift in healthcare epidemiology. Clin Infect Dis 2018;66:149–53.
  • 30. Johnson AS, Johnson SD, Hu S, et al. Monitoring selected national HIV prevention and care objectives by using HIV surveillance data: United States and 6 dependent areas, 2017. 2019.
  • 31. Feelders A. Handling missing data in trees: surrogate splits or statistical imputation? In: European Conference on Principles of Data Mining and Knowledge Discovery. Berlin, Heidelberg: Springer; 1999. pp. 329–34.
  • 32. Zhou XH, Eckert GJ, Tierney WM. Multiple imputation in public health research. Stat Med 2001;20:1541–9.
  • 33. Wu J, Roy J, Stewart WF. Prediction modeling using EHR data: challenges, strategies, and a comparison of machine learning approaches. Med Care 2010:S106–13.
  • 34. Fushiki T. Estimation of prediction error by using K-fold cross-validation. Stat Comput 2011;21:137–46.
  • 35. Ahmad LG, Eshlaghy AT, Poorebrahimi A, Ebrahimi M, Razavi AR. Using three machine learning techniques for predicting breast cancer recurrence. J Health Med Inform 2013;4(124):3.
  • 36. SAS documentation: Bayesian network. Retrieved from https://documentation.sas.com/?activeCdc=vdmmlcdc&cdcId=capcdc&cdcVersion=8.5&docsetId=vdmmlref&docsetTarget=n06li68bxx073yn1eujtwuxe1dg3&locale=en
  • 37. SAS documentation: fast supervised learning. Retrieved from https://documentation.sas.com/?docsetId=vdmmlref&docsetTarget=p1l7tl7hddl0lon138uretl5muac.htm&docsetVersion=8.4&locale=en
  • 38. Velikova MV, Terwisscha van Scheltinga JA, Lucas PJ, Spaanderman M. Exploiting causal functional relationships in Bayesian network modelling for personalised healthcare. 2014.
  • 39. Bayat S, Cuggia M, Rossille D, Kessler M, Frimat L. Comparison of Bayesian network and decision tree methods for predicting access to the renal transplant waiting list. In: MIE; 2009. pp. 600–4.
  • 40. Lappenschaar M, Hommersom A, Lucas PJ, Lagro J, Visscher S. Multilevel Bayesian networks for the analysis of hierarchical health care data. Artif Intell Med 2013;57:171–83.
  • 41. Sordo M. Introduction to neural networks in healthcare. OpenClinical knowledge management for medical care; 2002.
  • 42. Ma F, Chitta R, Zhou J, You Q, Sun T, Gao J. Dipole: diagnosis prediction in healthcare via attention-based bidirectional recurrent neural networks. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2017. pp. 1903–11.
  • 43. O'Neill TJ, Penm J, Penm J. A subset polynomial neural networks approach for breast cancer diagnosis. Int J Electron Healthc 2007;3:293–302.
  • 44. Karan O, Bayraktar C, Gümüşkaya H, Karlık B. Diagnosing diabetes using neural networks on small mobile devices. Expert Syst Appl 2012;39:54–60.
  • 45. Choi E, Bahadori MT, Sun J, Kulas J, Schuetz A, Stewart W. RETAIN: an interpretable predictive model for healthcare using reverse time attention mechanism. In: Advances in Neural Information Processing Systems; 2016. pp. 3504–12.
  • 46. Samanta S, Das S. A fast supervised method of feature ranking and selection for pattern classification. In: International Conference on Pattern Recognition and Machine Intelligence. Berlin, Heidelberg: Springer; 2009. pp. 80–5.
  • 47. Razzaghi T, Roderick O, Safro I, Marko N. Fast imbalanced classification of healthcare data with missing values. In: 2015 18th International Conference on Information Fusion (Fusion). IEEE; 2015. pp. 774–81.
  • 48. Gordon L. Using classification and regression trees (CART) in SAS Enterprise Miner for applications in public health. In: SAS Global Forum; 2013.
  • 49. García MNM, Herráez JCB, Barba MS, Hernández FS. Random forest based ensemble classifiers for predicting healthcare-associated infections in intensive care units. In: Distributed Computing and Artificial Intelligence, 13th International Conference. Cham: Springer; 2016. pp. 303–11.
  • 50. Ali J, Khan R, Ahmad N, Maqsood I. Random forests and decision trees. Int J Comput Sci Issues 2012;9(5):272.
  • 51. Razzaghi T, Roderick O, Safro I, Marko N. Multilevel weighted support vector machine for classification on healthcare data with missing values. PLoS One 2016;11(5):e0155119.
  • 52. Naraei P, Abhari A, Sadeghian A. Application of multilayer perceptron neural networks and support vector machines in classification of healthcare data. In: 2016 Future Technologies Conference (FTC). IEEE; 2016. pp. 848–52.
  • 53. Son YJ, Kim HG, Kim EH, Choi S, Lee SK. Application of support vector machine for prediction of medication adherence in heart failure patients. Healthc Inform Res 2010;16:253–9.
  • 54. Lee SK, Kang BY, Kim HG, Son YJ. Predictors of medication adherence in elderly patients with chronic diseases using support vector machine models. Healthc Inform Res 2013;19:33–41.
  • 55. Haas LR, Takahashi PY, Shah ND, Stroebel RJ, Bernard ME, Finnie DM, et al. Risk-stratification methods for identifying patients for care coordination. Am J Manag Care 2013;19:725–32.
  • 56. Plis K, Bunescu R, Marling C, Shubrook J, Schwartz F. A machine learning approach to predicting blood glucose levels for diabetes management. In: Modern Artificial Intelligence for Health Analytics: Papers from AAAI-14; 2014.
  • 57. Choi E, Schuetz A, Stewart WF, Sun J. Using recurrent neural network models for early detection of heart failure onset. J Am Med Inform Assoc 2017;24:361–70.
  • 58. Bulsara SM, Wainberg ML, Newton-John TR. Predictors of adult retention in HIV care: a systematic review. AIDS Behav 2018;22:752–64.
