PLOS Medicine. 2020 Mar 6;17(3):e1003042. doi: 10.1371/journal.pmed.1003042

Mapping and characterising areas with high levels of HIV transmission in sub-Saharan Africa: A geospatial analysis of national survey data

Caroline A Bulstra 1,2,#, Jan A C Hontelez 1,2,*,#, Federica Giardina 1, Richard Steen 1, Nico J D Nagelkerke 1, Till Bärnighausen 2,3,4, Sake J de Vlas 1
Editor: Ruanne V Barnabas 5
PMCID: PMC7059914  PMID: 32142509

Abstract

Background

In the generalised epidemics of sub-Saharan Africa (SSA), human immunodeficiency virus (HIV) prevalence shows patterns of clustered micro-epidemics. We mapped and characterised these high-prevalence areas for young adults (15–29 years of age), as a proxy for areas with high levels of transmission, for 7 countries in Eastern and Southern Africa: Kenya, Malawi, Mozambique, Tanzania, Uganda, Zambia, and Zimbabwe.

Methods and findings

We used geolocated survey data from the most recent United States Agency for International Development (USAID) demographic and health surveys (DHSs) and AIDS indicator surveys (AISs) (collected between 2008–2009 and 2015–2016), which included about 113,000 adults, of whom about 53,000 were young adults (26,000 women, 28,000 men), from over 3,500 sample locations. First, ordinary kriging was applied to predict HIV prevalence at unmeasured locations. Second, we explored to what extent behavioural, socioeconomic, and environmental factors explain HIV prevalence at the individual- and sample-location level, by developing a series of multilevel multivariable logistic regression models and geospatially visualising unexplained model heterogeneity. National-level HIV prevalence for young adults ranged from 2.2% in Tanzania to 7.7% in Mozambique. However, at the subnational level, we found areas with prevalence among young adults as high as 11% or 15% alternating with areas with prevalence between 0% and 2%, suggesting the existence of areas with high levels of transmission. Overall, 15.6% of heterogeneity could be explained by an interplay of known behavioural, socioeconomic, and environmental factors. Maps of the interpolated random effect estimates show that environmental variables, representing indicators of economic activity, were most powerful in explaining high-prevalence areas. Main study limitations were the inability to infer causality due to the cross-sectional nature of the surveys and the likely under-sampling of key populations in the surveys.

Conclusions

We found that, among young adults, micro-epidemics of relatively high HIV prevalence alternate with areas of very low prevalence, clearly illustrating the existence of areas with high levels of transmission. These areas are partially characterised by high economic activity, relatively high socioeconomic status, and risky sexual behaviour. Localised HIV prevention interventions specifically tailored to the populations at risk will be essential to curb transmission. More fine-scale geospatial mapping of key populations, such as sex workers and migrant populations, could help us further understand the drivers of these areas with high levels of transmission and help us determine how they fuel the generalised epidemics in SSA.


Jan Hontelez and colleagues study the spatial distribution of HIV infections among young adults in 7 countries in Eastern and Southern Africa.

Author summary

Why was this study done?

  • Previous studies showed that heterogeneity in human immunodeficiency virus (HIV) prevalence exists among the general population in Eastern and Southern Africa, the geographic area most severely affected by the HIV pandemic.

  • Whereas HIV prevalence among adults does not reveal when persons have been infected, young adults are most likely recently infected, and therefore high-prevalence areas among this subpopulation can proxy locations of ongoing transmission.

  • The location and underlying determinants of high HIV prevalence areas among young adults can help to shape spatially targeted and risk-group–tailored interventions to reduce transmission.

What did the researchers do and find?

  • We found clear areas of high prevalence in young adults in between vast regions with relatively low prevalence for all 7 countries in Eastern and Southern Africa.

  • HIV prevalence in young adults was partly explained by an interplay of behavioural, socioeconomic, and environmental (i.e., economic activity) factors, and environmental factors were especially predictive of high transmission locations.

What do these findings mean?

  • Our findings, together with the existing evidence, indicate that key population dynamics, especially related to seasonal and economic migration and associated sex work, might play a major role in fuelling HIV transmission.

  • In further reducing HIV transmission in Eastern and Southern Africa, areas of high HIV prevalence in young adults should be priority areas for tailored HIV prevention interventions towards reaching the fast-track commitments to end the HIV epidemic by 2030.

Introduction

Sustainable Development Goal (SDG) 3, “to ensure healthy lives and promote well-being for all at all ages” [1], together with the Joint United Nations Programme on HIV/AIDS (UNAIDS) fast-track strategy, explicitly calls for ending the HIV pandemic by 2030 [2]. In 2017, about 37 million people were living with human immunodeficiency virus (HIV) worldwide, 70% of whom were residing in sub-Saharan Africa (SSA) [3]. The countries in Eastern and Southern Africa are especially severely affected by the pandemic, with general population prevalences ranging from 5% in Tanzania to 27% in eSwatini (formerly Swaziland) [3]. Mounting evidence suggests that these HIV epidemics are heterogeneous [4,5] and that the transmission of HIV is largely concentrated across clustered micro-epidemics of different scales [6,7]. As these high-prevalence areas are likely important drivers of the epidemic [8,9], identifying their location and underlying determinants is essential to further optimise HIV prevention and treatment interventions.

Although mapping overall HIV prevalence in the adult population gives an adequate indication of treatment service needs [4,5], it is not straightforward to use such data to inform policy makers on areas with high levels of transmission, because many, especially older, adults were infected many years prior to any survey and possibly at other locations. Mapping heterogeneity of HIV prevalence in young adults, who are most likely to have been recently infected, will more directly pinpoint areas of high HIV transmission [10]. Furthermore, identifying underlying determinants of heterogeneity in HIV prevalence among young adults in the high endemic countries can help shape spatially targeted and risk-group–tailored interventions to reduce transmission.

We identified areas of high HIV prevalence in young adults (women 15–24 years and men 15–29 years) for 7 countries in Eastern and Southern Africa (Kenya, Malawi, Mozambique, Tanzania, Uganda, Zambia, and Zimbabwe), using geolocated HIV prevalence data from demographic and health surveys (DHSs) and AIDS indicator surveys (AISs). Next, we explored to what extent sexual behavioural, socioeconomic, and environmental factors explain the geospatial heterogeneity in young adults, as well as what heterogeneity remains unexplained.

Methods

Data

The Demographic and Health Survey Programme is a programme that conducts national, population-level DHSs worldwide, in which individuals are interviewed about a wide range of behavioural, socioeconomic, and epidemiological parameters. The AISs and many DHSs include voluntary HIV testing in adults. Approximately 350 sample locations (primary sampling units) are randomly sampled throughout the country of interest, and at each location residents of about 25 households are sampled. All individuals that were at home during one of the visits and were between 15 and 49 years of age (women) or 54 years of age (men) were eligible for the survey. GPS coordinates of sample locations are randomly displaced up to 2 km for urban and up to 5 km for rural sample locations, to ensure confidentiality of participants. Sample weights are incorporated in the DHS to translate unbalanced sampling into national representative data. In our study, we used these sample weights to estimate national HIV prevalence among adults and young adults for all 7 countries. For the other analyses, we combined data of multiple countries and explored local-level variances and thus did not use the sampling weights. More details on survey protocols and questionnaires can be found on the DHS website (https://dhsprogram.com/).
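
The role of the sample weights in the national estimates can be illustrated with a minimal sketch. It assumes that each respondent has a binary HIV test result and a positive sample weight; the function name, the toy data, and the weight values are hypothetical and are not taken from the DHS files.

```python
def weighted_prevalence(hiv_status, weights):
    """Survey-weighted prevalence: sum(w_i * y_i) / sum(w_i)."""
    if len(hiv_status) != len(weights):
        raise ValueError("hiv_status and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * y for y, w in zip(hiv_status, weights)) / total_weight


# Toy data, invented for illustration: 1 = HIV positive, 0 = HIV negative.
status = [1, 0, 0, 0]
weights = [0.5, 1.0, 1.5, 1.0]  # hypothetical sample weights

unweighted = sum(status) / len(status)
weighted = weighted_prevalence(status, weights)
```

In this toy example the single positive respondent carries a low weight, so the weighted estimate is lower than the unweighted proportion. This is the kind of correction that weights apply to unbalanced sampling.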

We extracted data from countries in Eastern and Southern Africa that had a DHS or an AIS conducted and for which behavioural data, geographical coordinates of the sample locations, and HIV biomarker surveys were available. In case of multiple eligible surveys for a country, we selected the most recent one. The countries and surveys chosen for this study were Kenya (years 2008–2009), Malawi (2015–2016), Mozambique (2009), Tanzania (2011–2012), Uganda (2011), Zambia (2013–2014), and Zimbabwe (2015). The overall study area, with the DHS sample locations for each country included in the study, is shown in S1 Fig. We selected all adults (women 15–49 years and men 15–54 years) and sub-selected young adults (women 15–24 years and men 15–29 years). The discrepancy in age cut-off reflects the common age difference in sexual debut and relationships between women and men [11].
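
The sex-specific age windows can be written down as a small selection rule. The sketch below uses the age ranges stated above; the sex labels and function names are hypothetical and do not correspond to the DHS recode variables.

```python
# Age windows (inclusive, in years) as stated in the text. The labels
# "female" and "male" are hypothetical codes chosen for this sketch.
ADULT_AGE = {"female": (15, 49), "male": (15, 54)}
YOUNG_ADULT_AGE = {"female": (15, 24), "male": (15, 29)}


def in_window(sex, age, windows):
    """Return True if age falls inside the inclusive window for this sex."""
    low, high = windows[sex]
    return low <= age <= high


def is_adult(sex, age):
    return in_window(sex, age, ADULT_AGE)


def is_young_adult(sex, age):
    return in_window(sex, age, YOUNG_ADULT_AGE)


# A 27-year-old man counts as a young adult; a 27-year-old woman does not.
example = (is_young_adult("male", 27), is_young_adult("female", 27))
```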

A typical DHS data set contains over 250 variables. For the purpose of our study, we only extracted individual-level candidate variables that were deemed of interest, based on findings from previous studies [4,6,8,12–15]: age, sex, HIV status, educational level, wealth index, primary occupation, whether the person is a de jure household member (usual resident) or de facto household member (slept in the household last night) as a proxy for mobility, number of lifetime sexual partners, number of sexual partners during the past 12 months, having had a sexually transmitted infection (STI) or signs of an STI during the past 12 months, condom use during last intercourse, male circumcision, and being paid for sexual intercourse during the past 12 months (only men). HIV status is determined by testing a blood sample from a finger prick with an enzyme-linked immunosorbent assay (ELISA). The other variables were self-reported by participants via the survey questionnaires.

We used the following environmental variables at sample locations: urban versus rural classification of the cluster, population density, proximity to highways, proximity to cities with more than 250,000 inhabitants, proximity to border crossings and major ports, enhanced vegetation index (EVI), and global human footprint (GHF). EVI reflects the vegetation in an area, in which low values represent areas with no or little green vegetation (e.g., urban areas) and high values represent areas where vegetation is more abundant (e.g., agricultural land, forests, grassland), and can be used as a proxy for degree of urbanization [4]. GHF represents the relative human influence and economic activity, incorporating 9 layers covering infrastructure, land use, population density, and access (coastlines, roads, railroads, navigable rivers). Areas of large human influence and high levels of economic activity are characterised by high GHF values. Population density estimates were originally obtained from WorldPop (https://www.worldpop.org/), and data from 2010—the most recent year—were utilised. EVI and GHF were available from the National Aeronautics and Space Administration (NASA) Earth Observatory Group (http://sedac.ciesin.columbia.edu/). EVI data were available for 2010, and GHF data were available for 2005. Because DHS sample locations are, to some extent, randomly displaced, population density, EVI, and GHF at each un-displaced DHS sample location are also made available for each survey. Locations of major cities were extracted from the World Population Review website (www.worldpopulationreview.com/worldcities/), and highways were derived from GADM national infrastructure shape files (http://www.gadm.org/) based on Google Maps. Locations of border crossings and major ports were obtained through the Southern Africa Integrated Regional Transport Program report (2010) [16]. 
The shortest Euclidean distances from each DHS sample location to the nearest highway, major city, and border crossing or port were calculated. An overview and description of all included variables can be found in S1 Table. Maps of the included environmental variables are provided in S2 Fig to S8 Fig.
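
The distance calculation can be sketched as follows for point features such as cities, border crossings, or ports. The sketch assumes that all coordinates have already been projected to a planar system with a linear unit (for example metres), so that Euclidean distance is meaningful. Line features such as highways would first have to be represented by their vertices or handled by a GIS library. The function name and the toy coordinates are hypothetical.

```python
import numpy as np


def nearest_distance(sample_xy, feature_xy):
    """Euclidean distance from each sample point to its nearest feature point.

    Both inputs are (n, 2) arrays of projected x/y coordinates in the same
    linear unit; latitude/longitude in degrees would not be appropriate.
    """
    sample_xy = np.asarray(sample_xy, dtype=float)
    feature_xy = np.asarray(feature_xy, dtype=float)
    # Pairwise differences via broadcasting: shape (n_samples, n_features, 2).
    diff = sample_xy[:, None, :] - feature_xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Keep, for every sample location, the distance to the closest feature.
    return dist.min(axis=1)


# Toy coordinates, invented for illustration.
samples = np.array([[0.0, 0.0], [10.0, 0.0]])
cities = np.array([[3.0, 4.0], [10.0, 1.0]])
d = nearest_distance(samples, cities)
```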

Statistical analyses

Our study was explorative in nature, and we did not have a formal prespecified analysis plan. First, we determined the spatial distribution in HIV prevalence among adults and young adults by interpolating logit-transformed DHS sample location-level HIV prevalence data using ordinary kriging [17]. Kriging is an interpolation method based on the spatial autocorrelation of variables [18]. Spatial autocorrelation was measured by means of Moran’s I, using the inverse distance between sample locations as weights. Spatial autocorrelation structures were obtained through semivariogram modelling, in which the average squared difference in HIV prevalence between each pair of data points (on the y-axis) is plotted against the corresponding distance between the point-pairs (on the x-axis). The overall relation between HIV prevalence and distance was estimated by fitting an exponential curve through these points. We used this to create continuous surface maps of HIV prevalence, in which the HIV prevalence at each 5 km2 grid cell was estimated using the aforementioned method. The equations and model estimates are provided in S1 Equations. To enhance the power of our study, we decided to not stratify the kriging by sex in the main analysis. However, we also present sex-specific maps of kriged HIV prevalence (S9 Fig). We compared the sex-specific surfaces of HIV prevalence by means of mapping the square root of the squared difference in HIV prevalence (per 5 km2 grid cell), to illustrate the absolute differences in HIV prevalence (S10 Fig panel A), and by plotting the predicted HIV prevalence (per 5 km2 grid cell) of women against the predicted HIV prevalence of men (S10 Fig panel C). Both comparisons show that there are only minor differences in terms of the locations of high HIV prevalence for both sexes.
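
The interpolation step can be illustrated with a minimal ordinary kriging sketch on logit-transformed prevalence with an exponential semivariogram model. It is not the ArcGIS procedure used in this study. The nugget, partial sill, and range values are hypothetical rather than the estimates in S1 Equations, and the clipping constant `eps`, which guards against prevalences of exactly 0 or 1, is an assumption.

```python
import numpy as np


def logit(p, eps=1e-3):
    # eps is an assumed guard: the logit is undefined at exactly 0 or 1.
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return np.log(p / (1.0 - p))


def inv_logit(z):
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))


def exponential_variogram(h, nugget, partial_sill, range_param):
    """Exponential semivariogram model; semivariance is zero at distance 0."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + partial_sill * (1.0 - np.exp(-h / range_param))
    return np.where(h == 0.0, 0.0, gamma)


def ordinary_kriging(xy, values, target, nugget, partial_sill, range_param):
    """Predict the value at `target`; returns (prediction, kriging weights)."""
    xy = np.asarray(xy, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    # Semivariances between all pairs of sample locations.
    pair_dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    system = np.ones((n + 1, n + 1))
    system[:n, :n] = exponential_variogram(pair_dist, nugget, partial_sill, range_param)
    system[n, n] = 0.0  # Lagrange multiplier row/column forces weights to sum to 1
    # Semivariances between each sample location and the target location.
    target_dist = np.linalg.norm(xy - np.asarray(target, dtype=float), axis=1)
    rhs = np.ones(n + 1)
    rhs[:n] = exponential_variogram(target_dist, nugget, partial_sill, range_param)
    solution = np.linalg.solve(system, rhs)
    weights = solution[:n]
    return float(weights @ values), weights


# Toy sample locations (projected coordinates, km) and prevalences, invented.
xy = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
prevalence = np.array([0.02, 0.10, 0.04, 0.15])
z = logit(prevalence)
z_hat, w = ordinary_kriging(xy, z, [5.0, 5.0],
                            nugget=0.0, partial_sill=1.0, range_param=20.0)
prevalence_hat = float(inv_logit(z_hat))  # back-transform to the prevalence scale
```

Because the target in this toy example is equidistant from all four sample locations, each receives the same weight. In general, closer locations receive larger weights, which is why isolated high-prevalence locations tend to be smoothed towards their neighbours.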

Second, we developed a series of multiple multilevel logistic regression models to determine to what extent behavioural, socioeconomic, and environmental determinants of HIV can explain individual- and location-level HIV prevalence among young adults. Missing values (up to 2.6%) were checked to be missing at random and, if so, imputed using multiple imputation. First, bivariate associations were tested, and all variables with a p-value larger than 0.1 were excluded. We then developed multiple models using a stepwise approach. In the first step, we made an ‘empty’ model, with HIV prevalence as the dependent variable and with only age and sex fixed effects and location random effects as predictors. In the second step, we used stepwise forward selection to construct 3 separate multiple regression models for behavioural, socioeconomic, and environmental factors, respectively, starting from this ‘empty’ (nested) model. Likelihood ratio tests were used to determine whether the addition of a variable improved the statistical fit of the regression models significantly (p < 0.05). In the third step, we used the same stepwise forward selection to construct a full model containing behavioural, socioeconomic, and environmental factors. We did not adjust for country-level confounding in the main analysis because we expected the associations between HIV and the predictor variables to be similar across countries. However, we did perform a sensitivity analysis in which we added country fixed effects to the final model. We compared the marginal and conditional R2 of the models at each step. The marginal R2 indicates how much of the HIV heterogeneity is explained by the fixed factors in the model. The conditional R2 represents the amount of heterogeneity explained by both fixed and random factors in the model.
By comparing both R2 values, we assessed how much of HIV heterogeneity is explained by the fixed factors in the models and how much is additionally captured by the location-level random effect in each model [19]. We translated variance of random effects into Median Odds Ratios (MORs) as an indicator of geographical heterogeneity. For each model, the MOR would be equal to 1.0 if there were no differences in probability of being HIV infected per sample location, and it can be interpreted as the increase in (median) HIV risk that is associated with moving from a location with a low random effect to a location with a high random effect. See elsewhere for a more detailed explanation [20]. The MOR equation is provided in S1 Equations. The final model is also fitted as modified Poisson (with robust variance) for easier interpretation of the estimates, here as relative risks (RRs) instead of odds ratios (ORs).
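
The link between the location-level random-effect variance, the MOR, and the two R2 measures can be sketched as below. The sketch assumes the commonly used definitions for a logistic model with a random intercept: MOR = exp(sqrt(2 × variance) × z), with z the 75th percentile of the standard normal distribution, and latent-scale R2 values in which the residual variance is fixed at π²/3. Whether these match the exact expressions in S1 Equations is an assumption, and the function names and example variances are hypothetical.

```python
import math
from statistics import NormalDist

# Variance of the standard logistic distribution, used as the (assumed)
# residual variance on the latent scale of a logistic regression model.
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3


def median_odds_ratio(random_effect_variance):
    """MOR for a random-intercept logistic model; equals 1.0 at zero variance."""
    z75 = NormalDist().inv_cdf(0.75)  # approximately 0.6745
    return math.exp(math.sqrt(2.0 * random_effect_variance) * z75)


def r2_marginal_conditional(fixed_variance, random_effect_variance):
    """Marginal (fixed effects only) and conditional (fixed + random) R2."""
    total = fixed_variance + random_effect_variance + LOGISTIC_RESIDUAL_VARIANCE
    marginal = fixed_variance / total
    conditional = (fixed_variance + random_effect_variance) / total
    return marginal, conditional


# Hypothetical variance components, chosen only to illustrate the arithmetic.
mor_example = median_odds_ratio(0.85)
marginal_r2, conditional_r2 = r2_marginal_conditional(0.5, 1.0)
random_effect_r2 = conditional_r2 - marginal_r2
```

Under these definitions, a location-level variance of 0.85 corresponds to an MOR of about 2.41, and the difference between the conditional and marginal R2 is the share of heterogeneity captured by the location-level random effect.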

Third, we extracted and compared the location-specific random effect estimates from the empty model, the 3 separate models for behavioural, socioeconomic, and environmental factors, as well as the ‘full’ regression model. We kriged random effect estimates from the 5 models to visualise to what extent the variables in the different models explain the geospatial HIV heterogeneity and areas with a high HIV prevalence.

Next to the main analyses, we also performed an internal validation of our regression models and an external validation of our kriging results. The internal validation was done using nonrandom cross-validation [21,22] to check, for each country individually, how much heterogeneity (indicated by the conditional R2) was explained by the final regression model. For the external validation, we searched peer-reviewed literature reporting on age-stratified HIV prevalence estimates in population-based cohorts, such as those part of the ALPHA network [23], situated in one of the countries of our study. We compared our predicted HIV prevalence among young adults, obtained through kriging, to the estimates from these cohorts.
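
One way to set up a nonrandom, country-wise cross-validation is to hold out one country at a time. The sketch below only generates the index splits. That the internal validation used exactly this leave-one-country-out scheme is an assumption, and the function name and toy data are hypothetical.

```python
def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) for each group."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


# Toy example: the country label of each record, invented for illustration.
countries = ["Kenya", "Kenya", "Malawi", "Zambia", "Zambia"]
folds = list(leave_one_group_out(countries))
```

Each fold would then be used to fit the model on the training records and to evaluate the explained heterogeneity on the held-out country.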

All analyses were done using ArcGIS Pro version 2.3 and R version 3.4.3. Reporting of study design and analysis followed RECORD guidelines (S1 Checklist) [24].

Ethical approval

All the utilised DHS and AIS data sets are publicly available, and the Demographic and Health Survey Programme de‐identifies all data before making them available to the public. The geospatial data (WorldPop, NASA) do not contain variables at the level of human subjects. Therefore, this work did not require ethical approval.

Results

Our study included 112,785 adults, of which there were 53,234 young adults (25,536 women, 27,698 men) from 3,665 different sample locations throughout the 7 countries. The number of individuals included in the study as well as location-level HIV prevalence in the study population for adults and young adults are provided in Fig 1. Among adults, the mean country-level HIV prevalence ranges from 5.4% (Tanzania) to 14.4% (Zimbabwe). Among young adults, mean country-level HIV prevalences are generally lower, ranging from 2.2% (Tanzania) to 7.7% (Mozambique), while the median HIV prevalences are (close to) zero. This reflects the fact that most sample locations have very low HIV prevalence, and only a minority of sample locations have high prevalence. HIV prevalence among adults and young adults is strongly spatially clustered (p < 0.001); the observed Moran’s I index values were 0.13 and 0.05, respectively (on a scale from −1, fully scattered, to 1, fully clustered). A detailed overview of the Moran’s I and (logit-transformed) HIV prevalence density plots can be found in S2 Table and S11 Fig and S12 Fig, respectively.
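
For reference, Moran's I with inverse-distance weights can be computed as in the minimal sketch below. It assumes projected coordinates and no duplicate locations, and it omits the significance test. The function name and the toy data are hypothetical.

```python
import numpy as np


def morans_i(xy, values):
    """Moran's I using inverse distance between locations as spatial weights."""
    xy = np.asarray(xy, dtype=float)
    x = np.asarray(values, dtype=float)
    n = len(x)
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    weights = np.zeros_like(dist)
    off_diagonal = ~np.eye(n, dtype=bool)
    # Inverse-distance weights; a location is not its own neighbour.
    # Assumes all locations are distinct, so no off-diagonal distance is zero.
    weights[off_diagonal] = 1.0 / dist[off_diagonal]
    z = x - x.mean()
    return (n / weights.sum()) * (z @ weights @ z) / (z @ z)


# Toy transect of four locations, invented for illustration.
line = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
i_clustered = morans_i(line, [1.0, 1.0, 0.0, 0.0])    # similar values adjacent
i_alternating = morans_i(line, [1.0, 0.0, 1.0, 0.0])  # dissimilar values adjacent
```

In this toy example the clustered pattern yields a higher index than the alternating pattern. With very few locations both values lie close to or below zero, because the expected value under no spatial autocorrelation is −1/(n − 1).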

Fig 1.


Median, mean, and DHS sample location variance in national HIV prevalence estimates among adults (A) and young adults (B) for each country included in this study. The mean, presented in the labels left of the bars, represents the weighted HIV prevalence per country. Data obtained through https://dhsprogram.com/. DHS, demographic and health survey; HIV, human immunodeficiency virus.

Fig 2 shows the geospatial distribution of HIV prevalence in adults (panel A) and young adults (panel B) in 7 countries of Eastern and Southern Africa. All countries showed substantial levels of heterogeneity in HIV prevalence at the subnational level. Overall, HIV prevalence is higher among adults than among young adults, and although the high-prevalence areas are in the same locations, they are larger and more spread out among adults (illustrated by the large red and purple areas). Geospatial heterogeneity in HIV prevalence among young adults was more pronounced than among adults in most of the countries in our analysis, illustrated by clear concentrated micro-epidemics (red and purple areas) located in between areas of very low prevalence (white, yellow, and orange areas). In both Zambia and Zimbabwe, areas with over 15% HIV prevalence in young adults were found. The national HIV prevalence among young adults in Malawi is about 3.6%, yet our analysis identified areas where HIV prevalence reached levels of up to 11%, in particular around the highways and major cities in the south. Similarly, in Kenya HIV prevalence in young adults is about 3.9% nationally, yet prevalence around Lake Victoria reaches levels of over 15%. Maps and scatterplots illustrating the more detailed differences in HIV prevalence (per 5 km2 grid cell) between adults and young adults are provided in S6 Fig.

Fig 2.


Continuous surface maps of HIV prevalence in adults (women 15–49 years and men 15–54 years) (A) and young adults (women 15–24 years and men 15–29 years) (B) for 7 countries in Eastern and Southern Africa. Predicted geographical distribution of HIV prevalence resulted from interpolating data on HIV prevalence in geolocated sample locations derived from the most recent DHS or AIS in each country, using ordinary kriging. Major cities (more than 250,000 inhabitants) are indicated on the maps. To enhance comparison between both panels, we applied the same legend for HIV prevalence levels. The HIV prevalence maps for women and men separately are provided in S9 Fig. Data obtained through https://dhsprogram.com/. AIS, AIDS indicator survey; DHS, demographic and health survey; HIV, human immunodeficiency virus.

According to the resulting best-fitting multiple multilevel logistic regression model on the association between HIV status and sexual behavioural variables in young adults, the following variables were strongly associated with being infected with HIV: 10 or more reported lifetime sex partners, an STI or STI symptoms over the past 12 months, condom use during last intercourse, and not being circumcised (men only). Educational level, wealth index, and occupation were variables associated with HIV in the final socioeconomic model. The highest level of education was most protective (adjusted odds ratio [aOR] 0.52 [0.26–0.78], p < 0.001), whereas being from asset quintile 4 (the second wealthiest quintile) was associated with the highest risk of HIV (aOR 1.46 [1.31–1.62], p < 0.001). Several variables were significantly associated with HIV in the environmental model. Young adults living in rural sample locations were less likely to be infected with HIV than those with urban residence: in cities, the overall HIV prevalence among adults was over 7%, compared to below 4% in rural settings. Also, population density, proximity to nearest major city, EVI, and GHF at location of a DHS cluster showed a significant association with HIV. HIV prevalence levels were highest (about 6%) in areas with the highest population density (more than 500 people per km2) but were also relatively high (about 5%) in areas with the lowest population density (less than 25 people per km2). HIV prevalence levels did not differ considerably between sample locations with different levels of greenness (indicated by the EVI). Living in an area with a relatively high GHF, as a proxy for economic activity, was associated with a high HIV risk (aOR 1.68 [1.41–1.94], p < 0.001): HIV prevalence levels for the highest two levels of GHF were almost 7%, compared to around 4% at the lower levels.
The best-fitting combined ‘full’ model contains behavioural, socioeconomic, and environmental variables: lifetime number of sex partners, STIs, male circumcision, education, type of residence, EVI, and GHF. A complete overview of the bivariate models, nested model (only adjusted for age and sex), best-fitting models, and final model can be found in S3 to S8 Tables and S13 Fig. Finally, the combined ‘full’ model was also fitted as a modified Poisson regression model, resulting in adjusted relative risks that were comparable to the aORs from the logistic regression model (S9 and S10 Tables).

The results in Table 1 show that 7.2% (marginal R2: i.e., percentage of heterogeneity explained by fixed effects in the model) of the 26.3% (conditional R2: i.e., percentage of heterogeneity explained by both fixed and random effects in the model) of the HIV heterogeneity could be explained by age and sex alone (‘nested’ model). The remaining variance captured by the model (19.1%) is attributed to location-level random effects. Environmental fixed effects explain most of the heterogeneity in HIV prevalence among young adults (marginal R2 11.4%, conditional R2 26.8%), higher than behavioural (marginal R2 10.2%, conditional R2 27.8%) or socioeconomic (marginal R2 8.6%, conditional R2 25.8%) fixed effects. According to the R2 of the combined ‘full’ best-fitting model, HIV heterogeneity could be best explained (marginal R2 15.6%, conditional R2 29.6%) by an interplay of sexual behavioural, socioeconomic, and environmental variables. The MOR of the nested model is 2.41, whereas the MOR of the final model is 1.94. This illustrates that, although some of the location-level heterogeneity is captured by the model fixed-effect covariates, almost two-thirds of the heterogeneity in HIV prevalence at sample locations still could not be explained by the covariates in the model. Overall, environmental fixed effects reduce the location-level heterogeneity more than sexual behavioural and socioeconomic fixed effects (MORs of 1.95, 2.35, and 2.30, respectively). Adjusting for country improves the model fit but does not change the importance of the different predictors and does not considerably increase the degree of explained heterogeneity (marginal R2 17.9%, conditional R2 29.3%) (see S10 Table).

Table 1. Overview of the heterogeneity (R2) explained by the best-fitting multilevel multiple logistic regression models and random-effects MOR.

| Model | Conditional R2: total heterogeneity explained by model (%) | Marginal R2: heterogeneity explained by included fixed effects (%) | Random-effect R2: location-level heterogeneity captured by model (%) | MOR |
|---|---|---|---|---|
| Nested ‘empty’ model | 26.3 | 7.2 | 19.1 | 2.41 |
| Sexual behavioural model | 27.8 | 10.2 | 17.6 | 2.35 |
| Socioeconomic model | 25.8 | 8.6 | 17.2 | 2.30 |
| Environmental model | 26.8 | 11.4 | 15.4 | 1.95 |
| Combined ‘full’ model | 29.6 | 15.6 | 14.0 | 1.94 |

Abbreviation: MOR, Median Odds Ratio

The maps in Fig 3 show the interpolated random effects estimates—i.e., the unexplained heterogeneity in HIV prevalence—for the 5 models. The white areas represent locations where relatively most heterogeneity is explained by the model. The red and purple areas represent locations where relatively the least heterogeneity is explained. As expected, random effect estimates in the nested model were highest in high-prevalence areas (panel A) and decline as fixed effects are added to the models (panels B–E). Interpolated random effects estimates from the combined model (panel B) are substantially reduced. However, the geospatial heterogeneity in many areas with a high prevalence remains unexplained, for example, around Lake Victoria (1), at the major ports of Mozambique (2–4), at Plumtree (5), around Mongu (6) and the Copperbelt (7) and Nchelenge (8) districts in Zambia. In most of these locations, environmental variables were better at explaining heterogeneity (panel E) than sexual behavioural or socioeconomic variables (panels C and D, respectively).

Fig 3.


Maps present the interpolated random effect estimates of the nested ‘empty’ logistic regression model (A), best-fitting logistic regression model (B), and the separate models including only sexual behavioural (C), socioeconomic (D), and environmental variables (E) among young adults (women 15–24 years and men 15–29 years) for 7 countries in Eastern and Southern Africa. For the nested ‘empty’ model (A), random effect estimates reflect HIV prevalence levels among young adults (see Fig 2B). For the other models (B–E), random effect estimates are lower at (some of the) areas with high HIV prevalence levels, indicating that the additional variables in each model to some extent explain HIV heterogeneity at these locations. Circles point out high-prevalence areas of HIV among young adults (1) around Lake Victoria, (2) around Nacala (and Pemba) port, (3) around Beira (and Quelimane) port, (4) around Maputo city and port, (5) around Plumtree border crossing, (6) around Mongu city, (7) in the Copperbelt mining area, and (8) in Nchelenge district. HIV, human immunodeficiency virus.

Discussion

Our findings showed that substantial levels of spatial heterogeneity in HIV prevalence exist among adults and young adults throughout all 7 Eastern and Southern African countries analysed in this study. Especially in young adults, micro-epidemics of relatively high prevalence alternated with areas of very low prevalence, clearly illustrating the existence of areas with high levels of transmission. Overall, 15.6% (marginal R2) of the heterogeneity in HIV prevalence could be explained by an interplay of behavioural, socioeconomic, and environmental factors, including number of sex partners, STIs, GHF (a proxy for economic activity), and urbanization. Maps of interpolated random effect estimates at each sample location showed that environmental predictors were better at specifically predicting HIV prevalence at the high-prevalence areas than sexual behavioural or socioeconomic variables, yet substantial heterogeneity at other high-prevalence areas remains unexplained.

The geospatial patterns of HIV prevalence heterogeneity among adults shown in our study (Fig 2A) are very comparable to patterns shown in other recent studies in which other methods for spatial interpolation were used [4,5], confirming that our approach was suitable for creating reliable estimates. As an external validation for our geospatial patterns of HIV prevalence in young adults, we compared our estimates to the estimates from multiple small-scale surveillance sites from the ALPHA network [23]. We found that most estimates are comparable, but our estimates were lower for the Rakai (Uganda) and Manicaland (Zimbabwe) areas (S14 Fig). Both areas are characterised by well-known high levels of HIV transmission and relatively low numbers of sample locations in the utilised data with varying prevalence levels, leading to underestimation of the HIV prevalence. This indicates that our approach of kriging, in which prevalence at a specific location is estimated by the prevalence in surrounding clusters as a function of the distance between the location and surrounding clusters, may have resulted in an underestimation (i.e. smoothing) of HIV prevalence estimates, especially in high-prevalence areas.

Our finding that environmental factors were the most important determinants of geospatial heterogeneity is strengthened by the observation that all high HIV prevalence areas among young adults are in locations with known high levels of economic activity, characterised by high production and flow of goods and services [25,26]. This suggests that high-risk dynamics (involving seasonal work and commercial sex) in these areas might be important in generating this heterogeneity [27,28]. For instance, the fishing communities around Lake Victoria in Uganda (circle 1 in Fig 3) have frequently been reported as sites with high levels of transactional sex and HIV [29,30]. Furthermore, many high-prevalence areas in our analyses cluster around border crossings, major highways or major ports (e.g. circles 2 to 5 in Fig 3). Long distance truck driving and associated commercial sex have been documented as important contributors to HIV transmission [27,29]. In addition, some of the high-prevalence areas are in regions known to have high levels of migration, either work-related (seasonal) migration or other types of migration. Examples are the Copperbelt mining area (circle 7 in Fig 3) and Nchelenge district in Zambia (circle 8 in Fig 3), which is known for its active fishing industry and a big refugee settlement. Mining and fishing areas have long been recognised as high-risk settings for HIV transmission, as domestic or foreign male workers often work long stints, separated from their families and surrounded by an active sex industry [29–33].

Individual HIV prevalence in young adults was strongly associated with the reported number of lifetime sex partners and reported prevalence of STIs or STI symptoms, providing further evidence that, across all countries, geospatial clustering of HIV is linked to the clustering of risk behaviour [12]. We found that reported condom use at last sex act was associated with an increased risk of being infected with HIV. This is a well-known counterintuitive finding. It may reflect the fact that condom use tends to be higher (yet sometimes insufficient) among people with riskier sexual behaviour, or it may reflect bias because people who are aware of their positive HIV status are more likely to use a condom to protect their partner [34]. This finding highlights the need to stimulate condom use but, more importantly, to increase access to effective HIV prevention interventions more broadly. Our finding that having education beyond the primary level seems to be strongly protective against HIV is consistent with observations in the literature [13], and suggests that structural interventions to improve educational attainment [14] could help reduce HIV transmission in adolescent women.

Our results have important implications for the planning of prevention and treatment programs. High HIV prevalence areas in young adults are likely areas of high transmission [10], requiring prioritization of tailored prevention interventions. Our key findings that (i) high HIV prevalence areas among young adults are located in economically active or developing areas and (ii) HIV among young adults is driven by risky sexual behaviour indicate that preventive interventions targeted at young adults at these specific locations could strongly impact HIV transmission. Prevention programs aimed at improving exposure to and uptake of effective prevention interventions for young adults [35,36], such as pre-exposure prophylaxis (PrEP) [37], condoms [38], or voluntary medical male circumcision programmes [15,39], should be prioritized in these areas. Also, intervention programs for young adults should aim at early diagnosis and treatment initiation of those who get infected with HIV, by creating accessible, affordable, and youth-friendly HIV testing and counselling services [35,40]. Furthermore, governments typically have prior knowledge about future economic developments. Improving the resilience of affected populations against the associated health risks might be considered an integral part of such development. Moreover, as the population in these areas is likely to be highly mobile [41,42], effective prevention for young adults in high-prevalence areas may affect not only the local HIV epidemics but also the wider epidemic. Ultimately, our results demonstrate that the decade-old mantra of “know your epidemic, know your response” [43] is still highly relevant for SSA. Our study found important common denominators that are associated with increased HIV risk in areas with high levels of transmission, but it will be essential for policy makers to specifically evaluate the behavioural and socioeconomic context, and the interventions already in place, at a specific high-risk setting to tailor interventions appropriately.

To our knowledge, we are the first to utilise geolocated HIV prevalence data from young adults specifically to explore geospatial heterogeneity in the HIV epidemics of Eastern and Southern Africa and to identify potential areas with high levels of transmission. Previously, Cuadros et al. proposed a co-kriging approach to produce subnational HIV prevalence estimates, incorporating HIV prevalence and environmental determinants [4], yet they did not stratify by age. In addition, Palk and Blower recently showed that places of high HIV prevalence in adults (aged 15–49 years) in Malawi are associated with higher reported rates of high-risk sex, defined as the number of lifetime partners [12]. Our results show that heterogeneity in HIV is associated with a range of sexual behavioural, socioeconomic, and environmental variables, and that these are highly affected by age. Therefore, a standardised approach to map heterogeneity of an HIV epidemic should take age stratifications into account, and future studies on geospatial heterogeneity and its drivers should include behavioural, socioeconomic, and environmental determinants.

Our results have limitations. First, DHSs produce cross-sectional data intended to provide a measure of HIV prevalence and behaviour among the general population; thus, high-risk subpopulations such as female sex workers, men who have sex with men, and mobile populations such as truck drivers and seasonal workers are thought to be underrepresented in these surveys. Therefore, we cannot draw definitive conclusions on economic activity and key populations driving HIV transmission at these locations. Nevertheless, our analyses showed that high HIV prevalence areas in DHS data, especially among young adults, were almost invariably located near economically active areas, suggesting that these key-population dynamics are still visible through general population-based surveys. Extending DHSs with other data sources that allow for mapping of key populations, or performing incidence assays on HIV samples, will allow for more accurate identification of HIV transmission areas and can further enhance our understanding of the epidemic. Second, DHS sampling locations are randomly selected, based on the underlying population density within a country. Consequently, very few locations were sampled from areas with large nature and wildlife conservation reserves, such as in Northern Kenya, Central Tanzania, and Northern and South-western Mozambique, and our interpolated prevalence estimates for such areas should be interpreted with caution. Designing alternative sampling techniques that over-sample areas with low densities could increase reliability of interpolated survey results. Third, we used survey data from a relatively wide range of years: 2008–2009 to 2015–2016. This period coincides with major initiatives to curb the pandemic, in particular the scale-up of antiretroviral treatment, as well as voluntary male circumcision campaigns and other HIV prevention interventions. Although these initiatives were potentially disproportionately targeted at high transmission areas, we expect that the disproportionate impact would not be so extreme that it would completely alter the locations of high transmission areas. Furthermore, although the scale-up of these interventions may have reduced HIV incidence, the associations between HIV and the hypothesised main drivers of HIV transmission (the sexual behavioural, socioeconomic, and environmental factors explored in this study) likely did not change substantially as a result of these interventions. Fourth, explorative statistical analyses always run the risk of identifying patterns in random noise [44]. However, we believe this risk to be extremely low in our study, because of the very large sample size (n = 53,234), the rigid preselection of variables based on substantive knowledge, and the fact that the directions and magnitudes of association between the included covariates and HIV found in our study are in line with findings from previous studies [25]. In addition, we performed nonrandom cross-validation by testing the final fitted model for each country separately. Reassuringly, we found that, despite the differences in underlying epidemic and scale-up of interventions across countries, the conditional R² of the model was strikingly similar for five of the seven countries in our analysis, ranging between 23.5% and 36.0% (compared to 29.6% in the main analysis). Only the conditional R² for Kenya (45.0%) and Zimbabwe (17.6%) deviated somewhat more from the combined model, yet not to an alarming extent (S11 Table).
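For readers less familiar with the measure, the conditional R² mentioned above can be illustrated with a short calculation. This is a minimal sketch, assuming the measure follows the definition of Nakagawa and Schielzeth [19] for a logistic multilevel model with a random intercept, in which the residual variance on the latent (logit) scale is fixed at π²/3. The function name and the variance values are hypothetical and are not taken from the fitted models of this study.

```python
import math

# Variance of the standard logistic distribution, used as the
# level-1 residual variance for a logit-link model.
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3

def r2_logistic_multilevel(var_fixed, var_random):
    """Return (marginal R2, conditional R2) on the latent scale.

    var_fixed  -- variance of the fixed-effects linear predictor
    var_random -- variance of the random intercept (e.g. sample location)
    """
    total = var_fixed + var_random + LOGISTIC_RESIDUAL_VARIANCE
    marginal = var_fixed / total                    # fixed effects only
    conditional = (var_fixed + var_random) / total  # fixed plus random effects
    return marginal, conditional

# Hypothetical variance components, for illustration only
marginal, conditional = r2_logistic_multilevel(var_fixed=0.6, var_random=0.8)
print(f"marginal R2 = {marginal:.3f}, conditional R2 = {conditional:.3f}")
```

Under this definition, the conditional R² counts both the covariates and the between-location random effect as explained variance, so it is always at least as large as the marginal R², which counts the covariates alone.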

In conclusion, our findings show that consistent clustering of HIV prevalence exists among young adults in seven high-burden countries in Eastern and Southern Africa, with clearly identifiable high-prevalence areas. This heterogeneity is driven by an interplay of behavioural, socioeconomic, and environmental factors, and the locations of high-prevalence areas suggest that key population dynamics, especially those related to seasonal and economic migration and associated sex work, play a major role. In further reducing HIV transmission in Eastern and Southern Africa, areas of high HIV prevalence in young adults could be priority areas for tailored HIV prevention interventions, in line with SDG3 and UNAIDS targets to end the HIV pandemic by 2030.

Supporting information

S1 Checklist. The RECORD statement—Checklist of items, extended from the STROBE statement, that should be reported in observational studies using routinely collected health data.

(DOCX)

S1 Equations. Mathematical equations of our kriging model and MOR calculations.

(DOCX)

S1 Fig. Overview of the study area in SSA (top right panel) and the DHS and AIS sample locations (blue dots) for the 7 countries included in this study: Kenya (n = 394), Malawi (n = 847)¹, Mozambique (n = 270), Tanzania (n = 570)¹, Uganda (n = 470)¹, Zambia (n = 719), and Zimbabwe (n = 400).

¹ AIS.

(PDF)

S2 Fig. Map presents whether the sample location is classified as urban or rural.

(PDF)

S3 Fig. Map presents the population density for the study area.

(PDF)

S4 Fig. Map presents the proximity from each sample location to the nearest highway.

(PDF)

S5 Fig. Map presents the proximity from each sample location to the nearest major city (more than 250,000 inhabitants).

(PDF)

S6 Fig. Map presents the proximity from each sample location to the nearest border crossing or major port.

(PDF)

S7 Fig. Map presents the EVI for the study area.

(PDF)

S8 Fig. Map presents the GHF for the study area.

(PDF)

S9 Fig. Maps present the predicted HIV prevalence in women (15–49 years) (A) and men (15–54 years) (B) for 7 countries in Eastern and Southern Africa. The maps of HIV prevalence among adults and young adults are shown in Fig 2. Continuous surface maps were created by kriging HIV prevalence data obtained from (https://dhsprogram.com/).

(PDF)

S10 Fig. Maps and scatterplots illustrating the difference in HIV prevalence (per 5 km² grid cell) between women and men (A and C, respectively) and between adults and young adults (B and D, respectively), for 7 countries of Eastern and Southern Africa.

(PDF)

S11 Fig. Density plots illustrating the overall sample location-level distributions of HIV prevalence among adults (A), young adults (B), women (C), and men (D) for each country included in this study.

(PDF)

S12 Fig. Density plots illustrating the overall logit-transformed sample location-level distributions of HIV prevalence among adults (A), young adults (B), women (C), and men (D) for each country included in this study, as used for semivariogram modelling and ordinary kriging. The logit-transformed HIV prevalence of −6 (on the x-axis) represents a prevalence of approximately 0%, −5 of 1%, −4 of 2%, −3 of 5%, −2 of 12%, −1 of 27%, 0 of 50%, and 1 of 73%.

(PDF)

S13 Fig. Plots illustrating the observed versus the predicted sample location-level HIV prevalence (A) and the observed versus the predicted number of HIV cases (B) among young adults (women 15–24 years and men 15–29 years) for the combined ‘full’ best-fitting multiple multilevel regression model per DHS sample location for 7 countries of Eastern and Southern Africa (also see S9 Table).

(PDF)

S14 Fig. Map of HIV prevalence estimates for young adults, as interpolated in this study, and HIV prevalence estimates for young adults as reported from 7 population-based cohorts at small-scale geographical sites (ALPHA network) within the area covered by this study.

(PDF)

S1 Table. Overview of all variables included in the study.

(DOCX)

S2 Table. Spatial autocorrelation of HIV prevalence at the sample location level, estimated by Moran’s I index.

(DOCX)

S3 Table. Bivariate logistic regression models of HIV status and behavioural, socioeconomic, and environmental variables in young adults (women 15–24 years and men 15–29 years of age) in 7 countries of Eastern and Southern Africa, adjusted for age and sex.

Data obtained through (https://dhsprogram.com/).

(DOCX)

S4 Table. Multiple multilevel nested ‘empty’ logistic regression model of HIV status in young adults (women 15–24 years and men 15–29 years of age) for 7 countries of Eastern and Southern Africa, adjusted for age and sex.

Data obtained through (https://dhsprogram.com/).

(DOCX)

S5 Table. Multiple multilevel logistic regression model of HIV status and behavioural variables in young adults (women 15–24 years and men 15–29 years of age) for 7 countries of Eastern and Southern Africa, adjusted for age and sex.

Data obtained through (https://dhsprogram.com/).

(DOCX)

S6 Table. Multiple multilevel logistic regression model of HIV status and socioeconomic variables in young adults (women 15–24 years and men 15–29 years of age) for 7 countries of Eastern and Southern Africa, adjusted for age and sex.

Data obtained through (https://dhsprogram.com/).

(DOCX)

S7 Table. Multiple multilevel logistic regression model of HIV status and environmental variables in young adults (women 15–24 years and men 15–29 years of age) for 7 countries of Eastern and Southern Africa, adjusted for age and sex.

Data obtained through (https://dhsprogram.com/).

(DOCX)

S8 Table. Combined ‘full’ multiple multilevel logistic regression model of HIV status and behavioural, socioeconomic, and environmental variables in young adults (women 15–24 years and men 15–29 years of age) for 7 countries of Eastern and Southern Africa, adjusted for age and sex.

Data obtained through (https://dhsprogram.com/).

(DOCX)

S9 Table. Combined ‘full’ multiple multilevel model as modified Poisson regression (with robust variance) of HIV status and behavioural, socioeconomic, and environmental variables in young adults (women 15–24 years and men 15–29 years of age) for 7 countries of Eastern and Southern Africa, adjusted for age and sex.

Data obtained through (https://dhsprogram.com/).

(DOCX)

S10 Table. Combined ‘full’ multiple multilevel logistic regression model of HIV status and behavioural, socioeconomic, and environmental variables in young adults (women 15–24 years and men 15–29 years of age) for 7 countries of Eastern and Southern Africa, adjusted for age, sex, and country.

Data obtained through (https://dhsprogram.com/).

(DOCX)

S11 Table. Overview of the heterogeneity (R²) explained by the full final logistic regression model (see S9 Table), for each country separately.

(DOCX)

Acknowledgments

The authors thankfully acknowledge Daan Nieboer, statistician at the Erasmus Medical Center, for the helpful discussions regarding statistical methods.

Abbreviations

  • AIS: AIDS indicator survey
  • aOR: adjusted odds ratio
  • DHS: demographic and health survey
  • ELISA: enzyme-linked immunosorbent assay
  • EVI: enhanced vegetation index
  • GHF: global human footprint
  • HIV: human immunodeficiency virus
  • MOR: median odds ratio
  • NASA: National Aeronautics and Space Administration
  • OR: odds ratio
  • RR: relative risk
  • SDG: Sustainable Development Goal
  • SSA: sub-Saharan Africa
  • STI: sexually transmitted infection
  • UNAIDS: Joint United Nations Programme on HIV/AIDS
  • USAID: United States Agency for International Development

Data Availability

All utilised data are open-source, and the hyperlinks to the different data sources are provided in the Methods section of the manuscript.

Funding Statement

This study was funded by the Dutch AIDS Foundation (P-29702). Furthermore, JH was supported by the NWO Talent Scheme. Till Bärnighausen was supported by the Alexander von Humboldt Foundation through the Alexander von Humboldt Professor award, funded by the Federal Ministry of Education and Research; the Wellcome Trust; and from NICHD of NIH (R01-HD084233), NIA of NIH (P01-AG041710), NIAID of NIH (R01-AI124389 and R01-AI112339) as well as FIC of NIH (D43-TW009775). The funders of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report. The views expressed in this article are our own and not necessarily those of the funders.

References

  • 1. United Nations. Transforming our World: The 2030 Agenda for Sustainable Development. United Nations Division for Social Policy and Development, Indigenous Peoples; 2017. Available from: https://sustainabledevelopment.un.org/post2015/. [cited 1 June 2019].
  • 2. UNAIDS. Fast-track: ending the AIDS epidemic by 2030. 2014.
  • 3. UNAIDS. UNAIDS report 2017, Joint United Nations Programme on HIV/AIDS (UNAIDS). 2017.
  • 4. Cuadros DF, Li J, Branscum AJ, Akullian A, Jia P, Mziray EN, et al. Mapping the spatial variability of HIV infection in Sub-Saharan Africa: Effective information for localized HIV prevention and control. Sci Rep. 2017;7: 1–11. doi: 10.1038/s41598-017-09464-y
  • 5. Dwyer-Lindgren L, Cork MA, Sligar A, Steuben KM, Wilson KF, Provost NR, et al. Mapping HIV prevalence in sub-Saharan Africa between 2000 and 2017. Nature. 2019. doi: 10.1038/s41586-019-1200-9
  • 6. Tanser F, LeSueur D, Solarsh G, Wilkinson D. HIV heterogeneity and proximity of homestead to roads in rural South Africa: An exploration using a geographical information system. Trop Med Int Heal. 2000;5: 40–46. doi: 10.1046/j.1365-3156.2000.00513.x
  • 7. Zulu LC, Kalipeni E, Johannes E. Analyzing spatial clustering and the spatiotemporal nature and trends of HIV/AIDS prevalence using GIS: The case of Malawi, 1994–2010. BMC Infect Dis. 2014;14: 1–21. doi: 10.1186/1471-2334-14-1
  • 8. Barankanira E, Molinari N, Niyongabo T, Laurent C. Spatial analysis of HIV infection and associated individual characteristics in Burundi: Indications for effective prevention. BMC Public Health. 2016;16: 1–11. doi: 10.1186/s12889-015-2639-8
  • 9. Anderson SJ, Cherutich P, Kilonzo N, Cremin I, Fecht D, Kimanga D, et al. Maximising the effect of combination HIV prevention through prioritisation of the people and places in greatest need: A modelling study. Lancet. 2014;384: 249–256. doi: 10.1016/S0140-6736(14)61053-9
  • 10. Mahy M, Garcia-Calleja JM, Marsh KA. Trends in HIV prevalence among young people in generalised epidemics: implications for monitoring the HIV epidemic. Sex Transm Infect. 2012;88: i65–i75. doi: 10.1136/sextrans-2012-050789
  • 11. Luke N, Kurz K. Cross-generational and transactional sexual relations in sub-Saharan Africa. Washington, DC, International Center for Research on Women. 2002;1, 1–42.
  • 12. Palk L, Blower S. Geographic variation in sexual behavior can explain geospatial heterogeneity in the severity of the HIV epidemic in Malawi. BMC Med. 2018;16: 22. doi: 10.1186/s12916-018-1006-x
  • 13. De Neve J, Fink G, Subramanian SV, Moyo S, Bor J. Length of secondary schooling and risk of HIV infection in Botswana: evidence from a natural experiment. Lancet Glob Health. 2015;3: 470–477. doi: 10.1016/S2214-109X(15)00087-X
  • 14. Baird SJ, Garfein RS, McIntosh CT, Özler B. Effect of a cash transfer programme for schooling on prevalence of HIV and herpes simplex type 2 in Malawi: A cluster randomised trial. Lancet. 2012;379: 1320–1329. doi: 10.1016/S0140-6736(11)61709-1
  • 15. Auvert B, Taljaard D, Lagarde E, Sobngwi-Tambekou J, Sitta R, Puren A. Randomized, controlled intervention trial of male circumcision for reduction of HIV infection risk: The ANRS 1265 trial. PLoS Med. 2005;2: 1112–1122. doi: 10.1371/journal.pmed.0020298
  • 16. JICA. Preparatory Survey for Southern Africa Integrated Regional Transport Program. 2010. Available from: http://open_jicareport.jica.go.jp/pdf/11991007_01.pdf. [cited 1 June 2019].
  • 17. Diggle PJ, Tawn JA, Moyeed RA. Model-Based Geostatistics. J R Stat Soc. 1998;67: 617–666.
  • 18. Miller HJ. Tobler’s First Law and Spatial Analysis. Ann Assoc Am Geogr. 2004;94: 284–289. doi: 10.1111/j.1467-8306.2004.09402005.x
  • 19. Nakagawa S, Schielzeth H. A general and simple method for obtaining R² from generalized linear mixed-effects models. Methods Ecol Evol. 2013;4: 133–142. doi: 10.1111/j.2041-210x.2012.00261.x
  • 20. Merlo J, Chaix B, Ohlsson H, Beckman A, Johnell K, Hjerpe P, et al. A brief conceptual tutorial of multilevel analysis in social epidemiology: Using measures of clustering in multilevel logistic regression to investigate contextual phenomena. J Epidemiol Community Health. 2006;60: 290–297. doi: 10.1136/jech.2004.029454
  • 21. Steyerberg EW, Harrell FE. Prediction models need appropriate internal, internal-external, and external validation. J Clin Epidemiol. 2016;69: 245–247. doi: 10.1016/j.jclinepi.2015.04.005
  • 22. Steyerberg EW. Validation in prediction research: the waste by data splitting. J Clin Epidemiol. 2018;103: 131–133. doi: 10.1016/j.jclinepi.2018.07.010
  • 23. Reniers G, Wamukoya M, Urassa M, Nyaguara A, Nakiyingi-Miiro J, Lutalo T, et al. Data resource profile: network for analysing longitudinal population-based HIV/AIDS data on Africa (ALPHA Network). Int J Epidemiol. 2016;45: 83–93. doi: 10.1093/ije/dyv343
  • 24. Von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP, et al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement: guidelines for reporting observational studies. Int J Surg. 2014;12: 1495–1499. doi: 10.1016/j.ijsu.2014.07.013
  • 25. Parkhurst JO. Structural approaches for prevention of sexually transmitted HIV in general populations: definitions and an operational approach. J Int AIDS Soc. 2014;17: 19052. doi: 10.7448/IAS.17.1.19052
  • 26. University of Toronto. The Nature of Economic Activity. Available from: https://www.economics.utoronto.ca/jfloyd/modules/neastkf.html. [cited 25 Nov 2019].
  • 27. Cassels S, Camlin CS. Geographical mobility and heterogeneity of the HIV epidemic. Lancet HIV. 2016;3: e339–e341. doi: 10.1016/S2352-3018(16)30048-0
  • 28. Steen R, Hontelez JAC, Mugurungi O, Mpofu A, Matthijsse SM, de Vlas SJ, et al. Economy, migrant labour and sex work: interplay of HIV epidemic drivers in Zimbabwe over three decades. AIDS. 2019;33: 123–131. doi: 10.1097/QAD.0000000000002066
  • 29. Chang LW, Grabowski MK, et al. Heterogeneity of the HIV epidemic: an observational epidemiologic study of agrarian, trading, and fishing communities in Rakai, Uganda. Lancet HIV. 2016;3: 1–20. doi: 10.1016/S2352-3018(16)30034-0
  • 30. Tumwesigye NM, Atuyambe L, Wanyenze RK, Kibira SP, Li Q, Wabwire-Mangen F, et al. Alcohol consumption and risky sexual behaviour in the fishing communities: Evidence from two fish landing sites on Lake Victoria in Uganda. BMC Public Health. 2012;12: 1–11. doi: 10.1186/1471-2458-12-1
  • 31. Faria NR, Rambaut A, Suchard MA, Baele G, Bedford T, Ward MJ, et al. The early spread and epidemic ignition of HIV-1 in human populations. Science. 2014;346: 56–61. doi: 10.1126/science.1256739
  • 32. Corno L, de Walque D. Mines, Migration and HIV/AIDS in Southern Africa. J Afr Econ. 2012;21: 465–498. doi: 10.1093/jae/ejs005
  • 33. Clift S, Anemona A, Watson-Jones D, Kanga Z, Ndeki L, Changalucha J, et al. Variations of HIV and STI prevalences within communities neighbouring new goldmines in Tanzania: importance for intervention design. Sex Transm Infect. 2003;79: 307–312.
  • 34. Bachanas P, Medley A, Pals S, Antelman G, Benech I, Deluca N, et al. Disclosure, Knowledge of Partner Status, and Condom Use Among HIV-Positive Patients Attending Clinical Care in Tanzania, Kenya, and Namibia. AIDS Patient Care STDS. 2013;27: 425–435. doi: 10.1089/apc.2012.0388
  • 35. Mavedzenge SMN, Doyle AM, Ross DA. HIV prevention in young people in sub-Saharan Africa: a systematic review. J Adolesc Heal. 2011;49: 568–586.
  • 36. Pettifor A, Bekker L-G, Hosek S, DiClemente R, Rosenberg M, Bull S, et al. Preventing HIV among young people: research priorities for the future. J Acquir Immune Defic Syndr. 2013;63: S155. doi: 10.1097/QAI.0b013e31829871fb
  • 37. Baeten JM, Donnell D, Ndase P, Mugo NR, Campbell JD, Wangisi J, et al. Antiretroviral Prophylaxis for HIV Prevention in Heterosexual Men and Women. N Engl J Med. 2012;367: 399–410. doi: 10.1056/NEJMoa1108524
  • 38. Creese A, Floyd K, Alban A, Guinness L. Cost-effectiveness of HIV/AIDS interventions in Africa: a systematic review of the evidence. Lancet. 2002;359: 1635–1642. doi: 10.1016/S0140-6736(02)08595-1
  • 39. Samuelson J, Dickson K. Progress in scale-up of male circumcision for HIV prevention in Eastern and Southern Africa: Focus on service delivery. World Health Organization and UNAIDS; 2011.
  • 40. Wong VJ, Murray KR, Phelps BR, Vermund SH, McCarraher DR. Adolescents, young people, and the 90-90-90 goals: a call to improve HIV testing and linkage to treatment. AIDS. 2017;31: S191. doi: 10.1097/QAD.0000000000001539
  • 41. Deane KD, Samwell Ngalya P, Boniface L, Bulugu G, Urassa M. Exploring the relationship between population mobility and HIV risk: Evidence from Tanzania. Glob Public Health. 2016; 1–16. doi: 10.1080/17441692.2016.1178318
  • 42. McGrath N, Hosegood V, Newell ML, Eaton JW. Migration, sexual behaviour, and HIV risk: A general population cohort in rural South Africa. Lancet HIV. 2015;2: e252–e259. doi: 10.1016/S2352-3018(15)00045-4
  • 43. Wilson D, Halperin DT. “Know your epidemic, know your response”: a useful approach, if we get it right. Lancet. 2008;372: 423–426. doi: 10.1016/S0140-6736(08)60883-1
  • 44. Gasser T, Sroka L, Jennen-Steinmetz C. Residual variance and residual pattern in nonlinear regression. Biometrika. 1986;73: 625–633.

Decision Letter 0

Richard Turner

13 Nov 2019

Dear Dr. Hontelez,

Thank you very much for submitting your manuscript "Mapping and characterising HIV transmission hotspots in sub-Saharan Africa: a geospatial analysis of national survey data" (PMEDICINE-D-19-03490) for consideration at PLOS Medicine.

Your paper was discussed with an academic editor with relevant expertise and sent to independent reviewers, including a statistical reviewer. The reviews are appended at the bottom of this email and any accompanying reviewer attachments can be seen via the link below:

[LINK]

In light of these reviews, we will not be able to accept the manuscript for publication in the journal in its current form, but we would like to invite you to submit a revised version that fully addresses the reviewers' and editors' comments. You will appreciate that we cannot make a decision about publication until we have seen the revised manuscript and your response, and we expect to seek re-review by one or more of the reviewers.

In revising the manuscript for further consideration, your revisions should address the specific points made by each reviewer and the editors. Please also check the guidelines for revised papers at http://journals.plos.org/plosmedicine/s/revising-your-manuscript for any that apply to your paper. In your rebuttal letter you should indicate your response to the reviewers' and editors' comments, the changes you have made in the manuscript, and include either an excerpt of the revised text or the location (eg: page and line number) where each change can be found. Please submit a clean version of the paper as the main article file; a version with changes marked should be uploaded as a marked up manuscript.

In addition, we request that you upload any figures associated with your paper as individual TIF or EPS files with 300dpi resolution at resubmission; please read our figure guidelines for more information on our requirements: http://journals.plos.org/plosmedicine/s/figures. While revising your submission, please upload your figure files to the PACE digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at PLOSMedicine@plos.org.

We hope to receive your revised manuscript by Dec 04 2019 11:59PM. Please email us (plosmedicine@plos.org) if you have any questions or concerns.

***Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.***

We ask every co-author listed on the manuscript to fill in a contributing author statement, making sure to declare all competing interests. If any of the co-authors have not filled in the statement, we will remind them to do so when the paper is revised. If all statements are not completed in a timely fashion this could hold up the re-review process. If new competing interests are declared later in the revision process, this may also hold up the submission. Should there be a problem getting one of your co-authors to fill in a statement we will be in contact. YOU MUST NOT ADD OR REMOVE AUTHORS UNLESS YOU HAVE ALERTED THE EDITOR HANDLING THE MANUSCRIPT TO THE CHANGE AND THEY SPECIFICALLY HAVE AGREED TO IT. You can see our competing interests policy here: http://journals.plos.org/plosmedicine/s/competing-interests.

Please use the following link to submit the revised manuscript:

https://www.editorialmanager.com/pmedicine/

Your article can be found in the "Submissions Needing Revision" folder.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see http://journals.plos.org/plosmedicine/s/submission-guidelines#loc-methods.

Please ensure that the paper adheres to the PLOS Data Availability Policy (see http://journals.plos.org/plosmedicine/s/data-availability), which requires that all data underlying the study's findings be provided in a repository or as Supporting Information. For data residing with a third party, authors are required to provide instructions with contact information for obtaining the data. PLOS journals do not allow statements supported by "data not shown" or "unpublished results." For such statements, authors must provide supporting data or cite public sources that include it.

Please let me know if you have any questions. Otherwise, we look forward to receiving your revised manuscript in due course.

Sincerely,

Richard Turner, PhD

Senior Editor, PLOS Medicine

rturner@plos.org

-----------------------------------------------------------

Requests from the editors:

Our academic editor commented that the term "hotspot" is not favoured by people with HIV, and we ask you to amend this.

Also, it was suggested that "the framing could be that by addressing areas of high incidence we are addressing inequities rather than focusing on areas perceived to be driving the epidemic".

In your abstract, please specify the range of years that the source datasets refer to. Also, please add summary demographic details for study participants.

Please add a new final sentence to the "methods and findings" subsection of your abstract to summarize the study's main limitations.

At line 40, please start the "Conclusions" subsection with "In this study, we found that ..." or similar.

After the abstract, we ask you to add a new and accessible "author summary" section in non-identical prose. You may find it helpful to consult one or two recent research papers published in PLOS Medicine to get a sense of the preferred style.

In your methods section, please briefly mention the situation with ethics approval (e.g., not required for this analysis).

Early in the methods section of your main text, please state whether the study had a protocol or prespecified analysis plan, and if so attach the relevant document(s) as a supplementary file (referred to in the methods). Please highlight analyses that were not prespecified.

Please add a completed checklist for the most appropriate reporting guideline, which may be RECORD, as a supplementary file (again referred to in your methods section). In the checklist, individual items should be referred to by section (e.g., "Methods") and paragraph number rather than by line or page numbers, as the latter generally change in the event of publication.

When discussing your conclusions, e.g. in the first paragraph of your Discussion section, please ensure that findings are referred to consistently in the past tense (e.g., "alternated" at line 268) with conclusions in the present tense (e.g., "Our findings show ..." at line 266).

Please avoid claims of primacy (e.g., at line 317), and where used add "to our knowledge" or similar.

Please remove "(80-)" from reference 17.

Comments from the reviewers:

*** Reviewer #1:

I confine my remarks to statistical aspects of this paper. I have a couple of fairly major issues to resolve before I can recommend publication, but I think there's a good paper here.

My first question is what was done to avoid finding patterns in pure noise? How can we be sure that these patterns are not just things found by the power of the method?

The second big thing is the model formation. The authors did bivariate screening followed by stepwise methods. Both of these are problematic. All of the output from such methods is wrong: parameter estimates are biased away from 0, p values are too small, standard errors are too small (see Harrell, *Regression Modeling Strategies*, for lots of details). It would be better to use substantive knowledge to build a model. If the authors insist on not using their knowledge and want an automatic method, the LASSO is probably the best of the readily available methods.

More minor points

The authors should list all the variables and how they were measured. Not just the ones that wound up in the model, but all the ones that were available.

Line 139 "Multivariate" should be "multiple"

Line 144 Age and sex should be fixed effects

Line 153 and other places R^2 is mistyped as R2

Line 158 I don't understand how random effects were translated into MOR. First, random effects are usually nuisance variables that you are not particularly interested in. Second, the standard errors for random effects are very poorly estimated (so much that some programs don't even print them) so I think the authors mean fixed effects. Third, I don't know how the translation was done.

Figure 1 How were the outliers dealt with?

Peter Flom
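Reviewer #1's question about how random effects were translated into a median odds ratio (MOR) can be illustrated with the conversion commonly used for random-intercept logistic models: MOR = exp(sqrt(2 × variance) × z), where z is the 75th percentile of the standard normal distribution. The sketch below assumes that convention; the authors' own derivation is in S1 Equations and may differ in detail. The function name and the example variance are hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist


def median_odds_ratio(cluster_variance: float) -> float:
    """Convert a random-intercept variance (log-odds scale) into an MOR.

    Assumes the commonly used formula exp(sqrt(2 * variance) * z_0.75);
    this is an illustration, not necessarily the authors' calculation.
    """
    if cluster_variance < 0:
        raise ValueError("variance must be non-negative")
    z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of N(0, 1)
    return exp(sqrt(2.0 * cluster_variance) * z75)


# Made-up variance, for illustration only (not a value from the study).
example_variance = 1.0
print(f"MOR for variance {example_variance}: {median_odds_ratio(example_variance):.2f}")
```

A variance of zero gives an MOR of exactly 1, meaning no between-location heterogeneity; larger variances give larger MORs.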

*** Reviewer #2:

This work attempts to map 'hotspots' for young adults (proxy for higher transmission hotspots for targeting) in seven countries in Eastern and Southern Africa, specifically identify significant clustering and associated determinants potentially explaining the underlying heterogeneity observed. The work I believe makes a sufficient contribution/step forward to the existing body of literature in this area, is well written and should be useful for policy makers in these countries. I have some comments and suggestions below that need to be addressed before the paper can be accepted.

General: Identifying the hotspots is a natural first step as currently presented in the paper. However, the current detailed description of explained versus unexplained heterogeneity for the larger covariate groupings in the main results text is not in a policy maker friendly format and needs to be revised to maximise the readability/usability of these results. It would be more useful to present key findings highlighting which risk factors or covariates are the most attributable both within and across countries to assist with geographic/locally tailored intervention packages.

General: It is also important to present the spatial hotspot/interpolation surface by gender in addition to overall (e.g. SDG 5: Gender equality) to confirm if the same hotspots are identified for both males and females. It may also be worth considering shared component analysis of the gender specific prevalence surfaces to see to which degree they correlate at finer geographic scale and how this varies within and across the countries.

General: Apologies if missed it but there is little or no contextualisation of the results with regards to SDG targets for HIV and how these results are relevant to assessing progress towards these targets and assisting with retailoring of intervention packages across the individual countries in this analysis.

Major comments:

1) The analysis utilised data from DHS/AIS surveys, however there is also a wealth of small geographic scale data in the demographic surveillance sites (DSS) in the region and ongoing HIV work in these sites e.g. ALPHA network. I was wondering if the authors considered using prevalence estimates from these DSS as an external validation source for the applied kriging exercise? i.e. does the model predict well at unsampled locations that potentially overlap with DSS.

2) Did the authors consider a Bayesian spatial approach/implementation in R-INLA as opposed to ordinary kriging? A spatial Bayesian multivariable formulation including the covariates might then be able to produce uncertainty around predictions at unsampled locations and more efficiently adjust for spatial correlation in nearby sampled clusters as per DHS design. I suspect that a Bayesian approach in INLA (which would be computationally tractable) would perform better than the approaches currently employed in the paper.

3) For the risk factor analysis/modelling I can understand not using the original sampling weights from the DHS/AIS. However, when estimating prevalence and generating a smoothed prevalence surface I would suspect that these weights would be more important and would allow uncertainty intervals (e.g. Figure 1 etc). Did the authors perform a sensitivity analysis to confirm that the surface without weights would be fairly similar to a surface utilising the survey weights? The weights would also be useful if you were to add additional visualisations projecting absolute counts of HIV positive young adults by high versus lower risk clusters. Prevalence without weighted correction may mask underlying population size differences within and across countries.

4) The environmental covariates, if I understand it, are homogeneous for all individuals within a DHS survey "cluster" or community, while those measured at individual level vary at that level. I wonder if this may impact the variable selection approach, especially if income or SES at household level is linked up with GHF (economic activity component) at cluster level.

5) Population attributability - it may be worth considering a Poisson model with robust variance to estimate relative risks and then leverage the prevalence of exposure to various risk factors to estimate population attributable fractions for each in addition to the effect sizes currently presented. This may reveal some additional subtle differences across the countries and within. Additionally, a decomposition type approach (e.g. Shapley) using a generalised linear framework could also be used to estimate the relative importance of the various covariates.

7) Results narrative - I think the explanation of the directionality of association for the environmental covariates, e.g. EVI and GHF, needs to be improved (e.g. lines 216-217).

8) If I understand Figure 3 correctly the least heterogeneity is explained in the areas with the highest prevalence among young adults (Figure 2)? This is important and has implications for the conclusions/utility of the results.

Minor

1) Risk factor analysis: S2 Table - the coefficient for gender looks incorrect i.e. 0.00? Especially given the highly significant coefficient in S3 Table.
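Reviewer #2's major comment 5 suggests combining relative risks with the prevalence of exposure to obtain population attributable fractions. A minimal sketch of that arithmetic follows, assuming Levin's formula, PAF = p(RR − 1) / (1 + p(RR − 1)), which is appropriate for unadjusted relative risks; with adjusted estimates a different formula (based on exposure prevalence among cases) is usually preferred. The function name and the input values are hypothetical and are not taken from the study.

```python
def attributable_fraction(exposure_prevalence: float, relative_risk: float) -> float:
    """Population attributable fraction via Levin's formula.

    exposure_prevalence: proportion of the population exposed (0 to 1).
    relative_risk: risk in the exposed divided by risk in the unexposed.
    """
    if not 0.0 <= exposure_prevalence <= 1.0:
        raise ValueError("exposure_prevalence must lie between 0 and 1")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")
    excess = exposure_prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Made-up inputs for illustration: 20% exposed, relative risk of 2.
paf = attributable_fraction(0.20, 2.0)
print(f"PAF: {paf:.1%}")
```

A relative risk of 1 yields a PAF of 0 regardless of how common the exposure is, which is why a risk factor with a modest effect size can still matter at population level only if it is widespread.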

*** Reviewer #3:

This well-organized study uses geospatial methods to identify areas of high HIV prevalence among young adults within 7 countries in Eastern and sub-Saharan Africa. Additionally, the authors present a regression model that relates epidemiological predictors to HIV prevalence. While the approach is sound, the work is more of an incremental contribution than a major advance over published geospatial modeling studies of HIV prevalence, such as the works by Cuadros et al and Dwyer-Lindgren et al that are cited in the introduction of the paper. The focus on HIV prevalence among young adults is interesting and could be developed further.

Major critiques:

1. The paper focuses on HIV prevalence among young adults, with the claim that young adults represent more recent infection and that high prevalence locations can be prioritized for prevention interventions. This is an interesting focus and could be developed further in the paper. Are the locations of high HIV prevalence in young adults the same as the locations among the general adult population? How do areas of high prevalence among young adults compare with variation in the underlying population age structure? More can be said in the discussion about efforts to prevent HIV infection specifically in this age group.

2. Line 171-172 and Figure 1. How were the national-level prevalence estimates generated? Are these the weighted means of the cluster values or do they use the estimated HIV prevalence rasters? How do they compare with the published survey reports? It would enhance the results to provide a confidence interval around the estimate.

3. The nine-year timespan over which the survey data were collected means that the resulting maps are a composite of various survey years. The years represented in the study coincided with a major scale-up of ART and HIV prevention interventions, so the difference in timing could be important. The difference in survey years should be included in the limitations.

4. The claim in the discussion lines 278-279 that all high prevalence areas were areas with known high levels of economic activity needs to be tested more rigorously. How is the level of economic activity for an area defined in this analysis? Was the level of economic activity assessed over the entire geographic area? Were areas also found that had high economic activity and low HIV prevalence?

Minor critiques

1. Please make it clear which years are represented for the covariates that were not directly extracted from the DHS or AIS surveys (eg WorldPop).

2. Include a justification for why a larger age group was defined as young adults for males than for females.

3. Line 124 statistical analysis. Please provide the interpolation equations in the supplementary material.

4. How does the amount of variation explained by the models in this study compare to R2 in other published studies?

5. Line 300. I do not agree that the findings of this study support the need to increase condom use specifically. Rather, persistently high HIV incidence supports the need to increase access to effective HIV prevention interventions more broadly.

6. Line 317, "We are the first…". Please modify this statement in light of the other published HIV prevalence maps cited in the paper. While this paper's focus on young adults takes a new lens to these data, they are fully utilized in the other studies as well.

***

Any attachments provided with reviews can be seen via the following link:

[LINK]

Decision Letter 1

Richard Turner

13 Jan 2020

Dear Dr. Hontelez,

Thank you very much for re-submitting your manuscript "Mapping and characterising areas with high levels of HIV transmission in sub-Saharan Africa: a geospatial analysis of national survey data" (PMEDICINE-D-19-03490R1) for consideration at PLOS Medicine.

I have discussed the paper with editorial colleagues and our academic editor, and it was also seen again by three reviewers. I am pleased to tell you that, provided the remaining editorial and production issues are dealt with, we expect to be able to accept the paper for publication in the journal.

The remaining issues that need to be addressed are listed at the end of this email. Any accompanying reviewer attachments can be seen via the link below. Please take these into account before resubmitting your manuscript:

[LINK]

Our publications team (plosmedicine@plos.org) will be in touch shortly about the production requirements for your paper, and the link and deadline for resubmission. DO NOT RESUBMIT BEFORE YOU'VE RECEIVED THE PRODUCTION REQUIREMENTS.

***Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.***

In revising the manuscript for further consideration here, please ensure you address the specific points made by each reviewer and the editors. In your rebuttal letter you should indicate your response to the reviewers' and editors' comments and the changes you have made in the manuscript. Please submit a clean version of the paper as the main article file. A version with changes marked must also be uploaded as a marked up manuscript file.

Please also check the guidelines for revised papers at http://journals.plos.org/plosmedicine/s/revising-your-manuscript for any that apply to your paper. If you haven't already, we ask that you provide a short, non-technical Author Summary of your research to make findings accessible to a wide audience that includes both scientists and non-scientists. The Author Summary should immediately follow the Abstract in your revised manuscript. This text is subject to editorial change and should be distinct from the scientific abstract.

We hope to receive your revised manuscript within 1 week. Please email us (plosmedicine@plos.org) if you have any questions or concerns.

We ask every co-author listed on the manuscript to fill in a contributing author statement. If any of the co-authors have not filled in the statement, we will remind them to do so when the paper is revised. If all statements are not completed in a timely fashion this could hold up the re-review process. Should there be a problem getting one of your co-authors to fill in a statement we will be in contact. YOU MUST NOT ADD OR REMOVE AUTHORS UNLESS YOU HAVE ALERTED THE EDITOR HANDLING THE MANUSCRIPT TO THE CHANGE AND THEY SPECIFICALLY HAVE AGREED TO IT.

Please ensure that the paper adheres to the PLOS Data Availability Policy (see http://journals.plos.org/plosmedicine/s/data-availability), which requires that all data underlying the study's findings be provided in a repository or as Supporting Information. For data residing with a third party, authors are required to provide instructions with contact information for obtaining the data. PLOS journals do not allow statements supported by "data not shown" or "unpublished results." For such statements, authors must provide supporting data or cite public sources that include it.

Please let me know if you have any questions. Otherwise, we look forward to receiving your revised manuscript shortly.

Kind regards,

Richard Turner, PhD

Senior Editor, PLOS Medicine

rturner@plos.org

------------------------------------------------------------

Requests from Editors:

In the abstract, please present numbers in the format of "113,000 adults" at line 32, for example.

Around line 33, again in the abstract, we suggest adding a sentence to quote the range of mean levels of HIV prevalence by country, for adults and young adults, to contrast the local numbers quoted subsequently.

Also at lines 32 and 247, please make that "... of which about 53,000 were young adults ..." or similar.

At line 35, we suggest adapting the current wording to " ... among young adults as high as 11% or 15%"

At line 59, the wording "Heterogeneity could be explained for 15.6% ..." suggests "15.6% of participants", and we ask you to reword this text if you are referring to "15.6% of heterogeneity" (and similarly at line 367).

In the abstract, you may wish to quote additional findings from column 3 of table 2 in a new sentence, say, to help readers appreciate the context of the quoted value of "15.6%".

At line 42, please make that "main study limitations".

Should "eSwatini" be substituted at line 90?

At line 239, please make that "followed RECORD guidelines".

At line 316, should that be "7.2% (marginal R2) or 26.3% (conditional R2) of the HIV heterogeneity ..."?

At line 320, please add a comma ("... sexual, behavioural ...").

At line 363, please make that "substantial levels ... exist among ...".

At line 368, please add a few words to explain what "global human footprint" is.

At line 453, please amend the text to "... DHS surveys produce cross-sectional data intended to provide a measure of HIV prevalence and behavior ..." or similar.

Please add fuller access details to reference 11, as available.

Please remove "[Internet]" from reference 16 and any other relevant references, and add a cited date.

Please convert the RECORD checklist to a separate supplementary file, named "S1_Checklist".

Comments from Reviewers:

*** Reviewer #1:

The authors have addressed my concerns and I now recommend publication

Peter Flom

*** Reviewer #2:

[supportive report received]

*** Reviewer #3:

My requests in the initial review were adequately addressed by the authors.

***

Any attachments provided with reviews can be seen via the following link:

[LINK]

Decision Letter 2

Richard Turner

5 Feb 2020

Dear Dr. Hontelez,

On behalf of my colleagues and the academic editor, Dr. Ruanne Barnabas, I am delighted to inform you that your manuscript entitled "Mapping and characterising areas with high levels of HIV transmission in sub-Saharan Africa: a geospatial analysis of national survey data" (PMEDICINE-D-19-03490R2) has been accepted for publication in PLOS Medicine.

PRODUCTION PROCESS

Before publication you will see the copyedited Word document (in around 1-2 weeks from now) and a PDF galley proof shortly after that. The copyeditor will be in touch shortly before sending you the copyedited Word document. We will make some revisions at the copyediting stage to conform to our general style, and for clarification. When you receive this version you should check and revise it very carefully, including figures, tables, references, and supporting information, because corrections at the next stage (proofs) will be strictly limited to (1) errors in author names or affiliations, (2) errors of scientific fact that would cause misunderstandings to readers, and (3) printer's (introduced) errors.

If you are likely to be away when either this document or the proof is sent, please ensure we have contact information of a second person, as we will need you to respond quickly at each point.

PRESS

A selection of our articles each week are press released by the journal. You will be contacted nearer the time if we are press releasing your article in order to approve the content and check the contact information for journalists is correct. If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact.

PROFILE INFORMATION

Now that your manuscript has been accepted, please log into EM and update your profile. Go to https://www.editorialmanager.com/pmedicine, log in, and click on the "Update My Information" link at the top of the page. Please update your user information to ensure an efficient production and billing process.

Thank you again for submitting the manuscript to PLOS Medicine. We look forward to publishing it.

Best wishes,

Richard Turner, PhD

Senior Editor

PLOS Medicine

plosmedicine.org

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Checklist. The RECORD statement—Checklist of items, extended from the STROBE statement, that should be reported in observational studies using routinely collected health data.

    (DOCX)

    S1 Equations. Mathematical equations of our kriging model and MOR calculations.

    (DOCX)

    S1 Fig. Overview of the study area in SSA (top right panel) and the DHS and AIS sample locations (blue dots) for the 7 countries included in this study: Kenya (n = 394), Malawi (n = 847)¹, Mozambique (n = 270), Tanzania (n = 570)¹, Uganda (n = 470)¹, Zambia (n = 719), and Zimbabwe (n = 400).

    ¹ AIS.

    (PDF)

    S2 Fig. Map presents whether the sample location is classified as urban or rural.

    (PDF)

    S3 Fig. Map presents the population density for the study area.

    (PDF)

    S4 Fig. Map presents the proximity from each sample location to the nearest highway.

    (PDF)

    S5 Fig. Map presents the proximity from each sample location to the nearest major city (more than 250,000 inhabitants).

    (PDF)

    S6 Fig. Map presents the proximity from each sample location to the nearest border crossing or major port.

    (PDF)

    S7 Fig. Map presents the EVI for the study area.

    (PDF)

    S8 Fig. Map presents the GHF for the study area.

    (PDF)

    S9 Fig

    Maps present the predicted HIV prevalence in women (15–49 years) (A) and men (15–54 years) (B) for 7 countries in Eastern and Southern Africa. The maps of HIV prevalence among adults and young adults are shown in Fig 2. Continuous surface maps were created by kriging HIV prevalence data obtained from (https://dhsprogram.com/).

    (PDF)

    S10 Fig

    Maps and scatterplots illustrating the difference in HIV prevalence (per 5 km² grid cell) between women and men (A and C, respectively) and between adults and young adults (B and D, respectively), for 7 countries of Eastern and Southern Africa.

    (PDF)

    S11 Fig

    Density plots illustrating the overall sample location-level distributions of HIV prevalence among adults (A), young adults (B), women (C), and men (D) for each country included in this study.

    (PDF)

    S12 Fig

    Density plots illustrating the overall logit-transformed sample location-level distributions of HIV prevalence among adults (A), young adults (B), women (C), and men (D) for each country included in this study, as used for semivariogram modelling and ordinary kriging. The logit-transformed HIV prevalence of −6 (on the x-axis) represents a prevalence of 0%, −5 of 1%, −4 of 2%, −3 of 5%, −2 of 12%, −1 of 27%, 0 of 50%, and 1 of 73%.

    (PDF)
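The correspondence between logit values and prevalence percentages quoted in the S12 Fig caption can be checked with the inverse logit, p = 1 / (1 + exp(−x)). The short sketch below reproduces the caption's rounded percentages; the function and variable names are illustrative only.

```python
from math import exp


def inverse_logit(x: float) -> float:
    """Map a value on the logit (log-odds) scale back to a proportion."""
    return 1.0 / (1.0 + exp(-x))


# x-axis values listed in the S12 Fig caption.
logit_values = [-6, -5, -4, -3, -2, -1, 0, 1]
percentages = [round(100 * inverse_logit(x)) for x in logit_values]

for x, pct in zip(logit_values, percentages):
    print(f"logit {x:>2} -> about {pct}%")
```

Note that a logit of −6 corresponds to roughly 0.25%, which rounds to 0%; an exact prevalence of 0% has no finite logit, so how zero-prevalence locations were handled before transformation is not covered by this sketch.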

    S13 Fig

    Plots illustrating the observed versus the predicted sample location-level HIV prevalence (A) and the observed versus the predicted number of HIV cases (B) among young adults (women 15–24 years and men 15–29 years) for the combined ‘full’ best-fitting multiple multilevel regression model per DHS sample location for 7 countries of Eastern and Southern Africa (also see S9 Table).

    (PDF)

    S14 Fig. Map of HIV prevalence estimates for young adults, as interpolated in this study, and HIV prevalence estimates for young adults as reported from 7 population-based cohorts at small-scale geographical sites (ALPHA network) within the area covered by this study.

    (PDF)

    S1 Table. Overview of all variables included in the study.

    (DOCX)

    S2 Table. Spatial autocorrelation of HIV prevalence at the sample location level, estimated by Moran’s I index.

    (DOCX)
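For readers unfamiliar with the statistic named in S2 Table, Moran's I is defined as I = (n / W) × Σᵢ Σⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where wᵢⱼ are spatial weights and W is their sum. The sketch below implements that definition on a toy example. The binary "neighbours in a row" weight matrix and the values are assumptions made for illustration; the weighting scheme used in the study is not described in this section.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weight matrix.

    weights[i][j] is the spatial weight between locations i and j
    (zero on the diagonal). Values near +1 indicate clustering of
    similar values, values near -1 indicate alternation.
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross_products = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    sum_squares = sum(d * d for d in deviations)
    return (n / total_weight) * (cross_products / sum_squares)


# Toy example: four locations in a row, each adjacent to its neighbours.
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth_gradient = [1, 2, 3, 4]   # similar values sit next to each other
alternating = [1, 4, 1, 4]       # neighbours are always dissimilar

print(morans_i(smooth_gradient, chain_weights))
print(morans_i(alternating, chain_weights))
```

The gradient gives a positive index and the alternating pattern a negative one, which is the contrast the index is designed to capture.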

    S3 Table. Bivariate logistic regression models of HIV status and behavioural, socioeconomic, and environmental variables in young adults (women 15–24 years and men 15–29 years of age) in 7 countries of Eastern and Southern Africa, adjusted for age and sex.

    Data obtained through (https://dhsprogram.com/).

    (DOCX)

    S4 Table. Multiple multilevel nested ‘empty’ logistic regression model of HIV status in young adults (women 15–24 years and men 15–29 years of age) for 7 countries of Eastern and Southern Africa, adjusted for age and sex.

    Data obtained through (https://dhsprogram.com/).

    (DOCX)

    S5 Table. Multiple multilevel logistic regression model of HIV status and behavioural variables in young adults (women 15–24 years and men 15–29 years of age) for 7 countries of Eastern and Southern Africa, adjusted for age and sex.

    Data obtained through (https://dhsprogram.com/).

    (DOCX)

    S6 Table. Multiple multilevel logistic regression model of HIV status and socioeconomic variables in young adults (women 15–24 years and men 15–29 years of age) for 7 countries of Eastern and Southern Africa, adjusted for age and sex.

    Data obtained through (https://dhsprogram.com/).

    (DOCX)

    S7 Table. Multiple multilevel logistic regression model of HIV status and environmental variables in young adults (women 15–24 years and men 15–29 years of age) for 7 countries of Eastern and Southern Africa, adjusted for age and sex.

    Data obtained through (https://dhsprogram.com/).

    (DOCX)

    S8 Table. Combined ‘full’ multiple multilevel logistic regression model of HIV status and behavioural, socioeconomic, and environmental variables in young adults (women 15–24 years and men 15–29 years of age) for 7 countries of Eastern and Southern Africa, adjusted for age and sex.

    Data obtained through (https://dhsprogram.com/).

    (DOCX)

    S9 Table. Combined ‘full’ multiple multilevel model as modified Poisson regression (with robust variance) of HIV status and behavioural, socioeconomic, and environmental variables in young adults (women 15–24 years and men 15–29 years of age) for 7 countries of Eastern and Southern Africa, adjusted for age and sex.

    Data obtained through (https://dhsprogram.com/).

    (DOCX)

    S10 Table. Combined ‘full’ multiple multilevel logistic regression model of HIV status and behavioural, socioeconomic, and environmental variables in young adults (women 15–24 years and men 15–29 years of age) for 7 countries of Eastern and Southern Africa, adjusted for age, sex, and country.

    Data obtained through (https://dhsprogram.com/).

    (DOCX)

    S11 Table. Overview of the heterogeneity (R2) explained by the full final logistic regression model (see S9 Table), for each country separately.

    (DOCX)

    Attachment

    Submitted filename: FINAL_PLOS Medicine_reviewers_comments.docx

    Attachment

    Submitted filename: Response to the requests from the editors FINAL.docx

    Data Availability Statement

    All utilised data are open-source, and the hyperlinks to the different data sources are provided in the Methods section of the manuscript.


    Articles from PLoS Medicine are provided here courtesy of PLOS
