Author manuscript; available in PMC: 2016 Dec 10.
Published in final edited form as: Stat Med. 2015 Jul 26;34(28):3750–3759. doi: 10.1002/sim.6589

A composite likelihood approach for estimating HIV prevalence in the presence of spatial variation

Kathleen E Wirth a, Denis Agniel b, Christopher D Barr b, Matthew D Austin b, Victor DeGruttola b
PMCID: PMC5029272  NIHMSID: NIHMS709595  PMID: 26215657

Abstract

Since 1990 the World Health Organization (WHO) has recommended HIV surveillance among pregnant women as an essential surveillance activity for countries with generalized HIV epidemics. Despite the widespread availability and potential usefulness of antenatal HIV surveillance, analyses of such data present important challenges. Within an individual clinic, the HIV status of its attendees may be correlated due to similarities in HIV risk among women close in age. Between-clinic correlation may also arise as women often seek antenatal care at clinics located close to their home and individuals living in nearby communities may share important characteristics or behaviours related to susceptibility. A generalized estimating equations-based approach for spatially-correlated binary data, such as those from antenatal HIV surveillance, based on a pairwise composite likelihood has been described. We present an extended version of this model that can accommodate penalized spline estimators and apply it to antenatal HIV surveillance data collected in 2011 in Botswana to estimate the effects of proximity to the “hotspot” of the country’s HIV epidemic and age on HIV prevalence. Finally, we compare the results to a logistic regression analysis which ignores potential correlation of responses.

Keywords: Composite likelihood, spatial variation, HIV, Botswana, antenatal surveillance

1. Introduction

Since 1990 the World Health Organization (WHO) has recommended HIV surveillance among pregnant women as an essential surveillance activity for countries with generalized HIV epidemics.[1] Indeed, more than 118 countries worldwide routinely conduct anonymous HIV testing within selected antenatal clinics, including >90% of countries in sub-Saharan Africa.[1] More than two-thirds of all pregnant women in sub-Saharan Africa access antenatal care at least once during their pregnancy making them a relatively easy population to access.[2] In hyperendemic epidemic settings (i.e. >15% adult HIV prevalence) where all sexually active persons are considered to be at elevated risk of HIV infection, antenatal HIV surveillance may be used to monitor trends in HIV prevalence in the general population. However, the extent to which pregnant women represent the general population in these settings depends on the proportion of women who access antenatal care and who receive care outside of the surveillance system (e.g. private clinics) as well as the system’s geographic distribution and coverage.[3, 4, 5] Finally, given the extensive financial and human resources required for community randomized trials, it will be valuable for public health researchers and practitioners in resource-constrained settings to make use of existing surveillance methods such as antenatal HIV surveillance to evaluate the impact of HIV prevention strategies.

The Botswana Ministry of Health has conducted HIV surveillance of pregnant women since 1992.[6] Initially undertaken in only a select number of high-prevalence districts, antenatal HIV surveillance is now conducted bi-annually throughout the country with more than 280 hospitals, clinics, and health posts contributing to data collection efforts. In 2011, the last year for which data are available, 6,745 women provided a specimen for HIV testing and completed a brief social and demographic questionnaire. Overall, more than 30% of pregnant women participating in the surveillance study tested positive for HIV. The highest rates were recorded among women aged 35 to 39 years (52%) and those presenting for services in the eastern-most region of the country (42%).[6] In response to these data and others[7] showing continued high prevalence among adults in Botswana, the Botswana Ministry of Health, in partnership with the Centers for Disease Control and Prevention, Harvard School of Public Health, and the Botswana Harvard AIDS Institute Partnership, has developed and plans to implement a novel “combination prevention” intervention; the intervention will simultaneously implement multiple evidence-based, research-tested HIV prevention interventions (i.e. antiretroviral treatment, male circumcision, HIV counselling and testing, and prevention of mother to child transmission services) at a sufficient scale, quality, and intensity to impact the epidemic. A large-scale community randomized trial involving more than 100,000 individuals residing in 30 communities over four years will formally evaluate the impact and cost-effectiveness of the intervention. A similar trial evaluating a combination approach to HIV prevention is also underway in South Africa and Zambia.[8] However, if combination prevention strategies prove successful and are rolled out in non-trial settings, more cost-efficient methods will be needed for long-term monitoring and evaluation.

Despite the widespread availability and potential usefulness of antenatal HIV surveillance, analyses of such data present important challenges. Within an individual clinic, the HIV status of its attendees may be correlated (i.e. within-clinic clustering) due to similarities in HIV risk among women close in age. Between-clinic clustering may also arise as women often seek antenatal care at clinics located close to their home and individuals living in nearby communities may share important characteristics or behaviours related to susceptibility. For example, women in neighbouring communities may be more likely to have contacts in the same sexual network. Failure to account for within- and between-clinic clustering can result in biased estimates of population means and relative risks. In these settings, even if the point estimates are correctly estimated, the corresponding standard errors may be too small, confidence intervals too narrow, and tests of significance overstated.[9, 10]

Generalized estimating equations (GEE) can be used to appropriately account for clustering within and between sites. However, parametrizing and fitting a model that accounts for the joint correlation across a large number of sites may be undesirably complex. A GEE-based approach for spatially-correlated, binary data based only on a pairwise composite likelihood has been described by Heagerty and Lele.[11] Composite likelihoods are often useful in settings such as ours where the sources of correlation are complex.[12] Below we extend this model to accommodate repeated correlated binary data and penalized spline estimators. Analyses of antenatal HIV surveillance data collected in 2011 in Botswana illustrate the use of the approach to estimate the effects of proximity to the hotspot of the country’s HIV epidemic and age on HIV prevalence, while accounting for the correlations described above.

2. Composite Likelihood Approach

We begin by considering antenatal HIV surveillance sites (or groups of sites) s = 1, …, S and the corresponding $\binom{S}{2}$ pairs of sites. Let the data for analysis be $\{Y_{sji}\}$, $i = 1, \ldots, n_{sj}$; $j = 1, \ldots, J$; $s = 1, \ldots, S$, where $Y_{sji}$ is the HIV serostatus for woman i at site s in age group $X_j$, with $Y_{sji} = 1$ for HIV seropositive women and $Y_{sji} = 0$ otherwise. Let women within sites and age groups be identically distributed, with HIV seroprevalence $\pi_{sj} = E[Y_{sji}]$ defined by the model

$$g(\pi_{sj}) = f(X_j) + \beta_s Q_s \qquad (1)$$

where g is the logit link function, f(·) is an unrestricted function of age, Qs is a measure of the distance between site s and a specified location, and βs is a parameter relating to the impact on seroprevalence of being located at site s. Throughout we model elements of f using natural cubic splines, with f1, …, fJ corresponding to the J age groups. Furthermore, we model the relationship between HIV seroprevalence and distance to a surveillance site from a specified location categorically, with categories determined according to quantiles. In order to account for correlation between similarly located sites and similar age groups, we specify estimating equations for each pair of sites. The parameters of interest can be written θ = (f, β), where f = {f(X1), …, f(XJ)} represents the function f(·) evaluated at each age group and β = (β1, …, βS) represents the vector of geographical parameters. To estimate θ, consider the estimating function for sites s and t

$$U_{st}(\theta) = D_{st}^{T} V_{st}^{-1} (\hat{\eta}_{st} - \eta_{st}) \qquad (2)$$

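where $\eta_{st} = (\eta_{s1}, \ldots, \eta_{sJ}, \eta_{t1}, \ldots, \eta_{tJ})$ is a vector containing the log-odds of seroprevalence for each age group in sites s and t, with $\eta_{sj} = g(\pi_{sj}) = f(X_j) + \beta_s Q_s$. The vector $\hat{\eta}_{st} = (\hat{\eta}_{s1}, \ldots, \hat{\eta}_{sJ}, \hat{\eta}_{t1}, \ldots, \hat{\eta}_{tJ})$ is an empirical version of $\eta_{st}$, with $\hat{\eta}_{sj} = g\left(n_{sj}^{-1} \sum_{i=1}^{n_{sj}} Y_{sji}\right)$. $D_{st} = \partial \eta_{st} / \partial \theta$ is a matrix of partial derivatives, and $V_{st}$ is a working covariance matrix for $\hat{\eta}_{st}$. The working covariance matrix we propose can be written $V_{st} = A_{st}^{1/2} R_{st}(\rho) A_{st}^{1/2}$, with $A_{st} = \mathrm{diag}(v_{st})$ and $v_{st} = [\{n_{s1}\pi_{s1}(1 - \pi_{s1})\}^{-1}, \ldots, \{n_{sJ}\pi_{sJ}(1 - \pi_{sJ})\}^{-1}, \{n_{t1}\pi_{t1}(1 - \pi_{t1})\}^{-1}, \ldots, \{n_{tJ}\pi_{tJ}(1 - \pi_{tJ})\}^{-1}]$ a vector of variances of $\hat{\eta}_{st}$. $R_{st}(\rho)$ encodes a working correlation structure between sites and age groups,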

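$$R_{st}(\rho) = \begin{pmatrix} B(\rho_a) & \rho_d^{d_{st}} B(\rho_a) \\ \rho_d^{d_{st}} B(\rho_a) & B(\rho_a) \end{pmatrix}_{2J \times 2J}, \qquad (3)$$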

where $\rho = (\rho_a, \rho_d)$; $\rho_a$ represents the correlation between adjacent age groups, so that the (i, j)th element of the J × J matrix $B(\rho_a)$ is $\rho_a^{|i-j|}$. Similarly, $\rho_d$ represents the isotropic correlation between sites, and $d_{st}$ is the distance between sites s and t. One can then specify a composite estimating equation by combining the contributions from each pair of sites and adding a penalty to encourage smoothness in f(·)

$$U(\theta) = \sum_{s=1}^{S-1} \sum_{t>s}^{S} U_{st}(\theta) - n^{-1} \lambda G \theta \qquad (4)$$

where λ is a user-defined tuning parameter that governs the smoothness of f(·), G is a matrix with all entries 0 except the upper left J × J submatrix $G_f$, chosen to satisfy $\int f''(t)^2 \, dt = f^{T} G_f f$, and $n = \min_{s,j} n_{sj}$.
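The working covariance of equations (2) and (3) can be sketched in a few lines. This is an illustration only, not the authors' implementation: the function names, sample sizes, prevalences, and correlation values below are hypothetical, and the distance is assumed to enter as the exponent of the between-site correlation, as in equation (3).

```python
import math

def ar1_block(rho_a, J):
    """J x J matrix B(rho_a) whose (i, j) entry is rho_a ** |i - j|."""
    return [[rho_a ** abs(i - j) for j in range(J)] for i in range(J)]

def working_correlation(rho_a, rho_d, d_st, J):
    """2J x 2J matrix R_st(rho): B on the diagonal blocks and
    rho_d ** d_st * B on the off-diagonal blocks."""
    B = ar1_block(rho_a, J)
    c = rho_d ** d_st
    top = [B[i] + [c * b for b in B[i]] for i in range(J)]
    bottom = [[c * b for b in B[i]] + B[i] for i in range(J)]
    return top + bottom

def logit_variances(n, pi):
    """Delta-method variances 1 / {n * pi * (1 - pi)} of the empirical logits."""
    return [1.0 / (nk * pk * (1.0 - pk)) for nk, pk in zip(n, pi)]

def working_covariance(n_st, pi_st, rho_a, rho_d, d_st):
    """V_st = A^(1/2) R_st(rho) A^(1/2) with A = diag(v_st)."""
    J = len(n_st) // 2
    R = working_correlation(rho_a, rho_d, d_st, J)
    sd = [math.sqrt(v) for v in logit_variances(n_st, pi_st)]
    return [[sd[i] * R[i][j] * sd[j] for j in range(2 * J)] for i in range(2 * J)]

# Hypothetical inputs: two sites (s, t), J = 2 age groups each.
n_st = [40, 50, 45, 60]            # n_s1, n_s2, n_t1, n_t2
pi_st = [0.20, 0.40, 0.25, 0.35]   # pi_s1, pi_s2, pi_t1, pi_t2
V = working_covariance(n_st, pi_st, rho_a=0.5, rho_d=0.99, d_st=60.0)
```

The diagonal of `V` holds the delta-method variances, and the off-diagonal blocks shrink toward zero as the distance between the two sites grows.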

One can apply a modified Fisher scoring algorithm to solve for θ̂, yielding the following iterative equation

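$$\hat{\theta}^{p+1} = \hat{\theta}^{p} + \left[ \sum_{s=1}^{S-1} \sum_{t>s}^{S} D_{st}^{T} V_{st}^{-1} D_{st} + n^{-1} \lambda G \right]^{-1} \times \left[ \sum_{s=1}^{S-1} \sum_{t>s}^{S} D_{st}^{T} V_{st}^{-1} \{\hat{\eta}_{st} - \eta_{st}(\hat{\theta}^{p})\} - n^{-1} \lambda G \hat{\theta}^{p} \right] \qquad (5)$$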

where $\hat{\theta}^{p}$, the pth iterate of the estimate of θ, is used to evaluate the terms $V_{st}$, $D_{st}$, and $\hat{\eta}_{st} - \eta_{st}(\hat{\theta}^{p})$. In our case, the covariance structure can be derived from the delta method and estimated as a function of $\hat{\pi}_{sj}$ for each site and age group. Therefore the only parameters left to be estimated are those given by the within- and between-cluster correlation. Estimation of these parameters can be based on a method-of-moments approach.
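As a brief check on the form of the update, the sketch below shows where the bracketed matrix in (5) comes from. It assumes that $V_{st}$ is held fixed at its current value within each iteration; because $\eta_{sj} = f(X_j) + \beta_s Q_s$ is linear in θ, the derivative matrix $D_{st}$ does not depend on θ.

```latex
\begin{aligned}
U(\theta) &= \sum_{s=1}^{S-1}\sum_{t>s}^{S} D_{st}^{T} V_{st}^{-1}
             \{\hat{\eta}_{st} - \eta_{st}(\theta)\} - n^{-1}\lambda G\theta, \\
-\frac{\partial U(\theta)}{\partial \theta^{T}}
          &= \sum_{s=1}^{S-1}\sum_{t>s}^{S} D_{st}^{T} V_{st}^{-1} D_{st}
             + n^{-1}\lambda G, \\
\hat{\theta}^{p+1}
          &= \hat{\theta}^{p}
             + \left[-\frac{\partial U(\theta)}{\partial \theta^{T}}\right]^{-1}_{\theta=\hat{\theta}^{p}}
               U(\hat{\theta}^{p}).
\end{aligned}
```

Under these assumptions the scoring step is a Newton step on the penalized estimating equation (4), with the penalty adding $n^{-1}\lambda G$ to the information-type matrix.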

We allow a woman’s HIV serostatus to be potentially correlated with others of similar age and location. As long as the group of women with whom each woman’s serostatus is correlated grows sufficiently slowly as a function of sample size (and similar conditions hold for correlated pairs and triplets of women), then our proposed estimator θ̂ is asymptotically normal with covariance matrix

$$I^{-1} E[U(\theta_0) U(\theta_0)^{T}] (I^{-1})^{T}, \qquad (6)$$

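where $\theta_0$ is the vector of true parameters, $U(\theta) = \sum_{s,t} D_{st}^{T} V_{st}^{-1} P_{st}^{-1} (\hat{\pi}_{st} - \pi_{st})$, $P_{st} = \mathrm{diag}\{\pi_{s1}(1 - \pi_{s1}), \ldots, \pi_{sJ}(1 - \pi_{sJ}), \pi_{t1}(1 - \pi_{t1}), \ldots, \pi_{tJ}(1 - \pi_{tJ})\}$, $I = \frac{\partial}{\partial \theta} U(\theta) \big|_{\theta = \theta_0}$, $\pi_{st} = (\pi_{s1}, \ldots, \pi_{sJ}, \pi_{t1}, \ldots, \pi_{tJ})$, and $\hat{\pi}_{st} = \left(n_{s1}^{-1} \sum_{i=1}^{n_{s1}} Y_{s1i}, \ldots, n_{sJ}^{-1} \sum_{i=1}^{n_{sJ}} Y_{sJi}, n_{t1}^{-1} \sum_{i=1}^{n_{t1}} Y_{t1i}, \ldots, n_{tJ}^{-1} \sum_{i=1}^{n_{tJ}} Y_{tJi}\right)$. The proof is given in the supplementary information.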

The above condition would be met if sub-epidemics propagating on subnetworks were independent and if the overlap between these subnetworks decreased with increasing distance and age differences. Novitsky and colleagues (2013) enumerated 83 distinct HIV-1 clade C chains of HIV transmission from 785 sequences obtained from blood samples taken from households in one sector of the village of Mochudi, Botswana. In an analysis of clustering using 11,934 clade C sequences in the HIV Sequence database at Los Alamos National Laboratory, 90.1% (95% CI: 85.1%, 93.6%) of clustered Mochudi sequences were unique to the Mochudi clusters. None of the sequences from Mochudi clustered with any of the 1,244 non-Botswana HIV-1 clade C sequences. These findings provide strong evidence in support of our assumption that women who live near each other are likely to be in the same HIV transmission network. They also imply that the majority of HIV-infected women in Botswana are unlikely to have related HIV genomes, strengthening the assumption of independent infections across subnetworks.

3. Botswana HIV antenatal surveillance

3.1. Setting

Botswana is a landlocked country of about two million people located in sub-Saharan Africa. The Kalahari Desert, located in the western half of the country, covers more than 70% of the Botswana land surface. As a result, the majority of the population resides in the east and south-east regions with the central and south-western regions sparsely populated. An estimated 25% of adults in Botswana are HIV-infected, the second highest adult HIV prevalence in the world. Each year approximately 14,000 persons are newly infected with HIV, the third highest national HIV incidence rate worldwide.[13]

3.2. Study population

Data are available from 6,745 women seen at 303 surveillance sites across 24 health districts in Botswana between August 1 and October 28, 2011. Within each health district, the primary or general hospital served as the primary surveillance site. Additional health facilities, including clinics and health posts, were added as satellite sites to improve representativeness and ensure the required sample size could be achieved within the surveillance period. All pregnant women aged 15 to 49 years presenting for antenatal services at a designated surveillance site for the first time during the surveillance period were eligible for inclusion in the surveillance survey. Dried blood spots were obtained from leftover blood collected for haemoglobin or blood grouping and sent to the Botswana National Health Laboratory for HIV testing. Further details on the handling of surveillance specimens and quality control measures implemented in the district and national laboratories can be found on the Botswana Ministry of Health website.[6]

We note that HIV prevalence among pregnant women presenting for antenatal care at clinics within a surveillance system may not fully represent HIV prevalence among all pregnant women in Botswana. A number of factors contribute, including uptake of antenatal services among pregnant women, attendance at antenatal clinics within the surveillance system, geographical distribution of surveillance sites, and age distribution of antenatal attendees. More specifically, the antenatal HIV surveillance data do not provide information on pregnant women who either do not access antenatal care or receive care outside the public sector. These women may systematically differ with respect to both age and HIV infection; for example, older women may be less likely to seek antenatal care compared to younger women, and in Botswana HIV prevalence in women peaks at 49% between ages 30 and 34 years.[7] However, these factors are unlikely to impact the interpretation of our findings. Approximately 94% of pregnant women in Botswana receive antenatal care from a trained health professional in the public sector: 21% have 1–4 antenatal visits and 73% have >4 visits during their pregnancy.[14]

3.3. Statistical analysis

To estimate the effects of proximity to the “hotspot” of the country’s HIV epidemic and age on HIV prevalence we employed the pairwise composite likelihood approach described above. More specifically, we modelled logit(πsj), the logit of the proportion of HIV positive women in the jth octile of age at site s, as follows:

$$\mathrm{logit}(\pi_{sj}) = f(X_j) + \beta_s Q_s + \gamma R_s \qquad (7)$$

where Qs is the categorical distance in kilometres to the “hotspot”, and Rs = 1 for facilities located within the Ramotswa health district and Rs = 0 otherwise. We included a separate indicator variable for location within the Ramotswa health district due to operational challenges encountered in the field. Specifically, the supply of HIV test kits was interrupted mid-way through the surveillance period. The model specification allows for a baseline effect of age defined by octiles (1: 15–19 years; 2: 20–21 years; 3: 22–24 years; 4: 25–26 years; 5: 27–28 years; 6: 29–31 years; 7: 32–35 years; 8: 36–49 years) that varies categorically with proximity to the “hotspot” and location within Ramotswa health district.
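To make the scale of model (7) concrete, the following sketch converts a linear predictor into a predicted prevalence. It is an illustration under stated assumptions: the baseline prevalence of 40% is hypothetical, the term βsQs is treated as a single log odds ratio for the distance category of site s, and the odds ratios 0.59 and 0.45 are the spatial-analysis estimates for the fourth sextile and the Ramotswa district reported in Table 2.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def expit(x):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))

def predicted_prevalence(f_age, beta_q=0.0, gamma=0.0, ramotswa=0):
    """Inverse-logit of the linear predictor f(X_j) + beta_s*Q_s + gamma*R_s."""
    return expit(f_age + beta_q + gamma * ramotswa)

baseline = 0.40                 # hypothetical prevalence in the reference sextile
f_age = logit(baseline)         # age effect on the log-odds scale

p_ref = predicted_prevalence(f_age)
# Same age group, fourth sextile of distance (OR 0.59 in Table 2).
p_far = predicted_prevalence(f_age, beta_q=math.log(0.59))
# Same age group, reference sextile, Ramotswa indicator on (OR 0.45 in Table 2).
p_ram = predicted_prevalence(f_age, gamma=math.log(0.45), ramotswa=1)

print(round(p_ref, 3), round(p_far, 3), round(p_ram, 3))
```

Because the model is additive on the log-odds scale, an odds ratio below 1 lowers the predicted prevalence by an amount that depends on the baseline level.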

We defined the hotspot location to be Bobonong, the administrative center of the Bobirwa district, because this district itself has recorded the highest HIV prevalence and incidence rates in nationally representative samples since 2009.[7, 15] Selebi-Phikwe, also located within the Bobirwa district but sampled separately within the HIV antenatal surveillance system, is located approximately 60 km from Bobonong. Selebi-Phikwe is a nickel mining town established to house employees of the first nickel mining company and, given the close proximity of these two communities, there is likely substantial overlap across residents and their sexual networks. For all districts, we use the administrative center as the point of reference for distance from the hotspot and therefore we chose Bobonong as the hotspot rather than Selebi-Phikwe. To form pairs, we grouped all individual surveillance sites within each health district, yielding 24 clusters and $\binom{24}{2}$ = 276 pairs. This approach was necessary to reduce dimensionality; an analysis considering individual surveillance sites that contributed information during the surveillance period would have yielded $\binom{303}{2}$ = 45,753 pairs, far exceeding the total number of observations in the surveillance survey.
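The pair counts quoted above follow directly from the binomial coefficient. A minimal check, with illustrative variable names:

```python
from itertools import combinations
from math import comb

N_DISTRICTS = 24       # health districts used as clusters
N_SITES = 303          # individual surveillance sites
N_WOMEN = 6745         # women tested in the 2011 survey

district_pairs = comb(N_DISTRICTS, 2)
site_pairs = comb(N_SITES, 2)

# Explicit enumeration of the (s, t) pairs with s < t, as in the double sum of (4).
pair_index = list(combinations(range(1, N_DISTRICTS + 1), 2))

print(district_pairs, site_pairs, site_pairs > N_WOMEN)
```

Working at the site level would require more estimating-function pairs than there are observations, which is the dimensionality concern raised in the text.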

Following the approach taken by Heagerty and Lele, we assumed an isotropic correlation structure that accounts for the pairwise distances and differences in age groups between health districts. Within a district, the correlation structure reduces to autoregressive according to age. To compute the pairwise distances in kilometres between the 24 clusters and the distance from each district to the “hotspot” we obtained the global positioning system (GPS) coordinates of the administrative centre of each health district before applying Vincenty’s ellipsoidal formula for geodesic distance as implemented by the “oce” package in R 3.0.2.[16, 17] In three cases, the administrative centre of the health district was included as its own district for the purposes of the HIV surveillance survey. In these scenarios, we used the GPS coordinates of the next most populous town or village within the district. Standard errors and 95% confidence intervals (CIs) for the spatial model were calculated using a nonparametric bootstrap (n = 5,000 bootstrap samples) conditional on health district. The logistic regression analysis was conducted using SAS software version 9.3 (SAS Institute, Cary, NC). The supplementary appendix provides the programming code used to implement the data analysis and simulation in R 3.0.2. The data can be accessed through Botswana’s National AIDS Coordinating Agency at the following web address: www.hiv.gov.bw/uploads.
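The distance calculation can be illustrated with a simpler stand-in. The sketch below uses the haversine (spherical) formula rather than Vincenty's ellipsoidal formula used in the analysis, so results will differ slightly from those of the “oce” package. The coordinates for Bobonong and Selebi-Phikwe are approximate values assumed here for illustration and are not taken from the study data.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in
    decimal degrees (south and west negative)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Approximate, assumed coordinates (decimal degrees).
BOBONONG = (-21.97, 28.42)
SELEBI_PHIKWE = (-21.98, 27.84)

d_km = haversine_km(*BOBONONG, *SELEBI_PHIKWE)
print(round(d_km))
```

With these assumed coordinates the spherical distance is close to the approximately 60 km reported between the two towns; an ellipsoidal formula would be preferable for a formal analysis.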

3.4. Results

The median number of facilities per district participating in the surveillance survey was 13, ranging from three in Jwaneng to 22 in Serowe/Palapye (Table 1). The number of women tested in each district also varied substantially (Table 1); five districts tested fewer than 150 women compared to 10 districts testing 300 or more women. Overall, the median number tested (interquartile range) per district was 252 (166 to 373). In total, 2,044 women (30.3%) tested positive for HIV infection. In the Bobirwa health district, where the highest prevalence of HIV was observed, approximately two out of every five women tested positive for HIV infection (41.1%) during the surveillance period (Table 1). Figure 1 presents a map of the individual health facilities and administrative centres for each of the 24 health districts as well as the distance (in sextiles) from the hotspot (Bobonong, the administrative centre of the Bobirwa health district) to district administrative centres.

Table 1.

Summary of HIV surveillance activities conducted among 6,745 women newly presenting for antenatal services between August 1 and October 28, 2011 at 303 facilities in 24 health districts in Botswana according to distance from the hotspot

| District | Administrative center of district | Distance from hotspot (km) | Number of facilities | Number tested | Percent HIV positive |
|---|---|---|---|---|---|
| Bobirwa | Bobonong | 0 | 15 | 363 | 41.1 |
| Boteti | Letlhakane | 301 | 9 | 197 | 30.5 |
| Chobe | Kasane | 576 | 8 | 143 | 30.8 |
| Francistown | Francistown | 132 | 13 | 547 | 32.7 |
| Gaborone | Gaborone | 391 | 12 | 572 | 28.7 |
| Ghanzi | Ghanzi | 701 | 10 | 135 | 18.5 |
| GoodHope | GoodHope | 492 | 15 | 194 | 26.8 |
| Jwaneng | Jwaneng | 476 | 3 | 136 | 19.9 |
| Kgalagadi North | Hukuntsi | 717 | 14 | 118 | 22.9 |
| Kgalagadi South | Tsabong | 758 | 11 | 191 | 23.6 |
| Kgatleng | Mochudi | 352 | 13 | 330 | 33.9 |
| Kweneng East | Molepolole | 400 | 13 | 262 | 26.3 |
| Kweneng West | Letlhakeng | 419 | 14 | 157 | 30.6 |
| Lobatse | Lobatse | 455 | 7 | 169 | 25.4 |
| Mabutsane | Mabutsane | 563 | 10 | 95 | 25.3 |
| Mahalapye | Mahalapye | 206 | 18 | 463 | 32.6 |
| Ngamiland | Maun | 566 | 16 | 455 | 27.9 |
| Northeast | Masunga | 181 | 19 | 198 | 37.9 |
| Okavango | Shakawe | 797 | 14 | 327 | 29.1 |
| Selebi-Phikwe | Selebi-Phikwe | 60 | 9 | 288 | 39.6 |
| Serowe/Palapye | Serowe | 182 | 22 | 458 | 35.2 |
| South East | Ramotswa | 412 | 10 | 242 | 15.7 |
| Southern | Kanye | 456 | 13 | 302 | 21.9 |
| Tutume | Tutume | 218 | 15 | 403 | 37.0 |
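The district-level figures in Table 1 can be cross-checked and grouped by distance with a short script. This is a sketch, not part of the original analysis: the counts of HIV-positive women are approximated by rounding tested × percent positive, and the sextile boundaries are the ones listed in Table 2.

```python
# (district, distance_km, facilities, tested, percent_positive) from Table 1
TABLE1 = [
    ("Bobirwa", 0, 15, 363, 41.1), ("Boteti", 301, 9, 197, 30.5),
    ("Chobe", 576, 8, 143, 30.8), ("Francistown", 132, 13, 547, 32.7),
    ("Gaborone", 391, 12, 572, 28.7), ("Ghanzi", 701, 10, 135, 18.5),
    ("GoodHope", 492, 15, 194, 26.8), ("Jwaneng", 476, 3, 136, 19.9),
    ("Kgalagadi North", 717, 14, 118, 22.9), ("Kgalagadi South", 758, 11, 191, 23.6),
    ("Kgatleng", 352, 13, 330, 33.9), ("Kweneng East", 400, 13, 262, 26.3),
    ("Kweneng West", 419, 14, 157, 30.6), ("Lobatse", 455, 7, 169, 25.4),
    ("Mabutsane", 563, 10, 95, 25.3), ("Mahalapye", 206, 18, 463, 32.6),
    ("Ngamiland", 566, 16, 455, 27.9), ("Northeast", 181, 19, 198, 37.9),
    ("Okavango", 797, 14, 327, 29.1), ("Selebi-Phikwe", 60, 9, 288, 39.6),
    ("Serowe/Palapye", 182, 22, 458, 35.2), ("South East", 412, 10, 242, 15.7),
    ("Southern", 456, 13, 302, 21.9), ("Tutume", 218, 15, 403, 37.0),
]

# Lower bounds (km) of sextiles 2 to 6; Table 1 distances are whole kilometres.
SEXTILE_LOWER_BOUNDS = [182, 352, 419, 492, 701]

def sextile(distance_km):
    """Sextile of distance from the hotspot (1 = closest, 6 = farthest)."""
    return 1 + sum(distance_km >= bound for bound in SEXTILE_LOWER_BOUNDS)

def summarize(rows):
    """Districts, women tested, and approximate positives per sextile."""
    groups = {}
    for _name, dist, _fac, tested, pct in rows:
        g = groups.setdefault(sextile(dist), {"districts": 0, "tested": 0, "positive": 0})
        g["districts"] += 1
        g["tested"] += tested
        g["positive"] += round(tested * pct / 100)  # approximate count
    return groups

summary = summarize(TABLE1)
total_facilities = sum(row[2] for row in TABLE1)
total_tested = sum(row[3] for row in TABLE1)
total_positive = sum(g["positive"] for g in summary.values())

for k in sorted(summary):
    g = summary[k]
    print(k, g["districts"], g["tested"], round(100 * g["positive"] / g["tested"], 1))
```

The crude percentages printed per sextile ignore age and correlation, so they are only a descriptive counterpart to the model-based estimates.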

Figure 1.

Map of 303 facilities in 24 health districts in Botswana that participated in an HIV antenatal surveillance survey conducted between August 1 and October 28, 2011, by distance from the hotspot to the district administrative center.

Figure 2 and Table 2 present the age-specific seroprevalence of HIV according to distance from the hotspot. Overall, HIV seroprevalence increased with each octile of age. Among women in the youngest octile of age (15–19 years), mean seroprevalence of HIV was 11.9% (95% confidence interval [CI]: 7.0%, 16.7%) compared to 36.6% (95% CI: 25.3%, 47.9%) among women aged 27 to 28 years and 49.5% (95% CI: 36.9%, 62.2%) among women in the oldest octile of age (36–49 years).

Figure 2.

Crude and estimated age-specific HIV seroprevalence among 6,745 women newly presenting for antenatal services between August 1 and October 28, 2011 at 303 facilities in 24 health districts in Botswana by distance from the hotspot to the district administrative center.

Table 2.

Odds ratios (OR), 99% confidence intervals (CI), and confidence limit differences (CLD) for the effect of distance from the hotspot on HIV seroprevalence among 6,745 pregnant women newly presenting for antenatal services between August 1 and October 28, 2011 at 303 facilities in 24 health districts in Botswana.

| | Conventional analysis: OR (99% CI) | P | CLD | Spatial analysis: OR (99% CI) | P | CLD |
|---|---|---|---|---|---|---|
| Sextile of distance from hotspot | | | | | | |
| 1 (<182 km) | 1 (reference) | | | 1 (reference) | | |
| 2 (182 to 351 km) | 0.85 (0.69, 1.05) | 0.05 | 0.42 | 0.90 (0.71, 1.15) | 0.28 | 0.48 |
| 3 (352 to 418 km) | 0.63 (0.50, 0.80) | <0.0001 | 0.46 | 0.68 (0.52, 0.89) | 0.0003 | 0.54 |
| 4 (419 to 491 km) | 0.50 (0.38, 0.66) | <0.0001 | 0.55 | 0.59 (0.42, 0.81) | <0.0001 | 0.65 |
| 5 (492 to 700 km) | 0.65 (0.50, 0.83) | <0.0001 | 0.51 | 0.84 (0.62, 1.13) | 0.13 | 0.60 |
| 6 (>700 km) | 0.54 (0.41, 0.71) | <0.0001 | 0.54 | 0.66 (0.47, 0.93) | 0.002 | 0.67 |
| Ramotswa district | 0.42 (0.26, 0.69) | <0.0001 | 1.00 | 0.45 (0.25, 0.81) | 0.001 | 1.18 |
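The CLD column can be approximately recovered from the reported intervals. The sketch below assumes that the confidence limit difference is the natural log of the ratio of the upper to the lower confidence limit, a common definition that the text does not state explicitly. Because the limits in Table 2 are rounded to two decimals, agreement is expected only to within a few hundredths.

```python
import math

def confidence_limit_difference(lower, upper):
    """CLD on the log scale: ln(upper) - ln(lower). Assumed definition."""
    return math.log(upper / lower)

# (label, lower, upper, tabulated CLD) from Table 2.
CONVENTIONAL = [
    ("sextile 2", 0.69, 1.05, 0.42), ("sextile 3", 0.50, 0.80, 0.46),
    ("sextile 4", 0.38, 0.66, 0.55), ("sextile 5", 0.50, 0.83, 0.51),
    ("sextile 6", 0.41, 0.71, 0.54), ("Ramotswa", 0.26, 0.69, 1.00),
]
SPATIAL = [
    ("sextile 2", 0.71, 1.15, 0.48), ("sextile 3", 0.52, 0.89, 0.54),
    ("sextile 4", 0.42, 0.81, 0.65), ("sextile 5", 0.62, 1.13, 0.60),
    ("sextile 6", 0.47, 0.93, 0.67),
    ("Ramotswa", 0.25, 0.81, 1.18),  # tabulated CLD taken as 1.18
]

def max_discrepancy(rows):
    """Largest absolute gap between recomputed and tabulated CLD."""
    return max(abs(confidence_limit_difference(lo, hi) - cld)
               for _label, lo, hi, cld in rows)

for label, lo, hi, cld in SPATIAL:
    print(label, round(confidence_limit_difference(lo, hi), 2), cld)
```

A larger CLD indicates a wider interval on the log scale, so the consistently larger spatial-analysis values reflect the loss of precision once correlation is taken into account.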

We also found evidence for a non-linear relationship between distance and HIV seroprevalence among antenatal attendees (Table 2). After accounting for potential clustering of responses, HIV seroprevalence in districts located between either 182 and 351 km (second sextile) or 492 and 700 km (fifth sextile) from the hotspot was statistically indistinguishable from that recorded closest to the hotspot. In contrast, compared to the area closest to the hotspot, HIV seroprevalence was significantly lower in districts located between 352 and 418 km (third sextile), between 419 and 491 km (fourth sextile), or more than 700 km (sixth sextile) from the hotspot. This non-linear effect persisted even after adjustment for location within the Ramotswa health district, where HIV seroprevalence is uncharacteristically low compared to neighbouring districts.

The logistic regression analysis that ignored potential correlation of responses within and between districts provided consistently higher estimates of the effect of proximity to the hotspot as compared to the spatial model (Table 2). The magnitude of the difference observed with the logistic regression model also tended to increase with increasing distance from the hotspot. In addition, the logistic regression approach provided lower estimates of the variance than did the spatial model, which accounted for the correlated nature of the surveillance data. As with the point estimates, the difference between the two approaches in the confidence limits also appeared to increase with increasing distance from the hotspot.

4. Discussion

In the current paper, we present and implement a generalized regression approach for spline estimation of spatially-correlated binary data. Our approach adapts the work of Heagerty and Lele[11] regarding the use of composite likelihoods for binary spatial data to settings in which the means may be estimated using a penalized spline and correlation exists at multiple levels. Using HIV surveillance data collected among pregnant women in Botswana, we found significant heterogeneity in age-specific HIV seroprevalence across geographic regions.

Overall, HIV seroprevalence declines significantly with increasing distance from the hotspot of the country’s epidemic. HIV seroprevalence also appears to rise with age until 35 years and then plateau or slightly fall within the oldest age groups. This pattern appears to be consistent across the country, with less variability when the spatial model is used. The absolute level of seroprevalence is highest at the hotspot and surrounding area, but seroprevalence in districts located between 492 and 700 km (i.e. fifth sextile) from the hotspot was comparable to that observed in districts immediately surrounding the hotspot. Seroprevalence in the other sextiles was considerably lower, but the relationship between distance and prevalence was not linear. As expected, the effect of the spatial model was to smooth, to some degree, the distance and age effects. It also led to a widening of the confidence intervals for the prevalence estimates because it appropriately takes into account correlation. In contrast, the weighted average approach to estimating age-specific HIV seroprevalence by distance showed declines in HIV seroprevalence within the middle quantiles of age in four of the distance groups; the age groups in which the declines were observed were not consistent and these trends were not observed when the spatial model was used.

HIV serosurveillance has been successfully used to map and monitor the spread of HIV at the national, regional, and community level. Moreover, repeated measures of age-specific HIV prevalence within the same community may be useful in evaluating the impact of combinations of HIV prevention strategies on HIV incidence at the population level. Indeed, a number of countries including Botswana are conducting large-scale cluster-randomized trials to evaluate the impact of different combinations of evidence-based behavioral, biological and structural interventions on HIV incidence.[18] If such combination prevention strategies are found to be effective, implementation in non-trial settings will require continued monitoring and evaluation of intervention impact.[19] Thus, it will be essential to make use of existing surveillance methods and infrastructure. Antenatal clinics provide easy access to large samples of healthy, sexually active women in the general population; because blood is already being obtained for other purposes (e.g. syphilis testing) HIV surveillance among antenatal attendees can be both efficient and low cost. The methodological work presented here extends spline estimation methods to accommodate correlation arising from geographic proximity of clinics and closeness in age among women within clinics.

Our study had a number of limitations. First, we formed clusters of surveillance sites based on location within the administrative boundary of each health district. While this classification may be relevant for resource allocation and logistics, it may not coincide with actual sexual networks of pregnant women in Botswana. For example, the Kgalagadi South health district in the southwest corner of the country included surveillance sites between 623 and 952 km from the hotspot. However, because the administrative center of the district, Tsabong, is located 758 km from the hotspot, we allocated the entire district to the sixth sextile (>700 km), potentially resulting in misclassification of the effect of proximity to the hotspot on HIV seroprevalence.

To explore the sensitivity of our findings to cluster formation, we conducted a sensitivity analysis in which we selected clusters of surveillance sites empirically using a weighted k-means clustering algorithm.[23] Both analyses showed that the lowest prevalence was in the clusters located between 400 and 500 km from the hotspot and in the Ramotswa district. However, we do not consider the k-means analysis to be appropriate because it resulted in multiple clusters of surveillance sites within an administrative health district and a single cluster across multiple districts. In Botswana, human and financial resources for all healthcare services, including HIV care, treatment and prevention activities, are allocated and dispersed at the district level rather than to individual communities. As a result, any findings from the weighted k-means clustering analysis would likely be of limited benefit to public health practitioners and policymakers in Botswana. In other settings, however, with different population and settlement distributions such analyses may be more interpretable and useful.

Our focus in this paper is to characterize spatial variability in age-specific HIV seroprevalence among pregnant women. The value of understanding spatial variation in HIV prevalence within this group does not depend on the degree to which they represent all women. Nevertheless, we note that according to the World Health Organization, HIV surveillance in this population is a core surveillance activity for countries with both concentrated and generalized HIV epidemics.[1] In addition, there are reasons to infer that pregnant women do provide some information about sexually active women. In Botswana, the former are not as strongly selected among all sexually active women as they might be in settings where family planning is both widely available and practiced. In an ongoing prospective cohort study of 475 HIV-uninfected and 474 HIV-infected pregnant and recently postpartum adult women in Botswana, 80% of participants were not in a steady relationship at the time of conception and 44% reported that they were not planning to become pregnant (Shahin Lockman, personal communication). Among these women, 36% were not using contraception at the time of conception. Even among the 64% who reported contraceptive use at the time of conception, 81% were relying on condoms. According to the Centers for Disease Control and Prevention, male condoms are one of the least effective forms of contraception, with an average annual failure rate of 18%.[20] Very few women (<5%) reported using a method considered to be most effective (i.e. intrauterine device, hormonal implants and male/female sterilization), which have failure rates of <1%.[20]

We also do not seek to identify the causes of spatial variability in age-specific HIV seroprevalence in our antenatal population. Distance from the hotspot cannot be interpreted as a direct cause of variability in HIV prevalence. Instead, distance likely serves as a proxy for many risk factors that vary geographically. These may include sexual contact network characteristics, cultural effects on HIV-related risk behavior, location of employment, migration patterns, availability of treatment for the HIV-infected persons and prevalence of male circumcision among many others. However, we consider it important to assess the degree to which variability in observed HIV seroprevalence across regions reflects a true underlying pattern as opposed to noise attributable to sampling, and to estimate the magnitude of these differences. This objective can serve as a starting point, from which analyses intended to make causal inference about these differences can benefit.

Supplementary Material

Supp Material S1

Acknowledgments

Contract/grant sponsor: This project was supported by the National Institute of Allergy and Infectious Diseases (R37 AI 51164).

The authors would like to gratefully acknowledge Sikhulile Moyo at the Botswana Harvard School of Public Health AIDS Institute Partnership for his assistance in obtaining the antenatal HIV surveillance data and providing many invaluable insights about the structure and quality of the data. The authors would also like to acknowledge Courage Matiza at Amherst College for his assistance in obtaining GPS coordinates for each of the antenatal clinics in the surveillance system.

References

  • 1. World Health Organization, Joint United Nations Programme on HIV/AIDS. Guidelines for second generation HIV surveillance. 2009.
  • 2. The Partnership for Maternal, Newborn and Child Health. Chapter 2: Antenatal care. Opportunities for Africa's newborns: Practical data, policy and programmatic support for newborn care in Africa. 2009; pp. 51–62.
  • 3. Montana LS, Mishra V, Hong R. Comparison of HIV prevalence estimates from antenatal care surveillance and population-based surveys in sub-Saharan Africa. Sexually Transmitted Infections. 2008;84(Suppl 1):i78–i84. doi: 10.1136/sti.2008.030106.
  • 4. Saphonn V, Hor LB, Ly SP, Chhuon S, Saidel T, Detels R. How well do antenatal clinic (ANC) attendees represent the general population? A comparison of HIV prevalence from ANC sentinel surveillance sites with a population-based survey of women aged 15–49 in Cambodia. International Journal of Epidemiology. 2002;31(2):449–455.
  • 5. Gouws E. Trends in HIV prevalence and sexual behaviour among young people aged 15–24 years in countries most affected by HIV. Sexually Transmitted Infections. 2014;86(Suppl 2):ii72–ii83. doi: 10.1136/sti.2010.044933.
  • 6. Botswana Ministry of Health. 2011 Botswana second generation HIV/AIDS antenatal sentinel surveillance technical report. 2012.
  • 7. Central Statistics Office, NACA. Botswana AIDS Impact Survey III. Draft statistical report. Gaborone, Botswana: CSO; 2009.
  • 8. Vermund SH, Fidler SJ, Ayles H, Beyers N, Hayes RJ, HPTN 071 Study Team, et al. Can combination prevention strategies reduce HIV transmission in generalized epidemic settings in Africa? The HPTN 071 (PopART) study plan in South Africa and Zambia. Journal of Acquired Immune Deficiency Syndromes (1999). 2013;63(02):S221. doi: 10.1097/QAI.0b013e318299c3f4.
  • 9. Kish L, Frankel MR. Inference from complex samples. Journal of the Royal Statistical Society. Series B (Methodological). 1974;36(1):1–37.
  • 10. Hansen MH, Madow WG, Tepping BJ. An evaluation of model-dependent and probability-sampling inferences in sample surveys. Journal of the American Statistical Association. 1983;78(384):776–793.
  • 11. Heagerty PJ, Lele SR. A composite likelihood approach to binary spatial data. Journal of the American Statistical Association. 1998;93(443):1099–1111.
  • 12. Lindsay BG. Composite likelihood methods. Contemporary Mathematics. 1988;80(1):221–39.
  • 13. Joint United Nations Programme on HIV/AIDS (UNAIDS). Global report: UNAIDS report on the global AIDS epidemic 2010. Geneva: UNAIDS; 2010.
  • 14. Central Statistics Office, United Nations Children's Fund. 2007 Botswana Family Health Survey IV report. Gaborone, Botswana: Central Statistics Office; 2009.
  • 15. Statistics Botswana, NACA. Preliminary results: Botswana AIDS Impact Survey IV (BAIS IV), 2013. Gaborone, Botswana: Statistics Botswana; 2013.
  • 16. Kelley D. oce: Analysis of oceanographic data. R package version 0.9–12. Available at: http://CRAN.R-project.org/package=oce.
  • 17. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2013.
  • 18. Kurth AE, Celum C, Baeten JM, Vermund SH, Wasserheit JN. Combination HIV prevention: significance, challenges, and opportunities. Current HIV/AIDS Reports. 2011;8(1):62–72. doi: 10.1007/s11904-010-0063-3.
  • 19. DeGruttola V, Smith DM, Little SJ, Miller V. Developing and evaluating comprehensive HIV infection control strategies: issues and challenges. Clinical Infectious Diseases. 2010;50(Supplement 3):S102–S107. doi: 10.1086/651480.
  • 20. Trussell J. Contraceptive failure in the United States. Contraception. 2011;83:397–404. doi: 10.1016/j.contraception.2011.01.021.
