Journal of Urban Health: Bulletin of the New York Academy of Medicine. 2015 Sep 21;92(6):1052–1064. doi: 10.1007/s11524-015-9981-0

If You Are Not Counted, You Don’t Count: Estimating the Number of African-American Men Who Have Sex with Men in San Francisco Using a Novel Bayesian Approach

Paul Wesson 1, Mark S Handcock 2, Willi McFarland 3, H Fisher Raymond 3
PMCID: PMC4675739  PMID: 26392276

Abstract

African-American men who have sex with men (AA MSM) have been disproportionately infected with and affected by HIV and other STIs in San Francisco and the USA. The true scope and scale of the HIV epidemic in this population have not been quantified, in part because the size of this population remains unknown. We used the successive sampling population size estimation (SS-PSE) method, a new Bayesian approach to population size estimation that incorporates network size data routinely collected in respondent-driven sampling (RDS) studies, to estimate the number of AA MSM in San Francisco. This method was applied to data from a 2009 RDS study of AA MSM. An estimate from a separate study of local AA MSM was used to model the prior distribution of the population size. Two hundred and fifty-six AA MSM were included in the RDS survey. The estimated population size was 4917 (95 % CI 1267–28,771). A flat prior yielded an estimate of 1882 (95 % CI 919–2463), taken as an acceptable lower bound, and a large prior yielded 6762 (95 % CI 1994–13,863), taken as an acceptable upper bound. Point estimates from the SS-PSE were consistent with estimates from multiplier methods using external data. The SS-PSE method is easily integrated into RDS studies and therefore provides a simple and appealing tool to rapidly produce estimates of the size of key populations otherwise difficult to reach and enumerate.

Keywords: Population size estimation, African-American, Men who have sex with men, HIV/AIDS, Respondent-driven sampling

Introduction

Despite advances in treatment regimens and prevention strategies, HIV/AIDS remains a leading cause of morbidity and mortality worldwide.1 Globally, key populations, such as men who have sex with men (MSM), female sex workers (FSW), and injection drug users (IDU), remain at increased risk for HIV infection. Due to biological, behavioral, and structural vulnerabilities, the prevalence of HIV infection is typically higher in these groups than that in the general population. Targeting key populations for public health outreach is one of the six strategies on the global agenda to achieve maximum effectiveness in the public health response to HIV.2,3 Investing in programs that focus on key populations is a component of the strategic global response to the HIV epidemic and requires reliable estimates of the sizes of these populations so that resources may be allocated efficiently and public health actions prioritized. Furthermore, enumeration of key populations allows epidemiologists to quantify the burden of disease and model the impact of targeted interventions. Finally, enumeration of these key populations contributes to the evaluation of programs with respect to reach, coverage, and intensity.

Even among key populations, disparities in disease burden exist. Studies have reported on the disparities in HIV and other STIs among African-American (AA) MSM.4,5 AA MSM are a key population for HIV infection in the San Francisco Bay Area. In 2010, AA men were reported to have the highest incidence of HIV infection for any racial group,6 although recent data suggest a possible convergence between groups.7 Among AA men living with AIDS in 2010, the majority (52 %) were MSM. A previous study by Scott et al. also reported an increased burden of HIV and other STIs among AA MSM in the San Francisco Bay Area. 8

Despite the observed relative disparities in disease burden among AA MSM, the true scope and scale of the epidemic in this population have not been quantified because the size of this population remains unknown. Although several methods are available, quantifying the size of many key populations remains a challenge in public health. Current population size estimation (PSE) methods widely used in public health require data that is very difficult to obtain or require assumptions that are difficult to meet or verify. For example, capture-recapture, which requires multiple data sources that each list members of a target population, traditionally assumes that these data sources are independent of each other and that each member of the target population has an equal probability of appearing on each list included in the analysis,9,10 assumptions that seem unlikely to be true. These assumptions may be relaxed, using log-linear models to specify the relationships between the data sources and allow the probabilities of appearance to vary. However, these modeling assumptions are subject to misspecification, resulting in biased estimates.11 Similar to capture-recapture, the service multiplier method requires two sources of data; one source is a direct count of the target population participating in a service, while the other source is a representative sample of the target population. The multiplier method assumes that the two data sources are independent and that one of the data sources is a representative sample of the population, an assumption that is difficult to verify for hidden populations.12 Other PSE methods, such as network scale-up, require large population-based surveys and the addition of many questions that may not always be feasible.13 PSE methods usually require planning in advance of study implementation in order to be carried out successfully.

A new PSE method has great appeal because it can be implemented using data routinely collected within respondent-driven sampling (RDS) surveys. We applied this method, referred to as the successive sampling population size estimation (SS-PSE) method, to previously collected data from an RDS survey of AA MSM in San Francisco14 to estimate the size of the city’s AA MSM population. This method, which uses the network size question asked in RDS studies, has been tested in simulation studies15 but has less often been evaluated empirically in key populations. Our aims, therefore, were to apply the SS-PSE method using data from the RDS study to estimate the number of AA MSM living in San Francisco and to compare the results to other estimates.

Methods

The Black men testing (BMT) data originate from a cross-sectional integrated bio-behavioral surveillance (IBBS) survey of AA MSM in San Francisco, California. The original study was implemented in 2009 by the San Francisco Department of Public Health’s (SFDPH) HIV Prevention section for the purpose of using social networks as a channel to reach AA MSM for HIV testing.

Study participants were recruited through RDS, a peer-recruitment method commonly used worldwide to sample hard-to-reach populations.16,17 Sampling begins with members of the target population, referred to as “seeds”, purposefully selected by the research team. Each seed is given a pre-determined number of coupons to give to other target population members who are in their social network. The coupons themselves have no external monetary value; they are simply tokens that allow an individual to enroll in the study. The coupon allows potential participants to enter the study and tracks the waves and patterns of recruitment through a unique code that links the recruiter and recruit. Each study participant thereafter is given coupons to distribute within their social network, and this process of recruitment iterates until both sample size and sample stability (where the composition of the sample changes little with subsequent recruitment) are reached. In theory, with enough waves of recruitment, the final RDS study sample will be independent of the characteristics of the initial RDS seeds, and enough information is collected to adjust statistically for differential probability of being selected. A statistical assessment of RDS is given by Gile and Handcock. 18
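The coupon-tracking step described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's software: the function name `recruitment_waves`, the participant IDs, and the link structure are all hypothetical. It shows how the recruiter-recruit code on each coupon is enough to recover the wave in which each participant entered the sample.

```python
def recruitment_waves(recruiter_of):
    """Map each participant to a recruitment wave (seeds are wave 0).

    recruiter_of: dict of participant ID -> recruiter ID, with None for
    seeds. All IDs used here are hypothetical.
    """
    waves = {}

    def wave(pid):
        # A recruit's wave is one more than the wave of his recruiter.
        if pid not in waves:
            parent = recruiter_of[pid]
            waves[pid] = 0 if parent is None else wave(parent) + 1
        return waves[pid]

    for pid in recruiter_of:
        wave(pid)
    return waves


# Hypothetical coupon chain: seed S1 recruits A and B; A recruits C.
links = {"S1": None, "A": "S1", "B": "S1", "C": "A"}
waves = recruitment_waves(links)
print(waves)
```

A participant with no recorded link to any recruiter cannot be placed in a wave, which is one reason unlinked records are usually set aside before analysis.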

RDS seeds for the BMT study were selected to represent the diversity of AA MSM in San Francisco, according to age, neighborhood of residence, and education level. Each respondent was given three coupons to use to recruit other AA MSM at least 18 years of age and in his social network. The final sample size included 256 AA MSM. Details of the BMT study and main findings have been described elsewhere. 14

The SS-PSE method models the total number of persons in the target population using RDS data. The SS-PSE method is an adaptation of a similar model described and implemented by Nair and Wang 19 and by West 20 to estimate the size of untapped oil pools in an oil reserve based on the observed measures of size for already discovered oil pools in the reserve (volume, surface area, net pay, and depth). The model assumes a size-biased sampling in which larger oil pools are more likely to be discovered before smaller oil pools. A prior estimate of the total number of oil pools is included in the model, along with information on the measured parameters of the already discovered pools, to model the characteristics of the remaining oil pools (e.g., volume) in the reserve, should they exist. These parameters are modeled as a posterior distribution, expressing the probability of characteristics, such as volume of oil remaining in the reserve yet to be discovered.

In the human population application using RDS data, the SS-PSE method uses self-reported individual network size (i.e., the number of other members of the target population an individual respondent knows) as the informative measure of the target population. Just as the physical characteristics of the oil pools determine the probability that an individual oil pool will be observed, the SS-PSE assumes that the size of an individual’s social network with respect to the target population influences the probability that the individual will be observed during the RDS discovery process. The SS-PSE method assumes that respondents with larger network sizes, those more socially connected, are more likely to be “discovered” early in RDS recruitment than respondents with smaller network sizes. Formally, the model assumes that an individual’s probability of selection is proportional to his network size. Over the period of recruitment, with successive sampling without replacement, the probability of being sampled at each step is proportional to the individual’s network size relative to the network sizes of the members of the population who remain unsampled. The model further assumes that the target population is uniform; when respondents report their network size, this number is in reference to the target population as a whole and is not restricted to specific subgroups within the target population. As an extension of this second assumption, the model implicitly assumes that respondents interpret the network size question in the same way.
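The size-biased, without-replacement sampling assumption can be illustrated with a small Python sketch. The population, its network sizes, and the function names are hypothetical, and the code only mimics the sampling assumption; it is not the SS-PSE estimator itself.

```python
import random


def selection_probabilities(remaining_sizes):
    """Probability that each unsampled member is drawn next,
    proportional to his network size among those still unsampled."""
    total = sum(remaining_sizes.values())
    return {pid: size / total for pid, size in remaining_sizes.items()}


def successive_sample(sizes, n, rng):
    """Draw n members one at a time without replacement, each draw
    weighted by network size among the members not yet sampled."""
    remaining = dict(sizes)
    order = []
    for _ in range(n):
        ids = list(remaining)
        weights = [remaining[i] for i in ids]
        chosen = rng.choices(ids, weights=weights, k=1)[0]
        order.append(chosen)
        del remaining[chosen]  # without replacement
    return order


# Hypothetical population of four men with reported network sizes.
sizes = {"a": 10, "b": 5, "c": 4, "d": 1}
first_draw = selection_probabilities(sizes)
order = successive_sample(sizes, 3, random.Random(0))
print(first_draw["a"], order)
```

Under this assumption the best-connected members tend to appear early, so the way reported network sizes change over the course of recruitment carries information about how many members remain unsampled.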

The SS-PSE method uses a Bayesian approach to estimate the probable size of the target population. A prior estimate of the population size is used to represent previous knowledge about the target population and, if necessary, provide bounds on the population size estimate. The prior estimate, expressed as a measure of central tendency, is combined with the specified shape of the distribution to calculate the prior distribution of the population size. If very little is known about the prior size and distribution, a uniform distribution may be specified. For our informative prior, we used 4450, based on a previous estimate by Scott et al.8

The SS-PSE method uses the prior estimate in combination with the specified distribution and the data (the self-reported network size) to calculate the posterior population size estimate. Markov chain Monte Carlo (MCMC) simulations are used to compute the posterior distribution. MCMC simulations use a directed random-walk algorithm to sample possible values of the parameter of interest.21 While this process of sampling from the parameter space is random, some values will have a higher probability of being drawn than others, because the Markov chain is sampling from the more likely regions of the parameter space. The differential probability of sampling from the parameter space is determined by the information in the data (in this case, the network size) and the prior estimate for the population size. The entire distribution of the parameter of interest is then constructed from this (directed) random sampling. The consistency of the estimated posterior distribution across repeated runs can be improved by increasing the MCMC settings, such as the number of samples taken from the parameter space. The burn-in period may also be increased; the burn-in period refers to the number of samples initially taken to begin the Markov chain, which do not contribute to the estimation of the posterior distribution. Any measure of central tendency can then be calculated to summarize the probability distribution of the population size. Full details of the SS-PSE method are described elsewhere.15
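To make the roles of burn-in, interval, and sample size concrete, the following Python sketch runs a generic random-walk Metropolis sampler over an integer population size. It is not the algorithm implemented in the sspse package: the target density `log_target` is a made-up stand-in (its shape, centre, and spread are assumptions), and only the bookkeeping of the three MCMC settings is the point.

```python
import math
import random


def log_target(n, n_min=252, centre=4450.0, spread=0.8):
    """Hypothetical unnormalised log-density for population size n.
    Values below the sample size (n_min) get zero probability."""
    if n < n_min:
        return -math.inf
    z = (math.log(n) - math.log(centre)) / spread
    return -0.5 * z * z - math.log(n)


def metropolis(log_density, start, burn_in, samples, interval, step, rng):
    """Random-walk Metropolis: discard burn_in draws, then keep every
    interval-th draw until `samples` draws have been retained."""
    current, current_lp = start, log_density(start)
    kept = []
    for i in range(burn_in + samples * interval):
        proposal = current + rng.randint(-step, step)  # symmetric move
        proposal_lp = log_density(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        if i >= burn_in and (i - burn_in + 1) % interval == 0:
            kept.append(current)
    return kept


draws = metropolis(log_target, start=4450, burn_in=10_000, samples=1000,
                   interval=100, step=500, rng=random.Random(1))
posterior_median = sorted(draws)[len(draws) // 2]
print(len(draws), posterior_median)
```

Larger burn-in and interval values lengthen the chain (here 110,000 iterations for 1000 retained draws), which is the trade-off between stability of the summary and computation time.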

Three network size questions were included in the IBBS survey. We chose the most specific network size question (“Of the [African-American men who have sex with men, who live in San Francisco and are 18 years or older and you have seen in the past 30 days] how many do you think you could give a coupon to (like the one you brought in today) within the next four weeks?”), as this was the most closely tied to an individual’s probability of selection for the RDS study. Other questions were phrased more generally about the number of other AA MSM the respondent knows.

The SS-PSE also allows for the option of truncation. Truncation imposes bounds on the posterior probability distribution so that no probability is assigned to values outside defined bounds. The tail of the lower end of the probability distribution is always truncated at the sample size of the RDS sample because the estimated size of the target population cannot be less than the number of people sampled and included in the RDS data set. The user can specify upper truncation, although the default setting is no upper truncation for the posterior probability distribution. If the user has prior knowledge whereby it would be impossible for the population size to be above a certain value, the upper tail of the probability distribution may be truncated to avoid extending past a certain value (and therefore no probability is assigned to any value beyond this upper limit). The area under the curve of the region of the tail that would have extended past the upper truncation is then redistributed within the allowed bounds of the posterior probability distribution.
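Truncation can be sketched on a discrete distribution as follows. The probability values are invented for illustration, and the sketch assumes that the excluded mass is redistributed proportionally across the allowed values; the text above does not specify the redistribution rule used by the software.

```python
def truncate_and_renormalise(pmf, lower, upper=None):
    """Zero out probability outside [lower, upper] and rescale the rest
    so that it sums to one (proportional redistribution is assumed)."""
    kept = {n: p for n, p in pmf.items()
            if n >= lower and (upper is None or n <= upper)}
    total = sum(kept.values())
    return {n: p / total for n, p in kept.items()}


# Hypothetical distribution over four candidate population sizes.
pmf = {100: 0.1, 300: 0.4, 5000: 0.3, 20000: 0.2}

# Lower bound at the analysed sample size (252); upper bound at 15,000.
truncated = truncate_and_renormalise(pmf, lower=252, upper=15000)
print(truncated)
```

With `upper=None` only the lower bound applies, which corresponds to the default of no upper truncation.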

Analysis

Analyses were performed using STATA version 12,22 R (version 3.1.1),23 and RDS-Analyst (RDS-A) (version 1.7-16).24 RDS-A allows for the selection of different RDS estimators to conduct population inference from the RDS sample. RDS-A includes the RDS estimator available in the RDS Analysis Tool, as well as Gile’s SS (successive sampling) estimator, which accounts for finite population bias by using the reported individual network size and estimated population size to weight the sample and does not assume sampling with replacement.25 Recruitment trees were produced using RDS-A.

Results

The BMT data included 256 eligible AA MSM, recruited by ten seeds. Recruitment took place from February to September 2009. The reported network size ranged from 1 to 99. One network size reported as “0” was recoded as “1” because we assumed that the respondent knew at least one other member of the target population (the person who recruited him or the person he recruited). By the same logic, six respondents with a reported network size of “999” (“not applicable”) were also recoded to have a network size of “1”. For this analysis, four non-seed participants were removed because, for unknown reasons, they were not linked to any other participant in the data set. The final sample for this analysis included 252 respondents. Figure 1 shows the recruitment tree, with each node scaled to reflect reported network size. A slight decrease in network size over successive waves of recruitment is evident. Table 1 describes the demographic characteristics and key HIV-related variables in the study population, using two RDS estimators to make population inference. The small differences between the RDS-II adjusted estimates and the Gile’s successive sampling (SS) adjusted estimates indicate that there is little finite population bias.
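The recoding rule for network size can be written as a short Python function. The function name and the example values are hypothetical; the rule itself (a reported “0” or the “not applicable” code “999” becomes “1”) follows the description above.

```python
def clean_network_size(reported, missing_code=999, floor=1):
    """Recode a reported network size: the 'not applicable' code and any
    value below the floor are set to the floor (one known member)."""
    if reported == missing_code or reported < floor:
        return floor
    return reported


# Hypothetical reported values, including a zero and a 999 code.
reported = [0, 3, 999, 25, 99]
cleaned = [clean_network_size(x) for x in reported]
print(cleaned)  # [1, 3, 1, 25, 99]
```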

FIG. 1.

Recruitment tree (nodes scaled to network size) for African-American men who have sex with men (MSM) participating in the Black men testing (BMT) survey, San Francisco, 2009.

TABLE 1.

Demographic characteristics, injection drug use (IDU), and HIV status of African-American men who have sex with men (MSM) participating in the Black men testing (BMT) respondent-driven sampling (RDS) survey, San Francisco, 2009 (N = 252)

Characteristic | Crude count (%) | RDS-II weighted percent (95 % CI) | Gile’s SS weighted percent (95 % CI) | Difference (Gile’s SS − RDS-II)
Age group (years)
 18–20 | 2 (0.8) | 3.1 (0.9, 5.3) | 3.2 (−1.1, 7.4) | 0.1
 21–25 | 12 (4.8) | 12.0 (7.6, 16.3) | 12.5 (1.7, 23.8) | 0.5
 26–30 | 17 (6.8) | 4.8 (−5.0, 14.5) | 5.0 (1.5, 8.5) | 0.2
 31–35 | 17 (6.8) | 4.6 (1.8, 7.3) | 4.8 (1.6, 8.1) | 0.2
 36–40 | 25 (9.9) | 11.8 (0.3, 23.2) | 12.3 (2.7, 22.0) | 0.5
 41–45 | 58 (23.0) | 17.1 (11.8, 22.4) | 18.0 (11.2, 24.7) | 0.9
 46–50 | 58 (23.0) | 19.9 (11.1, 28.6) | 16.1 (8.5, 23.7) | −3.8
 51+ | 63 (25.0) | 26.9 (21.6, 32.1) | 28.2 (17.2, 39.2) | 1.3
Education
 <High school | 37 (14.7) | 18.9 (8.6, 29.2) | 14.8 (7.8, 21.9) | −4.1
 High school | 98 (38.9) | 47.9 (35.5, 60.4) | 50.2 (38.2, 62.2) | 2.3
 >High school | 117 (46.4) | 33.2 (22.6, 43.8) | 35.0 (24.4, 45.6) | 1.8
Annual income
 0–10k | 106 (42.1) | 55.1 (42.2, 68.1) | 52.6 (41.9, 63.2) | −2.5
 11–20k | 66 (26.2) | 20.7 (11.2, 30.3) | 22.0 (14.5, 29.4) | 1.3
 21–30k | 36 (14.3) | 9.1 (−1.5, 19.7) | 9.7 (4.2, 15.1) | 0.6
 31k+ | 44 (17.5) | 15.0 (6.8, 23.2) | 15.8 (8.5, 23.2) | 0.8
Ever injected drugs (yes) | 92 (36.5) | 32.8 (20.9, 44.8) | 29.5 (21.2, 37.9) | −3.3
Injected drugs in last 6 months (yes) | 40 (15.9) | 11.2 (5.2, 17.2) | 11.8 (6.0, 17.7) | 0.6
Ever tested for HIV (yes) | 235 (93.3) | 93.3 (86.6, 99.9) | 92.9 (88.3, 97.5) | −0.4
Diagnosed with HIV prior to survey (of 229 respondents) | 68 (29.7) | 21.7 (12.7, 30.7) | 27.0 (15.8, 38.2) | 5.3
Positive HIV test result during survey (of 245 tested) | 79 (32.2) | 25.9 (14.2, 37.6) | 34.0 (22.2, 45.9) | 8.1

Population size estimates using the SS-PSE method are shown in Fig. 2. Combining the prior distribution, based on a prior median estimate of 4450, and the network size distribution from the BMT data set, the model calculated a posterior median estimate of 5708 (95 % CI 1381–25,799; model 1). Increasing the burn-in period, interval, and sample size for the Markov chain Monte Carlo simulation settings reduced this median estimate to 4917 (95 % CI 1267–28,771; model 2), which was more consistently estimated over repeated simulations. The American Community Survey indicates that for 2010, the year following the BMT survey, 20,824 AA men 18 years or older were living in San Francisco.26 Truncating the upper bound of the prior distribution for the population size to a conservative 15,000 (i.e., assuming that MSM are far less than 72 % of adult men) resulted in a median estimate of 4518 (95 % CI 1330–13,051; model 3). Using a flat prior distribution, specifying no prior knowledge of the population size, the SS-PSE estimated the median posterior estimate of the AA MSM population living in San Francisco to be 1875 (95 % CI 910–2461; model 4). Increasing the specified prior median from 4450 to 10,000 and again truncating the prior distribution at 15,000 resulted in a posterior median estimate of 6762 (95 % CI 1994–13,863; model 5).

FIG. 2.

Bar plots comparing posterior population size estimates of the number of African-American men who have sex with men (MSM) by different prior inputs using the successive sampling population size estimation (SS-PSE) method, San Francisco, 2009.

For comparison, we examined other AA MSM population size estimates using different methods (Table 2). Previous size estimation exercises performed by the SFDPH estimated 66,487 total MSM living in San Francisco as of December 2010.27 In 2008, the National HIV Behavioral Surveillance (NHBS) survey done by time-location sampling estimated 6.5 % of all MSM in San Francisco to be AA.7 Applying the NHBS proportion to the estimated count from SFDPH yields an estimated 4320 AA MSM. Two multiplier method adaptations were also possible. In 2009, 1170 AA MSM diagnosed with HIV infection were reported to the SFDPH surveillance system by the time of the BMT survey. Meanwhile, 17.3 % of the respondents in the BMT survey were HIV positive and aware of their status. Taking the BMT prevalence as the prevalence of diagnosed HIV cases, those that would be seen in the HIV surveillance system, we applied this proportion as the multiplier to the benchmark estimate from the HIV surveillance system to yield an estimate of 6763 (95 % CI 4415–11,142) AA MSM in San Francisco.

TABLE 2.

“External validation”: other methods to estimate the number of African-American men who have sex with men (MSM) in San Francisco, 2009

Method | Source 1 (benchmark) | Source 2 (multiplier) | Population size estimate
SS-PSE | NA | NA | 4917
Simple proportion | Estimated total San Francisco MSM population size25: 66,687 | NHBS (2008) estimated proportion of MSM who are AA7: 6.5 % | 4320
Multiplier | AA MSM living with HIV in surveillance system: 1170 | BMT proportion of African-Americans diagnosed HIV12: 17.3 % | 6763
Multiplier | Estimated number of AA MSM living with HIV (surveillance data accounting for unrecognized infection from BMT): 1581 | NHBS (2008) prevalence of HIV among AA MSM7: 25 % | 6325

Alternatively, the San Francisco HIV case reporting system indicated there were 1186 HIV-positive AA MSM at the time of the 2008 NHBS survey of MSM. Because 25 % of HIV-positive respondents in the BMT survey had not previously known they were HIV positive, we adjusted the number from the surveillance system to 1581 total HIV cases among AA MSM. NHBS estimates 25 % of AA MSM to be HIV positive. Assuming 1581 to be 25 % of the total number of AA MSM, we projected the population size to be 6325.
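The two multiplier calculations reported here reduce to dividing a benchmark count by a proportion. The Python sketch below reproduces the arithmetic from the figures in the text; the function name is hypothetical, and the adjustment for unrecognized infection is assumed to be a division by the 75 % of cases who were aware of their status.

```python
def multiplier_estimate(benchmark_count, proportion):
    """Population size implied when `benchmark_count` people make up
    `proportion` of the target population."""
    return benchmark_count / proportion


# First multiplier: 1170 diagnosed AA MSM in surveillance, 17.3 % of
# BMT respondents HIV positive and aware of their status.
first = multiplier_estimate(1170, 0.173)

# Second multiplier: 1186 reported cases, scaled up because 25 % of
# HIV-positive BMT respondents were unaware (assumed adjustment), then
# divided by the 25 % NHBS prevalence.
adjusted_cases = 1186 / (1 - 0.25)
second = multiplier_estimate(adjusted_cases, 0.25)

print(round(first), round(adjusted_cases), round(second))  # 6763 1581 6325
```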

Discussion

We estimated the size of the AA MSM population in San Francisco to be nearly 5000 (4917; 95 % CI 1267–28,771). Taking into account the size of the total AA adult male population in San Francisco and truncating the prior distribution to a plausible maximum upper value refined this size estimate to 4518 (95 % CI 1330–13,051). This estimate is highly consistent with our prior estimate of 4450, based on Scott et al.’s projection, which used data from the 2004 National HIV Behavioral Surveillance MSM 1 study to estimate the number of AA MSM in San Francisco. 8

We note several factors that affect the precision and consistency of this estimate or are sensitive parameters using the SS-PSE method. The associated 95 % probability intervals for these estimates are quite wide. The median posterior estimate when using a flat prior distribution was nearly 2000 (1875; 95 % CI 910–2461), which appears implausible as it is close to the number of AA MSM known to be living with HIV. Using 10,000 as a prior median (roughly twice the size of our informative prior) resulted in a median posterior estimate of 6762 (95 % CI 1994–13,863). This estimate is nearly identical to the estimated population size using the multiplier method of the HIV case reporting system and HIV prevalence in the BMT survey (6763). Using a uniform-flat prior and a large (relatively uninformative) prior could provide an acceptable lower and upper bound to the estimated population size, respectively. This approach is especially helpful in settings that lack external data sources with counts of the target population.

Results from the BMT RDS survey suggest a 17.3 % prevalence of HIV infection. Using the estimated population size from the SS-PSE method as the denominator, we estimate the prevalence of recognized HIV infection among AA MSM in 2009 to be 24 %. This figure is consistent with the Gile’s SS estimate (27 %; 95 % CI 16–38 %) (Table 1). Extending the Gile’s SS estimator to diagnosed HIV infection within the BMT survey, again, using our estimated population size of 4917, we estimate the prevalence of HIV infection among AA MSM to be 34 % (95 % CI 22–46 %). This places AA MSM as an extremely vulnerable population for HIV infection, following MSM IDU (47.4 % prevalence), transgender IDU (44.4 % prevalence), and transgender women (35.5 % prevalence). 6
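The 24 % figure can be reproduced with a one-line calculation, sketched here in Python. The numerator is an assumption: the text does not state it at this point, and the sketch takes it to be the 1170 diagnosed cases reported to surveillance in 2009.

```python
def prevalence_percent(cases, population_size):
    """Cases as a percentage of the estimated population size."""
    return 100.0 * cases / population_size


# Assumed numerator: 1170 diagnosed cases; denominator: SS-PSE estimate.
recognized = prevalence_percent(1170, 4917)
print(round(recognized))  # 24
```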

As with all size estimation approaches, the SS-PSE method depends on meeting underlying assumptions. These assumptions are challenging to verify. First, the model assumes that the probability of selection at any point is proportional to an individual’s network size. That is, during recruitment, the probability of being sampled at a given point in time is proportional to their network size relative to the still unsampled members of the population at that point. Visualizing this “size bias” phenomenon through a recruitment tree with nodes scaled to reported network size does not show a clear decreasing trend in reported network size with subsequent waves of recruitment (Fig. 1). The attempted crude visualization with these plots may not be sufficient to check the first assumption for the SS-PSE. The subtle signal may be observed only with a more sophisticated model that plots the likelihood of observing each participant at the moment he is observed, given the distribution of the remaining network sizes in the target population. A second assumption for the SS-PSE method is that the target population is uniform, such that the respondent’s reported network size is specific to the target population as a whole and not to a specific subgroup. Unfortunately, there does not seem to be an empirical test for this assumption. A close approximation may be to explore homophily (i.e., similarity in characteristics between recruiter and recruit) in the data set, but this would be limited to participants’ recruiting behavior and not the composition of their social network with respect to the target population. While these assumptions seem reasonable and more likely to be met than the assumptions for other population size estimation methods (e.g., source independence in capture-recapture analysis), it is unclear if the BMT data set meets these assumptions and, if not, what would be the resulting direction of the bias.

Our model implicitly assumes that an individual’s reported network size is an appropriate proxy for his probability of being recruited. In RDS studies, out-degree (the number of target population members a participant knows and can recruit) is used as a proxy for in-degree (the number of target population members who could have recruited the participant) to estimate the participant’s probability of selection into the study. RDS estimators use the network size measurement to weight participant observations in order to make inference to the characteristics of the larger target population from the study population. Investigators have previously noted difficulties in accurately measuring network size and have included different ways of asking the question in the same or separate surveys. Even if measured accurately, reported network size may not accurately reflect a participant’s probability of selection due to other covariates that may influence recruitment behavior.28 For example, an individual may report a network size of five, but only three of the people he had in mind would also consider him to be a part of their social network and would recruit him (reciprocity). In this case, the individual has overestimated his true network size and therefore his probability of selection. If individuals believe they have a large network size, they may choose to round rather than indicate the exact size of their network. In the BMT survey, we observed “digit preference” behavior, whereby higher network sizes were reported in multiples of five (e.g., 30, 35, 40, 45, 50, etc.). For other network size questions, the range extended past 100. While it is possible for someone to know “769” other AA MSM in San Francisco (the maximum reported network size for one of the network size questions asked in BMT), this figure more likely signals a large network than an exact count. We observed that reported network sizes greater than 100 may lead to convergence problems for the SS-PSE method.
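A simple screen for digit preference is to ask what share of the larger reported network sizes are multiples of five. The Python sketch below is illustrative only: the function name, the threshold of 30, and the example values are hypothetical and are not taken from the BMT data.

```python
def heaping_share(sizes, base=5, threshold=30):
    """Share of reported sizes at or above `threshold` that are exact
    multiples of `base`; returns 0.0 if no value reaches the threshold."""
    large = [s for s in sizes if s >= threshold]
    if not large:
        return 0.0
    return sum(1 for s in large if s % base == 0) / len(large)


# Hypothetical reported network sizes.
reported_sizes = [2, 7, 12, 30, 35, 40, 42, 50]
share = heaping_share(reported_sizes)
print(share)  # 0.8
```

Without rounding, roughly one value in five would be expected to fall on a multiple of five, so a share far above 0.2 points to heaping.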

According to RDS theory, with enough waves of recruitment, the characteristics of the final RDS sample will be independent of the characteristics of the seeds, and the sample will be representative of the target population. Unfortunately, without a gold standard for comparison, this assumption cannot be confirmed (especially when studying hidden populations). Previous research on MSM in Fortaleza, Brazil, has challenged the validity of this claim. Although the RDS sampling process succeeded in reaching otherwise inaccessible members of the MSM population in Brazil, the sample overrepresented lower socioeconomic MSM compared to the other sampling approaches.29 It is possible that in our sample of AA MSM, the RDS sampling process only reached the lower socioeconomic portion of the AA MSM population in San Francisco. In that case, our inference only applies to this segment of the target population, and the size of the target population is therefore larger than that which we estimated.

The SS-PSE R package, sspse,30 was under development before becoming publicly available. The results presented in this paper are output from model runs in July 2014. Improvements to the program since then may have an impact on the reproducibility of these exact results, as the improvements have affected the estimation of the posterior distribution. With this variability in mind, we sought to improve the precision of the posterior estimate by increasing the MCMC settings. All results described here used a burn-in period of 10,000, a sample size of 1000, and an interval sampling of 100. Though increasing the MCMC settings improves the precision of posterior estimates, it does so at the cost of noticeably increasing computation time.

The SS-PSE method provides a simple and appealing tool to rapidly produce estimates of the size of high-risk populations, a fundamental public health measure that has been scarce for much of the HIV epidemic. Under the conditions outlined above, the SS-PSE method produced reasonable estimates for the size of the AA MSM population in San Francisco, including lower and upper acceptable bounds. The model has the potential to be a useful addition to the repertoire of population size methods available to epidemiologists and other public health practitioners. The model is especially appealing because of its reasonable assumptions and seamless integration into RDS studies, which are commonly implemented to study hidden populations around the world.31 The method has its limitations. First, the amount of information about population size in the RDS data is modest, so the posterior distribution can have high variance. Second, the SS-PSE model will somewhat misspecify the actual RDS process, mainly because successive sampling only approximates the RDS process. This will lead to some error in the posterior (as compared to the posterior based on the unknown RDS process). In addition, current concerns with regard to replication of results and manipulation of parameter inputs to adjust posterior estimates could make investigators vulnerable to confirmation bias. As a result, the appeal of this method should not obviate the planning for and use of multiple PSE methods to triangulate the most plausible size estimate for the target population. Combining multiple methods, as is often done in practice,32 could balance and reduce the impact of bias on any one particular method. As the SS-PSE method produces a posterior distribution, it can be used as prior input to other methods using Bayesian inference.

Acknowledgments

We thank Ali Mirzazadeh for his valuable suggestions regarding the sensitivity analyses.

Footnotes

• Valid and reliable estimates of the size of hidden populations are challenging to obtain.

• Population size estimates are necessary for health policy formation and program planning.

• Respondent-driven sampling (RDS) is widely used to recruit hidden populations in health surveys.

• Sampling probabilities from RDS data can be used to estimate the size of the targeted population.

• This low-cost and accessible method can improve measures of disease burden in hidden populations.

References

  • 1. Murray CJL, Vos T, Lozano R, et al. Disability-adjusted life years (DALYs) for 291 diseases and injuries in 21 regions, 1990-2010: a systematic analysis for the Global Burden of Disease Study 2010. Lancet. 2012;380(9859):2197–2223. doi: 10.1016/S0140-6736(12)61689-4.
  • 2. Padian NS, McCoy SI, Karim SSA, et al. HIV prevention transformed: the new prevention research agenda. Lancet. 2011;378(9787):269–278. doi: 10.1016/S0140-6736(11)60877-5.
  • 3. Schwartländer B, Stover J, Hallett T, et al. Towards an improved investment approach for an effective response to HIV/AIDS. Lancet. 2011;377(9782):2031–2041. doi: 10.1016/S0140-6736(11)60702-2.
  • 4. Heckman TG, Kelly JA, Bogart LM, Kalichman SC, Rompa DJ. HIV risk differences between African-American and white men who have sex with men. J Natl Med Assoc. 1999;91(2):92–100. Available at: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2608406&tool=pmcentrez&rendertype=abstract. Accessed September 2014.
  • 5. Torian LV, Makki HA, Menzies IB, Murrill CS. Department of Health sexually transmitted disease clinics, a decade of serosurveillance finds that racial disparities and associations between HIV and gonorrhea persist. Sex Transm Dis. 2002;29(2):73–78. doi: 10.1097/00007435-200202000-00002.
  • 6. San Francisco Department of Public Health. HIV/AIDS Epidemiology Annual Report. San Francisco HIV Epidemiology Section. 2010.
  • 7. Sudhinaraset M, Raymond HF, McFarland W. Convergence of HIV prevalence and inter-racial sexual mixing among men who have sex with men, San Francisco, 2004-2011. AIDS Behav. 2013;17(4):1550–1556. doi: 10.1007/s10461-012-0370-3.
  • 8. Scott HM, Bernstein KT, Raymond HF, Kohn R, Klausner JD. Racial/ethnic and sexual behavior disparities in rates of sexually transmitted infections, San Francisco, 1999-2008. BMC Public Health. 2010;10:315. doi: 10.1186/1471-2458-10-315.
  • 9. International Working Group for Disease Monitoring and Forecasting. Capture-recapture and multiple-record systems estimation II: applications in human diseases. Am J Epidemiol. 1995;142(10):1059–1068.
  • 10. International Working Group for Disease Monitoring and Forecasting. Capture-recapture and multiple-record systems estimation. I: History and theoretical development. Am J …. 1995;142(10):1047–1058. Available at: http://hub.hku.hk/handle/10722/82976. Accessed April 28, 2013.
  • 11. Jones HE, Hickman M, Welton NJ, De Angelis D, Harris RJ, Ades AE. Recapture or precapture? Fallibility of standard capture-recapture methods in the presence of referrals between sources. Am J Epidemiol. 2014;179(11):1383–1393. doi: 10.1093/aje/kwu056.
  • 12. Johnston LG, Prybylski D, Raymond HF, Mirzazadeh A, Manopaiboon C, McFarland W. Incorporating the service multiplier method in respondent-driven sampling surveys to estimate the size of hidden and hard-to-reach populations: case studies from around the world. Sex Transm Dis. 2013;40:304–310. doi: 10.1097/OLQ.0b013e31827fd650.
  • 13. Salganik MJ, Fazito D, Bertoni N, Abdo AH, Mello MB, Bastos FI. Assessing network scale-up estimates for groups most at risk of HIV/AIDS: evidence from a multiple-method study of heavy drug users in Curitiba, Brazil. Am J Epidemiol. 2011;174(10):1190–1196. doi: 10.1093/aje/kwr246.
  • 14. Fuqua V, Chen Y-H, Packer T, et al. Using social networks to reach Black MSM for HIV testing and linkage to care. AIDS Behav. 2012;16(2):256–265. doi: 10.1007/s10461-011-9918-x.
  • 15. Handcock M, Gile K, Mar C. Estimating hidden population size using respondent-driven sampling data. arXiv preprint arXiv:1209.6241. 2012. Available at: http://arxiv.org/pdf/1209.6241v1.pdf. Accessed February 3, 2014.
  • 16. Heckathorn D. Respondent-driven sampling: a new approach to the study of hidden populations. Soc Probl. 1997;44:174–199. Available at: http://www.jstor.org/stable/10.2307/3096941. Accessed April 28, 2013.
  • 17. Heckathorn D. Respondent-driven sampling II: deriving valid population estimates from chain-referral samples of hidden populations. Soc Probl. 2002;49(1):11–34. Available at: http://www.jstor.org/stable/10.1525/sp.2002.49.1.11. Accessed April 28, 2013.
  • 18. Gile KJ, Handcock MS. Respondent-driven sampling: an assessment of current methodology. Sociol Methodol. 2010;40(1):285–327. doi: 10.1111/j.1467-9531.2010.01223.x.
  • 19. Nair VN, Wang PC. Maximum likelihood estimation under a successive sampling discovery model. Technometrics. 1989;31(4):423–436. doi: 10.1080/00401706.1989.10488591.
  • 20. West M. Inference in successive sampling discovery models. J Econ. 1996;75(1):217–238. doi: 10.1016/0304-4076(95)01777-1.
  • 21. Hamra G, MacLehose R, Richardson D. Markov chain Monte Carlo: an introduction for epidemiologists. Int J Epidemiol. 2013;42(2):627–634. doi: 10.1093/ije/dyt043.
  • 22. StataCorp. Stata Statistical Software: release 12. College Station, TX: StataCorp LP. 2011.
  • 23. R Core Team. R: a language and environment for statistical computing. 2014. Available at: http://www.r-project.org. Accessed July 2014.
  • 24. Handcock MS, Fellows IE, Gile KJ. RDS Analyst: software for the analysis of respondent-driven sampling data, Version 0.42. 2014. http://hpmrg.org. Accessed July 2014.
  • 25. Gile KJ. Improved inference for respondent-driven sampling data with application to HIV prevalence estimation. J Am Stat Assoc. 2011;106(493):135–146. doi: 10.1198/jasa.2011.ap09475.
  • 26. US Census Bureau. Sex by age universe: total population 2006-2010 American community survey selected population tables. 2014:1–2. Available at: http://factfinder2.census.gov/faces/tableservices/jsf/pages/productview.xhtml?pid=ACS_10_SF4_B01001&prodType=table. Accessed November 2014.
  • 27. Raymond HF, Bereknyei S, Berglas N, Hunter J, Ojeda N, McFarland W. Estimating population size, HIV prevalence and HIV incidence among men who have sex with men: a case example of synthesising multiple empirical data sources and methods in San Francisco. Sex Transm Infect. 2013;89(5):383–387. doi: 10.1136/sextrans-2012-050675.
  • 28. Rudolph AE, Fuller CM, Latkin C. The importance of measuring and accounting for potential biases in respondent-driven samples. AIDS Behav. 2013;17(6):2244–2252. doi: 10.1007/s10461-013-0451-y.
  • 29. Kendall C, Kerr LRFS, Gondim RC, et al. An empirical comparison of respondent-driven sampling, time location sampling, and snowball sampling for behavioral surveillance in men who have sex with men, Fortaleza, Brazil. AIDS Behav. 2008;12(Suppl 1):97–104. doi: 10.1007/s10461-008-9390-4.
  • 30. Handcock M, Gile K. SSPSE: estimating hidden population size using respondent driven sampling data. 2015. Available at: http://hpmrg.org. Accessed July 2014.
  • 31. Malekinejad M, Johnston LG, Kendall C, Kerr LRFS, Rifkin MR, Rutherford GW. Using respondent-driven sampling methodology for HIV biological and behavioral surveillance in international settings: a systematic review. AIDS Behav. 2008;12(4 Suppl):S105–S130. doi: 10.1007/s10461-008-9421-1.
  • 32. UNAIDS/WHO Working Group on Global HIV/AIDS and STI Surveillance. Guidelines on Estimating the Size of Populations Most at Risk to HIV. Geneva, Switzerland; 2011.

Articles from Journal of Urban Health : Bulletin of the New York Academy of Medicine are provided here courtesy of New York Academy of Medicine
