Journal of Urban Health: Bulletin of the New York Academy of Medicine. 2014 Aug 16;92(1):151–167. doi: 10.1007/s11524-014-9897-0

Evaluation of Respondent-Driven Sampling in a Study of Urban Young Men Who Have Sex with Men

Lisa M Kuhns 1,2, Soyang Kwon 1,2, Daniel T Ryan 3, Robert Garofalo 1,2, Gregory Phillips II 3, Brian S Mustanski 3
PMCID: PMC4338125  PMID: 25128301

Abstract

Evidence suggests that respondent-driven sampling (RDS) is an efficient approach to sampling among varied populations of adult men who have sex with men (MSM) both in the USA and abroad, although no studies have yet evaluated its performance among younger MSM, a population with a steep rise in HIV infection in recent years. Young MSM (YMSM) may differ in terms of their connectedness to other YMSM (e.g., due to evolving sexual identity, internalization of sexual minority stigma, and lack of disclosure to others) and mobility (e.g., due to parental monitoring) which may inhibit the sampling process. The aims of this study were to evaluate the efficiency and effectiveness of RDS-based sampling among young urban MSM and to identify factors associated with recruitment success. We hypothesized that demographic, social, behavioral, and network factors, including racial/ethnic minority status, homelessness (i.e., as an indicator of socioeconomic marginalization), HIV-positive status, substance use problems, gay community connectedness, and network size would be positively related to recruitment productivity, while sexual minority stigmatization, environmental barriers (e.g., parental monitoring), and meeting sex partners on the internet (i.e., virtual venue) would be negatively related to recruitment productivity. Between December 2009 and February 2013, we used RDS to recruit a sample of 450 YMSM, ages 16–20. Findings suggest that the use of RDS for sampling among YMSM is challenging and may not be feasible based on the slow pace of recruitment and low recruitment productivity. A large number of seeds (38 % of the sample, n = 172) had to be added to the sample to maintain a reasonable pace of recruitment, which makes use of the sample for RDS-based population estimates questionable. In addition, the prevalence of short recruitment chains and segmentation in patterns of recruitment by race/ethnicity further hamper the network recruitment process. 
Thus, RDS was not particularly efficient in terms of the rate of recruitment or effective in generating a representative sample. Hypotheses regarding factors associated with recruitment success were supported for network size and internalized stigma (but not other factors), suggesting that participants with larger network sizes or high levels of internalized stigma may have more and less success recruiting others, respectively.

Keywords: Respondent-driven sampling, Young men who have sex with men, HIV/AIDS

Introduction

Respondent-driven sampling (RDS) is a non-random, chain referral sampling method designed to minimize the sampling bias in estimates of the prevalence of risk factors or disease, such as HIV infection. It was developed specifically to sample hidden and stigmatized populations for which there is no sampling frame.1 Because of these advantages, it was quickly adopted throughout the USA and abroad among researchers and public health authorities in HIV/AIDS-related research, given that HIV infection impacts populations stigmatized by the sexual and intravenous drug-using behaviors that drive disease transmission. This rapid dissemination and adoption of RDS have led to many subsequent studies of its implementation and functioning, which have both tempered initial enthusiasm and helped to further develop this method.

RDS sampling begins with non-random selection of “seeds” or initial recruits from the target population. Seeds then initiate the chain of referral by recruiting a set number of their peers (i.e., a “quota” of recruits, usually up to 3–4), who then also recruit peers until the desired sample size is reached. Participants are incentivized to recruit with a small amount of compensation for each successful recruit. As a network-based approach, RDS-based samples may overrepresent those with larger social networks or who are more efficient recruiters of their peers. Pieces of information on the recruitment pattern of respondents, specifically, who recruited whom and participants’ personal network size, provide the basis for controlling bias introduced by the tendency of participants to recruit those like themselves, known as “homophily,” and provide the means for controlling bias toward oversampling those with larger peer networks.1,2 In theory, recruitment incentives and quotas help generate a sample which is also independent from the characteristics of the initial seeds by lengthening referral chains and reducing bias due to volunteerism. Thus, the sample is theorized to converge and stabilize or reach “equilibrium” after subsequent “waves” (i.e., one recruitment step along the chain) of recruitment are completed (i.e., usually six or less),1,2 although the equilibrium proportions are not (necessarily) equal to the population proportions due to these biases. Equilibrium is reached when key characteristics in the sample vary by less than 2 % between consecutive waves.3 Based on information collected on recruitment patterns and network size, relative inclusion probabilities in the form of sampling weights are calculated using the statistical theory upon which RDS is based. This occurs after sampling has been completed, a process known as “post-stratification.”2
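The weighting idea described above can be illustrated with a minimal sketch. It uses inverse-network-size weighting in the style of the Volz–Heckathorn estimator, which is one common RDS estimator and not necessarily the one applied in this study. The function name and the four-person data set are hypothetical and chosen only to show how participants with larger networks are down-weighted.

```python
def inverse_degree_proportion(degrees, has_trait):
    """Estimate a trait proportion, weighting each respondent by 1/network size.

    Hypothetical helper for illustration; assumes every reported network
    size is a positive number.
    """
    if len(degrees) != len(has_trait):
        raise ValueError("degrees and has_trait must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("network sizes must be positive")
    weights = [1.0 / d for d in degrees]
    weighted_trait = sum(w for w, t in zip(weights, has_trait) if t)
    return weighted_trait / sum(weights)


# Toy data (not from the study): network sizes and a binary trait.
degrees = [2, 4, 4, 8]
has_trait = [True, False, False, True]

naive = sum(has_trait) / len(has_trait)            # unweighted sample proportion
weighted = inverse_degree_proportion(degrees, has_trait)
print(f"naive={naive:.3f} weighted={weighted:.3f}")
```

In this toy case the weighted estimate differs from the raw sample proportion because the two respondents with the trait report different network sizes than those without it.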

Thus, RDS includes two primary components: a participant recruitment mechanism, which combines recruitment incentives and quotas (i.e., “RDS sampling”), and statistical sample adjustment for network size and patterns of recruitment (“RDS inference”).2,4 Although RDS has been widely acclaimed as an efficient sampling approach because recruitment of the target sample size is typically reached quickly and with relatively few resources, it has faltered in cases in which the underlying network structure does not meet assumptions, for example, that respondents know each other (i.e., RDS assumes a reciprocal relationship between recruiters and recruits),1 are interconnected, and have network sizes sufficiently free of segmentation for the sample to reach equilibrium.5 Furthermore, the calculated estimates based on RDS inferential approaches have also been challenged, with evidence of biased estimates.4,6

The efficiency of RDS sampling has been typically assessed in terms of the rate of recruitment over the enrollment period (i.e., “speed”) and related metrics, such as the productivity of seeds, the ratio of seeds to recruits or “sprouts,” and the average number of successful recruits per participant (i.e., RDS requires that participants recruit at least one participant on average to sustain the chain referral process).1,2 Sampling effectiveness has been evaluated based on whether or not the sample reached equilibrium, the level of homophily in the sampling process (i.e., as evidence of segmentation in networks or lack thereof), and the structure of the recruitment chains, that is, whether or not there are multiple waves of recruitment and long recruitment chains (i.e., as evidence of sustainability of chain referral recruitment).

Evidence suggests that RDS sampling is efficient but with inconsistent effectiveness among varied populations, including adult men who have sex with men (MSM) in the USA,7–11 although no studies have yet evaluated the performance among younger MSM. Young MSM (YMSM) may not be well connected to other YMSM (e.g., due to evolving sexual identity, internalization of sexual minority stigma, and lack of disclosure to others) and may be less mobile (e.g., due to parental monitoring, lack of access to transportation) which may inhibit the RDS sampling process. Prior research suggests that RDS sampling may be less feasible among MSM in communities in which high levels of stigma limit connections within the population12,13 or where problems with transportation/mobility can slow down the recruitment process.14–17 Furthermore, RDS samples may overrepresent individuals of lower socioeconomic status (i.e., who may have more reason to participate due to the recruitment incentives), including individuals living with HIV or those with substance use problems.18–22 In addition, because less community connectedness has been cited as a more recent problem in recruitment of MSM generally,18 YMSM who are more connected to the “gay” community and/or have larger eligible network sizes may be more efficient recruiters, facilitating more efficient chain recruitment. Finally, connections to sexual partners via social venues versus virtual venues may also be related to recruitment efficiency.19

The purpose of this analysis was to evaluate the performance of RDS sampling among young urban MSM (e.g., recruitment speed, sample characteristics, recruitment patterns, and network chain development) and to identify factors associated with individual-level recruitment success to inform future use of RDS with this population. We hypothesized that demographic, social, behavioral, and network factors, including racial/ethnic minority status, homelessness (i.e., as an indicator of socioeconomic marginalization), HIV-positive status, substance use problems, gay community connectedness, and network size, would be positively related to recruitment productivity, while sexual minority stigmatization, environmental barriers (e.g., parental monitoring), and meeting sex partners on the internet (i.e., virtual venue) would be negatively related to recruitment productivity.

Methods

Design and Setting

Data analyzed herein come from an ongoing longitudinal study of HIV-related risk among a sample of 450 YMSM in Chicago, the nation’s third largest city and the epicenter of the HIV/AIDS epidemic in the Midwest. The purpose of the study is to characterize the prevalence, course, and predictors of HIV/STI risk among very young MSM. Volunteers were eligible for study participation if they were age 16 to 20 at the time of enrollment, English-speaking, assigned a male sex at birth, had a prior sexual encounter with a male or identified as gay/bisexual, resided in the target metropolitan area, and were available for multiple follow-ups across the 24-month follow-up period. Eligible volunteers were asked to provide identification to verify age (e.g., school or state-issued identification). Seeds were recruited through community-based convenience sampling with study promotional materials distributed via active and passive means in community locations frequented by YMSM. To increase the probability that the sample would include very young MSM (i.e., a particularly hard population to recruit) and primarily HIV-negative MSM (i.e., to meet study aims for assessing correlates of sexual risk for HIV acquisition over time), we targeted community venues with younger populations to recruit seeds (e.g., gay–straight alliances affiliated with local schools, gay “prom,” etc.). In the last 4 months of open enrollment, in order to maximize sample size, we added recruitment of seeds via geosocial/sexual applications (e.g., Grindr, Jack’d). A total of four study enrollment sites were utilized throughout the course of the study, three on the north side of the city and one on the near southwest side.

Study enrollees were instructed to recruit up to three eligible peers (i.e., sexual minority young men ages 16–20). While study “coupons” were distributed initially, due to lost coupons, among other coupon problems,23 we modified the approach to include use of a participant-specific and unduplicated password system in which participants chose and then distributed passwords via text, email, or other means. Sprouts then “redeemed” the passwords (i.e., until a total of up to three were redeemed) at the point of screening; coupons/passwords were verified, and if valid, study eligibility was assessed and eligible volunteers were scheduled for an enrollment visit. Recruitment patterns, i.e., “who recruited whom” were tracked in a study-specific database.

Data Collection and Measures

Study procedures were approved by institutional review boards at the participating institutions. Following assent/consent procedures, data were collected via computer-assisted self-interviewing (CASI), and participants were tested for HIV. Initial baseline data were collected in two enrollment visits, with total compensation equal to $70 for completion of both visits. Each participant was compensated an additional $15 for each successful recruit (i.e., defined as an eligible volunteer who completed enrollment, up to a total of $45 for three recruits). Because participants became well-known to staff in two initial baseline visits, duplication of enrollment was virtually impossible.

A brief session was conducted with participants at the initial enrollment visit to train them in recruitment of peers using a structured training script, including frequently asked questions (FAQ). For example, FAQs included information about eligibility, how to contact the study if interested, how many individuals could be recruited, how to collect recruitment incentives, and emphasized procedures to protect confidentiality and minimize coercion. Because the enrollment visit was broken up into two visits approximately 7–10 days apart, at the second visit, staff reviewed recruitment efforts to-date via a recruitment worksheet in which participants recalled which strategies were successful (or not), and suggestions and encouragement were offered by staff. After completion of the two initial visits, staff called participants by telephone at 1, 3, and 5 weeks post-enrollment to encourage recruitment and troubleshoot any problems encountered in the recruitment process. Staff continued to encourage recruitment throughout the accrual period at subsequent study visits.

In order to identify factors associated with individual-level recruitment success, we measured several demographic, social, behavioral, and network factors.

Sociodemographic

We measured age (i.e., “What is your date of birth?”), race/ethnicity (i.e., “What race do you primarily consider yourself to be?”), and current residence (i.e., “Which of the following best describes your current living situation?”; responses included living in a house, apartment, or dorm alone or with friends, family, or others; no permanent address, homeless, shelter, group home, or residential treatment facility; dichotomously coded as “homeless” versus all other responses).

HIV Status

HIV infection was determined by OraQuick/Orasure™ testing for those with unknown status and by self-report for those with known status (NB: HIV-positive status was confirmed for 71 % of cases via Orasure™, medical records, or HIV-related medication prescription verification).

Sexual Minority Status/Engagement/Stigma

To measure sexual orientation, we asked respondents, “Recognizing that sexual identity is only one part of your identity, how do you define your sexual identity?” Response categories included: only gay/homosexual, mostly gay/homosexual, bisexual, mostly heterosexual, only heterosexual, or other. Disclosure of sexual orientation to others was measured with a set of questions with the same stem, “You told [person type] that you are gay, queer or bisexual”; person types included family members, best friends, and teachers or other adults (i.e., dichotomized for analysis to include any “yes” response to any person type versus all other responses). We measured internalized homophobia via a set of 22 items previously found to be reliable and valid among adult MSM24 and adapted for YMSM for this study (Cronbach’s α = 0.87). A sample item is “Sometimes I wish I were not gay,” with responses on a four-point Likert scale: 1 = strongly disagree to 4 = strongly agree, with higher values reflecting more stigmatization. We measured attachment to the gay community via three items that measured the frequency of attending gay-related activities/organizations, such as meetings, fund-raisers, and political activities, gay bars, and gay-related social service programs. Responses were scored on a five-point frequency scale from 1 = never to 5 = several times/week or daily. We measured community involvement with questions used in a prior study of community involvement in adult MSM,24 “In the past 12 months, how often have you volunteered for any HIV/AIDS organizations or causes?” and “In the past 12 months, how often have you volunteered for any Lesbian, Gay, Bisexual, Transgender, or Queer (LGBTQ) organizations or causes?” (i.e., dichotomously coded as any HIV or LGBTQ volunteering versus none).

Environmental Barriers

We measured parental monitoring as an environmental barrier to recruitment of sprouts. Parental monitoring was assessed using the monitoring subscale of the Parenting Style Questionnaire (i.e., measured via youth report of parents’ style).25 The subscale consists of four items scored on a five-point Likert scale (e.g., 1 = never or almost never true, 5 = always or almost always true). A sample item is “How often do you let your parents/caregiver(s) know where you are going?”

Behavioral Factors

We determined whether or not participants had met recent sex partners via the internet in the prior 6 months from the HIV-Risk Assessment for Sexual Partners (H-RASP; dichotomously coded as having met any partner via the internet versus none).26 The binge drinking item was adopted from that recommended by the National Institute on Alcohol Abuse and Alcoholism (NIAAA; dichotomously coded as any episode in the prior 6 months of drinking five or more drinks in a 2-h period).27 Items reflecting use of illicit drugs were taken from the 2009 Youth Risk and Behavior Survey (YRBS) and included any prior 6-month use of (1) marijuana and (2) other drugs (i.e., cocaine, heroin, methamphetamines, opiates; non-prescription depressants, stimulants; psychedelics, Ecstasy, GHB, ketamine, and inhalants; dichotomized as any prior use of (1) marijuana versus none and (2) any other listed substance versus none).

RDS Network

Eligible network size was measured with the question, “Approximately how many people do you know by name? These are people who you know and who also know you. You would know how to contact them and you have seen them in the past 6 months?” Followed by, “Of those individuals that you know by name, how many people do you know who are young men between the ages of 16–20 who identify as gay, bisexual or queer or who have sex with other guys and who live in the Chicago area? These are people who you know and who also know you, who you know how to contact, and who you have seen in the past six months.” Additionally, all participants were asked, “How would you describe your relationship to the person who invited you to participate in this study, the person who gave you the coupon?” (i.e., to validate the RDS reciprocity assumption that the recruiter and recruit know each other).

Data Analysis

In order to determine the efficiency of RDS sampling within the target population, we calculated the rate of recruitment, the number and percentage of seeds, and the average number of sprouts recruited per participant. We assessed the effectiveness of RDS sampling by evaluating the number of waves needed to reach equilibrium and the sample size, the number and length of waves of recruitment and recruitment chains, the degree of homophily on key characteristics, including race/ethnicity, age, and HIV status, and whether or not the sample reached equilibrium on these characteristics. Descriptive analysis and statistical tests were conducted using SAS 9.3. We describe sample homophily and equilibrium (i.e., wave-by-wave) and calculate RDS estimates to the population for characteristics including race, age group, and HIV status using Respondent-Driven Sampling Analysis Tool 7.13. Geocoding, linkage management, and spatial data analysis were performed using ArcGIS 10.1. To assess potential correlates of recruitment success within the sample, we excluded participants who were recruited 30 days or less before closing to accrual (n = 13), given that they had a very limited amount of time to recruit others. Recruitment success was defined as successfully recruiting at least one participant. To simplify the analysis given the relatively high number of variables in the model and to facilitate straightforward interpretation of results, continuous measures were dichotomized based on a median value for the analysis of recruitment success. Variables significantly associated with recruitment success in bivariate analyses (p ≤ 0.05) were selected to be included in a multivariate logistic regression model.

Results

Sample Characteristics

In the overall sample (N = 450), the mean age was 18.9 years (SD = 1.3). The majority of participants were of minority race/ethnicity, i.e., 53.3 % Black, 20.0 % Latino, 18.0 % White, and 8.6 % other. A total of 34 (7.6 %) participants were HIV-positive.

Sampling Efficiency

Over a recruitment period of 39 months from December 2009 to February 2013, a total of 450 eligible participants were enrolled, equating to 11.5/month on average (NB: equating to a rate of 12.5/month including 38 participants who were withdrawn due to failure to complete the baseline interview or a post hoc determination of ineligibility), much slower than a rate of 16–20/month anticipated by the study team based on prior studies using convenience sampling. Study participants recruited 0.62 (SD = 0.94) others into the study on average. A total of only 168 (37.3 %) participants were productive (i.e., successfully recruited at least one sprout; 62.7 % recruited none); only 34 (7.5 %) participants fulfilled the recruitment quota of three sprouts (i.e., 12 seeds and 22 sprouts; see Table 1). Because of the slow recruitment rate and low productivity across the sample, seeds were added throughout the course of the study. This resulted in a sample with a large percentage of seeds, 172 (38.2 %), of which 55 (32.0 %) were productive. There were a total of 278 sprouts; the number of sprouts enrolled was fairly stable over the recruitment period (i.e., did not increase geometrically), and the number of productive seeds leveled out as the study reached its target sample size (see Fig. 1). Of the 278 sprouts, 66 (23.7 %) were recruited within a month, and 140 (50.4 %) were recruited within 6 months (i.e., from the point of enrollment of recruiter to sprout enrollment). The average number of months to coupon/password redemption was 4.5 (SD = 5.9, range = 0.0 to 29.5). No sprouts indicated that their recruiter was a “stranger” or someone not known to them.
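The monthly rates quoted above follow directly from the enrollment counts and the 39-month accrual period. The short sketch below reproduces that arithmetic and, as an added comparison, shows how long enrolling 450 participants would have taken at the anticipated 16–20/month; the variable names are illustrative only.

```python
# Figures taken from the text above.
enrolled = 450            # eligible participants enrolled
withdrawn = 38            # enrolled, later withdrawn
months = 39               # December 2009 through February 2013

rate_enrolled = enrolled / months
rate_including_withdrawn = (enrolled + withdrawn) / months

# Time the same sample would have taken at the anticipated pace.
months_at_16 = enrolled / 16
months_at_20 = enrolled / 20

print(f"{rate_enrolled:.1f} per month (enrolled only)")
print(f"{rate_including_withdrawn:.1f} per month (including withdrawn)")
print(f"{months_at_20:.1f} to {months_at_16:.1f} months at the anticipated rate")
```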

TABLE 1.

Total number of recruits by seed versus sprout status

Number of recruits | Total (n = 450) | Seeds (n = 172) | Sprouts (n = 278)
0 | 282 (62.7 %) | 117 (68.0 %) | 165 (59.4 %)
1 | 91 (20.2 %) | 29 (16.9 %) | 62 (22.3 %)
2 | 43 (9.6 %) | 14 (8.1 %) | 29 (10.4 %)
3 | 34 (7.5 %) | 12 (7.0 %) | 22 (7.9 %)

Chi-square (χ²) = 3.43; df = 3; p = 0.33

FIG. 1.

Cumulative number of enrollees over the study period (December 2009–February 2013), by seed/sprout and by productivity status.

Effectiveness of Sampling

The analysis of network chain development and characteristics was limited to the subsample of productive seeds and their sprouts; a total of 278 recruits linked to 55 productive seeds. Among 55 referral chains, the majority, 29 (52.7 %), had only one recruitment wave and 10 (18.3 %) had lengths of four waves or more (see Fig. 2). The three longest referral chains produced 87 sprouts which account for 35 % of the total. The map depicted in Fig. 3 shows the density of residential locations of productive seeds and their sprouts, which originated from almost all community areas (i.e., neighborhoods not confined to particular zip codes) within the city of Chicago, with the greatest concentrations on the north side of the city.

FIG. 2.

Number of waves linked to 55 productive seeds.

FIG. 3.

Map of locations of study sites, productive seeds, and sprouts by Chicago community area.

The RDS sample composition gradually stabilized wave-by-wave for race/ethnicity at about wave 7 and almost immediately for age and HIV status (see Table 2). Characteristics of the 55 productive seeds (in comparison to the cumulative sample of sprouts at wave 8) reflected study goals to recruit younger and HIV-negative seeds. Table 3 shows the sample composition, equilibrium composition, population estimates, and network sizes (i.e., dual component network size) and homophily by demographic characteristics. In terms of race/ethnicity, each group (i.e., Black, Hispanic, and White) tended to recruit others of the same ethnicity (homophily index value range 0.52 to 0.58) and tended not to recruit outside their racial group (heterophily index values range −0.80 to 0.03, data not shown), except those in the “other” race category who tended most toward random mixing. Network size estimates were larger for Black (mean = 6.0), White (mean = 6.7), and other (mean = 5.7) race participants in comparison to Hispanic participants (mean = 4.2) and reflect relatively small eligible network sizes across all groups. The sample distribution approximates the population estimate in most cases; however, the estimate for Hispanic race adjusted up 3.7 % due in part to relatively smaller network size. In terms of age, the homophily indices were positive and relatively low for both 16–18-year-olds and 19–20-year-olds (i.e., homophily indices of 0.26 and 0.16, respectively); estimated network size was slightly larger for younger participants (mean = 6.1 for those ages 16–18 versus 5.3 for those ages 19–20). Population estimates mirror these statistics with adjustment down in the younger group and up in the older group to account for differences in homophily and network size (i.e., 42.4 % ages 16–18 in the sample versus 37 % in the population, 57.6 % ages 19–20 in the sample versus 63 % in the population, respectively).
Regarding HIV status, recruitment patterns indicate a tendency for HIV-positive participants to recruit other HIV-positive participants (homophily index for HIV-positive 0.25; homophily index = 0.08 for HIV-negative participants). Network size estimates were slightly larger among HIV-negative versus HIV-positive participants (mean = 5.7 versus 4.7, respectively). Population estimates adjusted down for HIV-positive participants and up for HIV-negative participants, given slight differences in homophily.

TABLE 2.

Change in RDS sample size and composition over recruitment wave

Race, age group, and HIV status columns are n (%).

Wave | Recruits | Cumulative RDS sample | Race: Black | Race: Hispanic | Race: White | Race: Other | Age 16–18 years | Age 19–20 years | HIV negative | HIV positive
0 | 0 | 55 | 25 (45 %) | 13 (24 %) | 14 (25 %) | 3 (5 %) | 33 (60 %) | 22 (40 %) | 54 (98 %) | 1 (2 %)
1 | 92 | 92 | 41 (44 %) | 21 (23 %) | 21 (23 %) | 9 (10 %) | 39 (42 %) | 53 (58 %) | 84 (91 %) | 8 (9 %)
2 | 59 | 151 | 73 (48 %) | 30 (20 %) | 34 (23 %) | 14 (9 %) | 66 (44 %) | 85 (56 %) | 140 (93 %) | 11 (7 %)
3 | 32 | 183 | 89 (49 %) | 38 (21 %) | 37 (20 %) | 19 (10 %) | 80 (44 %) | 103 (56 %) | 169 (92 %) | 14 (8 %)
4 | 30 | 213 | 110 (52 %) | 43 (20 %) | 38 (18 %) | 22 (10 %) | 90 (42 %) | 123 (58 %) | 197 (93 %) | 16 (7 %)
5 | 30 | 243 | 133 (55 %) | 46 (19 %) | 39 (16 %) | 25 (10 %) | 102 (42 %) | 141 (58 %) | 223 (92 %) | 19 (8 %)
6 | 20 | 263 | 152 (58 %) | 46 (17 %) | 39 (15 %) | 26 (10 %) | 111 (42 %) | 152 (58 %) | 239 (91 %) | 23 (9 %)
7 | 12 | 275 | 163 (59 %) | 46 (17 %) | 40 (15 %) | 26 (9 %) | 116 (42 %) | 159 (58 %) | 249 (91 %) | 25 (9 %)
8 | 3 | 278 | 165 (59 %) | 46 (17 %) | 41 (15 %) | 26 (9 %) | 118 (42 %) | 160 (58 %) | 251 (91 %) | 26 (9 %)
Total | | 333 | 190 (57 %) | 59 (18 %) | 55 (16 %) | 29 (9 %) | 151 (45 %) | 182 (55 %) | 305 (92 %) | 27 (8 %)

Note: HIV status missing for one participant
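The wave-by-wave stabilization reported in Table 2 can be checked against the 2 % criterion described in the Introduction. The sketch below assumes the criterion means an absolute change of less than 2 percentage points in every category between consecutive cumulative waves; that reading, and the function names, are assumptions rather than the documented behavior of the analysis software. Wave 0 (seeds only) is left out because the cumulative counts from wave 1 onward exclude seeds.

```python
# Cumulative counts by wave from Table 2 (waves 1-8, recruits only).
race = {  # (Black, Hispanic, White, Other)
    1: (41, 21, 21, 9), 2: (73, 30, 34, 14), 3: (89, 38, 37, 19),
    4: (110, 43, 38, 22), 5: (133, 46, 39, 25), 6: (152, 46, 39, 26),
    7: (163, 46, 40, 26), 8: (165, 46, 41, 26),
}
age = {  # (16-18 years, 19-20 years)
    1: (39, 53), 2: (66, 85), 3: (80, 103), 4: (90, 123),
    5: (102, 141), 6: (111, 152), 7: (116, 159), 8: (118, 160),
}


def proportions(counts):
    """Convert category counts into proportions of the wave total."""
    total = sum(counts)
    return [c / total for c in counts]


def first_stable_wave(cumulative, tolerance=0.02):
    """Return the first wave whose composition is within `tolerance`
    (absolute, per category) of the preceding wave, or None."""
    waves = sorted(cumulative)
    for previous, current in zip(waves, waves[1:]):
        before = proportions(cumulative[previous])
        after = proportions(cumulative[current])
        if all(abs(a - b) < tolerance for a, b in zip(before, after)):
            return current
    return None


race_wave = first_stable_wave(race)
age_wave = first_stable_wave(age)
print(f"race/ethnicity stabilizes at wave {race_wave}; age group at wave {age_wave}")
```

Under this reading, race/ethnicity first meets the criterion several waves later than age group, in line with the pattern described in the text.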

TABLE 3.

Sample composition, RDS estimates, and recruitment patterns by demographic factors

Race/ethnicity of recruiters (rows) by race/ethnicity of recruits (columns)

 | Black | Hispanic | White | Other | Total
Black | 132 (83.1 %) | 7 (4.4 %) | 5 (3.1 %) | 15 (9.4 %) | 159
Hispanic | 7 (14.9 %) | 29 (61.7 %) | 6 (12.8 %) | 5 (10.6 %) | 47
White | 6 (13.3 %) | 7 (15.6 %) | 28 (62.2 %) | 4 (8.9 %) | 45
Other | 20 (74.1 %) | 3 (11.1 %) | 2 (7.4 %) | 2 (7.4 %) | 27
Total | 165 | 46 | 41 | 26 | 278
Actual sample composition | 59.4 % | 16.5 % | 14.7 % | 9.4 % | 100 %
Equilibrium sample composition | 63.5 % | 15.0 % | 12.2 % | 9.4 % | 100 %
Population composition (95 % CI) | 60.2 % (46.5, 74.5) | 20.2 % (9.6, 31.6) | 10.3 % (3.3, 20.3) | 9.3 % (3.3, 15.9) | 100 %
Mean network size, n (adjusted) | 6.0 | 4.2 | 6.7 | 5.7 |
Homophily | 0.57 | 0.52 | 0.58 | −0.20 |

Age group of recruiters (rows) by age group of recruits (columns)

 | 16–18 years | 19–20 years | Total
16–18 years | 75 (53.6 %) | 65 (46.4 %) | 140
19–20 years | 43 (31.2 %) | 95 (68.8 %) | 138
Total | 118 | 160 | 278
Actual sample composition | 42.4 % | 57.6 % | 100 %
Equilibrium sample composition | 40.2 % | 59.8 % | 100 %
Population composition (95 % CI) | 37.0 % (28.3, 47.9) | 63.0 % (52.1, 71.7) | 100 %
Mean network size, n (adjusted) | 6.1 | 5.3 |
Homophily | 0.26 | 0.16 |

HIV status of recruiters (rows) by HIV status of recruits (columns)

 | Negative | Positive | Total
Negative | 238 (91.2 %) | 23 (8.8 %) | 261
Positive | 13 (81.3 %) | 3 (18.8 %) | 16
Total | 251 | 26 | 277
Actual sample composition | 90.6 % | 9.4 % | 100 %
Equilibrium sample composition | 90.2 % | 9.8 % | 100 %
Population composition (95 % CI) | 88.3 % (81.2, 94.5) | 11.7 % (5.5, 18.8) | 100 %
Mean network size (adjusted) | 5.7 | 4.7 |
Homophily | 0.25 | 0.08 |

Note: HIV status missing for one participant
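Two of the quantities in Table 3 can be approximately reproduced from the race/ethnicity recruitment counts. The sketch below treats recruitment as a Markov chain and takes the equilibrium composition as its stationary distribution, then computes a homophily index using the widely cited definition from the RDS literature (self-recruitment proportion relative to the estimated population proportion). Whether the analysis software applied exactly these steps, for example without smoothing the counts, is an assumption, so agreement with the published values is only expected to within rounding.

```python
groups = ["Black", "Hispanic", "White", "Other"]

# Recruiter (row) by recruit (column) counts from Table 3.
recruitment = [
    [132, 7, 5, 15],
    [7, 29, 6, 5],
    [6, 7, 28, 4],
    [20, 3, 2, 2],
]
# Estimated population proportions from Table 3.
population = [0.602, 0.202, 0.103, 0.093]


def transition_matrix(counts):
    """Row-normalize counts into recruitment transition probabilities."""
    return [[c / sum(row) for c in row] for row in counts]


def stationary_distribution(matrix, iterations=1000):
    """Approximate the stationary distribution by repeated multiplication."""
    n = len(matrix)
    dist = [1.0 / n] * n
    for _ in range(iterations):
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return dist


def homophily_index(self_rate, pop_share):
    """Positive when a group recruits itself more than chance, negative when less."""
    if self_rate >= pop_share:
        return (self_rate - pop_share) / (1 - pop_share)
    return (self_rate - pop_share) / pop_share


matrix = transition_matrix(recruitment)
equilibrium = stationary_distribution(matrix)
homophily = [homophily_index(matrix[i][i], population[i]) for i in range(len(groups))]

for name, eq, h in zip(groups, equilibrium, homophily):
    print(f"{name}: equilibrium {eq:.1%}, homophily {h:.2f}")
```

The negative value for the “other” category reflects a self-recruitment proportion below that group's estimated share of the population.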

Characteristics of Productive Seeds and Correlates of Successful Recruitment

In terms of characteristics associated with successful recruitment, in initial bivariate analysis of correlates of recruitment success across the sample, homelessness, internalized homophobia, gay community attachment, gay community involvement, binge drinking, other drug use, having met a partner on the internet, and network size were related to recruitment success. Age, race/ethnicity, HIV status, disclosure, sexual orientation, parental monitoring, and marijuana use were not significant correlates of recruitment success in bivariate analysis. In a multivariate model, internalized homophobia was negatively related to recruitment success (odds ratio (OR) = 0.56; 95 % confidence interval (CI) = 0.37, 0.85), while network size was positively related to it (OR = 1.63; 95 % CI = 1.06, 2.49). Homelessness, gay community attachment and community involvement, binge drinking, other drug use, and having met a partner on the internet were not significantly related to recruitment success in the multivariate model. Thus, our hypothesis was supported for internalized homophobia and network size but not for all other factors (Table 4).

TABLE 4.

Multivariate logistic regression of successful recruitment on demographic, behavioral, community, environmental, and network factors (n = 437)

Factor | OR | 95 % CI
Demographic factors | |
 Living in a home/apartment/dorm | Ref. |
 Homeless (e.g., no permanent address, shelter, treatment facility) | 2.01 | 0.87, 4.65
Internalized homophobia scale (a) | |
 Low (1.0 to 2.0) | Ref. |
 High (2.1 to 4.0) | 0.56 | 0.37, 0.85
Engagement in offline gay community activities (a) | |
 Once a month or less | Ref. |
 Several times a month or more | 1.19 | 0.77, 1.84
Gay community involvement score (a) | |
 Not volunteered in past 12 months (1) | Ref. |
 Yes, volunteered (2–8) | 1.44 | 0.93, 2.22
Eligible network size (a) | |
 <10 people | Ref. |
 ≥10 people | 1.63 | 1.06, 2.49
Binge drinking | |
 No | Ref. |
 Yes | 0.64 | 0.29, 1.40
Hard drug use | |
 No | Ref. |
 Yes | 0.55 | 0.29, 1.07
Met sex partners on the Internet | |
 No | Ref. |
 Yes | 0.65 | 0.41, 1.00

(a) Variables that were dichotomized based on a median value
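The odds ratios and confidence intervals in Table 4 can be related to one another with a short calculation. The sketch below assumes the intervals are Wald-type (symmetric on the log-odds scale), which is an assumption about how they were produced. Under that assumption it recovers an approximate standard error, z statistic, and two-sided p value from each published OR and 95 % CI. Because the inputs are rounded to two decimals, the results are approximate, and the function name is illustrative.

```python
import math


def wald_summary(odds_ratio, lower, upper, z_critical=1.96):
    """Back out SE, z, and a two-sided p value from an OR and its 95 % CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_critical)
    z = math.log(odds_ratio) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail area
    implied_or = math.sqrt(lower * upper)         # midpoint of the CI on the log scale
    return {"se": se, "z": z, "p": p_value, "implied_or": implied_or}


# Selected rows from Table 4: (OR, lower bound, upper bound).
rows = {
    "High internalized homophobia": (0.56, 0.37, 0.85),
    "Eligible network size >= 10": (1.63, 1.06, 2.49),
    "Homeless": (2.01, 0.87, 4.65),
}

results = {name: wald_summary(*values) for name, values in rows.items()}
for name, r in results.items():
    print(f"{name}: z={r['z']:.2f}, p={r['p']:.3f}, CI midpoint OR={r['implied_or']:.2f}")
```

The two factors whose intervals exclude 1.0 yield p values below 0.05 in this reconstruction, while homelessness does not, which matches the pattern of significance reported in the text.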

Discussion

The purpose of this analysis was to evaluate the performance of RDS sampling among young urban MSM, ages 16–20, and to identify factors associated with recruitment success to inform future use of RDS with this population.

In terms of sampling efficiency, the rate of recruitment was slow in comparison to expectations. To draw a sample of 450 YMSM, ages 16–20, it took 39 months (i.e., 11.5 eligible participants/month, including both seeds and sprouts). This rate of recruitment is less than a quarter the speed of recruitment described in prior studies of RDS to recruit adult MSM.8,10 One factor that may explain this phenomenon is the very low level of productivity among both seeds and sprouts; overall participants recruited less than one additional person on average, and 63 % of the sample did not recruit anyone. While we did anticipate slow recruitment in comparison to MSM adults, we did not anticipate this low level of productivity. We modified the coupon distribution process to eliminate paper coupons in favor of a password system, which although it did not increase the recruitment rate did eliminate coupon-related problems, resulting in more efficient use of staff time. The geometric growth expected and needed to maintain the chain referral process was not evident in this case.

In order to sustain a steady rate of recruitment, additional seeds had to be added to the sample throughout the enrollment period. Seeds ultimately constituted 38 % of the sample, much greater than the <10 % found in most RDS sampling studies. This high percentage of seeds renders the sample inefficient for making RDS-based population estimates, since these methods require that seed data be removed from the analytic sample to ensure only peer-recruited individuals are assessed.28 Furthermore, the long period of recruitment calls into question the original estimates of network size used to weight the sample for population-based estimates. In other work based on a subsample of the data presented herein, estimates of network size changed in the same individual over time, becoming less correlated with the original estimates.29 In addition, low productivity may partly be due to the relatively small average network size found in this sample. Small network size could be due to variation in sexual minority identity development in this age group30 as well as the relatively narrow age range for eligibility (i.e., 16–20 years old). Others have suggested that different research participation payment structures are needed to incentivize YMSM in particular,18 given low rates of participation. An alternative system (e.g., a raffle with a large cash prize) was developed for adult MSM in an RDS-based study;31 however, any increase in incentive payments would need to be balanced against concerns about undue influence and/or coercion,1 particularly for young participants. In addition, while steering strategies (i.e., payment of larger or additional incentives to recruit younger individuals) have also been suggested to recruit younger MSM,20 these violate the RDS assumption of random recruitment. The proportion of seeds to sprouts found herein is remarkably similar to that of a parallel study of sexual minority youth conducted in Chicago at the same time as this study,32 which provides additional evidence regarding the difficulty of RDS sampling in similar populations.

One additional reason for the lack of efficiency of RDS for YMSM recruitment in comparison to other populations, such as injection drug users (IDU), including MSM/IDU,8 may be the characteristics of relationships within the network. For example, among IDUs, relationships are sometimes characterized as multiplex (e.g., network contacts may be drug use partners, sex partners, and/or social companions) and/or transactional (e.g., exchange of sex or drug for money, food, or shelter), which may provide a more fertile environment for the social influence and social exchange on which RDS-based recruitment depends.18

The recruitment patterns and structure of recruitment chains also reflect problematic dynamics for RDS-based recruitment in this population. Among the 55 referral chains linked to productive seeds, the majority (52 %) had only one recruitment wave, and only 18 % had lengths of four waves or more. In terms of recruitment patterns, homophily was quite low for age and HIV status but moderate–high for race/ethnicity (homophily index values 0.52 to 0.58, except for “other” race), reflecting segmentation by race/ethnicity. In other work based on a subsample of the data presented herein, an analysis of sexual networks reflects similar segmentation.33
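For readers unfamiliar with the homophily index, the following sketch shows one common definition, the piecewise form associated with Heckathorn's RDS estimators (reference 2). Whether the values reported here were computed in exactly this way is an assumption. The function name and the in-group recruitment share of 0.82 are hypothetical and chosen only for illustration; the population share of 0.60 is borrowed from the RDS population estimate for Black YMSM discussed in this section.

```python
def homophily_index(in_group_share, population_share):
    """Homophily index in the range -1 to 1.

    in_group_share:   proportion of a group's recruits who belong to
                      the same group as the recruiter
    population_share: estimated proportion of that group in the population

    0 means recruitment matches random mixing, positive values mean a
    preference for in-group recruitment, negative values the opposite.
    ASSUMPTION: this is the commonly cited piecewise definition, not
    necessarily the exact computation used in the study.
    """
    s, p = in_group_share, population_share
    if not 0 < p < 1:
        raise ValueError("population_share must lie strictly between 0 and 1")
    if not 0 <= s <= 1:
        raise ValueError("in_group_share must lie between 0 and 1")
    if s >= p:
        return (s - p) / (1 - p)
    return (s - p) / p


# Hypothetical in-group share of 0.82 against a population share of 0.60.
print(f"{homophily_index(0.82, 0.60):.2f}")  # 0.55
```

Read this way, an index in the 0.52 to 0.58 range means that more than half of recruitment ties were formed within the recruiter's own racial/ethnic group beyond what random mixing would produce.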

In addition, while the sample compositions approximated the equilibrium sample compositions and the estimated population compositions for key characteristics (i.e., age, race/ethnicity, HIV status), the proportions by race/ethnicity do not match what one might expect based on available data. The RDS population estimates were 60 % Black, 20 % Hispanic, and 10 % White YMSM. While there is no sampling frame available for comparison, this suggests an oversampling of Black YMSM. This holds even in light of national data from the Youth Risk Behavior Survey (YRBS), a survey of risk behavior in a representative sample of high school age youth (i.e., primarily public school students) conducted every 2 years, which indicate that a larger percentage of youth of color identify as gay or lesbian.30 Given the network segmentation by race/ethnicity, the oversampling of Black YMSM was also driven by the larger number of Black seeds (45 %).

The failure to generate a representative sample has been found in some prior studies of MSM,21,22,34 in which the sample overrepresented poor and HIV-positive MSM. In one case, the overrepresentation of HIV-positive MSM was attributed to errors in the operationalization of eligibility criteria and a large network from an HIV/AIDS service organization;34 the overrepresentation of poor individuals was attributed to motivation due to the incentive structure.21,22,34 It is possible that, although chains were sometimes quite long in these studies, they were not long enough to offset a high degree of segmentation by race/ethnicity, poverty, and HIV status.

In terms of factors significantly associated with recruitment success, we hypothesized that racial/ethnic minority status, homelessness, HIV-positive status, substance use problems, gay community connectedness, and estimated network size would be positively related to recruitment productivity, while sexual minority stigmatization, parental monitoring, and meeting sex partners on the internet (i.e., virtual venue) would be negatively related to recruitment productivity. Only the hypotheses regarding network size and stigmatization were supported. Although prior studies of adult MSM suggest that estimated network size may not be a significant productivity factor in RDS-based sampling,19 we did find evidence to suggest it is an important factor among YMSM. More specifically, we found that, controlling for other factors, YMSM with a larger estimated eligible network (i.e., ≥10 people) were roughly 60 % more likely to recruit at least one additional person. Not surprisingly, internalized stigmatization negatively impacted recruitment productivity: those with higher levels of stigmatization were more than 40 % less likely to recruit at least one additional person into the study. One important caveat is that we did not systematically track information about peer recruitment refusals, which in future studies might provide important additional information about productivity problems.
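The link between the tabulated estimates and the percentages quoted above can be made explicit with a short sketch. It copies the ratio estimates and interval bounds from the rows of the table shown above, converts each ratio to a percent difference relative to the reference group, and flags the estimates whose interval excludes 1. The helper names are hypothetical, and reading a ratio as "percent more or less likely" is the same simplification made in the text.

```python
# (estimate, lower bound, upper bound) copied from the table rows above.
ESTIMATES = {
    "Homeless": (2.01, 0.87, 4.65),
    "High internalized homophobia": (0.56, 0.37, 0.85),
    "Offline gay community activities several times a month or more": (1.19, 0.77, 1.84),
    "Volunteered in past 12 months": (1.44, 0.93, 2.22),
    "Eligible network size >= 10": (1.63, 1.06, 2.49),
    "Binge drinking": (0.64, 0.29, 1.40),
    "Hard drug use": (0.55, 0.29, 1.07),
    "Met sex partners on the Internet": (0.65, 0.41, 1.00),
}


def percent_difference(ratio):
    """Percent difference versus the reference group implied by a ratio."""
    return (ratio - 1.0) * 100.0


def excludes_null(lower, upper, null=1.0):
    """True when the interval lies entirely above or entirely below the null."""
    return lower > null or upper < null


# Factors whose interval excludes 1, in table order.
supported = [
    name for name, (_, lower, upper) in ESTIMATES.items()
    if excludes_null(lower, upper)
]

for name, (estimate, lower, upper) in ESTIMATES.items():
    flag = "*" if excludes_null(lower, upper) else " "
    print(f"{flag} {name}: {percent_difference(estimate):+.0f} % ({lower}, {upper})")
```

Only the rows for internalized homophobia (about 44 % lower) and eligible network size (about 63 % higher) are flagged; the interval for meeting sex partners on the Internet reaches 1.00 and is therefore treated as including the null.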

In conclusion, we found evidence to suggest that the use of RDS for sampling among YMSM is challenging and may not be efficient, given the slow pace of recruitment and low recruitment productivity; the average number of recruits per enrollee (i.e., <1) was not sufficient to sustain recruitment over time. The large number of seeds that had to be added to maintain a reasonable pace of recruitment makes use of the sample for RDS-based population estimates questionable. In addition, the prevalence of short recruitment chains and the segmentation in patterns of recruitment by race/ethnicity further hamper the network recruitment process and suggest that young MSM may not form one large interconnected network, as required for use of RDS sampling. The overall sample proportions and population estimates in this case appear to overrepresent Black YMSM. Recruitment success was positively associated with network size and negatively associated with internalized stigma, suggesting that individuals with larger networks may be more successful, and those with high internalized stigma less successful, in recruiting others. At the same time, it is worth acknowledging that 62 % of the sample was peer-recruited, which suggests that RDS may be useful as an adjunctive method of recruitment along with other approaches.

These findings must be considered in light of limitations. As has been noted by others,12 communities of hidden and stigmatized populations may be more or less interconnected depending on local characteristics and network dynamics, so our findings may not generalize to other urban areas or populations of YMSM. In particular, Chicago is a city characterized by racial segregation in neighborhoods and schools;35 thus, communities in which such segregation is less pronounced may not show the level of segmentation in YMSM networks found herein. In addition, RDS assumes that members of the population know each other; thus, as in all such studies, individuals who are isolates are underrepresented. For example, YMSM who have not disclosed their sexual orientation to others are likely underrepresented in this study. Likewise, those who have not yet reached particular sexual milestones, such as identifying as gay or bisexual or having had sex with other men, are not represented in this study.

Acknowledgments

Thanks to the staff and participants of “Crew450” for their time and effort. Research reported in this publication was supported by the National Institute on Drug Abuse of the National Institutes of Health under Award Number R01DA025548. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

References

1. Heckathorn DD. Respondent-driven sampling: a new approach to the study of hidden populations. Soc Probl. 1997;44(2):174–99. doi: 10.2307/3096941.
2. Heckathorn DD. Respondent-driven sampling II: deriving valid population estimates from chain-referral samples of hidden populations. Soc Probl. 2002;49(1):11–34. doi: 10.1525/sp.2002.49.1.11.
3. Respondent-Driven Sampling Analysis Tool (RDSAT) [computer program]. Version 7.1. Ithaca, NY: Cornell University; 2012.
4. Salganik M. Respondent-driven sampling in the real world. Epidemiology. 2012;23(1):148–50. doi: 10.1097/EDE.0b013e31823b6979.
5. Simic M, Johnston LG, Platt L, Baros S, Andjelkovic V, Novotny T, Rhodes T. Exploring barriers to ‘respondent driven sampling’ in sex worker and drug-injecting sex worker populations in Eastern Europe. J Urban Health. 2006;83(6 Suppl):i6–15. doi: 10.1007/s11524-006-9098-6.
6. McCreesh N, Frost SD, Seeley J, et al. Evaluation of respondent-driven sampling. Epidemiology. 2012;23(1):138–47. doi: 10.1097/EDE.0b013e31823ac17c.
7. Deiss RG, Brouwer KC, Ma OL, et al. High-risk sexual and drug using behaviors among male injection drug users who have sex with men in 2 Mexico-US border cities. Sex Transm Dis. 2008;35(3):243–9. doi: 10.1097/OLQ.0b013e31815abab5.
8. Iguchi MY, Ober AJ, Berry SH, et al. Simultaneous recruitment of drug users and men who have sex with men in the United States and Russia using respondent-driven sampling: sampling methods and implications. J Urban Health. 2009;86:S5–31. doi: 10.1007/s11524-009-9365-4.
9. Ramirez-Valles J, Garcia D, Campbell RT, Diaz RM, Heckathorn DD. HIV infection, sexual risk behavior, and substance use among Latino gay and bisexual men and transgender persons. Am J Public Health. 2008;98(6):1036–42. doi: 10.2105/AJPH.2006.102624.
10. Ramirez-Valles J, Heckathorn DD, Vazquez R, Diaz RM, Campbell RT. From networks to populations: the development and application of respondent-driven sampling among IDUs and Latino gay men. AIDS Behav. 2005;9(4):387–402. doi: 10.1007/s10461-005-9012-3.
11. Rhodes SD, McCoy TP, Hergenrather KC, et al. Prevalence estimates of health risk behaviors of immigrant Latino men who have sex with men. J Rural Health Off J Am Rural Health Assoc Natl Rural Health Care Assoc. 2012;28(1):73–83. doi: 10.1111/j.1748-0361.2011.00373.x.
12. Johnston L, Whitehead S, Simic-Lawson M, Kendall C. Formative research to optimize respondent-driven sampling surveys among hard-to-reach populations in HIV behavioral and biological surveillance: lessons learned from four case studies. AIDS Care. 2010;22(6):784–92. doi: 10.1080/09540120903373557.
13. Chua AC, Chen MI, Cavailler P, et al. Challenges of respondent driven sampling to assess sexual behaviour and estimate the prevalence of human immunodeficiency virus (HIV) and syphilis in men who have sex with men (MSM) in Singapore. Ann Acad Med. 2013;42(7):350–3.
14. Toledo L, Codeço CT, Bertoni N, et al. Putting respondent-driven sampling on the map: insights from Rio de Janeiro, Brazil. J Acquir Immune Defic Syndr. 2011;57(Suppl 3):S136–43. doi: 10.1097/QAI.0b013e31821e9981.
15. Kral AH, Malekinejad M, Vaudrey J, et al. Comparing respondent-driven sampling and targeted sampling methods of recruiting injection drug users in San Francisco. J Urban Health Bull N Y Acad Med. 2010;87(5):839–50. doi: 10.1007/s11524-010-9486-9.
16. Burt RD, Hagan H, Sabin K, Thiede H. Evaluating respondent-driven sampling in a major metropolitan area: comparing injection drug users in the 2005 Seattle area national HIV behavioral surveillance system survey with participants in the RAVEN and Kiwi studies. Ann Epidemiol. 2010;20(2):159–67. doi: 10.1016/j.annepidem.2009.10.002.
17. McCreesh N, Johnston LG, Copas A, et al. Evaluation of the role of location and distance in recruitment in respondent-driven sampling. Int J Health Geogr. 2011;10:56. doi: 10.1186/1476-072X-10-56.
18. Jenkins RA. Recruiting substance-using men who have sex with men into HIV prevention research: current status and future directions. AIDS Behav. 2012;16(6):1411–9. doi: 10.1007/s10461-011-0037-5.
19. Reisner SL, Mimiaga MJ, Johnson CV, et al. What makes a respondent-driven sampling “seed” productive? Example of finding at-risk Massachusetts men who have sex with men. J Urban Health Bull N Y Acad Med. 2010;87(3):467–79. doi: 10.1007/s11524-010-9439-3.
20. Wei C, McFarland W, Colfax GN, Fuqua V, Raymond HF. Reaching black men who have sex with men: a comparison between respondent-driven sampling and time-location sampling. Sex Transm Infect. 2012;88(8):622–6. doi: 10.1136/sextrans-2012-050619.
21. Lauby JL, Millett GA, LaPollo AB, Bond L, Murrill CS, Marks G. Sexual risk behaviors of HIV-positive, HIV-negative, and serostatus-unknown Black men who have sex with men and women. Arch Sex Behav. 2008;37(5):708–19. doi: 10.1007/s10508-008-9365-6.
22. Shoptaw S, Weiss RE, Munjas B, et al. Homonegativity, substance use, sexual risk behaviors, and HIV status in poor and ethnic men who have sex with men in Los Angeles. J Urban Health Bull N Y Acad Med. 2009;86(Suppl 1):77–92. doi: 10.1007/s11524-009-9372-5.
23. Kuhns L, Johnson A, Mustanski B, Garofalo R. Barriers to and facilitators of optimal functioning of respondent-driven sampling for recruitment of young men who have sex with men (YMSM) into HIV-related research. Paper presented at: Annual Meeting of the American Public Health Association (APHA) 2011; Washington, DC.
24. Ramirez-Valles J, Kuhns LM, Campbell RT, Diaz RM. Social integration and health: community involvement, stigmatized identities, and sexual risk in Latino sexual minorities. J Health Soc Behav. 2010;51(1):30–47. doi: 10.1177/0022146509361176.
25. Nappi CMTC, Kapungu C, et al. Parental monitoring as a moderator of the effect of family sexual communication on sexual risk behavior among adolescents in psychiatric care. AIDS Behav. 2009;13:1012–20. doi: 10.1007/s10461-008-9495-9.
26. Newcomb ME, Ryan DT, Garofalo R, Mustanski B. The effects of sexual partnership and relationship characteristics on three sexual risk variables in young men who have sex with men. Arch Sex Behav. 2014;43(1):61–72. doi: 10.1007/s10508-013-0207-9.
27. National Institute on Alcohol Abuse and Alcoholism Task Force on Recommended Alcohol Questions. National Council on Alcohol Abuse and Alcoholism Recommended Sets of Alcohol Consumption Questions. Bethesda, MD: National Institute on Alcohol Abuse and Alcoholism; October 15–16 2003.
28. Salganik M, Heckathorn DD. Sampling and estimation in hidden populations using respondent-driven sampling. Sociol Methodol. 2004;34:193–240. doi: 10.1111/j.0081-1750.2004.00152.x.
29. Phillips G, Kuhns LM, Garofalo R, Mustanski B. Do recruitment patterns of young men who have sex with men (YMSM) through respondent-driven sampling (RDS) violate assumptions? Manuscript under review.
30. Mustanski B, Birkett M, Greene GJ, Rosario M, Bostwick W, Everett BG. The association between sexual orientation identity and behavior across race/ethnicity, sex, and age in a probability sample of high school students. Am J Public Health. 2014;104(2):237–44. doi: 10.2105/AJPH.2013.301451.
31. Truong HH, Grasso M, Chen YH, et al. Balancing theory and practice in respondent-driven sampling: a case study of innovations developed to overcome recruitment challenges. PLoS One. 2013;8(8):e70344. doi: 10.1371/journal.pone.0070344.
32. Mustanski BS, Garofalo R, Emerson EM. Mental health disorders, psychological distress, and suicidality in a diverse sample of lesbian, gay, bisexual, and transgender youths. Am J Public Health. 2010;100(12):2426–32. doi: 10.2105/AJPH.2009.178319.
33. Mustanski B, Birkett M, Kuhns L, Muth S, Latkin C. The role of geographic and network factors in racial disparities in HIV among young men who have sex with men: an egocentric network study. Manuscript under review.
34. Marks G, Millett GA, Bingham T, et al. Understanding differences in HIV sexual transmission among Latino and black men who have sex with men: the Brothers y Hermanos Study. AIDS Behav. 2009;13(4):682–90. doi: 10.1007/s10461-008-9380-6.
35. Glaeser E, Vigdor J. The end of the segregated century: racial segregation in America’s neighborhoods, 1890–2010. New York: Manhattan Institute for Policy Research; 2012.

Articles from Journal of Urban Health : Bulletin of the New York Academy of Medicine are provided here courtesy of New York Academy of Medicine
