INTRODUCTION
Composite biological endpoints are increasingly used in clinical trials. They have been used to demonstrate the efficacy of experimental interventions, but the way they are designed and interpreted has not been evaluated (1). In human immunodeficiency virus (HIV) and sexually transmitted infection (STI) prevention trials, multiple biologic outcomes have been combined to increase the power of a study and the likelihood of detecting an intervention effect (2). That strategy can maximize the opportunity to have an outcome for every participant, and it provides a single primary outcome for the trial even when there are missing data or individuals have entered the trial with incurable STIs (e.g., HIV and herpes simplex virus type 2 (HSV-2) infections). The most meaningful and appropriate measure of the efficacy of a behavioral HIV prevention intervention is a reduction in HIV incidence, but frequently it is not feasible to use HIV infection as a primary endpoint because of the low incidence of HIV seroconversion and the modest expected effect size (30–50%) of behavioral interventions. Researchers have, therefore, used incident HIV infection and STIs together to demonstrate the efficacy of behavioral and biomedical interventions (3). However, those studies did not describe how the selection and use of multiple outcomes affected the interpretation of the study findings.
Using a composite biological endpoint of STIs including HIV infection presents multiple challenges because of different biological and transmission characteristics such as infectivity, duration of infectiousness, symptomatology, ease of diagnosis and treatment, and risk of repeat infections (4). Therefore, it is not intuitive that combining multiple biological outcomes with different epidemiologic characteristics will produce interpretable results; more may not be better.
The purpose of this study was to develop empirical evidence to evaluate whether a composite biological outcome performs better than a single (or dual) biological outcome in evaluating HIV prevention interventions in multiple study sites where the study populations may have different STI prevalence, demographic, and behavioral characteristics. By simulating various combinations of STI outcomes and modifying both the strength of the intervention effect and the number of STIs in the outcome measured, we demonstrated how a composite test statistic might vary. This is the first empirical analysis of how the selection of STI outcome measures might impact study interpretation in various epidemiologic settings.
Background
The question of which biological endpoint to use was an issue in the National Institute of Mental Health (NIMH) Collaborative HIV/Sexually Transmitted Disease (STD) Prevention Trial (hereafter called the C-POL Trial) (5). The original study was conducted in five international sites (China, India, Peru, Russia, and Zimbabwe) in a range of sexually active populations at risk for STIs, including HIV infection. The C-POL Trial was a two-arm, community-level, cluster randomized controlled trial that involved between 20 and 40 independent clusters or venues (discrete geographic community sites) in each country, where members of the population at increased risk for HIV infection and STIs could be reached by community popular opinion leaders (C-POLs) who delivered HIV/STD risk reduction messages. The objective was to change the social norms for safer sexual practices in these target populations. The design and results of the C-POL Trial using a composite biological endpoint have been reported elsewhere (6). Using simulation techniques informed by the experiences from the C-POL Trial, we examined the performance of a composite biological endpoint in detecting the effect of an intervention under various conditions. Based upon the observational data from the C-POL Trial, we assumed that the prevalence of the STIs varied widely at baseline.
METHODS
Setting
The setting in this simulation analysis was a multi-population, two-arm, cluster randomized trial of an intervention to reduce sexual risk behavior. The study design included intervention and comparison venues in each population to demonstrate an intervention effect (i.e., a reduction in incidence) in each of the five populations as determined by a biological endpoint. Six STIs were identified that could be analyzed using blood, urine, or vaginal samples: chlamydia (Ct), gonorrhea (Ng), trichomoniasis (Tv; in women only), syphilis (SYP), genital herpes (HSV-2), and HIV. The primary question was: when the prevalence of those STIs varies greatly by population, what is the best biological endpoint to use to demonstrate an intervention effect in each population under various scenarios (e.g., the incidence of all STIs is reduced in the intervention venues, the incidence of one STI is reduced in the intervention venues, or the incidence of STIs is reduced in both intervention and comparison venues)? In particular, given the above context, we wanted to examine how a composite endpoint would perform in detecting an intervention effect compared to using individual STIs.
Composite Biological Endpoint
The composite biological endpoint was defined as the incidence of any new STI (Ct, Ng, Tv [women only], HSV-2, SYP, or HIV) observed during a 24-month follow-up period. Thus, for each participant, a composite binary variable was determined to indicate whether or not a new case of at least one of the six STIs was detected during follow-up. Individuals with Ct, Ng, Tv, and SYP were assumed to be treated and cured following a positive test result at baseline and thus eligible for incident infection at follow-up. Since HIV and HSV-2 infections cannot be cured, new HIV or HSV-2 infections were counted only among individuals who were negative for that infection at baseline. An individual was therefore classified as positive for having a new case of any of the six infections during follow-up if there was any new positive test for Ct, Ng, Tv, SYP, HSV-2 (if negative at baseline), or HIV (if negative at baseline). Otherwise, an individual was classified as negative for the composite outcome denoted as “any STI.”
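The classification rule above can be written as a short function. The following is a minimal Python sketch, not the authors' code: the function name `any_sti`, the dictionary layout, and the use of `None` for a participant without follow-up data are illustrative assumptions.

```python
from typing import Mapping, Optional

CURABLE = ("Ct", "Ng", "Tv", "SYP")  # assumed treated and cured at baseline
INCURABLE = ("HSV-2", "HIV")         # incident only if negative at baseline


def any_sti(baseline: Mapping[str, bool],
            followup: Optional[Mapping[str, bool]]) -> Optional[bool]:
    """Composite 'any STI' outcome for one participant.

    Each mapping gives test results (True = positive) keyed by STI.
    A missing key is treated as negative (e.g., Tv is not tested in men).
    Returns None when there are no follow-up data (hypothetical convention).
    """
    if followup is None:
        return None
    # Any positive follow-up test for a curable STI is a new case.
    if any(followup.get(sti, False) for sti in CURABLE):
        return True
    # HSV-2 and HIV count only if the participant was negative at baseline.
    return any(followup.get(sti, False) and not baseline.get(sti, False)
               for sti in INCURABLE)
```

For example, a participant who was HIV positive at baseline and tests HIV positive again at follow-up, with no other positive result, is classified as negative for “any STI”.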
Simulation
Simulation is a way to study outcomes that replicate real world experience. By using well-specified steps, researchers develop evidence that elucidates different simulated outcomes. In this study, simulation was used to demonstrate how a composite biological outcome performed compared to a single (or dual) biological outcome in detecting the effect of a behavioral intervention.
In conducting the simulation study for this paper, we performed the following steps:
Step One
We established four hypothetical populations with 1,000 eligible participants in each of 20 to 40 venues (a discrete geographical area where participants assemble [e.g., bar, office, school]) in each of the four populations (see sample size calculations for number of venues below). The four hypothetical populations were assumed to be the entire eligible populations congregating in the selected venues (see Table 1 for the sample sizes in the hypothetical populations).
Table 1.
Sample sizes and Sex and Age Distributions (in years) for the Four Hypothetical Populations
| Population | Males <25, N | Males <25, % | Males ≥25, N | Males ≥25, % | Females <25, N | Females <25, % | Females ≥25, N | Females ≥25, % | Total¹, N | Total¹, % |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 12,390 | (41.3) | 3,510 | (11.7) | 10,620 | (35.4) | 3,480 | (11.6) | 30,000 | (100) |
| B | 11,280 | (47.0) | 5,520 | (23.0) | 3,600 | (15.0) | 3,600 | (15.0) | 24,000 | (100) |
| C | 5,400 | (18.0) | 15,600 | (52.0) | 3,300 | (11.0) | 5,700 | (19.0) | 30,000 | (100) |
| D | 2,960 | (7.4) | 15,760 | (39.4) | 1,840 | (4.6) | 19,440 | (48.6) | 40,000 | (100) |
¹ Total universe over venues (venues in Population A = 30, in Population B = 24, in Population C = 30, and in Population D = 40).
Step Two
Based on the data (not census data) from the C-POL Trial, the eligible participants from the hypothetical population were randomly assigned age and gender and, then, based on their age and gender, randomly assigned as an individual having one or more STIs or not having one or more STIs at baseline and during the two-year follow-up period, and whether over time they were lost to follow-up. As in the C-POL Trial, these participant characteristics varied from population to population over a wide range of values so that the results of the simulation would be generalizable to other populations of similar age and sex distributions.
Further, we assumed intraclass correlations (ICCs) among the venues (i.e., among venue variances) in each of the four populations for the prevalence of each of the six STIs between .005 and .02, as well as correlations among the prevalences of the six STIs (6). The incidences took into account whether a participant had an STI at baseline. We assumed for Ct, Ng, Tv, and SYP that the STI was treated and cured at baseline, while for those infected with HIV and HSV-2 at baseline, infection was lifelong. ICCs between venues were also assumed for the incidence data, as well as correlations among the incidences of the six STIs. Table 1 presents the sample sizes as well as the age and sex distributions for each of the hypothetical populations. Table 2 presents overall STI prevalences at baseline and STI incidences over a two-year period in the hypothetical population comparison venues; these values were used to assign whether or not an individual had any of the STIs at baseline or during the two-year follow-up period.
Table 2.
Overall Population STI Prevalences at Baseline and Incidences Over Two Years by STI for the Four Hypothetical Populations¹
| STI | Population A: Prevalence (Incidence) % | Population B: Prevalence (Incidence) % | Population C: Prevalence (Incidence) % | Population D: Prevalence (Incidence) % |
|---|---|---|---|---|
| Ct | 1 (1) | 5 (6) | 6 (5) | 10 (6) |
| Ng | 1 (1) | 2 (2) | 3 (3) | 4 (2) |
| HSV-2 | 30 (15) | 20 (5) | 15 (5) | 10 (4) |
| Tv | 20 (5) | 15 (18) | 9 (9) | 4 (5) |
| SYP | 1 (1) | 7 (2) | 5 (2) | 2 (1) |
| HIV | 15 (9) | 8 (3) | 5 (2) | 1 (1) |
| "Any STI" | 37 (20) | 34 (19) | 26 (13) | 25 (12) |
¹ Incidences in the table are for comparison venues.
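The paper does not state how the between-venue ICCs were induced. One common device, shown here purely as an assumption, is to draw each venue's true rate from a beta distribution whose mean is the population rate and whose parameters satisfy ICC = 1/(α + β + 1). The function name `venue_rates` and the chosen inputs (30% HSV-2 prevalence in Population A, ICC of .02) are illustrative.

```python
import random
from statistics import mean, pvariance


def venue_rates(p, icc, n_venues, rng):
    """Draw venue-level rates with mean p and intraclass correlation icc.

    Assumption (not from the paper): rates follow a beta distribution,
    for which var = p * (1 - p) * icc when alpha + beta = (1 - icc) / icc.
    """
    scale = (1.0 - icc) / icc  # alpha + beta
    alpha, beta = p * scale, (1.0 - p) * scale
    return [rng.betavariate(alpha, beta) for _ in range(n_venues)]


rng = random.Random(7)
# Many venues are drawn here only to show that the target moments are met.
rates = venue_rates(p=0.30, icc=0.02, n_venues=2000, rng=rng)
implied_icc = pvariance(rates) / (mean(rates) * (1.0 - mean(rates)))
```

With a realistic 24 to 40 venues per population, the empirical ICC of any single draw would scatter widely around the target, which is one reason the simulation repeats sampling many times.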
We assumed loss-to-follow-up rates in each venue of the hypothetical populations based on the age-sex distribution in each population (males <25 years of age, 20%; females <25 years, 15%; males ≥25 years, 15%; females ≥25 years, 10%). We then randomly assigned individuals in each venue of the hypothetical population, based on their age and sex, to be lost to follow-up. As mentioned above, the assigned ICCs, correlations, prevalences, incidences, and follow-up rates were based on the observational data in the C-POL Trial.
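The loss-to-follow-up assignment can be sketched as an independent random draw per participant at the rate for that participant's age-sex group. This is a minimal illustration under that independence assumption; the names `LTFU_RATE` and `assign_lost` and the record layout are hypothetical.

```python
import random

# Loss-to-follow-up rates by (sex, age group), as stated in the text.
LTFU_RATE = {
    ("M", "<25"): 0.20,
    ("F", "<25"): 0.15,
    ("M", ">=25"): 0.15,
    ("F", ">=25"): 0.10,
}


def assign_lost(participants, rng):
    """Return one boolean per participant; True means lost to follow-up."""
    return [rng.random() < LTFU_RATE[(p["sex"], p["age_group"])]
            for p in participants]


rng = random.Random(0)
cohort = [{"sex": "M", "age_group": "<25"} for _ in range(10000)]
lost = assign_lost(cohort, rng)
share_lost = sum(lost) / len(lost)  # close to 0.20 for young men
```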
Step Three
We blocked venues within the hypothetical populations into pairs based on “any STI” rates at baseline (the two venues with the highest baseline “any STI” rates in the first pair, etc.). Pairs were used as blocks to reduce variability and to maximize statistical power. For each matched pair of venues, we randomly assigned one to intervention (I) and one to comparison (C).
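The blocking and randomization in Step Three can be sketched as follows: rank venues by baseline “any STI” rate, pair neighbors in that ranking, and flip a fair coin within each pair. The function name and the venue identifiers are illustrative, and the sketch assumes an even number of venues (an unpaired last venue would simply be left out).

```python
import random


def pair_and_randomize(baseline_rates, rng):
    """Match venues into pairs by baseline rate and randomize within pairs.

    baseline_rates: dict mapping venue id -> baseline 'any STI' rate.
    Returns a list of (intervention_venue, comparison_venue) tuples,
    ordered from the highest-rate pair to the lowest-rate pair.
    """
    ranked = sorted(baseline_rates, key=baseline_rates.get, reverse=True)
    pairs = []
    for i in range(0, len(ranked) - 1, 2):
        first, second = ranked[i], ranked[i + 1]
        # Fair coin decides which venue of the pair receives the intervention.
        if rng.random() < 0.5:
            pairs.append((first, second))
        else:
            pairs.append((second, first))
    return pairs


rates = {"v1": 0.40, "v2": 0.10, "v3": 0.35, "v4": 0.12}
pairs = pair_and_randomize(rates, random.Random(1))
```

Here `v1` and `v3` (the two highest baseline rates) form the first pair, and `v4` and `v2` form the second.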
Step Four
Using the hypothetical populations, a random sample of participants was selected in each venue in each population. To determine how large a sample to draw in each venue, we did sample size calculations for each population (i.e., number of venues, number of participants per venue) based on (1) an assumed 30% effect size in the intervention venues; (2) ICCs observed in the C-POL Trial; and (3) the incidences assumed in the comparison venues for the endpoint “any STI” (7). The sample sizes assumed 80% power and a two-sided significance level of 0.05. (Note that in Population C the number of venues selected in the simulation was deliberately fewer than needed to obtain 80% power, so that we might observe the results when a sampled population was underpowered.) Based on those assumptions, the following sample sizes were calculated for each random sample: (1) Population A: 30 venues, 125 participants sampled in each venue, 3,750 sampled participants in total; (2) Population B: 24 venues, 150 participants sampled in each venue, 3,600 in total; (3) Population C: 30 venues, 150 participants sampled in each venue, 4,500 in total; (4) Population D: 40 venues, 100 participants sampled in each venue, 4,000 in total. Thus, for example, in Population A a random sample of 125 participants was selected from the 1,000 population members in each of the 30 venues, for a total of 3,750 participants (see Figure 1).
Fig. 1.
Schematic of simulation process in Country A.
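The Step Four sampling design can be checked and sketched in a few lines. The venue counts and per-venue sample sizes below are those listed in Step Four; the names `DESIGN` and `draw_sample`, and the use of simple random sampling without replacement within each venue, are assumptions made for illustration.

```python
import random

# (number of venues, participants sampled per venue), from Step Four.
DESIGN = {"A": (30, 125), "B": (24, 150), "C": (30, 150), "D": (40, 100)}
VENUE_SIZE = 1000  # eligible participants per venue in the hypothetical populations


def draw_sample(population, rng):
    """Return {venue index: sampled participant indices}, without replacement."""
    n_venues, per_venue = DESIGN[population]
    return {venue: rng.sample(range(VENUE_SIZE), per_venue)
            for venue in range(n_venues)}


# Total sampled participants per population (venues x participants per venue).
totals = {pop: n_venues * per_venue
          for pop, (n_venues, per_venue) in DESIGN.items()}
sample_a = draw_sample("A", random.Random(2024))
```

The computed totals reproduce the figures in the text: 3,750, 3,600, 4,500, and 4,000 for Populations A through D.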
Step Five
The statistics “any STI” as well as other statistics (HIV, HSV-2) were computed for each participant in the sample selected in step Four. We also computed the statistic “any viral STI” which was created to indicate participants who tested newly positive for HSV-2 or HIV over the 2-year follow-up period. If a participant was lost-to-follow-up, the statistics were missing for that participant. No imputation was performed.
Step Six
The incidence rates for all sampled participants in each venue in each population who had “any STI,” “a single STI” (HIV, HSV-2), or “any viral STI” over the 2 years were then computed. The overall test statistic for the sampled participants within a population was taken as the average of the differences of the intervention minus the comparison incidences across venue pairs, with equal weight given to each pair (e.g., in Population A the 15 incidence differences [where 15 was the number of intervention and comparison venue pairs] were computed and then averaged to get the overall test statistic) (see Figure 1) (6).
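The test statistic described in Step Six is the unweighted mean of within-pair incidence differences. A minimal sketch follows; the function names are illustrative, and `None` is used as a hypothetical marker for participants lost to follow-up, who are excluded because no imputation was performed.

```python
def incidence(outcomes):
    """Proportion positive among participants with a non-missing outcome."""
    observed = [o for o in outcomes if o is not None]
    return sum(observed) / len(observed)


def paired_statistic(pairs):
    """Mean of (intervention - comparison) incidence over venue pairs.

    pairs: list of (intervention_outcomes, comparison_outcomes), where each
    element is a list of True/False/None values for one venue.
    Every pair gets equal weight regardless of how many participants remain.
    """
    diffs = [incidence(i) - incidence(c) for i, c in pairs]
    return sum(diffs) / len(diffs)


example = [
    ([True, False, False, False], [True, True, False, False]),
    ([False, False, None, True], [True, False, True, None]),
]
stat = paired_statistic(example)  # negative: lower incidence under intervention
```

In the example, the pair differences are 0.25 − 0.50 and 1/3 − 2/3, so the statistic is −7/24. A venue with no observed outcomes would raise a division error and would need separate handling.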
Step Seven
A permutation test (described below) was used to determine the statistical significance of each test statistic in each population. This yielded a p-value for each test statistic in each population. The permutation test is based on the randomization of venues within venue pairs to the intervention or comparison condition. Statistical significance was computed by considering all possible values each test statistic could have taken by permuting the random assignment of venues within venue pairs. Under the null hypothesis of no difference between intervention and comparison conditions, the statistical significance of the observed results was taken as the rank of the observed statistic among the possible permutations. P-values were 2-sided (6, 8).
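Because swapping the assignment within a venue pair only flips the sign of that pair's difference, the permutation distribution can be enumerated exactly as all sign patterns of the pair differences. The sketch below illustrates that idea and is not the authors' implementation; the function name and the small numerical tolerance are assumptions. With 15 pairs there are 2^15 = 32,768 patterns, so full enumeration is cheap.

```python
from itertools import product


def permutation_p_value(pair_diffs):
    """Exact two-sided p-value for the mean of paired differences.

    Enumerates every within-pair reassignment (sign flip) and returns the
    share of permutations whose absolute mean difference is at least as
    large as the observed one (the observed assignment is included).
    """
    n = len(pair_diffs)
    observed = abs(sum(pair_diffs) / n)
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=n):
        permuted = abs(sum(s * d for s, d in zip(signs, pair_diffs)) / n)
        if permuted >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
        total += 1
    return extreme / total


p = permutation_p_value([-0.05, -0.04, -0.06, -0.03])
```

With only four pairs, all favoring the intervention, the smallest attainable two-sided p-value is 2/16 = 0.125, which illustrates why the number of venue pairs limits the attainable significance.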
Step Eight
Steps Four to Seven were then repeated 1,000 times (i.e., 1,000 different samples were drawn randomly from the venues in the hypothetical populations). This yielded 1,000 p-values in each hypothetical population for each test statistic so that these statistics could be compared. Figure 1 describes the simulation process for Population A; the populations could be different geographically or politically defined areas, such as a country, city, or school.
Step Nine
Steps Four to Eight were then repeated to observe how the test statistics performed (e.g., the number of times the test statistics detected a difference between intervention and comparison venues) under various scenarios (e.g., a decrease of 30% in the incidence of all six STIs in the intervention venues compared to no change in incidence in the comparison venues). The scenarios selected represented a wide range of possible intervention effects (e.g., STI incidence outcomes).
These steps permitted us to compare how each test statistic performed in detecting the difference between the intervention and comparison venues in each population assuming different effect sizes for the intervention. For example, for a particular intervention effect, if the “any STI” p-value was less than .05 (i.e., significant) in 900 cases out of 1,000 and an individual STI p-value was less than .05 in 700 cases out of 1,000, then the “any STI” statistic was significant 90% of the time compared to 70% of the time for the individual STI.
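The comparison described above amounts to computing, for each statistic, the percentage of replicate p-values below .05. A minimal sketch using the worked example from the text (the function name and the placeholder p-values are illustrative):

```python
def detection_rate(p_values, alpha=0.05):
    """Percentage of replicates in which the p-value falls below alpha."""
    return 100.0 * sum(p < alpha for p in p_values) / len(p_values)


# Worked example from the text: 900 and 700 significant results out of 1,000.
# The specific p-values (0.01, 0.20) are placeholders, not simulation output.
any_sti_p = [0.01] * 900 + [0.20] * 100
single_sti_p = [0.01] * 700 + [0.20] * 300
rates = (detection_rate(any_sti_p), detection_rate(single_sti_p))
```

This yields 90% for “any STI” and 70% for the individual STI, matching the example.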
The scenarios selected to compare the test statistics were:
1. a decrease of 30% in the incidence of all six STIs in the intervention venues compared to the incidences in the comparison venues given in Table 2;
2. a decrease of 20% in the incidence of all six STIs in the intervention venues compared to the incidences in the comparison venues given in Table 2;
3. a decrease of 30% in the incidence of all six STIs in the intervention venues compared to a 10% decrease in incidence of all six STIs in the comparison venues;
4. a decrease of 30% in the incidence of HSV-2 only in the intervention venues compared to the incidences in the comparison venues given in Table 2;
5. a decrease of 30% in the incidence of “any viral STI” only in the intervention venues compared to the incidences in the comparison venues given in Table 2.
Those five scenarios were selected to compare the different test statistics under different conditions which represent a wide range of possible STI incident outcomes. Scenario 3 was designed to examine the case in which there were decreases in incidence in both the intervention and comparison venues over the two-year intervention period, which is often the case in behavioral intervention trials.
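The five scenarios can be expressed as a small configuration applied to the comparison-venue incidences. The sketch below uses the Population A incidences from Table 2 and makes two labeled assumptions: reductions are applied multiplicatively to each affected STI, and scenario 5 is read as a 30% reduction in both HSV-2 and HIV incidence. The names `SCENARIOS` and `scenario_incidences` are illustrative.

```python
# Two-year comparison-venue incidences (%) for Population A, from Table 2.
BASE_A = {"Ct": 1, "Ng": 1, "HSV-2": 15, "Tv": 5, "SYP": 1, "HIV": 9}
ALL_STIS = tuple(BASE_A)
VIRAL = ("HSV-2", "HIV")

# scenario: (reduction in intervention venues,
#            reduction in comparison venues, affected STIs)
SCENARIOS = {
    1: (0.30, 0.00, ALL_STIS),
    2: (0.20, 0.00, ALL_STIS),
    3: (0.30, 0.10, ALL_STIS),
    4: (0.30, 0.00, ("HSV-2",)),
    5: (0.30, 0.00, VIRAL),  # assumption: both viral STIs are reduced
}


def scenario_incidences(base, scenario):
    """Return (intervention, comparison) incidence dicts for a scenario."""
    cut_i, cut_c, affected = SCENARIOS[scenario]
    intervention = {s: r * (1 - cut_i) if s in affected else r
                    for s, r in base.items()}
    comparison = {s: r * (1 - cut_c) if s in affected else r
                  for s, r in base.items()}
    return intervention, comparison


interv4, comp4 = scenario_incidences(BASE_A, 4)
interv3, comp3 = scenario_incidences(BASE_A, 3)
```

Under scenario 4, for instance, HSV-2 incidence in the intervention venues drops from 15% to 10.5% while every other incidence is unchanged.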
RESULTS
Table 3 summarizes the results of the 1,000 samples drawn from each of the four hypothetical populations for various intervention effect scenarios. In the table, the test statistic “any STI” is compared with the test statistics HSV-2, HIV, and “any viral STI.” HSV-2 was selected as one of the six STIs to study since it generally had the highest prevalence of any of the six STIs at baseline in the hypothetical populations (i.e., HSV-2 prevalence varied from 30% in Population A to 10% in Population D; see Table 2).
Table 3.
Percentage of time a Test Statistic detected a Significant Difference between Intervention (I) and Comparison(C) Venues for Different Intervention Effect Scenarios and Test Statistics (Any STI versus Any Viral STI, HSV-2, and HIV) by Population
| Intervention Incidence Effect Scenarios Over 2 Years | A: Any STI % | A: Any Viral STI % | A: HSV-2 % | A: HIV % | B: Any STI % | B: Any Viral STI % | B: HSV-2 % | B: HIV % | C: Any STI % | C: Any Viral STI % | C: HSV-2 % | C: HIV % | D: Any STI % | D: Any Viral STI % | D: HSV-2 % | D: HIV % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. Incidence of all 6 STIs decreases by 30% in I Venues¹ | 99.6 | 93.4 | 71.9 | 64.8 | 99.9 | 63.3 | 47.8 | 29.8 | 69.0 | 38.9 | 32.3 | 12.7 | 87.5 | 39.2 | 31.7 | 9.8 |
| 2. Incidence of all 6 STIs decreases by 20% in I Venues¹ | 90.7 | 64.0 | 40.2 | 8.1 | 97.9 | 15.2 | 7.8 | 15.5 | 14.4 | 14.6 | 6.9 | 4.8 | 66.3 | 18.5 | 14.1 | 5.3 |
| 3. Incidence of all 6 STIs decreases by 30% in I Venues and 10% in C Venues | 84.1 | 56.6 | 17.1 | 31.0 | 48.0 | 17.6 | 8.4 | 11.7 | 32.8 | 31.0 | 26.6 | 9.1 | 71.3 | 15.4 | 10.2 | 5.6 |
| 4. Only incidence of HSV-2 decreases by 30% in I Venues¹ | 61.4 | 36.3 | 70.3 | 0.1 | 65.5 | 23.8 | 42.8 | 3.6 | 5.4 | 25.2 | 33.4 | 2.9 | 8.7 | 30.1 | 34.1 | 3.0 |
| 5. Only incidence of "Any viral STI" decreases by 30% in I Venues¹ | 91.5 | 84.3 | 71.6 | 69.7 | 82.0 | 36.9 | 24.1 | 17.9 | 9.8 | 38.6 | 37.2 | 12.3 | 14.4 | 38.7 | 29.6 | 9.6 |
Columns are grouped by Population (A, B, C, D).
¹ Assumed the incidence in comparison venues did not decrease from the values given in Table 2.
The percentage of time a difference between intervention and comparison venues was detected under the various intervention scenarios was almost always higher for the “any STI” statistic. For example, in Population A, if the incidence of all six STIs was reduced by 30% over the 2-year follow-up period, the “any STI” statistic detected this difference (i.e., p-value less than .05) 99.6% of the time, while the statistics HSV-2, HIV, and “any viral STI” detected it only 71.9%, 64.8%, and 93.4% of the time, respectively. Similarly, in Population B, if the incidence of all STIs was reduced by 20% over the 2-year follow-up period, the “any STI” statistic detected the difference 97.9% of the time, while the statistics HSV-2, HIV, and “any viral STI” detected it only 7.8%, 15.5%, and 15.2% of the time, respectively. Even in Population C, where the number of venues (i.e., 30) was not adequate to pick up reductions in the intervention venues 80% of the time (recall that we purposely underpowered Population C), the “any STI” statistic detected the effects a higher percentage of the time than the other statistics for scenarios 1 and 3, and was essentially tied with “any viral STI” for scenario 2 (14.4% versus 14.6%). In general, in populations where the STI prevalence and incidence rates were relatively high (i.e., Populations A and B), the “any STI” statistic was superior for all scenarios in Table 3 except scenario 4 in Population A, where HSV-2 alone detected the effect more often (70.3% versus 61.4%). In populations where the STI rates were relatively low (i.e., Populations C and D), the “any STI” statistic was superior or comparable for scenarios 1, 2, and 3, but not for scenarios 4 and 5, in which detection rates were less than 50% for all statistics.
DISCUSSION
Using observational data from a large multi-country behavioral intervention trial, we simulated the performance of a composite biological outcome versus single and dual biological outcomes. In almost all instances the composite biological outcome was more likely to detect an intervention effect when compared with a single or dual outcome. We have shown how a composite STI outcome can contribute evidence towards determining the efficacy of an HIV/STI prevention intervention.
In many settings, the advantages of using a composite STI outcome are substantial:
The composite STI endpoint has much greater power than individual STIs to detect intervention effects, thereby allowing for a reasonable sample size both within and across populations and permitting a shorter follow-up period, which increases study feasibility. Of course, intervention effects for individual STIs may also be examined as secondary endpoints but power is limited in these cases.
The composite STI endpoint allows all the STI data within a study population to be used. This is important because there may be varying degrees of missing data across STIs and the composite statistic permits researchers to measure an impact on the whole community (not only a subset as does a single STI). Also, the primary biological endpoint can be used across populations even though the populations might have different patterns of prevalence for the various STIs. Thus, there is one primary outcome rather than several as with individual STIs.
The composite STI endpoint also permits the same biological endpoint (a single composite measure) to be used for both men and women.
Results are more generalizable and might be more robust, if several STIs are included in a biological endpoint rather than a single STI.
Because the specific impact of a behavioral risk reduction intervention might not be known and the intervention may affect a variety of different sexual risk behaviors—type of sex, partner frequency, concurrent partnerships, condom use, etc.—a composite STD endpoint provides for a more inclusive measure to determine possible impact.
The strengths of a composite biological endpoint outweigh several possible limitations including:
Large-scale intervention trials focused on reducing HIV infection by treating STIs have delivered mixed results, suggesting that the biologic and epidemiologic relationships between STIs and HIV infection are extremely complex. The evidence provided by a composite STI endpoint depends on the pathogenesis, treatment, and transmission dynamics of specific STIs or HIV infection, the behavioral characteristics of the population, and the stage of the HIV epidemic. Therefore, there may be concerns regarding external validity when using a composite biological endpoint that incorporates different types of STIs (e.g., bacterial, protozoan, and viral).
A composite STI outcome may not adequately consider the role of condom use in preventing the transmission of some STIs and not others, as well as the sexual behavior of the individuals. When condoms are used correctly and consistently, they are highly effective in reducing the incidence of HIV and some STIs (e.g., gonorrhea and chlamydia), but they may not be as effective in decreasing HSV-2 transmission, if viral shedding occurs in areas not covered by the condom.
The simulations in this report were based on data from a sample of men and women recruited for a study in four countries using selection criteria based on high prevalence of HIV-related risk behaviors and STIs in the population. While those data are not as representative as census data, they are representative of the high-risk populations outside the United States that would be recruited for HIV/STI clinical trials.
The cost of a study and its outcome measure is always a critical factor in designing a study. However, because specimen collection for most STIs is similar (blood or genital secretions) and the cost of specimen testing is incremental (multiplex testing may be used to conduct multiple tests on the same clinical sample), including additional study outcomes does not substantially increase study cost. Treatment for most curable STIs, such as gonorrhea, chlamydia, and trichomoniasis, consists of single-dose generic antibiotics costing in the range of $1 to $5 per treatment. The treatment of STIs is a high public health priority in many under-resourced countries because they recognize the role of STIs in HIV transmission and in complications of childbirth.
The results of this simulation demonstrate that it is feasible to use multiple biological outcomes in behavioral intervention trials to reduce HIV infection, thereby strengthening the quality and interpretability of HIV/STI prevention research and ensuring more robust outcomes. Even sites that have different patterns of STIs can be included in a multi-site study. Furthermore, the biological outcome data from men and women can be used together. Having both behavioral and biological outcomes provides complementary information on the efficacy of a program. Our simulation study showed how a composite biological outcome might vary in different epidemiologic settings and was generally superior to single or dual biological outcomes. These findings should permit more investigators to incorporate biological outcomes in behavioral intervention trials for HIV/STI prevention.
Acknowledgments
This study was funded by the National Institute of Mental Health.
REFERENCES
- 1. Cordoba G, Schwartz L, Woloshin S, et al. Definition, reporting, and interpretation of composite outcomes in clinical trials: systematic review. BMJ. 2010;341:c3920. doi: 10.1136/bmj.c3920.
- 2. Kamb ML, Fishbein M, Douglas JM, et al. Efficacy of risk-reduction counseling to prevent human immunodeficiency virus and sexually transmitted diseases: a randomized controlled trial. Project RESPECT Study Group. JAMA. 1998;280(13):1161–1167. doi: 10.1001/jama.280.13.1161.
- 3. Fishbein M, Pequegnat W. Evaluating AIDS prevention interventions using behavioral and biological outcome measures. Sex Transm Dis. 2000;27(2):101–110. doi: 10.1097/00007435-200002000-00008.
- 4. Garnett G. The transmission dynamics of sexually transmitted infections. In: Holmes KK, Sparling PF, Stamm WE, et al., editors. STD. 4th edition. New York: McGraw Hill, Inc.; 2008.
- 5. NIMH Collaborative HIV/STD Prevention Trial Group. Methodological overview of a five-country community-level HIV/sexually transmitted disease prevention trial. AIDS. 2007;21(suppl 2):S3–S18. doi: 10.1097/01.aids.0000266453.18644.27.
- 6. NIMH Collaborative HIV/STD Prevention Trial Group. Results of the NIMH Collaborative HIV/Sexually Transmitted Disease Prevention Trial of a Community Popular Opinion Leader Intervention. J Acquir Immune Defic Syndr. 2010;54(2):204–214. doi: 10.1097/QAI.0b013e3181d61def.
- 7. Murray DM, Hannan PJ. Planning for the appropriate analysis in school-based drug-use prevention studies. J Consult Clin Psychol. 1990;58:458–468. doi: 10.1037//0022-006x.58.4.458.
- 8. Gail MH, Carroll RJ, Green SB, et al. On design considerations and randomization-based inference for community intervention trials. Stat Med. 1996;15:1069–1092. doi: 10.1002/(SICI)1097-0258(19960615)15:11<1069::AID-SIM220>3.0.CO;2-Q.

