Abstract
Objective
To estimate how results would have varied if a substance abuse clinical trial had been conducted with nationally representative adults with substance use and with representative adults receiving substance use treatment.
Method
Results were analyzed from a multisite clinical trial comparing the effectiveness of the Therapeutic Education System (TES) to treatment-as-usual for outpatient addiction treatment (n=507). Abstinence in the last four weeks of treatment was the primary outcome. The general population sample and the general population treated sample were derived from Wave 1 of the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) (n=43,093). Propensity scores provided a standardized measure of the difference between clinical trial participants and the two NESARC samples. The clinical trial was reanalyzed by reweighting the sample with propensity scores derived for the two samples to obtain generalizable estimates of treatment effects.
Results
Prior to reweighting, the Odds Ratio (OR) of response to TES versus treatment-as-usual in the trial was 1.62 (95% CI=1.12–2.35). After reweighting the sample to be representative of the two groups, ORs were 1.33 (95% CI=0.34–5.26) for the representative sample with any substance use and 1.64 (95% CI=0.82–3.27) for the representative treated sample.
Conclusion
Applying propensity score weighting to clinical trial results provides a method for estimating the population generalizability of clinical trial findings that relies on effect moderators observed in the study sample and population. Broader confidence intervals in the reweighted samples do not necessarily indicate lack of efficacy of TES, but rather greater uncertainty concerning effectiveness in general population samples.
The separation of clinical research from practice raises concerns over whether research results can truly inform practice.1 Because most efficacy studies use stringent selection criteria, study participants are relatively homogeneous.2 When study participants substantially differ from target populations with the disorder, however, trial-based estimates of treatment effectiveness may not directly translate into clinical practice.3 Beyond documenting the extent to which trial participants represent target populations, clinical policy makers and health planners want to know the likely effectiveness of experimental treatments in target populations. We illustrate a new method to estimate from clinical trial results the effectiveness of interventions in target populations.4–6
Several studies suggest that clinical trials in psychiatry, which by design exclude 60% to 85% of individuals with the target disorder, have limited generalizability.4,7,8 Given concerns over the representativeness of trial participants in clinical psychiatric research, we selected a recent randomized controlled trial of a behavioral intervention for substance use disorders9,10 to illustrate the estimation of treatment effectiveness in target populations. This ten-site clinical trial evaluated the effectiveness of the Therapeutic Education System (TES), a 12-week web-based behavioral intervention that includes motivational incentives, for adults with substance use disorders. The study compared the outcomes of subjects assigned to either TES plus treatment-as-usual (n=255) or treatment-as-usual alone (n=252).
To assess the generalizability of the results, we drew on a nationally representative epidemiological study of psychiatric and substance use disorders11 and applied propensity score methods to participants from the TES trial. We first evaluated differences between the clinical trial participants and the nationally representative target population with substance users as well as a subgroup who sought treatment. We then applied propensity scores to estimate how the clinical trial results would have varied had the study been conducted in this nationally representative sample. The effect size estimates using the nationally representative sample were then compared with the effect size estimate in the clinical trial sample. The generalizability of this approach relies on the extent to which it is possible to measure and adjust for the intervention effect moderators in the study sample and population sample.
Method
Clinical Trial Sample
Treatment-as-usual included a minimum of two hours of face-to-face therapeutic group or individual sessions per week. TES consists of 62 computer-interactive, multimedia modules delivered at the clinic sites, covering skills for achieving and maintaining abstinence, and prize-based motivational incentives contingent on abstinence and treatment adherence. Patients, who were recruited between June 2010 and August 2011, were eligible if they were: (1) 18 or older; (2) using illicit substances in the 30 days prior to baseline (or 60 days if the patient was exiting a controlled environment), a criterion intended to exclude participants with alcohol use disorders only; (3) within 30 days of entering the treatment episode; and (4) planning to remain in the area and treatment program for ≥ 3 months. Patients were excluded if they were: (1) prescribed opioid replacement therapy or (2) unable to provide informed consent. The study was approved by the Institutional Review Boards of all the participating sites. The primary outcome was abstinence from drugs and drinking in the last 4 weeks of treatment as measured by twice weekly urine drug screens and self-reports. Generalized estimating equations were used to adjust for the correlation of half-weeks within patients.
General Population Sample
We used as the general population sample Wave 1 of the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC). The NESARC is a nationally representative survey of the adult population of the United States conducted by the National Institute on Alcohol Abuse and Alcoholism (NIAAA) that has been described in detail elsewhere.12–15 The target population was the civilian noninstitutionalized population, 18 years and older, residing in households and group quarters in the United States. Face-to-face interviews were conducted with 43,093 respondents. The survey response rate was 81%. Blacks, Hispanics, and young adults (ages 18–24 years) were oversampled, with data adjusted for oversampling and nonresponse. The weighted data were then adjusted to represent the US civilian population based on the 2000 census. DSM-IV diagnoses were assessed with the Alcohol Use Disorder and Associated Disabilities Interview Schedule-DSM-IV Version (AUDADIS-IV),16 a fully structured diagnostic interview for non-clinician interviewers. The high reliability and validity of the AUDADIS substance use disorder diagnoses (κ = 0.70–0.94) have been demonstrated in numerous clinical and general population studies in the U.S. and abroad.17–22 The NESARC has been previously used by our group and others to estimate the a priori generalizability of clinical trials of several psychiatric disorders.4–6,23–29 The NESARC research protocol received approval from the Institutional Review Boards of the US Office of Management and Budget and the US Census Bureau.
Statistical Analyses
In the analysis of clinical trials, each participant is generally considered equally important in estimating the efficacy of the intervention and given a weight of one. However, because participants in clinical trials may not represent the target population with the disorder, it is informative to reweight the clinical trial sample to better approximate the distribution of demographic and clinical characteristics of the target population.30–32 Thus, we proceeded in two steps. First, we reweighted the sample, and then we repeated the original analyses using this reweighted sample.
One way to achieve this first step, which involves reweighting the clinical sample, is through the use of propensity scores.33 Specifically, the propensity score is the probability of membership in a particular target population for each individual in the clinical sample as a function of their baseline demographic and clinical characteristics. In this case, the propensity score provides an estimate of the probability that participants in the clinical trial would have been randomly selected from a representative sample of the target population. We focused our first analysis on individuals with substance use and our second analysis on those who had sought treatment in the previous year.34,35
Separate logistic regressions (one for each of these two target populations) were used to obtain the propensity scores. Each regression combined all individuals from the corresponding NESARC target population with all individuals from the clinical trial. The outcome was set to 0 if the individual was from the clinical trial and 1 if the individual was from the NESARC target population. Predictors included all demographic and clinical characteristics available in both the NESARC and clinical trial data sets (Table 1). All statistically significant two-way interactions were also included. Sampling weights from the NESARC were used, and weights for individuals from the clinical trial were fixed at one in the logistic regression. The analyses were conducted with SUDAAN 11 (RTI International) to take into account the complex design of the NESARC.
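The propensity model described above can be written compactly as follows. This is only a sketch of the setup as the text describes it: the symbols (S, x, ω, β, e) are notation introduced here for illustration and do not appear in the original analysis, and the design-based variance estimation performed by SUDAAN is not represented.

```latex
% S_i = 1 if individual i comes from the NESARC target population, 0 if from the trial
% x_i = baseline demographic and clinical covariates, plus any retained two-way interactions
% \omega_i = NESARC sampling weight when S_i = 1, and 1 when S_i = 0
\begin{align}
e(x_i) &= \Pr(S_i = 1 \mid x_i)
        = \frac{1}{1 + \exp\{-(\beta_0 + \beta^{\top} x_i)\}} \\
\ell(\beta_0, \beta) &= \sum_i \omega_i
        \left[ S_i \log e(x_i) + (1 - S_i) \log\{1 - e(x_i)\} \right]
\end{align}
```

Under this notation, the fitted value of e(x_i) for each trial participant is the propensity score used in the reweighting step, and one such model is fit for each of the two target populations.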
Table 1.
Characteristic | Original CTN Sample (N=507) | | NESARC Treatment Seeking Substance Users (N=183) | | Original CTN sample versus NESARC Treatment Seeking Substance Users (p-value)a | CTN Reweighted to approximate NESARC Treatment Seeking Substance Usersb (N=507) | | Original CTN sample versus CTN sample reweighted to approximate NESARC Treatment Seeking Substance Users (p-value)a |
---|---|---|---|---|---|---|---|---|
| Mean | SD | Mean | SD | | Weighted Mean | Weighted SD | |
Age (years) | 34.90 | 10.90 | 35.52 | 11.75 | 0.52 | 36.25 | 10.47 | 0.46 |
| N | % | N | % | | Weighted N | Weighted % | |
Female (%) | 192 | 37.90 | 75 | 39.29 | 0.75 | 172.00 | 34.14 | 0.21 |
Race (%) | | | | | <0.0001 | | | 0.80 |
White | 284 | 56.00 | 133 | 73.65 | | 372.59 | 74.07 | |
Black/African American | 116 | 22.90 | 33 | 14.52 | | 79.61 | 15.83 | |
American Indian/Alaska Native | 3 | 0.60 | 5 | 4.54 | | 11.49 | 2.28 | |
Asian | 13 | 2.60 | 1 | 0.76 | | 3.19 | 0.63 | |
Native Hawaiian/Pacific Islander | 12 | 2.40 | 2 | 1.64 | | 12.92 | 2.57 | |
Multi-racial | 54 | 10.70 | 9 | 4.89 | | 23.20 | 4.61 | |
Other | 23 | 4.50 | | | | | | |
Hispanic/Latino (%) | 55 | 10.80 | 34 | 14.10 | 0.24 | 43.08 | 8.56 | 0.03 |
Education (%) | | | | | <0.0001 | | | 0.77 |
< High School Degree | 118 | 23.30 | 42 | 24.17 | | 109.48 | 21.76 | |
High School Degree/GED | 310 | 61.10 | 59 | 30.44 | | 152.72 | 30.36 | |
> High School Degree | 79 | 15.60 | 82 | 45.39 | | 240.81 | 47.87 | |
Marital Status (%) | | | | | <0.0001 | | | 0.77 |
Single/Never Married | 308 | 60.70 | 75 | 37.79 | | 176.21 | 35.03 | |
Married/Remarried | 72 | 14.20 | 53 | 36.12 | | 184.49 | 36.68 | |
Separated/Divorced/Widowed | 127 | 25.00 | 55 | 26.09 | | 142.30 | 28.29 | |
Underemployed (%) | 190 | 37.50 | 80 | 43.22 | 0.17 | 226.82 | 45.09 | 0.66 |
Substance Dependence (%) | | | | | | | | |
Alcohol | 224 | 44.20 | 85 | 46.10 | 0.66 | 274.29 | 54.53 | 0.05 |
Cocaine | 177 | 34.90 | 19 | 10.10 | <0.0001 | 56.49 | 11.23 | 0.67 |
Stimulants | 100 | 19.70 | 12 | 7.04 | <0.0001 | 39.82 | 7.92 | 0.70 |
Cannabis | 146 | 28.80 | 23 | 12.50 | <0.0001 | 60.67 | 12.06 | 0.88 |
Opiates | 158 | 31.20 | 19 | 10.46 | <0.0001 | 54.81 | 10.90 | 0.87 |
Other | 41 | 8.10 | 23 | 12.28 | 0.09 | 57.90 | 11.51 | 0.78 |
a Based on chi-square and t-tests as appropriate.
b Weighting was done using propensity score weighting (see text for details).
Once the logistic regression models were fit for each target population, a propensity score was calculated for each individual in the clinical trial corresponding to the predicted probability of being in each target population. The inverse of the propensity score was used as a weight to rescale the clinical sample. This resulted in multivariate distributions of demographic and clinical characteristics that were similar to each of the target populations. We normalized the propensity score weights so that the sum of the weights would be identical to the sample size of the clinical trial.
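The weighting and normalization steps in this paragraph can be sketched in a few lines of Python. This is a minimal illustration that takes the text's description literally (raw weight equal to the inverse of the propensity score), not the authors' SUDAAN code; the function names and the example scores are hypothetical. The raw-weight step is kept in its own function so that a different weighting convention could be substituted without touching the normalization.

```python
def inverse_score_weights(propensity_scores):
    """Raw weights: the inverse of each propensity score, as the text describes.

    Assumption: scores are probabilities strictly greater than 0 and at most 1.
    """
    for p in propensity_scores:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"propensity score out of range: {p}")
    return [1.0 / p for p in propensity_scores]


def normalize_weights(raw_weights):
    """Rescale weights so that they sum to the number of trial participants."""
    n = len(raw_weights)
    total = sum(raw_weights)
    return [w * n / total for w in raw_weights]


# Hypothetical propensity scores for four trial participants.
scores = [0.10, 0.25, 0.50, 0.80]
weights = normalize_weights(inverse_score_weights(scores))

# After normalization the weights sum to the sample size (here 4),
# so the reweighted sample keeps the nominal size of the trial.
print(weights, sum(weights))
```

Normalizing in this way changes only the scale of the weights, not their relative sizes, which is why the reweighted columns of Tables 1 and 2 still report N=507.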
In the second step of our approach, we replicated the original analyses of the clinical trial using the reweighted clinical sample. In this analysis, the propensity score weights applied to the clinical trial data and the analytic model together provide an estimate of the effectiveness of the treatment in the target population. Specifically, a longitudinal logistic regression model was used to obtain the odds ratio and 95% confidence interval of abstinence during the last 4 weeks of the treatment phase in the treatment versus control group. We obtained two summary measures of the difference between the clinical trial sample and the target NESARC samples: 1) the standardized difference (i.e., the mean difference divided by the overall standard deviation) in the propensity scores between the clinical trial sample and the target populations;36 and 2) the overlap of the distributions of propensity scores of the clinical trial sample and the target populations.37 The first measure provides an estimate of the mean difference in the values of the propensity scores in the clinical sample versus the general population sample, whereas the second measure is more focused on the overall distributions and may be less sensitive to the effect of extreme values.
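The two summary measures can be illustrated with a short Python sketch. Several choices here are assumptions rather than details given in the text: the "overall" standard deviation is taken over the two samples combined, the overlap is computed as a histogram-based overlap coefficient with ten equal-width bins, survey weights are ignored, and the function names and example scores are hypothetical. The cited methods may define either measure differently.

```python
from statistics import mean, pstdev


def standardized_difference(trial_scores, population_scores):
    """Absolute mean difference in propensity scores divided by the SD of the
    combined samples (one possible reading of 'overall standard deviation')."""
    overall_sd = pstdev(list(trial_scores) + list(population_scores))
    if overall_sd == 0:
        raise ValueError("propensity scores have zero variance")
    return abs(mean(trial_scores) - mean(population_scores)) / overall_sd


def histogram_overlap(trial_scores, population_scores, bins=10):
    """Overlap coefficient: sum over bins of the smaller of the two bin proportions.

    Returns 1.0 for identical distributions and 0.0 for disjoint ones.
    Scores are assumed to lie in [0, 1].
    """
    def proportions(scores):
        counts = [0] * bins
        for s in scores:
            index = min(int(s * bins), bins - 1)  # a score of 1.0 goes in the last bin
            counts[index] += 1
        return [c / len(scores) for c in counts]

    trial_p = proportions(trial_scores)
    population_p = proportions(population_scores)
    return sum(min(a, b) for a, b in zip(trial_p, population_p))


# Hypothetical propensity scores for a few trial and population members.
trial = [0.15, 0.25, 0.35, 0.55]
population = [0.35, 0.55, 0.75, 0.85]
print(standardized_difference(trial, population))
print(histogram_overlap(trial, population))
```

The contrast between the two functions mirrors the point made in the text: the standardized difference depends on the means and so can be moved by a few extreme scores, whereas the overlap compares the whole distributions bin by bin.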
Results
Prior to applying the propensity score weights, individuals in the clinical trial tended to be older, less likely to be white, and to have lower educational attainment than the NESARC sample of all individuals with substance use regardless of treatment-seeking behavior. They were more likely to be single, underemployed, and dependent on all substances. When the sample was narrowed to those seeking treatment in the past year, individuals in the clinical trial were less likely than treatment-seeking individuals in the NESARC to be white, to have more than a high school education, and to be married. They were more likely to be single and to be dependent on cocaine, stimulants, cannabis and opiates (Table 1).
The standardized difference in propensity scores between the clinical trial sample and the nationally representative treatment-seeking sample was 1.4, whereas the difference with the nationally representative sample of substance users that included respondents without respect to past year treatment was 2.1. The overlaps in the propensity score distributions between the clinical trial sample and the nationally representative samples of treatment-seekers and substance users were 0.86 and 0.73, respectively (see Supplement). After applying the propensity score weights, each reweighted sample had a distribution that more closely resembled the target NESARC subsample (Table 2).
Table 2.
Characteristic | Original CTN Sample (N=507) | | NESARC Substance Users (N=2461) | | Original CTN sample versus NESARC Substance Users (p-value)a | CTN reweighted to approximate NESARC Substance Usersb (N=507) | | Original CTN sample versus CTN sample reweighted to approximate NESARC Substance Users (p-value)a |
---|---|---|---|---|---|---|---|---|
| Mean | SD | Mean | SD | | Weighted Mean | Weighted SD | |
Age (years) | 34.90 | 10.90 | 33.23 | 13.39 | 0.0027 | 33.38 | 10.79 | 0.78 |
| N | % | N | % | | Weighted N | Weighted % | |
Female (%) | 192 | 37.90 | 1095 | 39.91 | 0.41 | 265.00 | 52.59 | <0.0001 |
Race (%) | | | | | <0.0001 | | | <0.0001 |
White | 284 | 56.00 | 1845 | 80.27 | | 366.92 | 72.95 | |
Black/African American | 116 | 22.90 | 409 | 10.89 | | 115.06 | 22.87 | |
American Indian/Alaska Native | 3 | 0.60 | 34 | 1.37 | | 5.54 | 1.10 | |
Asian | 13 | 2.60 | 45 | 2.36 | | 2.41 | 0.48 | |
Native Hawaiian/Pacific Islander | 12 | 2.40 | 18 | 0.71 | | 1.35 | 0.27 | |
Multi-racial | 54 | 10.70 | 110 | 4.40 | | 11.73 | 2.33 | |
Other | 23 | 4.50 | | | | | | |
Hispanic/Latino (%) | 55 | 10.80 | 420 | 9.71 | 0.43 | 13.55 | 2.69 | <0.0001 |
Education (%) | | | | | <0.0001 | | | <0.0001 |
< High School Degree | 118 | 23.30 | 413 | 15.58 | | 162.35 | 32.28 | |
High School Degree | 310 | 61.10 | 701 | 29.34 | | 151.30 | 30.08 | |
> High School Degree | 79 | 15.60 | 1347 | 55.08 | | 189.35 | 37.64 | |
Marital Status (%) | | | | | <0.0001 | | | 0.0013 |
Single/Never Married | 308 | 60.70 | 1158 | 45.77 | | 194.84 | 38.74 | |
Married/Remarried | 72 | 14.20 | 783 | 38.36 | | 236.59 | 47.04 | |
Separated/Divorced/Widowed | 127 | 25.00 | 520 | 15.88 | | 71.57 | 14.23 | |
Underemployed (%) | 190 | 37.50 | 793 | 31.36 | 0.01 | 129.66 | 25.78 | 0.01 |
Substance Dependence (%) | | | | | | | | |
Alcohol | 224 | 44.20 | 538 | 24.46 | <0.0001 | 101.81 | 20.24 | 0.04 |
Cocaine | 177 | 34.90 | 49 | 2.17 | <0.0001 | 13.80 | 2.74 | 0.43 |
Stimulants | 100 | 19.70 | 27 | 1.13 | <0.0001 | 5.10 | 1.01 | 0.82 |
Cannabis | 146 | 28.80 | 133 | 5.23 | <0.0001 | 19.13 | 3.80 | 0.18 |
Opiates | 158 | 31.20 | 41 | 1.83 | <0.0001 | 7.42 | 1.48 | 0.58 |
Other | 41 | 8.10 | 49 | 1.76 | <0.0001 | 9.06 | 1.80 | 0.95 |
a Based on chi-square and t-tests as appropriate.
b Weighting was done using propensity score weighting (see text for details).
Prior to reweighting the sample, the OR of response to TES versus treatment-as-usual was 1.62 (95% CI=1.12–2.35), as previously reported.9 Interactions between background characteristics and study group assignment on the 8 week abstinence outcome are presented in Supplemental Table 1. After reweighting the sample to be representative of those who had sought treatment in the previous year, the OR was 1.64 (95% CI=0.82–3.27). The corresponding OR obtained after reweighting the sample to be representative of individuals with substance use without respect to past year treatment was 1.33 (95% CI=0.34–5.26) (Table 3).
Table 3.
Sample | Treatment Effect (TES vs TAU): OR | 95% C.I. | p |
---|---|---|---|
Original CTN sample | 1.62 | 1.12–2.35 | 0.01 |
CTN reweighted to approximate NESARC treatment-seeking substance users | 1.64 | 0.82–3.27 | 0.16 |
CTN reweighted to approximate NESARC substance users | 1.33 | 0.34–5.26 | 0.68 |
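As a rough consistency check on Table 3, the standard error of each log odds ratio can be backed out of its reported confidence interval. The sketch below assumes Wald-type intervals that are symmetric on the log-odds scale; the text does not state how the intervals were constructed, so this is an assumption, and the function name is hypothetical.

```python
from math import log
from statistics import NormalDist


def log_or_summary(odds_ratio, ci_lower, ci_upper, level=0.95):
    """Recover SE(log OR), the z statistic, and a two-sided p-value from an OR
    and its confidence interval, assuming a symmetric Wald interval on the log scale."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)       # about 1.96 for 95%
    se = (log(ci_upper) - log(ci_lower)) / (2 * z_crit)  # half-width / z_crit
    z = log(odds_ratio) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p


# Values copied from Table 3.
rows = {
    "original": (1.62, 1.12, 2.35),
    "treatment_seeking": (1.64, 0.82, 3.27),
    "substance_users": (1.33, 0.34, 5.26),
}
results = {name: log_or_summary(*values) for name, values in rows.items()}

for name, (se, z, p) in results.items():
    print(f"{name}: SE(log OR)={se:.3f}, z={z:.2f}, p={p:.2f}")

# Ratio of variances of the log OR, reweighted versus original sample.
variance_ratio = (results["treatment_seeking"][0] / results["original"][0]) ** 2
print(f"variance ratio (treatment-seeking vs original): {variance_ratio:.2f}")
```

Under this assumption the recomputed p-values round to the values reported in Table 3, and the standard errors make explicit that the reweighted estimates are considerably less precise than the original one even where the point estimate barely moves.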
Discussion
Concerns over the representativeness of clinical trials raise questions about the generalizability of their results to populations of clinical and policy interest. In this study, we used propensity scores to obtain a standardized measure of the difference between participants in a recent clinical trial and a nationally representative sample of individuals who had used illicit substances and sought treatment in the prior year. The clinical trial sample was reweighted with propensity score weights to make the distribution of baseline characteristics resemble the nationally representative sample and then reanalyzed using those weights to obtain generalizable estimates of the treatment effects. The point estimate of this reweighted treatment-seeking sample was very similar to the estimate of effect derived from the original (i.e., unweighted) clinical trial data, although its confidence interval was broader. The estimates of efficacy for the intervention were substantially lower when reweighting the clinical trial sample by a nationally representative sample of individuals with substance use regardless of treatment-seeking behavior.
To our knowledge, this is the first study to use propensity scores to reweight the results of a clinical trial for the treatment of a mental disorder. In accord with prior work, prior to reweighting, the composition of the clinical trial participants differed from the community target populations.4,7,8 As a result, the standardized differences between the clinical trial sample and both nationally representative samples were greater than the recommended upper limit (0.25–0.50 standard deviations) for observational studies,38 although closer to the 0.73 standard deviations found in the only other study36 that, to our knowledge, has used similar methods. Furthermore, the overlap between the propensity score distributions was 0.73 for the substance use sample, i.e., close to the recommended 0.80, and 0.86 for the treatment-seeking sample. Reflecting their larger differences, it was harder to balance covariates between the clinical trial and community substance use samples than between the clinical trial and community treatment-seeking samples, even after reweighting. Our findings suggest that randomized trials, which have relatively small sample sizes and rely on the participation of volunteers, face challenges recruiting representative samples. Reweighting may partially, but not fully, compensate for incomplete representativeness of clinical trial samples. As more investigators apply propensity scores to examine the generalizability of clinical trials, it may be possible to calibrate the range of standardized differences between clinical samples and target populations and to examine the impact of these differences on the generalizability of study results.
Differences in the composition of the clinical trial sample and the target populations support the need to estimate the effectiveness of interventions in various target populations. The use of propensity scores to reweight clinical trial samples offers a new approach to obtain these estimates. In our study, the effectiveness of the intervention varied with the target population. The point estimate of the nationally representative treatment-seeking sample was 1.64, very close to the estimate in the unweighted clinical trial sample, suggesting that in this case the clinical trial sample provided a reasonable estimate of effectiveness of the intervention in the target population of interest at the national level (i.e., treatment-seeking individuals who used illicit drugs). By contrast, the point estimate in the nationally representative sample that included individuals without respect to recent treatment-seeking behavior was 1.33, an almost 20% difference compared to the treatment-seeking sample. The variation in estimates is in accord with prior work demonstrating that results of clinical trials are highly sensitive to their inclusion and exclusion criteria and that the effectiveness of an intervention depends on the target population.39–41 Our findings illustrate the importance of carefully selecting the eligibility criteria when planning clinical trials and of defining the target population of interest.
Although the effect size of the original TES study, 1.62 (95% CI=1.12–2.35), was statistically significant, the estimated effect size for the nationally representative treatment-seeking sample, 1.64 (95% CI=0.82–3.27), was not. The wider confidence interval of the nationally representative treatment-seeking sample is the result of the variance inflation generated by the differences between the clinical trial sample and the nationally representative sample and the need to apply propensity score weights, particularly the larger weights, to recalibrate the clinical trial sample. The wider confidence intervals in the nationally representative sample do not necessarily mean that TES is not efficacious for the treatment of substance use disorders. Instead, the width of the confidence intervals reflects increased uncertainty associated with extrapolating the results of the clinical trial sample to the broader populations. This increase in uncertainty may help to explain variations in effectiveness experienced by clinicians who apply clinical interventions to patients from populations that are more heterogeneous (i.e., have greater variability in their treatment response) than those in which the intervention was originally tested. An increase in this variability and uncertainty associated with differences in the composition of the study populations may contribute to challenges in reproducing clinical and basic research findings.42 By minimizing the variance inflation associated with reweighting the study sample, recruitment of more representative samples may help narrow confidence intervals of the estimates of effectiveness of the intervention in the target population.
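One common way to quantify the variance inflation caused by unequal weights is Kish's approximate effective sample size, n_eff = (Σw)² / Σw². This is a standard survey-sampling approximation offered here for illustration; it is not a calculation reported in the study, and the function name and example weights below are hypothetical.

```python
def kish_effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights).

    Equals the number of observations when all weights are equal and
    shrinks as the weights become more variable.
    """
    total = sum(weights)
    total_of_squares = sum(w * w for w in weights)
    return total * total / total_of_squares


# With equal weights, nothing is lost: 507 participants count as 507.
equal = [1.0] * 507
print(kish_effective_sample_size(equal))

# Hypothetical unequal weights that still sum to the sample size (4).
unequal = [0.5, 0.5, 1.0, 2.0]
print(kish_effective_sample_size(unequal))
```

The same logic explains the role of the larger weights noted in the text: a few participants carrying very large weights dominate the sum of squared weights, which lowers the effective sample size and widens the confidence interval.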
More detailed descriptions of clinical trial participants might also help facilitate the reweighting procedure and narrow confidence intervals of reweighted samples.43 A complementary approach to narrowing confidence intervals would involve combining and reweighting to the same nationally representative sample several studies that test the same intervention. The resulting estimates could then be jointly examined using meta-analytic techniques to assess more precisely the effectiveness of the intervention. It would even be possible to adapt approaches to interim analysis of clinical trials to determine when interventions have accumulated sufficient evidence to be considered effective at the population level.44–46
Our study should be understood in the context of several limitations. First, we recalibrated the clinical trial sample using propensity score weights. However, reweighting will only yield unbiased estimates of treatment effects if all of the treatment effect modifiers are adjusted for in the analysis. If unmeasured variables, such as motivation to participate in treatment, moderate treatment effects and are related to selection into the trial, the reweighted estimates may still be biased. Second, the assessments of substance use in the NESARC are based on self-report and were not confirmed with biological testing or collateral information. Third, our estimates assume that the intervention would be conducted under conditions identical to those of the clinical trial. Variation in clinical settings, treatment intensity or other system-level factors could influence applied treatment effectiveness. Fourth, some of the study eligibility criteria were not available in the NESARC (i.e., being on opioid replacement therapy and a willingness to provide consent to participate in the clinical trial) or had to be estimated using a different timeframe (use of illicit substances and treatment-seeking were assessed in the NESARC in the last 12 months, rather than in the last 30 days as in the clinical trial).
Despite these limitations, our study exemplifies a novel approach for estimating the population generalizability of clinical trial results. This method is flexible and may find applications in a range of disorders and target populations. We hope that it helps to refine clinical trial methods, improve estimation of population-level effect of interventions, and advance personalized and precision medicine.
Supplementary Material
Clinical Points.
Clinical trials may not be representative of individuals with the target disorder.
Reweighting clinical trials to make them more representative provides a better estimate of the treatment effects that are expected in clinical practice.
Acknowledgments
Protocol CTN-044 was supported by grants from the National Drug Abuse Treatment Clinical Trials Network, National Institute on Drug Abuse (NIDA): U10 DA013035 (to Dr. Nunes and John Rotrosen), U10 DA015831 (to Kathleen M. Carroll and Roger D. Weiss), U10 DA013034 (to Dr. Stitzer and Robert P. Schwartz), U10 DA013720 (to José Szapocznik and Lisa R. Metsch), U10 DA013732 (to Theresa Winhusen), U10 DA020024 (to Madhukar H. Trivedi), U10 DA013714 (to Dennis M. Donovan and John Roll), U10 DA015815 (to James L. Sorensen and Dennis McCarty), and K24 DA022412 (to Dr. Edward V. Nunes). The National Epidemiologic Survey on Alcohol and Related Conditions was sponsored by the National Institute on Alcohol Abuse and Alcoholism and funded, in part, by the Intramural Program, NIAAA, National Institutes of Health. The views and opinions expressed in this report are those of the authors and should not be construed to represent the views of any of the sponsoring organizations, agencies or the US government. Work on this manuscript was supported by NIH grants DA023200, MH0760551 and MH082773 (Dr. Blanco), and the New York State Psychiatric Institute (Drs. Blanco, Nunes, Olfson and Wall). The sponsors had no additional role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; and preparation, review, or approval of the manuscript. Dr. Wang had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Footnotes
Author Contributions: Drs. Blanco, Campbell, Olfson and Wall designed the study. Drs. Wall and Wang conducted the data analyses. Dr. Blanco wrote the initial draft of the manuscript. All authors contributed to manuscript revision and approved the final version.
Conflicts of interest: Dr. Nunes has received medication for research studies from Alkermes/Cephalon, Duramed Pharmaceuticals, and Reckitt-Benckiser. All other authors declare no conflicts of interest.
References
- 1.Califf RM, Platt R. Embedding Cardiovascular Research Into Practice. JAMA. 2013;310(19):2037–2038. doi: 10.1001/jama.2013.282771. [DOI] [PubMed] [Google Scholar]
- 2.Tunis SR, Stryer DB, Clancy CM. Practical clinical trials: increasing the value of clinical research for decision making in clinical and health policy. JAMA. 2003;290(12):1624–1632. doi: 10.1001/jama.290.12.1624. [DOI] [PubMed] [Google Scholar]
- 3.Glasgow RE, Magid DJ, Beck A, Ritzwoller D, Estabrooks PA. Practical clinical trials for translating research to practice: design and measurement recommendations. Med Care. 2005;43(6):551–557. doi: 10.1097/01.mlr.0000163645.41407.09. [DOI] [PubMed] [Google Scholar]
- 4.Blanco C, Olfson M, Goodwin RD, et al. Generalizability of clinical trial results for major depression to community samples: results from the National Epidemiologic Survey on Alcohol and Related Conditions. J Clin Psychiat. 2008;69(8):1276–1280. doi: 10.4088/jcp.v69n0810. [DOI] [PubMed] [Google Scholar]
- 5. Blanco C, Olfson M, Okuda M, Nunes EV, Liu SM, Hasin DS. Generalizability of clinical trials for alcohol dependence to community samples. Drug Alcohol Depend. 2008;98(1):123–128. doi: 10.1016/j.drugalcdep.2008.05.002.
- 6. Okuda M, Hasin DS, Olfson M, et al. Generalizability of clinical trials for cannabis dependence to community samples. Drug Alcohol Depend. 2011;111(1–2):177–181. doi: 10.1016/j.drugalcdep.2010.04.009.
- 7. Zimmerman M, Chelminski I, Posternak MA. Generalizability of antidepressant efficacy trials: differences between depressed psychiatric outpatients who would or would not qualify for an efficacy trial. Am J Psychiatry. 2005;162(7):1370–1372. doi: 10.1176/appi.ajp.162.7.1370.
- 8. Westen D, Morrison K. A multidimensional meta-analysis of treatments for depression, panic, and generalized anxiety disorder: an empirical examination of the status of empirically supported therapies. J Consult Clin Psychol. 2001;69(6):875–899.
- 9. Campbell AN, Nunes EV, Matthews AG, et al. Internet-delivered treatment for substance abuse: a multisite randomized controlled trial. Am J Psychiatry. 2014;171(6):683–690. doi: 10.1176/appi.ajp.2014.13081055.
- 10. Campbell AN, Nunes EV, Miele GM, et al. Design and methodological considerations of an effectiveness trial of a computer-assisted intervention: an example from the NIDA Clinical Trials Network. Contemp Clin Trials. 2012;33(2):386–395. doi: 10.1016/j.cct.2011.11.001.
- 11. Grant BF, Stinson FS, Dawson DA, et al. Prevalence and co-occurrence of substance use disorders and independent mood and anxiety disorders: results from the National Epidemiologic Survey on Alcohol and Related Conditions. Arch Gen Psychiatry. 2004;61(8):807–816. doi: 10.1001/archpsyc.61.8.807.
- 12. Grant BF, Dawson DA, Stinson FS, Chou PS, Kay W, Pickering RP. The Alcohol Use Disorder and Associated Disabilities Interview Schedule-IV (AUDADIS-IV): reliability of alcohol consumption, tobacco use, family history of depression and psychiatric diagnostic modules in a general population sample. Drug Alcohol Depend. 2003;71(1):7–16. doi: 10.1016/s0376-8716(03)00070-x.
- 13. Grant BF, Moore T, Shepard J, Kaplan K. Source and Accuracy Statement: Wave 1 National Epidemiologic Survey on Alcohol and Related Conditions (NESARC). Bethesda, MD: National Institute on Alcohol Abuse and Alcoholism; 2003.
- 14. Grant BF, Dawson DA, Hasin DS. The Wave 2 National Epidemiologic Survey on Alcohol and Related Conditions Alcohol Use Disorder and Associated Disabilities Interview Schedule-DSM-IV. Bethesda, MD: National Institute on Alcohol Abuse and Alcoholism; 2004.
- 15. Grant BF, Stinson FS, Dawson DA, Chou SP, Ruan W, Pickering RP. Co-occurrence of 12-month alcohol and drug use disorders and personality disorders in the United States: results from the National Epidemiologic Survey on Alcohol and Related Conditions. Arch Gen Psychiatry. 2004;61(4):361–368. doi: 10.1001/archpsyc.61.4.361.
- 16. Grant BF, Dawson DA, Hasin DS. The Alcohol Use Disorder and Associated Disabilities Interview Schedule–DSM–IV Version, National Institute on Alcohol Abuse and Alcoholism. Bethesda, MD: National Institute on Alcohol Abuse and Alcoholism; 2001.
- 17. Canino G, Bravo M, Ramírez R, et al. The Spanish Alcohol Use Disorder and Associated Disabilities Interview Schedule (AUDADIS): reliability and concordance with clinical diagnoses in a Hispanic population. J Stud Alcohol Drugs. 1999;60(6):790–799. doi: 10.15288/jsa.1999.60.790.
- 18. Chatterji S, Saunders JB, Vrasti R, Grant BF, Hasin D, Mager D. Reliability of the alcohol and drug modules of the Alcohol Use Disorder and Associated Disabilities Interview Schedule--Alcohol/Drug-Revised (AUDADIS-ADR): an international comparison. Drug Alcohol Depend. 1997;47(3):171–185. doi: 10.1016/s0376-8716(97)00088-4.
- 19. Cottler LB, Grant BF, Blaine J, et al. Concordance of DSM-IV alcohol and drug use disorder criteria and diagnoses as measured by AUDADIS-ADR, CIDI and SCAN. Drug Alcohol Depend. 1997;47(3):195–205. doi: 10.1016/s0376-8716(97)00090-2.
- 20. Grant BF, Harford TC, Dawson DA, Chou PS, Pickering RP. The Alcohol Use Disorder and Associated Disabilities Interview Schedule (AUDADIS): reliability of alcohol and drug modules in a general population sample. Drug Alcohol Depend. 1995;39(1):37–44. doi: 10.1016/0376-8716(95)01134-k.
- 21. Hasin D, Carpenter KM, McCloud S, Smith M, Grant BF. The Alcohol Use Disorder and Associated Disabilities Interview Schedule (AUDADIS): reliability of alcohol and drug modules in a clinical sample. Drug Alcohol Depend. 1997;44(2):133–141. doi: 10.1016/s0376-8716(97)01332-x.
- 22. Ustun B, Compton W, Mager D, et al. WHO study on the reliability and validity of the alcohol and drug use disorder instruments: overview of methods and results. Drug Alcohol Depend. 1997;47(3):161–169. doi: 10.1016/s0376-8716(97)00087-2.
- 23. Hoertel N, Le Strat Y, Blanco C, Lavarde P, Dubertret C. Generalizability of clinical trial results for generalized anxiety disorder to community samples. Depress Anxiety. 2012;29(7):614–620. doi: 10.1002/da.21937.
- 24. Hoertel N, Le Strat Y, Lavaud P, Dubertret C, Limosin F. Generalizability of clinical trial results for bipolar disorder to community samples: findings from the National Epidemiologic Survey on Alcohol and Related Conditions. J Clin Psychiatry. 2013;74(3):265–270. doi: 10.4088/JCP.12m07935.
- 25. Hoertel N, Falissard B, Humphreys K, Gorwood P, Seigneurie AS, Limosin F. Do clinical trials of treatment of alcohol dependence adequately enroll participants with co-occurring independent mood and anxiety disorders? An analysis of data from the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC). J Clin Psychiatry. 2014;75(3):231–237. doi: 10.4088/JCP.13m08424.
- 26. Hoertel N, Le Strat Y, De Maricourt P, Limosin F, Dubertret C. Are subjects in treatment trials of panic disorder representative of patients in routine clinical practice? Results from a national sample. J Affect Disord. 2013;146(3):383–389. doi: 10.1016/j.jad.2012.09.023.
- 27. Hoertel N, López S, Wang S, González-Pinto A, Limosin F, Blanco C. Generalizability of Pharmacological and Psychotherapy Clinical Trial Results for Borderline Personality Disorder to Community Samples. Personal Disord. 2014;6(1):81–87. doi: 10.1037/per0000091.
- 28. Hoertel N, Le Strat Y, Limosin F, Dubertret C, Gorwood P. Prevalence of subthreshold hypomania and impact on internal validity of RCTs for major depressive disorder: results from a national epidemiological sample. PLoS One. 2013;8(2):e55448. doi: 10.1371/journal.pone.0055448.
- 29. Hoertel N, de Maricourt P, Katz J, et al. Are Participants in Pharmacological and Psychotherapy Treatment Trials for Social Anxiety Disorder Representative of Patients in Real-Life Settings? J Clin Psychopharmacol. 2014. doi: 10.1097/JCP.0000000000000204.
- 30. Cole SR, Stuart EA. Generalizing evidence from randomized clinical trials to target populations: The ACTG 320 trial. Am J Epidemiol. 2010;172(1):107–115. doi: 10.1093/aje/kwq084.
- 31. Stuart EA, Bradshaw CP, Leaf PJ. Assessing the generalizability of randomized trial results to target populations. Prev Sci. 2015;16(3):475–485. doi: 10.1007/s11121-014-0513-z.
- 32. Hartman E, Grieve R, Ramsahai R, Sekhon JS. From SATE to PATT: combining experimental with observational studies to estimate population treatment effects. J R Stat Soc Ser A Stat Soc. 2013;10:1111.
- 33. D’Agostino RB Jr. Propensity scores in cardiovascular research. Circulation. 2007;115(17):2340–2343. doi: 10.1161/CIRCULATIONAHA.105.594952.
- 34. Blanco C, Iza M, Rodríguez-Fernández J, Baca-García E, Wang S, Olfson M. Probability and predictors of treatment-seeking for substance use disorders in the U.S. Drug Alcohol Depend. 2015;149(1):136–144. doi: 10.1016/j.drugalcdep.2015.01.031.
- 35. Lopez-Quintero C, Cobos JP, Hasin DS, et al. Probability and predictors of transition from first use to dependence on nicotine, alcohol, cannabis, and cocaine: Results of the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC). Drug Alcohol Depend. 2011;115(1–2):120–130. doi: 10.1016/j.drugalcdep.2010.11.004.
- 36. Stuart EA, Cole SR, Bradshaw CP, Leaf PJ. The use of propensity scores to assess the generalizability of results from randomized trials. J R Stat Soc Ser A Stat Soc. 2011;174(2):369–386. doi: 10.1111/j.1467-985X.2010.00673.x.
- 37. Tipton E. How Generalizable Is Your Experiment? An Index for Comparing Experimental Samples and Populations. J Educ Behav Stat. 2014;39(6):478–501.
- 38. Rubin DB. Using Propensity Scores to Help Design Observational Studies: Application to the Tobacco Litigation. Health Serv Outcomes Res Methodol. 2001;2:169–188.
- 39. Khan A, Kolts R, Thase M, Krishnan K, Brown W. Research design features and patient characteristics associated with the outcome of antidepressant clinical trials. Am J Psychiatry. 2004;161(11):2045–2049. doi: 10.1176/appi.ajp.161.11.2045.
- 40. Wisniewski SR, Rush AJ, Nierenberg AA, et al. Can phase III trial results of antidepressant medications be generalized to clinical practice? A STAR*D report. Am J Psychiatry. 2009;166(5):599–607. doi: 10.1176/appi.ajp.2008.08071027.
- 41. Khan A, Brodhead AE, Kolts RL, Brown WA. Severity of depressive symptoms and response to antidepressants and placebo in antidepressant trials. J Psychiatr Res. 2005;39(2):145–150. doi: 10.1016/j.jpsychires.2004.06.005.
- 42. Collins FS, Tabak LA. NIH plans to enhance reproducibility. Nature. 2014;505(7485):612–613. doi: 10.1038/505612a.
- 43. Hoertel N, de Maricourt P, Gorwood P. Novel routes to bipolar disorder drug discovery. Expert Opin Drug Discov. 2013;8(8):907–918. doi: 10.1517/17460441.2013.804057.
- 44. Lan KKG, Demets DL. Discrete Sequential Boundaries for Clinical-Trials. Biometrika. 1983;70(3):659–663.
- 45. Pogue JM, Yusuf S. Cumulating evidence from randomized trials: Utilizing sequential monitoring boundaries for cumulative meta-analysis. Control Clin Trials. 1997;18(6):580–593. doi: 10.1016/s0197-2456(97)00051-2.
- 46. Blanco C, Schneier FR, Schmidt A, et al. Pharmacological treatment of social anxiety disorder: A meta-analysis. Depress Anxiety. 2003;18(1):29–40. doi: 10.1002/da.10096.