Author manuscript; available in PMC: 2021 Jul 28.
Published in final edited form as: Stat Med. 2009 Jan 15;28(1):65–74. doi: 10.1002/sim.3453

Applying permutation tests with adjustment for covariates and attrition weights to randomized trials of health-services interventions

Lingqi Tang 1,*, Naihua Duan 2,3, Ruth Klap 1, Joan Rosenbaum Asarnow 4, Thomas R Belin 5
PMCID: PMC8318313  NIHMSID: NIHMS1726440  PMID: 18937226

SUMMARY

Using a health-services study as an illustrative example of longitudinal randomized field research with the potential for participants to be lost to follow-up, we apply a permutation test where the treatment indicator variable is randomly permuted in the context of regression models with covariates and attrition weighting. The test is applied to a multi-site randomized intervention trial of a quality-improvement program for adolescent depression treatment in primary-care settings, in which regression models were used to assess intervention effects with weights used to adjust for attrition bias. The foundation and motivation for this approach to the analysis are considered with attention to the demands associated with implementing such a strategy. The results from the permutation tests were qualitatively similar to the results obtained from conventional parametric models, and in fact suggested that the significance level from the conventional t-test was understated in this application.

Keywords: analysis of covariance, permutation test, attrition weights, cognitive behavioral therapy, adolescent depression, primary care, randomization test

1. INTRODUCTION

The permutation test, or randomization test, has emerged as a useful tool in the analysis of randomized controlled trials as computing power has increased rapidly in recent years [1–8]. An important advantage of the permutation test, relative to conventional parametric inference procedures, is the ability to avoid strong parametric distributional assumptions. This article develops a permutation test to assess the effectiveness of a quality-improvement (QI) intervention for adolescent depression in primary-care clinics.

Our interest in pursuing this research is motivated in part by earlier mental health-services research projects involving members of our research team that were published in the Journal of the American Medical Association [9, 10] where questions were raised in the review process about the possibility of using permutation tests. Given those concerns regarding the appropriateness of parametric methods and the challenges involved in detecting departures from underlying statistical assumptions, we developed an approach for implementing analyses that make use of design-based techniques to reflect loss to follow-up and that use permutation tests to avoid parametric assumptions [9, 10]. This approach has the advantage of requiring fewer underlying assumptions than are invoked in traditional linear models (e.g. normality of residuals and equal variances) and can be viewed as a way to gauge the robustness of findings using traditional methods. Although the method could be used in any experimental setting in an effort to strengthen confidence in project results, the issue seems particularly important for studies that are difficult to replicate and have strong potential for influencing clinical practice and patient outcomes.

For testing the hypothesis of no treatment effect in a randomized clinical trial, a variety of randomization-based inferential tests have been developed [1–8]. Multiple statistical packages offer relevant software, such as the entire StatXact package (http://www.cytel.com/Products/StatXact/), the ‘permute’ command in Stata (Stata Corporation, www.stata.com), and the NPAR1WAY procedure and the EXACT option in PROC FREQ in SAS (SAS Institute Inc., www.sas.com). However, applying these procedures to a covariate-adjusted analysis gives rise to subtleties and challenges. Gail et al. [11] proposed a permutation test that is based on the randomization distribution of residuals computed from a regression on covariates other than an indicator of treatment. In a design where units are assigned at random to one of two groups, the randomization distribution is straightforward, but the method can be used in more complicated designs. The appeal of this method is that the permutation test does not depend on the validity of traditional regression-model assumptions such as residuals being normal and homoscedastic. Rosenberger and Lachin [8] offer illustrative examples of permutation tests for covariate-adjusted analysis under different designs (e.g. blocked randomization, stratified urn randomization).

Various strategies have been proposed for permutation tests adapted to linear models [e.g. 11–20]. Consider a linear model Y = β0+β1X+β2Z+ε′, where Y is the dependent variable, X is a vector of covariates, Z is a primary explanatory variable (e.g. treatment indicator in a randomized trial), the components of the error term ε are assumed i.i.d., and the null hypothesis is H0:β2=0. The method of Freedman and Lane [13] involves permuting observed residuals from model Y = β0+β1X+ε′, which they call permutation ‘under the reduced model’. For each of the n! possible reorderings, the permuted residuals are added to the predicted values of the dependent variable from the reduced model to form new versions of the dependent variable. These new values are then fitted in the full model to compute the permutation test statistic. The idea of this approach is that the residuals from the reduced regression approximate the errors that meet the assumption of exchangeability under the null hypothesis. An alternative approach outlined by ter Braak [16] is to permute residuals ‘under the full model’ or ‘under the alternative hypothesis’. A third method, described by Manly [15], is to permute raw data values, which preserves the covariances among independent variables and hence can be used for testing the relationship of Y vs X and Z together. Given alternative study designs and analyses of interest, Anderson [21] provides a summary of empirical and theoretical results for these three methods and gives recommendations for their use. In our application, which tests treatment effects on outcome variables in a randomized trial, the Freedman and Lane approach is an attractive strategy because the assumption of exchangeability under the null is assured.
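The Freedman and Lane procedure described above can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the implementation used in any of the cited papers: it handles a single covariate, draws random permutations instead of enumerating all n! reorderings, and uses the absolute value of the estimated coefficient of Z as the test statistic for simplicity. The function names (`solve`, `ols`, `freedman_lane`) are hypothetical.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(rows, y):
    """Least-squares coefficients; the caller supplies the intercept column."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def freedman_lane(y, x, z, n_perm=999, seed=0):
    """Permutation p-value for H0: beta2 = 0, permuting reduced-model residuals."""
    rng = random.Random(seed)
    reduced = [[1.0, xi] for xi in x]                 # Y ~ X (no treatment)
    full = [[1.0, xi, zi] for xi, zi in zip(x, z)]    # Y ~ X + Z
    b = ols(reduced, y)
    fitted = [b[0] + b[1] * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    observed = abs(ols(full, y)[2])
    hits = 0
    for _ in range(n_perm):
        shuffled = resid[:]
        rng.shuffle(shuffled)
        # New dependent variable: reduced-model prediction plus permuted residual.
        y_star = [fi + ei for fi, ei in zip(fitted, shuffled)]
        if abs(ols(full, y_star)[2]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

In practice a studentized statistic is often preferred to the raw coefficient; the raw coefficient is used here only to keep the sketch short.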

However, as in many longitudinal clinical trials, the problem of unit nonresponse complicates the interpretation of study results. To correct for attrition bias, weighting adjustment methods are commonly used [22, 23]. The objective of nonresponse weighting is to extrapolate from the observed follow-up sample to the population represented in the original sample. One common approach is to model the propensity that an individual would be lost to follow-up conditional on covariates and to use the inverse of the estimated probability of remaining in the study as a nonresponse weight for each participant. Variations on the propensity model have been considered depending on whether the treatment indicator and its interaction with other covariates are included. When the weights are correlated with the treatment indicator, permutation tests based on the permutation distribution of residuals from weighted analyses are no longer valid. To incorporate attrition weighting in a permutation test, one could consider alternatives that permute the treatment indicator variable either with or without conditioning on the covariates, analogous to the distinction between unconditional and conditional bootstrap analyses [24].

The method of permuting independent variables for linear models was used by Draper and Stoneman [12] for data analysis and evaluated in simulations by others [17, 19, 20]. Oja [14] used a distribution-free permutation test for studying the effect of a treatment variable Z on a response Y after allowing for the effects of a multivariate covariate X. By randomizing the Z-values to subjects, Oja approximated randomization distributions with the distributions of other test statistics. Manly [15] gave some general ideas regarding a more computer-intensive approach that straightforwardly repeats the whole estimation procedure many times, permuting Z-values to obtain the randomization distribution for the absolute value of the coefficient of Z in a multiple regression relating Y to Z and the covariates X. Kennedy and Cade [17] called this method the ‘Shuffle-Z’ permutation test. A simulation study by O’Gorman [20] showed that the test based on permutations of independent variables has roughly equal power to the permute-residuals test and can be utilized in conjunction with weighted least-squares procedures, unlike the permute-residuals method.

The purpose of this study is to demonstrate the viability of permutation tests based on permuting treatment indicators in the context of complex randomization protocols, the availability of covariates, the use of logistic regression or other nonlinear procedures, and the application of attrition weighting. Our method first permutes treatment status according to the original randomization design, and then reconstructs attrition weights using the permuted data. We then re-run weighted regressions using the reconstructed weights. Our proposed permutation test is illustrated using data from a randomized controlled trial, the Youth Partners-in-Care (YPIC) study, aimed at evaluating a health-services intervention for adolescent depression [10]. We describe the YPIC study in Section 2 below, develop a proposed method in Section 3, and present results in Section 4 based on applying the method to the YPIC study.

2. APPLIED CONTEXT: THE YPIC STUDY

YPIC is a multi-site randomized effectiveness trial comparing a QI intervention with a usual care (UC) control group [10]. The QI intervention included: (1) expert leader teams at each site, (2) care managers who supported primary-care providers (PCPs) with patient evaluation, medication and psychosocial treatment, and linkage with specialty mental health services, (3) training of care managers in manualized cognitive behavioral therapy (CBT) for depression, and (4) patient and provider choice of treatment modalities. UC patients had access to usual treatment at the site, but not to the specific mental health providers trained in the CBT and care management services used in the study.

Enrolled patients were assessed at baseline prior to the intervention and at six months. The assessment included CES-D total score for depressive symptoms (Center for Epidemiological Studies-Depression Scale) [25], a mental health summary score known as MCS-12 [26], satisfaction with mental health care assessed using a 5-point scale, and process of care; the baseline assessment also included socio-demographic characteristics such as age, gender, race/ethnicity, parent employment status, speaks another language at home, number of households lived in, insurance status, and family income. After completing the baseline assessment, participants were randomized to the QI or UC condition. We randomized patients within PCPs to assure a balance between the QI and UC groups in terms of provider mix. We also blocked patients over time within each PCP for the randomization to assure a balance between QI and UC patients in terms of patient sequence, i.e. patients entering into the study early vs late. We used a block size of two to maximize the balance of patient sequence. Screening/enrollment staff were masked to randomization status and sequence and were different from assessment staff. These design features prevented protocol subversion due to selection bias in enrollment that might occur with blocked randomization [27]. We also applied Berger–Exner’s test [28] to confirm this expectation. (We acknowledge an anonymous JAMA reviewer for our previous publication [10], for helpful comments on the threat of protocol subversion in blocked randomization.)

At baseline, the study enrolled and randomized 418 depressed patients aged 13–21 from five health-care organizations purposively selected to include managed care, public sector, and academic medical center clinics, with 211 patients assigned to the QI condition and 207 to the UC condition. The sample consisted of 378 patients randomized in 189 complete blocks of size two with 189 patients randomized to each arm, and 40 patients in 40 incomplete blocks of size one with 22 randomized to the QI condition and 18 randomized to the UC condition. (An incomplete block is formed when an odd number of patients is enrolled from a PCP, leaving only one patient in the last block.)

At six months, 344 (82 per cent) patients completed the follow-up assessment with 170 in the QI condition and 174 in the UC condition. Among the original 189 complete blocks, 132 remained as complete blocks at follow-up, 51 lost one patient (22 with the QI patient staying and 29 with the UC patient staying), and 6 lost both patients. Among the original 40 incomplete blocks of size one, 29 remained at follow-up (16 in QI and 13 in UC), and 11 lost their single patient (6 in QI, 5 in UC).

To control for potential nonresponse bias, attrition weights were constructed by fitting logistic regression models to predict follow-up status from baseline predictors including socio-demographic and clinical variables. These models were fitted separately for the two intervention arms. The fitted logistic regression models were used to derive the predicted probability for each individual respondent to remain in the follow-up. The reciprocal of the predicted probability was then used as the attrition weight for each participant. For the models, we started with a large set of independent variables to be considered for a logistic regression on the response outcome (coded 1 if response and 0 if nonresponse). The potential predictors included age, age group (13–18 vs 19–21), number of households lived in (1 vs 2 or more), other language spoken at home (yes vs no), ethnicity (4 categories: African American, Latino, Caucasian, and other), total number of counseling visits in past 6 months, the Mental Health Inventory five-item version (MHI-5) [26], and study site variables. There was 1 case missing for MHI-5 and 7 cases missing for total number of counseling visits. The missing values were imputed by regression predictions. The final model included all predictors that were at least marginally significantly associated with response status (p<0.10) in a bivariate analysis. Although the follow-up rates did not differ significantly across intervention conditions (χ2(1)=0.87, p=0.35), we found that there were more covariates significantly associated with response status in the control group than in the intervention group. Within both the UC and QI groups, older youth and those living in more than one household had higher nonresponse rates compared with younger youth and those living in single households. Further, within the UC group, nonrespondents also reported poorer mental health at baseline and were less likely to speak another language at home.
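The weight construction described above can be illustrated with a small sketch. It is a simplified illustration, not the YPIC analysis code: it assumes a single baseline predictor per arm, fits the logistic model by Newton-Raphson, and omits the bivariate screening (p<0.10) and imputation steps. The function names `fit_logistic` and `attrition_weights` are hypothetical.

```python
import math

def fit_logistic(x, r, n_iter=25):
    """Newton-Raphson fit of P(r=1) = 1 / (1 + exp(-(a + b*x))), one predictor."""
    a = b = 0.0
    for _ in range(n_iter):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(ri - pi for ri, pi in zip(r, p))
        g1 = sum((ri - pi) * xi for ri, pi, xi in zip(r, p, x))
        # Information matrix entries.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def attrition_weights(arm, x, responded):
    """Inverse predicted response probability, modeled separately by arm.

    Returns one weight per participant; nonrespondents get None because
    they do not enter the follow-up analysis.
    """
    weights = [None] * len(arm)
    for group in sorted(set(arm)):
        idx = [i for i, g in enumerate(arm) if g == group]
        a, b = fit_logistic([x[i] for i in idx], [responded[i] for i in idx])
        for i in idx:
            if responded[i]:
                p = 1.0 / (1.0 + math.exp(-(a + b * x[i])))
                weights[i] = 1.0 / p
    return weights
```

With a single binary predictor the fitted probabilities equal the observed response rates in each cell, so the weights of the respondents in an arm sum to that arm's baseline sample size. This matches the stated goal of extrapolating from the follow-up sample to the originally randomized sample.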

To evaluate the intervention effects on 6-month outcome assessments, we conducted multiple linear regressions for the three continuously scaled outcome variables (CES-D total score, MCS-12, and satisfaction with mental health care) and logistic regressions for the four binary variables (severe depression defined as CES-D⩾24, any specialty mental health care, any psychotherapy/counseling, and any medication). All of these analyses were conducted with an ANCOVA specification with the intervention status as the primary explanatory variable and the baseline version of the outcome measure as the covariate. Because CES-D was not measured at baseline, the baseline measure MHI-5 was used as the covariate for the analysis of 6-month CES-D, both total score and dichotomized. (CES-D and MHI-5 were highly correlated, with Pearson’s correlation 0.78 for our follow-up data.)

All regression models were fitted with attrition weights to mitigate potential attrition bias. We used SUDAAN software [29] for estimating variance of the parameter estimates. For weighted models, one could also use other software packages for complex survey data, such as the svy-commands in Stata and the SURVEYREG and SURVEYLOGISTIC procedures in SAS version 9. In SUDAAN, we used the design option ‘sampling with-replacement’ in linear and logistic regression models, which employs a Taylor-series linearization procedure in a manner equivalent to estimating variances of parameter estimates in the generalized estimating equations framework of Liang and Zeger [30, 31]. In contrast to a procedure based on the permutation distribution implied by the randomization process, our analysis is anchored in a model-based framework.

3. PERMUTATION TEST

Here we propose a permutation test that permutes treatment assignments for entire observations. Our technique attempts to reflect the design of the study, nonresponse adjustment, and adjustment for baseline covariates. Specifically, we permute the randomization indicators according to the blocked randomization design, and then we evaluate the intervention effect following the procedures in the original data analysis protocol, including weighting and covariate adjustments.

Consider a regression function: E(Y)=g(β0+β1X+β2Z), where Z is the treatment indicator (1=intervention, 0=control), X is the vector of covariates, and g denotes a link function, with the identity link function appropriate for a linear model and the inverse of the logit function appropriate for a logistic regression model. The vector X could include fixed effects for blocks, or one could proceed with separate block effects, which is the approach we took in this application. The intervention effect is given by the regression coefficient β2; the significance of the intervention effect is tested with the null hypothesis H0:β2=0 vs Ha:β2 ≠ 0. We use the ratio $T=\hat{\beta}_2/\mathrm{se}(\hat{\beta}_2)$ rather than $\hat{\beta}_2$ as the test statistic since the attrition weights were used in the regression models.
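For the linear (identity-link) case, the ratio of the estimated treatment coefficient to its standard error can be sketched as follows. This is a minimal sketch under stated assumptions: it uses one covariate, weighted least squares, and a common linearization (sandwich) variance that treats each participant as an independent unit. It is not SUDAAN's computation, which may differ in finite-sample correction factors, and the names `invert` and `weighted_t` are hypothetical.

```python
import math

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        d = m[col][col]
        m[col] = [v / d for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def weighted_t(y, x, z, w):
    """Return (beta2_hat, se, T) for the weighted model Y ~ 1 + X + Z."""
    rows = [[1.0, xi, zi] for xi, zi in zip(x, z)]
    p = 3
    xtwx = [[sum(wi * r[i] * r[j] for wi, r in zip(w, rows)) for j in range(p)]
            for i in range(p)]
    xtwy = [sum(wi * r[i] * yi for wi, r, yi in zip(w, rows, y)) for i in range(p)]
    bread = invert(xtwx)
    beta = [sum(bread[i][j] * xtwy[j] for j in range(p)) for i in range(p)]
    resid = [yi - sum(bj * rj for bj, rj in zip(beta, r)) for yi, r in zip(y, rows)]
    # "Meat" of the sandwich: sum of w_i^2 * e_i^2 * x_i x_i'.
    meat = [[sum((wi * ei) ** 2 * r[i] * r[j] for wi, ei, r in zip(w, resid, rows))
             for j in range(p)] for i in range(p)]
    var_b2 = sum(bread[2][i] * meat[i][j] * bread[j][2]
                 for i in range(p) for j in range(p))
    se = math.sqrt(var_b2)
    return beta[2], se, beta[2] / se
```

Multiplying all weights by a constant leaves T unchanged, which is one reason a studentized statistic is convenient when the weights are re-derived in each replicate.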

The null distribution of T for testing β2=0 can be simulated by re-randomizing treatment assignment according to the original randomization protocol (such as the blocked randomization design for YPIC) while keeping outcomes and covariates as observed. For a blocked randomized study with attrition weighting, our method follows the three-step procedure below:

Step 1: Compute the test statistic for the actual data observed:

  a. Select an attrition weighting model as described in Section 2.

  b. Fit the attrition weighting model.

  c. Derive attrition weights using the model fitted in Step 1b.

  d. Fit the regression model using SUDAAN protocols for linear or logistic regression with attrition weights derived in Step 1c to obtain the test statistic T for the null hypothesis H0:β2=0.

Step 2: Estimate the null distribution for the test statistic T with N replicates of the three sub-steps below:

  a. Re-randomize treatment assignment within each block.

  b. Re-derive the attrition weights using the methods in Step 1a–1c.

  c. Re-fit the regression model using SUDAAN for linear or logistic regression with attrition weights re-derived in Step 2b to re-derive the test statistic T.

Step 3: Derive the empirical p-value for the test statistic T from the null distribution based on the permutation distribution obtained in Step 2, i.e. p=(M+1)/(N+1), where M denotes the number of replicates for which the test statistic T obtained in the permutation procedure in Step 2 is equal to or greater than (in absolute value) the observed value of T obtained in Step 1 and N denotes the total number of replicates. The addition of 1 in both numerator and denominator represents the observed test statistic for the original data, which is considered one of the realizations of the permutation distribution.
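The three-step procedure can be outlined in code. The sketch below is illustrative only and rests on several assumptions: the weighting and regression work (re-deriving weights and re-fitting the model to obtain T) is delegated to a caller-supplied function named `statistic`, which is a hypothetical placeholder; patients in blocks of size one are re-randomized with an independent fair coin flip; and the demonstration statistic is an unweighted difference in means standing in for T.

```python
import random
from collections import defaultdict

def rerandomize_within_blocks(block_ids, z, rng):
    """Re-randomize assignment within blocks.

    Labels are shuffled inside each multi-patient block; a block of size
    one gets a fair coin flip (an assumption of this sketch).
    """
    members = defaultdict(list)
    for i, b in enumerate(block_ids):
        members[b].append(i)
    z_new = list(z)
    for idx in members.values():
        if len(idx) == 1:
            z_new[idx[0]] = rng.randint(0, 1)
        else:
            labels = [z[i] for i in idx]
            rng.shuffle(labels)
            for i, lab in zip(idx, labels):
                z_new[i] = lab
    return z_new

def permutation_p_value(block_ids, z, statistic, n_rep=999, seed=0):
    """Empirical two-sided p-value p = (M + 1) / (N + 1)."""
    rng = random.Random(seed)
    t_obs = abs(statistic(z))            # observed statistic on the actual data
    m = 0
    for _ in range(n_rep):               # null distribution from N replicates
        z_star = rerandomize_within_blocks(block_ids, z, rng)
        if abs(statistic(z_star)) >= t_obs:
            m += 1
    return (m + 1) / (n_rep + 1)

def diff_in_means(y):
    """Stand-in statistic for the demo; replace with the weighted T in practice."""
    def statistic(z):
        treated = [yi for yi, zi in zip(y, z) if zi == 1]
        control = [yi for yi, zi in zip(y, z) if zi == 0]
        return sum(treated) / len(treated) - sum(control) / len(control)
    return statistic

# Toy data: ten complete blocks of size two and two blocks of size one.
blocks = [b for b in range(10) for _ in range(2)] + [10, 11]
z_obs = [0, 1] * 10 + [1, 0]
y_obs = [b + 5.0 * zi for b, zi in zip(blocks, z_obs)]
p_demo = permutation_p_value(blocks, z_obs, diff_in_means(y_obs))
```

Passing the whole analysis as a function keeps the permutation loop independent of whether the weights are re-derived with fixed predictors or with fresh variable selection in each replicate.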

For the model selection in Step 2b, one might wonder whether there would be a difference in results between fixing covariates or carrying out variable selection for each permutation. These two approaches were included in our application. In Method 1, we used the exact predictors identified in Step 1a. Method 2 incorporated the model selection procedure in Step 1a, namely including all predictors that were at least marginally significantly associated with response status (p<0.10) in a bivariate analysis. The two approaches yielded similar results.

As a sensitivity analysis, we also carried out permutation tests without incorporating blocking in the analysis, regardless of the presence of blocks in Step 2a. Two variations on this procedure were conducted, one based on considering the entire sample at once and the other reflecting stratification by study site. Both methods treated the original size of each treatment group as fixed. Results were very similar across these alternatives, suggesting that intra-block correlation has a negligible effect in this context.

Note that Method 2 is demanding in its programming requirements. The permutation procedure in Step 2 can be greatly simplified if the attrition model does not include the treatment indicator variable. As a reviewer pointed out, bypassing Step 2b would eliminate the need to re-estimate the attrition weights for each permutation. When patterns of nonresponse do not differ substantially between treatment groups, the weighting model in Step 1 could be based on a single attrition model that does not use an intervention indicator as a predictor. If the resulting tests from the two weighting models are in close agreement, then an argument can be made for using the simplified permutation method.

In our application, although the rates of nonresponse did not differ significantly across two treatment groups (19 per cent in the intervention group and 16 per cent in the control group, p=0.35), different predictors were found in propensity weighting models. To take into account the potential for differential nonresponse bias across treatment groups, we modeled attrition weights separately in a stratified analysis. A collateral feature of this decision was that marginal total sample sizes remained fixed in different replicates.

4. PERMUTATION TEST APPLIED TO YPIC

We applied the permutation test described in Section 3 to the YPIC data, with N = 10 000 replicates for the re-sampling procedure. At baseline when the randomization was conducted, there were 189 complete blocks with two patients in each block and 40 incomplete blocks with only one patient in each block. For each complete block, the re-randomization in Step 2a amounts to permuting the intervention condition for the two patients in the block. For each incomplete block, the re-randomization in Step 2a amounts to re-randomizing the intervention condition for the single patient in the block. Since our original analysis used all patients from both types of blocks, our permutation test mimics the original analysis, retaining the incomplete blocks.

The results are presented in Table I, showing the two-sided p-values based on the test statistic T. The first column presents the p-values based on the traditional t-test obtained with the actual observed data. The second column presents the p-values based on the permutation test. To account for variability of the Monte Carlo approximation, we calculated a Monte Carlo confidence interval for each estimated p-value $\hat{P}_{MC}$, where the standard error of the Monte Carlo estimate is $\mathrm{se}(\hat{P}_{MC})=\sqrt{\hat{P}_{MC}(1-\hat{P}_{MC})/N}$, and the 95 per cent confidence limits are defined as $\hat{P}_{MC}\pm 1.96\times \mathrm{se}(\hat{P}_{MC})$.
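As a quick check of the interval arithmetic, the binomial standard error and normal-approximation limits can be computed directly. This is a minimal sketch; the function name `monte_carlo_ci` is hypothetical, and the inputs in the example are the CES-D Method 1 estimate from Table I with N = 10 000 replicates.

```python
import math

def monte_carlo_ci(p_hat, n_rep, z=1.96):
    """Normal-approximation confidence limits for a Monte Carlo p-value."""
    se = math.sqrt(p_hat * (1.0 - p_hat) / n_rep)
    return p_hat - z * se, p_hat + z * se

lower, upper = monte_carlo_ci(0.0186, 10000)
print(f"{lower:.4f}, {upper:.4f}")  # prints "0.0160, 0.0212"
```

Because the standard error shrinks with the square root of N, halving the width of the interval requires roughly four times as many replicates.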

Table I. Comparison of p-values obtained with usual t-test based on actual data and permutation tests.

| Outcome | Usual t-test*: p-value | Permutation test, Method 1: estimated p-value (95 per cent CI) | Permutation test, Method 2: estimated p-value (95 per cent CI) |
|---|---|---|---|
| Continuously scaled variables | | | |
| CES-D | 0.0228 | 0.0186 (0.0160, 0.0212) | 0.0180 (0.0154, 0.0206) |
| MCS12 | 0.0290 | 0.0265 (0.0233, 0.0296) | 0.0252 (0.0221, 0.0283) |
| Satisfaction with mental health care | 0.0038 | 0.0026 (0.0016, 0.0036) | 0.0025 (0.0015, 0.0035) |
| Dichotomized variables | | | |
| Severe depression | 0.0199 | 0.0127 (0.0105, 0.0149) | 0.0130 (0.0108, 0.0152) |
| Any specialty mental health care | 0.0004 | 0.0007 (0.0002, 0.0012) | 0.0004 (0.0000, 0.0008) |
| Any psychotherapy/counseling | 0.0066 | 0.0041 (0.0028, 0.0054) | 0.0034 (0.0023, 0.0045) |
| Any medication | 0.7369 | 0.7214 (0.7126, 0.7302) | 0.7229 (0.7142, 0.7317) |

* Weighted analysis with attrition weighting.

Permutation test with 10 000 replicates. Method 1: The nonresponse weights were created by using the same set of predictors from the original attrition weighting model for each permutation. Method 2: The nonresponse weights were remodeled by carrying out variable selection for each permutation.

For all seven outcome measures, the two versions of permutation tests and the usual t-test based on observed data lead to qualitatively similar p-values, suggesting that the results from the traditional analyses were robust.

Method 2, which re-derived the attrition weights, produced slightly smaller p-values in 5 cases out of 7 than did Method 1, which used the same predictors as in the original data set. For five outcome measures (CES-D, satisfaction, severe depression, any psychotherapy/counseling, and any medication), the p-values based on the traditional t-test were greater than the Monte Carlo 95 per cent upper confidence limits for both permutation methods, indicating that the differences between the p-values are not explained by sampling variability from the Monte Carlo procedure. For MCS12, the p-value based on the usual t-test fell within the Monte Carlo 95 per cent confidence interval for Method 1 but was greater than the upper endpoint of the interval produced by Method 2. For the seventh variable (any specialty mental health care), the p-value based on the usual t-test fell inside the Monte Carlo 95 per cent confidence interval for both methods.

5. DISCUSSION

The permutation tests considered in this article invoked two key elements: (1) the permutation of the treatment indicator is generated in a manner consistent with the original randomization scheme, and (2) the test statistic is computed in exactly the same manner as in the traditional analysis. The proposed strategy for relaxing distributional assumptions might be used as a routine technique to evaluate the findings of scientifically important field studies, in particular to assess whether results are sensitive to the parametric assumptions in traditional linear models and analysis-of-covariance procedures.

We demonstrated that the permutation test can be applied to realistic settings such as the YPIC study by taking the randomization scheme into account along with the use of covariates and attrition weighting. As a further illustrative example, we also implemented the permutation test in a different study published in JAMA, a multi-center randomized controlled trial of a disease management program for late-life depression known as Project IMPACT (Improving Mood-Promoting Access to Collaborative Treatment) with which members of our research team were also involved [9]. The IMPACT study enrolled 1801 depressed patients from 18 primary-care clinics belonging to eight health-care organizations, with random assignment at the individual level to a health-services intervention or to UC within strata defined by clinic and recruitment method (clinic screening or clinician referral) as well as within blocks of size 20. The originally planned analysis of the project made use of mixed-effects models to assess outcomes at four time points over 12 months of follow-up, but given concerns raised in the initial review about the extent to which the results might depend on modeling assumptions, we conducted analyses using design-based stratified exact permutation tests (using StatXact software). The fact that a few covariate items were missing on a few subjects added a layer of complication; in the original analysis, five multiply imputed data sets were created [32, 33], and in an effort to preserve the frequency properties of our testing procedures, the exact permutation test was conducted on the imputation version least favorable to intervention effects. Across 10 dependent variables, this procedure yielded significance results very similar to those from the mixed-effect models, with one exception (for the variable ‘Overall functional impairment’) where the permutation test produced a more significant p-value (p=0.0116) than the mixed-effect model (p=0.0233).

The fact that the permutation tests led, more often than not in our application, to stronger conclusions about significance reassures us about the use of the traditional analysis. This approach adds to extant methods for confirming results across diverse statistical methods and assumptions, and provides a useful strategy for enhancing confidence in research results with strong potential for impacting clinical care and patient outcomes.

Despite the similarity of significance findings between parametric techniques and permutation tests, there is room for philosophical debate surrounding the lingering differences: Do differences in significance findings lead one to favor the method that relies on fewer assumptions? Should there be a general expectation that analyses of randomized trials should rely primarily on permutation tests, or do limits of time and other resources suggest that scientific progress would be best served by embracing parametric tests, which are far more accessible to large numbers of applied researchers? Gains in insight into these questions might emerge from simulation studies, although we doubt that it would be possible to resolve such matters entirely. Still, the findings of the present investigation can serve as reference points for applied researchers, helping to reinforce intuition about the relevant tradeoffs based on having implemented a wide array of alternative analyses on the central scientific question at hand.

ACKNOWLEDGEMENTS

We acknowledge two anonymous reviewers for helpful comments.

Contract/grant sponsor: NIMH; contract/grant numbers: P30 MH068639, P30 MH082760, P30 MH58017

Contract/grant sponsor: Agency for Health Care Research and Quality; contract/grant number: HS09908

REFERENCES

  • 1.Fisher R The Design of Experiments. Oliver & Boyd: Edinburgh, Scotland, 1935. [Google Scholar]
  • 2.Pitman EJG. Significance tests which may be applied to samples from any population. Journal of the Royal Statistical Society 1937; B4:119–130 and 225–232 (Parts I and II). [Google Scholar]
  • 3.Pitman EJG. Significance tests which may be applied to samples from any population. Part III. The analysis of variance test. Biometrika 1938; 29:322–335. [Google Scholar]
  • 4.Welch WJ. Construction of permutation tests. Journal of American Statistical Association 1990; 85:693–698. [Google Scholar]
  • 5.Edgington ES. Randomization Tests (3rd edn). Marcel Dekker: New York, 1995. [Google Scholar]
  • 6.Berger VW. Pros and cons of permutation tests in clinical trials. Statistics in Medicine 2000; 19:1319–1328. [DOI] [PubMed] [Google Scholar]
  • 7.Good P Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses (3rd edn). Springer: New York, NY, 2005. [Google Scholar]
  • 8.Rosenberger W, Lachin JM. Randomization in Clinical Trials: Theory and Practice. Wiley: New York, 2002. [Google Scholar]
  • 9.Unutzer J, Katon W, Callahan CM, Williams JW, Hunkeler E, Harpole L, Hoffing M, Penna RD, Noël PH, Lin E, Arean PA, Hegel MT, Tang L, Belin TR, Oishi S, Langston C. Collaborative care management of late-life depression in the primary care setting: a randomized controlled trial. Journal of the American Medical Association 2002; 288(22):2836–2845. [DOI] [PubMed] [Google Scholar]
  • 10. Asarnow JR, Jaycox LH, Duan N, LaBorde AP, Rea MM, Anderson M, Landon C, Tang L, Wells KB. Effectiveness of a quality improvement intervention for adolescent depression in primary care clinics: a randomized controlled trial. Journal of the American Medical Association 2005; 293(3):311–319.
  • 11. Gail MH, Tan WY, Piantadosi S. Tests for no treatment effect in randomized clinical trials. Biometrika 1988; 75:57–64.
  • 12. Draper NR, Stoneman DM. Testing for the inclusion of variables in linear regression by a randomization technique. Technometrics 1966; 8:695–699.
  • 13. Freedman D, Lane D. A nonstochastic interpretation of reported significance levels. Journal of Business and Economic Statistics 1983; 1:292–298.
  • 14. Oja H. On permutation tests in multiple regression and analysis of covariance problems. Australian Journal of Statistics 1987; 29:91–100.
  • 15. Manly BFJ. Randomization, Bootstrap and Monte Carlo Methods in Biology. Chapman & Hall: London, 1991. (2nd edn, 1997).
  • 16. ter Braak CJF. Permutation versus bootstrap significance tests in multiple regression and ANOVA. In Bootstrapping and Related Techniques, Jöckel KH, Rothe G, Sendler W (eds). Springer: Berlin, 1992; 79–86.
  • 17. Kennedy PE, Cade BS. Randomization tests for multiple regression. Journal of Statistical Computation and Simulation 1996; 25:923–936.
  • 18. Koch GG, Tangen CM, Jung JW, Amara IA. Issues for covariance analysis of dichotomous and ordered categorical data from randomized clinical trials and non-parametric strategies for addressing them. Statistics in Medicine 1998; 17:1863–1892.
  • 19. Anderson MJ, Legendre P. An empirical comparison of permutation methods for tests of partial regression coefficients in a linear model. Journal of Statistical Computation and Simulation 1999; 62:271–303.
  • 20. O’Gorman TW. The performance of randomization tests that use permutations of independent variables. Communications in Statistics - Simulation and Computation 2005; 34:895–908.
  • 21. Anderson MJ. Permutation tests for univariate or multivariate analysis of variance and regression. Canadian Journal of Fisheries and Aquatic Sciences 2001; 58:629–636.
  • 22. Korn EL, Graubard BI. Analysis of Health Surveys. Wiley: New York, 1999.
  • 23. Groves RM, Dillman DA, Eltinge JL, Little RJA. Survey Nonresponse. Wiley: New York, 2002.
  • 24. Efron B, Tibshirani R. An Introduction to the Bootstrap. Chapman & Hall/CRC: London/Boca Raton, FL, 1994.
  • 25. Radloff LS. The CES-D scale: a self report depression scale for research in the general population. Applied Psychological Measurement 1977; 1:385–401.
  • 26. Ware JE Jr, Sherbourne CD. The MOS 36-item short-form health survey (SF-36): I. Conceptual framework and item selection. Medical Care 1992; 30:473–483.
  • 27. Berger VW, Christophi CA. Randomization technique, allocation concealment, masking, and susceptibility of trials to selection bias. Journal of Modern Applied Statistical Methods 2003; 2:80–86.
  • 28. Berger VW, Exner DV. Detecting selection bias in randomized clinical trials. Controlled Clinical Trials 1999; 20:319–327.
  • 29. Research Triangle Institute. SUDAAN Language Manual, Release 9.0. Research Triangle Park, NC: Research Triangle Institute, 2004.
  • 30. Binder D. On the variance of asymptotically normal estimators from complex surveys. International Statistical Review 1983; 51:279–292.
  • 31. Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika 1986; 73:13–22.
  • 32. Lavori PW, Dawson R, Shera D. A multiple imputation strategy for clinical trials with truncation of patient data. Statistics in Medicine 1995; 14:1913–1925.
  • 33. Tang L, Song J, Belin TR, Unützer J. A comparison of imputation methods in a longitudinal randomized clinical trial. Statistics in Medicine 2005; 24:2111–2128.
