Contemporary Clinical Trials Communications. 2017 Jul 20;7:208–216. doi: 10.1016/j.conctc.2017.07.004

Predicting enrollment performance of investigational centers in phase III multi-center clinical trials

Rutger M van den Bor a,b, Diederick E Grobbee a,b, Bas J Oosterman a, Petrus WJ Vaessen a, Kit CB Roes b
PMCID: PMC5898520  PMID: 29696188

Abstract

Failure to meet subject recruitment targets in clinical trials continues to be a widespread problem with potentially serious scientific, logistical, financial and ethical consequences. On the operational level, enrollment-related issues may be mitigated by careful site selection and by allocating monitoring or training resources proportionally to the anticipated risk of poor enrollment. Such procedures require estimates of the expected recruitment performance that are sufficiently reliable to allow centers to be sensibly categorized. In this study, we use multivariable logistic regression analysis to investigate whether information obtained from feasibility questionnaires can be used to predict which centers will and which centers will not meet their enrollment targets. From a large set of 59 candidate predictors, we determined the subset that is optimal for predictive purposes using Least Absolute Shrinkage and Selection Operator (LASSO) regularization. Although the extent to which the results are generalizable remains to be determined, they indicate that the optimal model offers only a marginal improvement in prediction accuracy over the intercept-only model, illustrating the difficulty of prediction in this setting.

Keywords: Feasibility studies, Risk-based monitoring, Site performance prediction, Site questionnaires, Trial accrual, Trial recruitment

1. Introduction

Successful completion of a clinical trial requires pre-specified recruitment targets to be met. However, failure to recruit sufficient numbers of subjects in clinical trials continues to be a widespread problem with potentially serious scientific, logistical, financial and ethical consequences [11], [12], [7], [16], [5].

On the operational level, enrollment-related issues may be mitigated by careful site selection (i.e. by primarily initiating centers likely to meet enrollment targets and timelines) and by allocating monitoring or training resources proportionally to the anticipated risk of poor enrollment. Often, considerable effort is made to collect information on topics related to enrollment performance by means of extensive feasibility questionnaires. Yet, how to reliably draw an a priori distinction between centers that will and centers that will not meet their enrollment targets, and whether doing so is sensible at all, remains unclear. Doing so requires knowledge of potential associations between quantifiable center-specific factors and recruitment performance.

In this study, we investigate such associations. In a broad sense, this aim is shared with many earlier studies (e.g. Refs. [1], [4], [6], [8], [14], [16], [17], [19], [22]). Note, however, that there exists considerable heterogeneity among these studies in terms of, e.g., medical context, methodology, the operationalization of ‘recruitment performance’, and the results, making it far from clear which factors should be used for the purpose of an operational risk classification.

Prior to the recruitment phase, investigators are typically required to present an enrollment plan, based on their expectations regarding the available number of eligible patients, their willingness to cooperate, available staff members, etc. In this study, we consider enrollment to be successful if this center-specific pre-specified enrollment target is met. To our knowledge, only the studies described in Reuter and Esche [17] and Getz [8] use this definition: Reuter and Esche [17] assessed the association between meeting enrollment targets and various feasibility questionnaire responses (aimed to measure, among other things, the size of the potential patient pool, site/staff experience, and concerns with respect to the investigational medicinal product) in a phase III rheumatoid arthritis clinical trial, but did not detect any significant associations. Getz [8] summarizes the results of a study concluding that “once a particular site has conducted six to 10 clinical trials, that site has a higher likelihood of meeting enrollment targets within the requisite time frame” (p. 1).

We use data obtained from a large, international, placebo-controlled, phase III cardiovascular clinical trial to further evaluate possible associations between center-specific factors and recruitment performance. We consider a large set of candidate predictors obtained from the trial's feasibility study. To assess whether there exists a subset of candidate predictors that can help to identify which centers will and which will not meet their enrollment target, we use logistic regression analysis in combination with variable selection through Least Absolute Shrinkage and Selection Operator (LASSO) regularization.

2. The AleCardio trial

The AleCardio trial (ClinicalTrials.gov identifier: NCT01042769) aimed to assess the effect of treatment with Aleglitazar on cardiovascular mortality and morbidity in patients with known or newly diagnosed type 2 diabetes (T2D) who experienced a recent acute coronary syndrome (ACS) event. Patient enrollment in the trial took place between February 2010 and May 2012. In July 2013, the trial was halted due to futility for efficacy and increased rates of safety endpoints. In total, over 7000 patients were included by over 700 sites in Asia Pacific, China, Eastern Europe and Russia, India, Latin and South America, North America and Western Europe. For more detailed accounts of the design and results of this trial, see Lincoff et al. [9] and Lincoff et al. [10].

We use data from 811 centers for which an enrollment target was set and which were actually initiated. Note that not all of these centers ended up enrolling subjects. In fact, 88 centers were closed before the anticipated end of the recruitment period, typically because they failed to enroll any subject.

3. Methods

3.1. Outcome and candidate predictors

The outcome of interest is quantified as the dichotomous variable indicating whether a center met its enrollment target on time (0 = no, 1 = yes). We treat the 88 centers that were closed early as not having met their target, as we consider it implausible that these centers would have met their enrollment targets had they not been closed early.
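For illustration, the following minimal R sketch shows one way such an outcome could be coded. It is a sketch only: the data frame 'centers' and the columns 'n_enrolled', 'recr_target' and 'closed_early' are hypothetical names introduced here, not variables from the trial database.

## Hypothetical outcome coding: success means the pre-specified target was
## reached within the planned recruitment window; the 88 early-closed centers
## are coded as failures regardless of their enrollment up to closure.
centers$met_target <- as.integer(centers$n_enrolled >= centers$recr_target &
                                 !centers$closed_early)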

Candidate predictors were obtained from the feasibility questionnaire data, an extensive source of information during the center initiation phase. Information was extracted for all items we considered to be possibly associated with recruitment performance, yielding a total of 56 candidate predictor variables. In addition, three candidate predictors were extracted from the recruitment planning that the centers were required to provide: the expected (i.e., target) number of subjects recruited, the expected number of months required to meet that target, and the anticipated screen failure rate. The 59 candidate predictors can be categorized into seven categories: (1) general center characteristics, (2) staff availability, (3) clinical trial experience, (4) patient pool characteristics, (5) potential/perceived enrollment challenges, (6) recruitment plan and strategies, and (7) contract execution and protocol approval. More details are provided in Appendix A.

3.2. Descriptive analyses

For descriptive purposes, we provide details on the distribution of the target and actual number of enrolled subjects and calculate the proportion of centers meeting the enrollment target. In addition, we regress the outcome variable (i.e., the dichotomous variable indicating whether a center met its enrollment target) on the full set of candidate predictors using multivariable logistic regression analysis, fitted using quasi-likelihood estimation to account for possible overdispersion.
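As an illustration of this step, a minimal R sketch is given below. The data frame 'centers' (holding the outcome 'met_target' and the candidate predictors) is an assumed name, and the pooling over the ten imputed datasets described in Section 3.4 is omitted.

## Quasi-binomial logistic regression of the outcome on all candidate predictors;
## the quasi-likelihood fit absorbs possible overdispersion via the dispersion parameter.
fit <- glm(met_target ~ ., data = centers, family = quasibinomial(link = "logit"))
summary(fit)             # coefficient estimates, SEs and the estimated dispersion
drop1(fit, test = "F")   # per-predictor tests; the paper reports likelihood ratio tests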

3.3. Variable selection procedure

We use LASSO regularization [20] to determine the subset of candidate predictors that is optimal in terms of prediction accuracy when regressing the outcome on the candidate predictors through multivariable logistic regression analysis. In the estimation of the regression coefficients, LASSO regularization allows coefficient estimates to be shrunk to exactly zero, thereby performing variable selection. The amount of shrinkage applied to the regression coefficient estimates, and hence the number of estimates equal to zero, is determined by the value of the tuning parameter λ, with larger values of λ representing more shrinkage. To determine an appropriate value of λ, we first estimate the model's out-of-sample prediction error (in terms of the Brier score) over a grid of 500 possible λ values using 10-fold cross validation (CV). Two common options for selecting λ are (1) to select the value of λ for which the CV error is minimized, and (2) to use the 1-standard error (1-SE) rule, i.e. to select the largest value of λ for which the CV error is within one SE of the minimum CV error. We apply both strategies. To ensure that all levels of a categorical predictor are jointly included in or excluded from the model, a so-called group LASSO is used, as implemented in the R [2] package 'grplasso' [13]. To account for model-selection instability caused by the random selection of the 10 CV folds, we repeat the CV 20 times and calculate the 95th percentile of the 20 selected λ values (see Ref. [18]). For each model, we assess the range of CV error values at the selected value of λ. In addition, we determine the cross-validated area under the curve (CV-AUC) values for each model at the selected value of λ to investigate the discriminatory power of the models.
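The sketch below outlines how the λ-selection step could be implemented with 'grplasso'. It is a simplified sketch under stated assumptions, not the authors' code: 'X' is assumed to be a numeric design matrix whose first column is an unpenalized intercept, 'y' the 0/1 outcome, and 'grp' the group index vector expected by 'grplasso' (NA for the intercept, with the dummy columns of a categorical predictor sharing one group number); the 20 repetitions of the CV and the loop over the imputed datasets are omitted.

library(grplasso)

## Grid of 500 lambda values, from the maximum down to a small fraction of it.
lmax    <- lambdamax(X, y, index = grp, model = LogReg())
lambdas <- lmax * exp(seq(0, log(1e-3), length.out = 500))

set.seed(1)
folds <- sample(rep(1:10, length.out = nrow(X)))   # 10-fold CV assignment
brier <- matrix(NA, nrow = 10, ncol = length(lambdas))
for (k in 1:10) {
  fit <- grplasso(X[folds != k, ], y[folds != k], index = grp,
                  lambda = lambdas, model = LogReg())
  p   <- plogis(X[folds == k, ] %*% coef(fit))     # predicted probabilities, one column per lambda
  brier[k, ] <- colMeans((p - y[folds == k])^2)    # Brier score per lambda
}
cv.err <- colMeans(brier)
cv.se  <- apply(brier, 2, sd) / sqrt(10)

lam.min <- lambdas[which.min(cv.err)]                                      # 'minimum CV error' rule
lam.1se <- max(lambdas[cv.err <= min(cv.err) + cv.se[which.min(cv.err)]])  # 1-SE rule

## Refit on all data at the chosen lambda; nonzero coefficient groups are the selected predictors.
fit.min  <- grplasso(X, y, index = grp, lambda = lam.min, model = LogReg())
selected <- names(which(coef(fit.min)[, 1] != 0))

Selecting the largest λ within one SE of the minimum deliberately trades a small amount of CV error for a sparser model, which is why the two rules can lead to very different selections (see Section 4.2).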

3.4. Missing data handling

Table A1 shows the proportion of missing values for each of the candidate predictors. No values were missing for the outcome variable. Using the R [2] package 'mice' [21], we impute missing values ten times by means of predictive mean matching (for numeric data), logistic regression imputation (for binary data), polytomous regression imputation (for unordered categorical data) or proportional odds regression (for ordinal data), the default options in the mice function. As a consequence, the variable selection procedure is repeated ten times. Note that a complete case analysis was performed to assess the impact of removing centers with missing values (results are described in Appendix C).
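A minimal sketch of the imputation step is given below, assuming the outcome and the 59 candidate predictors are held in a data frame 'dat' with appropriately typed (numeric, factor, ordered factor) columns; the object names and the seed are illustrative.

library(mice)

## Ten imputations with mice's default methods per variable type
## (pmm, logreg, polyreg, polr), as described above.
imp  <- mice(dat, m = 10, seed = 2017)
dat1 <- complete(imp, action = 1)   # first of the ten completed datasets, on which
                                    # the selection procedure of Section 3.3 is run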

4. Results

4.1. Descriptive analyses

Fig. 1 shows the distribution of (1) the center-specific enrollment targets, (2) the actual number of subjects enrolled, and (3) the difference between the enrollment target and the actual number of subjects enrolled. The median center-specific enrollment target equals 10.1 (Q1: 7.5, Q3: 11.5). The median number of subjects actually recruited equals 4.0 (Q1: 1.5, Q3: 9.0). Only a few centers (18.2%; 95% Wilson score CI: 15.7%–21.1%) met their enrollment target.
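For reference, the reported interval can be reproduced approximately with the Wilson score method; in the sketch below, the count of 148 centers meeting their target (out of the 811 initiated centers) is inferred from the reported 18.2% and is therefore an approximation.

## Wilson score interval for the proportion of centers meeting their enrollment target.
prop.test(x = 148, n = 811, correct = FALSE)   # ~0.182, 95% CI approximately 0.157 to 0.211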

Fig. 1. Violin plots and boxplots showing the distribution of (1) the center-specific enrollment targets, (2) the actual number of subjects enrolled, and (3) the difference between the enrollment target and the actual number of subjects enrolled.

Regressing the outcome (i.e., the variable indicating whether a center met its recruitment target) on the full set of candidate predictors by means of a quasi-binomial generalized linear model and pooling over the multiply imputed datasets yields the results shown in Table 1 (in which, for presentation purposes, only predictors with associated p-values lower than 0.1 are displayed).

Table 1.

Pooled regression coefficient estimates (βˆ), standard errors (SE), and p-values for the quasi-binomial generalized linear model regressing the outcome (i.e. the variable indicating whether a center met its recruitment target) on the full set of candidate predictors. For presentation purposes, the table only includes predictors associated with p-values lower than 0.1. P-values are based on likelihood ratio tests comparing the full model against the model without the predictor. The estimate of the dispersion parameter ranged from 1.07 to 1.14. See Appendix A for a more detailed description of the variables.

Predictor Description βˆ SE P-value
(Intercept) −4.214 2.198
GCC.region The region in which a center is located. <0.001
 Asia Pacific Ref. -
 China 1.130 0.626
 E.Europe/Russia 0.021 0.504
 India 1.001 0.674
 Latin/S. America 0.412 0.550
 N. America/Can. −1.142 0.513
 W. Europe −0.817 0.489
GCC.clinic Indicates whether the center can be considered a clinical setting (0 = no, 1 = yes) 0.931 0.357 0.003
CTE.distrials_dep The department's experience (in number of trials) with clinical trials conducted in this disease area. 0.052
 None Ref. -
 1 to 5 0.657 0.476
 6 to 9 1.248 0.536
 10 or more 1.200 0.538
PEC.stmed The extent to which the center expects the study medication to be a challenge with respect to enrollment, rated from 1 to 5 (with 1 being the most challenging). −0.189 0.115 0.072
RPS.alterncontact Indicates whether the center has stated to be both willing and capable of utilizing alternate contact information for patients, including that of family and friends, to assist in maintaining patient contact (0 = no, 1 = yes). −0.548 0.301 0.060
RPS.recr_dur The planned length (in months) of the recruitment period 0.100 0.032 0.002
RPS.webcasts Indicates whether the center has stated to be both willing and capable of providing periodic webcasts for patients (0 = no, 1 = yes). −1.289 0.628 0.034
CEPA.comm_approv The total number of days required from submission of essential study documents to obtain final protocol approval from all of the site's required committees combined. 0.036
 1 to 10 Ref. -
 11 to 20 −1.001 0.472
 21 to 30 −0.499 0.409
 31 to 60 −0.706 0.417
 Greater than 60 0.165 0.494

4.2. Predictor selection

Table 1 already provides an indication of which candidate predictors are potentially important, but the results of the more formal variable selection procedure (for each multiply imputed dataset) are presented in Table 2. For instance, when the ‘minimum CV error’ strategy is used to select the shrinkage parameter λ on the first multiply imputed dataset, the selected value of λ (scaled to the maximum possible value) equals 0.239. At that value of λ, the CV error ranges from 0.142 to 0.145 (see footnote 1), the CV-AUC ranges from 0.643 to 0.675, and the selected model contains 13 predictors plus an intercept term. The corresponding CV plot is provided in Appendix B. While the CV error and CV-AUC values are similar across the multiply imputed datasets, the set of selected predictors is not, although a subset of eight candidate predictors is selected consistently.

Table 2.

Results (λ, CV error, CV-AUC and the set of selected predictors) of the LASSO analyses for each multiply imputed dataset, using two strategies to select λ. In the last column, variables that are consistently selected are highlighted in bold font. See Appendix A for a description of the variables.

Strategy for selecting λ MI dataset λ (scaled) CV error CV-AUC Selected candidate predictors
Minimum CV error 1 0.239 0.142–0.145 0.643–0.675 (Intercept), GCC.region, GCC.clinic, CTE.distrials_dep, PPC.num12m, PEC.scrfail, RPS.recr_target, RPS.recr_dur, RPS.webcasts, GCC.inet, CTE.gcptrials_dep, PEC.proc, CEPA.comm_approv, CEPA.exec_30d.
2 0.229 0.143–0.145 0.632–0.668 (Intercept), GCC.region, GCC.clinic, CTE.distrials_dep, PPC.num12m, PEC.scrfail, RPS.recr_target, RPS.recr_dur, RPS.webcasts, GCC.medhsp, CTE.gcptrials_dep, PEC.stmed, CEPA.comm_approv, CEPA.exec_30d.
3 0.313 0.145–0.148 0.611–0.647 (Intercept), GCC.region, GCC.clinic, CTE.distrials_dep, PPC.num12m, PEC.scrfail, RPS.recr_target, RPS.recr_dur, RPS.webcasts, GCC.smo.
4 0.235 0.142–0.145 0.645–0.669 (Intercept), GCC.region, GCC.clinic, CTE.distrials_dep, PPC.num12m, PEC.scrfail, RPS.recr_target, RPS.recr_dur, RPS.webcasts, CTE.gcptrials_dep, PEC.proc, PEC.stmed, PEC.patpop, RPS.chartrev, RPS.promote, RPS.alterncontact, CEPA.comm_approv, CEPA.exec_30d.
5 0.235 0.144–0.146 0.626–0.675 (Intercept), GCC.region, GCC.clinic, CTE.distrials_dep, PPC.num12m, PEC.scrfail, RPS.recr_target, RPS.recr_dur, RPS.webcasts, GCC.pi_inv, GCC.medhsp, PEC.stmed, CEPA.comm_approv.
6 0.236 0.144–0.146 0.630–0.662 (Intercept), GCC.region, GCC.clinic, CTE.distrials_dep, PPC.num12m, PEC.scrfail, RPS.recr_target, RPS.recr_dur, RPS.webcasts, GCC.inet, GCC.pi_spec, GCC.medhsp, SA.resnurse, CTE.audit, PEC.proc, PEC.stmed, CEPA.comm_approv, CEPA.exec_30d.
7 0.239 0.144–0.146 0.624–0.650 (Intercept), GCC.region, GCC.clinic, CTE.distrials_dep, PPC.num12m, PEC.scrfail, RPS.recr_target, RPS.recr_dur, RPS.webcasts, CTE.audit, CTE.gcptrials_dep, CTE.gcpyrs_stcoord, PEC.stmed, CEPA.comm_approv, CEPA.exec_30d.
8 0.247 0.143–0.146 0.625–0.664 (Intercept), GCC.region, GCC.clinic, CTE.distrials_dep, PPC.num12m, PEC.scrfail, RPS.recr_target, RPS.recr_dur, RPS.webcasts, GCC.medhsp, SA.recrspec, CTE.audit, CTE.gcpyrs_stcoord, PEC.stmed, PEC.patpop, CEPA.comm_approv, CEPA.exec_30d.
9 0.231 0.143–0.146 0.626–0.656 (Intercept), GCC.region, GCC.clinic, CTE.distrials_dep, PPC.num12m, PEC.scrfail, RPS.recr_target, RPS.recr_dur, RPS.webcasts, GCC.inet, GCC.medhsp, GCC.smo, CTE.audit, CTE.gcptrials_dep, PEC.stmed, RPS.chartrev, RPS.alterncontact, CEPA.comm_approv.
10 0.262 0.144–0.147 0.628–0.660 (Intercept), GCC.region, GCC.clinic, CTE.distrials_dep, PPC.num12m, PEC.scrfail, RPS.recr_target, RPS.recr_dur, RPS.webcasts, CTE.gcptrials_dep, PEC.stmed, PEC.patpop, RPS.chartrev, CEPA.comm_approv.



1-SE rule 1 1 0.149–0.150 0.484–0.507a (Intercept)
2 1 0.149–0.150 0.483–0.505a (Intercept)
3 1 0.149–0.150 0.484–0.504a (Intercept)
4 1 0.149–0.150 0.490–0.506a (Intercept)
5 1 0.149–0.150 0.484–0.504a (Intercept)
6 1 0.149–0.150 0.476–0.503a (Intercept)
7 1 0.149–0.150 0.486–0.502a (Intercept)
8 1 0.149–0.150 0.488–0.502a (Intercept)
9 1 0.149–0.150 0.484–0.500a (Intercept)
10 1 0.149–0.150 0.492–0.501a (Intercept)
a Note that some variability in the results is possible because the maximum value of λ in the CV training sets may not be identical to the maximum value in the complete data.

The picture differs when the 1-SE rule is used to select λ. In that case, the selected value of λ consistently equals its maximum possible value, thus yielding intercept-only models. In terms of prediction accuracy, however, the negative consequences of doing so are marginal, as indicated by the limited increase in the CV error and the limited decrease in the CV-AUC.

The size and direction of the regression coefficient estimates of the candidate predictors that were consistently selected when using the ‘minimum CV error’ rule for choosing λ are provided in Table 3.

Table 3.

Regression coefficient estimates of the candidate predictors that were consistently selected in each of the multiply imputed datasets. See Appendix A for a description of the variables.

Predictor Multiply imputed dataset
1 2 3 4 5 6 7 8 9 10
(Intercept) −2.648 −2.388 −2.421 −2.409 −2.536 −2.721 −2.437 −2.456 −2.669 −2.499
GCC.region
 Asia Pacific Ref. Ref. Ref. Ref. Ref. Ref. Ref. Ref. Ref. Ref.
 China 0.614 0.602 0.391 0.593 0.611 0.601 0.570 0.549 0.612 0.504
 E.Europe/Russia 0.119 0.105 0.080 0.119 0.096 0.125 0.100 0.105 0.101 0.098
 India 0.454 0.432 0.241 0.423 0.451 0.439 0.400 0.392 0.457 0.345
 Latin/S. America 0.191 0.156 0.093 0.161 0.149 0.172 0.139 0.144 0.159 0.124
 N. America/Can. −0.290 −0.332 −0.236 −0.303 −0.354 −0.314 −0.321 −0.314 −0.356 −0.289
 W. Europe −0.140 −0.147 −0.104 −0.139 −0.162 −0.144 −0.141 −0.134 −0.175 −0.126
GCC.clinic 0.342 0.425 0.168 0.313 0.366 0.320 0.334 0.372 0.313 0.264
CTE.distrials_dep
 None Ref. Ref. Ref. Ref. Ref. Ref. Ref. Ref. Ref. Ref.
 1 to 5 0.151 0.155 0.029 0.136 0.181 0.160 0.139 0.148 0.184 0.128
 6 to 9 0.359 0.367 0.063 0.339 0.398 0.316 0.307 0.313 0.405 0.246
 10 or more 0.294 0.341 0.055 0.316 0.346 0.267 0.258 0.251 0.331 0.245
PPC.num12m 0.137 0.020 0.061 0.025 0.012 0.017 0.002 0.027 0.066 0.039
PEC.scrfail 0.416 0.492 0.394 0.467 0.462 0.469 0.500 0.476 0.489 0.450
RPS.recr_target 0.003 0.004 0.005 0.003 0.004 0.004 0.004 0.004 0.004 0.004
RPS.recr_dur 0.067 0.068 0.060 0.066 0.067 0.066 0.066 0.065 0.068 0.064
RPS.webcasts −0.617 −0.704 −0.343 −0.873 −0.350 −0.667 −0.415 −0.591 −0.332 −0.481

In this study, centers in China and India are predicted to have the highest probability of meeting recruitment targets (keeping the other variables constant). Predicted probabilities are lowest for centers in Western Europe and in North America and Canada (GCC.region). Higher probabilities are predicted for centers that can be considered clinical settings (GCC.clinic), centers that have more experience in conducting trials in this specific clinical context (CTE.distrials_dep), and centers with a larger patient pool (PPC.num12m). Also, a higher anticipated screen failure rate positively affects the predicted probability of meeting enrollment targets (PEC.scrfail), as does a longer duration of the recruitment period (RPS.recr_dur). Contrary to our expectations, a positive regression coefficient estimate is found for the enrollment target (RPS.recr_target). A second surprising finding is that centers stating to be both willing and capable of providing periodic webcasts for patients (a strategy aimed at increasing patient retention) have lower predicted probabilities of meeting their target than centers that did not (RPS.webcasts). From the categories ‘staff availability (SA)’ and ‘contract execution and protocol approval (CEPA)’, no candidate predictors were consistently selected.

When the ‘1-SE rule’ was used to select λ, the estimated intercept (the only term in the models) was equal to −1.5 in all analyses, corresponding to a predicted probability of meeting the enrollment target of exp(−1.5)/(1 + exp(−1.5)) ≈ 0.18, in line with the observed overall proportion of 18.2%.

5. Discussion

We used data from a large international phase III cardiovascular clinical trial to investigate associations between center characteristics obtained from the responses to the feasibility questionnaires and recruitment performance. From a large number of candidate predictors, we determined the subset that is optimal for predictive purposes using LASSO regularization. In terms of prediction accuracy, the models selected using the ‘minimum CV error’ strategy for choosing the value of shrinkage parameter λ are only marginal improvements over the intercept-only models that were selected using the ‘1-SE rule’. This result implies that the predictive value of the set of candidate predictors is limited and should not be overestimated. The results illustrate the difficulty of prediction in this context and suggest that it may be unjustified to base operational decisions on the responses to the feasibility questionnaire items.

However, these findings are based on data from a single trial, hampering their generalizability. In addition, using the results of our study for the specific purpose of making a decision to proceed or not to proceed with a center in the site selection process should be done with caution, as the selection of centers included in this study already represents a (possibly selective) subset of centers. More research is needed to assess whether the findings presented here hold more generally.

The data used were not collected for the purpose of this analysis. Therefore, this assessment should be considered exploratory in nature. We were unable to include potentially relevant feasibility questionnaire items due to, e.g., ambiguous item or answer formulations, and in some cases a subjective assessment of text field entries was required. Data on certain potentially important factors were not collected: for example, our list of candidate predictors fails to adequately address the extent and nature of potential prior cooperation between the center and the sponsor or site management organization. Also, the feasibility questionnaire was designed specifically for this trial and, as a consequence, questionnaire items were not always formulated in sufficiently general terms. Lastly, one could argue that a formal comparison of the predictive accuracy of the model constructed using the ‘minimum CV error’ rule versus the model based on the ‘1-SE rule’ requires independent test data. We therefore repeated the LASSO procedure on a random selection of two-thirds of the data and estimated the prediction error and AUC values on the remaining third. Although the results (available upon request) showed signs of numerical instability (i.e. selected λ values were more variable over the multiply imputed datasets, likely due to the smaller sample sizes of the training sets), the prediction error levels and AUC values corresponding to the selected models are similar to the results described above.

Comparing our results to the results of earlier investigations is not straightforward due to variability in terms of medical context, study methodology and the operationalizations used. Note, however, that from a general perspective our results resemble those of Reuter and Esche [17] who failed to detect a significant association between the responses to a range of feasibility questionnaire items and meeting enrollment targets. Our findings reveal possible limitations of the items used in feasibility questionnaires and could be interpreted as a warning against overemphasizing the outcomes of feasibility studies in general. Overall, however, more research is needed to be able to draw more definitive conclusions. The candidate predictors selected using the ‘minimum CV error’ strategy for choosing λ may have been of limited value in this trial, but may be considered for re-evaluation in future studies.

In conclusion, the results suggest that drawing a reliable a priori distinction between centers that will meet their recruitment target and those that will not is a difficult task, as even the optimal selection of candidate predictors represents only a marginal improvement in predictive accuracy compared to the intercept-only model. Thus, the predictive value of current feasibility studies may not be large enough to justify such extensive questioning. However, more research, preferably from varying types of trials and clinical contexts, is needed to assess whether our results hold more generally.

Funding

Funding for this study was provided by Julius Clinical Ltd., an Academic Research Organization that offers a range of services (such as protocol development, site management and monitoring) to its (pharmaceutical) partners.

Conflicts of interest

We wish to confirm that there are no known conflicts of interest associated with this publication.

Trial information

The analysis in this manuscript was performed on data from the AleCardio trial (ClinicalTrials.gov identifier: NCT01042769).

Acknowledgments

The authors thank R Abbink, S Bruins, F Geerdinck, VL Jong, EM Lusseveld, WJ Meijst, SLP Nieuwenhuis, P Nowak, MB Oostendorp, S Ruis, HMA Verbraeken and NPA Zuithoff for useful comments and discussions.

Footnotes

1

The CV error and the CV-AUC are variable because the CV procedure was repeated 20 times, as explained in section 3.3.

Appendix A. List of candidate predictors

Table A1.

Description of the candidate predictors considered in the analysis. Summary statistics (quantiles Q1, Q2 and Q3, or percentages) are calculated on observed data. Abbreviations: PI = Principal Investigator, ICH = International Conference on Harmonisation, GCP = Good Clinical Practice, ACS = Acute Coronary Syndrome, T2D = Type 2 Diabetes.

Candidate predictor Description % missing
General center characteristics (GCC)
 GCC.emr Indicates whether the center has access to electronic medical records (0 = no, 1 = yes). Percentages: 41.2, 58.8. 3.9
 GCC.inet Indicates whether a high-speed Internet connection is available at the center (0 = no, 1 = yes). Percentages: 1.9, 98.1. 3.9
 GCC.pdb Indicates whether the center has access to a patient database (0 = no, 1 = yes). Percentages: 20.9, 79.1. 3.9
 GCC.fu_resp Indicates whether the center is responsible for the long-term follow-up of patients in this trial (0 = no, 1 = yes). Percentages: 6.7, 93.3. 6.2
 GCC.pi_inv Indicates whether the PI is routinely involved in follow-up visits with study patients (0 = no, 1 = yes). Percentages: 9.2, 90.8. 6.2
 GCC.pi_spec Indicates whether the PI's specialty corresponds to the research area (here, cardiology/diabetes). (0 = no, 1 = yes). Percentages: 2.9, 97.1. 2.1
 GCC.region The region in which a center is located. Similar to Desai et al. [3]; each site is classified into one of the following regions: Asia Pacific, China, Eastern Europe and Russia, India, Latin and South America, North America (United States and Canada), Western Europe. Percentages: 9.2, 4.4, 12.5, 4.7, 13.1, 35.4, 20.7. 0
 GCC.clinic Indicates whether the center can be considered a clinical setting (0 = no, 1 = yes). Percentages: 88.5, 11.5. 0.5
 GCC.crc Indicates whether the center can be considered a clinical research center (0 = no, 1 = yes). Percentages: 85.6, 14.4. 0.5
 GCC.gov Indicates whether the center can be considered a government-run medical facility (0 = no, 1 = yes). Percentages: 89.3, 10.7. 0.5
 GCC.group Indicates whether the center can be considered a group practice (0 = no, 1 = yes). Percentages: 85.5, 14.5. 0.5
 GCC.medhsp Indicates whether the center can be considered a medical hospital (0 = no, 1 = yes). Percentages: 51.7, 48.3. 0.5
 GCC.private Indicates whether the center can be considered a private practice (0 = no, 1 = yes). Percentages: 75.2, 24.8. 0.5
 GCC.smo Indicates whether the center can be considered a site management organization (0 = no, 1 = yes). Percentages: 97.6, 2.4. 0.5
 GCC.spec Indicates whether the center can be considered a cardiology specialist center (0 = no, 1 = yes). Percentages: 98.6, 1.4. 0.5
 GCC.teach Indicates whether the center can be considered a teaching hospital (0 = no, 1 = yes). Percentages: 85.4, 14.6. 0.5
Staff availability (SA)
 SA.diet Indicates whether a registered dietician/nutritionist is available (1 = yes, 0 = no). Percentages: 62.2, 37.8. 6.4
 SA.endocr Indicates whether an endocrinologist is available (1 = yes, 0 = no). Percentages: 59.6, 40.4. 6.4
 SA.pharm Indicates whether a pharmacologist is available (1 = yes, 0 = no). Percentages: 51.6, 48.4. 6.4
 SA.phleb Indicates whether a phlebotomist is available (1 = yes, 0 = no). Percentages: 59.6, 40.4. 6.4
 SA.radiol Indicates whether a radiologist is available (1 = yes, 0 = no). Percentages: 58.5, 41.5. 6.4
 SA.recrspec Indicates whether a recruitment specialist is available (0 = no, 1 = yes). Percentages: 81.0, 19.0. 6.4
 SA.resnurse Indicates whether a research nurse is available (0 = no, 1 = yes). Percentages: 30.7, 69.3. 6.4
 SA.stcoord Indicates whether a study coordinator is available (0 = no, 1 = yes). Note: if missing or 0, but CTE.gcpyrs_stcoord is > 0, set to 1. Percentages: 5.7, 94.3. 6.4
 SA.subi Indicates whether a sub-investigator is available (0 = no, 1 = yes). Note: if missing or 0, but CTE.gcpyrs_subi is > 0, set to 1. Percentages: 6.9, 93.1. 6.4
Clinical trial experience (CTE)
CTE.audit Indicates whether the center has ever been audited by a regulatory agency or health authority (0 = no, 1 = yes). Percentages: 80.7, 19.3. 6.8
CTE.gcptrials_dep The department's experience (number of trials in the past three years) with clinical trials conducted according to ICH and GCP Guidelines. Categories: None, 1 to 4, 5 to 9, and 10 or more. Percentages: 1.3, 14.5, 30.2, 54.1. 4.4
CTE.distrials_dep The department's experience (in number of trials) with clinical trials conducted in this disease area. Categories: None, 1 to 5, 6 to 9, and 10 or more. Percentages: 11.3, 45.2, 19.2, 24.2. 4.3
CTE.gcpyrs_pi The PI's experience (in years) with clinical trials conducted according to ICH and GCP Guidelines. Categories: None, less than 1 year, 1–4 years, 4–7 years, or greater than 7 years. Percentages: 1.5, 2.1, 11.0, 20.5, 64.9. 4.3
CTE.gcpyrs_stcoord The study coordinator's experience (in years) with clinical trials conducted according to ICH and GCP Guidelines. Categories: None, less than 1 year, 1–4 years, 4–7 years, or greater than 7 years. Equals 0 if no study coordinator is present. Percentages: 6.5, 4.5, 22.5, 26.3, 40.2. 7.3
CTE.gcpyrs_subi The sub-investigator's experience (in years) with clinical trials conducted according to ICH and GCP Guidelines. Categories: None, less than 1 year, 1–4 years, 4–7 years, or greater than 7 years. Equals 0 if no sub-investigator is present. Percentages: 7.8, 6.0, 23.1, 27.1, 36.0. 7.2
Patient pool characteristics (PPC)
 PPC.patdis10km What proportion of your patients live within approximately 10 km (6 miles) distance from your clinic? 0, .01-.2, .21-.4, .41-.6, .61-.8, or > .8? Used category midpoints, treated as continuous. Q1, Q2, Q3: 0.30, 0.50, 0.70. 8.3
PPC.num12m The number of ACS patients with newly diagnosed T2D the center treated during the past 12 months, divided by 100. Note that this is an approximation, as it is based on two questions (one for ACS, and one for T2D, with ordinal answer categories). The product of midpoints was used. Furthermore, since one of the two questions had an open-ended last category, the strategy described and recommended by Parker & Fenwick [15] was used to estimate the midpoint for this category. Note also that this item excludes ACS patients with known T2D. Q1, Q2, Q3: 0.23, 0.49, 1.16. 6.9
Potential or perceived enrollment challenges (PEC)
PEC.proc Do you expect the procedures or assessments required to be a challenge with respect to enrollment? Please rate from 1 to 5 (with number 1 being the most challenging). Treated as continuous. Q1, Q2, Q3: 3, 4, 5. 6.8
PEC.import Do you expect the importation issues to be a challenge with respect to enrollment? Please rate from 1 to 5 (with number 1 being the most challenging). Treated as continuous. Q1, Q2, Q3: 3, 4, 5. 8.1
PEC.inex Do you expect in- and exclusion criteria to be a challenge with respect to enrollment? Please rate from 1 to 5 (with number 1 being the most challenging). Q1, Q2, Q3: 3, 4, 4. 6.8
PEC.stmed Do you expect the study medication to be a challenge with respect to enrollment? Please rate from 1 to 5 (with number 1 being the most challenging). Treated as continuous. Q1, Q2, Q3: 3, 4, 5. 7.2
PEC.reimb Do you expect medication reimbursement issues to be a challenge with respect to enrollment? Please rate from 1 to 5 (with number 1 being the most challenging). Treated as continuous. Q1, Q2, Q3: 3, 4, 5. 7.8
PEC.patpop Do you expect the patient population to be a challenge with respect to enrollment? Please rate from 1 to 5 (with number 1 being the most challenging). Treated as continuous. Q1, Q2, Q3: 3, 4, 4. 6.8
PEC.regul Do you expect regulatory authority issues to be a challenge with respect to enrollment? Please rate from 1 to 5 (with number 1 being the most challenging). Treated as continuous. Q1, Q2, Q3: 3, 4, 5. 7.4
PEC.staff Do you expect a lack of sufficient staff resources to be a challenge with respect to enrollment? Please rate from 1 to 5 (with number 1 being the most challenging). Treated as continuous. Q1, Q2, Q3: 4, 5, 5. 7.3
PEC.visit_dur Do you expect the visit frequency and/or study duration to be a challenge with respect to enrollment? Please rate from 1 to 5 (with number 1 being the most challenging). Treated as continuous. Q1, Q2, Q3: 3, 4, 5. 6.9
PEC.impconcerns Indicates whether the center has concerns about the investigational medicinal product (0 = no, 1 = yes). Percentages: 79.7, 20.3. 6.0
PEC.comptrials Indicates whether other, possibly competing trials are currently running/planned on the center. Categories: “No”, “Yes, but we can still meet the enrollment goal for this study”, “Yes, and it may impact our ability to meet the enrollment goal for this study”. Percentages: 60.1, 35.9, 4.0. 7.3
PEC.scrfail The expected proportion of screen failures. Q1, Q2, Q3: 0.15, 0.20, 0.25. 0
Recruitment plan and strategies (RPS)
RPS.recr_target The planned/target number of enrolled subjects. Q1, Q2, Q3: 7.50, 10.05, 11.51. 0
RPS.recr_dur The planned length (in months) of the recruitment period. Q1, Q2, Q3: 9, 12, 15. 0
RPS.chartrev Indicates whether the center has stated to be both willing and capable of providing additional support to assist with chart review to identify patients for the study (0 = no, 1 = yes). Percentages: 59.0, 41.0. 8.0
RPS.promote Indicates whether the center has stated to be both willing and capable of providing materials or services to promote the study to referral physicians/other departments (0 = no, 1 = yes). Percentages: 52.1, 47.9. 8.0
RPS.contact Indicates whether the center has stated to be both willing and capable of keeping regular contact with patients between visits (0 = no, 1 = yes). Percentages: 42.2, 57.8. 7.4
RPS.cfu_remind Indicates whether the center has stated to be both willing and capable of providing community follow-up and visit reminder emails, cards and phone calls (0 = no, 1 = yes). Percentages: 54.6, 45.4. 7.4
RPS.contact_caregiver Indicates whether the center has stated to be both willing and capable of maintaining contact with the patients' other caregivers, particularly primary care physicians (0 = no, 1 = yes). Percentages: 56.1, 43.9. 7.5
RPS.letter Indicates whether the center has stated to be both willing and capable of providing personal thank you letters to patients (0 = no, 1 = yes). Percentages: 63.9, 36.1. 7.4
RPS.alterncontact Indicates whether the center has stated to be both willing and capable of utilizing alternate contact information for patients, including that of family and friends, to assist in maintaining patient contact (0 = no, 1 = yes). Percentages: 66.0, 34.0. 7.4
RPS.items Indicates whether the center has stated to be both willing and capable of providing study-pertinent items to patients at milestone visits (i.e. diabetes recipes, exercise guides, etc.) (0 = no, 1 = yes). Percentages: 46.1, 53.9. 7.5
RPS.website Indicates whether the center has stated to be both willing and capable of creating a study community website for patients to view news and articles related to their condition (0 = no, 1 = yes). Percentages: 78.8, 21.2. 7.4
RPS.webcasts Indicates whether the center has stated to be both willing and capable of providing periodic webcasts for patients (0 = no, 1 = yes). Percentages: 90.4, 9.6. 7.4
Contract execution and protocol approval (CEPA)
CEPA.comm_approv The total number of days required from submission of essential study documents to obtain final protocol approval from all of the site's required committees combined. Categories: 1 to 10, 11 to 20, 21 to 30, 31 to 60, Greater than 60. Percentages: 15.4, 15.1, 27.8, 30.8, 10.8. 7.9
CEPA.exec_30d Indicates whether it usually takes the center more than 30 days to execute a contract and budget (0 = yes or unknown, 1 = no). Percentages: 70.5, 29.5. 7.8

Appendix B. Example cross-validation plot

Fig. B1.

Fig. B1

The 20 CV plots for the CV error and the CV-AUC for the analysis performed on the first multiply imputed dataset. The vertical grey lines indicate the selected values of λ when using the minimum CV-error rule (left) or the 1-SE rule (right).

Appendix C. Complete case analysis

For reference, the table below shows the results for the regression analysis described in section 4.1 when it is applied to the 672 centers with complete data. Again, for presentation purposes, only results for which p < 0.1 are shown.

Table C1.

Complete case analysis: Regression coefficient estimates (βˆ), standard error (SE), and p-values for the quasi-binomial generalized linear model regressing the outcome (i.e. the variable indicating whether a center met its recruitment target) on the full set of candidate predictors. For presentation purposes, the table only includes predictors associated with p-values lower than 0.1. P-values are based on likelihood ratio tests comparing the full model against the model without the predictor. The estimate of the dispersion parameter equals 1.081. See Appendix A for a more detailed description of the variables.

Predictor Description βˆ SE P-value
(Intercept) - −4.108 2.404
GCC.inet Indicates whether a high-speed Internet connection is available at the center (0 = no, 1 = yes). 1.781 1.045 0.062
GCC.region The region in which a center is located. <0.001
 Asia Pacific Ref. -
 China 1.635 0.676
 E.Europe/Russia −0.142 0.520
 India 0.924 0.699
 Latin/S. America 0.039 0.587
 N. America/Can. −1.535 0.569
 W. Europe −0.823 0.515
GCC.clinic Indicates whether the center can be considered a clinical setting (0 = no, 1 = yes) 0.738 0.410 0.077
CTE.gcptrials_dep The department's experience (number of trials in the past three years) with clinical trials conducted according to ICH and GCP Guidelines. 0.032
 None Ref.
 1 to 4 −3.403 1.895
 5 to 9 −3.749 1.942
 10 or more −4.290 1.944
CTE.distrials_dep The department's experience (in number of trials) with clinical trials conducted in this disease area. 0.041
 None Ref. -
 1 to 5 0.591 0.510
 6 to 9 1.288 0.564
 10 or more 1.270 0.577
PEC.proc The extent to which the center expects the procedures or assessments required to be a challenge with respect to enrollment, rated from 1 to 5 (with 1 being the most challenging). −0.279 0.165 0.090
RPS.recr_dur The planned length (in months) of the recruitment period 0.096 0.035 0.006
RPS.webcasts Indicates whether the center has stated to be both willing and capable of providing periodic webcasts for patients (0 = no, 1 = yes). −1.721 0.724 0.006
CEPA.comm_approv The total number of days required from submission of essential study documents to obtain final protocol approval from all of the site's required committees combined. 0.010
 1 to 10 Ref. -
 11 to 20 −1.452 0.504
 21 to 30 −0.735 0.441
 31 to 60 −1.012 0.446
 Greater than 60 −0.166 0.524

The LASSO procedure, when applied to the subset of centers with complete data, yields the following results: The λ value that minimizes the CV error, as a proportion of its maximum possible value, equals 0.227, at which the CV error ranges from 0.146 to 0.150 and the CV-AUC ranges from 0.629 to 0.680. Using the '1-SE rule' yields a CV error estimate of 0.154 in all settings and a CV-AUC ranging from 0.479 to 0.505. These results are comparable to the results of the analyses on the multiply imputed data. Again, using the '1-SE rule' implies that the optimal model is the intercept-only model (with an intercept of −1.457). Minimizing the CV error results in the model in Table C2. As can be seen, the model contains the set of candidate predictors that was also consistently selected in the analyses on the imputed data, with comparable regression coefficient estimates.

Table C2.

Results (selected candidate predictors and regression coefficient estimates) of the LASSO procedure when applied to the subset of complete cases and when selecting the value for λ which minimizes the CV error. See Appendix A for a more detailed description of the variables.

Predictor Estimate
(Intercept) −2.351
GCC.region
 Asia Pacific Ref.
 China 0.763
 E.Europe/Russia 0.165
 India 0.395
 Latin/S. America 0.101
 N. America/Can. −0.357
 W. Europe −0.088
GCC.clinic 0.163
GCC.group −0.100
SA.stcoord 0.128
CTE.gcptrials_dep
 None Ref.
 1 to 4 −0.237
 5 to 9 −0.195
 10 or more −0.323
CTE.distrials_dep
 None Ref.
 1 to 5 0.145
 6 to 9 0.394
 10 or more 0.364
PPC.num12m 0.056
PEC.proc −0.033
PEC.patpop 0.002
PEC.scrfail 0.432
RPS.recr_target 0.006
RPS.recr_dur 0.068
RPS.chartrev 0.002
RPS.webcasts −0.658
CEPA.comm_approv
 1 to 10 Ref.
 11 to 20 −0.212
 21 to 30 −0.137
 31 to 60 −0.152
 Greater than 60 0.164

References

1. Benson A.B., III, Pregler J.P., Bean J.A., Rademaker A.W., Eshler B., Anderson K. Oncologists' reluctance to accrue patients onto clinical trials: an Illinois cancer center study. J. Clin. Oncol. 1991;9(11):2067–2075. doi: 10.1200/JCO.1991.9.11.2067.
2. R Core Team. A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria; 2016. http://www.r-project.org/
3. Desai P.B., Anderson C., Sietsema W.K. A comparison of the quality of data, assessed using query rates, from clinical trials conducted across developed versus emerging global regions. Drug Inf. J. 2012;46:455–463.
4. Donovan J.L., Peters T.J., Noble S., Powell P., Gillatt D., Oliver S.E., Lane J.A., Neal D.E., Hamdy F.C., for the ProtecT Study Group. Who can best recruit to randomized trials? Randomized trial comparing surgeons and nurses recruiting patients to a trial of treatments for localized prostate cancer (the ProtecT study). J. Clin. Epidemiol. 2003;56:605–609. doi: 10.1016/s0895-4356(03)00083-0.
5. Fletcher B., Gheorghe A., Moore D., Wilson S., Damery S. Improving the recruitment activity of clinicians in randomized controlled trials: a systematic review. BMJ Open. 2012;2. doi: 10.1136/bmjopen-2011-000496.
6. Ford E., Jenkins V., Fallowfield L., Stuart N., Farewell D., Farewell V. Clinicians' attitudes towards clinical trials of cancer therapy. Br. J. Cancer. 2011;104(10):1535–1543. doi: 10.1038/bjc.2011.119. Epub 2011 Apr 12.
7. Friedman L.M., Furberg C.D., DeMets D.L. Fundamentals of Clinical Trials. fourth ed. Springer; 2010.
8. Getz K. Predicting successful site performance. Appl. Clin. Trials. November 1st, 2011. http://www.appliedclinicaltrialsonline.com/predicting-successful-site-performance
9. Lincoff A.M., Tardif J.-C., Neal B., Nicholls S.J., Rydén L., Schwartz G.G., Malmberg K., Buse J.B., Henry R.R., Wedel H., Weichert A., Cannata R., Grobbee D.E. Evaluation of the dual peroxisome proliferator-activated receptor α/γ agonist aleglitazar to reduce cardiovascular events in patients with acute coronary syndrome and type 2 diabetes mellitus: rationale and design of the AleCardio trial. Am. Heart J. 2013;166(3):429–434. doi: 10.1016/j.ahj.2013.05.013.
10. Lincoff A.M., Tardif J.-C., Schwartz G.G., Nicholls S.J., Rydén L., Neal B., Malmberg K., Wedel H., Buse J.B., Henry R.R., Weichert A., Cannata R., Svensson A., Volz D., Grobbee D.E. Effect of aleglitazar on cardiovascular outcomes after acute coronary syndrome in patients with type 2 diabetes mellitus: the AleCardio randomized clinical trial. J. Am. Med. Assoc. 2014;311(15):1515–1525. doi: 10.1001/jama.2014.3321.
11. Lovato L.C., Hill K., Hertert S., Hunninghake D.B., Probstfield J.L. Recruitment for controlled clinical trials: literature summary and annotated bibliography. Control. Clin. Trials. 1997;18:328–357. doi: 10.1016/s0197-2456(96)00236-x.
12. McDonald A.M., Knight R.C., Campbell M.K., Entwistle V.A., Grant A.M., Cook J.A., Elbourne D.R., Francis D., Garcia J., Roberts I., Snowdon C. What influences recruitment to randomised controlled trials? A review of trials funded by two UK funding agencies. Trials. 2006;7:9. doi: 10.1186/1745-6215-7-9.
13. Meier L. Grplasso: Fitting User Specified Models with Group Lasso Penalty. 2015. https://CRAN.R-project.org/package=grplasso
14. Oude Rengerink K., Hooft L., Van den Boogaard N.M., Van der Goes B.Y., Bossuyt P.M.M., Mol B.W.J. Why do some centres recruit better than others? An analysis of recruitment rates in 17 randomized clinical trials in obstetrics and gynaecology. Chapter 4 in ‘Embedding Trials in Evidence-based Clinical Practice’ by Oude Rengerink K. PhD thesis, University of Amsterdam, The Netherlands; 2014.
15. Parker R.N., Fenwick R. The Pareto curve and its utility for open-ended income distributions in survey research. Soc. Forces. 1983;61(3):872–885.
16. Rahman S., Majumder A.A., Shaban S.F., Rahman N., Ahmed M., Abdulrahman K.B., D'Souza U.J.A. Physician participation in clinical research and trials: issues and approaches. Adv. Med. Educ. Pract. 2011;2:85–93. doi: 10.2147/AMEP.S14103.
17. Reuter S., Esche G. How effective are site questionnaires in predicting site performance? J. Clin. Res. Best Pract. 2007;3(4).
18. Roberts S., Nowak G. Stabilizing the LASSO against cross-validation variability. Comput. Stat. Data Anal. 2014;70:198–211.
19. Shea S., Bigger J.T., Campion J., Fleiss J.L., Rolnitzky L.M., Schron E., Gorkin L., Handshaw K., Kinney M.R., Branyon M. Enrollment in clinical trials: institutional factors affecting enrollment in the cardiac arrhythmia suppression trial (CAST). Control. Clin. Trials. 1992;13:466–486. doi: 10.1016/0197-2456(92)90204-d.
20. Tibshirani R. Regression shrinkage and selection via the LASSO. J. R. Stat. Soc. Ser. B. 1996;58(1):267–288.
21. Van Buuren S., Groothuis-Oudshoorn K. Mice: multivariate imputation by chained equations in R (version 2.30). J. Stat. Softw. 2011;45(3):1–67. http://www.jstatsoft.org/v45/i03/
22. Levett K.M., Roberts C.L., Simpson J.M., Morris J.M. Site-specific predictors of successful recruitment to a perinatal clinical trial. Clin. Trials. 2014;11:584–589. doi: 10.1177/1740774514543539.
