2019 May 10;29(6):1715–1727. doi: 10.1177/0962280219846146

Design and analysis of stratified clinical trials in the presence of bias

Ralf-Dieter Hilgers 1, Martin Manolov 1, Nicole Heussen 1,2, William F Rosenberger 3
PMCID: PMC7270725  PMID: 31074333

Abstract

Background

Among various design aspects, the choice of randomization procedure has to be agreed on when planning a clinical trial stratified by center. The aim of the paper is to present a methodological approach to evaluate whether a randomization procedure mitigates the impact of bias on the test decision in clinical trials stratified by center.

Methods

We use the weighted t test to analyze the data from a clinical trial stratified by center with a two-arm parallel group design and an intended 1:1 allocation ratio, aiming to prove a superiority hypothesis with a continuous normal endpoint, without interim analysis and without adaptation in the randomization process. The derivation is based on the weighted t test under misclassification, i.e. ignoring bias. An additive bias model combining selection bias and time-trend bias is linked to different stratified randomization procedures.

Results

Various aspects of formulating stratified versions of randomization procedures are discussed. A formula for sample size calculation of the weighted t test is derived and used to specify the tolerated imbalance allowed by some randomization procedures. The distribution of the weighted t test under misclassification is deduced, taking the sequence of patient allocation to treatment, i.e. the randomization sequence, into account. An additive bias model combining selection bias and time-trend bias at strata level linked to the applied randomization sequence is proposed. With the aforementioned components, the potential impact of bias on the type one error probability depending on the selected randomization sequence, and thus the randomization procedure, is formally derived and calculated by way of example within a numerical evaluation study.

Conclusion

The proposed biasing policy and test distribution are necessary to conduct an evaluation of the comparative performance of (stratified) randomization procedures in multi-center clinical trials with a two-arm parallel group design. It enables the choice of the best practice procedure. The evaluation stimulates the discussion about the level of evidence resulting from this kind of clinical trial.

Keywords: Multi-center clinical trial, weighted t test, sample size, stratified randomization, type I error probability, selection bias, time-trend bias

1 Introduction

Large clinical trials often stratify the randomization on a small collection of covariates that may introduce heterogeneity into the patient stream. An important covariable in multi-center trials is often the clinical center, as different study personnel, clinical settings, and patient populations may result in differential study outcomes.1 A stratified population-based analysis can be performed with or without stratification in the design. Less is known about the impact of stratification when there is a bias in the clinical trial. In this paper, we explore this issue both for selection bias and chronological bias, and we demonstrate the impact of these biases on a weighted stratified analysis. In so doing, we explore the role of specific stratified randomization procedures (RPs) and how certain procedures may mitigate the effects of bias. The recognition of the role of RPs in mitigating bias has been explored in prior research for unstratified trials.2–6 But because stratification into K strata creates K different independent randomized clinical trials, and a stratified test combines K independent tests, the impact of bias can be more pronounced.

The paper is organized as follows: In Section 2, we describe different stratified RPs and discuss aspects of formulating stratified versions of RPs. In Section 3, we derive a formulation of Fleiss's1 stratified test statistic that preserves the allocation sequence, derive the distribution of the test statistic when bias is present but ignored in the analysis, and mention some sample size considerations for the stratified test. In Section 4, we specify the bias model in the form of an additive combination of strata-specific selection bias and strata-specific time-trend bias linked to the stratified allocation sequence. The criterion introduced in Section 5 is used to summarize the impact of the allocation sequence-specific bias on the type I error probability over the range of all sequences induced by a specific RP. Consequently, an assessment of different RPs is enabled which guides the choice of an RP for application in a particular clinical trial setting. The methodology is applied to some specific scenarios in Section 6 to illustrate the effects. We discuss the findings in Section 7 and draw conclusions in Section 8.

2 Stratified RPs

RPs for clinical trials for two treatments are well described in literature.2 In principle, any RP used for two-treatment clinical trials can be employed within strata in a stratified randomization. A comprehensive review is given in Rosenberger.2 Complete randomization in which patients are assigned to treatments with probability 1/2 is rarely used in stratified clinical trials. Rather, some form of restricted randomization is employed in an effort to balance treatments within strata. Hilgers6 categorized restricted RPs that force balance in probability, force balance using a maximal tolerated imbalance, or force terminal balance. A selective list of restricted RPs is given as follows2:

  • Efron's biased coin design (EBC(p)), which consists of flipping a biased coin with probability $p \geq 0.5$ in favor of the treatment which has been allocated less frequently and a fair coin in case of equal numbers of treatment assignments,7

  • Big stick design (BSD(a)), which can be implemented via complete randomization with a forced deterministic assignment when a maximal tolerated imbalance a is reached during the enrollment,8

  • Random allocation rule (RAR), which assigns half the patients to E and C randomly,9

  • Permuted block randomization (PBR(b)) with block size b uses RAR within blocks of b patients, for b even,10

  • Maximal procedure (MP(a)) which uses the allocation sequences of RAR by additionally imposing a maximal tolerated imbalance (a) and assigning equal probability to all such sequences.11

Note that EBC(p) may be classified as a restricted RP forcing balance in probability. BSD(a) forces balance by maximal tolerated imbalance a during the allocation process but does not force terminal balance. Restricted RPs with a maximal tolerable imbalance and terminal balance are PBR(b) and MP(a).
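The procedures listed above can be sketched in a few lines of Python. This is a minimal illustration for a single stratum, not the authors' implementation: the function names (`rar`, `pbr`, `ebc`, `bsd`, `imbalance_path`) are hypothetical, allocations are coded 1 for E and 0 for C, and MP(a) is omitted because sampling uniformly from its sequence set needs more machinery than fits here.

```python
import random

def rar(n, rng):
    """Random allocation rule: n/2 patients to E (1) and n/2 to C (0)."""
    seq = [1] * (n // 2) + [0] * (n // 2)
    rng.shuffle(seq)
    return seq

def pbr(n, b, rng):
    """Permuted block randomization: RAR within consecutive blocks of size b."""
    assert b % 2 == 0 and n % b == 0
    seq = []
    for _ in range(n // b):
        seq.extend(rar(b, rng))
    return seq

def ebc(n, p, rng):
    """Efron's biased coin: probability p for the arm allocated less often."""
    seq, d = [], 0  # d = (number of E) - (number of C) so far
    for _ in range(n):
        prob_e = 0.5 if d == 0 else (p if d < 0 else 1 - p)
        z = 1 if rng.random() < prob_e else 0
        seq.append(z)
        d += 1 if z else -1
    return seq

def bsd(n, a, rng):
    """Big stick design: fair coin unless the imbalance has reached +/- a."""
    seq, d = [], 0
    for _ in range(n):
        if d >= a:
            z = 0  # forced assignment to C
        elif d <= -a:
            z = 1  # forced assignment to E
        else:
            z = 1 if rng.random() < 0.5 else 0
        seq.append(z)
        d += 1 if z else -1
    return seq

def imbalance_path(seq):
    """Running imbalance n_E(s) - n_C(s) after each allocation."""
    d, path = 0, []
    for z in seq:
        d += 1 if z else -1
        path.append(d)
    return path

rng = random.Random(1)
print(imbalance_path(pbr(8, 4, rng)))
print(imbalance_path(bsd(8, 2, rng)))
```

A stratified version would simply call one of these functions once per stratum with independent random streams.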

The International Council for Harmonisation stated in the E9 recommendation (ICH E9):

It is advisable to have a separate random scheme for each centre, i.e., to stratify by centre or to allocate several whole blocks to each centre.

The European Medicines Agency “Guideline on Clinical Trials in Small Populations” recommends stratified randomization to improve power. Using permuted blocks within each stratum is the most popular method of stratified randomization, and this is often called the stratified block design. Blocks can be selected with a fixed size or with variable sizes. However, blocking is not the only method to use within strata. The ICH E912 guidelines also state that “different trial designs will require different procedures for generating randomization schedules.” We now define stratified randomization more formally.

Consider the allocation $z_{ji} \in \{0,1\}$ of patients $i = 1, \ldots, n_j$ either to the treatment E if $z_{ji} = 1$ or C if $z_{ji} = 0$ in stratum $j$. An RP is implemented by assigning probabilities $P(Z_j = z_j \mid z_j \in \{0,1\}^{n_j})$ to the possible allocations $Z_j = (Z_{j1}, \ldots, Z_{jn_j})$ in stratum $j$. A stratified randomization is implemented by creating independent randomization lists $z_j \in \{0,1\}^{n_j}$ for each stratum $1 \le j \le K$. Denote the possible allocations by $z = (z_1, \ldots, z_K) \in \times_{j=1}^{K} \{0,1\}^{n_j} = \{0,1\}^N$ with $N = \sum_{j=1}^{K} n_j$. Then a stratified version of an RP is implemented by assigning probabilities $P(Z = z \mid z \in \{0,1\}^N) = \prod_{j=1}^{K} P(Z_j = z_j \mid z_j \in \{0,1\}^{n_j})$ to the possible allocations $z \in \{0,1\}^N$. Of course, when implementing complete randomization, the stratified and unstratified RPs result in the same set of randomization sequences with the same probabilities because assignments are independent and equiprobable, i.e.

$$P(Z = z \mid z \in \{0,1\}^N) = \prod_{j=1}^{K} P(Z_j = z_j \mid z_j \in \{0,1\}^{n_j}) = \prod_{j=1}^{K} \frac{1}{2^{n_j}} = \frac{1}{2^N}$$

However, when implementing a stratified restricted RP, this observation generally does not hold and some further definitions are necessary. Even in the very simple RAR, the set of possible randomization sequences is reduced considerably and the probability for the stratified allocation sequence becomes

$$P(Z = z \mid z \in \{0,1\}^N) = \prod_{j=1}^{K} P(Z_j = z_j \mid z_j \in \{0,1\}^{n_j}) = \prod_{j=1}^{K} \binom{n_j}{n_j/2}^{-1} \neq \binom{N}{N/2}^{-1}$$
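As a small numerical check of this counting argument, the sketch below compares the number of equiprobable sequences under a stratified RAR, an unstratified RAR and complete randomization. The function names and the stratum sizes (two strata of four patients) are illustrative assumptions, not values from the paper.

```python
from math import comb, prod

def n_sequences_stratified_rar(stratum_sizes):
    """Number of sequences when RAR is applied independently in each stratum."""
    return prod(comb(n, n // 2) for n in stratum_sizes)

def n_sequences_unstratified_rar(total):
    """Number of sequences when RAR is applied to the whole trial."""
    return comb(total, total // 2)

sizes = [4, 4]  # hypothetical stratum sizes n_1, n_2
strat = n_sequences_stratified_rar(sizes)
unstrat = n_sequences_unstratified_rar(sum(sizes))
complete = 2 ** sum(sizes)  # complete randomization: all binary sequences

# Each sequence is equiprobable within its procedure, so the probability of a
# given sequence is the reciprocal of the respective count.
print(strat, unstrat, complete)
```

Because the counts differ, the probability of a given admissible sequence under stratified RAR is not the same as under unstratified RAR, in contrast to complete randomization.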

Another important aspect concerns the “balancing behavior” of restricted RPs. The term restricted refers to the fact that conditions on the randomization process are introduced to control the potential imbalance in the frequency of treatment allocations. Let $s$, $1 \le s \le N$, denote the patient's number preserving the order of appearance of patients in the trial, so that $s = 1$ denotes the first patient and $s = N$ the last enrolled patient. $n_{jE}(s)$ and $n_{jC}(s)$ denote the number of patients allocated to treatments E and C in stratum $j$ until a total of $s$ patients are recruited in the trial so far. Then, the imbalance in the number of allocations to treatment E and C in stratum $j$ until a total of $s$ patients are recruited is measured by

$$d_j(s) = n_{jE}(s) - n_{jC}(s) \qquad (1)$$

Three definitions of imbalance are used in the following:

  1. An RP shows overall final balance, if $d(N) \overset{\text{def}}{=} \sum_{j=1}^{K} d_j(N) = 0$

  2. An RP controls the final balance within strata, if $d_j(N) = 0$ for $1 \le j \le K$

  3. An RP controls the maximal tolerated imbalance, if $-a \le d_j(s) \le a$ for all $1 \le j \le K$, $1 \le s \le N$.

Of course, controlling the overall final balance does not imply control of the final balance within each stratum, i.e. $d_j(N) = 0$. Simply controlling the overall final balance may result in one stratum assigning patients only to E and another stratum assigning the same number of patients to C only, a case which invalidates the estimation of treatment difference within strata, presumably one issue in the ICH guidance. On the other hand, final balance within strata ($d_j(N) = 0$) implies overall final balance $d(N) = 0$. In the following, we deal with stratified RPs and derive additional restrictions for meaningful definitions. With RAR, stratification and consequently final balance within strata require even sample sizes within each stratum and two treatment arms. The requirement of final balance within strata implies in the case of the stratified block design that the block sizes are divisors of the stratum sample sizes $n_j$, $1 \le j \le K$. Note that in stratified trials with a larger number of centers, usually smaller sample sizes in centers occur and thus final balance within strata forces the block sizes to be small, which will increase the potential for selection bias. Of course, center-specific block sizes are possible but rather uncommon. We will consider common block sizes in the following.

It should also be noted that the stratified RAR procedure in general cannot be considered as an unstratified PBR with block sizes $n_j$, $1 \le j \le K$, because enrollment of patients in the trial is parallel in strata, so that in general $d\left(\sum_{j=1}^{k} n_j\right) \neq 0$, $1 \le k \le K$.

Similar problems arise when controlling the maximal tolerated imbalance with margin $a$, which results in an upper overall bound of $\sum_{j=1}^{K} |d_j(s)| \le Ka$ for all $1 \le s \le N$. Thus, controlling the maximal tolerated imbalance across strata could be accomplished by having a different imbalance level in each stratum. Two straightforward settings are the uniform spread $|d_j(s)| \le a/K$, $1 \le j \le K$, for all $1 \le s \le N$, resulting in $\sum_{j=1}^{K} |d_j(s)| \le \sum_{j=1}^{K} a/K = a$ for all $1 \le s \le N$, or the proportional spread with $|d_j(s)| \le a\,n_j/N$, $1 \le j \le K$, for all $1 \le s \le N$, resulting in $\sum_{j=1}^{K} |d_j(s)| \le \sum_{j=1}^{K} a\,n_j/N = a$ for all $1 \le s \le N$. Hilgers6 suggests defining $a$ in relation to loss of power, and this implies stratum-specific maximal tolerated imbalance according to the rules above.
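To make the role of the trial-wide patient index s concrete, the following sketch tracks the stratum imbalances along an interleaved enrollment and checks them against stratum-specific bounds obtained from the uniform and the proportional spread. All names and the toy enrollment are hypothetical; strata are indexed from 0 in the code.

```python
def imbalance_by_stratum(enrollment, K):
    """enrollment: list of (stratum j, allocation z) in order of entry s.
    Returns, for each s, the list of stratum imbalances n_E - n_C so far."""
    d = [0] * K
    path = []
    for j, z in enrollment:
        d[j] += 1 if z == 1 else -1
        path.append(list(d))
    return path

def uniform_spread(a, stratum_sizes):
    """Stratum bounds a/K."""
    K = len(stratum_sizes)
    return [a / K for _ in stratum_sizes]

def proportional_spread(a, stratum_sizes):
    """Stratum bounds a * n_j / N."""
    N = sum(stratum_sizes)
    return [a * n / N for n in stratum_sizes]

def within_bounds(path, bounds):
    """True if every stratum imbalance stays within its bound at every s."""
    return all(abs(dj) <= b for d in path for dj, b in zip(d, bounds))

# Toy trial: K = 2 strata with 4 and 2 patients, enrolled in parallel.
enrollment = [(0, 1), (1, 1), (0, 1), (1, 0), (0, 0), (0, 0)]
path = imbalance_by_stratum(enrollment, K=2)
print(path[-1])  # final imbalance per stratum
print(within_bounds(path, uniform_spread(4, [4, 2])))
print(within_bounds(path, proportional_spread(4, [4, 2])))
```

Both spreads distribute the same overall margin a, so the stratum bounds sum to a in either case; they differ only in how much imbalance a small stratum is allowed.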

3 Stratified analysis

As mentioned in Section 1, a stratified randomization requires a stratified analysis, although a stratified analysis can be performed whether or not the randomization was stratified. In this section, we examine the distributional properties of a test statistic introduced by Fleiss1 (page 268, formulas 1 and 2) based on a weighted t statistic for the analysis of stratified clinical trials. While we do not consider randomization tests in this paper, clearly randomization-based inference is an attractive alternative, see Rosenberger.2 The reason for using a parametric t test is that it facilitates our goal of determining the effect of bias on inference, since we can derive the distribution of the test statistic under various forms of bias. In particular, in this section, we derive the non-centrality parameter for the distribution of the test statistic under alternative hypotheses and comment on how it can be used for sample size considerations. In the sequel, we are interested in the role of the RP in the analysis of stratified trials. Because Fleiss wrote specifically about centers rather than strata, we use both interchangeably; it should be clear that stratification can be done on variables other than center however.

We will consider a two-arm parallel group clinical trial stratified by K centers with no interim analysis. The response to the treatments E and C, respectively, is measured with the continuous normally distributed endpoint $y_{ji}$, $1 \le i \le n_j = n_{jE} + n_{jC}$, $1 \le j \le K$, on $n_{jE}$ patients in the experimental group (E) and $n_{jC}$ patients in the control group (C) in center $j$. The total sample size is denoted by $N = \sum_{j=1}^{K} n_j$.

We write the statistical model in allocation sequence notation, assuming no treatment-by-center interaction, as

$$y_{ji} = \mu_E Z_{ji} + \mu_C (1 - Z_{ji}) + \tau_{ji} + \varepsilon_{ji} \qquad (2)$$

where $\varepsilon_{ji} \sim N(0, \sigma^2)$, $1 \le i \le n_j = n_{jE} + n_{jC}$, $1 \le j \le K$. The expected treatment effects under E and C are denoted by $\mu_E$ and $\mu_C$, respectively. $Z_{ji}$ denotes the allocation sequence indicator with $Z_{ji} = 1$ if patient $i$ in center $j$ is allocated to treatment E and $Z_{ji} = 0$ if patient $i$ in center $j$ is allocated to treatment C. Here and in what follows the notations $n_{jE} = \sum_{i=1}^{n_j} Z_{ji}$ and $n_{jC} = \sum_{i=1}^{n_j} (1 - Z_{ji})$ are used. Furthermore, $\tau_{ji}$ denotes the fixed “bias” effect acting on the response of patient $i$ in center $j$. Without loss of generality, we assume $\tau_{ji} > 0$.

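Fleiss's statistic to test the hypothesis ($H_0: \mu_E = \mu_C$) of no treatment effect across centers becomes

$$t = \frac{\sum_{j=1}^{K} w_j D_j}{s_p \sqrt{\sum_{j=1}^{K} w_j^2 / w_j^*}} = \frac{\left(\sum_{j=1}^{K} w_j D_j\right) \sigma^{-1} \left(\sqrt{\sum_{j=1}^{K} w_j^2 / w_j^*}\right)^{-1}}{s_p / \sigma} \qquad (3)$$

where $D_j = \bar{y}_{jE} - \bar{y}_{jC}$ are the mean treatment differences with $\bar{y}_{jE} = \frac{1}{n_{jE}} \sum_{i=1}^{n_j} y_{ji} Z_{ji}$ and $\bar{y}_{jC} = \frac{1}{n_{jC}} \sum_{i=1}^{n_j} y_{ji} (1 - Z_{ji})$. Furthermore, $w_j$ are weights associated with center $j$, $w_j^* = \frac{n_{jE} \times n_{jC}}{n_{jE} + n_{jC}}$ and $s_p^2$ is the pooled variance given by

$$s_p^2 = \frac{\sum_{j=1}^{K} \sum_{\ell = E,C} (n_{j\ell} - 1) s_{j\ell}^2}{\sum_{j=1}^{K} \sum_{\ell = E,C} (n_{j\ell} - 1)}$$

Here $s_{j\ell}^2$ denotes the variance of treatment group $\ell$ in center $j$. To derive the distribution of equation (3) under model (2), the distributions of the numerator as well as the denominator must be calculated. Note that the variance is given by

$$\mathrm{Var}\left(\sum_{j=1}^{K} w_j D_j\right) = \sum_{j=1}^{K} w_j^2 \left(\frac{1}{n_{jE}^2} \sum_{i=1}^{n_j} \mathrm{Var}(y_{ji}) Z_{ji} + \frac{1}{n_{jC}^2} \sum_{i=1}^{n_j} \mathrm{Var}(y_{ji}) (1 - Z_{ji})\right) = \sum_{j=1}^{K} w_j^2 \left(\frac{\sigma^2}{n_{jE}} + \frac{\sigma^2}{n_{jC}}\right) = \sigma^2 \sum_{j=1}^{K} w_j^2 \frac{n_{jE} + n_{jC}}{n_{jE} \times n_{jC}} = \sigma^2 \sum_{j=1}^{K} w_j^2 / w_j^*$$

so that the numerator on the right-hand side of equation (3) has variance 1. Of course, $D_j$ is normally distributed via the distribution of $y_{ji}$, and the expectation of the numerator equals

$$\delta(Z) = E\left(\frac{\sum_{j=1}^{K} w_j D_j}{\sigma \sqrt{\sum_{j=1}^{K} w_j^2 / w_j^*}}\right) = \left(\sigma \sqrt{\sum_{j=1}^{K} w_j^2 / w_j^*}\right)^{-1} E\left(\sum_{j=1}^{K} w_j \left(\frac{1}{n_{jE}} \sum_{i=1}^{n_j} y_{ji} Z_{ji} - \frac{1}{n_{jC}} \sum_{i=1}^{n_j} y_{ji} (1 - Z_{ji})\right)\right) = \left(\sigma \sqrt{\sum_{j=1}^{K} w_j^2 / w_j^*}\right)^{-1} \left((\mu_E - \mu_C) \sum_{j=1}^{K} w_j + \sum_{j=1}^{K} w_j (\bar{\tau}_{jE} - \bar{\tau}_{jC})\right)$$

where $Z = (Z_{11}, \ldots, Z_{1n_1}, \ldots, Z_{K1}, \ldots, Z_{Kn_K})^t$ is the observed allocation vector, $\bar{\tau}_{jE} = \frac{1}{n_{jE}} \sum_{i=1}^{n_j} \tau_{ji} Z_{ji}$ and $\bar{\tau}_{jC} = \frac{1}{n_{jC}} \sum_{i=1}^{n_j} \tau_{ji} (1 - Z_{ji})$. In summary, the numerator is normally distributed with expectation $\delta(Z)$ and variance 1.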

Next, we calculate the distribution of the denominator sp/σ using the allocation sequence notation

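$$\left(\sum_{j=1}^{K} \sum_{\ell = E,C} (n_{j\ell} - 1)\right) \frac{s_p^2}{\sigma^2} = \sum_{j=1}^{K} \sum_{\ell = E,C} (n_{j\ell} - 1) \frac{s_{j\ell}^2}{\sigma^2} = \sum_{j=1}^{K} \left((n_{jE} - 1) \frac{s_{jE}^2}{\sigma^2} + (n_{jC} - 1) \frac{s_{jC}^2}{\sigma^2}\right) = \sum_{j=1}^{K} \left(\sum_{i=1}^{n_j} Z_{ji} \left(\frac{y_{ji}}{\sigma} - \frac{\bar{y}_{jE}}{\sigma}\right)^2 + \sum_{i=1}^{n_j} (1 - Z_{ji}) \left(\frac{y_{ji}}{\sigma} - \frac{\bar{y}_{jC}}{\sigma}\right)^2\right)$$

Note that the $y_{ji}/\sigma$ are independently normally distributed for all $1 \le i \le n_j$ and $1 \le j \le K$, with $\mathrm{Var}(y_{ji}/\sigma) = 1$ and $E(y_{ji}/\sigma) = (\mu_E + \tau_{ji})/\sigma$ for $Z_{ji} = 1$ or $E(y_{ji}/\sigma) = (\mu_C + \tau_{ji})/\sigma$ for $Z_{ji} = 0$. Following the arguments in Johnson and Kotz,13 the $\sum_{i=1}^{n_j} Z_{ji} (y_{ji} - \bar{y}_{jE})^2 / \sigma^2$ for group E, i.e. $Z_{ji} = 1$, and the $\sum_{i=1}^{n_j} (1 - Z_{ji}) (y_{ji} - \bar{y}_{jC})^2 / \sigma^2$ for group C, i.e. $Z_{ji} = 0$, are $\chi^2$ distributed with $n_{jE} - 1$ and $n_{jC} - 1$ degrees of freedom, respectively, and non-centrality parameters

$$\sum_{i=1}^{n_j} \frac{Z_{ji}}{\sigma^2} \left(\mu_E + \tau_{ji} - \frac{1}{n_{jE}} \sum_{i'=1}^{n_j} Z_{ji'} (\mu_E + \tau_{ji'})\right)^2 = \sum_{i=1}^{n_j} \frac{Z_{ji}}{\sigma^2} (\tau_{ji} - \bar{\tau}_{jE})^2$$

$$\sum_{i=1}^{n_j} \frac{1 - Z_{ji}}{\sigma^2} \left(\mu_C + \tau_{ji} - \frac{1}{n_{jC}} \sum_{i'=1}^{n_j} (1 - Z_{ji'}) (\mu_C + \tau_{ji'})\right)^2 = \sum_{i=1}^{n_j} \frac{1 - Z_{ji}}{\sigma^2} (\tau_{ji} - \bar{\tau}_{jC})^2$$

Applying that the sum of independent non-central $\chi^2_{\nu_j}(\lambda_j)$ distributions is non-central $\chi^2$ with $\sum_j \nu_j$ degrees of freedom and non-centrality parameter $\sum_j \lambda_j$, it follows that the distribution of $\left(\sum_{j=1}^{K} \sum_{\ell = E,C} (n_{j\ell} - 1)\right) s_p^2 / \sigma^2$ is non-central $\chi^2$ with non-centrality parameter

$$\lambda(Z) = \frac{1}{\sigma^2} \sum_{j=1}^{K} \left(\sum_{i=1}^{n_j} Z_{ji} (\tau_{ji} - \bar{\tau}_{jE})^2 + \sum_{i=1}^{n_j} (1 - Z_{ji}) (\tau_{ji} - \bar{\tau}_{jC})^2\right) = \frac{1}{\sigma^2} \left(\sum_{j=1}^{K} \sum_{i=1}^{n_j} \tau_{ji}^2 - \sum_{j=1}^{K} n_{jE} \bar{\tau}_{jE}^2 - \sum_{j=1}^{K} n_{jC} \bar{\tau}_{jC}^2\right)$$

and

$$df = \sum_{j=1}^{K} \sum_{\ell = E,C} (n_{j\ell} - 1) = N - 2K \qquad (4)$$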

degrees of freedom. Finally, we have to show the independence of the numerator

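$$\sum_{j=1}^{K} w_j D_j = \sum_{j=1}^{K} w_j (\bar{y}_{jE} - \bar{y}_{jC}) = \sum_{j=1}^{K} w_j \left(\frac{1}{n_{jE}} \sum_{i=1}^{n_j} y_{ji} Z_{ji} - \frac{1}{n_{jC}} \sum_{i=1}^{n_j} y_{ji} (1 - Z_{ji})\right)$$

and denominator

$$\left(\sum_{j=1}^{K} \sum_{\ell = E,C} (n_{j\ell} - 1)\right) s_p^2 = \sum_{j=1}^{K} \left(\sum_{i=1}^{n_j} Z_{ji} (y_{ji} - \bar{y}_{jE})^2 + \sum_{i=1}^{n_j} (1 - Z_{ji}) (y_{ji} - \bar{y}_{jC})^2\right)$$

as random variables. Here, Theorem 3 of Searle14 is used, stating that two random variables that can be expressed as $x^t A x$ and $Bx$, where $x \sim N(\mu, V)$, are independent if $BVA = 0$. First, note that $V = \sigma^2 I$ holds in our case.

To enable matrix notation for the above expressions, a usual design matrix X can be defined which includes two columns for the allocation indicator variables and N rows. The observations are rearranged by center and treatment group, so that in center 1 the first $n_{1E}$ observations belong to treatment E and the following $n_{1C}$ observations belong to C, and so on. This reshuffling of the allocation sequence $Z = (Z_{11}, \ldots, Z_{Kn_K})^t$ can be implemented by a suitable permutation matrix P and simplifies the matrix notation of the above numerator and denominator in terms of B and A. The permutation matrix does not affect the matrix equation. Furthermore, note that it is sufficient to show the matrix equation for a particular center $j$ because of the block structure implied by the independent observations in different centers. With this reshuffling, the notation for center $j$ corresponding to Theorem 3 is

$$B_j = w_j \left(\frac{1}{n_{jE}} \mathbf{1}_{n_{jE}}^t, \; -\frac{1}{n_{jC}} \mathbf{1}_{n_{jC}}^t\right)^t$$

and with $H_{n_{j\ell}} = I_{n_{j\ell}} - \frac{1}{n_{j\ell}} \mathbf{1}_{n_{j\ell} \times n_{j\ell}}$ the matrix

$$A_j = (H_{n_{jE}}, H_{n_{jC}}) = \left(I_{n_{jE}} - \frac{1}{n_{jE}} \mathbf{1}_{n_{jE} \times n_{jE}}, \; I_{n_{jC}} - \frac{1}{n_{jC}} \mathbf{1}_{n_{jC} \times n_{jC}}\right)$$

so that $\sigma^2 B_j I_{n_{jE} + n_{jC}} A_j = 0$ for all $1 \le j \le K$, which shows the independence. In summary, the distribution of the statistic in equation (3) is doubly non-central t,13 with non-centrality parameters

$$\delta(Z) = \left(\sigma \sqrt{\sum_{j=1}^{K} w_j^2 / w_j^*}\right)^{-1} \left((\mu_E - \mu_C) \sum_{j=1}^{K} w_j + \sum_{j=1}^{K} w_j (\bar{\tau}_{jE} - \bar{\tau}_{jC})\right), \qquad \lambda(Z) = \frac{1}{\sigma^2} \left(\sum_{j=1}^{K} \sum_{i=1}^{n_j} \tau_{ji}^2 - \sum_{j=1}^{K} n_{jE} \bar{\tau}_{jE}^2 - \sum_{j=1}^{K} n_{jC} \bar{\tau}_{jC}^2\right) \qquad (5)$$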
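The two non-centrality parameters of the doubly non-central t distribution can be computed directly from an allocation sequence and a bias vector. The sketch below is one possible reading of the formulas for δ(Z) and λ(Z), not the authors' R code; the function name `noncentrality` and its argument layout (one list of allocations and one list of bias terms per center) are assumptions, and every center is assumed to contain at least one patient per arm.

```python
from math import sqrt

def noncentrality(z, tau, mu_e, mu_c, sigma, weights=None):
    """Return (delta, lam) for allocations z and bias terms tau.

    z, tau: lists with one entry per center; each entry lists the patients'
    allocations (1 = E, 0 = C) and bias terms in enrollment order.
    weights: center weights w_j; None selects w_j = w_j^* = n_E*n_C/(n_E+n_C).
    """
    sum_w = 0.0      # sum_j w_j
    sum_w_tau = 0.0  # sum_j w_j * (mean bias in E - mean bias in C)
    sum_w2 = 0.0     # sum_j w_j^2 / w_j^*
    lam = 0.0
    for j, (zj, tj) in enumerate(zip(z, tau)):
        n_e = sum(zj)
        n_c = len(zj) - n_e
        w_star = n_e * n_c / (n_e + n_c)
        w = w_star if weights is None else weights[j]
        tbar_e = sum(t for t, a in zip(tj, zj) if a == 1) / n_e
        tbar_c = sum(t for t, a in zip(tj, zj) if a == 0) / n_c
        sum_w += w
        sum_w_tau += w * (tbar_e - tbar_c)
        sum_w2 += w * w / w_star
        lam += sum(t * t for t in tj) - n_e * tbar_e ** 2 - n_c * tbar_c ** 2
    delta = ((mu_e - mu_c) * sum_w + sum_w_tau) / (sigma * sqrt(sum_w2))
    return delta, lam / sigma ** 2

# Two centers with four patients each, no bias, unit treatment difference.
z = [[1, 0, 1, 0], [0, 1, 1, 0]]
no_bias = [[0.0] * 4, [0.0] * 4]
print(noncentrality(z, no_bias, mu_e=1.0, mu_c=0.0, sigma=1.0))
```

A bias that is constant within a center cancels in both parameters, which is why only within-center variation of the bias terms can distort the test.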

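In the case sampling is “stratified” by center and the objective is to estimate the overall treatment effect accounting for center, Fleiss1 proposed the weights $w_j = w_j^* = \frac{n_{jE} \times n_{jC}}{n_{jE} + n_{jC}}$ resulting in the test statistic (3)

$$t = \frac{\sum_{j=1}^{K} w_j^* D_j}{s_p \sqrt{\sum_{j=1}^{K} w_j^*}} = \frac{\sum_{j=1}^{K} \frac{n_{jE} \times n_{jC}}{n_{jE} + n_{jC}} D_j}{s_p \sqrt{\sum_{j=1}^{K} \frac{n_{jE} \times n_{jC}}{n_{jE} + n_{jC}}}} \qquad (6)$$

Of course, equation (5) implies that $\delta(Z)$ depends on the weights only and becomes

$$\delta(z) = \left(\sigma \sqrt{\sum_{j=1}^{K} w_j^*}\right)^{-1} \left((\mu_E - \mu_C) \sum_{j=1}^{K} w_j^* + \sum_{j=1}^{K} w_j^* (\bar{\tau}_{jE} - \bar{\tau}_{jC})\right) \qquad (7)$$

In the case sampling is “stratified” by center and the objective is to estimate the overall treatment effect, Fleiss1 proposed the weights $w_j = 1$ so that equation (3) becomes

$$t = \frac{\sum_{j=1}^{K} D_j}{s_p \sqrt{\sum_{j=1}^{K} 1 / w_j^*}}$$

whereas the first non-centrality parameter $\delta(z)$ equals

$$\delta(z) = \left(\sigma \sqrt{\sum_{j=1}^{K} 1 / w_j^*}\right)^{-1} \left(K \cdot (\mu_E - \mu_C) + \sum_{j=1}^{K} (\bar{\tau}_{jE} - \bar{\tau}_{jC})\right)$$

Weighting centers in the absence and presence of center-by-treatment interaction has been discussed in detail by other authors.15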

3.1 Sample size considerations

We now briefly discuss the aspects of the sample size and power calculation using the weighted t test statistic. Details can be found in the Supplementary Material Section S1. The results will be used in our numerical evaluation study.

Assuming no bias, $\tau_{ji} = 0$, in model (2), the sample size to prove the hypothesis $H_0: \mu_E = \mu_C$ vs. $H_1: \mu_E - \mu_C = \Delta$ with the weighted t test (equation (3)) is given by

$$\frac{\left(\sum_{j=1}^{K} w_j\right)^2}{\sum_{j=1}^{K} w_j^2 / w_j^*} = \frac{\sigma^2}{\Delta^2} \left(t_{N-2K}(1-\beta) + t_{N-2K}(1-\alpha/2)\right)^2 \qquad (8)$$

The derivation assumes homogeneous variances in all groups and centers. Using the optimal weights of Fleiss,1 i.e. $w_j = w_j^* = \frac{n_{jE} \times n_{jC}}{n_{jE} + n_{jC}}$, equation (8) simplifies to

$$\sum_{j=1}^{K} \frac{n_{jE} \times n_{jC}}{n_{jE} + n_{jC}} = \frac{\sigma^2}{\Delta^2} \left(t_{N-2K}(1-\beta) + t_{N-2K}(1-\alpha/2)\right)^2$$

which in case of an allocation ratio common to all centers, $r \cdot n_j = n_{jE}$ and $(1-r) \cdot n_j = n_{jC}$ with $0 \le r \le 1$ for all $1 \le j \le K$, becomes

$$r(1-r)N = \frac{\sigma^2}{\Delta^2} \left(t_{N-2K}(1-\beta) + t_{N-2K}(1-\alpha/2)\right)^2 \qquad (9)$$

This formula, derived under the assumption of homogeneous variances using the optimal weights and the allocation ratio $r$, can be evaluated from various perspectives. One can determine the sample size necessary to detect a certain treatment effect of a clinical trial or determine the power for various settings of the allocation ratio. Of course, the relationship of the sample size to the RP is obvious in the case of RPs forcing terminal balance. The power can also be related to RPs with the maximal tolerated imbalance margin $a$. The margin can be justified on the basis of the tolerable loss in power resulting from unbalanced allocation. In this case, equation (9) can be used to describe the relationship between $r$ and the power. Both aspects are mentioned in the numerical evaluation study below. Using the weights $w_j = 1$, the left-hand side of equation (8) yields $r(1-r) K^2 / \sum_{j=1}^{K} \frac{1}{n_j}$ and thus depends on the center sample sizes. In the case of equal center sample sizes, the same formula can be used for the unweighted test. In contrast to the weighted test, the sample size formula for the unweighted case requires assumptions if unbalanced sample sizes across centers are assumed.

4 Stratification in the presence of bias

We now turn to the question of bias. Two common forms of bias encountered in clinical trials are chronological bias due to time trends in patient outcomes,16 and selection bias, which can result in covariate imbalances and inflation of type I error rates.3 By definition, selection bias arises from the conscious or unconscious guessing of treatment assignments so that patients have a higher chance of assignment to the investigator's treatment of choice for those patients. While double-blinded studies, and multi-center studies with a central randomization unit mitigate the possibility of selection bias, Berger17 gives numerous examples of when selection bias has arisen in practice. As the ICH E9 Guidelines note,12

It is important to identify potential sources of bias as completely as possible so that attempts to limit such bias may be made…. The treatment effect and treatment comparisons should involve consideration of the potential contribution of bias to the p-value.

A recent paper provides a template on assessing the potential for chronological or selection bias and gives guidance on how to choose an appropriate RP and test statistic to account for that possibility.6 Here, we use a similar model to determine the impact on Fleiss's test in the presence of such bias.

We first specify a compound bias vector τji for stratum j and patient i that is a linear combination of a metric of chronological bias and selection bias. Taking into account the stratified randomization, we explore a linear time-trend16 model per stratum similar to Hilgers6 given by

$$\tau_{ji} = \underbrace{\theta_j \frac{i}{n_{jE} + n_{jC}}}_{\text{linear time trend}} + \underbrace{\eta_j \frac{n_{jE}(i-1) - n_{jC}(i-1)}{n_{jE}(i-1) + n_{jC}(i-1)}}_{\text{selection bias}} \qquad (10)$$

Hereby, the magnitude $\theta_j$ of the linear time trend varies between centers. Note that Hilgers6 proposed to formulate $\theta_j$ as a fraction of the variance $\sigma^2$. The second term generalizes the biasing policy first introduced by Proschan3 for the Gauss test and later investigated by Hilgers6 for the t test. The amount of selection bias $\eta_j \ge 0$ is allowed to vary between centers. The biasing policy in equation (10) “favors” treatment E, i.e. biases the expected response towards E, if the treatment allocated less frequently so far is E, on the assumption that E will be allocated next. The direction $\eta_j \ge 0$ corresponds to favoring E. Other metrics have been used to define the selection bias metric, including just the sign of $n_{jE}(i-1) - n_{jC}(i-1)$. We chose our metric so that it is roughly on the same scale as the chronological bias metric.
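For a single center, the bias vector of equation (10) can be computed along the allocation sequence as sketched below. The function name `bias_vector` is hypothetical, the sign of the selection bias term follows equation (10) as printed, and setting the selection bias term to zero for the first patient (where no previous allocations exist and the ratio is undefined) is an assumption of this sketch.

```python
def bias_vector(zj, theta, eta):
    """Bias terms tau_j1, ..., tau_jn for one center.

    zj: allocations in enrollment order (1 = E, 0 = C)
    theta: magnitude of the linear time trend in this center
    eta: magnitude of the selection bias in this center
    """
    n = len(zj)
    n_e = n_c = 0  # allocations to E and C among the previous i - 1 patients
    tau = []
    for i, z in enumerate(zj, start=1):
        trend = theta * i / n
        seen = n_e + n_c
        # Assumed convention: no selection bias term for the first patient.
        selection = eta * (n_e - n_c) / seen if seen > 0 else 0.0
        tau.append(trend + selection)
        if z == 1:
            n_e += 1
        else:
            n_c += 1
    return tau

print(bias_vector([1, 1, 0, 0], theta=0.0, eta=0.2))  # selection bias only
print(bias_vector([1, 1, 0, 0], theta=0.4, eta=0.0))  # time trend only
```

The time-trend component grows linearly to θ for the last patient of the center, while the selection bias component is bounded by η in absolute value, which reflects the remark that both metrics are on roughly the same scale.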

5 Evaluation criterion

In our numerical evaluation study, we enumerate all possible randomization sequences for four different procedures and assess the impact of bias on the type I error rate by computing the proportion of sequences that preserve the type I error rate at the nominal (0.05) level. If there is no bias (i.e. $\eta_j = \theta_j = 0$), 100% of sequences will preserve the type I error rate, regardless of the procedure used. To be more formal, denote the bias vector by $\tau = (\tau_{11}, \ldots, \tau_{1n_1}, \ldots, \tau_{K1}, \ldots, \tau_{Kn_K})$ and the set of all sequences $z$ generated by the RP by $\Omega_{RP}$. The test statistic $t(z)$, which depends on the randomization sequence, is central t distributed with $N-2K$ degrees of freedom under the null hypothesis and no bias ($\tau = 0$), i.e. the null hypothesis $H_0$ will be rejected at the $\alpha$ level if $|t(z)| \ge t_{N-2K}(1-\alpha/2)$. Then, the evaluation criterion can be expressed by using our distributional result above including the non-centrality parameter (7)

$$P_{RP,\tau}(H_1 \mid H_0) = P_{RP,\tau}\left(Z \in \{0,1\}^N : \left|t_{N-2K,\delta(Z),\lambda(Z)}\left(1-\tfrac{\alpha}{2}\right)\right| \le \left|t_{N-2K}\left(1-\tfrac{\alpha}{2}\right)\right|\right) = \sum_{z \in \Omega_{RP}} \mathbf{1}\left\{F_{N-2K,\delta(z),\lambda(z)}\left(t_{N-2K}\left(\tfrac{\alpha}{2}\right)\right) + F_{N-2K,-\delta(z),\lambda(z)}\left(t_{N-2K}\left(\tfrac{\alpha}{2}\right)\right) \le \alpha\right\} P_{RP,\tau}(Z = z) \qquad (11)$$

where $F_{N-2K,\delta,\lambda}$ denotes the distribution function of the doubly non-central t-distribution with $N-2K$ degrees of freedom and non-centrality parameters $\delta(Z)$ and $\lambda(Z)$. In the ideal case, the probability should be 1, meaning that the 5% level is maintained by all allocation sequences. A value below 1 indicates that, for some allocation sequences, the actual type I error rate is higher than the target level of 5%. Note that this quantity summarizes the impact of bias over all randomization sequences and demonstrates the clinical consequences as well as the “go/no-go” decision of the regulator directly.

6 Numerical evaluation study

The objective of the following numerical evaluation study is to illustrate effects of stratification in both the randomization and the test statistic. It is not intended to conduct a comprehensive simulation study, recognizing that the specification of the sample size as well as $\theta_j$ and $\eta_j$ depends on the practical situation. To be more specific, we start with a K = 2 center clinical trial and use a total sample size of 80 patients with common $\theta_j$ and $\eta_j$ in all centers. The following reasoning leads to the specification of $\theta_j$ and $\eta_j$. Concerning the linear time trend $\theta_j$, it should be noted that although the $\theta_j$ are defined within each center, the maximal extent of the time trend should not exceed $\sigma$. In contrast, although the magnitude of the selection bias effect $\eta_j$ may vary between centers, it is like a population effect within center and no maximal extent restriction may apply. To relate the total sample size of 80 in a K = 2 center clinical trial to the effect size, formula (9) is used. The hypothesis $H_0: \mu_E = \mu_C$ vs. $H_1: \mu_E - \mu_C = \Delta$ should be tested with the (optimal) weighted t test (equation (3)) assuming common variance $\sigma = 1$ and intended allocation ratio of 1:1 at the 5% significance level with a power of 80%. This results in a uniform effect size of $\Delta = 0.635$. With this effect size, the allocation ratio $r$ is varied so that the loss in power does not exceed 2%. This yields an allocation ratio of $r = 0.608$ which translates to sample size of 31:49 corresponding to a maximal tolerable imbalance of 18. With the uniform or proportional spread, this results in a maximal tolerated imbalance by center of 4 and 5, respectively.
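The effect size quoted above can be reproduced from formula (9) with a short standard-library sketch. This is not the authors' R script: the helper names are hypothetical, the Student t distribution function is obtained by plain Simpson integration with quantiles found by bisection (an implementation choice made here only to avoid external dependencies), and the power is computed from formula (9) solved for 1 − β, which ignores the negligible lower rejection tail.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(x, nu):
    """Density of the central Student t distribution with nu degrees of freedom."""
    log_c = lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * log(nu * pi)
    return exp(log_c - (nu + 1) / 2 * log(1 + x * x / nu))

def t_cdf(x, nu, steps=1000):
    """P(T <= x) via composite Simpson integration of the density from 0 to x."""
    h = x / steps
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, nu)
    return 0.5 + total * h / 3

def t_quantile(p, nu):
    """Quantile of the central t distribution by bisection."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def effect_size(N, K, r, alpha, power, sigma=1.0):
    """Solve formula (9) for Delta."""
    df = N - 2 * K
    q = t_quantile(power, df) + t_quantile(1 - alpha / 2, df)
    return sigma * q / sqrt(r * (1 - r) * N)

def power_at(N, K, r, alpha, delta, sigma=1.0):
    """Solve formula (9) for the power 1 - beta."""
    df = N - 2 * K
    shift = delta * sqrt(r * (1 - r) * N) / sigma
    return t_cdf(shift - t_quantile(1 - alpha / 2, df), df)

delta = effect_size(N=80, K=2, r=0.5, alpha=0.05, power=0.80)
print(round(delta, 3))
print(round(power_at(N=80, K=2, r=0.608, alpha=0.05, delta=delta), 3))
```

The second printed value gives the power that remains at r = 0.608 for the same effect size, which can be compared with the tolerated loss of 2% mentioned in the text.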

For illustration purposes, we will compare the stratified and unstratified versions of CR, BSD(9), PBR(4), and EBC(2/3). These four procedures represent complete randomization and the three types of restricted randomization mentioned earlier. These procedures were evaluated for two different splits of the total sample size ($n_1 = n_2 = 40$ and $n_1 = 60$, $n_2 = 20$) and the combinations of selection and time-trend bias $(\eta, \theta) = (0, 0.2), (0.2, 0), (0.2, 0.2), (0, 0.05), (0.05, 0), (0.05, 0.05)$. The evaluation criterion was the number of sequences protecting the 5% level for stratified and unstratified randomization as well as stratified ($w_j = 1$, $w_j^*$) and unstratified (us) test statistic and RP (see Supplementary Material). Note that unstratified randomization and test statistic correspond to the case presented in Hilgers.6 The results for (0, 0.05), (0.05, 0), (0.05, 0.05) are given in Table 1 and those for (0, 0.2), (0.2, 0), (0.2, 0.2) in Table 2. In an additional evaluation, the number of centers K is increased from 2 to 8, splitting the total sample uniformly across the centers, to show whether there is a different influence on the type I error rate. We used an R software script to conduct the analysis, see Supplementary Material.

Table 1.

Probability of stratified and unstratified randomization procedures to keep the 5% level for BSD(9), CR, EBC(0.67) and PBR(4) depending on the amount of selection η=0,0.05 and time-trend bias Θ=0,0.05 for different allocation ratios and analysis using weighted (wj*), unweighted (wj = 1) and unstratified (us) t test.

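| Allocation ratio | Θ | η | Randomization procedure | Stratified randomization: wj* test | Stratified randomization: wj = 1 test | Stratified randomization: us-test | Unstratified randomization: wj* test | Unstratified randomization: wj = 1 test | Unstratified randomization: us-test |
|---|---|---|---|---|---|---|---|---|---|
| 20:60 | 0.05 | 0 | BSD (9) | 0.58 | 0.27 | 0.71 | 0.58 | 0.27 | 0.67 |
| | | | CR | 0.58 | 0.28 | 0.67 | 0.58 | 0.28 | 0.68 |
| | | | EBC (0.67) | 0.85 | 0.41 | 0.95 | 0.76 | 0.41 | 0.96 |
| | | | PBR (4) | 1.00 | 0.93 | 1.00 | 1.00 | 0.93 | 1.00 |
| | 0 | 0.05 | BSD (9) | 0.35 | 0.12 | 0.47 | 0.35 | 0.12 | 0.47 |
| | | | CR | 0.34 | 0.11 | 0.46 | 0.34 | 0.11 | 0.47 |
| | | | EBC (0.67) | 0.11 | 0.04 | 0.19 | 0.17 | 0.04 | 0.19 |
| | | | PBR (4) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 0.05 | 0.05 | BSD (9) | 0.43 | 0.17 | 0.66 | 0.41 | 0.17 | 0.63 |
| | | | CR | 0.42 | 0.17 | 0.63 | 0.42 | 0.17 | 0.63 |
| | | | EBC (0.67) | 0.22 | 0.08 | 0.84 | 0.30 | 0.08 | 0.86 |
| | | | PBR (4) | 0.03 | 0.00 | 1.00 | 0.03 | 0.00 | 1.00 |
| 40:40 | 0.05 | 0 | BSD (9) | 0.57 | 0.31 | 0.76 | 0.59 | 0.31 | 0.67 |
| | | | CR | 0.59 | 0.32 | 0.68 | 0.59 | 0.32 | 0.68 |
| | | | EBC (0.67) | 0.82 | 0.52 | 0.97 | 0.74 | 0.52 | 0.96 |
| | | | PBR (4) | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| | 0 | 0.05 | BSD (9) | 0.35 | 0.16 | 0.47 | 0.34 | 0.16 | 0.47 |
| | | | CR | 0.34 | 0.16 | 0.47 | 0.36 | 0.16 | 0.47 |
| | | | EBC (0.67) | 0.11 | 0.04 | 0.20 | 0.15 | 0.04 | 0.20 |
| | | | PBR (4) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 0.05 | 0.05 | BSD (9) | 0.42 | 0.23 | 0.69 | 0.43 | 0.23 | 0.62 |
| | | | CR | 0.43 | 0.23 | 0.63 | 0.42 | 0.23 | 0.63 |
| | | | EBC (0.67) | 0.22 | 0.10 | 0.88 | 0.29 | 0.10 | 0.87 |
| | | | PBR (4) | 0.03 | 0.00 | 1.00 | 0.02 | 0.00 | 1.00 |
| 8 × 10 | 0.05 | 0 | BSD (2) | 0.79 | 0.13 | 0.98 | 0.68 | 0.13 | 0.66 |
| | | | CR | 0.69 | 0.10 | 0.68 | 0.68 | 0.10 | 0.68 |
| | | | EBC (0.67) | 0.78 | 0.12 | 0.91 | 0.71 | 0.12 | 0.96 |
| | | | PBR (2) | 1.00 | 0.36 | 1.00 | 0.81 | 0.36 | 1.00 |
| | 0 | 0.05 | BSD (2) | 0.00 | 0.00 | 0.15 | 0.04 | 0.00 | 0.48 |
| | | | CR | 0.05 | 0.00 | 0.48 | 0.05 | 0.00 | 0.47 |
| | | | EBC (0.67) | 0.01 | 0.00 | 0.21 | 0.02 | 0.00 | 0.19 |
| | | | PBR (2) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 0.05 | 0.05 | BSD (2) | 0.00 | 0.00 | 0.86 | 0.05 | 0.00 | 0.63 |
| | | | CR | 0.05 | 0.00 | 0.62 | 0.05 | 0.00 | 0.63 |
| | | | EBC (0.67) | 0.01 | 0.00 | 0.79 | 0.02 | 0.00 | 0.86 |
| | | | PBR (2) | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 1.00 |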

BSD: big stick design; EBC: Efron's biased coin design; PBR: permuted block randomization; CR: complete randomization.

Table 2.

Probability of stratified and unstratified randomization procedures to keep the 5% level for BSD(9), CR, EBC(0.67) and PBR(4) depending on the amount of selection η=0,0.2 and time-trend bias Θ=0,0.2 for different allocation ratios and analysis using weighted (wj*), unweighted (wj = 1) and unstratified (us) t test.

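| Allocation ratio | Θ | η | Randomization procedure | Stratified randomization: wj* test | Stratified randomization: wj = 1 test | Stratified randomization: us-test | Unstratified randomization: wj* test | Unstratified randomization: wj = 1 test | Unstratified randomization: us-test |
|---|---|---|---|---|---|---|---|---|---|
| 20:60 | 0.2 | 0 | BSD (9) | 0.65 | 0.31 | 0.71 | 0.67 | 0.31 | 0.67 |
| | | | CR | 0.66 | 0.32 | 0.68 | 0.66 | 0.32 | 0.68 |
| | | | EBC (0.67) | 0.90 | 0.47 | 0.95 | 0.84 | 0.47 | 0.96 |
| | | | PBR (4) | 1.00 | 0.97 | 1.00 | 1.00 | 0.97 | 1.00 |
| | 0 | 0.2 | BSD (9) | 0.45 | 0.15 | 0.54 | 0.43 | 0.15 | 0.53 |
| | | | CR | 0.45 | 0.16 | 0.54 | 0.44 | 0.16 | 0.55 |
| | | | EBC (0.67) | 0.15 | 0.05 | 0.23 | 0.23 | 0.05 | 0.24 |
| | | | PBR (4) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 0.2 | 0.2 | BSD (9) | 0.48 | 0.18 | 0.62 | 0.48 | 0.18 | 0.61 |
| | | | CR | 0.48 | 0.18 | 0.62 | 0.48 | 0.18 | 0.60 |
| | | | EBC (0.67) | 0.28 | 0.09 | 0.83 | 0.34 | 0.09 | 0.84 |
| | | | PBR (4) | 0.03 | 0.00 | 1.00 | 0.03 | 0.00 | 1.00 |
| 40:40 | 0.2 | 0 | BSD (9) | 0.66 | 0.36 | 0.75 | 0.66 | 0.36 | 0.65 |
| | | | CR | 0.67 | 0.36 | 0.67 | 0.67 | 0.36 | 0.67 |
| | | | EBC (0.67) | 0.89 | 0.60 | 0.97 | 0.82 | 0.60 | 0.96 |
| | | | PBR (4) | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| | 0 | 0.2 | BSD (9) | 0.45 | 0.22 | 0.53 | 0.45 | 0.22 | 0.54 |
| | | | CR | 0.44 | 0.21 | 0.54 | 0.45 | 0.21 | 0.54 |
| | | | EBC (0.67) | 0.15 | 0.05 | 0.24 | 0.22 | 0.05 | 0.25 |
| | | | PBR (4) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 0.2 | 0.2 | BSD (9) | 0.48 | 0.25 | 0.67 | 0.48 | 0.25 | 0.61 |
| | | | CR | 0.48 | 0.25 | 0.62 | 0.47 | 0.25 | 0.61 |
| | | | EBC (0.67) | 0.27 | 0.11 | 0.86 | 0.33 | 0.11 | 0.84 |
| | | | PBR (4) | 0.01 | 0.00 | 1.00 | 0.01 | 0.00 | 1.00 |
| 8 × 10 | 0.2 | 0 | BSD (2) | 0.72 | 0.11 | 0.97 | 0.61 | 0.11 | 0.66 |
| | | | CR | 0.61 | 0.08 | 0.68 | 0.62 | 0.08 | 0.68 |
| | | | EBC (0.67) | 0.70 | 0.10 | 0.91 | 0.64 | 0.10 | 0.96 |
| | | | PBR (2) | 1.00 | 0.37 | 1.00 | 0.72 | 0.37 | 1.00 |
| | 0 | 0.2 | BSD (2) | 0.00 | 0.00 | 0.25 | 0.03 | 0.00 | 0.54 |
| | | | CR | 0.03 | 0.00 | 0.54 | 0.03 | 0.00 | 0.54 |
| | | | EBC (0.67) | 0.00 | 0.00 | 0.26 | 0.01 | 0.00 | 0.24 |
| | | | PBR (2) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 0.2 | 0.2 | BSD (2) | 0.00 | 0.00 | 0.82 | 0.04 | 0.00 | 0.60 |
| | | | CR | 0.04 | 0.00 | 0.61 | 0.03 | 0.00 | 0.61 |
| | | | EBC (0.67) | 0.00 | 0.00 | 0.77 | 0.01 | 0.00 | 0.85 |
| | | | PBR (2) | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 1.00 |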

BSD: big stick design; EBC: Efron's biased coin design; PBR: permuted block randomization; CR: complete randomization.

When both biases are present, stratified randomization followed by stratified analysis performs worse than the scenarios with unstratified analysis. The magnitude of this effect does not depend on whether sample sizes are balanced between centers (2060 vs. 4040; Table 1). With the favored weighted test statistic in a stratified analysis, BSD and CR perform much better than all other RPs when both biases are present. However, the effect depends markedly on the type of bias. When only a time trend is present in the data, the final balance procedures (EBC(0.67), PBR) perform better than BSD or CR; the same holds for the unstratified analysis following unstratified randomization. Weighting with wj* performs uniformly better than weighting with wj = 1.

7 Discussion

The approach presented in this paper for multi-center trials follows the ideas of the evaluation of randomization procedures for design optimization (ERDO) framework.6 However, as outlined, many aspects need to be addressed to demonstrate the contribution of randomization to mitigating bias during the planning phase of a multi-center trial.

Although Kraemer18 discussed various RPs in clinical trials including stratification, the most common choice of stratified randomization is PBR with a common block size.19–22 We have presented new aspects of formulating RPs, whether unrestricted or restricted, so as to induce final balance or a maximal tolerated imbalance, including PBR in a stratified form. We have discussed the formulation of stratified unrestricted procedures and of stratified restricted procedures, the latter in three subclassifications: forcing balance in probability, forcing balance by maximal tolerable imbalance, and forcing terminal balance.
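The four procedures compared in the tables can be sketched in a few lines to make the stratified formulation concrete. The following Python sketch is illustrative only and is not the supplemental R code: the function names are hypothetical, the BSD parameter is assumed to be the maximal tolerated imbalance, the EBC parameter is assumed to be the probability of assigning the under-represented arm, and stratification is taken to mean that the procedure is run independently within each center.

```python
import random

def complete_randomization(n, rng):
    # CR: a fair coin toss for every patient, no balance restriction.
    return [rng.choice("EC") for _ in range(n)]

def permuted_block(n, rng, block_size):
    # PBR: shuffled blocks with equal numbers of E and C (terminal balance per block).
    seq = []
    while len(seq) < n:
        block = list("E" * (block_size // 2) + "C" * (block_size // 2))
        rng.shuffle(block)
        seq.extend(block)
    return seq[:n]  # a last incomplete block is truncated (assumption)

def efron_biased_coin(n, rng, p):
    # EBC: fair coin when balanced, otherwise favor the under-represented arm with probability p.
    seq, diff = [], 0  # diff = N_E - N_C so far
    for _ in range(n):
        if diff == 0:
            arm = rng.choice("EC")
        else:
            under, over = ("E", "C") if diff < 0 else ("C", "E")
            arm = under if rng.random() < p else over
        diff += 1 if arm == "E" else -1
        seq.append(arm)
    return seq

def big_stick(n, rng, mti):
    # BSD: fair coin until the imbalance reaches mti, then force the under-represented arm.
    seq, diff = [], 0
    for _ in range(n):
        if diff >= mti:
            arm = "C"
        elif diff <= -mti:
            arm = "E"
        else:
            arm = rng.choice("EC")
        diff += 1 if arm == "E" else -1
        seq.append(arm)
    return seq

def stratified(procedure, center_sizes, seed=0, **params):
    # Stratified version: one independent sequence per center.
    rng = random.Random(seed)
    return [procedure(n, rng, **params) for n in center_sizes]

if __name__ == "__main__":
    for seq in stratified(big_stick, [10, 10], seed=1, mti=2):
        print("".join(seq))
```

Calling `stratified(permuted_block, [20, 60], block_size=4)` would, under these assumptions, produce one sequence of length 20 and one of length 60, each balanced after every complete block.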

There are several limitations of this study. First, our compound criterion for selection bias and chronological bias imposes similar scaling, but it is difficult or impossible to scale them identically. Second, the weighting of the two criteria is subjective and may be adjusted to account for the different scaling. Although our statistical test assumes homogeneous variances across centers, the methodology can be used with standardized observations in the case of known heterogeneous variances across centers.

Our proposed approach is demonstrated in a numerical evaluation study. Here, we use very specific settings, e.g. common selection bias and time-trend effects across centers and limited sample sizes corresponding to a particular effect size. We are aware that this evaluation study does not mirror all practical situations. However, the specific practical situation of the multi-center clinical trial to be planned can easily be embedded into the evaluation study to demonstrate the corresponding effects. Moreover, the corresponding results for other evaluation metrics, e.g. mean type I error probability, are provided in supplemental tables. The R code used for all computations is provided as supplemental material.

We have chosen to use a parametric t test as our evaluation statistic rather than the more natural randomization test.23 Randomization tests can be computed easily through the Monte Carlo re-randomization methodology, although power considerations are computationally intensive. They tend to preserve type I error rates under time trends and have no distributional assumptions.2 Randomization tests can be formulated easily incorporating stratification, but the theoretical results we have derived herein would be impossible for exact randomization tests or Monte Carlo re-randomization tests.
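To illustrate what a Monte Carlo re-randomization test stratified by center involves, the sketch below re-draws the treatment labels within each center and compares the re-randomized statistics with the observed one. It is a minimal Python illustration under stated assumptions, not the method evaluated in this paper: the function names are hypothetical, the test statistic is a center-size-weighted difference in arm means, and re-randomization is done by permuting labels within centers (which corresponds to the random allocation rule rather than to any of the RPs studied here). In practice, the sequences would be re-drawn from the procedure actually used in the trial.

```python
import random
from statistics import mean

def stratified_difference(y, t, strata):
    # Center-size-weighted sum of within-center differences in arm means (E minus C).
    n = len(y)
    total = 0.0
    for idx in strata:
        e = [y[i] for i in idx if t[i] == 1]
        c = [y[i] for i in idx if t[i] == 0]
        if e and c:  # skip centers in which one arm is empty
            total += len(idx) / n * (mean(e) - mean(c))
    return total

def rerandomization_p_value(y, t, center, n_draws=2000, seed=1):
    # y: outcomes, t: 1 = experimental / 0 = control, center: center label per patient.
    rng = random.Random(seed)
    strata = [[i for i in range(len(y)) if center[i] == c]
              for c in sorted(set(center))]
    observed = abs(stratified_difference(y, t, strata))
    hits = 0
    for _ in range(n_draws):
        t_new = list(t)
        for idx in strata:
            labels = [t[i] for i in idx]
            rng.shuffle(labels)  # re-randomize within the center only
            for i, lab in zip(idx, labels):
                t_new[i] = lab
        if abs(stratified_difference(y, t_new, strata)) >= observed - 1e-12:
            hits += 1
    # Add-one correction so that the estimated p-value is never exactly zero.
    return (hits + 1) / (n_draws + 1)

if __name__ == "__main__":
    center = [1] * 8 + [2] * 8
    t = [1, 1, 1, 1, 0, 0, 0, 0] * 2
    y = [10.0, 11.0, 12.0, 13.0, 0.0, 1.0, 2.0, 3.0] * 2
    print(rerandomization_p_value(y, t, center, n_draws=500, seed=7))
```

Because the reference distribution is generated by simulation, no closed-form expression for the rejection probability under bias is available, which is the reason the analytical derivation in this paper relies on the t test instead.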

Our theoretical derivation could be applied to a general class of weights wj including, in particular, the inverse variance approach, although we focus our numerical evaluation study on the weights wj = 1 and wj = wj*, see Lin.15 Lin stated that many statisticians as well as the U.S. Food and Drug Administration recommend the unweighted wj = 1 analysis.

Sample size considerations are presented by various authors. Whereas Ruvuna24 and Vierron and Giraudeau25 used the normal approximation formula, Lin's15 approach is based on the t statistic. We presented a general sample size formula for the weighted t test with K centers, which generalizes Lin's approach for the two-center case and covers the weighted (wj*) and unweighted (wj = 1) evaluation. Among other things, our results can be used to demonstrate the effect on power of adding centers during the course of the trial, which seems to be common practice to increase recruitment. Furthermore, our formulas can be used for power considerations when imbalance in sample sizes between centers is assumed.24,25 Although it was not discussed here, the approach can be extended to the case of random center size by using the corrected variance formulas of Ganju and Mehrotra.26

Although some authors mention that randomization is used to avoid bias, bias is quite likely to occur when PBR is used, particularly when the block size is small. We present a general formal analytical approach to show how RPs are able to limit the impact of selection and chronological bias on the test decision.

The selection bias model used here originates from a natural preference for one of the treatments. Furthermore, assuming that the allocation process tends to produce a balanced allocation ratio, at least at the end, it seems very common for investigators to believe that the treatment used most frequently thus far is less likely to appear next. Combining these two arguments, it is reasonable that an investigator who knows, or can make a best guess at, the next allocation chooses the next patient according to the expected next treatment. This is also in line with the patient's hope of being assigned to the better treatment. In summary, it has to be stated that this process is unconscious or subconscious. The question is not whether selection bias occurs or not, but rather how much impact of bias one is willing to accept. This can be investigated with the proposed sensitivity analysis approach even in the planning phase. With this consideration, a unique approach is presented to link the randomization process of unrestricted or restricted procedures with the trial outcome.
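The guessing behavior described above can be written down as a small function that maps a randomization sequence within one center to a patient-wise bias term. The Python sketch below is one possible reading, given for illustration only: the function name is hypothetical, and the sign convention (a shift of +η when the experimental arm E is expected next, −η when the control arm C is expected, none when the arms are tied) and the linear scaling of the time trend from Θ/n to Θ are assumptions rather than the exact model of this paper.

```python
def bias_terms(sequence, eta, theta):
    """Additive bias per patient for one center.

    sequence: list of 'E'/'C' allocations in order of enrollment.
    eta:      selection bias effect (assumed shift in expected response).
    theta:    time-trend effect (assumed linear, reaching theta at the last patient).
    """
    n = len(sequence)
    n_e = n_c = 0
    terms = []
    for i, arm in enumerate(sequence, start=1):
        # Guess: the arm allocated less often so far is expected next.
        if n_e < n_c:
            selection = eta      # E expected: a patient with better prognosis is enrolled
        elif n_e > n_c:
            selection = -eta     # C expected: a patient with worse prognosis is enrolled
        else:
            selection = 0.0      # tie: no guess, no selection
        trend = theta * i / n    # chronological bias grows with enrollment order
        terms.append(selection + trend)
        if arm == "E":
            n_e += 1
        else:
            n_c += 1
    return terms

if __name__ == "__main__":
    # In a sequence built from blocks of size 2, every second allocation is predictable.
    print(bias_terms(list("ECEC"), eta=0.2, theta=0.0))
```

For the sequence E, C, E, C the guess is correct for every second patient, so under these assumptions the two control patients carry the full shift of −η. This illustrates why small blocks are particularly vulnerable to selection bias.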

Of course, other forms of bias, e.g. log-time trend and step time trend16 or attrition bias, could easily be implemented in the model and then used in a numerical evaluation study. For instance, attrition bias could be modeled by a variable taking the value 0 or 1 to indicate missingness, which offers the opportunity to study mechanisms such as missing at random.

Within this paper, we formulate a biasing policy for selection and chronological bias for a two-arm, parallel group, multi-center trial, according to the weighted stratified t test procedure proposed by Fleiss.1 We further derive the distribution of the stratified weighted test statistic to calculate the impact on the type I error rate. Finally, the impact of the combined additive bias in multi-center trials using the unstratified t test compared to the weighted stratified t test is demonstrated in a simulation study.

8 Conclusion

Stratification in the randomization process makes the analysis sensitive to bias, i.e. results in type I error inflation. Procedures forcing terminal balance perform worse when the study is prone to selection bias, irrespective of whether a time trend is also present. Unbalanced sample sizes between centers do not affect the results. This leads to the conclusion that stratification in the randomization should be considered carefully if bias is supposed to be present. In summary, the presented approach contributes to optimizing the design of clinical trials stratified by center with respect to improving the derived level of evidence.

Supplemental Material

Supplemental Material1 - Supplemental material for Design and analysis of stratified clinical trials in the presence of bias

Supplemental material, Supplemental Material1 for Design and analysis of stratified clinical trials in the presence of bias by Ralf-Dieter Hilgers, Martin Manolov, Nicole Heussen and William F Rosenberger in Statistical Methods in Medical Research

Supplemental Material2 - Supplemental material for Design and analysis of stratified clinical trials in the presence of bias

Supplemental material, Supplemental Material2 for Design and analysis of stratified clinical trials in the presence of bias by Ralf-Dieter Hilgers, Martin Manolov, Nicole Heussen and William F Rosenberger in Statistical Methods in Medical Research

Supplemental Material3 - Supplemental material for Design and analysis of stratified clinical trials in the presence of bias

Supplemental material, Supplemental Material3 for Design and analysis of stratified clinical trials in the presence of bias by Ralf-Dieter Hilgers, Martin Manolov, Nicole Heussen and William F Rosenberger in Statistical Methods in Medical Research

Supplemental Material4 - Supplemental material for Design and analysis of stratified clinical trials in the presence of bias

Supplemental material, Supplemental Material4 for Design and analysis of stratified clinical trials in the presence of bias by Ralf-Dieter Hilgers, Martin Manolov, Nicole Heussen and William F Rosenberger in Statistical Methods in Medical Research

Supplemental Material5 - Supplemental material for Design and analysis of stratified clinical trials in the presence of bias

Supplemental material, Supplemental Material5 for Design and analysis of stratified clinical trials in the presence of bias by Ralf-Dieter Hilgers, Martin Manolov, Nicole Heussen and William F Rosenberger in Statistical Methods in Medical Research

Supplemental Material6 - Supplemental material for Design and analysis of stratified clinical trials in the presence of bias

Supplemental material, Supplemental Material6 for Design and analysis of stratified clinical trials in the presence of bias by Ralf-Dieter Hilgers, Martin Manolov, Nicole Heussen and William F Rosenberger in Statistical Methods in Medical Research

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the IDeAl project funded from the European Union Seventh Framework Programme (FP7 2007-2013) under grant agreement No. 602552. RDH received funding from the European Joint Programme on Rare Diseases within the European Union’s Horizon 2020 research and innovation program under grant agreement No. 825575. Part of the work was done while RDH attended the 2018 workshop on Design of Experiments: New Challenges at CIRM Luminy, France. RDH was granted computing resources for simulations by RWTH Aachen University under project rwth0334.

Supplemental material

Supplemental material is available for this article online.

References

  • 1. Fleiss JL. Analysis of data from multiclinic trials. Control Clin Trials 1986; 7: 267–275.
  • 2. Rosenberger WF, Lachin J. Randomization in clinical trials: theory and practice. New York, NY: Wiley, 2016.
  • 3. Proschan M. Influence of selection bias on type I error rate under random permuted block designs. Stat Sin 1994; 4: 219–231.
  • 4. Kennes LN, Cramer E, Hilgers RD, et al. The impact of selection bias on test decisions in randomized clinical trials. Stat Med 2011; 30: 2573–2581.
  • 5. Tamm M, Cramer E, Kennes LN, et al. Influence of selection bias on the test decision – a simulation study. Methods Inf Med 2012; 51: 138–143.
  • 6. Hilgers RD, Uschner D, Rosenberger WF, et al. ERDO – a framework to select an appropriate randomization procedure for clinical trials. BMC Med Res Methodol 2017; 17(1): 159.
  • 7. Efron B. Forcing a sequential experiment to be balanced. Biometrika 1971; 58: 403–417.
  • 8. Soares JF, Wu CFJ. Some restricted randomization rules in sequential designs. Commun Stat Theory Methods 1982; 12: 2017–2034.
  • 9. Mantel N. Random numbers and experimental design. Ann Stat 1969; 23: 32–34.
  • 10. Zelen M. The randomization and stratification of patients to clinical trials. J Chronic Dis 1974; 27: 365–375.
  • 11. Berger VW, Ivanova A, Knoll DM. Minimizing predictability while retaining balance through the use of less restrictive randomization procedures. Stat Med 2003; 22(19): 3017–3028.
  • 12. ICH E9. Statistical principles for clinical trials. https://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Efficacy/E9/Step4/E9_Guideline.pdf (accessed 24 April 2019).
  • 13. Johnson NL, Kotz S. Continuous univariate distributions – 2. New York, NY: Wiley, 1970.
  • 14. Searle SR. Linear models. New York, NY: Wiley, 1971.
  • 15. Lin Z. An issue of statistical analysis in controlled multi centre studies: how shall we weight the centres? Stat Med 1999; 18: 365–373.
  • 16. Tamm M, Hilgers RD. Chronological bias in randomized clinical trials arising from different types of unobserved time trends. Methods Inf Med 2014; 53: 501–510.
  • 17. Berger VW. Selection bias and covariate imbalances in randomized clinical trials. Chichester: Wiley, 2005.
  • 18. Kraemer H, Fendt KH. Random assignment in clinical trials: issues in planning (infant health and development program). J Clin Epidemiol 1990; 43: 1157–1167.
  • 19. Ganju J, Zhou K. The benefit of stratification in clinical trials revisited. Stat Med 2011; 30: 2881–2889.
  • 20. Pickering RM, Weatherall M. The analysis of continuous outcomes in multi-centre trials with small centre sizes. Stat Med 2007; 26: 5445–5456.
  • 21. Chu R, Thabane L, Ma J, et al. Comparing methods to estimate treatment effects on a continuous outcome in multicentre randomized controlled trials: a simulation study. BMC Med Res Methodol 2011; 11: 21.
  • 22. Feaster DJ, Mikulich-Gilbertson S, Brinks AM. Modeling site effects in the design and analysis of multisite trials. Am J Drug Alcohol Abuse 1998; 37: 383–391.
  • 23. Zheng L, Zelen M. Multi-center clinical trials: randomization and ancillary statistics. Ann Appl Stat 2008; 2(2): 582–600.
  • 24. Ruvuna F. Unequal center sizes, sample size, and power in multicenter clinical trials. Drug Inf J 2004; 38: 387–394.
  • 25. Vierron E, Giraudeau B. Sample size calculation for multicenter randomized trial: Taking the center effect into account. Control Clin Trials 2007; 28: 451–458.
  • 26. Ganju J, Mehrotra DV. Stratified experiments reexamined with emphasis on multicenter trials. Control Clin Trials 2003; 24: 167–181.

Articles from Statistical Methods in Medical Research are provided here courtesy of SAGE Publications
