Dealing with contamination in estimating sample size

The effect of potential contamination can be built into the sample size estimation as follows. Let $d$ be the difference to be detected in the absence of contamination of the control group; then $d_c$, the reduced difference because of contamination, is:

$$d_c = (r_1 - r_2)(1 - c)$$

where $r_1$ is the proportion with a positive outcome in the control group without contamination, $r_2$ is the proportion in the intervention group, and $c$ is the estimated level of contamination.

The new sample size $S_c$ is given by the formula:

$$S_c = \frac{d^2}{d_c^2}\,S$$

where S is the original sample size assuming no contamination.

For example, assume that a treatment reduces the event rate from 50% in the control group to 25% in the intervention arm, and that detecting this difference would require a sample of 116 patients (for 80% power and 5% significance). Assuming an estimated 20% contamination of the control group, the effect size would be reduced from 25 to 20 percentage points (that is, (0.50 - 0.25)(1 - 0.2) = 0.20). The new sample size would therefore be 182 (that is, (0.25 × 0.25)/(0.20 × 0.20) × 116 = 181.25, rounded up), an increase of 56%. Although this increase is substantial, it is still less than the 62% increase that would have been needed had a cluster design been adopted (assuming an intracluster correlation coefficient of 0.02 and a cluster size of 30).
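The adjustment can also be checked numerically. The following is a minimal Python sketch of the two formulas in this box, not a validated sample size tool; the function name `contaminated_sample_size` and its argument names are illustrative assumptions, and the uncontaminated sample size (116 here) is taken as given rather than derived from the power calculation.

```python
import math


def contaminated_sample_size(r1, r2, c, s):
    """Apply the contamination adjustment described in the text.

    r1: proportion with the outcome in the control group (no contamination)
    r2: proportion with the outcome in the intervention group
    c:  estimated level of contamination, as a fraction in [0, 1)
    s:  sample size required when there is no contamination

    Returns (d, d_c, s_c): the original difference, the reduced
    difference, and the unrounded adjusted sample size.
    """
    if not 0 <= c < 1:
        raise ValueError("c must be at least 0 and less than 1")
    d = r1 - r2                    # difference without contamination
    d_c = d * (1 - c)              # d_c = (r1 - r2)(1 - c)
    s_c = (d ** 2 / d_c ** 2) * s  # S_c = (d^2 / d_c^2) S
    return d, d_c, s_c


# Figures from the worked example: 50% vs 25%, 20% contamination, S = 116.
d, d_c, s_c = contaminated_sample_size(0.50, 0.25, 0.20, 116)
inflation = s_c / 116              # equals 1 / (1 - c)^2
print(f"reduced difference: {d_c:.2f}")
print(f"adjusted sample size: {math.ceil(s_c)}")
print(f"inflation factor: {inflation:.4f}")
```

Because the ratio of the squared differences simplifies to 1/(1 - c)^2, the inflation in this formula depends only on the contamination level: with c = 0.2 the sample is multiplied by 1.5625 whatever the underlying event rates.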