Sample Size Considerations for Split-Mouth Design

Hong Zhu; Song Zhang; Chul Ahn

doi:10.1177/0962280215601137

Author manuscript; available in PMC: 2018 Dec 1.

Published in final edited form as: Stat Methods Med Res. 2015 Aug 24;26(6):2543–2551. doi: 10.1177/0962280215601137

PMCID: PMC5573650 NIHMSID: NIHMS894714 PMID: 26303156

Abstract

Split-mouth designs are frequently used in dental clinical research, where a mouth is divided into two or more experimental segments that are randomly assigned to different treatments. The design has the distinct advantage of removing much of the inter-subject variability from the estimated treatment effect. Methods of statistical analysis for split-mouth designs have been well developed. However, little work is available on sample size considerations at the design phase of a split-mouth trial, although many researchers have pointed out that the split-mouth design can only be more efficient than a parallel-group design when the within-subject correlation coefficient is substantial. In this paper, we propose to use the generalized estimating equation (GEE) approach to assess treatment effect in split-mouth trials, accounting for correlations among observations. Closed-form sample size formulas are introduced for the split-mouth design with continuous and binary outcomes, assuming exchangeable and “nested exchangeable” correlation structures for outcomes from the same subject. The statistical inference is based on the large sample approximation under the GEE approach. Simulation studies are conducted to investigate the finite-sample performance of the GEE sample size formulas. A dental clinical trial example is presented for illustration.

Keywords: continuous and binary outcomes, dental clinical trial, GEE, sample size, split-mouth

1. Introduction

In dental clinical trials, clinicians have the option of randomizing treatments over individuals (mouth level) or over segments in the mouth (segment level) within each individual. In the former case, all segments/sites of an individual receive the same treatment, which is called the parallel-group design. In the latter case, the split-mouth design is adopted, where the mouth is divided into two or more experimental segments that are randomly assigned to different treatments. For example, Morrow et al.¹ enrolled 23 patients in a split-mouth trial for the treatment of gingivitis. Four sites located on the left or right side (segment) of a patient’s mouth were randomly assigned to either the experimental treatment (chlorhexidine) or control (saline). The split-mouth design has the advantage of removing much of the inter-subject variability from the estimated treatment effect, and potentially requires fewer subjects than a parallel-group trial with the same power. Statistical methods for the analysis of quantitative outcomes arising from the split-mouth design have been developed, as reviewed in Lesaffre et al.² Mixed-effect ANOVA models, generalized mixed-effect models, and the generalized estimating equation (GEE) technique³ can be used to test or estimate the treatment effect, adjusting for the clustering of data. In particular, Donner et al.⁴ proposed an adjusted chi-square test for analyzing binary data in a split-mouth design, and Donner and Zou⁵ discussed the problem using the GEE method to further correct for imbalance in baseline measurements.
Split-mouth trials are not recommended when contamination between sites is suspected and when matching sites cannot be found within subjects.⁶ In addition, some researchers have pointed out that the split-mouth design can only be more efficient than the parallel-group design when the within-patient correlation coefficient is substantial, where efficiency is determined in terms of the number of measurements needed.⁷ However, relatively little attention has been paid in the literature to sample size considerations for the split-mouth design, and a rigorous comparison with the parallel-group design is not available. This paper intends to fill this gap by developing closed-form sample size formulas to help clinicians design a split-mouth study effectively.

The GEE method has been widely used to model repeated measurement data or clustered data due to its robustness against mis-specification of the true correlation structure and its ability to accommodate missing data. Sample size calculation based on the GEE method has been explored by many researchers in the literature. Liu and Liang⁸ developed a sample size formula based on a generalized score test. Jung and Ahn⁹,¹⁰ proposed sample size approaches to comparing rates of change for repeated continuous and binary measurements between two treatment groups. Zhang and Ahn¹¹ and Lou et al.¹² investigated sample size calculation for time-averaged differences for continuous and binary outcomes from repeated measurement studies, in the presence of missing data. In this article, we derive closed-form sample size formulas based on the GEE approach, which properly account for correlations among observations in the split-mouth design, for both continuous and binary outcomes. We also explore the connection between the sample size requirement for the split-mouth design and that for the parallel-group design, and demonstrate how correlations among observations can influence their relative efficiency. Other application areas in the health sciences, such as dermatology and certain animal studies,¹³,¹⁴ also utilize the split-mouth design, or more accurately, the split-cluster design. Nevertheless, we adopt the terminology of the split-mouth design here, and the proposed methods can be directly applied to the split-cluster design when naturally occurring clusters such as multiple sites or organs in the same subject are assigned to different treatments.

The remainder of the article is arranged as follows. In Section 2, we present the data structure and notation related to the split-mouth design, and propose a GEE sample size approach based on a marginal linear regression model for continuous outcomes. In Section 3, a GEE sample size approach based on a marginal logistic regression model for binary outcomes from the split-mouth design is presented. Closed-form sample size formulas are derived based on the large sample approximation under the GEE approach. Section 4 describes simulation studies conducted to investigate the performance of the GEE sample size formulas for continuous and binary outcomes with exchangeable and “nested exchangeable”¹⁵ correlation structures. The simulations show that power levels as predicted by the GEE sample size formula generally agree well with the empirical power. In Section 5, the proposed sample size method is illustrated with a dental clinical trial example. Some recommendations at the design stage of a split-mouth trial and potential extensions of the proposed methods are discussed in Section 6.

2. Statistical model and sample size approach for continuous outcomes

Consider a split-mouth design where two treatments are involved. In the split-mouth design, each subject’s mouth is divided into two segments, and segments of $m_{1j}$ and $m_{2j}$ sites within the jth subject are randomly assigned to receive either experimental or control treatments, where $m_{1j} + m_{2j} = m_{j}$ and j = 1,…,n. Let Y_ijl denote the continuous outcome on site l of subject j under treatment i, for i = 1,2, j = 1,…,n and l = 1,…,m_ij. We further assume that there is a common correlation ρ among outcomes $(Y_{ijl}, Y_{i'jl'})$ within the same subject, where i ≠ i′ or l ≠ l′. Observed outcomes from different subjects are assumed to be independent. Let $r_{1j} = 1$ and $r_{2j} = 0$ denote the experimental and control treatments, respectively.

To make an inference on the differential effect of treatment on Y_ijl, we assume a linear regression model,

Y_{ijl} = β_{1} + β_{2} r_{i j} + ε_{ijl},

(1)

where β₁ is the intercept, β₂ quantifies the differential effect of treatment, and ε_ijl denotes random error. It is common to assume that E(ε_ijl) = 0 and Var(ε_ijl) = σ². Our primary interest is to test the null hypothesis H₀:β₂ = 0, accounting for the within-subject correlation. Let β = (β₁, β₂)′. For the convenience of discussion, model (1) can be re-written as

Y_{j} = X_{j} β + ε_{j},

where for subject j, Y_j is an m_j × 1 vector of outcomes, ε_j is an m_j × 1 vector of random errors, and X_j is an m_j × 2 design matrix

X_{j} = (\begin{matrix} 1_{m_{1 j}} & 1_{m_{1 j}} \\ 1_{m_{2 j}} & 0_{m_{2 j}} \end{matrix}),

and 1_m is an m × 1 vector of 1’s and 0_m is an m × 1 vector of 0’s.

Under the model assumption, the true correlation matrix is given by $R_{0} = (1 - ρ) I_{m_{1 j} + m_{2 j}} + ρ J_{m_{1 j} + m_{2 j}}$ (exchangeable), where I is an identity matrix and J is a square matrix of 1’s. We use the independent working correlation structure to derive the GEE estimator β̂ = (β̂₁,β̂₂)′, which is obtained by solving the equation S_n(β) = 0 with

S_{n} (β) = n^{- 1 / 2} \sum_{j} {(Y_{j} - X_{j} β)}^{T} X_{j} .

Liang and Zeger³ showed that $n^{\frac{1}{2}} (\hat{β} - β)$ is approximately normal with mean 0 and the variance is consistently estimated by

\Sigma_{n} = σ^{2} {(n^{- 1} \sum_{j} {X_{j}}^{T} X_{j})}^{- 1} (n^{- 1} \sum_{j} {X_{j}}^{T} R_{0} X_{j}) {(n^{- 1} \sum_{j} {X_{j}}^{T} X_{j})}^{- 1} .

The robust variance of $n^{1 / 2} {\hat{β}}_{2}$ is the (2,2)th element of Σ_n, denoted by $σ_{2}^{2}$. We reject H₀ if $| \frac{n^{1 / 2} {\hat{β}}_{2}}{σ_{2}} | > z_{1 - α / 2}$, where $z_{1 - α / 2}$ is the $100 (1 - α / 2)$th percentile of a standard normal distribution. A sample size formula for the split-mouth design can be developed based on this result.

For illustration, we assume a balanced design where $m_{1 j} = m_{2 j} = k$. By some algebra, it can be shown that, as n → ∞,

\begin{matrix} {(n^{- 1} \sum_{j} {X_{j}}^{T} X_{j})}^{- 1} \to (\begin{matrix} \frac{1}{k} & - \frac{1}{k} \\ - \frac{1}{k} & \frac{2}{k} \end{matrix}), \\ n^{- 1} \sum_{j} {X_{j}}^{T} R_{0} X_{j} \to (\begin{matrix} 2 k \{1 + (2 k - 1) ρ\} & k \{1 + (2 k - 1) ρ\} \\ k \{1 + (2 k - 1) ρ\} & k \{1 + (k - 1) ρ\} \end{matrix}) . \end{matrix}
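The two limits above can be checked exactly for a particular k and ρ. The following is a minimal sketch using Python's exact rational arithmetic; the helper names (`matmul`, `inverse_2x2`, `sandwich_22`) are illustrative and not part of the paper, and σ² is set to 1. In the balanced case every subject has the same design matrix, so the average over subjects equals the single-subject matrix.

```python
from fractions import Fraction


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def sandwich_22(k, rho):
    """(2,2) element of (X'X)^{-1} (X' R0 X) (X'X)^{-1} for one balanced subject."""
    m = 2 * k
    # Design matrix: first k sites treated (r = 1), last k sites control (r = 0).
    X = [[Fraction(1), Fraction(1 if site < k else 0)] for site in range(m)]
    # Exchangeable correlation R0 = (1 - rho) I + rho J.
    R0 = [[Fraction(1) if a == b else rho for b in range(m)] for a in range(m)]
    Xt = transpose(X)
    A_inv = inverse_2x2(matmul(Xt, X))
    B = matmul(matmul(Xt, R0), X)
    return matmul(matmul(A_inv, B), A_inv)[1][1]


k, rho = 3, Fraction(1, 10)
value = sandwich_22(k, rho)
closed_form = 2 * (1 - rho) / k
```

For k = 3 and ρ = 1/10 both `value` and `closed_form` equal 3/5, so the sandwich element reduces to 2(1 − ρ)/k once the factor σ² is restored.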

The (2,2)th element of Σ_n simplifies to $σ_{2}^{2} = \frac{2 σ^{2} (1 - ρ)}{k}$. Then, given type I error α, power 1 − γ, and the true value of the differential treatment effect β₂, the required total number of subjects for the split-mouth design is

n_{s} = \frac{2 σ^{2} (1 - ρ) {(z_{1 - \frac{α}{2}} + z_{1 - γ})}^{2}}{k β_{2}^{2}},

(2)

where k is the number of sites per segment. Since each subject contributes 2k sites in a split-mouth trial, the total number of sites is m_s = 2kn_s. On the other hand, the required total number of subjects for the parallel-group design with a cluster size of k is¹⁶

n_{p} = \frac{\{1 + (k - 1) ρ\}}{k} \times \frac{4 σ^{2} {(z_{1 - \frac{α}{2}} + z_{1 - γ})}^{2}}{β_{2}^{2}} .

In the parallel-group design the total number of sites is m_p = kn_p. Therefore, in terms of the number of subjects, the relative efficiency of the split-mouth design over the parallel-group design can be expressed as

\frac{n_{p}}{n_{s}} = \frac{2 \{1 + (k - 1) ρ\}}{1 - ρ},

from which we find that the relative efficiency of the split-mouth design increases with the within-subject correlation ρ. The relative efficiency of the split-mouth design in terms of the number of sites is

\frac{m_{p}}{m_{s}} = \frac{n_{p}}{2 n_{s}} = \frac{1 + (k - 1) ρ}{1 - ρ} .

Further, for the special case of one site per segment (k = 1), the relative efficiency of the split-mouth design in terms of the number of subjects becomes

\frac{n_{p}}{n_{s}} = \frac{2}{1 - ρ},

which was discussed in Wang and Bakhai.¹⁷
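As a rough illustration of formula (2), the parallel-group formula, and the two relative efficiencies, the following sketch evaluates them with the Python standard library. The function names and the example inputs (σ² = 1, ρ = 0.1, k = 3, β₂ = 0.2) are chosen here for illustration; the functions return unrounded values and leave the rounding convention to the user.

```python
from statistics import NormalDist


def z_sum(alpha=0.05, power=0.80):
    """z_{1-alpha/2} + z_{1-gamma}, where power = 1 - gamma."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)


def n_split_mouth(sigma2, rho, k, beta2, alpha=0.05, power=0.80):
    """Formula (2): subjects for a split-mouth trial, exchangeable correlation."""
    return 2 * sigma2 * (1 - rho) * z_sum(alpha, power) ** 2 / (k * beta2 ** 2)


def n_parallel(sigma2, rho, k, beta2, alpha=0.05, power=0.80):
    """Subjects for a parallel-group trial with k sites per subject."""
    design_effect = 1 + (k - 1) * rho
    return design_effect / k * 4 * sigma2 * z_sum(alpha, power) ** 2 / beta2 ** 2


k, rho = 3, 0.1
n_s = n_split_mouth(sigma2=1.0, rho=rho, k=k, beta2=0.2)
n_p = n_parallel(sigma2=1.0, rho=rho, k=k, beta2=0.2)

efficiency_subjects = n_p / n_s                 # 2{1 + (k-1)rho} / (1 - rho)
efficiency_sites = (k * n_p) / (2 * k * n_s)    # {1 + (k-1)rho} / (1 - rho)
```

With these inputs the split-mouth design needs fewer subjects than the parallel-group design by a factor of 2{1 + (k − 1)ρ}/(1 − ρ) ≈ 2.67, and fewer sites by half that factor.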

Additionally, we can employ a more general correlation structure by assuming a common correlation ρ among outcomes $(Y_{ijl}, Y_{ijl'})$ observed in the same segment, referred to as the intra-segment correlation, while the correlation among outcomes $(Y_{1jl}, Y_{2jl'})$ observed in different segments within the same subject is given by ρ₁₂, referred to as the inter-segment correlation. Consequently, both the intra-segment correlation and the inter-segment correlation need to be taken into account for testing the null hypothesis H₀: β₂ = 0. Under the new model assumption, the true correlation matrix (“nested exchangeable”) is

R_{1} = (\begin{matrix} (1 - ρ) I_{m_{1 j}} + ρ J_{m_{1 j}} & ρ_{12} J_{m_{1 j} \times m_{2 j}} \\ ρ_{12} J_{m_{2 j} \times m_{1 j}} & (1 - ρ) I_{m_{2 j}} + ρ J_{m_{2 j}} \end{matrix}) .

The combinations (ρ,ρ₁₂) for which R₁ is positive definite can be determined using the technique in Teerenstra et al.¹⁵ Similarly, the robust variance of $n^{1 / 2} {\hat{β}}_{2}$ is the (2,2)th element of

\Sigma_{n}^{1} = σ^{2} {(n^{- 1} \sum_{j} {X_{j}}^{T} X_{j})}^{- 1} (n^{- 1} \sum_{j} {X_{j}}^{T} R_{1} X_{j}) {(n^{- 1} \sum_{j} {X_{j}}^{T} X_{j})}^{- 1} .

For a balanced design where $m_{1 j} = m_{2 j} = k$, as n → ∞,

n^{- 1} \sum_{j} {X_{j}}^{T} R_{1} X_{j} \to (\begin{matrix} 2 k \{1 + (k - 1) ρ + k ρ_{12}\} & k \{1 + (k - 1) ρ + k ρ_{12}\} \\ k \{1 + (k - 1) ρ + k ρ_{12}\} & k \{1 + (k - 1) ρ\} \end{matrix})

and the (2,2)th element of $\Sigma_{n}^{1}$ is $σ_{21}^{2} = \frac{2 σ^{2} \{1 + (k - 1) ρ - k ρ_{12}\}}{k}$. Thus, given type I error α, power 1 − γ, and the true value of the differential treatment effect β₂, the required total number of subjects for the split-mouth design is

n_{s} = \frac{2 σ^{2} \{1 + (k - 1) ρ - k ρ_{12}\} {(z_{1 - \frac{α}{2}} + z_{1 - γ})}^{2}}{k β_{2}^{2}},

(3)

where k is the number of sites per segment. When the true correlation is exchangeable (ρ = ρ₁₂), the sample size formula in (3) reduces to the one in (2).
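Formula (3) is straightforward to evaluate directly. The sketch below is one possible implementation using only the Python standard library; the function names are illustrative. The positive-definiteness check uses the eigenvalues of R₁ in the balanced case (1 − ρ, 1 + (k − 1)ρ + kρ₁₂ and 1 + (k − 1)ρ − kρ₁₂), which is a standard linear-algebra shortcut assumed here and not necessarily the technique of Teerenstra et al.

```python
from statistics import NormalDist


def z_sum(alpha=0.05, power=0.80):
    """z_{1-alpha/2} + z_{1-gamma}, where power = 1 - gamma."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)


def is_positive_definite(rho, rho12, k):
    """Balanced nested exchangeable R1 is positive definite iff all eigenvalues > 0."""
    eigenvalues = (
        1 - rho,
        1 + (k - 1) * rho + k * rho12,
        1 + (k - 1) * rho - k * rho12,
    )
    return all(ev > 0 for ev in eigenvalues)


def n_split_mouth_nested(sigma2, rho, rho12, k, beta2, alpha=0.05, power=0.80):
    """Formula (3): unrounded number of subjects under nested exchangeable correlation."""
    if not is_positive_definite(rho, rho12, k):
        raise ValueError("(rho, rho12) does not give a positive definite R1")
    variance_factor = 1 + (k - 1) * rho - k * rho12
    return 2 * sigma2 * variance_factor * z_sum(alpha, power) ** 2 / (k * beta2 ** 2)


# Inputs taken from the simulation settings in Section 4 (k = 3, beta2 = 0.2).
n_nested = n_split_mouth_nested(sigma2=0.5, rho=0.1, rho12=0.05, k=3, beta2=0.2)
n_exch = n_split_mouth_nested(sigma2=1.0, rho=0.1, rho12=0.1, k=3, beta2=0.2)
```

Rounded to the nearest integer, `n_nested` is 69 and `n_exch` is 118, which agree with the corresponding entries of Table 1. Setting ρ₁₂ = ρ makes the variance factor equal to 1 − ρ, recovering formula (2).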

3. Statistical model and sample size approach for binary outcomes

In this section, we discuss sample size consideration for the split-mouth design with binary outcomes. Let Y_ijl denote the binary response on site l of subject j receiving treatment i, where Y_ijl = 1 denotes a “success” and Y_ijl = 0 denotes a “failure” for i = 1,2, j = 1,…,n and l = 1,…,m_ij. We assume that the intra-segment correlation is ρ and the inter-segment correlation is ρ₁₂.

To make an inference on the differential effect of treatment, we assume the following logistic regression model: Y_ijl~Bernoulli(p_ijl) and

logit \{Pr (Y_{ijl} = 1)\} = log (\frac{p_{ijl}}{1 - p_{ijl}}) = β_{1} + β_{2} r_{i j},

(4)

where $r_{1j} = 1$ and $r_{2j} = 0$ indicate the experimental and control treatments, respectively, β₁ is the log-transformed odds for the control group, and β₂ is the log-transformed odds ratio between the experimental and control treatments, representing the treatment difference on the outcome. The primary interest is to test the null hypothesis H₀:β₂ = 0, accounting for both the intra-segment and inter-segment correlations. We can apply the GEE using the robust variance approach and an independent working correlation matrix to test H₀:β₂ = 0. Model (4) can be re-written as

p_{ijl} (β) = \frac{e^{β^{'} Z_{ijl}}}{1 + e^{β^{'} Z_{ijl}}},

where β = (β₁, β₂)′ and Z_ijl = (1,r_ij). Under the model assumption, the true correlation matrix is “nested exchangeable”. By the GEE method, an estimator β̂ is obtained by solving equation $S_{n}^{*} (β) = 0$ with

S_{n}^{*} (β) = n^{- 1 / 2} \sum_{j = 1}^{n} \sum_{i = 1}^{2} \sum_{l = 1}^{m_{i j}} \{Y_{ijl} - p_{ijl} (β)\} Z_{ijl} .

The equation is solved by the Newton-Raphson algorithm. At the mth iteration,

{\hat{β}}^{(m)} = {\hat{β}}^{(m - 1)} + n^{- \frac{1}{2}} A_{n}^{- 1} ({\hat{β}}^{(m - 1)}) S_{n}^{*} ({\hat{β}}^{(m - 1)}),

where

A_{n} (β) = - n^{- \frac{1}{2}} \frac{\partial S_{n}^{*} (β)}{\partial β} = n^{- 1} \sum_{j = 1}^{n} \sum_{i = 1}^{2} \sum_{l = 1}^{m_{i j}} p_{ijl} (1 - p_{ijl}) (\begin{matrix} 1 & r_{i j} \\ r_{i j} & r_{i j}^{2} \end{matrix}) .

Liang and Zeger³ showed that $n^{\frac{1}{2}} (\hat{β} - β)$ is approximately normal with mean 0 and the variance is consistently estimated by $\Sigma_{n}^{*} = A_{n}^{- 1} (\hat{β}) V_{n} (\hat{β}) A_{n}^{- 1} (\hat{β})$, where

V_{n} (\hat{β}) = n^{- 1} \sum_{j = 1}^{n} {(\sum_{i = 1}^{2} \sum_{l = 1}^{m_{i j}} {\hat{ε}}_{ijl} Z_{ijl})}^{\otimes 2},

with ε̂_ijl = Y_ijl − p_ijl(β̂), and $c^{\otimes 2} = c c^{T}$ for a vector c. We reject H₀ if $| \frac{n^{\frac{1}{2}} {\hat{β}}_{2}}{σ_{2 *}} | > z_{1 - α / 2}$, where $σ_{2 *}^{2}$ is the (2,2)th element of $\Sigma_{n}^{*}$.
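The estimation and testing steps above can be sketched in a few lines of plain Python. This is an illustrative implementation written for this two-parameter model only, not the authors' software; the function name `fit_gee_independence` is hypothetical. The toy data use a shared subject-level effect to induce within-subject correlation, which is an assumption made for the demonstration and differs from the generator used in the paper's simulations.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_gee_independence(data, tol=1e-10, max_iter=50):
    """Fit logit p = b1 + b2*r by GEE with an independence working correlation.

    data: list of subjects, each a list of (y, r) pairs with y, r in {0, 1}.
    Returns (b1, b2, sigma22), where sigma22 is the (2,2) element of
    A_n^{-1} V_n A_n^{-1}, the robust variance of sqrt(n) * beta2_hat.
    """
    n = len(data)
    b1 = b2 = 0.0
    for _ in range(max_iter):
        s1 = s2 = h11 = h12 = h22 = 0.0
        for subject in data:
            for y, r in subject:
                p = expit(b1 + b2 * r)
                w = p * (1.0 - p)
                s1 += y - p            # score component for the intercept
                s2 += (y - p) * r      # score component for the treatment
                h11 += w
                h12 += w * r
                h22 += w * r * r
        det = h11 * h22 - h12 * h12
        # Newton-Raphson step beta <- beta + H^{-1} s; this equals
        # n^{-1/2} A_n^{-1} S_n^* because H = n A_n and s = n^{1/2} S_n^*.
        d1 = (h22 * s1 - h12 * s2) / det
        d2 = (h11 * s2 - h12 * s1) / det
        b1, b2 = b1 + d1, b2 + d2
        if abs(d1) + abs(d2) < tol:
            break

    # Sandwich pieces evaluated at the final estimate.
    a11 = a12 = a22 = v11 = v12 = v22 = 0.0
    for subject in data:
        u1 = u2 = 0.0                  # per-subject sum of residual * Z
        for y, r in subject:
            p = expit(b1 + b2 * r)
            w = p * (1.0 - p)
            a11 += w / n
            a12 += w * r / n
            a22 += w * r * r / n
            u1 += y - p
            u2 += (y - p) * r
        v11 += u1 * u1 / n
        v12 += u1 * u2 / n
        v22 += u2 * u2 / n
    det = a11 * a22 - a12 * a12
    i21, i22 = -a12 / det, a11 / det   # second row of A_n^{-1}
    sigma22 = i21 * i21 * v11 + 2.0 * i21 * i22 * v12 + i22 * i22 * v22
    return b1, b2, sigma22


# Toy data (assumption): 200 subjects, k = 3 sites per segment.
rng = random.Random(2015)
k = 3
data = []
for _ in range(200):
    u = rng.gauss(0.0, 0.5)            # shared subject effect
    data.append([(int(rng.random() < expit(-1.0 + 0.8 * r + u)), r)
                 for r in (1, 0) for _ in range(k)])

b1, b2, sigma22 = fit_gee_independence(data)
wald_z = math.sqrt(len(data)) * b2 / math.sqrt(sigma22)  # compare with z_{1-alpha/2}
```

Because the working correlation is independence and the only covariate is the binary treatment indicator, the fitted β̂₁ equals the logit of the observed control proportion and β̂₁ + β̂₂ equals the logit of the observed treated proportion; the clustering enters only through the robust variance.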

We consider a balanced design where $m_{1 j} = m_{2 j} = k$. To facilitate the discussion, let $P_{1} = e^{β_{1} + β_{2}} / (1 + e^{β_{1} + β_{2}})$ and $P_{2} = e^{β_{1}} / (1 + e^{β_{1}})$ denote the true success rates in the experimental and control groups, respectively. Define Q₁ = 1 − P₁ and Q₂ = 1 − P₂. The null hypothesis H₀: β₂ = 0 is equivalent to H₀: P₁ = P₂.

Theorem 1

As n → ∞, $\Sigma_{n}^{*} = A_{n}^{- 1} (\hat{β}) V_{n} (\hat{β}) A_{n}^{- 1} (\hat{β}) \to \Sigma^{*}$, and the (2,2)th element of Σ^* has a closed form

σ_{2 *}^{2} = \frac{\{1 + (k - 1) ρ\} (P_{1} Q_{1} + P_{2} Q_{2}) - 2 k ρ_{12} \sqrt{P_{1} Q_{1} P_{2} Q_{2}}}{k P_{1} Q_{1} P_{2} Q_{2}} .

The proof of Theorem 1 is presented in the Appendix. Given type I error α, power 1 − γ, and the true value of the differential treatment effect β₂, the required total number of subjects for the split-mouth design is

n_{s} = \frac{σ_{2 *}^{2} {(z_{1 - \frac{α}{2}} + z_{1 - γ})}^{2}}{β_{2}^{2}} .

(5)

When the true correlation is exchangeable (ρ = ρ₁₂) and P₁ = P₂ = P (P is the true overall success rate), the sample size formula (5) can be simplified as

n_{s} = \frac{2 (1 - ρ) {(z_{1 - \frac{α}{2}} + z_{1 - γ})}^{2}}{k P (1 - P) β_{2}^{2}},

where k is the number of sites per segment. For the parallel-group design with a cluster size of k for binary outcomes, the required total number of subjects is

n_{p} = \frac{\{1 + (k - 1) ρ\}}{k} \times \frac{4 {(z_{1 - \frac{α}{2}} + z_{1 - γ})}^{2}}{P (1 - P) β_{2}^{2}} .

Therefore, with the exchangeable correlation structure, we have

\frac{n_{p}}{n_{s}} = \frac{2 \{1 + (k - 1) ρ\}}{1 - ρ},

which is consistent with the relative efficiency of the split-mouth design over the parallel-group design that we find for the continuous outcomes, in terms of the number of subjects.
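Formula (5) and the closed form of Theorem 1 can be evaluated as follows. This is a minimal sketch with illustrative function names; it returns unrounded values. The example inputs are taken from the binary simulation settings of Section 4 (k = 3, 80% power, 5% type I error).

```python
import math
from statistics import NormalDist


def sigma2_star(p1, p2, rho, rho12, k):
    """Closed-form (2,2) element of Sigma* from Theorem 1."""
    v1, v2 = p1 * (1 - p1), p2 * (1 - p2)
    numerator = (1 + (k - 1) * rho) * (v1 + v2) - 2 * k * rho12 * math.sqrt(v1 * v2)
    return numerator / (k * v1 * v2)


def log_odds_ratio(p1, p2):
    """beta2 = logit(P1) - logit(P2)."""
    return math.log(p1 / (1 - p1)) - math.log(p2 / (1 - p2))


def n_split_mouth_binary(p1, p2, rho, rho12, k, alpha=0.05, power=0.80):
    """Formula (5): unrounded number of subjects for binary outcomes."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    return sigma2_star(p1, p2, rho, rho12, k) * z ** 2 / log_odds_ratio(p1, p2) ** 2


n_small_effect = n_split_mouth_binary(p1=0.15, p2=0.10, rho=0.10, rho12=0.05, k=3)
n_large_effect = n_split_mouth_binary(p1=0.20, p2=0.10, rho=0.10, rho12=0.05, k=3)
```

Rounded to the nearest integer these give 244 and 73 subjects, matching the first entries of the corresponding rows of Table 2.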

4. Simulation study

The first set of simulations demonstrates the effect of various design configurations on the sample size for the split-mouth design with continuous outcomes. The nominal levels of type I error and power are set at α = 0.05 and 1 − γ = 0.8, respectively. We consider both exchangeable (ρ = ρ₁₂) and “nested exchangeable” (ρ ≠ ρ₁₂) correlation structures, where ρ takes the values 0.1, 0.15 and 0.2, and ρ₁₂ takes the values 0.05, 0.1 and 0.15. Assuming a balanced design with $m_{1 j} = m_{2 j} = k = 3$, three sites in each segment are assigned to either the experimental or the control treatment. We set the true values of the regression coefficients to β = (β₁, β₂)′ = (0.3, 0.2)′ and the variance to σ² = 0.5 or 1, where (β₂, σ²) = (0.2, 1) indicates an effect size of 0.2 comparing the experimental treatment with the control treatment. We assess the performance of the proposed sample size approach for continuous outcomes for each combination of (σ²,ρ,ρ₁₂). The simulation procedure is as follows: (i) Estimate the sample size n_s based on equation (3); (ii) Generate 5000 null (β₂ = 0) data sets and 5000 alternative (β₂ = 0.2) data sets, each containing n_s subjects. For subject j, generate a vector of outcomes Y_j from the model Y_j = X_jβ + ε_j, where the vector of random errors ε_j is generated from a multivariate normal distribution with mean 0, variance σ² = 0.5 or 1, and correlation matrix R₀ (exchangeable) or R₁ (“nested exchangeable”); (iii) For each data set, estimate β̂₂ and σ₂ (or σ₂₁); (iv) Calculate the empirical type I error and empirical power as the proportion of data sets with $| \frac{n^{\frac{1}{2}} {\hat{β}}_{2}}{σ_{2}} | > z_{1 - 0.05 / 2}$ (or $| \frac{n^{\frac{1}{2}} {\hat{β}}_{2}}{σ_{21}} | > z_{1 - 0.05 / 2}$) under the null and alternative hypotheses, respectively.
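Steps (ii) to (iv) can be sketched with the Python standard library. Several choices below are assumptions made for this illustration and are not taken from the paper: the errors are built from subject, segment and site random effects (which reproduces R₁ only when ρ ≥ ρ₁₂ ≥ 0), the variance is estimated by the empirical sandwich estimator, and 1000 replicates are used instead of 5000 to keep the run short. In a balanced design with independence working correlation, β̂₂ is the mean of the per-subject differences between treated and control segment means, and the (2,2) sandwich element is the mean squared deviation of those differences, so the code works with the differences directly.

```python
import math
import random
from statistics import NormalDist


def rejection_rate(n, k, beta1, beta2, sigma2, rho, rho12, reps, alpha=0.05, seed=0):
    """Proportion of simulated data sets in which H0: beta2 = 0 is rejected."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    sd = math.sqrt(sigma2)
    sd_subject = sd * math.sqrt(rho12)        # shared by all sites of a subject
    sd_segment = sd * math.sqrt(rho - rho12)  # shared by sites in one segment
    sd_site = sd * math.sqrt(1 - rho)         # site-specific noise
    rejections = 0
    for _ in range(reps):
        diffs = []
        for _ in range(n):
            a = rng.gauss(0.0, sd_subject)
            segment_means = []
            for r in (1, 0):                  # treated segment, then control
                b = rng.gauss(0.0, sd_segment)
                ys = [beta1 + beta2 * r + a + b + rng.gauss(0.0, sd_site)
                      for _ in range(k)]
                segment_means.append(sum(ys) / k)
            diffs.append(segment_means[0] - segment_means[1])
        beta2_hat = sum(diffs) / n
        var_hat = sum((d - beta2_hat) ** 2 for d in diffs) / n  # sandwich (2,2)
        z = math.sqrt(n) * beta2_hat / math.sqrt(var_hat)
        rejections += abs(z) > crit
    return rejections / reps


# Scenario sigma^2 = 0.5, rho = 0.1, rho12 = 0.05, k = 3 with n_s = 69 (Table 1).
power = rejection_rate(69, 3, 0.3, 0.2, 0.5, 0.1, 0.05, reps=1000, seed=1)
type1 = rejection_rate(69, 3, 0.3, 0.0, 0.5, 0.1, 0.05, reps=1000, seed=2)
```

With only 1000 replicates the estimates carry a Monte Carlo standard error of roughly 0.013 for power and 0.007 for type I error, so they should be read as approximate.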

Table 1 presents the sample size estimate, empirical power and empirical type I error from the simulation. The empirical powers and type I errors are generally close to their nominal levels, which indicates a good performance of the proposed method. With all other factors fixed, for the “nested exchangeable” correlation (ρ ≠ ρ₁₂), the sample size increases as the intra-segment correlation ρ increases, or the inter-segment correlation ρ₁₂ decreases. For the exchangeable correlation (ρ = ρ₁₂), the sample size increases as ρ decreases. The statistical inference under the GEE method is based on a large sample approximation. It is thus important to examine the performance of the proposed method in some small-sample-size scenarios. In Table 1, we have explored scenarios where the sample size can be as small as 49 and the corresponding empirical power remains close to the nominal level of 0.8. This provides assurance to researchers that the proposed method is widely applicable to split-mouth clinical trials, even when the sample size is relatively small.

Table 1.

Sample size (empirical power, empirical type I error) for simulation with continuous outcomes, type I error=0.05, power=0.8.

σ²	ρ	ρ₁₂ = 0.05	ρ₁₂ = 0.1	ρ₁₂ = 0.15
0.5	0.1	69 (0.810, 0.058)	59 (0.795, 0.061)	49 (0.791, 0.062)
	0.15	75 (0.810, 0.047)	65 (0.795, 0.057)	56 (0.809, 0.054)
	0.2	82 (0.820, 0.057)	72 (0.805, 0.053)	62 (0.804, 0.056)
1	0.1	137 (0.799, 0.051)	118 (0.808, 0.052)	98 (0.809, 0.056)
	0.15	150 (0.798, 0.053)	131 (0.785, 0.049)	111 (0.793, 0.057)
	0.2	163 (0.798, 0.047)	144 (0.810, 0.056)	124 (0.791, 0.051)

The second set of simulations evaluates the performance of the GEE sample size formula for binary outcomes under various design configurations: true success rate for control P₂ of 0.1 or 0.2; true treatment effect Δ = P₁ − P₂ of 0.05 or 0.1; true intra-segment correlation ρ of 0.1, 0.15 or 0.2; true inter-segment correlation ρ₁₂ of 0.05, 0.1 or 0.15. We assume a balanced design with $m_{1 j} = m_{2 j} = k = 3$. For each (P₁,P₂,ρ,ρ₁₂), we generate 5000 null data sets and 5000 alternative data sets with sample size n_s, where the vector of binary outcomes Y_j for subject j is generated using the approach described by Obuchowski.¹⁸ The empirical power and type I error are calculated as the proportions of data sets in which the null hypothesis is rejected under the alternative and null hypotheses, respectively, based on the GEE approach, when fitting the logistic regression model (4) to the simulated data. Table 2 summarizes the simulation results, including the required sample size, empirical power and empirical type I error under different combinations of design factors (P₁,P₂,ρ,ρ₁₂). We have studied scenarios with a wide range of sample sizes, varying from 53 to 457. The results show that a larger treatment effect leads to a smaller sample size requirement. Similar to continuous outcomes, the sample size increases as ρ increases or ρ₁₂ decreases, when ρ ≠ ρ₁₂, with all other factors fixed. The empirical type I errors are all close to the nominal level of 0.05. The empirical power tends to be smaller than the nominal level when the sample size is relatively small, and gets closer to the nominal level as the sample size increases. A possible explanation is that the normal approximation is less satisfactory when the sample size is small.

Table 2.

Sample size (empirical power, empirical type I error) for simulation with binary outcomes, type I error=0.05, power=0.8.

P₁	P₂	ρ	ρ₁₂ = 0.05	ρ₁₂ = 0.1	ρ₁₂ = 0.15
0.15	0.1	0.1	244 (0.793, 0.054)	209 (0.766, 0.047)	175 (0.730, 0.049)
		0.15	267 (0.826, 0.051)	232 (0.791, 0.050)	198 (0.759, 0.049)
		0.2	290 (0.823, 0.052)	256 (0.806, 0.049)	221 (0.780, 0.056)
0.2	0.1	0.1	73 (0.836, 0.056)	63 (0.802, 0.052)	53 (0.759, 0.053)
		0.15	80 (0.845, 0.053)	70 (0.830, 0.054)	60 (0.795, 0.053)
		0.2	87 (0.862, 0.048)	77 (0.846, 0.046)	67 (0.813, 0.055)
0.25	0.2	0.1	384 (0.826, 0.052)	330 (0.805, 0.053)	275 (0.768, 0.054)
		0.15	421 (0.844, 0.054)	366 (0.823, 0.051)	311 (0.791, 0.053)
		0.2	457 (0.847, 0.050)	403 (0.834, 0.050)	348 (0.805, 0.049)
0.3	0.2	0.1	104 (0.822, 0.053)	89 (0.790, 0.051)	75 (0.762, 0.054)
		0.15	114 (0.839, 0.053)	99 (0.819, 0.051)	85 (0.787, 0.053)
		0.2	124 (0.841, 0.057)	109 (0.819, 0.051)	95 (0.801, 0.055)

5. Example

Morrow et al.¹ reported a split-mouth trial of chlorhexidine in the treatment of gingivitis, where the left and right sides of a patient’s upper and lower jaws were randomly assigned to chlorhexidine or a control treatment. Each treatment was applied to four sites located on the left and right sides of the upper and lower jaws ($m_{1 j} = m_{2 j} = k = 4$). The trial enrolled 23 orthodontic patients, and the proportions of patients having plaque in the chlorhexidine and control groups are estimated as P̂₁ = 0.89 and P̂₂ = 0.77, respectively, with the intra-segment correlation estimated as ρ̂ = 0.070 and the inter-segment correlation estimated as ρ̂₁₂ = 0.039. Correspondingly, we have the log-transformed odds for the control group β̂₁ = 1.21, and the log-transformed odds ratio β̂₂ = 0.88. An investigator would like to conduct a split-mouth trial to study the effect of a new drug as a treatment of gingivitis, following a similar study design where the new drug and a control treatment are randomly assigned to each segment with four sites of a patient’s mouth. To test the hypotheses H₀:β₂ = 0 versus H₁:β₂ ≠ 0 with 80% power at a 5% significance level, the number of subjects needs to be determined. Based on the preliminary data from the aforementioned trial, we assume that the design factors are P₂ = 0.77, ρ = 0.070 and ρ₁₂ = 0.039. By the proposed method for binary outcomes, the required total number of subjects to detect a treatment effect of Δ = P₁ − P₂ = 0.10 (β₂ = 0.69) is n_s = 84 for a split-mouth trial. We also estimate the sample size for a parallel-group trial of the same cluster size of k = 4, and the required total number of subjects is n_p = 135. In terms of the number of subjects, the relative efficiency of the split-mouth design over the parallel-group design is $\frac{n_{p}}{n_{s}} = 1.61$.
Further, to detect a treatment effect of Δ = P₁ − P₂ = 0.15 (β₂ = 1.23), the required total number of subjects is n_s = 35 for a split-mouth trial, and the required total number of subjects for the corresponding parallel-group trial is n_p = 48.
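For readers who wish to rerun the split-mouth calculation, the sketch below evaluates formula (5) with the design factors stated in the example (P₂ = 0.77, ρ = 0.070, ρ₁₂ = 0.039, k = 4). The function names are illustrative and the power level is left as an argument. When evaluated as written, the formula gives about 63 and 26 subjects at 80% power and about 84 and 35 subjects at 90% power for Δ = 0.10 and Δ = 0.15; the reported values n_s = 84 and n_s = 35 coincide with the 90% evaluation, so the power level is worth confirming before reusing these numbers. The parallel-group figures are not recomputed here because the text gives the parallel-group formula only for P₁ = P₂.

```python
import math
from statistics import NormalDist


def sigma2_star(p1, p2, rho, rho12, k):
    """Closed-form (2,2) element of Sigma* from Theorem 1."""
    v1, v2 = p1 * (1 - p1), p2 * (1 - p2)
    numerator = (1 + (k - 1) * rho) * (v1 + v2) - 2 * k * rho12 * math.sqrt(v1 * v2)
    return numerator / (k * v1 * v2)


def n_split_mouth_binary(p1, p2, rho, rho12, k, alpha=0.05, power=0.80):
    """Formula (5): unrounded number of subjects for binary outcomes."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    beta2 = math.log(p1 / (1 - p1)) - math.log(p2 / (1 - p2))
    return sigma2_star(p1, p2, rho, rho12, k) * z ** 2 / beta2 ** 2


design = dict(p2=0.77, rho=0.070, rho12=0.039, k=4)
results = {
    (p1, power): n_split_mouth_binary(p1=p1, power=power, **design)
    for p1 in (0.87, 0.92)          # Delta = 0.10 and Delta = 0.15
    for power in (0.80, 0.90)
}
```

The dictionary `results` is keyed by (P₁, power), which makes it easy to tabulate how the requirement changes with the effect size and the chosen power.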

6. Discussion

In this paper, we present closed-form sample size formulas for the split-mouth design with continuous and binary outcomes. To our knowledge, this is the first attempt to systematically investigate sample size methods for designing a split-mouth study. Marginal linear and logistic regression models with the GEE approach are employed to account for correlations among observations in split-mouth trials. The main contribution of this paper is to provide closed-form sample size formulas, which allow deeper insight into the impact of various design factors (treatment effect size, intra-segment and inter-segment correlations, etc.) on the sample size. We also theoretically derive the relative efficiency of the split-mouth design over the parallel-group design. The relative efficiency increases with the inter-segment correlation ρ₁₂, while it decreases with the intra-segment correlation ρ. The statistical inference under the GEE approach is based on asymptotic properties; thus the sample size formulas are generally applicable for large sample sizes. Our simulation shows that the nominal power and type I error are preserved over a wide range of sample sizes.

Clinical trials with multiple or repeated measurements often encounter missing data, and in a split-mouth trial, there may be missing observations at some sites of a patient. A common practice to account for missing data is to estimate the sample size by n/(1 − q), where n is the sample size (number of subjects or sites) calculated assuming no missing data and q is the expected missing rate. However, as shown in Zhang and Ahn,¹¹ this crude adjustment may be unsatisfactory, as missing data might cause less information loss if the correlations among repeated measurements are high. On the other hand, the GEE approach has the advantage of utilizing incomplete observations. One possible extension of the current work is to incorporate missing data into sample size considerations. Finally, in this paper we have concentrated on the split-mouth design with continuous and binary outcomes. In the future, we will extend the proposed methods to the split-mouth design with categorical, count and survival outcomes. The aforementioned extensions demand significant methodological development and will be pursued in separate studies.

Acknowledgments

This work was supported in part by the Cancer Center Support Grant from the National Cancer Institute (5P30CA142543) awarded to the Harold C. Simmons Cancer Center at the University of Texas Southwestern Medical Center. The authors thank the editor and two reviewers for their constructive comments that have improved the initial version of this paper.

Appendix: Proof of Theorem 1

We consider a balanced design where $m_{1 j} = m_{2 j} = k$. First, A_n(β̂) can be split into two parts for the treatment and control as

\begin{array}{l} A_{n} (\hat{β}) = \frac{1}{n} \sum_{j = 1}^{n} \sum_{i = 1}^{2} \sum_{l = 1}^{k} p_{ijl} (\hat{β}) (1 - p_{ijl} (\hat{β})) (\begin{matrix} 1 & r_{i j} \\ r_{i j} & r_{i j}^{2} \end{matrix}) \\ = \frac{1}{n} \sum_{j = 1}^{n} \sum_{i = 1}^{2} \sum_{l = 1}^{k} p_{ijl} (β) (1 - p_{ijl} (β)) (\begin{matrix} 1 & r_{i j} \\ r_{i j} & r_{i j}^{2} \end{matrix}) + o_{p} (1) \\ = \frac{1}{n} \sum_{j = 1}^{n} \sum_{l = 1}^{k} p_{1 j l} (1 - p_{1 j l}) (\begin{matrix} 1 & 1 \\ 1 & 1 \end{matrix}) + \frac{1}{n} \sum_{j = 1}^{n} \sum_{l = 1}^{k} p_{2 j l} (1 - p_{2 j l}) (\begin{matrix} 1 & 0 \\ 0 & 0 \end{matrix}) + o_{p} (1) . \end{array}

By the law of large numbers, as n → ∞, A_n(β̂) converges to

A = (\begin{matrix} k (P_{1} Q_{1} + P_{2} Q_{2}) & k P_{1} Q_{1} \\ k P_{1} Q_{1} & k P_{1} Q_{1} \end{matrix}) .

Next, we separate V_n(β̂) into two parts for the treatment and control as

\begin{array}{l} V_{n} (\hat{β}) = \frac{1}{n} \sum_{j = 1}^{n} {(\sum_{i = 1}^{2} \sum_{l = 1}^{k} {\hat{ε}}_{ijl} Z_{ijl})}^{\otimes 2} = \frac{1}{n} \sum_{j = 1}^{n} {\{\sum_{l = 1}^{k} (\begin{matrix} ε_{1 j l} + ε_{2 j l} \\ ε_{1 j l} \end{matrix})\}}^{\otimes 2} + o_{p} (1) \\ = \frac{1}{n} \sum_{j = 1}^{n} \sum_{l = 1}^{k} \sum_{l^{'} = 1}^{k} (\begin{matrix} ε_{1 j l} ε_{1 j l^{'}} + ε_{1 j l} ε_{2 j l^{'}} + ε_{2 j l} ε_{2 j l^{'}} + ε_{2 j l} ε_{1 j l^{'}} & ε_{1 j l} ε_{1 j l^{'}} + ε_{2 j l} ε_{1 j l^{'}} \\ ε_{1 j l} ε_{1 j l^{'}} + ε_{1 j l} ε_{2 j l^{'}} & ε_{1 j l} ε_{1 j l^{'}} \end{matrix}) + o_{p} (1) . \end{array}

By the law of large numbers, as n → ∞, V_n(β̂) converges to V, where

\begin{matrix} V_{11} = \{k + (k^{2} - k) ρ\} (P_{1} Q_{1} + P_{2} Q_{2}) + 2 k^{2} ρ_{12} \sqrt{P_{1} Q_{1} P_{2} Q_{2}}, \\ V_{12} = V_{21} = \{k + (k^{2} - k) ρ\} P_{1} Q_{1} + k^{2} ρ_{12} \sqrt{P_{1} Q_{1} P_{2} Q_{2}}, \\ V_{22} = \{k + (k^{2} - k) ρ\} P_{1} Q_{1} . \end{matrix}

A few steps of algebra show that the (2,2)th element of Σ^* = A⁻¹VA⁻¹ is

σ_{2 *}^{2} = \frac{\{1 + (k - 1) ρ\} (P_{1} Q_{1} + P_{2} Q_{2}) - 2 k ρ_{12} \sqrt{P_{1} Q_{1} P_{2} Q_{2}}}{k P_{1} Q_{1} P_{2} Q_{2}} .
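The algebra behind the last step can be written out as follows. This is a reconstruction of the omitted steps, with shorthand symbols a, b, c and d introduced here for brevity; A is taken as the limit of A_n obtained from its decomposition into treatment and control parts, namely k times the matrix with entries (a + b, a; a, a).

```latex
% Shorthand (introduced for this derivation only):
%   a = P_1 Q_1,  b = P_2 Q_2,  c = k\{1 + (k-1)\rho\},  d = k^2 \rho_{12}\sqrt{ab}.
\begin{aligned}
A &= k \begin{pmatrix} a + b & a \\ a & a \end{pmatrix},
\qquad \det A = k^{2} a b,
\qquad A^{-1} = \frac{1}{k a b} \begin{pmatrix} a & -a \\ -a & a + b \end{pmatrix}, \\
V &= \begin{pmatrix} c(a+b) + 2d & ca + d \\ ca + d & ca \end{pmatrix}. \\
\sigma_{2*}^{2}
  &= \left(A^{-1} V A^{-1}\right)_{22}
   = \frac{1}{k^{2} a^{2} b^{2}}
     \left\{ a^{2} V_{11} - 2a(a+b) V_{12} + (a+b)^{2} V_{22} \right\} \\
  &= \frac{1}{k^{2} a^{2} b^{2}}
     \left\{ ca^{2}(a+b) + 2a^{2} d - 2ca^{2}(a+b) - 2ad(a+b) + ca(a+b)^{2} \right\} \\
  &= \frac{1}{k^{2} a^{2} b^{2}}
     \left\{ cab(a+b) - 2abd \right\}
   = \frac{c(a+b) - 2d}{k^{2} a b} \\
  &= \frac{\{1 + (k-1)\rho\}(P_1 Q_1 + P_2 Q_2) - 2k\rho_{12}\sqrt{P_1 Q_1 P_2 Q_2}}
          {k P_1 Q_1 P_2 Q_2}.
\end{aligned}
```

The second line of the expansion uses the fact that the second row of A⁻¹ is (−a, a + b)/(kab), so only that row is needed to obtain the (2,2) element of the sandwich.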

References

1. Morrow D, Wood DP, Speechley M. Clinical effect of subgingival chlorhexidine irrigation on gingivitis in adolescent orthodontic patients. Am J Orthod Dentofac. 1992;101:408–413. doi: 10.1016/0889-5406(92)70113-O.
2. Lesaffre E, Philstrom B, Needleman I, et al. The design and analysis of split-mouth studies: what statisticians and clinicians should know. Stat Med. 2009;28:3470–3482. doi: 10.1002/sim.3634.
3. Liang K, Zeger S. Longitudinal data analysis for discrete and continuous outcomes using generalized linear models. Biometrika. 1986;84:3–32.
4. Donner A, Klar N, Zou G. Methods for the statistical analysis of binary data in split-cluster designs. Biometrics. 2004;60:919–925. doi: 10.1111/j.0006-341X.2004.00247.x.
5. Donner A, Zou G. Methods for the statistical analysis of binary data in split-mouth designs with baseline measurements. Stat Med. 2007;26:3476–3486. doi: 10.1002/sim.2782.
6. Pandis N. Sample calculation for split-mouth designs. Am J Orthod Dentofac. 2012;141:818–819. doi: 10.1016/j.ajodo.2012.03.015.
7. Hujoel P, Loesche W. Efficiency of split-mouth designs. J Clin Periodontol. 1990;17:722–728. doi: 10.1111/j.1600-051x.1990.tb01060.x.
8. Liu G, Liang K. Sample size calculations for studies with correlated observations. Biometrics. 1997;53:937–947.
9. Jung S, Ahn C. Sample size estimation for GEE method for comparing slopes in repeated measurements data. Stat Med. 2003;22:1305–1315. doi: 10.1002/sim.1384.
10. Jung S, Ahn C. Sample size for a two-group comparison of repeated binary measurements using GEE. Stat Med. 2005;24:2583–2596. doi: 10.1002/sim.2136.
11. Zhang S, Ahn C. Sample size calculation for time-averaged differences in the presence of missing data. Contemp Clin Trials. 2012;33:550–556. doi: 10.1016/j.cct.2012.02.004.
12. Lou Y, Cao J, Zhang S, et al. Sample size calculations for time-averaged difference of longitudinal binary outcomes. Commun Stat A-Theor. doi: 10.1080/03610926.2014.991040. In press.
13. Bigby M, Godenne A. Continuing medical education: understanding and evaluating clinical trials. J Am Acad Dermatol. 1986;34:555–593. doi: 10.1016/s0190-9622(96)80053-3.
14. Weiss RA. Comparison of endovenous radiofrequency versus 810 nm diode laser occlusion of large veins in an animal model. Dermatol Surg. 2002;28:56–61. doi: 10.1046/j.1524-4725.2002.01191.x.
15. Teerenstra S, Lu B, Preisser JS, et al. Sample size considerations for GEE analyses of three-level cluster randomized trials. Biometrics. 2010;66:1230–1237. doi: 10.1111/j.1541-0420.2009.01374.x.
16. Donner A, Klar N. Design and analysis of cluster randomization trials in health research. London: Arnold; 2000.
17. Wang D, Bakhai A. Clinical Trials in Practice. A Practical Guide to Design, Analysis and Reporting, chapter 10. London: Remedica; 2006.
18. Obuchowski NA. On the comparison of correlated proportions for clustered data. Stat Med. 1998;17:1495–1507. doi: 10.1002/(sici)1097-0258(19980715)17:13<1495::aid-sim863>3.0.co;2-i.
