Sample Size Evaluation for a Multiply Matched Case-Control Study Using the Score Test From a Conditional Logistic (Discrete Cox PH) Regression Model

John M Lachin

doi:10.1002/sim.3057

Author manuscript; available in PMC: 2013 Apr 15.

Published in final edited form as: Stat Med. 2008 Jun 30;27(14):2509–2523. doi: 10.1002/sim.3057

PMCID: PMC3626499 NIHMSID: NIHMS80712 PMID: 17886235

Summary

The conditional logistic regression model (Breslow NE. Covariance adjustment of relative-risk estimates in matched studies. Biometrics, 1982; 38:661-672) provides a convenient method for the assessment of qualitative or quantitative covariate effects on risk in a study with matched sets, each containing possibly different numbers of cases and controls. The conditional logistic likelihood is identical to the stratified Cox proportional hazards model likelihood with an adjustment for ties (Cox DR. Regression models and life-tables (with discussion). J. Roy. Statist. Soc., B, 1972; 34:187-220). This likelihood also applies to a nested case-control study with multiply matched cases and controls selected from those at risk at selected event times. Herein the distribution of the score test for the effect of a covariate in the model is used to derive simple equations to describe the power of the test to detect a coefficient θ (log odds ratio or log hazard ratio), or the number of cases (or matched sets) and controls required to provide a desired level of power. Additional expressions are derived for a quantitative covariate as a function of the difference in the assumed mean covariate values among cases and controls, and for a qualitative covariate in terms of the difference in the probabilities of exposure for cases and controls. Examples are presented for a nested case-control study and a multiply matched case-control study.

Keywords: Sample size, power, conditional logistic model, Cox Proportional Hazards Model, multiple matching, case-control study, nested case-control study

1. INTRODUCTION

In a matched case-control study, one or more disease-free controls are matched with respect to one or more characteristics to each case with the disease or outcome of interest. The analysis can then be conducted using the conditional logistic regression model of Breslow [1] to assess the effects of binary and/or quantitative covariates on the risk (log odds) of the outcome. Herein the score test is derived that is applicable to either a binary or quantitative covariate, its distribution under the alternative hypothesis is described, and expressions for the computation of power or sample size are derived.

For a binary exposure variable and a single control (m = 1) matched to each case, a simple analysis of matched pairs can be employed consisting of the McNemar [2] test and the computation of a conditional odds ratio (cf. [3], Chapter 5). Miettinen [4] described methods for the analysis of matched sets with m > 1 matched controls per case. Walter [5] generalized Miettinen’s test to the design with variable numbers of controls matched to each case; i.e. where m_i ∈ (1, 2, 3,…) for the ith matched set. He also presented a Z or t-test for a quantitative covariate assumed to be normally distributed among the cases and controls with differing means and possibly variances. Miettinen [4] and Walter [5] also describe expressions for the power of their tests and sample size calculations. Ury [6], Taylor [7] and Lui [8] present expressions for the asymptotic relative efficiency under a local alternative of Miettinen’s test with m controls relative to McNemar’s test with a single control. These expressions allow approximate computation of power or sample size for Miettinen’s test based on a computation of power or sample size for McNemar’s test. Ejigou [9] also presents an evaluation of the sample size for a study with multiple controls based on an estimate of the conditional odds ratio and a corresponding estimation-based test statistic. Sinha and Mukerjee [10] derive the power of a score test based on a conditional likelihood as a function of the possible numbers exposed within each set, and allow for polychotomous and ordinal exposure variables. All of these methods only allow for a single case matched to multiple controls.

The conditional logistic regression model also provides a convenient method for the analysis of such data, and provides for further generalizations to studies with variable numbers of cases as well as controls within each matched set. This could apply to a retrospective case-control study with variable numbers of cases and controls in each matched set, as well as other studies with inherently matched sets, such as a family or sib-ship study of the association of a genotype with the prevalence of a phenotype among family members or siblings within a family. The non-central or non-null distribution of the score test then provides a direct method for the computation of the power or sample size for a study using a quantitative or qualitative covariate.

The cases and controls may also be selected from within a prospective cohort study. In the nested case-control study with no tied event times, a set of N cases is selected at random from among all those that occur during the study. Thus, each of the N cases can be characterized as having an event indicator δ_i = 1 and event time t_i. Then for each case, m subjects are selected as controls who are elements of the risk set at time t_i, and known not to have had the event up to that time. This constructs a set of (m+1) subjects matched at that time. Note that a control for one case could later become a case. The nested case-control design can also be applied to a cohort with tied event times yielding multiple cases at a given time with possibly variable numbers of controls matched to the times of the cases.

Liddell, McDonald, Thomas and Cunliffe [11] describe a matched case-control study embedded within a prospective cohort follow-up study. They conducted an analysis using the Miettinen methods. However, in a supplement to the main paper, Thomas [12] shows that the Cox Proportional Hazards (PH) model [13] could be applied to this analysis. Goldstein and Langholz [14] use the martingale theory for counting processes to prove that the resulting coefficient estimates are asymptotically normally distributed under nested case-control sampling.

The Cox likelihood with Cox’s adjustment for tied event times [13] is equivalent to the conditional logistic regression model of Breslow [1], the two yielding an equivalent likelihood. Thus, the conditional logistic regression model can be fit using a stratified Cox PH model with Cox’s adjustment for ties.

Schoenfeld [15] describes an equation for the total number of deaths (subjects with the outcome event) required to provide a desired level of power for a single binary covariate in a Cox PH model. Hsieh and Lavori [16] extend the Schoenfeld derivation to the case of a single quantitative covariate. Both are based on the expression for the score test of a covariate in the general PH model.

Herein I derive the form of the score test for a qualitative or quantitative covariate effect on the log odds in a conditional logistic regression model with variable numbers of cases and controls in each matched set. The test can be applied to a qualitative (binary) or quantitative covariate. The test can also be applied to the Cox PH model for a nested case-control study. For a binary covariate, the resulting test is a generalization of the tests of Miettinen [4] and Walter [5]. For a quantitative covariate, the score test is simpler and more general than that of Walter [5] that assumes an underlying normal distribution. I then derive the power function of the score test and describe sample size calculations for the design of a matched case-control study. In doing so I derive a useful expression for the log odds ratio in the regression model as a function of the difference in means of a quantitative covariate between cases and controls, or in probabilities of a qualitative covariate, that may further facilitate the evaluation of sample size for such studies.

2. MULTIPLY MATCHED SCORE TEST

2.1 A Quantitative Covariate

Consider a study with N matched sets. The ith matched set consists of n_i members of whom d_i ≤ D are cases and m_i ≤ M are controls, where D and M are the maximum numbers of cases and controls among all sets.

In a nested case-control study, these matched sets are constructed as follows. At the ith event time, assume that G_i ≥ 2 subjects may have the event at time t_i (or during the ith interval) among the risk set of R_i subjects at risk at that time. Then d_i subjects are randomly selected from among the G_i to be cases, and m_i are randomly selected from among the remaining R_i – G_i to be controls.

Consider a single covariate X, possibly quantitative, with values x_ij for the jth member of the ith matched set with the subjects ordered such that x_i1,…,x_{id_i} refers to the covariate values of the d_i cases, and x_{i(d_i+1)},…,x_{in_i} refers to the values for the m_i controls. The resulting conditional logistic regression model likelihood with matched sets containing d_i cases and m_i controls is

L = \prod_{i = 1}^{N} [\frac{\prod_{j = 1}^{d_{i}} e^{x_{ij} θ}}{Σ_{l = 1}^{(\begin{matrix} n_{i} \\ d_{i} \end{matrix})} \prod_{j (l) = 1}^{d_{i}} e^{x_{ij (l)} θ}}]

(1)

where θ represents the log odds ratio [1]. This is identical to the Cox [13] discrete proportional hazards model for tied event times that would apply to the nested case-control sampling design in which θ is the log hazard ratio. Let $s_{i} = Σ_{j = 1}^{d_{i}} x_{ij}$ for the ith set and $s_{i (l)} = Σ_{j (l) = 1}^{d_{i}} x_{ij (l)}$ for the lth of $(\begin{matrix} n_{i} \\ d_{i} \end{matrix})$ combinations of d_i subjects within that set. The resulting score equation is

U (θ) = Σ_{i = 1}^{N} [s_{i} - \frac{Σ_{l} s_{i (l)} e^{s_{i (l)} θ}}{Σ_{l} e^{s_{i (l)} θ}}] = Σ_{i = 1}^{N} [s_{i} - E (s_{i} ∣ H_{1})],

(2)

where ∑_l denotes $Σ_{l = 1}^{(\begin{matrix} n_{i} \\ d_{i} \end{matrix})}$ . The expected information matrix I(θ) then is

I (θ) = Σ_{i = 1}^{N} [(\frac{Σ_{l} s_{i (l)}^{2} e^{s_{i (l)} θ}}{Σ_{l} e^{s_{i (l)} θ}}) - {(\frac{Σ_{l} s_{i (l)} e^{s_{i (l)} θ}}{Σ_{l} e^{s_{i (l)} θ}})}^{2}] = Σ_{i = 1}^{N} V_{1 i} .

(3)

Evaluating each under the null hypothesis H₀: θ = 0 yields

U (θ_{0}) = Σ_{i = 1}^{N} [s_{i} - \frac{Σ_{l} s_{i (l)}}{(\begin{matrix} n_{i} \\ d_{i} \end{matrix})}] = Σ_{i = 1}^{N} [s_{i} - E (s_{i} ∣ H_{0})] = Σ_{i = 1}^{N} [s_{i} - {\overset{‒}{s}}_{0 i}],

(4)

and

I (θ_{0}) = Σ_{i = 1}^{N} [(\frac{Σ_{l} s_{i (l)}^{2}}{(\begin{matrix} n_{i} \\ d_{i} \end{matrix})}) - {(\frac{Σ_{l} s_{i (l)}}{(\begin{matrix} n_{i} \\ d_{i} \end{matrix})})}^{2}] = Σ_{i = 1}^{N} V_{0 i} .

(5)

Then the score test is provided by

Z = \frac{U (θ_{0})}{\sqrt[]{I (θ_{0})}}

(6)

that is distributed as standard normal under H₀. The quantities s_i and E(s_i|H₀) are readily computed for any quantitative or qualitative covariate. This yields a convenient test. Below the form of the test is shown for specific cases. Note that sets with no cases (d_i = 0) or no controls (m_i = 0) do not contribute to U (θ₀) or to I(θ₀) and can simply be discarded. This also applies to sets where all members share the same covariate value.
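
The score test in (4) to (6) can be computed directly by enumerating, within each matched set, every way of choosing d_i members as the "cases". The following is a minimal Python sketch under that reading; the function name `score_test` and the input layout (a list of pairs of case values and control values) are illustrative assumptions, not part of the original derivation.

```python
from itertools import combinations
from math import sqrt


def score_test(matched_sets):
    """Score statistic Z of equation (6).

    matched_sets: list of (case_values, control_values) pairs, one per
    matched set (hypothetical input layout chosen for this sketch).
    """
    U = 0.0  # score U(theta_0), equation (4)
    I = 0.0  # information I(theta_0), equation (5)
    for cases, controls in matched_sets:
        d = len(cases)
        if d == 0 or len(controls) == 0:
            continue  # sets with no cases or no controls contribute nothing
        pooled = list(cases) + list(controls)
        # s_{i(l)}: covariate sum for each of the C(n_i, d_i) choices of d_i members
        sums = [sum(subset) for subset in combinations(pooled, d)]
        mean = sum(sums) / len(sums)  # E(s_i | H0)
        U += sum(cases) - mean
        I += sum(s * s for s in sums) / len(sums) - mean ** 2  # V_0i
    return U / sqrt(I)
```

The enumeration grows as C(n_i, d_i), so this sketch is only practical for small or moderate sets. If every set has identical covariate values for all members, I(θ₀) is zero and the statistic is undefined.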

In the most general case of a study with variable numbers of cases and controls, let n_dm designate the number of matched sets with d cases, 1 ≤ d ≤ D, and m controls, 1 ≤ m ≤ M, and let x_dmij refer to the covariate value for the jth subject of the ith set with d cases and m controls. Then $s_{dmi} = Σ_{j = 1}^{d} x_{dmij}$ and $s_{dmi (l)} = Σ_{j (l) = 1}^{d} x_{dmij (l)}$ for the lth of $(\begin{matrix} d + m \\ d \end{matrix})$ combinations of d subjects within that set, and let

E [s_{dmi} ∣ H_{0}] = {\overset{‒}{s}}_{0 dmi} = \frac{Σ_{l} s_{dmi (l)}}{(\begin{matrix} d + m \\ d \end{matrix})} .

(7)

Then the score test can be expressed as

Z = \frac{Σ_{d = 1}^{D} Σ_{m = 1}^{M} Σ_{i = 1}^{n_{dm}} [s_{dmi} - {\overset{‒}{s}}_{0 dmi}]}{{(Σ_{d = 1}^{D} Σ_{m = 1}^{M} Σ_{i = 1}^{n_{dm}} [(\frac{Σ_{l} s_{dmi (l)}^{2}}{(\begin{matrix} d + m \\ d \end{matrix})}) - {(\frac{Σ_{l} s_{dmi (l)}}{(\begin{matrix} d + m \\ d \end{matrix})})}^{2}])}^{1 ∕ 2}}

(8)

For the special case of a design with D = 1 case and m controls in each matched set, so that n = m + 1, then s_i = x_i1 and

{\overset{‒}{s}}_{0 i} = \frac{Σ_{l} s_{i (l)}}{(\begin{matrix} n_{i} \\ d_{i} \end{matrix})} = {\overset{‒}{x}}_{0 i} = \frac{Σ_{j = 1}^{m + 1} x_{ij}}{m + 1}

(9)

so that

\begin{matrix} U (θ_{0}) = & Σ_{i = 1}^{N} [x_{i 1} - {\overset{‒}{x}}_{0 i}] = Σ_{i = 1}^{N} [x_{i 1} - \frac{Σ_{j = 1}^{m + 1} x_{ij}}{m + 1}], \\ I (θ_{0}) = & Σ_{i = 1}^{N} [\frac{Σ_{j = 1}^{m + 1} {(x_{ij} - {\overset{‒}{x}}_{0 i})}^{2}}{m + 1}] = Σ_{i = 1}^{N} V_{0 i} . \end{matrix}

(10)

This yields the score test

Z = \frac{Σ_{i = 1}^{N} [x_{i 1} - {\overset{‒}{x}}_{0 i}]}{\sqrt[]{Σ_{i = 1}^{N} V_{0 i}}} .

(11)

2.2 A Binary Covariate

Now consider a single binary covariate X that represents exposed versus not exposed to some risk factor, where x_ij = 1 if exposed (+) and 0 if not (−). Let n_i+ = #{x_ij = 1} and n_i− = #{x_ij = 0}, where n_i = n_i− + n_i+. As above, for the matched set of n_i subjects, the members are ordered such that the first d_i are the cases and the remaining m_i = n_i – d_i are the controls. Then $s_{i} = Σ_{j = 1}^{d_{i}} x_{ij}$ is the number of cases positive for the covariate in the ith set, and s_i(l) is the like sum for the d_i subjects in the lth of the $(\begin{matrix} n_{i} \\ d_{i} \end{matrix})$ combinations of d_i subjects in the ith set.

For the simple 1:m design with a single case matched to m controls with n = m + 1 subjects per set, from (10) it follows that the score equation under H₀ is

\begin{matrix} U (θ_{0}) = & Σ_{i = 1}^{N} [x_{i 1} - {\overset{‒}{x}}_{0 i}] = s - Σ_{i = 1}^{N} [\frac{n_{i +}}{n}] \\ = & s - Σ_{i = 1}^{N} p_{0 i} = s - e_{0} \end{matrix}

(12)

where s = Σ_i x_i1 is the total number of exposed cases, p_0i = n_i+/n is the probability of a positive covariate value for the case in the ith set under H₀, and e₀ is the sum of these probabilities over all sets, i.e. the expected number of exposed cases under H₀. The corresponding information function is

\begin{matrix} I (θ_{0}) = & Σ_{i = 1}^{N} [\frac{Σ_{j = 1}^{n} x_{ij}^{2}}{n} - {\overset{‒}{x}}_{0 i}^{2}] = Σ_{i = 1}^{N} [\frac{n_{i +}}{n} - {(\frac{n_{i +}}{n})}^{2}] \\ = & Σ_{i = 1}^{N} [p_{0 i} (1 - p_{0 i})] = Σ_{i = 1}^{N} V_{0 i} . \end{matrix}

(13)

Then this yields the score test

Z = \frac{U (θ_{0})}{\sqrt[]{I (θ_{0})}} = \frac{s - e_{0}}{\sqrt[]{Σ_{i = 1}^{N} V_{0 i}}} = \frac{s - e_{0}}{\sqrt[]{Σ_{i = 1}^{N} [p_{0 i} (1 - p_{0 i})]}} .

(14)

Miettinen’s [4] test of H₀ for the 1:m design is readily shown to be equivalent. Walter [5] presents a generalization of Miettinen’s test for the 1:m_i design with variable numbers of controls matched to each case, with only a single case per set. His test is also a special case of (8).
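
For the 1:m design with a binary covariate, equations (12) to (14) need only the exposure status of the case and the number exposed in each set. The sketch below illustrates this; the function name `binary_score_test_1m` and the input layout are assumptions made for the illustration.

```python
from math import sqrt


def binary_score_test_1m(sets):
    """Score statistic of equation (14) for one case matched to m controls.

    sets: list of (case_exposed, control_exposures) with 0/1 entries
    (hypothetical input layout chosen for this sketch).
    """
    s = 0      # observed number of exposed cases
    e0 = 0.0   # expected number of exposed cases under H0, equation (12)
    v0 = 0.0   # information under H0, equation (13)
    for case_exposed, control_exposures in sets:
        n = 1 + len(control_exposures)
        n_plus = case_exposed + sum(control_exposures)
        p0 = n_plus / n  # probability that the case is the exposed member
        s += case_exposed
        e0 += p0
        v0 += p0 * (1 - p0)
    return (s - e0) / sqrt(v0)
```

Sets in which all members are exposed, or none are, add zero to both the numerator and the variance, which matches the remark in Section 2.1 that such sets can be discarded.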

3. SAMPLE SIZE AND POWER

3.1 Distribution Under the Alternative

As employed by Schoenfeld [15], and also Hsieh and Lavori [16], under the alternative hypothesis H₁: θ ≠ 0, the score equation U(θ₀) in (4) can be expressed as

U (θ_{0}) = Σ_{i = 1}^{N} [s_{i} - E (s_{i} ∣ H_{1})] + Σ_{i = 1}^{N} [E (s_{i} ∣ H_{1}) - E (s_{i} ∣ H_{0})] .

(15)

The first term is simply the score function U(θ) in (2) and from basic principles [3], U(θ) ~ N[0, I(θ)]. The second term then is

Σ_{i = 1}^{N} [E (s_{i} ∣ H_{1}) - E (s_{i} ∣ H_{0})] = Σ_{i = 1}^{N} [\frac{Σ_{l} s_{i (l)} e^{s_{i (l)} θ}}{Σ_{l} e^{s_{i (l)} θ}} - \frac{Σ_{l} s_{i (l)}}{(\begin{matrix} n_{i} \\ d_{i} \end{matrix})}]

(16)

Applying a Taylor’s expansion for θ in the neighborhood of θ₀ = 0, it can then be shown that

Σ_{i = 1}^{N} [E (s_{i} ∣ H_{1}) - E (s_{i} ∣ H_{0})] ≅ θ I (θ_{0}) = θ Σ_{i = 1}^{N} V_{0 i}

(17)

Combining the two results, it follows that

U (θ_{0}) ∣ H_{1} \sim N [θ I (θ_{0}), I (θ)]

(18)

Thus

Z ∣ H_{1} \sim N [θ \sqrt[]{I (θ_{0})}, (\frac{I (θ)}{I (θ_{0})})] .

(19)

For a specific true value θ under the alternative hypothesis, the basic equation for the power of the test [3] yields

\begin{matrix} ∣ θ ∣ \sqrt[]{I (θ_{0})} = & Z_{1 - α} + Z_{1 - β} \sqrt[]{I (θ) ∕ I (θ_{0})} \\ Z_{1 - β} = & \frac{∣ θ ∣ \sqrt[]{I (θ_{0})} - Z_{1 - α}}{\sqrt[]{I (θ) ∕ I (θ_{0})}} \end{matrix}

(20)

where Z_1–α is the critical value for the test, using Z_1–α/2 for a two-sided test, and the power of the test is provided by Φ(Z_1–β), Φ(z) being the standard normal cdf at z. Approximately I(θ) ≅ I(θ₀), as under a local alternative, in which case the expression for power simplifies to

Z_{1 - β} = ∣ θ ∣ \sqrt[]{I (θ_{0})} - Z_{1 - α} .

(21)
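
The power expressions (20) and (21) translate into a few lines of code. The following Python sketch uses the standard-library `statistics.NormalDist` for the normal quantile and cdf; the function name `power_score_test` and its arguments (including the optional `info1` standing for I(θ)) are illustrative choices rather than notation from the derivation.

```python
from math import sqrt
from statistics import NormalDist


def power_score_test(theta, info0, alpha=0.05, two_sided=True, info1=None):
    """Power of the score test.

    Uses equation (21) when info1 is None, i.e. I(theta) is taken equal
    to I(theta_0); otherwise uses the second line of equation (20).
    """
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    sd_ratio = 1.0 if info1 is None else sqrt(info1 / info0)
    z_beta = (abs(theta) * sqrt(info0) - z_alpha) / sd_ratio
    return norm.cdf(z_beta)
```

With θ = 0 the function returns α/2 for a two-sided test, because the expression tracks only the rejection tail in the direction of the alternative.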

3.2 A Quantitative Covariate

3.2.1 As a Function of θ

Assume that X is a quantitative covariate with variance $σ_{i}^{2}$ within the ith matched set with d_i cases and m_i controls, n_i = d_i + m_i, so that within that set $V (s_{i}) = V (s_{i (l)}) = d_{i} σ_{i}^{2}$ . Noting that V_0i in (5) is similar to the expression for V (s_i) but without the usual “−1” correction in the denominator, it follows that

V_{0 i} = [\frac{(\begin{matrix} n_{i} \\ d_{i} \end{matrix}) - 1}{(\begin{matrix} n_{i} \\ d_{i} \end{matrix})}] V (s_{i}) = d_{i} σ_{i}^{2} [\frac{(\begin{matrix} n_{i} \\ d_{i} \end{matrix}) - 1}{(\begin{matrix} n_{i} \\ d_{i} \end{matrix})}] = d_{i} σ_{i}^{2} [\frac{n_{i}! - d_{i}! m_{i}!}{n_{i}!}]

(22)

Under some model that provides a set of values ${n_{i}, d_{i}, σ_{i}^{2}}$ for all N matched sets, this then provides the value of the variance term for each set, and thus the value of $I (θ_{0}) = Σ_{i = 1}^{N} V_{0 i}$ . Under the model that generated the ${n_{i}, d_{i}, σ_{i}^{2}}$ , the power of the test in (6) is provided by the expression for Z_1–β in (21). Under an appropriate model that provides an arbitrary number of such sets, such as sampling from a distribution for (n, d, σ²), it would also be possible to solve for the number of such sets N that would provide a desired level of power.

When there is a fixed number of cases and controls for all matched sets with a common variance among sets, i.e. n_i = n, d_i = d, and $σ_{i}^{2} = σ^{2}$ for all i = 1,…,N, then

I (θ_{0}) = Nd σ^{2} [\frac{(\begin{matrix} n \\ d \end{matrix}) - 1}{(\begin{matrix} n \\ d \end{matrix})}] = Nd σ^{2} [\frac{n! - d! m!}{n!}] .

(23)

This result also applies to the heteroscedastic case for suitably large N such that $Σ_{i} σ_{i}^{2} \to N σ^{2}$ where $E (σ_{i}^{2}) = σ^{2}$ . Substituting into (21), the power of the test is provided by

Z_{1 - β} = ∣ θ ∣ σ \sqrt[]{Nd} {[\frac{n! - d! m!}{n!}]}^{1 ∕ 2} - Z_{1 - α} .

(24)

Thus, for a given n and d, the number of matched sets (N) required to provide a desired level of power 1 – β using a test at level α (or α/2 two-sided) is provided by

N = \frac{{(Z_{1 - α} + Z_{1 - β})}^{2}}{θ^{2} σ^{2} d [\frac{n! - d! m!}{n!}]}

(25)

When there is a single case (d = 1) and m controls for each of the N sets, where n = m + 1, then I(θ₀) = Nmσ²/(m + 1). The power of the test is then provided by

Z_{1 - β} = ∣ θ ∣ σ \sqrt[]{\frac{Nm}{m + 1}} - Z_{1 - α}

(26)

and the number of cases required to provide the desired level of power to detect a coefficient θ is

N = {[\frac{Z_{1 - α} + Z_{1 - β}}{θ σ}]}^{2} (\frac{m + 1}{m}) = K (\frac{m + 1}{m}) .

(27)

Note that N is a decreasing function of m. For m = 1, N = 2K. For m = 2, N = 1.5K. For m = 3, N = (4/3)K, etc.
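
As a sketch of how (25) and (27) might be evaluated, the following Python function returns the number of matched sets for fixed d and m. The name `sets_required` and the default power and significance level are assumptions for illustration; the result is left unrounded so that the dependence on m is visible.

```python
from math import comb
from statistics import NormalDist


def sets_required(theta, sigma, d, m, power=0.90, alpha=0.05, two_sided=True):
    """Number of matched sets N from equation (25), d cases and m controls per set."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    z_beta = norm.inv_cdf(power)
    n_comb = comb(d + m, d)
    # d * [(n! - d! m!) / n!] written as d * (C(n, d) - 1) / C(n, d)
    factor = d * (n_comb - 1) / n_comb
    return (z_alpha + z_beta) ** 2 / (theta ** 2 * sigma ** 2 * factor)
```

For d = 1 the factor reduces to m/(m + 1), so the function reproduces the multipliers 2K, 1.5K and (4/3)K for m = 1, 2, 3. In practice the result would be rounded up to the next integer.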

In the above expressions, θ = ln(O) is the log odds ratio (hazard ratio in the Cox PH model) per unit change in X. In some cases it may be more convenient to specify the odds ratio per c unit change in X, say O_c = exp(cθ), from which O is obtained as $O = O_{c}^{(1 ∕ c)}$ . The effect size may also be described in terms of the odds ratio per standard deviation unit, say O_σ, assuming a common variance within sets. In this case the term θσ is replaced by ln(O_σ) = ln(O^σ) in the above expressions. The units change could also be specified in terms of a fraction of the within-set standard deviation. For example, assume that σ = 10 and it is desired to detect an odds ratio of 2 per c = σ/2 = 5 unit difference in X. The inverse implies that the odds ratio per unit change in X is O = 2^(1/5) = 1.1487 and θ = ln(O) = 0.139.
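
The conversions between odds-ratio scales in the preceding paragraph can be checked numerically. This short Python sketch repeats the example with σ = 10 and an odds ratio of 2 per c = 5 units; the variable names are illustrative.

```python
from math import log

sigma = 10.0           # within-set standard deviation
c = sigma / 2          # unit change for which the odds ratio is specified
odds_ratio_per_c = 2.0

odds_ratio_per_unit = odds_ratio_per_c ** (1 / c)  # O = O_c^(1/c)
theta = log(odds_ratio_per_unit)                   # log odds ratio per unit of X
odds_ratio_per_sd = odds_ratio_per_unit ** sigma   # O_sigma = O^sigma

print(round(odds_ratio_per_unit, 4), round(theta, 3), round(odds_ratio_per_sd, 2))
```

Here the odds ratio per standard deviation is 2² = 4, so θσ = ln(4) is the quantity that enters the power and sample size expressions.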

The above expressions for Z_1–β and N are similar to those of Hsieh and Lavori [16] for a quantitative covariate in the Cox PH model. At first glance these equations are counterintuitive, implying that as the variance of the covariate increases, the required number of subjects with the event decreases, and that for any number of events, power increases. The reason, however, is that the higher the σ, the greater the variation in risk over the range of the covariate. For a given O and σ, the odds ratio per standard deviation change in the covariate is O^σ, and the odds ratio comparing the extremes of the distribution of X (±3σ) is O^(6σ). Thus for a given O, the range of risk increases as σ increases, and so also does power.

3.2.2 As a Function of the Mean Difference

It may be difficult for investigators to specify the minimal treatment effect of interest for a quantitative covariate in terms of the odds ratio or relative hazard per unit change in X or per standard deviation. In this situation, it might be more relevant to describe the power of a study in terms of the distributions of the covariate among the cases and controls within sets. Assume that the quantitative covariate is distributed as $X \sim f_{1} (μ_{i 1}, σ_{i}^{2})$ among the cases and $X \sim f_{2} (μ_{i 2}, σ_{i}^{2})$ among the controls within the ith set, for some distributions f₁ and f₂ with average variance $E (σ_{i}^{2}) = σ^{2}$ and average mean difference E(μ_i1 – μ_i2) = (μ₁ – μ₂) = Δ ≠ 0 under H₁ over all sets.

First consider the design with d cases and m controls for all of the N sets. Let E[U_i(θ₀)|H₁] denote the expectation of the contribution of the ith set to the score equation U(θ₀) in (4) with respect to the distribution of the covariate values under H₁. Then for N sets (N suitably large) E[U(θ₀)|H₁] = NE {E[U_i(θ₀)|H₁]} that is a function of (μ₁, μ₂). This yields

\begin{matrix} E [U (θ_{0}) ∣ H_{1}] = & NE {E [s_{i} ∣ H_{1}]} - NE {E [(s_{i} ∣ H_{0}) ∣ H_{1}]} \\ = & Nd μ_{1} - N {(\begin{matrix} n \\ d \end{matrix})}^{- 1} [d μ_{1} + (\begin{matrix} d \\ 1 \end{matrix}) (\begin{matrix} m \\ 1 \end{matrix}) [(d - 1) μ_{1} + μ_{2}] \\ & + (\begin{matrix} d \\ 2 \end{matrix}) (\begin{matrix} m \\ 2 \end{matrix}) [(d - 2) μ_{1} + 2 μ_{2}] + \dots + (\begin{matrix} m \\ d \end{matrix}) d μ_{2}] . \end{matrix}

(28)

Simultaneously, from (18) above, where $Σ_{i} σ_{i}^{2} \to N σ^{2}$ , the expected value can be expressed as a function of θ to yield

E [U (θ_{0}) ∣ H_{1}] = θ I (θ_{0}) = θ Nd σ^{2} [\frac{(\begin{matrix} n \\ d \end{matrix}) - 1}{(\begin{matrix} n \\ d \end{matrix})}] .

(29)

Equating (28) to (29) one can then solve for θ as a function of μ₁, μ₂, σ², d, and m. For example, for d = 2 and m = 3 the resulting equations yield

\begin{matrix} N [2 μ_{1} - \frac{2 μ_{1} + 6 [μ_{1} + μ_{2}] + (3) (2) μ_{2}}{10}] = & θ N 2 σ^{2} [\frac{(\begin{matrix} 5 \\ 2 \end{matrix}) - 1}{(\begin{matrix} 5 \\ 2 \end{matrix})}] \\ [\frac{12 (μ_{1} - μ_{2})}{18 σ^{2}}] = & θ \end{matrix}

(30)

so that the model parameter θ is a function of the mean difference divided by the variance times a constant depending on d and m.
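
The constant that links θ to the mean difference can be obtained for any (d, m) by enumerating the subsets of d members, as in the d = 2, m = 3 example. The sketch below does this with exact fractions; the function name `theta_constant` is a hypothetical label, and the enumeration is one possible way to evaluate the two sides of the equation, not necessarily how the values in the text were obtained.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def theta_constant(d, m):
    """Constant C with theta = C * (mu1 - mu2) / sigma^2, from equating (28) and (29)."""
    n = d + m
    labels = [1] * d + [0] * m  # 1 marks a true case, 0 a control
    n_subsets = comb(n, d)
    # Average number of true cases in a subset of d members; this is the
    # coefficient of mu1 in E[(s_i | H0) | H1], and d minus it multiplies mu2.
    mean_cases = Fraction(sum(sum(sub) for sub in combinations(labels, d)), n_subsets)
    lhs = d - mean_cases                            # E[U_i] / (mu1 - mu2)
    rhs = Fraction(d * (n_subsets - 1), n_subsets)  # I_i(theta_0) / sigma^2
    return lhs / rhs
```

For d = 2 and m = 3 this returns 2/3, which equals the ratio 12/18 in (30), and for d = 1 it returns 1 for any m.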

The power of the study is then provided by substituting the above value of θ, and the value of I(θ₀) in (23) into the expression for Z_1–β in (24). The number of such sets required to provide the desired level of power is then provided by (25).

The above developments could also be applied to the case where there are (d_i, m_i) cases and controls within the ith set in which case E[U(θ₀)|H₁] = NE {E[U_i(θ₀)|H₁]} and I(θ₀) = NE(V_0i), each with respect to the distribution of (d_i, m_i). Substituting into (29) one could solve for θ as a function of μ₁, μ₂, σ².

For the special case where d = 1 and fixed m and n = m + 1, the above reduces to

\begin{matrix} E [U (θ_{0}) ∣ H_{1}] = & NE {E [(x_{i 1} - {\overset{‒}{x}}_{0 i}) ∣ H_{1}]} \\ = & NE [E (x_{i 1}) - \frac{Σ_{j = 1}^{m + 1} E (x_{ij})}{m + 1}] = θ N (\frac{m σ^{2}}{m + 1}) \end{matrix}

(31)

which implies that

\begin{matrix} N μ_{1} - N [\frac{μ_{1} + m μ_{2}}{m + 1}] = & N θ [\frac{m σ^{2}}{m + 1}] \\ \frac{μ_{1} - μ_{2}}{σ^{2}} = & θ \end{matrix}

(32)

for any m. Substituting this expression for θ in (24), the power of a study with N cases each matched to m controls is provided by

Z_{1 - β} = \sqrt[]{\frac{Nm}{m + 1}} [\frac{μ_{1} - μ_{2}}{σ}] - Z_{1 - α} .

(33)

Likewise, from (27), the number of matched sets required to provide a given level of power to detect a difference Δ = (μ₁ – μ₂) is provided by

N = {[\frac{(Z_{1 - α} + Z_{1 - β}) σ}{Δ}]}^{2} (\frac{m + 1}{m})

(34)

Compared to the expressions for power in (26) and number of sets in (27) in terms of θ, we now see that power increases (sample size decreases) as the variance decreases for a given mean difference between cases and controls.
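
Equations (33) and (34) can be sketched in Python as a pair of functions that invert one another. The function names and the default two-sided α = 0.05 are assumptions for the illustration.

```python
from math import sqrt
from statistics import NormalDist

_norm = NormalDist()


def power_mean_difference(n_sets, m, delta, sigma, alpha=0.05):
    """Equation (33): power with n_sets cases, each matched to m controls (two-sided alpha)."""
    z_alpha = _norm.inv_cdf(1 - alpha / 2)
    return _norm.cdf(sqrt(n_sets * m / (m + 1)) * abs(delta) / sigma - z_alpha)


def sets_for_mean_difference(delta, sigma, m, power=0.90, alpha=0.05):
    """Equation (34): matched sets needed to detect a mean difference delta (two-sided alpha)."""
    z_alpha = _norm.inv_cdf(1 - alpha / 2)
    z_beta = _norm.inv_cdf(power)
    return ((z_alpha + z_beta) * sigma / delta) ** 2 * (m + 1) / m
```

Only the standardized difference Δ/σ enters, so halving σ for a fixed Δ reduces the required number of sets by a factor of four.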

For the design with variable numbers of controls matched to each case, with a single case per set, Walter [5] presents a test for the difference in the means of a quantitative covariate, assumed normally distributed among cases and controls with different means and variances. He also presents an expression for the power of his test and the computation of sample size. Both his test and his equations for sample size and power are computationally different from the score test (8) and the expressions above. The expressions herein are distribution-free, only requiring specification of the means and variance of the covariate, and also allow for any number of cases and controls within each matched set.

3.3 For a Binary Covariate

In the most general case with variable numbers of cases (d_i) and controls (m_i) within each of N matched sets, under the alternative hypothesis H₁: θ ≠ 0, the expected value of the score statistic from (18) is E[U(θ₀)|H₁] = θI(θ₀) where the information function has elements with V_0i as in (5). Since s_i is a sum of d_i Bernoulli variables, then $V (s_{i}) = d_{i} σ_{0 i}^{2}$ where $σ_{0 i}^{2} = π_{0 i} (1 - π_{0 i})$ and π_0i = E(p_0i) is the probability of a positive covariate value under H₀ in the ith set. For a collection of N sets with d_i cases, m_i controls and null probability π_0i (i = 1,…,N), the power of the study to detect a coefficient θ is provided by (21).

For the case where there are d cases and m controls for all N matched sets, then it follows that

I (θ_{0}) = d [\frac{(\begin{matrix} n \\ d \end{matrix}) - 1}{(\begin{matrix} n \\ d \end{matrix})}] Σ_{i = 1}^{N} σ_{0 i}^{2} = Nd σ_{0}^{2} [\frac{n! - d! m!}{n!}] .

(35)

where $σ_{0}^{2}$ is the average Bernoulli variance over all sets under H₀. Alternately, under H₀ it can be assumed that there is a common probability of exposure E(X) = π₀ among the cases and controls in all sets with the common Bernoulli variance $σ_{0}^{2} = π_{0} (1 - π_{0})$ . Then the power of the study is provided by (24) and the sample size by (25).

Power and sample size can also be expressed in terms of the differences in the probabilities under the alternative hypothesis by again applying the result in (15). Under H₁ assume common (or average) probabilities of exposure E(x_i1) = π₁ among the cases in each set and E(x_ij) = π₂ for the controls. The resulting expression for E[U(θ₀)|H₁] is as shown in (28) with (π₁, π₂) substituted for (μ₁, μ₂), and also in (29) using the null Bernoulli variance $σ_{0}^{2}$ . It follows that

θ = C [\frac{π_{1} - π_{2}}{σ_{0}^{2}}]

(36)

where C is a function of d and m as shown in the example above for a quantitative covariate (30), where C = 1 when d = 1 for any value m.

For the design with m controls matched to each case, the resulting expressions for power and sample size as a function of θ are the same as presented in (26) and (27), and as a function of the difference in probabilities positive for the covariate among cases and controls under the alternative hypothesis, Δ = π₁ – π₂, in (33) and (34), each using the Bernoulli variance $σ_{0}^{2}$ .

Miettinen [4] presents an expression for the power of his test for a binary covariate in a 1:m design with m controls matched to a single case in each set. Walter [5] presents a generalization of Miettinen’s results to a 1:m_i design with variable numbers of controls matched to a single case within each set, that includes Miettinen’s original derivation as a special case. His expression for the power of his test is obtained from

Z_{1 - β (W)} = \frac{∣ Δ ∣ Σ_{i = 1}^{N} m_{i}}{{[(ψ^{*} ∕ 2) Σ_{i = 1}^{N} m_{i} (m_{i} + 1)]}^{1 ∕ 2}} - Z_{1 - α}

(37)

where ψ* is the probability that a random pair of controls is discordant, which is an approximation to the probability (ψ) that a case and one of its controls are discordant under H₀. In contrast, from the derivations herein, the power for a study with a single case and variable numbers of controls in each set, with a common probability positive over all sets, is

Z_{1 - β} = ∣ θ ∣ {(π_{0} (1 - π_{0}) Σ_{i = 1}^{N} [\frac{n_{i} - 1}{n_{i}}])}^{1 ∕ 2} - Z_{1 - α}

(38)

where (n_i! – d_i!m_i!)/n_i! reduces to (n_i – 1)/n_i for d_i = 1 and m_i = n_i – 1. Whereas Miettinen [4] and Walter [5] describe the power of the test as a function of a common difference in probabilities positive for cases and controls among all matched sets, the methods herein assume a common odds ratio for all matched sets in accordance with the underlying assumptions of the conditional logistic regression model on which the test is based.

4. SOME EXAMPLES

4.1 A Quantitative Covariate

The study of the Epidemiology of Diabetes Interventions and its Complications (EDIC) is a long term follow-up study of the cohort of 1441 subjects originally enrolled in the Diabetes Control and Complications Trial. The DCCT/EDIC [17] showed that lower levels of glucose markedly reduced the risk of cardiovascular disease (CVD) in this cohort. The study is now planning a nested case-control study to measure “biomarkers” of oxidative stress, one of the possible mechanisms through which prolonged hyperglycemia could lead to CVD. Recently Blankenberg, et al., [18] reported that the biomarker soluble intercellular adhesion molecule had a hazard ratio of 1.46 per SD unit difference for risk of CVD. It is anticipated that 125 subjects will have experienced CVD in the DCCT/EDIC, all of whom will be employed as cases. With m = 2 controls per case, the 250 time-matched controls will provide 85% power to detect an odds ratio of 1.39 per SD difference using a score test at the 0.05 level, two-sided. In the matched case-control study the odds ratio is approximately equal to the hazard ratio. This odds ratio was obtained by solving (27) for θ with σ = 1. Using (26), this study would provide power of 93% to detect an odds ratio of 1.46 per SD.
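
The two figures quoted for the EDIC design can be recomputed from (26) and (27) with σ = 1. The Python sketch below assumes a two-sided test at the 0.05 level, as stated above; variable names are illustrative.

```python
from math import exp, log, sqrt
from statistics import NormalDist

norm = NormalDist()
n_cases, m = 125, 2
z_alpha = norm.inv_cdf(0.975)   # two-sided test at the 0.05 level
z_beta = norm.inv_cdf(0.85)     # 85% power

root_info = sqrt(n_cases * m / (m + 1))   # sqrt(I(theta_0)) with sigma = 1

# Solve (27) for theta: the detectable log odds ratio per SD
theta_detectable = (z_alpha + z_beta) / root_info
odds_ratio_detectable = exp(theta_detectable)

# Equation (26): power to detect an odds ratio of 1.46 per SD
power_146 = norm.cdf(log(1.46) * root_info - z_alpha)

print(round(odds_ratio_detectable, 2), round(power_146, 2))
```

These values round to the odds ratio of 1.39 and the power of 93% given in the text.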

One of the biomarkers of primary interest is the 8-isoprostane/creatinine ratio with a standard deviation of σ = 8.41 ng/mg from preliminary data. It follows that an odds ratio of 1.39 per SD corresponds to an odds ratio of (1.39)^(1/8.41) = 1.04 per ng/mg difference in the biomarker, with corresponding θ = 0.038. From (32), this in turn implies that the study would have 85% power to detect a mean difference between cases and controls of μ₁ – μ₂ = θσ² = 2.69 ng/mg.

Hosmer and Lemeshow [19] present data from a multiply-matched case control study of factors associated with the risk of a pregnancy culminating in an infant with low birth weight (cf. [3], p. 301 et seq.). The study consisted of 17 matched sets with the following number of sets (w) with d cases and m controls:

$$
\begin{array}{c|ccccccccccccc}
d & 1 & 1 & 1 & 1 & 2 & 2 & 2 & 3 & 4 & 5 & 5 & 6 & 8 \\
m & 4 & 5 & 6 & 8 & 1 & 7 & 11 & 13 & 4 & 7 & 8 & 9 & 10 \\
w & 1 & 1 & 3 & 1 & 1 & 1 & 1 & 1 & 1 & 2 & 2 & 1 & 1
\end{array}
\tag{39}
$$

One of the risk factors to be assessed was maternal body weight with a standard deviation of σ = 32 among women with normal birth weight infants (the controls). Given the above distribution of cases and controls, based on (22), I(θ₀) is provided by

$$
I(\theta_0) = \sum_{i=1}^{N} V_{0i} = \sigma^2 \sum_{i=1}^{N} d_i \left[ \frac{\binom{n_i}{d_i} - 1}{\binom{n_i}{d_i}} \right] = (32)^2 (51.2615) = 52491.8
\tag{40}
$$

and the power of the study to detect a log odds ratio (θ) per lb maternal weight difference is provided by (21). Examination over a range of values for θ shows that the study provided 90% power to detect an odds ratio of 0.986 per unit change in weight or θ = ln(0.986) = −0.0141.
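As a check on the arithmetic, the following Python sketch recomputes the combinatorial sum in (40) from the set counts in (39) and then inverts the power function for 90% power. It assumes that n_i = d_i + m_i is the size of set i and that power is approximately Φ(|θ|√I(θ₀) − Z_{1−α/2}), which is consistent with the figures reported here; the function names are illustrative only.

```python
from math import comb, exp, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

# (d cases, m controls, w sets) transcribed from (39)
SET_COUNTS = [
    (1, 4, 1), (1, 5, 1), (1, 6, 3), (1, 8, 1), (2, 1, 1), (2, 7, 1),
    (2, 11, 1), (3, 13, 1), (4, 4, 1), (5, 7, 2), (5, 8, 2), (6, 9, 1),
    (8, 10, 1),
]


def combinatorial_sum(set_counts):
    """Sum over sets of d * (C(n, d) - 1) / C(n, d), assuming n = d + m."""
    total = 0.0
    for d, m, w in set_counts:
        n_choose_d = comb(d + m, d)
        total += w * d * (n_choose_d - 1) / n_choose_d
    return total


def score_test_power(theta, info, alpha=0.05):
    """Approximate two-sided power; the opposite rejection tail is ignored."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(abs(theta) * sqrt(info) - z_alpha)


sigma = 32.0                      # SD of maternal weight among controls
comb_sum = combinatorial_sum(SET_COUNTS)
info = sigma ** 2 * comb_sum      # I(theta0) as in (40)

# Closed-form inversion of the assumed power function for 90% power
z_total = STD_NORMAL.inv_cdf(0.975) + STD_NORMAL.inv_cdf(0.90)
detectable_or = exp(-z_total / sqrt(info))   # protective direction, OR < 1
power_at_0986 = score_test_power(log(0.986), info)

print(f"combinatorial sum = {comb_sum:.4f}, I(theta0) = {info:.1f}")
print(f"odds ratio detectable with 90% power = {detectable_or:.3f}")
print(f"power at odds ratio 0.986 = {power_at_0986:.3f}")
```

The closed-form inversion replaces the search over a range of θ described in the text; both should lead to the same odds ratio to three decimal places.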

4.2 A Qualitative Covariate

Another variable assessed by Hosmer and Lemeshow [19] is a maternal history of urinary tract infection (UTI). Approximately 10% of women with normal births have a history of UTI (π₂ = 0.1). Assume that it was desired to detect a doubling of this probability among the cases who give birth to low birth-weight infants (π₁ = 0.2), such that π₀ = 0.15. Thus $σ_{0}^{2} = (0.15)(0.85) = 0.1275$. Given the mixture of (d_i, m_i) presented in (39) and the sum of the combinatorial terms in (40), I(θ₀) = (0.1275)(51.2615) = 6.536. Examination over a range of values for θ shows that the study provided 90% power to detect an odds ratio of 3.55 for those with a history of UTI versus not, or θ = ln(3.55) = 1.27.
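The same calculation for the binary covariate can be sketched in Python. The sketch takes the combinatorial sum 51.2615 from (40) as given and again assumes power ≈ Φ(|θ|√I(θ₀) − Z_{1−α/2}); the helper names are hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
COMBINATORIAL_SUM = 51.2615  # sum of combinatorial terms quoted in (40)


def binary_information(pi0, comb_sum):
    """I(theta0) for a binary covariate with null variance pi0 * (1 - pi0)."""
    return pi0 * (1 - pi0) * comb_sum


def detectable_odds_ratio(info, power=0.90, alpha=0.05):
    """Odds ratio (> 1) detectable under the assumed normal approximation."""
    z_total = STD_NORMAL.inv_cdf(1 - alpha / 2) + STD_NORMAL.inv_cdf(power)
    return exp(z_total / sqrt(info))


info = binary_information(pi0=0.15, comb_sum=COMBINATORIAL_SUM)
odds_ratio_90 = detectable_odds_ratio(info)

print(f"I(theta0) = {info:.3f}")
print(f"odds ratio detectable with 90% power = {odds_ratio_90:.2f}")
```

Note that the detectable odds ratio of about 3.55 is well above the odds ratio of 2.25 implied by π₁ = 0.2 versus π₂ = 0.1, so the 17 sets would provide less than 90% power for the doubling described above.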

Walter [5] presents an example of a matched case-control study with 104 matched sets, each with a single case, 27 containing a single control and 77 containing two controls. Using the sample estimate of a discordant control pair from the sets with two controls, it is estimated that ${\hat{ψ}}^{*} = 0.385$. For a design with 104 matched sets with $Σ_{i} m_{i} = 181$ and $Σ_{i} m_{i}^{2} = 335$, Walter’s (37) provides an estimate of power = 0.57 to detect a difference of Δ = 0.1 in the probability of a positive covariate value among cases versus controls.

To examine the agreement with the more general expressions herein, power was calculated using (38) and Walter’s (37) over a range of values 0.15 ≤ π₁ ≤ 0.55 with corresponding π₂ = π₁ − 0.1 and π₀ = π₁ − 0.05. For all of these cases, since Δ = 0.1, Walter’s equation yields power of 0.57. The following are the corresponding odds ratios for each value of π₀, and the corresponding level of power:

$$
\begin{array}{l|ccccccccc}
\pi_0 & 0.10 & 0.15 & 0.20 & 0.25 & 0.30 & 0.35 & 0.40 & 0.45 & 0.50 \\
\text{OR} & 3.353 & 2.250 & 1.889 & 1.714 & 1.615 & 1.556 & 1.519 & 1.500 & 1.494 \\
\text{Power} & 0.899 & 0.754 & 0.657 & 0.593 & 0.550 & 0.521 & 0.502 & 0.492 & 0.488
\end{array}
\tag{41}
$$
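The odds-ratio row of (41) follows directly from π₁ = π₀ + 0.05 and π₂ = π₀ − 0.05, as the Python sketch below illustrates. The power row depends on (38), which is not reproduced here, so the sketch deliberately stops at the odds ratios; the variable names are illustrative.

```python
def odds_ratio(pi1, pi2):
    """Odds ratio for exposure probabilities pi1 (cases) and pi2 (controls)."""
    return (pi1 / (1 - pi1)) / (pi2 / (1 - pi2))


DELTA = 0.1  # fixed difference pi1 - pi2

# pi0 = 0.10, 0.15, ..., 0.50, the midpoint of pi1 and pi2
pi0_values = [round(0.10 + 0.05 * k, 2) for k in range(9)]
or_row = [odds_ratio(pi0 + DELTA / 2, pi0 - DELTA / 2) for pi0 in pi0_values]

for pi0, value in zip(pi0_values, or_row):
    print(f"pi0 = {pi0:.2f}  OR = {value:.3f}")
```

The loop makes the pattern in the table explicit: for a fixed Δ the odds ratio is smallest near π₀ = 0.5 and grows as π₀ moves toward 0 or 1.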

As π₀ decreases below 0.25 (or increases above 0.75), the odds ratio for a fixed Δ = 0.1 increases sharply, and so does power. Because Walter’s equation yields the same power for every π₀ at a fixed Δ, it does poorly in this case. Similar results are obtained for other values of Δ, with Walter’s method underestimating power for π₀ < 0.25 (or π₀ > 0.75) and increasingly overestimating power as π₀ approaches 0.5.

5. DISCUSSION

This paper describes the power of the score test in the conditional logistic regression model, or equivalently the Cox proportional hazards model with the Cox adjustment for ties and stratification by matched set. The expressions can be applied to a study with any numbers of cases and controls, or to the case where sets are sampled from a distribution of numbers of cases and controls within sets. For example, suppose that the distribution of (d_i, m_i) for the Hosmer-Lemeshow study presented in (39) represents the population of possible sets of cases and controls. From (40), the mean contribution among sets to the combinatorial sum in I(θ₀) is 51.2615/17 = 3.0154. Then the above expressions could be generalized to provide the power of a study with N sets sampled at random from this distribution. For example, for the test of the effect of a history of urinary tract infection, the value of I(θ₀) with N = 25 such sets would be I(θ₀) = 25 × (0.1275)(3.0154) = 9.6115, which would provide 90% power to detect an odds ratio of 2.845. This strategy could be employed when the distribution of numbers of cases and controls is unknown but the mean numbers, and approximately the mean factor, can be specified.
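The sampled-sets calculation can be sketched in Python as follows. The sketch assumes I(θ₀) = N·σ₀²·(mean combinatorial term) and the approximate power function Φ(|θ|√I(θ₀) − Z_{1−α/2}); the function `sets_required` is a hypothetical helper that inverts the same relation for N and is not taken from the paper.

```python
from math import exp, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

MEAN_TERM = 51.2615 / 17      # mean combinatorial term per set, from (40)
SIGMA0_SQ = 0.15 * 0.85       # null variance of the binary covariate


def z_total(power=0.90, alpha=0.05):
    return STD_NORMAL.inv_cdf(1 - alpha / 2) + STD_NORMAL.inv_cdf(power)


def detectable_odds_ratio(n_sets, power=0.90, alpha=0.05):
    """Odds ratio detectable with n_sets sets drawn from the assumed mix."""
    info = n_sets * SIGMA0_SQ * MEAN_TERM
    return exp(z_total(power, alpha) / sqrt(info))


def sets_required(odds_ratio, power=0.90, alpha=0.05):
    """Number of sets (not rounded) needed to detect the given odds ratio."""
    theta = log(odds_ratio)
    return z_total(power, alpha) ** 2 / (theta ** 2 * SIGMA0_SQ * MEAN_TERM)


info_25 = 25 * SIGMA0_SQ * MEAN_TERM
or_25 = detectable_odds_ratio(25)
n_for_or = sets_required(2.845)

print(f"I(theta0) with 25 sets = {info_25:.4f}")
print(f"odds ratio detectable with 90% power = {or_25:.3f}")
print(f"sets required for odds ratio 2.845 = {n_for_or:.1f}")
```

In practice the required number of sets would be rounded up to the next integer.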

In many cases, such as the study described by Hosmer and Lemeshow, the objective is to evaluate multiple risk factors simultaneously, or the effect of one risk factor adjusted for other covariates. In this case, the score test is a C_α test. In principle, the power function for this test could also be derived for a given covariance matrix of the parameter estimates evaluated under the null hypothesis for the one factor of interest. This would be algebraically and computationally tedious.

Alternatively, it seems reasonable to employ a deflation factor in conjunction with a univariate assessment. Let ${\hat{β}}_{x}$ denote the estimate of the coefficient for the risk factor of interest X in a univariate unadjusted model with information function $I ({\hat{β}}_{x})$ and corresponding variance of the estimate $V ({\hat{β}}_{x})$. Then let ${\hat{β}}_{x ∣ z}$ denote the coefficient estimate in a model adjusted for a vector of other covariates Z, where $R_{x ∣ z}^{2}$ is the coefficient of determination for the regression of X on Z. Hsieh and Lavori [16] then obtain the covariate-adjusted variance as

$$
V(\hat{\beta}_{x \mid z}) = V(\hat{\beta}_{x}) \left[ 1 - R_{x \mid z}^{2} \right]^{-1} = \left( I(\hat{\beta}_{x}) \left[ 1 - R_{x \mid z}^{2} \right] \right)^{-1} .
\tag{42}
$$

Likewise, the expressions herein would be modified to employ $I (θ_{0}) [1 - R_{x ∣ z}^{2}]$. This approach was also recently shown by Bernardo et al. [20] to apply to the power function of the score test in an exponential regression model for survival data.

For example, in the Hosmer-Lemeshow study, there is 90% power to detect an odds ratio of 0.986 per unit increase in weight in a univariate analysis. In the multivariate analysis in [3], R² = 0.0634 for the regression of maternal weight on the other covariates. Then (40) would be modified to yield

$$
I(\theta_0) = (32)^2 (51.2615) (1 - 0.0634) = 49163.8
\tag{43}
$$

that in turn yields 87.8% power to detect an odds ratio of 0.986 per unit increase in weight in the adjusted analysis. If weight had a higher intercorrelation with other covariates in the model, such as $R_{x ∣ z}^{2} = 0.3$, power would be reduced to 77%.
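The effect of the deflation factor on power can be illustrated with a brief Python sketch. It starts from the unadjusted I(θ₀) = 52491.8 in (40), multiplies by 1 − R², and evaluates the approximate power Φ(|θ|√I(θ₀) − Z_{1−α/2}), which is assumed here because it reproduces the reported figures; the names are illustrative.

```python
from math import log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
UNADJUSTED_INFO = 52491.8   # I(theta0) from (40)


def adjusted_information(info, r_squared):
    """Deflate the information by 1 - R^2 for X regressed on Z."""
    return info * (1 - r_squared)


def score_test_power(theta, info, alpha=0.05):
    """Approximate two-sided power; the opposite rejection tail is ignored."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(abs(theta) * sqrt(info) - z_alpha)


theta = log(0.986)  # log odds ratio per unit of maternal weight
powers = {
    r2: score_test_power(theta, adjusted_information(UNADJUSTED_INFO, r2))
    for r2 in (0.0, 0.0634, 0.3)
}

for r2, value in powers.items():
    print(f"R^2 = {r2:<6}  power = {value:.3f}")
```

Because the information enters the noncentrality through its square root, the noncentrality shrinks by the factor √(1 − R²), which is why a modest R² of 0.0634 costs only about two percentage points of power.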

Schoenfeld and Borenstein [21] recently described calculation of the precise non-centrality parameter for the Wald test of a subset of coefficients in a multivariate logistic or PH model that employs a vector of quantitative covariates that are assumed to be distributed as multivariate normal. In simulations they show that the above variance inflation adjustment is not accurate when one or more of the adjusting covariates has a strong effect on the response. However, in a case-control study this adjustment may be adequate if it can be assumed that any covariate with a large effect will be controlled by the matching.

An alternative approach to the design of a case-control study embedded within a prospective cohort study is a case-cohort design, originally proposed by Prentice [22], wherein the cases are compared to a pre-selected random sub-cohort of the main cohort. In this case, however, the standard PH model cannot be applied. Self and Prentice [23] describe a model that provides unbiased estimates of the coefficients and the variance of the estimates. Therneau and Li [24] show that standard programs for the Cox PH model can be “tricked” into providing the Self-Prentice estimates, among others that have been proposed. There has been no derivation of the expressions for the power of this analysis as a function of the number of cases and the size of the sub-cohort. Chen and Lo [25], however, evaluated the relative efficiency of the case-cohort design versus the standard case-control design and concluded that the two designs with equivalent total numbers of cases and controls (i.e. the sub-cohort) were equally efficient. This suggests that the computations presented herein for a nested case-control study can also be used to evaluate the sample size and power for a case-cohort study with the same number of cases and a sub-cohort of the same size as the combined set of controls.

Acknowledgment

This work was partially supported by funding from the National Institute of Diabetes, Digestive and Kidney Diseases.

REFERENCES

1. Breslow NE. Covariance adjustment of relative-risk estimates in matched studies. Biometrics. 1982;38:661–672.
2. McNemar Q. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika. 1947;12:153–157. doi: 10.1007/BF02295996.
3. Lachin JM. Biostatistical Methods: The Assessment of Relative Risks. Wiley; New York: 2000.
4. Miettinen OS. Individual matching with multiple controls in the case of all-or-none responses. Biometrics. 1969;25:339–55.
5. Walter SD. Matched case-control studies with a variable number of controls per case. Applied Statistics. 1980;29:172–9.
6. Ury HK. Efficiency of case-control studies with multiple controls per case: continuous or dichotomous data. Biometrics. 1975;31:643–9.
7. Taylor JMG. Choosing the number of controls in a matched case-control study, some sample size, power and efficiency considerations. Statistics in Medicine. 1986;5:29–36. doi: 10.1002/sim.4780050106.
8. Lui K-J. Estimation of sample sizes in case-control studies with multiple controls per case: Dichotomous data. American Journal of Epidemiology. 1988;127:1064–1070. doi: 10.1093/oxfordjournals.aje.a114882.
9. Ejigou A. Power and sample size for matched case-control studies. Biometrics. 1996;52:925–33.
10. Sinha S, Mukherjee B. A score test for determining sample size in matched case-control studies with categorical exposure. Biometrical Journal. 2006;48:35–53. doi: 10.1002/bimj.200510200.
11. Liddel FDK, McDonald JC, Thomas DC. Methods for cohort analysis: Appraisal by application to asbestos mining. Journal of the Royal Statistical Society, Series A. 1977;140:469–483.
12. Thomas DC. Addendum to the paper by Liddel FDK, McDonald JC, Thomas DC. Methods for cohort analysis: Appraisal by application to asbestos mining. Journal of the Royal Statistical Society, Series A. 1977;140:483–485.
13. Cox DR. Regression models and life-tables (with discussion). J. Roy. Statist. Soc., B. 1972;34:187–220.
14. Goldstein L, Langholz B. Asymptotic theory for nested case control sampling in the Cox regression model. Annals of Statistics. 1992;20:1903–28.
15. Schoenfeld D. Sample-size formula for the proportional-hazards regression model. Biometrics. 1983;39:499–503.
16. Hsieh FY, Lavori PW. Sample-Size Calculations for the Cox Proportional Hazards Regression Model with Nonbinary Covariates. Controlled Clinical Trials. 2000;21:552–560. doi: 10.1016/s0197-2456(00)00104-5.
17. Nathan DM, Cleary PA, Backlund JC, Genuth SM, Lachin JM, Orchard TJ, Raskin P, Zinman B. Intensive Diabetes Treatment and Cardiovascular Disease in Patients with Type 1 Diabetes. The New England Journal of Medicine. 2005;353:2643–2653. doi: 10.1056/NEJMoa052187.
18. Blankenberg S, et al. for the HOPE Study Investigators. Comparative impact of multiple biomarkers and N-Terminal Pro-Brain Natriuretic Peptide in the context of conventional risk factors for the prediction of recurrent cardiovascular events in the Heart Outcomes Prevention Evaluation (HOPE) Study. Circulation. 2006;114:201–208. doi: 10.1161/CIRCULATIONAHA.105.590927.
19. Hosmer DW, Lemeshow S. Applied Logistic Regression. John Wiley; New York: 1989.
20. Bernardo MPV, Lipsitz SR, Harrington DP, Catalano PJ. Sample size calculation for failure time random variables in non-randomized studies. The Statistician. 2000;49:31–40.
21. Schoenfeld DA, Borenstein M. Calculating the power or sample size for the logistic and proportional hazards models. J Stat Computation and Simulation. 2005;75:771–85.
22. Prentice RL. A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika. 1986;73:1–11.
23. Self SG, Prentice RL. Asymptotic distribution theory and efficiency results for case-cohort studies. Ann Statist. 1988;16:64–81.
24. Therneau TM, Li H. Computing the Cox model for case-cohort designs. Lifetime Data Analysis. 1999;5:99–112. doi: 10.1023/a:1009691327335.
25. Chen K, Lo S-H. Case-cohort and case-control analysis with Cox’s model. Biometrika. 1999;86:755–764.

