artbin: Extended sample size for randomized trials with binary outcomes

Ella Marley-Zagar; Ian R White; Patrick Royston; Friederike M-S Barthel; Mahesh K B Parmar; Abdel G Babiker

doi:10.1177/1536867X231161971

Author manuscript; available in PMC: 2023 Jul 17.

Published in final edited form as: Stata J. 2023 Apr 5;23(1):24–52. doi: 10.1177/1536867X231161971

PMCID: PMC7614770 EMSID: EMS178743 PMID: 37461744

Abstract

We describe the command artbin, which offers various new facilities for the calculation of sample size for binary outcome variables that are not otherwise available in Stata. While artbin has been available since 2004, it has not been previously described in the Stata Journal. artbin has been recently updated to include new options for different statistical tests, methods and study designs, improved syntax, and better handling of noninferiority trials. In this article, we describe the updated version of artbin and detail the various formulas used within artbin in different settings.

Keywords: st0013_3, artbin, sample size, power, binary outcome, randomized clinical trial, superiority trial, noninferiority trial

1. Introduction

Sample-size calculation is essential in the design of a randomized clinical trial to ensure that there is adequate power to evaluate treatment. It is also used in the design of randomized experiments in other fields such as education, international development (Attanasio, Kugler, and Meghir 2011), and criminology (Braga et al. 1999). It can also be used in the design of nonrandomized comparative studies (Quigley et al. 2019).

In Stata, several standard sample-size calculations are available in the inbuilt power family. More-advanced sample-size calculations are provided in the Analysis of Resources for Trials (ART) package (Barthel, Royston, and Babiker 2005; Barthel et al. 2006; Royston and Barthel 2010). ART is primarily aimed at trials with a time-to-event outcome, but it also includes the command artbin for trials with a binary outcome. artbin differs from the official power command by allowing many statistical tests, such as score, Wald, conditional, and trend across K groups, and by offering calculations under local or distant alternatives with or without continuity correction.

The calculations in artbin are based on a set of anticipated probabilities of the binary outcome, one in each treatment group. If the unknown probabilities of the binary outcome equal the anticipated probabilities, then artbin tells us the power achieved for a specified sample size or the sample size required to achieve the specified power.

The basic idea of sample-size calculation with a binary outcome is well known. We define the power 1 − β to be the probability of rejecting the null hypothesis at the two-sided α level of significance when the alternative hypothesis is true.

In a two-group superiority trial, the null hypothesis is that the outcome probabilities in the two groups are equal and the alternative hypothesis is that they take the unequal anticipated probabilities $π_{1}^{a}$ and $π_{2}^{a}$ . If the trial has equal sample sizes n in each group, then a popular formula for the total sample size required is

2 n = 2 \frac{\left\{ z_{1 - α / 2} \sqrt{2 {\bar{π}}^{a} (1 - {\bar{π}}^{a})} + z_{1 - β} \sqrt{π_{1}^{a} (1 - π_{1}^{a}) + π_{2}^{a} (1 - π_{2}^{a})} \right\}^{2}}{{(π_{2}^{a} - π_{1}^{a})}^{2}}

where z_c = Φ⁻¹(c) is the standard normal deviate and ${\bar{π}}^{a} = (π_{1}^{a} + π_{2}^{a}) / 2$ (Julious and Campbell 2012). Extensions are well known for unequal sample sizes.
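As an illustration only, the formula can be evaluated directly. The Python sketch below is not part of artbin: the function name is hypothetical, and rounding each group up to the next integer is an assumption. With anticipated probabilities 0.1 and 0.05, two-sided α = 0.05, and 90% power (the values used in section 3.2), it should give 582 per group and 1,164 in total.

```python
import math
from statistics import NormalDist


def score_sample_size_per_group(p1, p2, alpha=0.05, power=0.8):
    """Per-group n from the formula above (equal allocation, two-sided alpha)."""
    z = NormalDist().inv_cdf              # z_c = Phi^{-1}(c)
    pbar = (p1 + p2) / 2                  # average anticipated probability
    null_sd = math.sqrt(2 * pbar * (1 - pbar))
    alt_sd = math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    numerator = (z(1 - alpha / 2) * null_sd + z(power) * alt_sd) ** 2
    return numerator / (p2 - p1) ** 2


n = score_sample_size_per_group(0.1, 0.05, alpha=0.05, power=0.9)
n_rounded = math.ceil(n)                  # assumption: round each group up
print(n_rounded, 2 * n_rounded)
```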

However, several complications arise that are tackled by artbin. Some trials have more than two groups, and in these cases we may test for trend across the groups or for heterogeneity between the groups. There are variants of the sample-size formulas for different versions of the test applied to the data (for example, Pearson’s χ² or Wald), and there are “local” variants that are valid only when the treatment effect is small. A loss to follow-up option is useful for the replication of sample-size calculations, as advocated by Clark, Berger, and Mansmann (2013).

Further, some two-group trials are noninferiority trials, in which the null hypothesis is that the experimental treatment is worse than the control treatment by at least a prespecified amount m, termed the margin. They are used when the experimental treatment is not expected to be superior but does have other benefits, such as being cheaper, less toxic, or easier to administer. Substantial-superiority trials are now increasingly used, especially in vaccine trials, where the null hypothesis states that the experimental treatment is better than the control treatment by at most m (see Krause et al. [2020]).

The latest upgrade of artbin substantially improves the original version released in 2004. The option to specify a margin for noninferiority or substantial-superiority trials has been included to enable sample-size and power calculations for more-complex two-group trials. New options for statistical tests and methods are now available, such as the Wald test, which is commonly used for sample-size calculation in noninferiority trials in medicine. The syntax and output have been improved, with more options available and clearer output. artbin does not require the anticipated event probabilities to be the same in the two groups for noninferiority or substantial-superiority trials, unlike any other software packages currently available in Stata. Previous users of artbin will need to alter existing artbin code to accommodate the changes. Please see the description of what has changed (appendix 1) for further details.

This article has three aims. First, it clearly lays out the scope of the artbin package and its dialog boxes and exemplifies its use. Second, it describes the updates made. Third, it clarifies the formulas used.

The article comprises a description of the new syntax (section 2), illustrative examples (section 3), a description of the updated menus and dialogs (section 4), details of the methods used (section 5), a description of how the software has been tested (section 6), and conclusions (section 7).

2. The artbin command

2.1. Syntax

artbin, pr(numlist) [margin(#)
  [unfavourable | unfavorable | favourable | favorable] [power(#) | n(#)]
  aratios(aratio_list) ltfu(#) alpha(#) onesided trend doses(dose_list)
  condit wald ccorrect local noround force]

artbin calculates the power or total sample size for various tests comparing K anticipated probabilities. Power is calculated if n() is specified; otherwise, total sample size is estimated. artbin can be used in designing superiority, noninferiority, and substantial-superiority trials.

artbin makes comparisons on the scale of difference in probabilities. The results on other scales, such as odds ratios, will be very similar for superiority trials but potentially very different for noninferiority and substantial-superiority trials (Quartagno et al. 2020).

In a multigroup trial, artbin is based on a test of the global null hypothesis that the probabilities are equal in all groups. The alternative hypothesis is that there is a difference between two or more of the groups.

In a two-group superiority trial, artbin is based on a test of the null hypothesis that the probabilities in the two groups are equal. The alternative hypothesis is that they take unequal values, such that the experimental treatment is better than the control treatment.

In a noninferiority trial, artbin is based on a test of the null hypothesis that the experimental treatment is worse than the control treatment by at least a prespecified amount, termed the margin. artbin supports the design of more-complex noninferiority trials in which $π_{1}^{a}$ and $π_{2}^{a}$ are unequal. Substantial-superiority trials are increasingly used; here the null hypothesis is that the experimental treatment is better than the control treatment by the margin at most.

To minimize the risk of error in two-group trials, the user is advised to identify whether the trial outcome is favorable or unfavorable. By default, artbin infers favorability status from the pr() and margin() options. If $π_{2}^{a} > π_{1}^{a} +$ margin(), the outcome is assumed to be favorable; otherwise, it is assumed to be unfavorable.
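The inference rule can be sketched in a few lines. The Python functions below are hypothetical helpers, not artbin code; they combine the rule above with the sign convention for margin() described in section 2.2, and they are applied to the three two-group designs used in section 3.

```python
def infer_favourable(p1, p2, margin=0.0):
    """Rule stated above: favourable if p2 > p1 + margin, else unfavourable."""
    return p2 > p1 + margin


def trial_type(margin, favourable):
    """Classify a two-group design from the sign of the margin."""
    if margin == 0:
        return "superiority"
    # Unfavourable outcome: m > 0 is noninferiority.
    # Favourable outcome: m < 0 is noninferiority.
    noninferiority = (margin < 0) if favourable else (margin > 0)
    return "noninferiority" if noninferiority else "substantial-superiority"


for p1, p2, m in [(0.1, 0.05, 0.0), (0.9, 0.9, -0.05), (0.7, 0.75, -0.1)]:
    fav = infer_favourable(p1, p2, m)
    print(p1, p2, m, "favourable" if fav else "unfavourable", trial_type(m, fav))
```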

2.2. Options

pr(#1 … #K) specifies the anticipated outcome probabilities in the groups that will be compared. #1 is the anticipated probability in the control group $(π_{1}^{a})$ , and #2, …, #K are the anticipated probabilities in the treatment groups $(π_{2}^{a}, \dots, π_{K}^{a})$ . pr() is required.

margin(#) is used with two-group trials and must be specified if a noninferiority or substantial-superiority trial is being designed. The default is margin(0), denoting a superiority trial. If the event of interest is unfavorable, the null hypothesis for all of these designs is π₂ − π₁ ≥ m, where m is the prespecified margin. The alternative hypothesis is π₂ − π₁ < m. m > 0 denotes a noninferiority trial, whereas m < 0 denotes a substantial-superiority trial. On the other hand, if the event of interest is favorable, the above inequalities are reversed. The null hypothesis for all of these designs is then π₂ − π₁ ≤ m, and the alternative hypothesis is π₂ − π₁ > m. m < 0 denotes a noninferiority trial, while m > 0 denotes a substantial-superiority trial. The hypothesized margin for the difference in anticipated probabilities, #, must lie between −1 and 1.

unfavourable | unfavorable or favourable | favorable are used with two-group trials to specify whether the outcome is unfavorable or favorable. If either option is used, artbin checks the assumptions; otherwise, it infers the favorability status. American and English spellings are both allowed.

power(#) specifies the required power of the trial at the alpha() significance level and computes the total sample size. power() cannot be used with n(). The default is power(0.8).

n(#) specifies the total sample size available and computes the corresponding power. n() cannot be used with power(). The default is to calculate the sample size for power 0.8.

aratios(aratio_list) specifies the allocation ratios. The allocation ratio for group k is #k, k = 1, …, K; for example, aratios(1 2) means that two participants are randomized to the experimental group for each one randomized to the control group. With two groups, aratios(#) is taken to mean aratios(1 #). The default is equal allocation to all groups.

ltfu(#) assumes a proportional loss to follow-up of #, where # is a number between 0 and 1. The total sample size is divided by 1−# before rounding. The default is ltfu(0), meaning no loss to follow-up.

alpha(#) specifies that the trial will be analyzed using a significance test with level #. That is, # is the type 1 error probability. The default is alpha(0.05).

onesided is used for two-group trials and for trend tests in multigroup trials. It specifies that the significance level given by alpha() is one sided. Otherwise, the value of alpha() is halved to give a one-sided significance level. Thus, for example, alpha(0.05) is exactly the same as alpha(0.025) onesided.

artbin always assumes that a two-group trial or a trend test in a multigroup trial will be analyzed using a one-sided alternative, regardless of whether the alpha level was specified as one sided or two sided. artbin, therefore, uses a slightly different definition of power from the power command: when a two-tailed test is performed, power reports the probability of rejecting the null hypothesis in either direction, whereas artbin only considers rejecting the null hypothesis in the direction of interest.

artbin assumes that multigroup trials will be analyzed using a two-sided alternative, so onesided is not allowed with multigroup trials unless trend or doses() is specified (see below).

trend is used for trials with more than two groups and specifies that the trial will be analyzed using a linear trend test. The default is a test for any difference between the groups. See also doses().

doses(dose_list) is used for trials with more than two groups and specifies “doses” or other quantitative measures for a dose–response (linear trend) test. doses() implies trend. doses(#1 #2 … #r) assigns doses for groups 1, …, r. If r < K (the total number of groups), the dose is assumed equal to #r for groups r + 1, r + 2, …, K. If trend is specified without doses(), then the default is doses(1 2 … K). doses() is not permitted for a two-group trial.

condit specifies that the trial will be analyzed using Peto’s conditional test. This test conditions on the total number of events observed and is based on Peto’s local approximation to the log odds-ratio. This option is also likely to be a good approximation with other conditional tests. The default is the usual Pearson χ² test. condit is not available for noninferiority and substantial-superiority trials. condit cannot be used with wald, because only one test type is allowed. condit implies local. The ccorrect option is not available with condit.

wald specifies that the trial will be analyzed using the Wald test. The default is the usual Pearson χ² test. wald cannot be used with condit, because only one test type is allowed. The Wald test inherently allows for distant alternatives, so wald and local cannot be used together.

ccorrect specifies that the trial will be analyzed with a continuity correction. ccorrect is not available with condit. The default is no continuity correction.

local specifies that the calculation should use the variance of the difference in proportions only under the null. This approximation is valid when the treatment effect is small. The default uses the variance of the difference in proportions both under the null and under the alternative hypothesis. The local method is not recommended and is only included to allow comparisons with other software. The Wald test inherently allows for distant alternatives, so wald and local cannot be used together.

noround prevents rounding of the calculated sample size in each group up to the nearest integer. The default is to round.

force can be used with two-group studies to override the program’s inference of the favorable or unfavorable outcome type. This may be needed, for example, when designing an observational study with a harmful risk factor; the favorability types would be reversed and the force option applied.
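Three of the rules stated above (the default and padded dose lists, the halving of alpha(), and the inflation for loss to follow-up) are simple enough to sketch. The Python helpers below are hypothetical names used for illustration and do not reproduce artbin's internal code.

```python
def expand_doses(k, doses=None):
    """Doses for K groups: default 1..K; a short list is padded with its last value."""
    if doses is None:
        return list(range(1, k + 1))
    doses = list(doses)
    return doses + [doses[-1]] * (k - len(doses))


def one_sided_alpha(alpha=0.05, onesided=False):
    """alpha() is halved unless onesided is specified."""
    return alpha if onesided else alpha / 2


def inflate_for_ltfu(total_n, ltfu=0.0):
    """Divide the unrounded total sample size by 1 - ltfu."""
    return total_n / (1 - ltfu)


print(expand_doses(4))                 # default doses for four groups
print(expand_doses(4, [0, 10]))        # last dose reused for groups 3 and 4
print(one_sided_alpha(0.05) == one_sided_alpha(0.025, onesided=True))
print(inflate_for_ltfu(80, ltfu=0.2))
```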

3. Examples

3.1. Binary outcome and comparison with published sample size

We reproduce the sample-size calculation in Pocock (1983) for a two-group superiority trial comparing the efficacy of therapeutic doses of Anturan in patients after a myocardial infarction with the placebo standard treatment. The primary outcome was death from any cause within one year of first treatment. The control (placebo) group was anticipated to have a 10% probability of death within one year and the Anturan treatment group a 5% probability, with the trial powered at 90%. The patient outcome was binary: either failure (death in a year) or success (survival). The published sample size was 578 patients per group (1,156 patients in total).

In the artbin example below, we do not specify in the syntax whether the outcome is favorable or unfavorable; rather, we let the program infer it. The aim of a clinical trial is always to improve patient outcome. Therefore, because the experimental-group anticipated probability $(π_{2}^{a} = 0.05)$ is less than the control-group anticipated probability $(π_{1}^{a} = 0.1)$ , it can be inferred that the outcome is unfavorable (that is, the trial is aiming to reduce the probability of the event occurring, in this case, death).

. artbin, pr(0.1 0.05) alpha(0.05) power(0.9) wald
ART - ANALYSIS OF RESOURCES FOR TRIALS (binary version 2.0.1 09june2022)

A sample size program by Abdel Babiker, Patrick Royston, Friederike Barthel, Ella Marley-Zagar and Ian White MRC Clinical Trials Unit at UCL, London WC1V 6LJ, UK.
Type of trial	superiority
Number of groups	2
Favourable/unfavourable outcome	unfavourable Inferred by the program
Allocation ratio	equal group sizes
Statistical test assumed	unconditional comparison of 2 binomial proportions using the wald test
Local or distant	distant
Continuity correction	no
Anticipated event probabilities	0.100 0.050
Alpha	0.050 (two-sided) (taken as .025 one-sided)
Power (designed)	0.900
Total sample size (calculated)	1156
Sample size per group (calculated)	578 578
Expected total number of events	86.70

The artbin output table shows the trial setup information, including the study design, statistical tests, and methods used. The hypothesis tests are shown with the calculated sample size and events based on the selected power. A total sample size of 1,156 participants is required, as per the published sample size given by Pocock (1983). The same result is achieved by the command artbin, pr(0.9 0.95) alpha(0.05) power(0.9) wald, assuming a favorable outcome (survival) instead. The Wald test is used instead of the default score test because Pocock used the sample estimate in the method of estimating the variance of the difference in proportions under the null hypothesis H₀.
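The published figure can be checked by hand. The following Python sketch is an independent calculation rather than artbin's code: it assumes equal allocation and uses the variance under the anticipated probabilities for both the null and the alternative terms, which is how the Wald method is described in section 5.2. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def wald_sample_size_per_group(p1, p2, alpha=0.05, power=0.8):
    """Per-group n for a two-group superiority trial, Wald test, equal allocation."""
    z = NormalDist().inv_cdf
    # Wald: the same variance is used under the null and the alternative.
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z(1 - alpha / 2) + z(power)) ** 2 * variance / (p2 - p1) ** 2


# Unfavourable outcome (death) and the equivalent favourable outcome (survival).
death = math.ceil(wald_sample_size_per_group(0.10, 0.05, power=0.9))
survival = math.ceil(wald_sample_size_per_group(0.90, 0.95, power=0.9))
print(death, survival)
```

Both calls should return 578 per group, because the variance term and the squared difference are unchanged when each probability π is replaced by 1 − π.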

3.2. Binary outcome and comparison with power

We compare the output of artbin with the output of Stata’s power command, which, like artbin, uses the score test as the default.

. power twoproportions 0.1 0.05, alpha(0.05) power(0.9)
Performing iteration …
Estimated sample sizes for a two-sample proportions test
Pearson’s chi-squared test
H0: p2 = p1 versus Ha: p2 != p1
Study parameters:
        alpha =	 0.0500
        power =	 0.9000
        delta =	-0.0500 (difference)
           p1 =	 0.1000
           p2 =  0.0500
Estimated sample sizes:
            N =   1,164
  N per group =	    582
. artbin, pr(0.1 0.05) alpha(0.05) power(0.9)
ART - ANALYSIS OF RESOURCES FOR TRIALS (binary version 2.0.1 09june2022)

A sample size program by Abdel Babiker, Patrick Royston, Friederike Barthel, Ella Marley-Zagar and Ian White MRC Clinical Trials Unit at UCL, London WC1V 6LJ, UK.
Type of trial	superiority
Number of groups	2
Favourable/unfavourable outcome	unfavourable Inferred by the program
Allocation ratio	equal group sizes
Statistical test assumed	unconditional comparison of 2 binomial proportions using the score test
Local or distant	distant
Continuity correction	no
Anticipated event probabilities	0.100 0.050
Alpha	0.050 (two-sided) (taken as .025 one-sided)
Power (designed)	0.900
Total sample size (calculated)	1164
Sample size per group (calculated)	582 582
Expected total number of events	87.30

Both give a total sample size of 1,164.

3.3. One-sided noninferiority trial

Next we show a one-sided noninferiority trial with the onesided option. We anticipate a 90% probability of survival in both the control group and the treatment group, with the null hypothesis that the treatment group is at least 5% less effective than the control.

. artbin, pr(0.9 0.9) margin(-0.05) onesided
ART - ANALYSIS OF RESOURCES FOR TRIALS (binary version 2.0.1 09june2022)

A sample size program by Abdel Babiker, Patrick Royston, Friederike Barthel, Ella Marley-Zagar and Ian White MRC Clinical Trials Unit at UCL, London WC1V 6LJ, UK.
Type of trial	non-inferiority
Number of groups	2
Favourable/unfavourable outcome	favourable Inferred by the program
Allocation ratio	equal group sizes
Statistical test assumed	unconditional comparison of 2 binomial proportions using the score test
Local or distant	distant
Continuity correction	no
Null hypothesis H0:	H0: pi2 - pi1 <= -.05
Alternative hypothesis H1:	H1: pi2 - pi1 > -.05
Anticipated event probabilities	0.900 0.900
Alpha	0.050 (one-sided)
Power (designed)	0.800
Total sample size (calculated)	914
Sample size per group (calculated)	457 457
Expected total number of events	822.60

A sample size of 457 is required in each group.
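This result can be reproduced approximately by hand. The Python sketch below rests on an assumption: we take the score (distant) method's null variance to come from the probabilities that maximize the expected log likelihood, evaluated at the anticipated probabilities, subject to π₂ − π₁ = m (compare the description of the constrained estimates in section 5.2). artbin's own implementation may differ; the function names and the bisection solver are ours.

```python
import math
from statistics import NormalDist


def constrained_probs(p1, p2, margin, r1=0.5, r2=0.5):
    """Maximize the expected log likelihood subject to q2 - q1 = margin."""
    def slope(q1):
        q2 = q1 + margin
        return (r1 * (p1 / q1 - (1 - p1) / (1 - q1))
                + r2 * (p2 / q2 - (1 - p2) / (1 - q2)))

    eps = 1e-12
    lo = max(0.0, -margin) + eps          # keep q1 and q2 inside (0, 1)
    hi = min(1.0, 1.0 - margin) - eps
    for _ in range(200):                  # bisection: slope decreases in q1
        mid = (lo + hi) / 2
        if slope(mid) > 0:
            lo = mid
        else:
            hi = mid
    q1 = (lo + hi) / 2
    return q1, q1 + margin


def score_n_per_group(p1, p2, margin, alpha_one_sided, power):
    """Per-group n, equal allocation, score test with distant alternative."""
    z = NormalDist().inv_cdf
    q1, q2 = constrained_probs(p1, p2, margin)
    v_null = q1 * (1 - q1) + q2 * (1 - q2)    # variance under H0
    v_alt = p1 * (1 - p1) + p2 * (1 - p2)     # variance under Ha
    effect = (p2 - p1) - margin
    return (z(1 - alpha_one_sided) * math.sqrt(v_null)
            + z(power) * math.sqrt(v_alt)) ** 2 / effect ** 2


n = score_n_per_group(0.9, 0.9, -0.05, alpha_one_sided=0.05, power=0.8)
print(math.ceil(n))
```

Under this assumption, the calculation rounds up to 457 per group, in agreement with the output above.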

3.4. Superiority trial with multiple groups

Here we demonstrate a superiority trial with more than two groups. Instead of comparing each of the treatment groups with the control group, artbin uses a global test to assess if there is any difference among the groups.

. artbin, pr(0.1 0.2 0.3 0.4) alpha(0.1) power(0.9)
ART - ANALYSIS OF RESOURCES FOR TRIALS (binary version 2.0.1 09june2022)

A sample size program by Abdel Babiker, Patrick Royston, Friederike Barthel, Ella Marley-Zagar and Ian White MRC Clinical Trials Unit at UCL, London WC1V 6LJ, UK.
Type of trial	superiority
Number of groups	4
Favourable/unfavourable outcome	not determined
Allocation ratio	equal group sizes
Statistical test assumed	unconditional comparison of 4 binomial proportions using the score test
Local or distant	distant
Continuity correction	no
Anticipated event probabilities	0.100 0.200 0.300 0.400
Alpha	0.100 (two-sided)
Power (designed)	0.900
Total sample size (calculated)	176
Sample size per group (calculated)	44 44 44 44
Expected total number of events	44.00

A sample size of 44 is required in each of the four groups.

3.5. Complex noninferiority trial in a real-life setting

Finally, we demonstrate a more complex noninferiority design from the STREAM trial. The need for the STREAM trial arose from the increase of multidrug-resistant strains of tuberculosis, especially in countries without robust healthcare systems that were unable to administer treatment over long periods of time. The STREAM trial evaluated a shorter, more intensive treatment for multidrug-resistant tuberculosis compared with the lengthier treatment recommended by the World Health Organization.

A favorable outcome was defined by cultures negative for Mycobacterium tuberculosis at 132 weeks and at a previous occasion, with no intervening positive culture or previous unfavorable outcome (Nunn et al. 2019). The sample-size calculation used an anticipated 0.7 probability of a favorable outcome on control $(π_{1}^{a})$ and 0.75 on treatment $(π_{2}^{a})$ . Hence, it was assumed that 70% of the participants in the long-regimen group and 75% in the short-regimen group would attain a favorable outcome. A 10-percentage-point noninferiority margin was considered to be an acceptable difference in efficacy, given the shorter treatment duration (m = −0.1 defined as π₂ − π₁). It was assumed there were twice as many patients in treatment compared with control. The Wald test was applied because it is often used in noninferiority trials.

. artbin, pr(0.7 0.75) margin(-0.1) power(0.8) aratios(1 2) wald ltfu(0.2)
ART - ANALYSIS OF RESOURCES FOR TRIALS (binary version 2.0.1 09june2022)

A sample size program by Abdel Babiker, Patrick Royston, Friederike Barthel, Ella Marley-Zagar and Ian White MRC Clinical Trials Unit at UCL, London WC1V 6LJ, UK.
Type of trial	non-inferiority
Number of groups	2
Favourable/unfavourable outcome	favourable Inferred by the program
Allocation ratio	1:2
Statistical test assumed	unconditional comparison of 2 binomial proportions using the wald test
Local or distant	distant
Continuity correction	no
Null hypothesis H0:	H0: pi2 - pi1 <= -.1
Alternative hypothesis H1:	H1: pi2 - pi1 > -.1
Anticipated event probabilities	0.700 0.750
Alpha	0.050 (two-sided) (taken as .025 one-sided)
Power (designed)	0.800
Loss to follow up assumed:	20 %
Total sample size (calculated)	399
Sample size per group (calculated)	133 266
Expected total number of events	292.60

The noninferiority trial required a total sample size of 399 (133 in control and 266 in treatment), assuming 20% of patients were not assessable in primary analysis. When the STREAM trial concluded, it estimated that a shorter, more intensive treatment for multidrug-resistant tuberculosis was only 1% less effective than the lengthier treatment recommended by the World Health Organization and demonstrated significant evidence of noninferiority.
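The arithmetic behind this design can be sketched as follows. The Python code is an independent illustration, not artbin's implementation, and the function name is hypothetical. The rounding rule is an assumption: we round one allocation "unit" up and scale it by the allocation ratios, which reproduces the reported group sizes; artbin's internal rounding is not documented here.

```python
import math
from statistics import NormalDist


def wald_total_sample_size(p1, p2, margin, ratios, alpha_one_sided, power):
    """Unrounded total N for a two-group Wald test with unequal allocation."""
    z = NormalDist().inv_cdf
    r1, r2 = (r / sum(ratios) for r in ratios)      # allocation fractions
    v_alt = p1 * (1 - p1) / r1 + p2 * (1 - p2) / r2
    effect = (p2 - p1) - margin
    return (z(1 - alpha_one_sided) + z(power)) ** 2 * v_alt / effect ** 2


ratios = (1, 2)
n_total = wald_total_sample_size(0.7, 0.75, -0.1, ratios, 0.025, 0.8)
n_ltfu = n_total / (1 - 0.2)            # inflate for 20% loss to follow-up
# Assumption: round one allocation unit up, then scale by the ratios.
unit = math.ceil(n_ltfu / sum(ratios))
group_sizes = [unit * r for r in ratios]
print(round(n_ltfu, 1), group_sizes, sum(group_sizes))
```

The unrounded total after inflation is a little over 397, and the assumed rounding gives 133 and 266, matching the total of 399 shown above.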

4. Menu and dialogs

All the features in artbin are available from the artbin menu and associated dialogs. Once the selections have been inputted into the menu box, the associated command line will be displayed in the Review window. If the user would like to generate a do-file to reproduce the calculations, a log file can be opened before executing the commands via the dialog, which will then save the command line.

Once the ART package has been installed in Stata, the artbin dialog menu can be used. To access the interactive menu, type artmenu on, which will cause a new item, ART, to appear on the system menu bar under User. To turn this menu off, type artmenu off. ART consists of three programs, namely,

survival outcomes (corresponding to artsurv),
projection of events and power (corresponding to artpep), and
binary outcomes (corresponding to artbin).

artsurv and artpep are described in Barthel, Royston, and Babiker (2005) and Royston and Barthel (2010), respectively.

Compared with previous versions, new options such as Margin, Favourable or Unfavourable, Loss to follow-up, Score test, Wald test, Continuity correction, and Do not round have now been included within an updated layout design.

Figure 1 illustrates the dialog box for binary outcomes. The artbin dialog box allows the user to input the parameters for the trial setup. Options are deselected based on the user’s choices; for example, if the Conditional test (Peto) checkbox is selected, then the Wald test checkbox will be grayed out.

The dialog box output is the same as the output in section 3.5 and corresponds to the inputs shown in the figure 1 menu box. The detailed display enables the user to check that the trial design has been inputted correctly.

5. Methods and formulas

5.1. Notation

Consider the design of a study to compare K independent groups in terms of a binary outcome whose probability of occurrence for an individual in group k is π_k, k = 1, 2, …, K. We refer to group 1 as a control group and groups 2, …, K as experimental groups.

Let Y_k be the number of events in a sample of size n_k = r_kN from a total sample size N, where r_k is the fraction allocated to group k for k = 1, 2, …, K. Then Y_k has the binomial distribution binom(n_k, π_k). Write $\bar{π} = \sum_{k = 1}^{K} r_{k} π_{k}$ as the overall outcome probability. Let Y_. = Σ_k Y_k. The estimated outcome probabilities ${\hat{π}}_{k}$ and $\bar{\hat{π}}$ are ${\hat{π}}_{k} = Y_{k} / (r_{k} N)$ and $\bar{\hat{π}} = Y_{.} / N = \sum_{k = 1}^{K} r_{k} {\hat{π}}_{k}$ .

We consider the general case and then the case K = 2. For each case, we define a test statistic and derive its distribution under the null and alternative hypotheses (section 5.2). We then apply generic methods to derive sample sizes or powers (section 5.3).

5.2. Summary of test statistics and their distributions

Unconditional methods are based on a score vector U = (U₂, …, U_K)′, where $U_{k} = {\hat{π}}_{k} - \bar{\hat{π}}$ . Conditional methods are based on a different score vector X = (X₂, …, X_K)′, where X_k = Y_k − r_kY_. = r_kNU_k. Table 1 shows the test statistics and their null and alternative distributions. See appendix 2 for further details of definitions, such as Q, V, A, M, and T. All methods are unconditional unless otherwise stated. The approximate distant method is based on the work of Yuan and Bentler (2010).
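To make the two score vectors concrete, the following Python sketch computes them for hypothetical data (the counts and group sizes are invented for illustration) and checks the identity X_k = r_kNU_k numerically.

```python
# Hypothetical counts for K = 3 groups with allocation fractions 0.5, 0.25, 0.25.
events = [30, 20, 25]          # Y_k
sizes = [200, 100, 100]        # n_k = r_k * N

N = sum(sizes)
r = [n / N for n in sizes]                          # allocation fractions r_k
pi_hat = [y / n for y, n in zip(events, sizes)]     # estimated probabilities
pi_bar = sum(events) / N                            # overall estimate Y_. / N

# Both score vectors omit the control group (k = 1).
U = [p - pi_bar for p in pi_hat[1:]]
X = [y - r_k * sum(events) for y, r_k in zip(events[1:], r[1:])]

for u_k, x_k, r_k in zip(U, X, r[1:]):
    print(round(u_k, 4), round(x_k, 4), round(r_k * N * u_k, 4))
```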

Table 1. Summary of test statistics and their distributions.

Method	Statistic	Null distribution	Alternative distribution
K groups, heterogeneity
Score local	$Q_{u} = N U' {\hat{V}}_{u}^{-1} U$, where ${\hat{V}}_{u} = N \widehat{var}(U \mid H_{0})$	$χ_{K - 1}^{2}$	$NCχ^{2}(K - 1, λ)$, where $λ = N μ' V_{u}^{-1} μ$ and $μ_{k} = π_{k}^{a} - {\bar{π}}^{a}$
Score distant approximate	same	same	$c\,NCχ^{2}(K - 1, γ)$ (Yuan and Bentler 2010), with equations for c and γ given in appendix 2
Wald	$Q_{w} = N U' {\hat{A}}^{-1} U$, where $\hat{A} = N \widehat{var}(U \mid H_{a})$	$χ_{K - 1}^{2}$	$NCχ^{2}(K - 1, λ)$, where $λ = N μ' A^{-1} μ$
Conditional local	$Q_{c} = X' V_{c}^{-1} X / M$, where $M = \bar{\hat{π}}(1 - \bar{\hat{π}}) N^{2} / (N - 1)$ and $V_{c} = var(X \mid H_{0}) / M$	$χ_{K - 1}^{2}$	$NCχ^{2}(K - 1, λ)$, where $λ = M η' V_{c} η$ and $η_{k} = logit\, π_{k}^{a} - logit\, π_{1}^{a}$
K groups, trend
Score local	$T_{u} = c'U$, where $c_{k} = r_{k}(d_{k} - d_{1})$ and $d_{1}, d_{2}, …, d_{K}$ are doses for groups 1, 2, …, K	$N(0, c'V_{u}c/N)$	$N(c'μ, c'V_{u}c/N)$
Score distant	same	same	$N(c'μ, c'A_{u}c/N)$
Wald	same	$N(0, c'Ac/N)$	$N(c'μ, c'A_{u}c/N)$
Conditional local	$T_{c} = c'X/M$	$N(0, c'V_{u}c/N)$	$N(c'V_{c}η, c'V_{c}c/M)$
Two groups, superiority or noninferiority
All	$T_{2} = \hat{δ} - m$, where $\hat{δ} = {\hat{π}}_{2} - {\hat{π}}_{1}$ and m is the margin	$N(0, V_{n}/N)$	$N(δ - m, V_{a}/N)$, where $V_{a} = π_{1}^{a}(1 - π_{1}^{a})/r_{1} + π_{2}^{a}(1 - π_{2}^{a})/r_{2}$

In the above, $V_{n} = {\tilde{π}}_{1}^{a}(1 - {\tilde{π}}_{1}^{a})/r_{1} + {\tilde{π}}_{2}^{a}(1 - {\tilde{π}}_{2}^{a})/r_{2}$ , where ${\tilde{π}}_{1}^{a}$ and ${\tilde{π}}_{2}^{a}$ are values of $π_{1}^{a}$ and $π_{2}^{a}$ modified to conform to H₀ in one of the following ways:
Score distant	Maximum likelihood estimates of π₁ and π₂ constrained to δ = m
Score local	As score, but replacing V_a with V_n
Wald	${\tilde{π}}_{1}^{a} = π_{1}^{a}$ and ${\tilde{π}}_{2}^{a} = π_{2}^{a}$ (so $V_{n} = V_{a}$)
Conditional local	Methods for K groups are used (superiority trial only)

5.3. Summary of methods

5.3.1. K groups, heterogeneity

The following statistics are approximated as $χ_{K - 1}^{2}$ under the null. Let x_α(m) be the (1−α)100th percentile of the (central) χ² distribution with m degrees of freedom. Then, for a test statistic for which we write S_N to emphasize its dependence on sample size N, power is related to the total sample size N by the equation

power = \Pr \{S_{N} > x_{α} (K - 1) \mid H_{a}\} \qquad (1)

The distributions under the alternative hypothesis are all of the form cX, where c is a constant depending on N and X is a noncentral χ² random variable with K − 1 degrees of freedom and noncentrality parameter λ depending on N and the anticipated probabilities. Then (1) gives the key equation

power = 1 - F_{K - 1, λ} \{x_{α} (K - 1) / c\}

where $F_{K - 1, λ}(x)$ is the cumulative distribution function of the noncentral χ² distribution with K − 1 degrees of freedom and noncentrality parameter λ. We can directly evaluate this for power given N. Solving for N given power involves iterative methods in some cases.
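The key equation can be evaluated with standard numerical tools. The Python sketch below is a minimal illustration using only the standard library; the series-based distribution functions and all function names are our own and are not taken from artbin. The noncentrality parameter λ and the constant c must be supplied by the user (c = 1 is assumed by default), because their formulas depend on quantities defined in appendix 2.

```python
import math
from statistics import NormalDist


def gamma_p(a, x):
    """Regularized lower incomplete gamma function P(a, x), by power series."""
    if x <= 0:
        return 0.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def chi2_cdf(x, df):
    """Central chi-square CDF."""
    return gamma_p(df / 2, x / 2)


def chi2_critical(alpha, df):
    """x_alpha(df): upper-alpha point of the central chi-square, by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1 - alpha:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if chi2_cdf(mid, df) < 1 - alpha else (lo, mid)
    return (lo + hi) / 2


def ncx2_cdf(x, df, lam):
    """Noncentral chi-square CDF as a Poisson(lam / 2) mixture of central CDFs."""
    weight = math.exp(-lam / 2)           # Poisson probability of j = 0
    total = 0.0
    for j in range(1000):
        total += weight * chi2_cdf(x, df + 2 * j)
        weight *= (lam / 2) / (j + 1)
        if j > lam / 2 and weight < 1e-16:
            break
    return total


def heterogeneity_power(lam, df, alpha=0.05, c=1.0):
    """power = 1 - F_{df, lam}(x_alpha(df) / c)."""
    return 1 - ncx2_cdf(chi2_critical(alpha, df) / c, df, lam)


# Check with K = 2 (one degree of freedom), where the answer is known from
# the normal distribution: this lambda corresponds to 90% power.
z = NormalDist().inv_cdf
lam = (z(0.975) + z(0.9)) ** 2
print(round(heterogeneity_power(lam, df=1), 4))
```

With one degree of freedom, the noncentral χ² power reduces to a two-sided normal calculation, which gives a convenient check on the numerical routines before they are used with K > 2.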

5.3.2. All other cases

These statistics S_N are all approximated as $N (0, σ_{0}^{2} / N)$ under H₀ and $N (μ_{1}, σ_{1}^{2} / N)$ under H_a, where σ₁ depends on the anticipated probabilities. Let z_a denote the (1 − a)100th percentile of the standard normal distribution, where, for a one-sided test, a = α, and for a two-sided test, a = α/2. Then (1) gives the key equation

power = \Pr (S_{N} > z_{a} σ_{0} / \sqrt{N} | H_{a}) = Φ (\frac{μ_{1} - z_{a} σ_{0} / \sqrt{N}}{σ_{1} / \sqrt{N}})

Rearranging, the total sample size to achieve power 1 − β is

N = {(\frac{z_{a} σ_{0} + z_{β} σ_{1}}{μ_{1}})}^{2}
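The two equations above translate directly into code. The Python sketch below is illustrative only; the function names are hypothetical, and taking the absolute value of μ₁ is our assumption so that the direction of interest is treated as positive. It is applied to the Wald design of section 3.1, for which σ₀ = σ₁.

```python
import math
from statistics import NormalDist

norm = NormalDist()


def power_normal(n_total, mu1, sigma0, sigma1, a):
    """Power from the key equation; a is the one-sided significance level."""
    z_a = norm.inv_cdf(1 - a)
    root_n = math.sqrt(n_total)
    # abs(): the direction of interest is taken as positive (assumption).
    return norm.cdf((abs(mu1) - z_a * sigma0 / root_n) / (sigma1 / root_n))


def sample_size_normal(mu1, sigma0, sigma1, a, power):
    """Total N from the rearranged equation, with z_beta = Phi^{-1}(power)."""
    z_a, z_b = norm.inv_cdf(1 - a), norm.inv_cdf(power)
    return ((z_a * sigma0 + z_b * sigma1) / mu1) ** 2


# Wald test for anticipated probabilities 0.1 and 0.05 with r1 = r2 = 0.5.
v_a = 0.1 * 0.9 / 0.5 + 0.05 * 0.95 / 0.5
sigma = math.sqrt(v_a)
n_total = sample_size_normal(-0.05, sigma, sigma, a=0.025, power=0.9)
n_rounded = 2 * math.ceil(n_total / 2)          # round each group up
achieved = power_normal(n_total, -0.05, sigma, sigma, a=0.025)
print(n_rounded, round(achieved, 4))
```

Substituting the unrounded N back into the power equation returns the design power, which is a useful consistency check; the rounded total should agree with the 1,156 reported in section 3.1.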

6. Software testing

artbin is for use in the design of randomized trials, so we have tested it extensively. The program was modified by Ella Marley-Zagar and tested by Ella Marley-Zagar, Ian R. White, Patrick Royston, and Abdel G. Babiker. We report the testing methods below to verify both the sample-size and the power results. We ran the test scripts with the default variable type (set type) as float and as double.

We compared results for noninferiority trials with those given by Julious and Owen (2011), Blackwelder (1982), Pocock (2003), and the online calculator Sealed Envelope (2012). Exact agreement was achieved.
We compared results for superiority trials with a binary outcome with those given by Pocock (1983) and the online calculator Sealed Envelope (2012). Exact agreement was achieved.
We compared results from artbin, including continuity-corrected results, with those from Stata's power command in several scenarios. The results from both programs were in agreement.
We checked the results given by artbin using the margin() option against Julious and Owen (2011). Exact agreement was achieved.
The output of artbin was compared with Cytel’s software EAST, which is a sophisticated package able to produce sample-size and power calculations for several binary outcomes in clinical trial settings. We achieved perfect agreement in all but a handful of cases where the sample size differed by 1, which we believe is due to the difference in the way the packages round sample size.
For the new syntax options, we tested onesided for a one-sided test and ccorrect to apply a continuity correction.
We tested every combination of two-group and multigroup designs; noninferiority, substantial-superiority, and superiority trials with the margin option; local or distant alternatives; conditional or unconditional tests; and the trend and Wald test options. In each case, we checked that the results were as expected and that the sample size increased or decreased accordingly.
We ran several impossible cases to confirm that artbin returned the appropriate error messages.
We tested the dialog box menu options to verify that the results were as required.

7. Conclusions

We have written artbin to include new syntax with additional options, including extensions to the tests and methods offered by previous versions of the software. We have also refreshed the layout of the dialog box for artbin, with mutually exclusive options grayed out for clarity. The updated artbin program compares well with Stata’s power program, as well as other commercially available products such as Cytel’s EAST and the Sealed Envelope Calculator. One of the main features of artbin that sets it apart from the other available software in Stata is the range of trial types, statistical tests, and methods that it offers for sample-size calculation. Notably, Stata’s power can provide sample size for superiority trials only.

As noted in section 2.2, artbin reports power as the probability of rejecting the null hypothesis in the direction of interest, whereas power reports the probability of rejecting the null hypothesis in either direction if a two-tailed test is performed. We believe the former is more appropriate for a clinical trial. Technically, this procedure is conservative, but the difference matters only for unrealistically large alpha.
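The size of this difference can be illustrated with a short Python sketch. It assumes a standardized test statistic that is normal with unit variance under both hypotheses (that is, σ₀ = σ₁, a simplification), and `theta` is a hypothetical name for its mean under the alternative.

```python
from statistics import NormalDist

Z = NormalDist()

def directional_and_two_tailed_power(alpha, theta):
    """Return (power in the direction of interest, power in either direction)
    for a two-sided level-alpha test of a N(theta, 1) statistic."""
    z = Z.inv_cdf(1 - alpha / 2)
    in_direction = Z.cdf(theta - z)      # reject in the direction of interest
    wrong_direction = Z.cdf(-theta - z)  # reject in the opposite direction
    return in_direction, in_direction + wrong_direction

for alpha in (0.05, 0.2, 0.8):
    # Choose theta so that the directional power is 0.9 at this alpha.
    theta = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(0.9)
    direction, either = directional_and_two_tailed_power(alpha, theta)
    print(alpha, direction, either, either - direction)
```

Under this simplification, the gap between the two definitions is the probability of rejecting in the wrong direction, which is negligible at conventional significance levels and becomes visible only as alpha grows large.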

The majority of noninferiority trials are designed so that $π_{1}^{a} = π_{2}^{a}$ . However, artbin allows more flexibility where $π_{1}^{a}$ and $π_{2}^{a}$ can differ, as in section 3.5. The noninferiority margin is expressed on the risk-difference scale, and the results would be very different for other scales (Quartagno et al. 2020). All calculations in artbin are based on the approximation that the difference in proportions is normally distributed (or for the conditional case that the score statistic is normally distributed). This approximation may fail with very small sample sizes, in which case the continuity correction should be used. We suggest using the usual rule for the Pearson χ² test, namely, to mistrust the results when any expected cell count is lower than about 5. Concerned users should check the power by simulation.
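A simulation check of the kind suggested above might look like the following Python sketch. It is an illustration under assumptions, not part of artbin: it covers a two-group superiority design with equal allocation, uses the unconditional score (pooled-variance) test without continuity correction, treats a higher probability in group 2 as the direction of interest, and computes expected cell counts from the anticipated probabilities. The function names are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

def simulate_power(n_per_group, p1, p2, alpha=0.05, reps=2000, seed=1):
    """Estimate power in the direction p2 > p1 by simulating two-group trials."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided level alpha
    rejections = 0
    for _ in range(reps):
        x1 = sum(rng.random() < p1 for _ in range(n_per_group))
        x2 = sum(rng.random() < p2 for _ in range(n_per_group))
        pooled = (x1 + x2) / (2 * n_per_group)
        se0 = sqrt(2 * pooled * (1 - pooled) / n_per_group)
        if se0 == 0:
            continue  # no variation in outcomes: the test cannot reject
        z = (x2 - x1) / n_per_group / se0
        if z > z_crit:
            rejections += 1
    return rejections / reps

def min_expected_cell(n_per_group, p1, p2):
    """Smallest expected cell count of the 2x2 table under the pooled probability."""
    pooled = (p1 + p2) / 2
    return n_per_group * min(pooled, 1 - pooled)

print(min_expected_cell(392, 0.2, 0.3))  # compare against the rule of thumb of about 5
print(simulate_power(392, 0.2, 0.3))     # Monte Carlo estimate of power
```

The estimate carries Monte Carlo error of roughly sqrt(p(1 − p)/reps), so a larger `reps` is advisable when the simulated and calculated power are close to a decision boundary.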

We have not so far offered advice on which method to use. In our experience, analysts often use the score test for superiority trials and the Wald test for noninferiority trials. For small trials, conditional tests are often used. With small differences in probabilities, all tests give similar results. We recommend avoiding the Wald test when there are large differences in probabilities, and we would never use the local option except when comparing results from other programs.

Furthermore, the design of multigroup trials in artbin is based on testing the global null hypothesis that there is no difference between any of the groups. This contrasts with comparing each group with the control. Such pairwise comparisons can, however, be designed by applying the two-group case to each comparison; if the familywise error rate is to be controlled, this can be done by dividing alpha by the number of comparisons.
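The alpha-splitting approach can be sketched in Python as follows. This is an illustration only: the two-group formula is the normal-approximation one from section 5.3.2 with σ₀ and σ₁ chosen, as an assumption, for a difference in proportions with equal allocation, and the probabilities and number of comparisons are made up for the example.

```python
from math import sqrt
from statistics import NormalDist

def pairwise_total_n(p_control, p_active, alpha, power=0.9):
    """Unrounded total N for one two-group comparison (two-sided, equal allocation)."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    p_bar = (p_control + p_active) / 2
    sigma0 = sqrt(4 * p_bar * (1 - p_bar))
    sigma1 = sqrt(2 * (p_control * (1 - p_control) + p_active * (1 - p_active)))
    return ((z_a * sigma0 + z_beta * sigma1) / (p_active - p_control)) ** 2

comparisons = 3        # for example, three active groups each compared with control
alpha_family = 0.05    # familywise error rate to be controlled
alpha_each = alpha_family / comparisons

n_unadjusted = pairwise_total_n(0.2, 0.3, alpha_family)
n_adjusted = pairwise_total_n(0.2, 0.3, alpha_each)
print(alpha_each, n_unadjusted, n_adjusted)
```

Each value is the total for one control-versus-active pair, so in a trial with a shared control group the overall total is not simply the pairwise total multiplied by the number of comparisons.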

artbin has been created to assist the design of clinical trials, but it can also be used in the design of observational studies to explore a protective or harmful factor. The trial and outcome types may need to be reinterpreted; for example, for a harmful risk factor in an observational study, the favorable or unfavorable outcome types would be reversed. This would be an example of when the option force would be used. An observational study design to demonstrate a protective factor could be designed in exactly the same way as a trial, but the term superiority might be replaced by benefit. This is further described in the newly available artcat, a Stata program to calculate sample size or power for a two-group trial with an ordered categorical outcome (White et al. 2023).

A useful future extension will be for artbin to handle the conditional test for noninferiority or substantial-superiority trials.

Supplementary Material

Appendix

EMS178743-supplement-Appendix.pdf (363.8 KB, PDF)

Acknowledgements

This work was supported by the Medical Research Council Unit Programme number MC_UU_00004/09. We thank Henry Bern and Tim Morris for their very helpful comments and for testing the program.

Biographies

About the authors

Ella Marley-Zagar is a senior research associate and medical statistician in methodological software at the MRC Clinical Trials Unit in London, U.K. Her interests include developing new software and research into health and the environment, particularly issues affecting lower- and middle-income countries.

Ian White is a professor of statistical methods for medicine at the MRC Clinical Trials Unit in London, U.K., where he coleads programs of design of clinical trials, analysis of clinical trials, and meta-analysis. His research interests include study design, handling missing data and noncompliance in clinical trials, statistical models for meta-analysis, and simulation studies. He is the author of other Stata commands, including mvmeta, network, and simsum.

Patrick Royston is a medical statistician with more than 40 years of experience and a strong interest in biostatistical methods and in statistical computing and algorithms. He works largely in methodological issues in the design and analysis of clinical trials and observational studies. He is currently focusing on alternative outcome measures and tests of treatment effects in trials with a time-to-event outcome and nonproportional hazards, on parametric modeling of survival data, and on novel clinical trial designs.

Sophie Barthel is currently a functional manager of the real world solutions group at PRA/ICON PLC. Her work includes consultancy in clinical research in the areas of clinical trials and real world data. She is a published author of international research papers in statistics and eating disorders and has presented at many international conferences, including several invited presentations.

Mahesh Parmar is a professor of medical statistics and epidemiology and the director of the MRC Clinical Trials Unit at University College London and the Institute of Clinical Trials and Methodology at University College London. The unit he directs is at the forefront of resolving internationally important questions, particularly in infectious diseases, cancer, and more recently neurodegenerative diseases, and it also aims to deliver swifter and more effective translation of scientific research into patient benefits by carrying out challenging and innovative studies and by developing and implementing methodological advances in study design, conduct, and analysis. Examples of his methodological contributions include the development and implementation of the MAMS platform and DURATIONS designs.

Abdel Babiker is a professor of epidemiology and medical statistics at the MRC Clinical Trials Unit at University College London. He works on clinical trials in infectious diseases, including HIV, influenza, and COVID-19, and associated methodology.

Contributor Information

Ella Marley-Zagar, Email: e.marley-zagar@ucl.ac.uk, MRC Clinical Trials Unit, University College London, London, U.K.

Ian R. White, Email: ian.white@ucl.ac.uk, MRC Clinical Trials Unit, University College London, London, U.K.

Patrick Royston, Email: j.royston@ucl.ac.uk, MRC Clinical Trials Unit, University College London, London, U.K.

Friederike M.-S. Barthel, Email: sophie@fm-sbarthel.de, PRA/ICON PLC Germany, Mannheim, Germany.

Mahesh K. B. Parmar, Email: mp@ctu.mrc.ac.uk, MRC Clinical Trials Unit, University College London, London, U.K.

Abdel G. Babiker, Email: a.babiker@ucl.ac.uk, MRC Clinical Trials Unit, University College London, London, U.K.

References

Attanasio O, Kugler AD, Meghi C. Subsidizing vocational training for disadvantaged youth in Colombia: Evidence from a randomized trial. American Economic Journal: Applied Economics. 2011;3:188–220. doi: 10.1257/app.3.3.188.
Barthel FM-S, Babiker A, Royston P, Parmar MKB. Evaluation of sample size and power for multi-arm survival trials allowing for non-uniform accrual, non-proportional hazards, loss to follow-up and cross-over. Statistics in Medicine. 2006;25:2521–2542. doi: 10.1002/sim.2517.
Barthel FM-S, Royston P, Babiker A. A menu-driven facility for complex sample size calculation in randomized controlled trials with a survival or a binary outcome: Update. Stata Journal. 2005;5:123–129. doi: 10.1177/1536867X0500500114.
Blackwelder WC. “Proving the null hypothesis” in clinical trials. Controlled Clinical Trials. 1982;3:345–353. doi: 10.1016/0197-2456(82)90024-1.
Box GEP. Some theorems on quadratic forms applied in the study of analysis of variance problems, I. Effect of inequality of variance in the one-way classification. Annals of Mathematical Statistics. 1954;25:290–302. doi: 10.1214/aoms/1177728786.
Braga AA, Weisburd DL, Waring EJ, Mazerolle LG, Spelman W, Gajewski F. Problem-oriented policing in violent crime places: A randomized controlled experiment. Criminology. 1999;37:541–580. doi: 10.1111/j.1745-9125.1999.tb00496.x.
Clark T, Berger U, Mansmann U. Sample size determinations in original research protocols for randomised clinical trials submitted to UK research ethics committees: Review. BMJ. 2013;346:f1135. doi: 10.1136/bmj.f1135.
Farrington C, Manning G. Test statistics and sample size formulae for comparative binomial trials with null hypothesis of non-zero risk difference or non-unity relative risk. Statistics in Medicine. 1990;9:1447–1454. doi: 10.1002/sim.4780091208.
Fleiss JL, Tytun A, Ury HK. A simple approximation for calculating sample sizes for comparing independent proportions. Biometrics. 1980;36:343–346. doi: 10.2307/2529990.
Julious SA, Campbell MJ. Tutorial in biostatistics: Sample sizes for parallel group clinical trials with binary data. Statistics in Medicine. 2012;31:2904–2936. doi: 10.1002/sim.5381.
Julious SA, Owen RJ. A comparison of methods for sample size estimation for non-inferiority studies with binary outcomes. Statistical Methods in Medical Research. 2011;20:595–612. doi: 10.1177/0962280210378945.
Krause P, Fleming TR, Longini I, Henao-Restrepo AM, Peto R. COVID-19 vaccine trials should seek worthwhile efficacy. Lancet. 2020;396:741–743. doi: 10.1016/S0140-6736(20)31821-3.
Mathai A, Provost S. Quadratic Forms in Random Variables: Theory and Applications. Dekker; New York: 1992.
McCullagh P, Nelder JA. Generalized Linear Models. 2nd ed. Chapman & Hall/CRC; London: 1989.
Nunn AJ, Phillips PPJ, Meredith SK, Chiang C-Y, Conradie F, Dalai D, van Deun A, et al. A trial of a shorter regimen for rifampin-resistant tuberculosis. New England Journal of Medicine. 2019;380:1201–1213. doi: 10.1056/NEJMoa1811867.
Pocock SJ. Clinical Trials: A Practical Approach. Wiley; Chichester, U.K: 1983.
Pocock SJ. The pros and cons of noninferiority trials. Fundamental and Clinical Pharmacology. 2003;17:483–490. doi: 10.1046/j.1472-8206.2003.00162.x.
Quartagno M, Walker AS, Babiker AG, Turner RM, Parmar MKB, Copas A, White IR. Handling an uncertain control group event risk in non-inferiority trials: Non-inferiority frontiers and the power-stabilising transformation. Trials. 2020;21:145. doi: 10.1186/s13063-020-4070-4.
Quigley JM, Thompson JC, Halfpenny NJ, Scott DA. Critical appraisal of nonrandomized studies: A review of recommended and commonly used tools. Journal of Evaluation in Clinical Practice. 2019;25:44–52. doi: 10.1111/jep.12889.
Rencher AC, Schaalje GB. Linear Models in Statistics. 2nd ed. Wiley; Hoboken, NJ: 2008.
Royston P, Barthel FM-S. Projection of power and events in clinical trials with a time-to-event outcome. Stata Journal. 2010;10:386–394. doi: 10.1177/1536867X1001000306.
Satterthwaite FE. Synthesis of variance. Psychometrika. 1941;6:309–316. doi: 10.1007/BF02288586.
Sealed Envelope. Power calculator for binary outcome non-inferiority trial. 2012. https://www.sealedenvelope.com/power/binary-noninferior/
Welch BL. The significance of the difference between two means when the population variances are unequal. Biometrika. 1938;29:350–362. doi: 10.2307/2332010.
White IR, Marley-Zagar E, Morris TP, Parmar MKB, Royston P, Babiker AG. artcat: Sample-size calculation for an ordered categorical outcome. Stata Journal. 2023;23:3–23. doi: 10.1177/1536867X231161934.
Yuan K-H, Bentler PM. Two simple approximations to the distributions of quadratic forms. British Journal of Mathematical and Statistical Psychology. 2010;63:273–291. doi: 10.1348/000711009X449771.

