Abstract
Background:
Clinical trials often include interim analyses of the proportion of participants experiencing an event by a fixed time-point. A pre-specified proportion excluded from a corresponding confidence interval (CI) may lead an independent monitoring committee to recommend stopping the trial. Frequently this cumulative proportion is estimated by the Kaplan-Meier estimator with a Wald approximate CI, which may have coverage issues with small samples.
Methods:
We reviewed four alternative CI methods for cumulative proportions (Beta Product Confidence Procedure (BPCP), BPCP Mid P, Rothman-Wilson, Thomas-Grunkemeier) and two CI methods for simple proportions (Clopper-Pearson, Wilson). We conducted a simulation study comparing CI methods across true event proportions for 12 scenarios differentiated by sample sizes and censoring patterns. We re-analyzed interim data from A5340, an HIV cure trial considering the proportion of participants experiencing virologic failure.
Results:
Our simulation study highlights the lower and upper tail error probabilities for each CI method. Across scenarios, we found differences in the performance of lower versus upper bounds. No single method is always preferred. The upper bound of a Wald approximate CI performed reasonably with some error inflation, whereas the lower bound of the BPCP Mid P method performed well. For a trial design similar to A5340, we recommend BPCP Mid P.
Conclusions:
The design of future single-arm interim analyses of event proportions should consider the most appropriate CI method based on the relevant bound, anticipated sample size and event proportion. Our paper summarizes available methods, demonstrates performance in a simulation study, and includes code for implementation.
Keywords: clinical trials, single-arm trial, confidence intervals, cumulative proportion, interim review, clinical trials data monitoring committees
Background
Clinical trials frequently assess the proportion of participants experiencing an event of interest by a fixed time-point. For example, in trials of novel treatments for HIV cure, the proportion of participants experiencing virologic failure by week 8 of an antiretroviral treatment interruption (ATI) may be assessed (1). In tuberculosis treatment trials, the proportion of participants experiencing recurrence within a year is evaluated. Under complete follow-up, a simple proportion summarizes the participants experiencing the event. In a single-arm trial, statistical inference may be based on the lower or upper bound of the confidence interval (CI) around this simple proportion. There is extensive literature on constructing CIs for a simple proportion (2–7).
Furthermore, trials often include pre-specified interim analyses to consider the proportion of participants experiencing an event at an interim review while the trial is still ongoing and follow-up is incomplete. The primary purpose of such an interim review is for an independent monitoring committee, such as a Data and Safety Monitoring Board, to recommend stopping the trial if an unacceptably high or low proportion of participants experience an event. In this setting, a cumulative proportion of participants experiencing the event by a fixed time-point is a preferred summary because it accounts for incomplete follow-up. A pre-specified proportion excluded from the CI around the cumulative proportion estimate may result in the committee recommending stopping the trial. For example, in AIDS Clinical Trials Group (ACTG) A5340, which studied a neutralizing antibody for HIV cure (1), virologic failure by week 8 of an ATI was assessed. If an unacceptably high proportion of participants experienced virologic failure by week 8 at the interim review, the committee may recommend stopping the trial. To give a specific example, if the lower bound of the 90% CI for the cumulative proportion of participants experiencing virologic failure by week 8 of an ATI was above 0.65, there may be enough evidence to stop the trial.
The cumulative proportion of event occurrences by a fixed time-point is frequently estimated by the Kaplan-Meier estimator (8). The cumulative proportion is defined as one minus the survival probability. We describe several methods to estimate CIs for the Kaplan-Meier estimator by a fixed time-point. We assess the error probabilities of each approach at the lower and upper bounds in the context of the CI facilitating a decision at the interim review of a single-arm trial. In this context of an interim analysis, we compare methods for CI estimation for a cumulative proportion and a simple proportion limited to participants with complete follow-up.
We first describe five CI methods for a cumulative proportion based on the Kaplan-Meier estimator, and two CI methods for a simple proportion based on participants with complete follow-up. We then introduce our motivating example, A5340, with a focus on the virologic failure assessment at an interim review of this single-arm trial. Next, we describe our simulation study to assess the performance of the CIs under different sample sizes and censoring patterns due to incomplete follow-up across the complete range of true event probabilities. We describe the implications of the simulation results and revisit the motivating example in light of these results. We end with a Discussion.
Methods
The cumulative proportion of participants experiencing an event by a fixed time-point is frequently estimated using the Kaplan-Meier estimator (8). Statistical inference for the cumulative proportion based on the Kaplan-Meier estimator is commonly conducted via Greenwood’s variance formula (9). A CI is estimated using a Wald approximation with a transformation. A clog-log transformation, defined as log(-log(cumulative proportion)), has been applied in practice and ensures the CI bounds remain in the 0 to 1 range (10) [Table 1, first row]. While this clog-log transformed CI has been noted to have correct coverage with about 25 participants and up to 50% censoring (11), it may have coverage issues at smaller sample sizes or with a higher proportion censored. Additionally, the CI does not change when censoring occurs, which may result in incorrect error probabilities, particularly at the upper bound. Therefore, we explore alternative CIs that could perform better in finite samples or with high censoring proportions, as is typical at an interim analysis. We describe four alternative approaches to estimate the CI for a cumulative proportion.
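As an illustration of the Greenwood-based interval described above, the following is a minimal Python sketch, not the authors' shared code. It assumes the transformation is applied on the survival scale, log(-log(S)), as is common in survival software, and then converts the bounds to the cumulative-proportion scale. The function name `km_cloglog_ci` and the example data are hypothetical, and the degenerate (0, 0) and (1, 1) returns follow the convention the simulation study later adopts when the Wald interval is undefined.

```python
import math
from statistics import NormalDist


def km_cloglog_ci(times, events, t, conf=0.90):
    """Kaplan-Meier cumulative proportion at time t with a clog-log Wald CI.

    times  : observed times (event or censoring) for each participant
    events : 1 if the event was observed at that time, 0 if censored
    Returns (estimate, lower, upper) on the cumulative-proportion scale.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    surv = 1.0      # Kaplan-Meier survival estimate S(t)
    gw_sum = 0.0    # Greenwood sum of d / (n * (n - d)) over event times
    event_times = sorted({x for x, e in zip(times, events) if e and x <= t})
    if not event_times:
        return 0.0, 0.0, 0.0          # no events: degenerate convention
    for u in event_times:
        at_risk = sum(1 for x in times if x >= u)
        d = sum(1 for x, e in zip(times, events) if e and x == u)
        if d == at_risk:
            return 1.0, 1.0, 1.0      # S(t) = 0: degenerate convention
        surv *= 1 - d / at_risk
        gw_sum += d / (at_risk * (at_risk - d))
    se = math.sqrt(gw_sum) / abs(math.log(surv))   # SE of log(-log(S))
    theta = math.exp(z * se)
    surv_lower, surv_upper = surv ** theta, surv ** (1 / theta)
    # Bounds flip when moving from survival to cumulative proportion.
    return 1 - surv, 1 - surv_upper, 1 - surv_lower


# Hypothetical data: 5 events before week 8, 5 participants followed past week 8.
example = km_cloglog_ci([1, 2, 3, 4, 5, 9, 9, 9, 9, 9],
                        [1, 1, 1, 1, 1, 0, 0, 0, 0, 0], t=8)
print(example)
```

Because the standard error depends only on the numbers at risk at event times, the interval does not move when a participant is censored between events, which is the behavior noted in the paragraph above.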
Table 1:
Summary of Confidence Interval (CI) Methods and Results at the Interim Review for the Motivating Example
| Method | Type | Description | N | Proportion with Virologic Failure (90% CI) |
|---|---|---|---|---|
| 1. Greenwood (clog-log) | Cumulative proportion | | 10 | 0.88 (0.65, 0.99) |
| 2. BPCP (method of moments) (12) | Cumulative proportion | | 10 | 0.88 (0.56, 1.00) |
| 3. BPCP Mid P (method of moments) (13) | Cumulative proportion | | 10 | 0.88 (0.62, 1.00) |
| 4. Rothman-Wilson (15) | Cumulative proportion | | 10 | 0.88 (0.61, 0.97) |
| 5. Thomas-Grunkemeier (17) | Cumulative proportion | | 10 | 0.88 (0.64, 0.99) |
| 6. Clopper-Pearson (4) | Simple proportion | | 7 | 0.86 (0.48, 0.99) |
| 7. Wilson (7) | Simple proportion | | 7 | 0.86 (0.55, 0.97) |
Firstly, the Beta Product Confidence Procedure (BPCP) proposed by Fay et al. (12) is based on the fact that the order statistics of event times follow a beta distribution. The CI is constructed as the quantiles of a beta product induced by the beta distributions of these order statistics [Table 1, second row]. In the absence of censoring, the BPCP is exact and equivalent to the Clopper-Pearson exact CI for a simple proportion. Under certain independent censoring mechanisms, the BPCP guarantees central coverage by acting conservatively. For the upper bound the procedure looks forward and changes when censoring occurs, whereas for the lower bound the procedure looks backwards and does not change when censoring occurs (12, 13). The procedure ensures that both one-sided error rates are no more than half of the total nominal rate, so if a 95% CI is constructed, the error at each of the lower and upper bounds of the CI is less than or equal to 2.5%. For general independent censoring, simulations show that the BPCP maintains central coverage (12), and it is asymptotically equivalent to an untransformed Wald approximate CI based on Greenwood’s variance. However, since the BPCP is near exact, it is often conservative. Therefore, the second CI we consider is a mid-P version of the BPCP that aims to address the conservativeness of the regular BPCP (13) [Table 1, third row]. The traditional p-value is defined as the probability of observing equal or more extreme data under the null hypothesis, while the mid p-value is the probability of observing more extreme data plus one half the probability of observing equally extreme data (14). The BPCP Mid P is formed by taking the quantiles of a mixture of beta products looking backwards before each event and forwards after the event (13), and reduces to the mid-P CI for a simple proportion (2) when there is no censoring. This approach should produce CIs with coverage closer to the nominal value on average, but does not guarantee coverage.
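To make the mid-p idea concrete, the sketch below computes a mid-P CI for a simple binomial proportion, which is the special case the BPCP Mid P reduces to when there is no censoring. It is not an implementation of the BPCP Mid P itself. The function names (`midp_ci`, `binom_pmf`, `_solve_increasing`) are hypothetical, and the bounds are found by simple bisection.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability P(X = k) for n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def _solve_increasing(f, target, tol=1e-10):
    """Bisection for f(p) = target, where f is increasing in p on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def midp_ci(x, n, conf=0.90):
    """Mid-P CI for a simple proportion with x events among n participants."""
    alpha = 1 - conf

    def upper_tail(p):   # P(X > x) + 0.5 * P(X = x), increasing in p
        more = sum(binom_pmf(k, n, p) for k in range(x + 1, n + 1))
        return more + 0.5 * binom_pmf(x, n, p)

    def lower_tail(p):   # P(X < x) + 0.5 * P(X = x), decreasing in p
        fewer = sum(binom_pmf(k, n, p) for k in range(0, x))
        return fewer + 0.5 * binom_pmf(x, n, p)

    lower = 0.0 if x == 0 else _solve_increasing(upper_tail, alpha / 2)
    upper = 1.0 if x == n else _solve_increasing(lambda p: -lower_tail(p), -alpha / 2)
    return lower, upper


print(midp_ci(6, 7))   # 6 events among 7 participants, 90% mid-P CI
```

Counting only half the probability of the observed outcome shortens the interval relative to the exact method, which is why nominal coverage holds on average rather than being guaranteed.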
The third CI we explore is the Rothman-Wilson approach (15), which calculates the effective sample size for a cumulative proportion and then uses this effective sample size to construct a Wilson score CI that accounts for censoring [Table 1, fourth row]. This method considers the variance under the null (16) and reduces to the Wilson score CI in the absence of censoring. Lastly, a CI based on the empirical likelihood for a cumulative proportion was proposed by Thomas and Grunkemeier (17) [Table 1, fifth row]. This method forms the CI as the set of probabilities that would not be rejected by a likelihood ratio test, and reduces to the corresponding likelihood ratio CI for a simple proportion in the absence of censoring.
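The effective sample size idea can be sketched as follows. This assumes one common formulation, in which the effective sample size is the estimate times one minus the estimate, divided by its Greenwood variance, and a Wilson score interval is then computed as if that many participants had been fully observed. The function name `rothman_wilson_ci` and its inputs are hypothetical, and the exact formulation used in the paper's shared code may differ.

```python
import math
from statistics import NormalDist


def rothman_wilson_ci(cum_prop, greenwood_var, conf=0.90):
    """Wilson score CI using an effective sample size (assumed formulation).

    cum_prop      : Kaplan-Meier cumulative proportion, strictly between 0 and 1
    greenwood_var : Greenwood variance of the Kaplan-Meier estimate (> 0)
    Returns (effective_n, lower, upper).
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    # Sample size at which a binomial proportion would have this variance.
    n_eff = cum_prop * (1 - cum_prop) / greenwood_var
    denom = 1 + z**2 / n_eff
    centre = (cum_prop + z**2 / (2 * n_eff)) / denom
    half = z * math.sqrt(cum_prop * (1 - cum_prop) / n_eff
                         + z**2 / (4 * n_eff**2)) / denom
    return n_eff, centre - half, centre + half


# Without censoring, 5 events among 10 participants has variance 0.5 * 0.5 / 10,
# so the effective sample size is 10 and the result is the ordinary Wilson CI.
print(rothman_wilson_ci(0.5, 0.025))
```

With censoring, the Greenwood variance is larger than the binomial variance for the enrolled sample size, so the effective sample size falls below the number enrolled and the interval widens accordingly.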
As a comparison, we construct simple proportion CIs using two common approaches for small sample settings, namely the Clopper-Pearson exact method (4) [Table 1, sixth row] and the Wilson score method (7) [Table 1, seventh row]. The simple proportion estimate only includes participants with follow-up at or beyond the fixed time-point of interest. That means only participants enrolled early enough in the trial to have been observed for the event for the fixed time period are included to estimate the proportion and its CI.
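The two simple proportion intervals can be sketched in a few lines of standard-library Python. This is an illustrative sketch rather than the authors' shared code: the function names are hypothetical, and the Clopper-Pearson bounds are found by bisection on the binomial tail probabilities instead of beta quantiles.

```python
import math
from math import comb
from statistics import NormalDist


def binom_cdf(k, n, p):
    """P(X <= k) for a binomial with n trials and success probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(0, k + 1))


def _solve_increasing(f, target, tol=1e-10):
    """Bisection for f(p) = target, where f is increasing in p on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson_ci(x, n, conf=0.90):
    """Exact CI: each tail probability at the bound equals alpha / 2."""
    alpha = 1 - conf
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)      # P(X >= x)
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: -binom_cdf(x, n, p), -alpha / 2)            # P(X <= x)
    return lower, upper


def wilson_ci(x, n, conf=0.90):
    """Wilson score CI for x events among n participants."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = x / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# 6 of the first 7 enrolled participants with virologic failure, 90% CIs.
print([round(b, 2) for b in clopper_pearson_ci(6, 7)])
print([round(b, 2) for b in wilson_ci(6, 7)])
```

For 6 events among 7 participants, these functions give approximately (0.48, 0.99) and (0.55, 0.97), in agreement with the sixth and seventh rows of Table 1.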
Motivating Example
As a motivating example we consider the single-arm trial, A5340, that assessed virologic failure by week 8 [Supplemental Table 1]. In the design stage, a total sample size of 15 was pre-specified. The final analysis was based on a simple proportion excluding participants lost to follow-up, along with a 90% Clopper-Pearson CI. A pre-specified interim analysis guideline recommended stopping the study early if all of the first 7 enrolled participants experienced virologic failure, signifying an unacceptably high proportion of events. This is equivalent to the lower bound of the 90% CI for a simple proportion based on the Clopper-Pearson method being >0.65. When the interim review was conducted, 6 of the first 7 enrolled participants had experienced virologic failure by week 8, and 10 participants had been enrolled. Figure 1 (Panel A) is a swimmer plot of available data where the rows represent the time to virologic failure or censoring for each participant. The plot identifies the first 7 enrolled participants and the 3 additional participants who had been enrolled prior to the interim review. Panel B displays a Kaplan-Meier curve of the cumulative proportion with virologic failure by week 8.
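The stated equivalence can be checked directly. When all n participants have the event, the Clopper-Pearson lower bound solves p^n = alpha/2, so it has the closed form (alpha/2)^(1/n). The short sketch below evaluates this, using a hypothetical helper name.

```python
def cp_lower_all_events(n, conf=0.90):
    """Clopper-Pearson lower bound when all n participants have the event.

    The bound p solves P(X >= n | p) = p**n = alpha / 2.
    """
    alpha = 1 - conf
    return (alpha / 2) ** (1 / n)


for n in (6, 7):
    print(n, round(cp_lower_all_events(n), 3))
```

With a 90% CI, 7 events among 7 participants gives a lower bound of about 0.652, just above 0.65, whereas 6 among 6 gives about 0.607. With 6 events among 7, the lower bound is about 0.48 (Table 1, sixth row), so only the all-events outcome clears the threshold.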
Figure 1:

Swimmer Plot (A) and Kaplan-Meier Curve (B) of Virologic Failure at the Interim Review for the Motivating Example
We re-analyzed the interim review data using the CI approaches previously described. Table 1 and Figure 2 display the simple and cumulative proportion of participants with an event by week 8, along with a CI calculated using each method. The estimated cumulative proportion with virologic failure by week 8 using data from all 10 enrolled participants is 0.88, and the estimated simple proportion with virologic failure by week 8 using data from the first 7 enrolled participants with follow-up at or beyond week 8 is 0.86. As the interim review focuses on an unacceptably high virologic failure proportion, we focus on the CI lower bound. For the cumulative proportion CI methods, the lower bound estimates range from 0.56 to 0.65. The BPCP gives the smallest lower bound and the Greenwood (clog-log) method the highest. For the simple proportion CI methods, the lower bound estimates are 0.48 for Clopper-Pearson and 0.55 for Wilson. As there are fewer participants included in the simple proportion estimate, the CI from this approach is wider (less precise) than the CI based on the cumulative proportion. Since only 6 out of the first 7 enrolled participants experienced virologic failure, the independent monitoring committee did not recommend stopping the trial, as the guideline of all of the first 7 enrolled participants experiencing virologic failure was not met. However, if this trial decision guideline had instead been based on the cumulative proportion and the Greenwood (clog-log) CI, the lower bound would have been 0.65, closer to being deemed unacceptably high. We explore the behavior of each CI method in our simulation study in the next section to shed some light on preferred approaches.
Figure 2: Confidence Interval (CI) for the Cumulative Proportion of Virologic Failure by Each Method at the Interim Review for the Motivating Example.

The figure shows the cumulative proportion and simple proportion point estimates together with their respective 90% CIs. The 0.65 lower bound threshold for considering stopping the trial is indicated by a dotted line.
Simulation Study
Methods
We conducted a simulation study to assess the lower and upper tail error probabilities as well as central coverage for each CI method for 12 scenarios across the range of true event proportions from 0.01 to 0.99. We defined the 12 scenarios by the number of participants enrolled (n=15, 25, 50, 100) and one of three underlying censoring mechanisms. We simulated interim analyses of single-arm trial data based on a cumulative and simple proportion. We generated participant event times according to exponential distributions that corresponded with average cumulative event probabilities by week 8 from 0.01 to 0.99 (Supplemental Figure 1). We primarily generated corresponding participant-level follow-up times according to uniform distributions, with 80% of participants achieving follow-up of less than 8 weeks and the remaining 20% of participants achieving follow-up between 8 and 10 weeks at the simulated interim analysis (uniform censoring mechanism). Secondarily, we generated follow-up times for the 80% of participants with less than 8 weeks of follow-up from distributions where linearly more (linear upward censoring mechanism) or fewer (linear downward censoring mechanism) participants are censored closer to 8 weeks. We determined each simulated participant's time to event by taking the minimum of the event and follow-up time, along with an indicator specifying if the participant had the event (i.e. event time less than follow-up time).
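The uniform censoring mechanism described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' shared code: the function name and arguments are hypothetical, exactly 20% of participants (rounded) are given 8 to 10 weeks of follow-up, and the exponential rate is chosen so that the probability of an event by week 8 equals the target proportion.

```python
import math
import random


def simulate_interim_data(n, true_prop, rng, t=8.0, max_follow_up=10.0,
                          frac_complete=0.20):
    """Simulate (observed_time, had_event) pairs for one interim analysis.

    true_prop must lie strictly between 0 and 1.
    """
    # Exponential rate such that P(event time <= t) = true_prop.
    rate = -math.log(1 - true_prop) / t
    n_complete = round(frac_complete * n)
    data = []
    for i in range(n):
        event_time = rng.expovariate(rate)
        if i < n_complete:
            follow_up = rng.uniform(t, max_follow_up)   # complete follow-up
        else:
            follow_up = rng.uniform(0.0, t)             # incomplete follow-up
        observed = min(event_time, follow_up)
        had_event = int(event_time < follow_up)
        data.append((observed, had_event))
    return data


rng = random.Random(2024)   # arbitrary seed for reproducibility
for observed, had_event in simulate_interim_data(15, 0.5, rng)[:5]:
    print(round(observed, 2), had_event)
```

With n=15, three participants have follow-up to at least week 8, matching the "3 with sufficient follow-up" case discussed in the simulation results.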
We simulated 100,000 iterations for each scenario. For each iteration we estimated the 90% CI (two-sided alpha=0.10) for the cumulative or simple proportion estimate according to each of the CI methods (Greenwood (clog-log), BPCP, BPCP Mid P, Rothman-Wilson, Thomas-Grunkemeier, Clopper-Pearson, and Wilson). For iterations in which there were no observed events by the fixed time-point of 8 weeks, we set the CI for the cumulative proportion to be (0, 0) for the Greenwood clog-log, Rothman-Wilson and Thomas-Grunkemeier approaches. Similarly, if all participants had an observed event by the fixed time-point, we set the CI for the cumulative proportion to be (1, 1) for these three approaches. The other approaches naturally calculate an appropriate CI in these settings. We calculated the proportion of iterations where the lower bound exceeded the true proportion to estimate the lower error probability of the CI method. Similarly, we calculated the proportion of iterations where the upper bound was less than the true proportion to estimate the upper error probabilities. The central coverage is then given by 1 minus the upper and lower error probabilities. We repeated the simulation study for two-sided alpha=0.05, corresponding with a 95% CI. We applied all CI methods for each censoring mechanism even though the Clopper-Pearson and Wilson CIs are not affected by the differing censoring patterns prior to week 8.
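The tail error and coverage calculations described above amount to simple counting over the simulated intervals. A minimal sketch, with a hypothetical function name and made-up intervals used purely for illustration:

```python
def tail_error_probabilities(intervals, true_prop):
    """Estimate lower error, upper error, and central coverage.

    intervals : list of (lower, upper) CI bounds, one per simulated iteration
    true_prop : the true event proportion used to generate the data
    """
    n_iter = len(intervals)
    # Lower error: the whole interval sits above the truth.
    lower_err = sum(1 for lo, hi in intervals if lo > true_prop) / n_iter
    # Upper error: the whole interval sits below the truth.
    upper_err = sum(1 for lo, hi in intervals if hi < true_prop) / n_iter
    return lower_err, upper_err, 1 - lower_err - upper_err


# Three made-up intervals evaluated against a true proportion of 0.4.
print(tail_error_probabilities([(0.1, 0.3), (0.5, 0.9), (0.2, 0.6)], 0.4))
```

Under this counting, a degenerate interval of (0, 0) contributes to the upper error and (1, 1) to the lower error whenever the true proportion lies strictly between 0 and 1.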
Results
Figures 3–5 show the simulation study results for each scenario using a two-sided alpha of 0.10. Results for a two-sided alpha of 0.05 are in the supplementary materials. As anticipated, the performance of each CI method improves with increased sample size and for true event proportions nearer to the center (50%). However, the error probabilities are not symmetric. That is, we see stark differences in the performance of lower versus upper bound error probabilities across the investigated scenarios and CI methods. Furthermore, the error probabilities follow a jagged, sawtooth pattern, oscillating between lower and higher values across the range of true event probabilities.
Figure 3:

Simulation Results for a Uniform Censoring Mechanism Across the Range of Cumulative Probabilities Using a Two-sided Alpha of 0.10
Figure 5:

Simulation Results for a Linear Downward Censoring Mechanism Across the Range of Cumulative Probabilities Using a Two-sided Alpha of 0.10
With the uniform censoring mechanism, for lower true event proportions (i.e. between about 0.1 and 0.25) and small sample sizes, the lower error probability is closest to nominal for Thomas-Grunkemeier. As sample size increases to 50 and 100, the BPCP Mid P method also becomes close to nominal. In contrast, for small sample sizes and lower true event proportions, the upper error probability markedly suffers for the Greenwood (clog-log), Rothman-Wilson, and Thomas-Grunkemeier methods. For the largest sample size of 100 participants (20 with sufficient follow-up), Greenwood (clog-log) approaches nominal central coverage with slight error inflation at the upper bound. Across all sample sizes, the BPCP methods exhibit over-coverage, especially at the upper bound.
With the uniform censoring mechanism for true event proportions near the center (i.e. between 0.4 and 0.6) and a sample size of 15 (3 with sufficient follow-up), the lower error probability is closest to nominal for BPCP Mid P. With a sample size of 25 (5 with sufficient follow-up), the BPCP Mid P, Rothman-Wilson, and Thomas-Grunkemeier methods all have close to nominal lower error probabilities while Greenwood (clog-log) has upper error slightly higher but closest to 5.0%.
With the uniform censoring mechanism for higher true event proportions (i.e. greater than 0.75) and sample sizes of 15 and 25, the lower error probability shows either extreme over-coverage or extreme under-coverage across all methods. At the upper bound, Greenwood (clog-log) performs best, with an error probability higher than the nominal level except at true event proportions approaching 1.0. With increased sample size, the lower bound error probabilities become closer to nominal. For very high true event proportions, the lower bound error probability markedly suffers for Greenwood (clog-log) and Rothman-Wilson, while the remaining methods have zero error probabilities. Even with a sample size of 100, no method results in nominal error at either bound.
As anticipated (2,3), the coverage of the Wilson interval for a simple proportion oscillates around 90% with nominal coverage on average. In contrast, the exact Clopper-Pearson interval has greater than 90% coverage, with oscillations above 90% due to the discreteness of the binomial distribution.
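For a simple proportion, the oscillation can be reproduced without simulation, because coverage at a given true proportion is a finite sum over the possible binomial outcomes. The sketch below computes exact coverage of the 90% Wilson interval for 20 participants with complete follow-up. The function names are hypothetical and this is an illustration rather than part of the paper's simulation code.

```python
import math
from math import comb
from statistics import NormalDist


def wilson_ci(x, n, conf=0.90):
    """Wilson score CI for x events among n participants."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = x / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def exact_coverage(ci_func, n, true_prop, conf=0.90):
    """Sum binomial probabilities of the outcomes whose CI covers true_prop."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = ci_func(x, n, conf)
        if lo <= true_prop <= hi:
            total += comb(n, x) * true_prop**x * (1 - true_prop) ** (n - x)
    return total


grid = [i / 100 for i in range(1, 100)]
wilson_cov = [exact_coverage(wilson_ci, 20, p) for p in grid]
print(round(min(wilson_cov), 3), round(max(wilson_cov), 3))
```

Coverage jumps each time the true proportion crosses a CI bound for one of the 21 possible outcomes, which produces the sawtooth shape. Over this grid the Wilson coverage falls both below and above 90%.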
For the linear upward and downward censoring mechanisms, we broadly see results comparable to those for the uniform mechanism. There are some subtle differences at the upper bound to highlight. Firstly, when more participants are censored closer to week 8 (Figure 4), more simulation replicates have censoring after the last event prior to week 8. This subtly influences the upper bound for the BPCP intervals, where the error probabilities become lower since the methods act conservatively. The Greenwood (clog-log), Rothman-Wilson and Thomas-Grunkemeier intervals also have lower errors at the upper bound, perhaps because there is more information at week 8 since relatively more participants are followed longer. Secondly, when fewer participants are censored closer to week 8 (Figure 5), fewer simulation replicates have censoring after the last event prior to week 8. This results in higher upper bound error probabilities for the BPCP intervals, making them closer to nominal, as well as higher upper bound error probabilities for Greenwood (clog-log), Rothman-Wilson and Thomas-Grunkemeier, causing error inflation, perhaps because more participants are followed for a shorter period.
Figure 4:

Simulation Results for a Linear Upward Censoring Mechanism Across the Range of Cumulative Probabilities Using a Two-sided Alpha of 0.10
Implications of the Simulation Results
The simulation study results highlight that the preferred CI method at interim analysis depends on the bound of interest, the anticipated sample size and the event probability. No single method is always preferred across the range of scenarios explored. However, if one is compelled to make a general statement, Greenwood (clog-log) tends to perform reasonably at the upper bound with some error inflation, while BPCP Mid P tends to perform reasonably at the lower bound and often acts conservatively.
We revisit our motivating example to reflect on which CI method would have been most appropriate at the interim review. Given interest was in recommending stopping the trial if an unacceptably high proportion of participants experience virologic failure by week 8, we focus on the lower CI bound and recommend the BPCP Mid P CI because the lower bound error probability is closest to nominal for small sample sizes and at high true event proportions. Furthermore, the BPCP Mid P utilizes all the available data. In our re-analysis, the lower bound of the BPCP Mid P CI was 0.62 [Table 1 and Figure 2, third row], which is below the 0.65 threshold and therefore study continuation is appropriate. This is consistent with the decision made by the committee in the actual trial, which was based on fewer than 7 of the first 7 enrolled participants experiencing virologic failure.
When designing future trials with a small sample size that anticipate high event proportions, we might recommend not including a stopping guideline based on the lower confidence bound, because the lower error probability is not close to nominal for any CI method and therefore an interim analysis would not be sufficient to lead to an informed decision to stop the trial early.
Conclusion
We have reviewed five CI methods for a cumulative proportion in the context of utilizing all available data to make decisions at interim analyses of single-arm trials. The cumulative proportion approach was additionally compared with two CI methods based on a simple proportion that only use some of the available information. Our simulation study clearly reveals a difference in the performance of each method at the upper and lower confidence bounds across a range of sample sizes and event proportions. Our illustrative example demonstrates how the choice of CI method may lead an independent monitoring committee to make a different decision regarding their recommendation to continue or stop a trial.
As the best CI approach is dependent on the particular scenario, we recommend researchers base their decision of which CI to use on the bound of interest, sample size and likely true event probabilities in their trial. We have shared code to enable researchers to implement a simulation study in the design stage of their trial so an appropriate CI procedure can be chosen. In general, as previously noted in the literature (12, 18), the Greenwood CI approach performs poorly at the lower bound, particularly for small sample sizes and at high event probabilities. Therefore, we would not recommend this approach when the interim analysis is to be based on the lower confidence bound. We would favor the BPCP Mid P approach for the lower bound in many scenarios. In contrast, at the upper bound the Greenwood (clog-log) CI approach performs reasonably, albeit with some error inflation, in the 12 simulation scenarios we considered. It should be noted that when zero or all participants experience the event by the fixed time-point, the Greenwood, Rothman-Wilson and Thomas-Grunkemeier approaches cannot estimate a CI and therefore we set the CIs to (0,0) or (1,1), respectively, in our simulation study. However, when zero or all participants experience the event in practice, it is best to estimate a BPCP based CI.
Interim reviews that focus on stopping for harm should decide to stop the trial if too many harmful events occur. For this type of interim monitoring, the lower CI bound for a cumulative proportion of harmful events is most important. The Greenwood approach would perform poorly in this scenario, leading to trials being unnecessarily stopped early. The BPCP Mid P approach is better, and would lead to a higher chance of correct decisions. In our illustrative example, the interim review focused on virologic failure. However, a safety outcome may be of primary interest instead, with the same interpretation of our results, where a trial should be stopped early if an unacceptably high proportion of participants experience severe adverse events. Alternatively, interim reviews that focus on futility should decide to stop a trial early if not enough participants have a good outcome. The focus is then on the upper bound, and the trial is stopped if the upper bound of the CI is below a threshold. Therefore, a Greenwood (clog-log) CI may be appropriate when assessing futility.
In the context of interim reviews where the CI is being used as a tool for choosing an acceptable stopping guideline, it is important to calculate CIs with the correct coverage, yet adjustment of the CI for repeated analyses may not be a major concern. In this setting, type I error is typically not formally controlled, particularly when monitoring for adverse events, as strong statistical evidence is not needed to declare a safety concern. However, it is possible to formally control the type I error by adjusting the length of the CIs calculated at the interim and final analysis by using a repeated CI. For example, via Pocock boundaries (19), if the trial has an overall 10% two-sided alpha, then instead of constructing a 90% CI at both the interim and final analyses, a 94% CI could be considered at each analysis. Other alpha spending strategies could be considered (20).
The results from this simulation study could be applicable beyond interim reviews of single-arm trials. At the final analysis a trial may be designed to exclude a threshold from the CI around the proportion of participants experiencing an event (21). As the Kaplan-Meier estimator can account for non-informative loss to follow-up, the cumulative proportion estimate from the Kaplan-Meier curve is frequently used in final analyses as well as at interim reviews. The best CI to use in this scenario accounting for loss to follow-up could be investigated using our shared code.
We have focused on interim monitoring of single-arm trials. Randomized two-arm trials also commonly assess the proportion of participants experiencing an event by a fixed time-point. In this case, a CI for the difference in cumulative proportions between arms is relevant. Typically, a CI for the difference in cumulative proportions is based on normal approximation. In the testing framework a clog-log transformation is generally the best approach among approximate methods at retaining the type I error rate (11). However, it can fail to retain the type I error rate with small samples or a high censoring proportion. Therefore, an approach to meld two BPCP CIs from each arm has been proposed (22). Further work could explore the best approach to interim reviews based on monitoring a cumulative proportion in the two-arm setting.
In conclusion, the best CI procedure to use when assessing a cumulative proportion at the interim review of a single-arm trial depends on the bound of interest, sample size, and true event proportion. Our study can guide CI choice under various interim review scenarios and our shared code allows for exploration of other settings.
Reproducibility
De-identified motivating example data and code for CI method implementation and simulation are provided: https://github.com/iweir/Confidence-Intervals-in-Single-Arm-Interim-Analyses.
Supplementary Material
Acknowledgements
Research reported in this publication was supported by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under Award Numbers UM1 AI068634, UM1 AI068636 and UM1 AI106701. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. We thank the AIDS Clinical Trials Group (ACTG) A5340 trial participants and team (ClinicalTrials.gov number NCT02463227).
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Conflicts of Interest
The Authors declare that there are no conflicts of interest.
Declaration of interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
- 1. Bar KJ, Sneller MC, Harrison LJ, Justement JS, Overton ET, Petrone ME, et al. Effect of HIV Antibody VRC01 on Viral Rebound after Treatment Interruption. The New England journal of medicine. 2016;375(21):2037–50.
- 2. Agresti A, Coull BA. Approximate Is Better than “Exact” for Interval Estimation of Binomial Proportions. The American Statistician. 1998;52(2):119–26.
- 3. Brown LD, Cai TT, DasGupta A. Interval Estimation for a Binomial Proportion. Statistical Science. 2001;16(2):101–17.
- 4. Clopper CJ, Pearson ES. The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial. Biometrika. 1934;26(4):404–13.
- 5. Newcombe RG. Two-sided confidence intervals for the single proportion: comparison of seven methods. Statistics in medicine. 1998;17(8):857–72.
- 6. Vollset SE. Confidence intervals for a binomial proportion. Statistics in medicine. 1993;12(9):809–24.
- 7. Wilson EB. Probable Inference, the Law of Succession, and Statistical Inference. Journal of the American Statistical Association. 1927;22(158):209–12.
- 8. Kaplan EL, Meier P. Nonparametric Estimation from Incomplete Observations. Journal of the American Statistical Association. 1958;53(282):457–81.
- 9. Greenwood M. The natural duration of cancer. Reports on Public Health and Medical Subjects. 1926;33:1–26.
- 10. Anderson JR, Bernstein L, Pike MC. Approximate Confidence Intervals for Probabilities of Survival and Quantiles in Life-Table Analysis. Biometrics. 1982;38(2):407–16.
- 11. Klein JP, Logan B, Harhoff M, Andersen PK. Analyzing survival curves at a fixed point in time. Statistics in medicine. 2007;26(24):4505–19.
- 12. Fay MP, Brittain EH, Proschan MA. Pointwise confidence intervals for a survival distribution with small samples or heavy censoring. Biostatistics. 2013;14(4):723–36.
- 13. Fay MP, Brittain EH. Finite sample pointwise confidence intervals for a survival distribution with right-censored data. Statistics in medicine. 2016;35(16):2726–40.
- 14. Lancaster HO. Significance Tests in Discrete Distributions. Journal of the American Statistical Association. 1961;56(294):223–34.
- 15. Rothman KJ. Estimation of confidence limits for the cumulative probability of survival in life table analysis. Journal of chronic diseases. 1978;31(8):557–60.
- 16. Dorey FJ, Korn EL. Effective sample sizes for confidence intervals for survival probabilities. Statistics in medicine. 1987;6(6):679–87.
- 17. Thomas DR, Grunkemeier GL. Confidence Interval Estimation of Survival Probabilities for Censored Data. Journal of the American Statistical Association. 1975;70(352):865–71.
- 18. Yuan X, Rai SN. Confidence Intervals for Survival Probabilities: A Comparison Study. Communications in Statistics - Simulation and Computation. 2011;40(7):978–91.
- 19. Pocock SJ. Group sequential methods in the design and analysis of clinical trials. Biometrika. 1977;64(2):191–9.
- 20. East 6 (2020). Statistical software for the design, simulations and monitoring of clinical trials. Cytel Inc., Cambridge MA.
- 21. Zheng L, Rosenkranz SL, Taiwo B, Para MF, Eron JJ Jr., Hughes MD. The design of single-arm clinical trials of combination antiretroviral regimens for treatment-naïve HIV-infected patients. AIDS research and human retroviruses. 2013;29(4):652–7.
- 22. Fay MP, Proschan MA, Brittain E. Combining one-sample confidence procedures for inference in the two-sample case. Biometrics. 2015;71(1):146–56.
