Author manuscript; available in PMC: 2020 Aug 26.
Published in final edited form as: Contemp Clin Trials. 2006 Oct 14;28(4):343–347. doi: 10.1016/j.cct.2006.10.006

A Permutation Test for a Weighted Kaplan-Meier Estimator with Application to the Nutritional Prevention of Cancer Trial

Paul H Frankel 1, Mary E Reid 2, James R Marshall 3
PMCID: PMC7449600  NIHMSID: NIHMS25195  PMID: 17150413

Abstract

The phenomenon of losing statistical significance with increasing follow-up can arise when a proportional hazards model is applied in a clinical trial where the intervention delays a negative event such as cancer diagnosis, progression or death. Parametric methods can often be employed in such a setting; however, in studies where only a small percentage of subjects have an event, these methods are often inappropriate. We present an alternative method based on a weighted Kaplan-Meier estimator and a permutation test, and demonstrate its utility in the setting of the Nutritional Prevention of Cancer study, where increasing follow-up resulted in loss of statistical significance for the ability of selenized yeast to prevent lung cancer.

Keywords: selenium, power, loss of power, weighted Kaplan-Meier, permutation test, proportional hazards, nonparametric

1. Introduction

The early clinical evidence of the chemopreventive role of selenium is considerable. Selenium-based clinical intervention studies demonstrate reductions in primary liver cancer [1], esophageal and gastric cancers [2, 3], and decreased adduct formation in oral lesions [4]. Observational studies of Barrett’s esophagus and lymphoma also demonstrate the clinical relevance of serum selenium levels. In addition, animal data have demonstrated clear chemopreventive effects in a variety of carcinogenic challenge studies [5].

The Nutritional Prevention of Cancer trial, however, is the most definitive selenium study to date. It studied the effect of high selenium yeast on cancer incidence and mortality in a randomized study consisting of 1312 patients followed from 1983 to 1993 with a total follow-up of 8271 person-years [6]. Consistent with the benefit observed in the earlier data, the authors noted roughly a 50% decrease in the risk of colorectal cancer, prostate cancer, lung cancer and all carcinomas combined. These results were statistically significant, and adjusting for multiple comparisons due to the exploratory nature of the non-skin cancer findings does not mute enthusiasm for the general reduction in all carcinomas combined.

While this result has spawned a variety of large cooperative prevention studies in diseases ranging from prostate cancer to lung cancer, a more recent paper based on the Nutritional Prevention of Cancer trial demonstrated that the protective effect of selenized yeast on lung cancer noted in Clark’s original paper was no longer statistically significant when follow-up was updated in 1996 [7]. This was due to 8 additional cases in the selenium group in comparison to only 4 cases in the placebo group.

The phenomenon of losing statistical significance with increasing follow-up is a general problem that can arise when proportional hazards models are applied in a setting involving delayed events, and it does not necessarily suggest that the treatment lacked effect. In fact, if the main effect of the selenized yeast was to slow the growth of pre-existing subclinical lung cancer lesions, proportional hazards would be inappropriate, and we have previously demonstrated that the log-rank test would lose power with increasing follow-up, even if standard diagnostic tests failed to detect violations of proportional hazards [8]. Alternative tests based on accelerated failure or accelerated failure with cure were previously employed to enhance the power to observe such effects, but they suffer from identifiability issues due to the low incidence rate.

Clearly, proportional hazards appears inappropriate here: initially, proportionately fewer patients on the selenium arm were diagnosed with lung cancer, whereas later proportionately more patients on the selenium arm were diagnosed. Given this and the non-identifiability problem of the parametric tests in this prevention setting, an entirely different test is needed.

2. Weighted Kaplan-Meier Statistic

The test statistic chosen, a Weighted Kaplan-Meier (WKM) statistic, is motivated by the fact that the area under a survival curve is an estimate of the mean failure time. When there is censoring, it is still a reasonable estimate, where the censored times (ordered from last to first) are recursively replaced with the mean of the failure times to follow. Hence the difference in areas is a measure of the difference in mean failure times [9]. A statistical test based on such a metric is well suited to detecting a growth delay, which shows up as a difference in failure times, whereas methods such as the log-rank statistic would have deflated power due to the deviation from proportional hazards.
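The relationship between the area under a Kaplan-Meier curve and the mean failure time can be checked numerically. The following is a minimal sketch, not the authors' software; the function names `kaplan_meier` and `area_under_km` and the toy data are illustrative assumptions. With no censoring the area equals the sample mean, and with one censored time the area matches the mean obtained after replacing that time with the mean of the later failure times.

```python
import numpy as np


def kaplan_meier(time, event):
    """Return the distinct event times and the Kaplan-Meier estimate just after each."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    surv = np.empty(len(event_times))
    s = 1.0
    for i, t in enumerate(event_times):
        at_risk = np.sum(time >= t)            # subjects still under observation at t
        failures = np.sum((time == t) & event)  # events occurring exactly at t
        s *= 1.0 - failures / at_risk
        surv[i] = s
    return event_times, surv


def area_under_km(time, event, tau=None):
    """Integrate the KM step function from 0 to tau (default: largest observed time)."""
    event_times, surv = kaplan_meier(time, event)
    tau = float(np.max(time)) if tau is None else float(tau)
    keep = event_times < tau
    knots = np.concatenate(([0.0], event_times[keep], [tau]))
    heights = np.concatenate(([1.0], surv[keep]))  # curve height on each interval
    return float(np.sum(heights * np.diff(knots)))


if __name__ == "__main__":
    # No censoring: the area is (up to rounding) the sample mean of 2, 4, 6.
    print(area_under_km([2, 4, 6], [1, 1, 1]))
    # Time 3 is censored; replacing it with mean(4, 6) = 5 gives mean(2, 5, 4, 6) = 4.25.
    print(area_under_km([2, 3, 4, 6], [1, 0, 1, 1]))
```

If the largest observed time is censored, the curve does not reach zero and the area is a mean restricted to the observation window rather than the full mean.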

Generally, the relationship between the mean failure time, μ, and a given survival curve S(t) is as follows:

$$\mu = \int_0^{\infty} S(t)\,dt \qquad (1)$$

where S(t) is the proportion surviving at time t. If we have two survival curves, in this case one for placebo and one for selenized yeast, we can write:

$$\mu_1 - \mu_2 = \int_0^{\infty} \left[ S_1(t) - S_2(t) \right] dt \qquad (2)$$

However, due to censoring, and decreasing information towards the tail-end of the survival curve, the WKM method for generating a test statistic weights the difference in Kaplan-Meier estimators:

$$WKM = \int_0^{\infty} w(t) \left[ \hat{S}_1(t) - \hat{S}_2(t) \right] dt \qquad (3)$$

That WKM ≠ μ1 − μ2 reflects that we are trading off interpretability for enhanced power. This is appropriate if we are not estimating μ1 − μ2, and are instead interested in testing the hypothesis that S1(t) = S2(t) for all t.

There are a variety of weighting functions that can be used, with fairly reasonable limitations [9]. One weighting is the inverse variance weighting function:

$$w(t) = \frac{k}{\operatorname{var}(\hat{S}_1(t)) + \operatorname{var}(\hat{S}_2(t))} \qquad (4)$$

This weights the area differences based on the level of confidence in the survival estimates, which naturally tends to give higher weight to the earlier part of the survival curve. The constant k is usually fixed to ensure that the weights sum to unity. For our purposes, however, such renormalization would neither affect the determination of statistical significance nor enhance interpretability, so we set k = 1. We use Greenwood’s formula for the variances [13], which, while not uniformly down-weighting later events in the survival curve, is a reasonable method for this application.

To use this method, we must consider that the variance is zero prior to any events. However, since both curves are at unity there, the difference is zero, and we start the integral after the first event. To avoid overweighting the last events, we can truncate the integral at the last time when there are events in both groups.
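The inverse-variance weighting and the integration limits described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the names `km_and_greenwood` and `wkm_inverse_variance` are hypothetical, group membership is assumed to be coded 1 versus anything else, "the last time when there are events in both groups" is read as the earlier of the two groups' last event times, and intervals where the summed variance is zero receive zero weight.

```python
import numpy as np


def km_and_greenwood(time, event, grid):
    """Kaplan-Meier estimate and Greenwood variance just after each grid time."""
    surv, var = np.ones(len(grid)), np.zeros(len(grid))
    for i, g in enumerate(grid):
        s, acc = 1.0, 0.0
        for t in np.unique(time[event & (time <= g)]):
            n = np.sum(time >= t)            # number at risk just before t
            d = np.sum((time == t) & event)  # events at t
            s *= 1.0 - d / n
            if n > d:
                acc += d / (n * (n - d))     # running Greenwood sum
        surv[i], var[i] = s, s * s * acc
    return surv, var


def wkm_inverse_variance(time, event, group, k=1.0):
    """Weighted area between two KM curves (group 1 minus the other group)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    in1 = np.asarray(group) == 1
    ev1, ev2 = time[event & in1], time[event & ~in1]
    if ev1.size == 0 or ev2.size == 0:
        return 0.0
    start = min(ev1.min(), ev2.min())  # first event in either group
    stop = min(ev1.max(), ev2.max())   # assumed reading of the truncation rule
    pooled = np.unique(time[event])
    grid = pooled[(pooled >= start) & (pooled <= stop)]
    if grid.size < 2:
        return 0.0
    s1, v1 = km_and_greenwood(time[in1], event[in1], grid)
    s2, v2 = km_and_greenwood(time[~in1], event[~in1], grid)
    total = (v1 + v2)[:-1]             # variance on each interval [grid[i], grid[i+1])
    w = np.zeros_like(total)
    np.divide(k, total, out=w, where=total > 0)  # zero weight where variance is zero
    return float(np.sum(w * (s1 - s2)[:-1] * np.diff(grid)))


if __name__ == "__main__":
    t = [1, 2, 3, 4, 5, 6, 7, 8]
    e = [1] * 8
    g = [0, 0, 0, 0, 1, 1, 1, 1]       # group 1 fails later than group 0
    print(wkm_inverse_variance(t, e, g))  # positive: group 1 has better survival
```

Both curves are step functions, so the integral reduces to a sum over the intervals between pooled event times. Swapping the group labels changes only the sign of the result.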

Based on this, we can arrive at a WKM metric motivated by the area between the curves. To convert this metric into a statistic, we generate a permutation test:

  1. For each simulation, each patient is randomly assigned to one of the two groups. This represents the null hypothesis that treatment group had no role in deciding patient fate.

  2. Generate survival curves based on the randomly assigned treatment group and calculate the WKM metric.

  3. Repeat simulation. The number of simulations can be determined by pre-specifying a maximum standard error in the probability of a type I error under the binomial distribution. In our example, we used 1000, 5000 and 10000 simulations to demonstrate that the estimate was stable and obtain a small SE in the estimate (≈ 0.002).

  4. Determine the percentage of times the simulated WKM statistic was more extreme in absolute value than the statistic from the observed data. This is the two-sided p-value. The one-sided p-value is obtained by avoiding the use of the absolute value. The one-sided test is a more natural estimate of the type I error in this case, but we have presented the two-sided p-value for comparison with the log-rank based p-values presented in the previous publications.
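The four steps above can be sketched in a few lines. This is a hedged illustration, not the code used for the trial analysis. The names `km_area`, `area_difference` and `permutation_test` are hypothetical, and the stand-in statistic is the unweighted difference in Kaplan-Meier areas up to the largest pooled time; any WKM-type function with the same signature could be passed instead. Group labels are shuffled with the group sizes held fixed, and ties with the observed value are counted as extreme. Both are assumptions, not details given in the text.

```python
import numpy as np


def km_area(time, event, tau):
    """Area under the Kaplan-Meier step function on [0, tau]."""
    area, s, last = 0.0, 1.0, 0.0
    for t in np.unique(time[event & (time <= tau)]):
        area += s * (t - last)
        s *= 1.0 - np.sum((time == t) & event) / np.sum(time >= t)
        last = t
    return area + s * (tau - last)


def area_difference(time, event, group):
    """Stand-in statistic: KM area of group 1 minus KM area of group 0."""
    tau = time.max()  # pooled maximum, unchanged by relabelling
    in1 = group == 1
    return km_area(time[in1], event[in1], tau) - km_area(time[~in1], event[~in1], tau)


def permutation_test(statistic, time, event, group, n_sim=1000, seed=0):
    """Return (observed statistic, two-sided p-value, one-sided p-value)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=int)
    rng = np.random.default_rng(seed)
    observed = statistic(time, event, group)
    # Steps 1-3: shuffle the treatment labels and recompute the statistic each time.
    sims = np.array([statistic(time, event, rng.permutation(group))
                     for _ in range(n_sim)])
    # Step 4: proportion of simulated statistics at least as extreme as the observed one.
    two_sided = float(np.mean(np.abs(sims) >= abs(observed)))
    one_sided = float(np.mean(sims >= observed))
    return observed, two_sided, one_sided


if __name__ == "__main__":
    time = np.arange(1, 21)
    event = np.ones(20, dtype=bool)
    group = np.array([0] * 10 + [1] * 10)  # group 1 fails strictly later
    print(permutation_test(area_difference, time, event, group, n_sim=500))
```

The one-sided p-value here tests for larger values of the statistic, that is, better survival in group 1; reverse the inequality for the opposite direction.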

The result for these data is that the two-sided p-value with the WKM statistic is 0.07, whereas the log-rank statistic p-value is 0.21. See figure 1, which depicts the Kaplan-Meier survival curves for selenium versus placebo, along with estimates of the standard error at 24 and 60 months.

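Figure 1. Kaplan-Meier survival curves for selenium versus placebo, with estimates of the standard error at 24 and 60 months.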
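The choice of the number of simulations in step 3 of the permutation procedure rests on the binomial standard error of a simulated p-value. A small sketch follows; the helper names `mc_standard_error` and `simulations_needed` are illustrative assumptions. For a p-value near 0.035 and 10000 simulations the standard error is roughly 0.0018, and for a p-value near 0.07 it is roughly 0.0026.

```python
import math


def mc_standard_error(p, n_sim):
    """Binomial standard error of a p-value estimated from n_sim simulations."""
    return math.sqrt(p * (1.0 - p) / n_sim)


def simulations_needed(p, max_se):
    """Smallest number of simulations giving a standard error of at most max_se."""
    return math.ceil(p * (1.0 - p) / max_se ** 2)


if __name__ == "__main__":
    for n_sim in (1000, 5000, 10000):
        print(n_sim, round(mc_standard_error(0.035, n_sim), 4))
    print(simulations_needed(0.035, 0.002))
```

Since the true p-value is unknown in advance, a conservative choice is to plug in the largest p-value of interest, or p = 0.5 for the worst case.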

Other weightings for the survival difference metric exist. In particular, Pepe and Fleming [9] discussed the following weighting:

$$w = \frac{C_1 C_2}{p_1 C_1 + p_2 C_2} \qquad (5)$$

where C represents the probability of not being censored in each group, and p is the proportion in each group. With this weighting, the p-value generated was 0.18.
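The weighting in equation (5) is straightforward to compute once estimates of the censoring curves are available. The sketch below makes several assumptions: the names `censoring_km` and `pepe_fleming_weight` are hypothetical, the censoring curves are estimated by a Kaplan-Meier estimator in which censorings play the role of events, the curves are evaluated just before each grid time, and the weight is set to zero where the denominator vanishes. Conventions for handling ties between events and censorings vary and are not addressed here.

```python
import numpy as np


def censoring_km(time, event, grid):
    """KM-type estimate of the probability of not yet being censored,
    evaluated just before each grid time."""
    time = np.asarray(time, dtype=float)
    censored = ~np.asarray(event, dtype=bool)
    out = np.ones(len(grid))
    for i, g in enumerate(grid):
        s = 1.0
        for t in np.unique(time[censored & (time < g)]):
            s *= 1.0 - np.sum((time == t) & censored) / np.sum(time >= t)
        out[i] = s
    return out


def pepe_fleming_weight(c1, c2, p1, p2):
    """w = C1 * C2 / (p1 * C1 + p2 * C2), with w = 0 where the denominator is 0."""
    c1 = np.asarray(c1, dtype=float)
    c2 = np.asarray(c2, dtype=float)
    denom = p1 * c1 + p2 * c2
    w = np.zeros_like(denom)
    np.divide(c1 * c2, denom, out=w, where=denom > 0)
    return w


if __name__ == "__main__":
    grid = [0.5, 2.5, 5.0]
    c1 = censoring_km([1, 2, 3, 4], [1, 0, 1, 0], grid)
    c2 = censoring_km([1, 2, 3, 4], [1, 1, 1, 1], grid)  # no censoring: all ones
    print(pepe_fleming_weight(c1, c2, 0.5, 0.5))
```

With no censoring in either group the weight is 1 / (p1 + p2) = 1 throughout, so the statistic reduces to the unweighted area difference. The weight shrinks toward zero wherever either group is heavily censored.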

More generally, Shen and Cai [12] provide a max G^{ρ,γ} statistic for weighting the difference in survival curves. Using this method with a permutation test on the max-statistics, we obtained an approximate two-sided p-value of 0.10, further supporting our hypothesis that selenized yeast did play a role which was being missed by conventional log-rank statistics.

More conventionally, a class of linear rank statistics is also available based on weighting different parts of the survival curve. The easiest weighting to attempt is the G^ρ family proposed by Harrington and Fleming [10], as this has been well-vetted and is incorporated into R and S-Plus. However, neither this family of weights nor the G^{ρ,γ} family [11] (software by M. Kosorok) improved upon the log-rank test. Further work and simulation studies exploring various weighting approaches are very much needed.

3. Discussion

While there is consensus regarding the chemopreventive effects of a variety of selenium compounds, there has yet to arise a consensus regarding the mechanism of action.

One possibility is that selenized yeast acts primarily by up-regulating detoxification pathways. If that were the primary mechanism, one would expect a proportional hazards model to have a reasonable fit. Under proportional hazards, however, one would not expect a change from protective in years 1983–1993 (17 cases in the selenized yeast arm compared to 31 in the placebo arm) to deleterious during a three-year increase in follow-up (8 additional cases on the selenized yeast arm and 4 additional cases on the placebo arm). Additionally, since the population studied in the NPC trial was selenium replete, the detoxification pathways would normally be saturated prior to supplementation. In contrast, if the mechanism were growth delay, either through apoptosis of pre-existing cancer cells or an altered turnover rate of the cancer cells, then one would expect that patients with pre-existing cancer would be diagnosed later on the selenium arm, resulting in the perceived change in risk observed.
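The reversal described above is visible in the raw case counts alone. The arithmetic below uses only the counts quoted in the text. The ratios are crude ratios of case counts, not hazard ratios, since arm sizes and person-time are not taken into account, and the dictionary layout and the name `crude_ratio` are illustrative.

```python
# Lung cancer case counts quoted in the text, by follow-up period and arm.
cases = {
    "1983-1993": {"selenium": 17, "placebo": 31},
    "additional follow-up": {"selenium": 8, "placebo": 4},
}


def crude_ratio(counts):
    """Ratio of selenium-arm cases to placebo-arm cases (not a hazard ratio)."""
    return counts["selenium"] / counts["placebo"]


ratios = {period: crude_ratio(counts) for period, counts in cases.items()}

# Cumulative counts over both periods: 25 selenium cases versus 35 placebo cases.
cumulative = {
    arm: sum(counts[arm] for counts in cases.values())
    for arm in ("selenium", "placebo")
}
ratios["cumulative"] = crude_ratio(cumulative)

if __name__ == "__main__":
    for period, ratio in ratios.items():
        print(f"{period}: {ratio:.2f}")
```

The ratio is below 1 in the first period and above 1 in the added follow-up, which is the pattern a constant hazard ratio would not be expected to produce.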

By using an appropriate statistical method when proportional hazards are in question, we theorized that there would be higher power to detect a difference between the treatment groups. In this case, we determined that the probability of observing a benefit of the observed magnitude or larger due to selenized yeast, given the null hypothesis of no effect and based on the WKM metric, is about 3.5% (one-sided p-value).

There are, of course, limitations to both the technique and this application. In particular, there are serious competing risk issues when evaluating lung cancer specific survival, as it is not a proper survival distribution. In addition, the multiple possible weightings are a concern, and the use of weights means the statistic can no longer be interpreted simply as the difference in mean failure times. This test also has limited power to detect crossing survival curves. While a Kolmogorov-Smirnov maximum difference test or a permutation test using the absolute magnitude of the area between the curves can remedy this problem, the clinical relevance of statistical significance when curves cross is limited.

However, in this case, we have concluded that it is likely that selenized yeast does have an effect in the prevention of lung cancer as suggested by the weighted Kaplan-Meier statistic, a result that would be missed with the standard log-rank statistic at the latest follow-up evaluation. Furthermore, the data supports the potential for selenium in pre-existing cancers, a theory which is currently being tested in a variety of clinical trials in combination with chemotherapy.

Contributor Information

Paul H. Frankel, Department of Biostatistics, City of Hope National Medical Center, 1500 E. Duarte Rd., Duarte, CA 91010-3000

Mary E. Reid, Cancer Prevention and Population Sciences, Roswell Park Cancer Institute, Elm and Carlton Streets, Buffalo, NY 14263

James R. Marshall, Cancer Prevention and Population Sciences, Roswell Park Cancer Institute, Elm and Carlton Streets, Buffalo, NY 14263

References

  • [1] Yu SY, Zhu YJ, Li WG, et al. A preliminary report on the intervention trials of primary liver cancer in high-risk populations with nutritional supplementation of selenium in China. Biol. Trace Elem. Res. 1991; 29(3):289–294.
  • [2] Taylor PR, Li B, Dawsey SM, et al. Prevention of esophageal cancer: the nutrition intervention trials in Linxian, China: Linxian intervention trials study group. Cancer Res. 1994; 54(7 suppl):2029s–2031s.
  • [3] Blot WJ, Li JY, Taylor PR, Guo W, Dawsey SM, Li B. The Linxian trials: mortality rates by vitamin-mineral intervention group. Am. J. Clin. Nutr. 1995; 62(6 suppl):1424–1426.
  • [4] Prasad MP, Mukundan MA, Krishnaswamy K. Micronuclei and carcinogen DNA adducts as intermediate end points in nutrient intervention trial of precancerous lesions in the oral cavity. Eur. J. Cancer B Oral Oncol. 1995; 31(B3):155–159.
  • [5] Ip C. The chemopreventive role of selenium in carcinogenesis. Adv. Exp. Med. Biol. 1986; 206:431–437.
  • [6] Clark LC, Combs GF, Turnbull BW. Effects of selenium supplementation for cancer prevention in patients with carcinoma of the skin. JAMA 1996; 276:1957–1963.
  • [7] Reid ME, Duffield-Lillico AJ, Garland L, Turnbull BW, Clark LC, Marshall JR. Selenium supplementation and lung cancer incidence: An update of the nutritional prevention of cancer trial. Cancer Epidemiol. Biomarkers Prev. 2002; 11:1285–1291.
  • [8] Frankel P, Longmate J. Parametric models for accelerated and long-term survival: A comment on proportional hazards. Statist. Med. 2002; 21(21):3279–3289.
  • [9] Pepe MS, Fleming TR. Weighted Kaplan-Meier statistics: A class of distance tests for censored survival data. Biometrics 1989; 45:497–507.
  • [10] Harrington DP, Fleming TR. A class of rank test procedures for censored survival data. Biometrika 1982; 69:133–143.
  • [11] Fleming TR, Harrington DP. Counting Processes and Survival Analysis. New York: John Wiley, 1991.
  • [12] Shen Y, Cai J. Maximum of the weighted Kaplan-Meier tests with application to cancer prevention and screening trials. Biometrics 2001; 57:837–843.
  • [13] Greenwood M. The natural duration of cancer. Reports on public health and medical subjects. London: Her Majesty’s Stationery Office, 1926; 33:1–26.
