Abstract
Survival analysis is used to analyze data from patients who are followed for different periods of time and in whom the outcome of interest, a dichotomous event, may or may not have occurred at the time the study is halted; data from all patients are used in the analysis, including data from patients who dropped out, regardless of the duration of follow-up. This article discusses basic concepts in survival analysis, explains technical terms such as censoring, and provides reasons why ordinary methods of analysis cannot be applied to such data. The Kaplan-Meier survival curve is described, as are Cox proportional hazards regression and the hazard ratio. Supplementary information includes a data file, graphs with explanations, and additional discussions; these are provided to enhance the reader’s experience and understanding.
Keywords: Survival analysis, censoring, Kaplan-Meier curve, Cox proportional hazards regression, hazard ratio
Imagine that I have a cohort of bipolar patients. In this cohort, I identify patients who are receiving either lithium or valproate in monotherapy. I follow these patients in order to determine which drug is better at preventing relapse into a mood episode. I add new patients to the cohort, if they fulfill my study selection criteria, as and when they present to my center. Eighteen months later, I end the study and examine the proportion of lithium and valproate patients who have relapsed. Can I use a chi-square test to determine whether or not the relapse rate differs significantly between the lithium and valproate groups?
The answer is “No” because of the possibility that, regardless of whether or not the relapse rates are similar, relapse may have occurred substantially earlier in one group than in the other. So, can I use an independent-samples t test to compare the mean time to relapse in the two groups?
The answer is again “No”, and for two reasons. First, different patients entered the cohort at different times, but the study ended on the same date for everybody; so, many patients may not have relapsed by the study endpoint only because they were followed for less time than others. Second, some patients who had not relapsed may have dropped out because, for example, they moved house or withdrew consent; others may have been lost to follow-up. Data from these patients should not be discarded because their medication clearly protected them from relapse, at least until the time of dropout or loss to follow-up.
Survival Analysis
Data such as these are analyzed using survival or time-to-event analysis. This procedure takes into consideration both whether or not relapse occurred and the time to relapse (or to the end of observation), and it admits all data of all patients, including those who dropped out, right up to the point of dropout.
For survival analysis, the outcome should be a dichotomous event that did or did not happen. Examples are relapsed or did not relapse, did or did not develop dementia, and died or did not die. There should not be secular trends that may influence the outcome; for example, midway through my hypothetical study, I should not include a large group of patients referred after treatment in a ketamine clinic, or I should not stop recruiting women and older adults.
For time to event, which is the variable of interest in the analysis, there are two possibilities: the event occurred, in which case the patient is classified as having experienced the event at the time it occurred, or the event did not occur, resulting in “censoring.” Patients who withdraw or drop out without relapsing are censored at the time of withdrawal or dropout, and patients who reach the study endpoint without relapsing are censored at the endpoint; both are forms of “right censoring.” Time to event or time to censoring is recorded, as applicable.
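The recording of time to event and time to censoring can be illustrated with a minimal Python sketch. The function name `to_survival_record`, the dates, and the 18-month study window are hypothetical choices made for illustration; they are not taken from the supplementary data file.

```python
from datetime import date

def to_survival_record(entry, study_end, relapse=None, dropout=None):
    """Return (time_in_days, event) for one patient.

    event = 1 if relapse was observed during the study,
    event = 0 if the patient was censored (dropout, or still
    relapse-free when the study ended).
    """
    if relapse is not None and relapse <= study_end:
        return (relapse - entry).days, 1
    # No observed relapse: censor at dropout or at study end, whichever is first.
    last_seen = min(dropout, study_end) if dropout is not None else study_end
    return (last_seen - entry).days, 0

study_end = date(2021, 6, 30)  # hypothetical date on which the study is halted

records = [
    # Entered at the start, relapsed after about 6 months: event observed.
    to_survival_record(date(2020, 1, 1), study_end, relapse=date(2020, 7, 1)),
    # Dropped out without relapsing: censored at dropout.
    to_survival_record(date(2020, 3, 1), study_end, dropout=date(2020, 9, 1)),
    # Entered late, relapse-free at study end: censored at the endpoint.
    to_survival_record(date(2020, 12, 1), study_end),
]
print(records)
```

Each patient contributes a follow-up time measured from his or her own date of entry, together with a flag that states whether that time ended in the event or in censoring.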
Censoring merely means that data availability ends at the point of censoring; had censoring not happened, the expectation is that the event would occur, given enough time, but we cannot know when. In this context, a competing risk is an event that prevents the target event. As an example, a patient who dies cannot relapse no matter how long the study continues. If the competing risk event is independent of the target event, we may not need to worry. If it is related to the target event, it may need to be factored into the analysis. In my hypothetical study of relapse into a mood episode, dying from infection is independent of the target event, whereas committing suicide is related to the target event.
Kaplan-Meier Curve
The Kaplan-Meier curve displays the probability of survival (event did not occur) as a function of time. Time is plotted on the X-axis and the probability of survival on the Y-axis. So, the graph starts at probability = 1.0 (100%) because, at the start of the study, when time = 0, nobody has experienced the event; that is, the probability of survival is 100%.
As the study progresses, the curve steps down each time one or more patients experience the event, and a new survival probability is estimated at that point. Patients who are censored (e.g., because of dropout) do not by themselves lower the curve, but they are removed from the number at risk; because there are now fewer patients on whom the probability of survival is estimated, each subsequent event produces a larger step.
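How the survival probabilities are updated can be sketched in a few lines of Python. This is a minimal illustration of the standard product-limit calculation, not the procedure used to generate the supplementary figures; the function name `kaplan_meier` and the follow-up times are hypothetical. At each event time, the running survival probability is multiplied by the proportion of at-risk patients who did not experience the event.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time for each patient
    events : 1 if the event occurred at that time, 0 if censored
    Returns a list of (time, survival_probability), one entry per event time,
    starting from (0, 1.0).
    """
    at_risk = len(times)
    survival = 1.0
    curve = [(0, 1.0)]
    for t in sorted(set(times)):
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        c = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 0)
        if d > 0:
            # Patients censored at t are still counted as at risk at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set after t.
        at_risk -= d + c
    return curve

# Hypothetical follow-up times in months; 1 = relapsed, 0 = censored.
times = [2, 3, 3, 5, 8, 8, 12, 12]
events = [1, 0, 1, 1, 0, 1, 0, 0]

for t, s in kaplan_meier(times, events):
    print(f"month {t:>2}: survival = {s:.3f}")
```

In this made-up example, the two patients censored at 12 months do not add a step to the curve; they only reduce the number at risk.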
Figures 1 and 2 in the supplementary materials display Kaplan-Meier curves for a single group and for two groups, respectively. Accompanying notes explain the curves. The spreadsheet from which Figure 2 was generated is also included in the supplementary materials.
Cox Regression
Cox proportional hazards regression, or just Cox regression, is conceptually similar to multivariable linear or logistic regression. Cox regression examines survival as a function of several different independent variables (IVs), and the statistical significance of each of these IVs is assessed for the outcome of interest (occurrence of the event). More usually, we are interested in just one IV, and the remaining IVs are covariates, the effects of which are “adjusted for” in the analysis. In my hypothetical study of lithium vs. valproate in bipolar patients, treatment (lithium vs. valproate) is the IV of interest, and other IVs, such as age, sex, illness duration, and the number of previous episodes, can be adjusted for because I believe that these may also influence relapse.
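For readers who prefer notation, the model is conventionally written as below. The symbols are introduced here for illustration and do not appear in the original text: h(t | x) is the hazard at time t for a patient with covariate values x_1 to x_p, h_0(t) is an unspecified baseline hazard, and the betas are the regression coefficients.

```latex
h(t \mid x_1, \ldots, x_p) = h_0(t)\, \exp\left(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p\right)
```

If x_1 codes treatment (say, 1 for lithium and 0 for valproate), then exp(beta_1) is the hazard ratio for treatment, adjusted for the remaining covariates; the baseline hazard cancels out of the ratio, which is why the ratio is assumed to be constant ("proportional") over time.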
In Cox regression, the analysis yields a hazard ratio (HR) that is interpreted much like a relative risk. Thus, values below 1 indicate a lower risk of occurrence of the event relative to the comparison group, values above 1 indicate a higher risk, and a value of 1 indicates an identical risk. The HR is presented along with its 95% confidence interval and a P value. In my hypothetical study, if patients were followed for a median of 12 months and if the HR was 0.50 for lithium with valproate set as the reference group and relapse set as the event of interest, it means that, at any given time during follow-up, patients receiving lithium who had not yet relapsed were half as likely to relapse as similar patients receiving valproate.
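Statistical software usually reports a regression coefficient and its standard error, from which the HR and its confidence interval are obtained by exponentiation. The Python sketch below shows that arithmetic under stated assumptions: the coefficient is chosen so that the HR equals the 0.50 of the hypothetical example, and the standard error of 0.25 is invented purely for illustration.

```python
import math

def hazard_ratio(beta, se, z=1.96):
    """Convert a Cox coefficient and its standard error into
    (HR, lower 95% limit, upper 95% limit)."""
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )

beta = math.log(0.50)  # coefficient chosen so that HR = 0.50 (lithium vs. valproate)
se = 0.25              # hypothetical standard error, for illustration only

hr, lower, upper = hazard_ratio(beta, se)
print(f"HR = {hr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With these invented numbers, the interval lies entirely below 1, which would correspond to a statistically significant reduction in the hazard of relapse; an interval that included 1 would not.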
Parting Notes
More detailed discussions (with examples) on survival analysis and related concepts are available in the supplementary materials accompanying this article as well as elsewhere [1,2].
Supplemental Material
Supplemental material for this article is available online.
Footnotes
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding: The author received no financial support for the research, authorship and/or publication of this article.
References
1. Streiner DL. Stayin’ alive: An introduction to survival analysis. Can J Psychiatry 1995; 40(8): 439–444.
2. Schober P, Vetter TR. Survival analysis and interpretation of time-to-event data: The tortoise and the hare. Anesth Analg 2018; 127(3): 792–798.