Abstract
Time differences and time ratios are often more interpretable estimates of effect than hazard ratios for time-to-event data, especially for common outcomes. We developed a SAS macro for estimating time differences and time ratios between baseline-fixed binary exposure groups based on inverse probability weighted Kaplan-Meier curves. The macro uses pooled logistic regression to calculate inverse probability of censoring and exposure weights, draws Kaplan-Meier curves based on the weighted data, and estimates the time difference and time ratio at a user-defined survival proportion. The macro also calculates the risk difference and risk ratio at a user-specified time. Confidence intervals are constructed by bootstrap. We provide an example assessing the effect of exclusive breastfeeding during diarrhea on the incidence of subsequent diarrhea in children followed from birth to 3 years in Vellore, India. The SAS macro provided here should facilitate the wider reporting of time differences and time ratios.
INTRODUCTION
In longitudinal observational studies of the occurrence and timing of health outcomes, analyses often make extensive use of Cox proportional hazards models1 to estimate adjusted hazard ratios. Hazard ratios have several methodological limitations,2 and may be difficult to interpret when applied to common outcomes. For example, because almost all children will have multiple diarrhea episodes during the first few years of life, an estimate of increased relative hazard comparing two exposure conditions cannot be interpreted as a relative difference in the number of children experiencing the outcome between groups since the overall risk is near 100%. Here, estimates of the difference in timing of outcomes may be more relevant.
While the time ratio can be estimated parametrically using accelerated failure time models,3,4 these models require parametric assumptions, which is a disadvantage when there is no strong evidence that the data follow a particular distribution. Bayesian models can provide adjusted means and credible intervals of median times and ratios,5 and censored quantile regression provides an option to semiparametrically estimate contrasts in survival times using the Kaplan-Meier estimator.6 However, these methods are not widely used by epidemiologists. Here, we offer an additional method in a SAS macro to estimate time differences and time ratios between baseline-fixed binary exposure groups based on inverse probability weighted Kaplan-Meier curves.
METHODS
Inverse probability weighting is a parametric approach to standardize functionals of the observed data to the total population distribution of confounders7,8 and results in marginal effect estimates. Each stabilized exposure weight ($SW_i^E$) is the marginal probability of a subject’s observed exposure divided by the probability of her observed exposure conditional on her observed covariates, or

$$SW_i^E = \frac{P(X_i)}{P(X_i \mid Z_i)}$$

where $X_i$ is subject i’s exposure and $Z_i$ is subject i’s covariates.
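The stabilized exposure weight can be illustrated with a minimal Python sketch (not the macro's SAS code). It assumes the conditional probabilities P(X = 1 | Z) have already been fitted, for example by logistic regression; the function name `stabilized_exposure_weights` and the toy values are hypothetical.

```python
from typing import List, Sequence


def stabilized_exposure_weights(
    exposure: Sequence[int], propensity: Sequence[float]
) -> List[float]:
    """Return P(X = x_i) / P(X = x_i | Z_i) for each subject.

    exposure[i] is the observed binary exposure (0 or 1).
    propensity[i] is an assumed, already-fitted estimate of P(X = 1 | Z_i).
    """
    if len(exposure) != len(propensity):
        raise ValueError("exposure and propensity must have the same length")
    p_exposed = sum(exposure) / len(exposure)  # marginal P(X = 1)
    weights: List[float] = []
    for x, ps in zip(exposure, propensity):
        if x == 1:
            weights.append(p_exposed / ps)
        else:
            weights.append((1.0 - p_exposed) / (1.0 - ps))
    return weights


# Toy data: four subjects, two exposed
exposure = [1, 1, 0, 0]
propensity = [0.8, 0.5, 0.5, 0.2]
weights = stabilized_exposure_weights(exposure, propensity)
```

Subjects whose observed exposure was likely given their covariates receive weights below 1, and subjects with an unlikely exposure receive weights above 1.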
Censoring weights ($SW_{i,t}^C$)9,10 are constructed over time as the marginal probability of a subject’s censoring status divided by the probability of her censoring status conditional on her observed baseline covariates. This quantity is multiplied by the same quantity at all previous time intervals, such that

$$SW_{i,t}^C = \prod_{k=0}^{t} \frac{P(C_{i,k})}{P(C_{i,k} \mid Z_i)}$$

where $C_{i,t}$ is an indicator for censoring at time t, and t is the time interval ranging from 0 (baseline) to the interval of subject i’s event or censoring. Final inverse probability of exposure and censoring weights are calculated for each time interval as the product of the time-fixed exposure weight and the time-varying censoring weight, or $SW_i^E \times SW_{i,t}^C$. The weights can be truncated by resetting the values of weights outside extreme percentiles of the weight distribution to their values at specified percentile cut-offs. Further details on constructing inverse probability weights are available.11
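The cumulative product and the truncation step can be sketched in Python as follows. This is an illustration under stated assumptions, not the macro's implementation: the per-interval probabilities are taken as already fitted, the percentile cut-offs use a nearest-rank rule (the macro's rule may differ), and all names and numbers are hypothetical.

```python
import math
from typing import List, Sequence


def cumulative_censoring_weights(
    marginal: Sequence[float], conditional: Sequence[float]
) -> List[float]:
    """Running product of marginal / conditional probabilities over intervals 0..t.

    marginal[k] and conditional[k] are assumed, already-fitted probabilities of
    the subject's observed censoring status in interval k, without and with
    conditioning on baseline covariates.
    """
    weights: List[float] = []
    running = 1.0
    for m, c in zip(marginal, conditional):
        running *= m / c
        weights.append(running)
    return weights


def truncate_weights(
    weights: Sequence[float], low_pct: float, high_pct: float
) -> List[float]:
    """Reset weights outside the given percentiles to the percentile values."""
    ordered = sorted(weights)
    n = len(ordered)

    def cutoff(pct: float) -> float:
        # Nearest-rank percentile: an assumption made for this sketch.
        rank = max(math.ceil(pct * n / 100), 1)
        return ordered[rank - 1]

    low, high = cutoff(low_pct), cutoff(high_pct)
    return [min(max(w, low), high) for w in weights]


# Hypothetical subject followed over three intervals
exposure_weight = 0.625
censoring = cumulative_censoring_weights([0.95, 0.90, 0.90], [0.95, 0.80, 0.90])
final = [exposure_weight * c for c in censoring]  # one final weight per interval

# Hypothetical weight distribution truncated at the 40th and 80th percentiles
truncated = truncate_weights([0.2, 0.9, 1.0, 1.1, 9.0], low_pct=40, high_pct=80)
```

With `low_pct=0` and `high_pct=100` the cut-offs are the minimum and maximum, so no weight is changed.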
Time differences are defined as the difference in the times at which each exposure group reaches a given survival percentile p, $t_{1,p}$ and $t_{0,p}$, and can be calculated by subtracting these times as read off the Kaplan-Meier curves, $t_{1,p} - t_{0,p}$. Time ratios are similarly defined as the ratio of these two times, $t_{1,p} / t_{0,p}$. Figure 1 shows schematically how to calculate a median time difference and ratio (i.e., p = 0.5) from two sample curves.
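Reading these times off weighted curves can be sketched in Python. The sketch assumes one time-fixed weight per subject (as when only exposure weights are used) and takes the first event time at which weighted survival falls to p or below; the function names and toy data are hypothetical and this is not the macro's code.

```python
from typing import List, Optional, Sequence, Tuple


def weighted_km(
    times: Sequence[float], events: Sequence[int], weights: Sequence[float]
) -> List[Tuple[float, float]]:
    """Weighted Kaplan-Meier curve as (event time, survival) pairs."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    for t in event_times:
        # Weighted number still at risk just before t
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= t)
        # Weighted number of events at t
        failed = sum(
            w for ti, e, w in zip(times, events, weights) if ti == t and e == 1
        )
        survival *= (at_risk - failed) / at_risk
        curve.append((t, survival))
    return curve


def time_at_survival(curve: List[Tuple[float, float]], p: float) -> Optional[float]:
    """First time at which survival is at or below p; None if never reached."""
    for t, s in curve:
        if s <= p:
            return t
    return None


# Toy data: unexposed group, all events, unit weights
curve0 = weighted_km([1, 2, 3, 4, 5], [1, 1, 1, 1, 1], [1.0] * 5)
# Toy data: exposed group, one censored subject (event = 0), unequal weights
curve1 = weighted_km([2, 4, 6, 8, 10], [1, 1, 0, 1, 1], [0.5, 1.5, 1.0, 1.0, 1.0])

t0 = time_at_survival(curve0, 0.5)
t1 = time_at_survival(curve1, 0.5)
time_difference = t1 - t0
time_ratio = t1 / t0
```

If either curve never reaches the requested survival proportion, `time_at_survival` returns `None` and the contrast is undefined.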
Macro
Based on the above, the SAS macro provided here performs an analysis by calculating inverse probability of exposure and censoring weights for a binary exposure, drawing Kaplan-Meier curves based on the weighted data, and then estimating the effect measures requested by the user. First, the macro uses a logistic regression model (fit with PROC LOGISTIC) to calculate inverse probability of exposure weights stabilized by the marginal probability of exposure for a binary exposure.12 Inverse probability of censoring weights9,10 are then calculated by segmenting the time-scale into deciles based on the distribution of drop-out times. Probability of censoring is calculated in each time decile using pooled logistic regression (fit with PROC LOGISTIC) and censoring weights ($SW_t^C$) are constructed within time deciles. Final weights, calculated as the product of the exposure and censoring weights, are truncated at percentiles specified by the user.
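The segmentation of the time scale into deciles of drop-out times might look like the following Python sketch. The nearest-rank decile rule, the handling of ties, and the function names are assumptions made for illustration; the macro's exact rule is given in its annotated code rather than here.

```python
import bisect
import math
from typing import List, Sequence


def decile_cutpoints(dropout_times: Sequence[float]) -> List[float]:
    """Nine cut points at the 10th, 20th, ..., 90th percentiles of drop-out times."""
    ordered = sorted(dropout_times)
    n = len(ordered)
    cuts: List[float] = []
    for k in range(1, 10):
        rank = max(math.ceil(k * n / 10), 1)  # nearest-rank rule (an assumption)
        cuts.append(ordered[rank - 1])
    return cuts


def interval_index(t: float, cuts: Sequence[float]) -> int:
    """Index (0-9) of the time decile containing t; cut points close an interval."""
    return bisect.bisect_left(cuts, t)


# Hypothetical drop-out times of 1, 2, ..., 20 weeks
cuts = decile_cutpoints(list(range(1, 21)))
first = interval_index(1, cuts)
last = interval_index(20, cuts)
```

Each person-interval record would then carry its decile index as the time variable in the pooled logistic model for censoring.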
Inverse probability-weighted Kaplan-Meier curves are drawn for exposed and unexposed groups (using PROC PHREG),9,12,13 and the time difference ($t_{1,p} - t_{0,p}$) and time ratio ($t_{1,p} / t_{0,p}$) at p% survival are calculated. The macro also calculates the risk difference ($R_{1,t} - R_{0,t}$) and risk ratio ($R_{1,t} / R_{0,t}$) at time t = q, where $R_{1,t}$ and $R_{0,t}$ are the risks at time t in the exposed and unexposed groups, respectively. The macro constructs 95% confidence intervals (CI) by non-parametric bootstrap with a user-defined number of simple random resamples, each of the original sample size.14 Two confidence intervals are produced:15 one taken directly from the 2.5th and 97.5th percentiles of the bootstrap distribution of effect estimates, and one using the standard deviation (SD) of the bootstrap effect estimates to calculate Wald-based confidence limits (estimate ± 1.96 × SD). Comments in the eAppendix provide users with guidance on choosing confidence intervals and the number of bootstrap resamples.
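The two bootstrap intervals can be sketched in Python. This is an illustration rather than the macro's code: the statistic is a stand-in (an unweighted difference in median times with no censoring), the percentile limits use a nearest-rank rule, and the names `bootstrap_cis` and `median_time_difference` are hypothetical.

```python
import math
import random
import statistics
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, float]  # (exposure indicator, event time)


def median_time_difference(rows: List[Row]) -> float:
    """Stand-in statistic: median time in exposed minus median time in unexposed."""
    exposed = [t for x, t in rows if x == 1]
    unexposed = [t for x, t in rows if x == 0]
    return statistics.median(exposed) - statistics.median(unexposed)


def bootstrap_cis(
    data: List[Row],
    statistic: Callable[[List[Row]], float],
    n_resamples: int = 2000,
    seed: int = 1,
) -> Dict[str, object]:
    """Percentile and Wald-type 95% intervals from simple random resamples."""
    rng = random.Random(seed)
    n = len(data)
    estimate = statistic(data)
    # Each resample draws n subjects with replacement from the original data
    replicates = sorted(
        statistic(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    lo_idx = max(math.ceil(2.5 * n_resamples / 100), 1) - 1
    hi_idx = max(math.ceil(97.5 * n_resamples / 100), 1) - 1
    sd = statistics.stdev(replicates)
    return {
        "estimate": estimate,
        "percentile": (replicates[lo_idx], replicates[hi_idx]),
        "wald": (estimate - 1.96 * sd, estimate + 1.96 * sd),
    }


# Toy data: 20 exposed and 20 unexposed subjects, no censoring
rows: List[Row] = [(1, float(t)) for t in range(10, 30)]
rows += [(0, float(t)) for t in range(5, 25)]
result = bootstrap_cis(rows, median_time_difference, n_resamples=500, seed=2016)
```

The percentile interval need not be symmetric around the estimate, whereas the Wald-type interval is symmetric by construction. In a full analysis the weights would be re-estimated within every resample before the statistic is computed.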
To run the macro, the user inputs the dataset name, a subject ID variable, a binary exposure indicator, a list of covariates, the time at outcome, a binary outcome indicator, an indicator for whether the macro should include censoring weights (default is 1=yes), a binary dropout indicator, low and high percentiles for truncating weights (defaults are 0 and 100, corresponding to no truncation), the survival proportion at which to calculate time differences/ratios (default is 0.5, corresponding to the median survival time), the time at which to calculate risk differences/ratios, the number of bootstrap resamples (default is 2000), and a seed for random sampling (see eAppendix for further information and annotated code).
Example
We provide an example using combined data from three prospective observational cohort studies of diarrhea among children from Vellore, India conducted between 2002 and 2013.16–18 The studies were approved by the Institutional Review Boards of the Christian Medical College, Vellore, India and Tufts University Health Sciences campus, Boston, USA. The data set includes 982 children who were followed weekly or twice-weekly from birth to 3 years of age. We assessed the effect of exclusive breastfeeding during diarrhea on the incidence of subsequent diarrhea. The exposure was exclusive breastfeeding at a child’s first diarrhea episode, defined as feeding with breast milk only with the exception of vitamins, mineral supplements, and medicines.19 The event of interest was an incident second diarrhea episode, defined as at least three loose or watery stools in a 24-hour period.20 Covariates were age and malnutrition status at first episode, child sex, low birth weight (<2.5 kg), socioeconomic status (based on the modified Kuppuswamy scale21), maternal education, and household hygiene (based on a composite score of water, food, and personal hygiene practices). The 24 (2.4%) missing low birth weight values were set to the mean value. Censoring weights were not included since there were only 4 drop-outs. The inverse probability weights were truncated11 at the 0.5th and 99.5th percentiles and had a mean of 0.92 and a range of 0.46 to 9.85.
RESULTS
More than half of children stopped exclusively breastfeeding before their first diarrhea episode (n=548, 56%), and almost all children experienced a second diarrhea episode (n=848, 86%). Crude and inverse probability-weighted Kaplan-Meier curves are shown in Figure 2. After weighting, which largely adjusted for confounding by age, there was little evidence of an effect at any time point. The macro estimated that children who were exclusively breastfed at their first diarrhea episode had their second episode 2 weeks later than children who were not exclusively breastfed (weighted median time difference: 2 weeks, 95% CI: −2, 5). Median time ratio, 6-month risk difference, and risk ratio results are also shown in Table 1. Both the crude and adjusted results were similar when using age as the timescale to account for confounding by age (Table 1, eFigure 1). For comparison with the estimated time difference of 2 weeks, the hazard ratio was 0.95 (95% CI: 0.82, 1.1) from a marginal structural Cox model with weights for the same baseline covariates and 0.87 (95% CI: 0.73, 1.1) from a Cox model conditional on baseline covariates.
Table 1.
| | Crude | | | Weighted^a | | |
|---|---|---|---|---|---|---|
| | Estimate | 95% CI^b | 95% CI^c | Estimate | 95% CI^b | 95% CI^c |
| Timescale: time since first episode | | | | | | |
| Median time difference (weeks) | −3 | −5, 1 | −6.0, −0.0 | 2 | −2, 5 | −2.0, 6.0 |
| Median time ratio | 0.8 | 0.7, 1.1 | 0.7, 1.0 | 1.15 | 0.9, 1.5 | 0.9, 1.5 |
| 6-month risk difference | 0.08 | 0.03, 0.14 | 0.03, 0.14 | −0.04 | −0.12, 0.05 | −0.13, 0.04 |
| 6-month risk ratio | 1.1 | 1.0, 1.2 | 1.0, 1.2 | 0.94 | 0.83, 1.1 | 0.83, 1.1 |
| Hazard ratio | 1.2 | | 1.1, 1.4^d | 0.95 | | 0.82, 1.1^d |
| Timescale: age | | | | | | |
| Median time difference (weeks) | 1 | −5, 5 | −4.4, 6.4 | 1 | −5, 5 | −4.4, 6.4 |
| Median time ratio | 1.1 | 0.7, 1.5 | 0.7, 1.6 | 1.1 | 0.7, 1.5 | 0.7, 1.6 |
| 6-month risk difference | −0.03 | −0.13, 0.06 | −0.13, 0.06 | −0.03 | −0.12, 0.07 | −0.13, 0.06 |
| 6-month risk ratio | 0.96 | 0.86, 1.1 | 0.85, 1.1 | 0.96 | 0.85, 1.1 | 0.85, 1.1 |

^a Weighted by age at first episode, malnutrition status at first episode, child sex, low birth weight, socioeconomic status, maternal education, and household hygiene; weights truncated at 0.5th and 99.5th percentiles
^b Confidence interval using the 2.5th and 97.5th percentiles of 2000 bootstrap replicates
^c Wald-based confidence interval with 2000 bootstrap replicates
^d Wald-based confidence interval from Cox proportional hazards model
DISCUSSION
Time difference and ratio measures derived from inverse probability-weighted Kaplan-Meier curves provide highly interpretable summaries of time-to-event data, especially for common outcomes. Because the use of weighted Kaplan-Meier curves allows for adjustment for covariates without reliance on a proportional hazards assumption, several of the main disadvantages of the Cox model are circumvented.2 The weighted survival curves presented here allow a visual representation of the potentially changing relationship between exposed and unexposed survival curves. Instead of being averaged over the duration of follow-up, time differences and ratios can be calculated at any survival proportion, or at several. Similarly, risk differences and risk ratios can be calculated at any time over follow-up. However, we recommend that clinically important proportions and times be chosen a priori to prevent selective reporting of the results in which the magnitude of estimates is greatest. Presentation of adjusted Kaplan-Meier curves also eliminates the inherent selection bias of the hazard ratio2 by focusing on survival, which accumulates with time and therefore does not continually restrict the samples under comparison.
The macro presented here is limited by the crude handling of measured informative censoring by calculating censoring weights within deciles of the time scale. This strategy allows users to input continuous time data with minimal dataset management. However, if censoring is common, discretizing time (e.g. into person-weeks) and calculating weights in smaller intervals will allow for finer control of selection bias. We provide a version of the macro for discrete time data (e.g., with one observation per person-week) in the eAppendix. The weights presented can be extended to further account for time-varying exposure.13
Adjusted Kaplan-Meier curves9,12,13 and time differences and ratios should be reported more often as summary measures of effect; our hope is that the SAS macro provided here will facilitate their wider use.
Acknowledgments
Funding
This study was supported by the National Institute of Allergy and Infectious Diseases at the National Institutes of Health [5-T32-AI070114-08 to ETR], and the Eunice Kennedy Shriver National Institute of Child Health & Human Development and the Office of the Director of the National Institutes of Health [DP2-HD084070 to DJW]. The parent studies providing data for the reported example were supported by the National Institute of Allergy and Infectious Diseases at the National Institutes of Health [5-R01-AI072222 to HDW].
Footnotes
Conflict of interest
DJW engages in occasional, ad hoc consulting on epidemiologic methods for NIH/NICHD – there is no overlap with the present work. All other authors have no conflicts of interest to disclose.
REFERENCES
- 1. Cox DR. Regression models and life-tables. Journal of the Royal Statistical Society Series B Methodological. 1972;34:187–220.
- 2. Hernán MA. The hazards of hazard ratios. Epidemiology. 2010;21(1):13–15. doi: 10.1097/EDE.0b013e3181c1ea43.
- 3. Wei LJ. The accelerated failure time model: a useful alternative to the Cox regression model in survival analysis. Stat Med. 1992;11(14–15):1871–1879. doi: 10.1002/sim.4780111409.
- 4. Cole SR, Chu H. Effect of acyclovir on herpetic ocular recurrence using a structural nested model. Contemp Clin Trials. 2005;26(3):300–310. doi: 10.1016/j.cct.2005.01.009.
- 5. Ibrahim JG, Chen M-H, Sinha D. Bayesian Survival Analysis. New York, NY: Springer New York; 2001. [Accessed December 22, 2015]. http://link.springer.com/10.1007/978-1-4757-3447-8
- 6. Portnoy S. Censored regression quantiles. J Am Statist Assoc. 2003;98(464):1001–1012.
- 7. Hernán MA, Robins JM. Estimating causal effects from epidemiological data. J Epidemiol Community Health. 2006;60(7):578–586. doi: 10.1136/jech.2004.029496.
- 8. Sato T, Matsuyama Y. Marginal structural models as a tool for standardization. Epidemiology. 2003;14(6):680–686. doi: 10.1097/01.EDE.0000081989.82616.7d.
- 9. Robins JM, Finkelstein DM. Correcting for noncompliance and dependent censoring in an AIDS Clinical Trial with inverse probability of censoring weighted (IPCW) log-rank tests. Biometrics. 2000;56(3):779–788. doi: 10.1111/j.0006-341x.2000.00779.x.
- 10. Satten GA, Datta S. The Kaplan–Meier Estimator as an Inverse-Probability-of-Censoring Weighted Average. The American Statistician. 2001;55(3):207–210. doi: 10.1198/000313001317098185.
- 11. Cole SR, Hernán MA. Constructing inverse probability weights for marginal structural models. Am J Epidemiol. 2008;168(6):656–664. doi: 10.1093/aje/kwn164.
- 12. Cole SR, Hernán MA. Adjusted survival curves with inverse probability weights. Comput Methods Programs Biomed. 2004;75(1):45–49. doi: 10.1016/j.cmpb.2003.10.004.
- 13. Westreich D, Cole SR, Tien PC, et al. Time scale and adjusted survival curves for marginal structural cox models. Am J Epidemiol. 2010;171(6):691–700. doi: 10.1093/aje/kwp418.
- 14. Cole SR. Simple bootstrap statistical inference using the SAS system. Computer Methods and Programs in Biomedicine. 1999;60(1):79–82. doi: 10.1016/s0169-2607(99)00016-4.
- 15. Greenland S. Interval estimation by simulation as an alternative to and extension of confidence intervals. Int J Epidemiol. 2004;33(6):1389–1397. doi: 10.1093/ije/dyh276.
- 16. Gladstone BP, Muliyil JP, Jaffar S, et al. Infant morbidity in an Indian slum birth cohort. Arch Dis Child. 2008;93(6):479–484. doi: 10.1136/adc.2006.114546.
- 17. Sarkar R, Sivarathinaswamy P, Thangaraj B, et al. Burden of childhood diseases and malnutrition in a semi-urban slum in southern India. BMC Public Health. 2013;13(1):87. doi: 10.1186/1471-2458-13-87.
- 18. Kattula D, Sarkar R, Sivarathinaswamy P, et al. The first 1000 days of life: prenatal and postnatal risk factors for morbidity and growth in a birth cohort in southern India. BMJ Open. 2014;4(7):e005404. doi: 10.1136/bmjopen-2014-005404.
- 19. Division of Child Health and Development. Indicators for Assessing Breast-Feeding Practices. Geneva: World Health Organization; 1991.
- 20. World Health Organization. The Treatment of Diarrhoea: A Manual for Physicians and Other Senior Health Workers. 2005. http://www.who.int/maternal_child_adolescent/documents/9241593180/en/index.html
- 21. Kuppuswami B. Manual of Socioeconomic Scale (Urban). New Delhi: Manasayan, 32 Netaji Subhash Marg; 1981.