Abstract
Selection bias due to loss to follow up represents a threat to the internal validity of estimates derived from cohort studies. Over the last fifteen years, stratification-based techniques as well as methods such as inverse probability-of-censoring weighted estimation have been more prominently discussed and offered as a means to correct for selection bias. However, unlike correcting for confounding bias using inverse weighting, uptake of inverse probability-of-censoring weighted estimation as well as competing methods has been limited in the applied epidemiologic literature. To motivate greater use of inverse probability-of-censoring weighted estimation and competing methods, we use causal diagrams to describe the sources of selection bias in cohort studies employing a time-to-event framework when the quantity of interest is an absolute measure (e.g., absolute risk, survival function) or relative effect measure (e.g., risk difference, risk ratio). We highlight that whether a given estimate obtained from standard methods is potentially subject to selection bias depends on the causal diagram and the measure. We first broadly describe inverse probability-of-censoring weighted estimation and then give a simple example to demonstrate in detail how inverse probability-of-censoring weighted estimation mitigates selection bias and describe challenges to estimation. We then modify complex, real-world data from the University of North Carolina Center for AIDS Research HIV clinical cohort study and estimate the absolute and relative change in the occurrence of death with and without inverse probability-of-censoring weighted correction using the modified University of North Carolina data. We provide SAS code to aid with implementation of inverse probability-of-censoring weighted techniques.
Keywords: Selection bias, informative censoring, cohort studies, attrition, inverse probability-of-censoring weights
Introduction
In cohort studies a group of individuals is sampled from a source population and followed over time to ascertain the occurrence of an outcome of interest 1. Such cohort data are often analyzed using a time-to-event framework given the frequent occurrence of loss to follow up. In the analysis of time-to-event data, a common objective is to estimate survival in the source population, as well as how survival differs by levels of exposure. Selection bias due to loss to follow up, also known as informative censoring, represents a threat to the internal validity of estimates derived from cohort studies 2. Over the last fifteen years, stratification-based techniques such as standard regression adjustment as well as methods such as inverse probability-of-censoring weighted estimation have been more prominently discussed and offered as a means to correct for such selection bias 2-9. However, unlike correcting for confounding bias using inverse probability-of-exposure weights 7,10, uptake of inverse probability-of-censoring weighted estimation as well as competing methods 11-17, including missing data approaches such as multiple imputation, to correct for selection bias has been limited in the applied epidemiologic literature.
This limited uptake may be due to a lack of clarity regarding the sources of selection bias in cohort studies as well as few detailed applications. Lack of clarity regarding the sources of selection bias may also contribute to the limited discussion in the epidemiologic literature concerning the importance of incorporating in the design phase of a cohort study the collection of information necessary to correct analytically for such selection bias 9,18. This limited discussion is in stark contrast to the frequently mentioned importance of collecting information on potential confounders as part of the study design.
Therefore, the objectives of this paper are, first, to use causal diagrams to describe the sources of selection bias in cohort studies analyzed under a time-to-event framework given the presence of loss to follow up when the quantity of interest is an absolute measure (e.g., absolute risk, survival function) or relative effect measure (e.g., risk difference, risk ratio). The absolute measure describes the occurrence of a certain characteristic or outcome in a single group. By relative effect measure we mean a measure that compares two or more groups (e.g., exposed versus unexposed) that is intended to estimate a causal effect or an associational effect when the exposure is not well-defined 3. We focus primarily on the risk difference and risk ratio for the relative effect measures of interest instead of the hazard ratio, which is more commonly estimated in time-to-event analyses, to avoid the selection bias that the hazard ratio is innately subject to 19. The second objective is to broadly describe inverse probability-of-censoring weighted techniques. Third, we will provide a simple example that demonstrates how inverse probability-of-censoring weighted estimation corrects for selection bias. Fourth, we will discuss related challenges to estimation. Fifth, we will modify more complex, real-world data from the University of North Carolina Center for AIDS Research (UNC CFAR) HIV clinical cohort study and estimate the absolute and relative change in the occurrence of death with and without inverse probability-of-censoring weighted correction for potential selection bias due to loss to follow up using the UNC data. The UNC analyses were performed in SAS software, version 9.3 (SAS Institute, Inc., Cary, North Carolina).
Notation
In a cohort of i = 1 to n HIV-positive individuals who became infected at least five years prior to study entry, let Ti represent the time in visits from study entry to the occurrence of the event (death), Ci is the time in visits from study entry to censoring due to loss to follow up, Mi is the time in visits from study entry to the administrative end of the study, and Yi is the observed follow up time (Yi = min(Ti, Ci, Mi)) for person i. Defining u to be an index of time in visits since study entry (u = 1 to max(yi)), Ai(u) is a measured indicator of injection drug use in the prior 6 months (1: yes; 0: no), Li(u) is a measured indicator of heavy alcohol use in the prior 6 months (1: yes, 0: no), Qi(u) is an unmeasured indicator of CD4 cell count (1: ≥200 cells/microL, 0: <200 cells/microL), and Zi(u) is an unmeasured indicator of level of education (1: not college educated, 0: college educated) at time u for person i. Further at time u, Di(u) is an indicator of loss to follow up (1: lost, 0: otherwise), while Oi(u) is an indicator of developing the event (1: event, 0: otherwise) for person i. Henceforth, i and u will be suppressed when possible.
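As an illustration of this notation, the following minimal Python sketch collapses the latent times (T, C, M) for one person into the observed follow up time Y and then expands that person into one record per visit, the layout typically used by discrete-time methods. The names `FollowUp`, `observe`, and `person_visits` are hypothetical, and the tie-breaking rules are assumptions of the sketch rather than rules stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FollowUp:
    y: int      # observed follow up time, Y = min(T, C, M), in visits
    event: int  # 1 if death was observed at visit y, else 0
    lost: int   # 1 if censored by loss to follow up at visit y, else 0


def observe(t: int, c: int, m: int) -> FollowUp:
    """Collapse latent times (T, C, M) into what the analyst observes.

    Assumed tie-breaking: a death at the same visit as a censoring time
    counts as a death, and loss to follow up takes precedence over
    administrative censoring.
    """
    y = min(t, c, m)
    event = int(t == y)
    lost = int(event == 0 and c == y)
    return FollowUp(y=y, event=event, lost=lost)


def person_visits(fu: FollowUp) -> list:
    """Expand one person into rows (u, D(u), O(u)) for u = 1, ..., Y."""
    return [
        (u, int(fu.lost and u == fu.y), int(fu.event and u == fu.y))
        for u in range(1, fu.y + 1)
    ]
```

For example, `observe(9, 2, 10)` describes a person lost at the second visit, and `person_visits` then returns two rows with D(2) = 1 and O(u) = 0 throughout.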
Causal Diagrams for The Sources of Selection Bias Due to Loss to Follow Up
Selection bias due to loss to follow up is the absolute or relative bias that arises from how participants are selected out of a given risk set 3. Here and throughout this paper, absolute bias refers to bias of an absolute measure, while relative bias pertains to the bias of a relative effect measure. We define bias as a difference between the expected value of an estimator (e.g., mean survival, mean log risk ratio) and the true value for the quantity of interest in the study population present at baseline, which we henceforth assume represents the source population 20.
Hernán et al. 2,9 outlined a common structure for selection bias based on causal diagrams when the quantity of interest is a relative effect measure and the exposure does not cause the outcome, resulting in an equivalence between collider-stratification bias (i.e., bias resulting from conditioning on a collider) and relative selection bias 3,21. Here we build upon this prior work when the exposure causes the outcome and demonstrate that selection bias of a relative effect measure can occur even in the absence of conditioning on a collider. Furthermore, we discuss absolute bias and the fact that whether a given estimate is subject to selection bias depends on the causal diagram and the measure. For some diagrams, both the absolute and relative estimates are unbiased, while in others solely the absolute measure or both the absolute and relative measures may be biased. The diagrams we identify here for when the absolute or relative measure may be biased build upon work by Hernán et al. 2,9, are informed by theoretical and applied work by Daniel et al. 18, Greenland and Pearl 22, as well as Westreich 23, and have been demonstrated in simulations included in our eAppendix 1. For those less familiar with relevant definitions as well as the rules of and assumptions encoded in causal diagrams, including the definition of a collider, we refer the reader to the Appendix of Hernán et al. 24.
Figure 1 shows five causal diagrams for the effect of injection drug use, heavy alcohol use, CD4 cell count, and education on loss to follow up and time to death. In each diagram the exposure (if applicable) is injection drug use and a box appears around D given that the analysis is restricted to those participants who remain not lost to follow up at a given time u. Diagram I) indicates that losses occur completely at random given that losses are not associated with A, L, or T. Losses that occur completely at random imply that those who are lost represent a simple, uniform random sample of those who were at risk for the event at a given time since study entry. Completely at random losses are considered to be a type of non-informative censoring where losses occur independently of the event of interest. In contrast, Diagrams II) through V) imply that losses do not occur completely at random, meaning that those who are lost to follow up are not a random sample of all participants who are in the risk set at the time a given participant is lost. When who is lost is related to the occurrence of the outcome of interest then losses are considered to be informative 25.
Figure 1.
Causal diagram depicting five scenarios for the effect of injection drug use (A), heavy alcohol use (L), CD4 cell count (Q), and education (Z), on loss to follow up (D) and time to death (T) in a cohort study where u indexes time in visits since study entry and denotes that A, L, Q, Z, and D can vary with time.
In Diagram I), given that losses are random with respect to A, L, and T, loss to follow up in the cohort does not induce absolute or relative selection bias when standard survival analysis methods (e.g., discrete-time survival function estimator, discrete-time hazard model) are used for estimation. However, in Diagram II) losses are dependent on L; therefore, loss to follow up is not random. Given that L also predicts T, these losses are informative and may therefore introduce bias of absolute measures or relative effect measures.
For instance, let us assume that those who engage in heavy alcohol use were more likely to be lost to follow up as well as die than those who do not engage. This scenario, which is represented by L being a common cause of D and T in Diagram II), would result in non-engagers, who are less likely to die, being more likely to remain in the risk set during follow up. As such, the survival function in the source population is expected to be overestimated in the analysis sample. The estimated relative effect of injection drug use on death may be biased as well. Such relative bias may occur because of inaccurate estimation of a joint effect. As discussed in our eAppendix 2 and elsewhere 3,6, validly estimating the relative effect of injection drug use on death in the presence of loss to follow up requires accurate estimation of a joint effect. Accurate estimation of a joint effect requires adequately accounting for all common causes of loss and the outcome of interest (e.g., L) 3,6.
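The overestimation of survival under Diagram II) can be illustrated with a small deterministic calculation. The Python sketch below is not the simulation reported in the eAppendix; it computes the large-sample value of the unweighted discrete-time survival estimator for a cohort with two levels of L, under the assumptions that hazards and loss probabilities are constant over visits and that, within a visit, deaths occur before losses. All numeric values and function names are hypothetical.

```python
def true_survival(strata, visits):
    """Marginal survival in the full cohort had nobody been lost.

    strata: list of (proportion, death_hazard, loss_probability) tuples,
    one per level of L; hazards and loss probabilities are per visit.
    """
    return [
        sum(p * (1 - h) ** u for p, h, _ in strata)
        for u in range(1, visits + 1)
    ]


def crude_survival(strata, visits):
    """Large-sample value of the unweighted discrete-time estimator.

    Assumes that within a visit deaths occur before losses.
    """
    at_risk = [p for p, _, _ in strata]  # share of cohort at risk, by level of L
    surv, curve = 1.0, []
    for _visit in range(visits):
        deaths = sum(n * h for n, (_, h, _) in zip(at_risk, strata))
        surv *= 1 - deaths / sum(at_risk)  # pooled hazard among those retained
        curve.append(surv)
        # Survivors who are not lost form the next risk set.
        at_risk = [
            n * (1 - h) * (1 - c) for n, (_, h, c) in zip(at_risk, strata)
        ]
    return curve


# Hypothetical values: heavy alcohol users (second tuple) are more likely
# both to die and to be lost, as in Diagram II).
DIAGRAM_II = [(0.5, 0.05, 0.05), (0.5, 0.20, 0.30)]
truth = true_survival(DIAGRAM_II, 5)
crude = crude_survival(DIAGRAM_II, 5)
bias = [c - t for c, t in zip(crude, truth)]  # positive: survival overestimated
```

Setting the two loss probabilities equal, so that losses no longer depend on L, makes the crude and true curves coincide in this sketch, which corresponds to the completely-at-random losses of Diagram I).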
Prior work 23 and simulations (not shown) indicate that when A does not cause T in Diagram II), the estimated relative effect (i.e., risk difference, risk ratio, and odds ratio) is not biased. Similar to Diagram II), losses in Diagram III) should be informative but, in this case, dependent on A, given that A is a common predictor of D and T. These losses are expected to introduce bias of absolute measures, but will not bias the relative effect of the exposure, injection drug use, given that within strata of injection drug use losses should be random.
In Diagrams IV) and V), losses are informative and related to both A and L. Specifically, in IV) both A and L are common causes of D and T. In V) A is a common cause of D and T, while L causes D and shares a common cause with T, the covariate Z. These informative losses are expected to result in selection bias for both the absolute and relative measures. The absolute measure is expected to be biased because D and T are associated via A and L. The relative effect measure is expected to be biased because restricting the analysis sample to those who remain in the risk set opens a non-causal path from A to D to L to T (or A to D to L to Z to T) given that D is a collider. In other words, even within levels of injection drug use losses will be informative. Losses will be informative because, once the analysis is restricted to those who remain under follow up, engaging in heavy alcohol use is associated with injection drug use, and engaging in heavy alcohol use is in turn associated with time to death.
Using Inverse Probability-of-Censoring Weights to Correct for Selection Bias Due to Loss to Follow Up
Ideally, losses to follow up would be minimized during the design and conduct stages of a cohort study, since selection via loss is required to have selection bias and the extent of selection bias is partly dependent on the degree of selection (e.g., percent lost to follow up). However, in most settings some losses are unavoidable and such losses often do not occur completely at random. Therefore, informed by causal diagrams, non-standard analytic methods should be considered and perhaps employed to correct for potential bias induced by loss to follow up. Such methods include inverse probability-of-censoring weighted estimation as well as stratification-based techniques including standard regression adjustment that stratify the data to address selection bias 2,3.
As noted by Hernán et al. 2 and described later using the UNC HIV example as well as in our eAppendix 2 in the case of Diagram V), there are situations where stratification-based methods may be insufficient to correct for selection bias, while inverse probability-of-censoring weighted estimation continues to provide unbiased estimates given that necessary assumptions outlined below are met. Furthermore, compared to stratification-based techniques, inverse probability-of-censoring weighted estimation can more readily provide marginal rather than conditional estimates of absolute measures corrected for potential selection bias. Marginal estimates have a preferred interpretation and are easier to display graphically compared to conditional estimates 26. Therefore, the remainder of the paper largely focuses on inverse probability-of-censoring weighted estimation rather than stratification-based techniques to address selection bias. Next, we broadly describe the use of inverse probability-of-censoring weights to correct for potential selection bias.
Inverse probability-of-censoring weights can be used to create the pseudo-population that would have been observed had losses to follow up occurred but been random with respect to measured determinants of loss to follow up (depicted in the relevant causal diagram) including the exposure (if applicable). This pseudo-population can be created by re-weighting the contribution of each participant who was not lost to follow up to a given risk set. Specifically, at time u each participant is typically assigned a stabilized weight SW(u) that is a ratio of the probability that the participant was not lost to follow up through time u conditional on the exposure (if applicable) and the probability that the participant remained not lost to follow up through time u conditional on measured determinants of loss to follow up including the exposure (if applicable). The aforementioned probabilities as well as the weight, SW(u), are often estimated using a pooled logistic regression model for not being lost to follow up 9. The SW(u) can then be used to estimate weighted versions of standard survival analysis methods. In our eAppendix 2 we use a simple example to more thoroughly demonstrate the use of inverse probability-of-censoring weights to reduce selection bias when estimating survival after study entry as well as the change in survival as a function of injection drug use via the risk difference or risk ratio.
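One common way to write the stabilized weight and the corresponding weighted discrete-time survival estimator is sketched below, using the notation defined earlier. This is a hedged reconstruction rather than the exact specification in the eAppendix: the conditioning sets are shown in shorthand (A and L stand for whatever exposure and covariate information is available before visit k), D_i(0) = 0 is taken by convention, and the at-risk indicator R_i(k) is a symbol introduced here for illustration.

```latex
\begin{align}
SW_i(u) &= \prod_{k=1}^{u}
  \frac{\Pr\left[ D_i(k) = 0 \mid D_i(k-1) = 0,\ A_i \right]}
       {\Pr\left[ D_i(k) = 0 \mid D_i(k-1) = 0,\ A_i,\ L_i \right]} \\
\hat{S}^{w}(u) &= \prod_{k=1}^{u}
  \left[ 1 - \frac{\sum_{i} R_i(k)\, SW_i(k)\, O_i(k)}
                  {\sum_{i} R_i(k)\, SW_i(k)} \right],
  \qquad R_i(k) = I(Y_i \ge k)
\end{align}
```

When every weight equals one, the second expression reduces to the standard unweighted discrete-time survival function estimator.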
For valid estimation of absolute measures and causal relative effect measures using inverse probability-of-censoring weights, the assumptions of conditional exchangeability, positivity, and correct model specification in the outcome and weight model (where appropriate) must hold. Further, the exposure (if applicable) and censoring mechanism must be well-defined given that the exposure (if applicable) and censoring mechanism represent points of intervention 3,5. When any of the prior assumptions and conditions are not met the results from using inverse probability-of-censoring weighted estimation may be biased or lack a causal interpretation. Conditional exchangeability assumes that there are no unaccounted for sources of confounding bias (if applicable) and selection bias due to loss to follow up. Positivity requires that there is a non-zero probability of every possible exposure level (if applicable) within every observed combination of the measured confounders. In addition, there must be a non-zero probability of not being lost to follow up at each time that losses occur within every combination of possible exposure levels (if applicable) and observed measured variables that contribute to the selection bias. Lack of positivity can occur for systematic reasons (e.g., a given exposure level is not possible at a specific level of the confounder) or due to random chance (e.g., small sample size) 5,27,28. Correct model specification means that the model choice, including model form and functional forms between the predictors and the dependent variable (i.e., exposure (if applicable), censoring, or outcome) in all relevant regression models are correct. A well-defined exposure and censoring mechanism does not suffer from interference 29 and either corresponds to a single well-defined intervention or has version irrelevance when more than one well-defined intervention exists 30.
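The positivity condition for censoring can be screened empirically, at least crudely. The Python sketch below flags visit-by-stratum cells in which nobody remained under follow up, that is, cells where the observed proportion not lost is zero. The function name and record layout are hypothetical, and such a screen cannot by itself distinguish systematic from random non-positivity.

```python
from collections import defaultdict


def positivity_violations(records):
    """Return (visit, stratum) cells in which every participant was lost.

    records: iterable of (visit, stratum, lost) tuples, where stratum is a
    tuple of exposure and covariate levels and lost is 1 (lost) or 0.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [n at risk, n retained]
    for visit, stratum, lost in records:
        cell = counts[(visit, stratum)]
        cell[0] += 1
        cell[1] += 1 - lost
    return sorted(key for key, (_n, kept) in counts.items() if kept == 0)


# Hypothetical records: stratum (1, 1) at visit 2 has no retained participants.
example = [
    (1, (1, 0), 0),
    (1, (1, 0), 1),
    (2, (1, 1), 1),
    (2, (1, 1), 1),
]
flagged = positivity_violations(example)
```

Cells that are sparse but not empty can also produce very large weights, so in practice the distribution of the estimated weights is usually inspected as well.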
To minimize the potential for violations in conditional exchangeability, potential confounders as well as common causes of loss to follow up and the outcome of interest should be considered in the study design phase and included in data collection 9,18. Although violations in conditional exchangeability are not testable, sensitivity analyses can be performed to assess the robustness of inference to unmeasured sources of selection bias 31,32. In the presence of potential positivity violations, more complex double robust estimators such as targeted minimum loss-based estimation can instead be used for appropriate estimation as long as the outcome distribution is consistently estimated 17,28. Correct specification in the weight model can be facilitated by using data-adaptive procedures including super and ensemble learning techniques rather than the more commonly used pooled logistic regression model 33. Even if positivity and correct model specification are not an issue, targeted minimum loss-based estimation with data-adaptive procedures should still be considered given the potential for efficiency gains when measured covariates can predict the outcome well 17.
Example: University of North Carolina Center for AIDS Research HIV Clinical Cohort Study
African Americans have been shown to suffer disproportionately from HIV-related mortality 34. Therefore, here we use modified data on 2,511 HIV-infected persons in the UNC CFAR HIV clinical cohort to examine the association between African American race and subsequent mortality. We focus on association since African American race is not a well-defined exposure because it does not correspond to any possible well-defined, real-world intervention 3,35. The UNC CFAR HIV clinical cohort (henceforth, the cohort) collects relevant information from all HIV-positive patients attending the UNC HIV clinic who provide written informed consent in English or Spanish. All study forms and protocols were approved by the UNC institutional review board. The secondary data analysis below was approved by the institutional review boards at UNC and Brown University. Additional details concerning this clinic cohort are provided elsewhere 36.
This analysis uses data on the 2,511 African American and Caucasian patients who attended the UNC HIV clinic during the study period, January 1, 1999 to January 1, 2012, and who had information available on date of birth, gender, insurance status, prior AIDS-defining illness diagnoses, CD4 cell count, and HIV RNA level at least at the first clinic visit during the study period (henceforth, the first clinic visit). The data were modified such that clinic visits as well as assessment and updating of CD4 cell count and HIV RNA level occurred every six months subsequent to the first clinic visit. Insurance status and prior AIDS-defining illnesses were assumed to only be known at the first clinic visit. Death dates were coarsened to only occur at clinic visits. Last observation carried forward methods were used to complete CD4 and HIV RNA measures that were unavailable for a given visit. For the purposes of this simplified example, these completed values were assumed to represent the truth. However, beyond this simplified example, other more sophisticated and potentially less biased techniques for handling missing data should be considered 37. Patients were considered to be lost to follow up two years after the last time they were seen at a clinic visit during the study period. Patients who were last seen within two years of January 1, 2012 were administratively censored at January 1, 2012.
Graph I) in Figure 2 is a causal diagram for the effect of African American race on time to death among the UNC cohort patients 36,38,39. Assuming this causal diagram is correct, then the effect of African American race on time to death is potentially subject to selection bias via the non-causal path from African American race to loss to follow up to covariates that include CD4 cell count, AIDS, HIV RNA level, and insurance to death. Stratification-based methods such as standard regression adjustment for the abovementioned covariates would address this potential selection bias. However, any indirect effect that African American race has on time to death that operates through these covariates may also be removed with standard regression adjustment. Inverse probability-of-censoring weights can account for this selection bias while allowing for estimation of the effect of African American race on death operating through pathways that include and do not include the mentioned covariates. Simulations that appear in our eAppendix 1 were performed to confirm the potential selection bias of the effect of African American race on time to death and that inverse probability-of-censoring weights can be used to appropriately reduce such selection bias.
Figure 2.
Causal diagram depicting the association between African American race and time to death in the unweighted (top) and weighted (bottom) data among 2,511 HIV-infected African American and Caucasian men and women with 25,319 total person-visits of follow-up where u indexes time in visits since study entry, UNC CFAR HIV clinical cohort, 1999–2012.
To further demonstrate the impact of informative losses on estimation, the hypothesized causal relationships indicated by the causal diagram in Graph I) of Figure 2 were created or strengthened by modifying the UNC data. Standard and inverse probability-of-censoring weighted approaches were then used to estimate measures of interest based on the altered UNC data. Table 1 shows observed patient characteristics at the first clinic visit for the modified data. During follow up, 404 patients died, 1,390 patients were lost to follow up, and 717 patients reached the end of study follow up alive.
Table 1. Observed and weighted characteristics of 2,511 African American and Caucasian HIV-infected men and women, UNC CFAR HIV clinical cohort, 1999–2012.
| Characteristic | At first clinic visit during study period, N=2,511 patients in observed population | At first clinic visit during study period, N=2,512 patients in weighted population ᵃ |
|---|---|---|
| Age in years, median (quartiles) | 39 (32; 46) | 39 (32; 46) |
| Male, % (n) | 70 (1,749) | 70 (1,749) |
| African American, % (n) | 66 (1,652) | 66 (1,659) |
| Prior AIDS-defining illness diagnosis, % (n) | 24 (605) | 24 (602) |
| Prior antiretroviral therapy use, % (n) | 79 (1,979) | 79 (1,976) |
| Insurance, % (n) | | |
| Private | 25 (639) | 25 (634) |
| Public ᵇ | 38 (947) | 38 (944) |
| Uninsured | 37 (925) | 37 (934) |
| CD4 cell count in cells/microL, % (n) | | |
| <200 | 29 (738) | 29 (734) |
| ≥200 | 71 (1,773) | 71 (1,778) |
| Detectable HIV-1 RNA level, % (n) | | |
| Yes | 62 (1,562) | 62 (1,567) |
| No | 38 (949) | 38 (945) |

ᵃ Accounts for insurance status and receiving a prior AIDS-defining illness diagnosis at the first clinic visit as well as CD4 cell count and HIV RNA level at the prior visit.
ᵇ Medicaid, Medicare, or other US public insurance (e.g., AIDS Drug Assistance Program, Veterans Administration, Department of Defense for prisoners).
African American race, insurance status, and ever receiving a diagnosis of an AIDS-defining illness at the first clinic visit, as well as CD4 cell count and HIV RNA level at the prior visit were used to estimate inverse probability-of-censoring weights using pooled logistic regression. Our eAppendix 3 provides the SAS, version 9.3, code that was used to estimate the aforementioned weights. In the pooled logistic regression model continuous covariates were fit using linear and quadratic terms while indicator variables were used for non-continuous covariates. The resultant weights had a mean (standard deviation) of 1.00 (0.37) with a range from 0.33 to 11.30. As shown in Table 1, the observed distribution of characteristics at the first clinic visit was preserved in the weighted population. However, the sample size at the first clinic visit in Table 1 and the number of deaths in the weighted data compared to the observed data increased by 1 and 66, respectively. The aforementioned increases may indicate model misspecification or non-positivity 5, to which the alternative stratification-based approaches are subject as well. Assuming all necessary assumptions hold, the diagram that corresponds to this weighted population is shown in Graph II) of Figure 2 where censoring due to loss to follow up is random with respect to African American race and all of the other measured covariates.
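The weight construction and the summary statistics reported above can be sketched as follows. This Python fragment is not the SAS code in eAppendix 3; it assumes that the per-visit probabilities of remaining under follow up have already been predicted from a numerator model (exposure only) and a denominator model (exposure plus covariates), and the function names are hypothetical.

```python
from itertools import accumulate
from operator import mul
from statistics import mean, stdev


def stabilized_weights(p_num, p_den):
    """Cumulative stabilized weights SW(1), ..., SW(u) for one participant.

    p_num, p_den: per-visit predicted probabilities of NOT being lost,
    from the numerator and denominator models, in visit order.
    """
    if len(p_num) != len(p_den):
        raise ValueError("numerator and denominator must cover the same visits")
    if not all(0 < p <= 1 for p in (*p_num, *p_den)):
        raise ValueError("probabilities must lie in (0, 1]")
    # Ratio of cumulative products up to each visit.
    return [
        n / d for n, d in zip(accumulate(p_num, mul), accumulate(p_den, mul))
    ]


def summarize(weights):
    """Mean, standard deviation, and range of a pooled list of weights."""
    return {
        "mean": mean(weights),
        "sd": stdev(weights),
        "min": min(weights),
        "max": max(weights),
    }


# Hypothetical participant whose covariates predict a high chance of loss:
# the denominator probabilities fall below the numerator, so weights exceed 1.
sw = stabilized_weights([0.9, 0.9], [0.8, 0.5])
```

A mean of the stabilized weights far from one, or a very wide range, is commonly read as a warning sign of model misspecification or non-positivity.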
Risk ratios obtained from the standard and inverse probability-of-censoring weighted survival functions were used to quantify the association between African American race and subsequent death. Figure 3 shows the survival functions and risk ratios comparing African Americans to Caucasians in the observed and weighted populations. Assuming Graph I) in Figure 2 is correct, the aforementioned results show that selection bias due to loss to follow up related to the measured exposure and covariates was sizeable. Specifically, informative selection appeared to overestimate survival and alter the association between African American race and subsequent death at later visits.
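The calculation behind such risk ratios can be sketched in Python as follows, assuming person-visit records that carry a visit index, an event indicator, and a weight (with all weights set to one for the crude analysis). The function names and the toy records are hypothetical and are not the UNC data.

```python
from collections import defaultdict


def weighted_survival(rows):
    """Weighted discrete-time survival curve.

    rows: iterable of (visit, event, weight) person-visit records, one per
    participant per visit at which that participant was at risk.
    Returns {visit: estimated proportion alive through that visit}.
    """
    at_risk = defaultdict(float)
    events = defaultdict(float)
    for visit, event, weight in rows:
        at_risk[visit] += weight
        events[visit] += weight * event
    surv, curve = 1.0, {}
    for visit in sorted(at_risk):
        surv *= 1 - events[visit] / at_risk[visit]
        curve[visit] = surv
    return curve


def risk_ratio(curve_exposed, curve_unexposed, visit):
    """Risk ratio at a visit, with risk defined as 1 minus survival."""
    return (1 - curve_exposed[visit]) / (1 - curve_unexposed[visit])


# Toy records with unit weights: 1 of 4 exposed and 1 of 8 unexposed
# participants die at visit 1.
exposed = [(1, 1, 1.0)] + [(1, 0, 1.0)] * 3
unexposed = [(1, 1, 1.0)] + [(1, 0, 1.0)] * 7
rr = risk_ratio(weighted_survival(exposed), weighted_survival(unexposed), 1)
```

Replacing the unit weights with estimated stabilized weights gives the weighted analogue; multiplying every weight by the same constant leaves the curve unchanged.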
Figure 3.
Proportion alive (left) and risk ratio for death comparing African Americans to Caucasians (right) by visit among 2,511 HIV-infected men and women with 25,319 total person-visits of follow-up, UNC CFAR HIV clinical cohort, 1999–2012. The solid curve (Crude) does not correct for selection bias while the dashed curve (Weighted) corrects for selection bias due to loss to follow up dependent on African American race and measured covariates including insurance status and a prior AIDS-defining illness diagnosis at the first clinic visit as well as CD4 cell count and HIV RNA level at the prior visit using inverse probability-of-censoring weights.
Discussion
Here we used simple notation and causal diagrams to better characterize the sources of selection bias due to attrition in cohort studies when the quantity of interest is an absolute measure or relative effect measure. We discussed that when the exposure causes the outcome, conditioning on a collider is not necessary for selection bias of a relative effect. Instead, selection bias of a relative effect may occur solely due to the existence of a common cause of loss and the outcome. In addition, whether a given estimate obtained from standard methods is subject to selection bias can depend on the measure. For some scenarios, both the absolute and relative estimates obtained from standard methods will be unbiased, while in others, solely the absolute measure or both the absolute and relative measures obtained from standard methods may be biased.
Inverse probability-of-censoring weighted estimation was reviewed as a technique to correct for selection bias due to loss to follow up when estimating absolute measures or relative effect measures. Compared to other non-standard techniques such as stratification-based methods, weighted methods can correct for selection bias in a broader number of scenarios and more readily provide covariate-corrected marginal estimates. However, when necessary assumptions or conditions are potentially violated, alternative techniques such as targeted learning should be considered 17,28,33.
Supplementary Material
Acknowledgments
This work was supported by the National Institutes of Health grant P30 AI50410.
The authors thank Dr. Bianca De Stavola for helpful feedback on an earlier draft, Dr. Daniel Westreich for expert advice, Mr. Sam Stinnette for assistance with the UNC CFAR HIV clinical cohort data, and the rest of the UNC CFAR HIV clinical cohort study staff.
Footnotes
The authors report no other financial interests related to this research.
References
- 1.Rothman KJ, Greenland S, Lash TL. Modern Epidemiology. 3rd. Philadelphia, PA: Lippincott Williams & Wilkins; 2008. [Google Scholar]
- 2.Hernán MA, Hernandez-Diaz S, Robins JM. A structural approach to selection bias. Epidemiology. 2004;15(5):615–25. doi: 10.1097/01.ede.0000135174.63482.43. [DOI] [PubMed] [Google Scholar]
- 3.Hernán MA, Robins J. Causal Inference Book. Boca Raton: Chapman & Hall/CRC; 2016. Forthcoming. [Google Scholar]
- 4.Cole SR, Hernan MA. Constructing inverse probability weights for marginal structural models. Am J Epidemiol. 2008;168(6):656–64. doi: 10.1093/aje/kwn164. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Howe CJ, Cole SR, Chmiel JS, Munoz A. Limitation of inverse probability-of-censoring weights in estimating survival in the presence of strong selection bias. Am J Epidemiol. 2011;173(5):569–77. doi: 10.1093/aje/kwq385.Epub2011Feb2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6. Howe CJ, Cole SR, Mehta SH, Kirk GD. Estimating the effects of multiple time-varying exposures using joint marginal structural models: alcohol consumption, injection drug use, and HIV acquisition. Epidemiology. 2012;23(4):574–82. doi: 10.1097/EDE.0b013e31824d1ccb.
- 7. Hernán MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology. 2000;11(5):561–70. doi: 10.1097/00001648-200009000-00012.
- 8. Robins JM, Finkelstein DM. Correcting for Noncompliance and Dependent Censoring in an AIDS Clinical Trial with Inverse Probability of Censoring Weighted (IPCW) Log-Rank Tests. Biometrics. 2000;56(3):779–788. doi: 10.1111/j.0006-341x.2000.00779.x.
- 9. Hernán MA, McAdams M, McGrath N, Lanoy E, Costagliola D. Observation plans in longitudinal studies with time-varying treatments. Stat Methods Med Res. 2009;18(1):27–52. doi: 10.1177/0962280208092345.
- 10. Robins JM, Hernán MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology. 2000;11:550–60. doi: 10.1097/00001648-200009000-00011.
- 11. Barnighausen T, Bor J, Wandira-Kazibwe S, Canning D. Correcting HIV prevalence estimates for survey nonparticipation using Heckman-type selection models. Epidemiology. 2011;22(1):27–35. doi: 10.1097/EDE.0b013e3181ffa201.
- 12. Robins J. A new approach to causal inference in mortality studies with sustained exposure periods - application to control of the healthy worker survivor effect. Mathematical Modelling. 1986;7:1393–1512.
- 13. Malani HM. A Modification of the Redistribution to the Right Algorithm Using Disease Markers. Biometrika. 1995;82(3):515–526.
- 14. Murray S, Tsiatis AA. Nonparametric survival estimation using prognostic longitudinal covariates. Biometrics. 1996;52(1):137–151.
- 15. Scharfstein DO, Robins JM. Estimation of the failure time distribution in the presence of informative censoring. Biometrika. 2002;89(3):617–634.
- 16. Hsu CH, Taylor JM, Murray S, Commenges D. Survival analysis using auxiliary variables via non-parametric multiple imputation. Stat Med. 2006;25(20):3503–17. doi: 10.1002/sim.2452.
- 17. Neugebauer R, Schmittdiel JA, van der Laan MJ. Targeted learning in real-world comparative effectiveness research with time-varying interventions. Stat Med. 2014;33(14):2480–520. doi: 10.1002/sim.6099. Epub 2014 Feb 17.
- 18. Daniel RM, Kenward MG, Cousens SN, De Stavola BL. Using causal diagrams to guide analysis in missing data problems. Stat Methods Med Res. 2012;21(3):243–56. doi: 10.1177/0962280210394469. Epub 2011 Mar 9.
- 19. Hernán MA. The hazards of hazard ratios [Commentary]. Epidemiology. 2010;21(1):13–5. doi: 10.1097/EDE.0b013e3181c1ea43.
- 20. DeGroot MH, Schervish MJ. Unbiased Estimators. In: Probability and Statistics. 3rd ed. Addison-Wesley; 1975. pp. 427–433.
- 21.Hernán MA. Caveats and Considerations in Symposium on Selection Bias due to Loss: An Old and Often Ignored Problem Revisited at 2014 Society for Epidemiologic Research Annual Meeting. [Accessed February 6, 2015]; https://epiresearch.org/about-us/archives/video-archives-2/selection-bias-due-to-loss/
- 22. Greenland S, Pearl J. Adjustments and their consequences: collapsibility analysis using graphical models. International Statistical Review. 2011;79(3):401–426.
- 23. Westreich D. Berkson's bias, selection bias, and missing data. Epidemiology. 2012;23(1):159–64. doi: 10.1097/EDE.0b013e31823b6296.
- 24. Hernán MA, Hernandez-Diaz S, Werler MM, Mitchell AA. Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. Am J Epidemiol. 2002;155(2):176–84. doi: 10.1093/aje/155.2.176.
- 25. Collett D. Modelling Survival Data in Medical Research. 2nd ed. Chapman and Hall; 2003.
- 26. Cole SR, Lau B, Eron JJ, et al. Estimation of the standardized risk difference and ratio in a competing risks framework: application to injection drug use and progression to AIDS after initiation of antiretroviral therapy. Am J Epidemiol. 2015;181(4):238–45. doi: 10.1093/aje/kwu122. Epub 2014 Jun 24.
- 27. Westreich D, Cole SR. Invited commentary: positivity in practice. Am J Epidemiol. 2010;171(6):674–7; discussion 678–81. doi: 10.1093/aje/kwp436. Epub 2010 Feb 5.
- 28. Petersen ML, Porter KE, Gruber S, Wang Y, van der Laan MJ. Diagnosing and responding to violations in the positivity assumption. Stat Methods Med Res. 2012;21(1):31–54. doi: 10.1177/0962280210386207. Epub 2010 Oct 28.
- 29. Hudgens MG, Halloran ME. Toward Causal Inference With Interference. J Am Stat Assoc. 2008;103(482):832–842. doi: 10.1198/016214508000000292.
- 30. VanderWeele TJ. Concerning the consistency assumption in causal inference. Epidemiology. 2009;20(6):880–3. doi: 10.1097/EDE.0b013e3181bd5638.
- 31. Scharfstein DO, Rotnitzky A, Robins JM. Adjusting for non-ignorable drop-out using semi-parametric non-response models. J Am Statist Assoc. 1999;94:1096–1120.
- 32. Scharfstein DO, Rotnitzky A, Robins JM. Adjusting for non-ignorable drop-out using semi-parametric non-response models [Comments and Rejoinder]. J Am Statist Assoc. 1999;94:1121–1146.
- 33. Gruber S, Logan RW, Jarrin I, Monge S, Hernán MA. Ensemble learning of inverse probability weights for marginal structural modeling in large observational datasets. Stat Med. 2015;34(1):106–17. doi: 10.1002/sim.6322. Epub 2014 Oct 15.
- 34. The Antiretroviral Therapy Cohort Collaboration. Influence of geographical origin and ethnicity on mortality in patients on antiretroviral therapy in Canada, Europe, and the United States. Clin Infect Dis. 2013;56(12):1800–9. doi: 10.1093/cid/cit111. Epub 2013 Mar 1.
- 35. VanderWeele TJ, Robinson WR. On the causal interpretation of race in regressions adjusting for confounding and mediating variables. Epidemiology. 2014;25(4):473–84. doi: 10.1097/EDE.0000000000000105.
- 36. Howe CJ, Cole SR, Napravnik S, Eron JJ. Enrollment, retention, and visit attendance in the University of North Carolina Center for AIDS Research HIV clinical cohort, 2001-2007. AIDS Res Hum Retroviruses. 2010;26(8):875–81. doi: 10.1089/aid.2009.0282.
- 37. Vourli G, Touloumi G. Performance of the marginal structural models under various scenarios of incomplete marker's values: A simulation study. Biom J. 2014;28(10):201300159. doi: 10.1002/bimj.201300159.
- 38. Howe CJ, Cole SR, Napravnik S, et al. The role of at-risk alcohol/drug use and treatment in appointment attendance and virologic suppression among HIV+ African Americans. AIDS Res Hum Retroviruses. 2014;30(3):233–40. doi: 10.1089/aid.2013.0163.
- 39. Howe CJ, Napravnik S, Cole SR, et al. African American Race and HIV Virological Suppression: Beyond Disparities in Clinic Attendance. Am J Epidemiol. 2014;179(12):1484–92. doi: 10.1093/aje/kwu069. Epub 2014 May.