American Journal of Epidemiology. 2020 Jul 15;189(12):1633–1636. doi: 10.1093/aje/kwaa143

DATA VISUALIZATION TOOLS FOR CONFOUNDING AND SELECTION BIAS IN LONGITUDINAL DATA: THE %LENGTHEN, %BALANCE, AND %MAKEPLOT (CONFOUNDR) MACROS AND R PACKAGE

Erin M Schnellinger, Linda Valeri, John W Jackson
PMCID: PMC7705602. PMID: 32666075.

Confounding and selection bias are extremely concerning in longitudinal studies (1) yet hard to diagnose. Ascertaining the success of methods used to control for such biases is also difficult (2), especially with time-varying exposures. The confoundr software addresses these difficulties in numerous situations, enabling researchers to examine 1) patterns of confounding/selection bias among measured covariates in longitudinal data; 2) similar patterns in studies of interaction or mediation, or with artificial censoring; 3) the extent to which adjustment procedures such as inverse probability weighting (IPW) resolve imbalances from measured confounding/selection bias; and 4) the amount of exposure-covariate feedback, which occurs when covariates are affected by exposure or an unmeasured cause of prior exposure (3). Confoundr is flexible and accommodates real-world subtleties such as right-censoring, different depths of covariate history, multiple exposures, and multivalued exposures. It can also focus on specific times and produce summaries nonparametrically or through modeling, which is useful for longitudinal analyses.

Existing software, including the Toolkit for Weighting and Analysis of Nonequivalent Groups (TWANG) (4), Covariate Balance Tables and Plots (COBALT) (5), and WeightIt (6), can estimate propensity score weights and compute standardized balance metrics. However, TWANG employs a particular IPW estimation approach (4), ignores exposure history, and produces a single balance metric averaged across all covariates rather than separate metrics for each covariate; COBALT and WeightIt ignore exposure history and focus on time-fixed exposures (5, 6). Confoundr, available in SAS (SAS Institute, Cary, North Carolina) and R (R Foundation for Statistical Computing, Vienna, Austria) (7, 8), is more general: it can handle time-varying exposures, covariates, and weights, along with censoring indicators and time-varying exposure-history strata. Additionally, it can handle propensity-score strata when implementing g-methods for effects of sustained treatments as in Achy-Brou et al. (9) and Hong (10), of current treatment as in Keogh et al. (11), or of time-fixed treatment, and when evaluating treatment-confounder feedback as in Jackson (2). Confoundr is interactive, requires minimal user input, and outputs both balance tables and meaningful visualizations of these metrics. This letter and the accompanying tutorial outline the features and use of confoundr.

OVERVIEW OF CONFOUNDR

Confoundr consists of a series of SAS macros, which users call successively. These macros use PROC IML, PROC SQL, and PROC SGPANEL, along with basic SAS data manipulation techniques (SAS version 9.4 or above required). Three diagnostics are available in confoundr: Diagnostic 1 examines whether the mean of prior covariates differs across exposure groups, among people who experience the same exposure history up to a given time; diagnostic 2 assesses whether the covariate mean differs across prior exposure groups, to determine whether exposure-covariate feedback is present (i.e., it helps evaluate whether g-methods (1) should be used, although causal diagrams can also aid this decision (3)); and diagnostic 3 examines the distribution of prior covariates after g-methods are applied (e.g., it assesses residual confounding among measured covariates after weighting). The theory behind these diagnostics is beyond the scope of this letter; interested readers may see work by Jackson (2, 12).
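The logic behind diagnostic 1 can be illustrated with a small sketch. The macros themselves run in SAS; the Python below uses made-up data and field names and a pooled-standard-deviation scaling, so it is an illustration of the idea (comparing covariate means across exposure groups within exposure-history strata), not the confoundr implementation:

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical subject-time records: exposure history up to time t,
# current exposure at t, and a covariate measured before t.
# These values and field names are illustrative only.
records = [
    # (history, exposure, covariate)
    ("0",  1, 2.0), ("0",  0, 1.0), ("0",  1, 3.0), ("0",  0, 2.0),
    ("01", 1, 5.0), ("01", 0, 4.0), ("01", 1, 6.0), ("01", 0, 3.0),
]

def smd_within_history(records):
    """Diagnostic-1-style check: within each exposure-history stratum,
    the standardized mean difference of the prior covariate across
    current exposure groups (raw difference / pooled SD in stratum)."""
    by_stratum = defaultdict(lambda: defaultdict(list))
    for hist, exp, cov in records:
        by_stratum[hist][exp].append(cov)
    out = {}
    for hist, groups in by_stratum.items():
        exposed, unexposed = groups[1], groups[0]
        pooled_sd = pstdev(exposed + unexposed)
        out[hist] = (mean(exposed) - mean(unexposed)) / pooled_sd
    return out

print(smd_within_history(records))
```

A nonzero difference within a history stratum signals imbalance in that covariate conditional on exposure history, which is the pattern diagnostic 1 is designed to surface.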

Confoundr requires input data to be in “wide” format, with 1 record per subject. The %widen() macro transforms long data into wide format. Rows of the input data set should be distinguished by unique subject identifiers, and columns should indicate the name and measurement time of each variable, separated by an underscore. Once the data fit this structure, analysis proceeds as follows:
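The expected "wide" layout — one row per subject, with columns named as variable, underscore, measurement time — can be sketched as follows. Variable names (a for exposure, l for covariate) and the pivot logic here are illustrative; the %widen() macro and the software manual (7) give the actual specification:

```python
# Hypothetical long-format rows: one per subject-time.
long_rows = [
    # (id, time, exposure, covariate)
    (1, 0, 0, 2.3), (1, 1, 1, 2.9),
    (2, 0, 1, 1.8), (2, 1, 1, 2.1),
]

def widen(long_rows):
    """Pivot long subject-time rows into one wide record per subject,
    with columns named <variable>_<time>."""
    wide = {}
    for sid, t, a, l in long_rows:
        rec = wide.setdefault(sid, {"id": sid})
        rec[f"a_{t}"] = a   # exposure measured at time t
        rec[f"l_{t}"] = l   # covariate measured at time t
    return list(wide.values())

for rec in widen(long_rows):
    print(rec)
```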

  • %makehistory_one() or %makehistory_two(): Generate exposure history from time-indexed exposure variables (optional).

  • %lengthen(): Restructure the wide input data set into “tidy” format, with each row uniquely identified by the pairing of exposure and covariate measurement times (13).

  • %balance(): From the “tidy” data set, create a covariate balance table.

  • %makeplot(): Plot the resulting balance statistics.

Additional features include %diagnose(), which iteratively calls %lengthen() and %balance() to reduce memory consumption; %omit_history(), which removes covariate measurements that do not support exchangeability assumptions at certain times; and %apply_scope(), which subsets the %balance() output to include only balance metrics at a certain distance/recency or produces summary estimates averaged over person-time.
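The %lengthen()/%balance() pairing described above can be sketched in miniature. This Python illustration mimics only the core idea — pairing each exposure time with earlier covariate measurement times, then computing a standardized mean difference per pairing — and none of the macros' options, stratification features, or weighting; the data and column names are invented:

```python
from statistics import mean, pstdev

# Hypothetical wide records with columns <variable>_<time>.
wide = [
    {"id": 1, "a_0": 0, "a_1": 1, "l_0": 2.0, "l_1": 3.0},
    {"id": 2, "a_0": 1, "a_1": 1, "l_0": 4.0, "l_1": 5.0},
    {"id": 3, "a_0": 0, "a_1": 0, "l_0": 1.0, "l_1": 2.0},
    {"id": 4, "a_0": 1, "a_1": 0, "l_0": 5.0, "l_1": 4.0},
]

def lengthen(wide, times=(0, 1)):
    """One tidy row per (subject, exposure time, earlier covariate time)."""
    tidy = []
    for rec in wide:
        for t in times:
            for s in times:
                if s < t:  # keep only covariates measured before exposure
                    tidy.append({"id": rec["id"], "exp_time": t,
                                 "cov_time": s, "a": rec[f"a_{t}"],
                                 "l": rec[f"l_{s}"]})
    return tidy

def balance(tidy):
    """Standardized mean difference per (exposure time, covariate time)."""
    cells = {}
    for row in tidy:
        cells.setdefault((row["exp_time"], row["cov_time"]), []).append(row)
    out = {}
    for key, rows in cells.items():
        exposed = [r["l"] for r in rows if r["a"] == 1]
        ref = [r["l"] for r in rows if r["a"] == 0]
        out[key] = (mean(exposed) - mean(ref)) / pstdev([r["l"] for r in rows])
    return out

print(balance(lengthen(wide)))
```

The resulting table — one balance metric per exposure-time/covariate-time pair — is what %makeplot() would then render as a trellis plot.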

Additional information about these macros appears in the software manual (7) and Web Appendix 1 (available at https://academic.oup.com/aje), where we apply confoundr to a large pragmatic trial of first- and second-generation antipsychotics (14), conducted in the United States in 2000–2004, to evaluate covariate balance before and after IPW weighting. The resulting trellis plots (Figures 1 and 2, Web Figure 1) illustrate how the extent of imbalance for each covariate changes over time.

Figure 1.


Standardized mean difference of time-varying covariates comparing censored versus uncensored patients in the Clinical Antipsychotic Trials of Intervention Effectiveness (conducted in the United States, 2000–2004), before application of inverse probability of censoring weights. Standardization was performed by dividing raw mean differences by the standard deviation among the uncensored. Symbols distinguish standardized mean differences for specific treatment arms (circles: olanzapine; triangles: ziprasidone). Dashed reference lines mark standardized mean differences of ±0.25. For clarity, we report results only for the ziprasidone and olanzapine treatment arms, which experienced the highest and lowest dropout rates, respectively. CGI, Clinical Global Impressions scale; EPS, Extrapyramidal Side-Effect Scale; PANSS, Positive and Negative Syndrome Scale.

Figure 2.


Standardized mean difference of time-varying covariates comparing censored versus uncensored patients in the Clinical Antipsychotic Trials of Intervention Effectiveness (conducted in the United States, 2000–2004), after application of inverse probability of censoring weights. Standardization was performed by dividing raw mean differences by the standard deviation among the uncensored. Symbols distinguish standardized mean differences for specific treatment arms (circles: olanzapine; triangles: ziprasidone). Dashed reference lines mark standardized mean differences of ±0.25. For clarity, we report results only for the ziprasidone and olanzapine treatment arms, which experienced the highest and lowest dropout rates, respectively. CGI, Clinical Global Impressions scale; EPS, Extrapyramidal Side-Effect Scale; PANSS, Positive and Negative Syndrome Scale.

DISCUSSION

Confoundr can assess only how well confounding/selection bias is mitigated among measured covariates, provided that sequential exchangeability (1) holds. It cannot assess residual confounding/selection bias among unmeasured covariates. Thus, covariate selection should be performed before running confoundr. Subsequently, confoundr can be implemented to visualize imbalance among selected covariates and determine whether the applied methods correct for observed imbalances. Confoundr can also assess the presence of treatment-confounder feedback (2), which can inform the use of g-methods.

Confoundr provides a more transparent way to assess measured confounding/selection bias in one’s studies, and present results of these assessments to the scientific community. It also enables investigators to more easily detect whether g-methods, as implemented, have improved covariate balance, allowing them to modify their approach before proceeding to outcome analyses. Moreover, confoundr is applicable to observational studies and clinical trials conducted in many disciplines (15, 16).

A limitation of confoundr is that the data transformations used to compute balance metrics consume considerable memory if the number of observations, covariates, or measurement times is large. To mitigate this issue, call %diagnose(), or subset data to selected times. Another limitation of confoundr is that columns of the input data set must contain the variable name and indexed measurement time, separated by an underscore. The %widen() macro, based on Svolba (17), facilitates this structure. Last, %makeplot() relies on PROC SGPANEL, which allows only limited dimensions for trellis plots. When it does not suffice, balance results can be read into the confoundr R package for plotting (see Jackson et al. (8), Web Appendix 2, and Web Figures 2 and 3).

Despite these limitations, confoundr can compute balance metrics for time-varying exposures consistent with nuanced assumptions about the nature of confounding/selection bias, and it can produce summary statistics if needed. We hope confoundr will benefit investigators by illuminating their data and informing their analyses in ways that are concise, intuitive, and meaningful to stakeholders.

Supplementary Material

Web_Material_kwaa143

Acknowledgments

This work was funded by the National Center for Advancing Translational Science (grant UL1TR001102); the National Heart, Lung, and Blood Institute (grants K01HL145320 to J.W.J. and F31HL194338 to E.M.S.); and the National Institute of Mental Health (grant K01MH118477 to L.V.).

This study used data from the National Institute of Mental Health Data Archive. We are extremely grateful to Dr. Hardeep Ranu and Dr. Garry Gray, both of the Harvard Catalyst Reactor Program, for project facilitation and support; to Dr. Marsha Wilcox, Janssen scientific director and fellow, for making this study possible; and to the other Harvard Catalyst–funded researchers using these data (Dr. Sharon-Lise Normand and Jake Spertus) for their generous advice and comments.

The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Conflict of interest: none declared.

REFERENCES
