Abstract
As medical expenses continue to rise, methods to properly analyze cost outcomes are becoming of increasing relevance when seeking to compare average costs across treatments. Inverse probability weighted regression models have been developed to address the challenge of cost censoring in order to identify intent-to-treat effects (i.e., to compare mean costs between groups on the basis of their initial treatment assignment, irrespective of any subsequent changes to their treatment status). In this paper, we describe a nested g-computation procedure that can be used to compare mean costs between two or more time-varying treatment regimes. We highlight the relative advantages and limitations of this approach when compared to existing regression-based models. We illustrate the utility of this approach as a means to inform public policy by applying it to a simulated data example motivated by costs associated with cancer treatments. Simulations confirm that inference regarding intent-to-treat effects versus the joint causal effects estimated by the nested g-formula can lead to markedly different conclusions regarding differential costs. Therefore, it is essential to pre-specify the desired target of inference when choosing between these two frameworks. The nested g-formula should be considered as a useful, complementary tool to existing methods when analyzing cost outcomes.
Keywords: Causal inference, confounding, g-computation, observational studies, time-varying treatment
1. Introduction
In the modern era, high medical costs play an increasingly important role in treatment selection, policy implementation, and resource allocation (Thorpe, 2005; Nikolova, 2014). Analyses of cumulative medical costs can help paint a complete picture in the comparison of treatments when interpreted in conjunction with clinically relevant outcomes. However, cost analyses are typically complicated by censoring, whereby complete cost data are only available on a subset of study participants. For instance, cost over five years may be of interest, but some participants may only have data for two years.
Methods have been developed in recent years to properly address censoring in studies with medical cost outcomes. It is widely recognized that standard approaches to address right-censoring (e.g., the Kaplan-Meier method and the Cox Proportional Hazards model) are not appropriate when the outcome of interest is cumulative cost. As cost accumulation rates are heterogeneous across individuals, the observed cost at the time of censoring is correlated with the theoretical cumulative cost over some pre-specified time range of interest, violating the “non-informative censoring” condition imposed by standard approaches. We refer to this fundamental challenge as the problem of informative cost trajectories, a problem that exists in the analysis of cost outcomes even if the censoring mechanism is itself completely random.
Bang & Tsiatis (2000) developed an estimator for mean cost that addresses informative cost trajectories by re-weighting complete-case observations; weights are specifically generated according to individual subjects’ estimated survival probabilities, perhaps conditional on baseline covariates. This general framework prompted further research to develop regression methods that model mean cost across one or more covariates of interest (Lin, 2000; Lin et al., 2003). Like the mean estimator, regression models can accommodate repeated cost measures to improve efficiency of estimation. To date, this framework remains the most widely implemented by health economists seeking to understand how mean cost varies across different treatment groups. Subsequent refinements and extensions have sought to further improve efficiency and more flexibly model cost outcome distributions (Li et al., 2016).
Regression methods based on the Bang & Tsiatis framework identify intent-to-treat (ITT) effects in that they compare mean costs between groups on the basis of baseline treatment status, irrespective of post-baseline treatment modifications. An ITT effect therefore characterizes the extent to which intending to administer some medication or therapy impacts an individual patient’s total cost relative to some other medication or therapy, regardless of changes in treatment received. For example, suppose one were interested in comparing costs between two competing chemotherapy agents; an ITT effect addresses a research question about differences in mean cost under two hypothetical scenarios: one in which it is intended at baseline for the population at hand to receive the first agent, and the other in which it is intended at baseline for the population at hand to receive the second agent. Whether or not they continue to adhere to these agents is not considered in an ITT analysis.
In practice, emerging insights in a patient’s individual care can bring about changes to his or her treatment status. Some reasons for treatment changes include: treatment toxicity, dramatic improvements in health, insurance changes, prohibitively high cost, and comorbidities. Many of these factors serve as time-varying confounders in that they influence both future treatment and future cost. Decisions to change or maintain a particular patient’s treatment status are often iterative and in accordance with concurrent standard of care and best practices. Therefore, there is often a lack of ethical grounds to impose particular treatment patterns on a subpopulation in an experimental setting.
In many circumstances, policy makers and health economists may benefit from insights into mean cost differences across theoretical time-varying treatment regimes. In the simple case of two treatment options (say, A and A′), it may be of interest to understand the difference in mean cost under the hypothetical setting in which all participants receive treatment A for the study duration and the hypothetical setting in which all participants receive treatment A′ for the study duration. Such a difference is an example of a joint causal (JC) effect, providing insights into the impact of implementing a particular treatment policy on medical costs relative to some other policy. For example, in the setting of two competing chemotherapy agents described above, a JC effect answers a research question about differences in mean cost under two hypothetical scenarios: one in which all patients adhere to the first agent for the study duration, and one in which all patients adhere to the second agent for the study duration. Predictions of cost savings could be computed by multiplying the JC effect by, for example, yearly disease incidence in a subpopulation of interest. It is unclear how, or even whether, an ITT effect could be used to gain similar insights if participants switch in and out of different treatments. The ITT and JC effects are equivalent when treatment is time-stable, but they need not otherwise be. The JC effect may be either of higher or lower magnitude than the ITT effect, depending on multiple factors.
G-computation, developed by Robins (1986), is a method to estimate JC effects in the setting of time-varying treatment and confounding. We have previously formalized a nested g-computation procedure to allow censoring and death to depend upon observed time-varying covariates and repeated outcome history (Spieker et al., 2017). The nested g-formula has demonstrated a fair amount of robustness to misspecification of the cost distribution, and therefore shows promise as a method to analyze cost data.
The goal of this paper is to present the nested g-formula as an alternative to existing methods when seeking to analyze cost outcomes. A motivating example involving cancer treatments will be used to demonstrate how to implement the nested g-formula. We will utilize Monte-Carlo simulations to illustrate mechanisms by which inferences on JC and ITT effects can lead to different conclusions. Throughout, we will elaborate on how JC effects have important health policy and decision making implications.
2. Estimating Intent-To-Treat Effects
We briefly review approaches to estimate mean cost differences across treatments. We focus on the complete-case and interval-partition linear regression approaches formalized by Lin (2000) to compare costs between treatments in the presence of baseline confounding.
2.1. Complete-case inverse probability weighting approach
Let i = 1, . . . ,N index independently sampled study subjects. Let Y denote cumulative medical cost for τ units of time after commencing treatment. Naturally, τ cannot be longer than the study duration. Let A denote baseline treatment status (which we assume to be binary for the time being), and L a vector of baseline confounders. Consider the following regression model:

Y = β0 + β1A + β2TL + ϵ, (1)
where β = (β0, β1, β2)T is a vector of unknown regression parameters and ϵ an error term of mean zero. In the complete absence of censoring, regression parameters can be estimated using ordinary least squares (OLS). When participants’ survival times are censored prior to time τ, so too are their cost outcomes. Let T denote the survival time, and T* = min(T, τ) the study duration or post-treatment survival time, whichever is smaller. Further let C denote the censoring time, and δ = 1(T* ≤ C) the indicator that the cost outcome is observed.
If V denotes baseline covariates that influence survival time (these covariates are permitted to overlap with the covariates of L), let K(t|V) = P(C ≥ t|V) denote the conditional probability of remaining uncensored through time t. K(t|V) can be estimated using the Cox Proportional Hazards model, for example (or the Kaplan-Meier estimator in the absence of V). Let K̂(T*i|Vi) denote the estimated conditional probability for subject i. Letting Xi = (1, Ai, LiT)T, a closed-form estimator for β is given as follows:

β̂ = (∑i=1N [δi/K̂(T*i|Vi)] XiXiT)−1 ∑i=1N [δi/K̂(T*i|Vi)] XiYi. (2)
We refer to this method for estimation of β as the complete-case (CC) approach.
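As a concrete illustration of the weighted least-squares computation in (2), the sketch below implements the CC estimator in Python with NumPy. The paper's supplementary code is written in R; the function name and interface here are our own, and the censoring survival probabilities K̂ are assumed to have been estimated beforehand (e.g., via a Cox or Kaplan-Meier fit).

```python
import numpy as np

def cc_ipw_estimate(Y, A, L, delta, K_hat):
    """Complete-case IPW estimator of beta in model (1), per equation (2).

    Subject i receives weight delta_i / K_hat_i, where K_hat_i estimates
    P(C >= T*_i | V_i). Y: cumulative costs (set censored entries to any
    finite value; their weight is zero); A: baseline treatment (0/1);
    L: (N, p) array of baseline confounders; delta: complete-case indicators.
    """
    X = np.column_stack([np.ones(len(A)), A, L])
    w = delta / K_hat                       # inverse probability weights
    XtWX = X.T @ (w[:, None] * X)
    XtWy = X.T @ (w * Y)
    return np.linalg.solve(XtWX, XtWy)      # (X'WX)^{-1} X'Wy
```

With no censoring (all δi = 1, K̂ ≡ 1) this reduces to ordinary least squares, and with completely random censoring the re-weighting recovers the same estimand.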
2.2. Refinement based on repeated measures
Much akin to the refinement proposed by Bang & Tsiatis (2000), repeated outcome measures can be exploited to allow study participants to contribute to estimation of β for as long as they are not censored. Let 0 ≡ τ0 < τ1 < · · · < τJ−1 < τJ ≡ τ denote a discrete interval partition of the time range of interest. Let Yj denote the cost accumulated in interval j (namely, the interval [τj−1, τj)), such that Y = ∑j=1J Yj. Further let T*j = min(T, τj) and δj = 1(T*j ≤ C), the indicator that complete cost data are observed through interval j. The CC estimator can be applied within each respective interval and summed to estimate β:

β̂ = ∑j=1J (∑i=1N [δij/K̂(T*ij|Vi)] XiXiT)−1 ∑i=1N [δij/K̂(T*ij|Vi)] XiYij. (3)
That the CC estimator and the interval-partition (IP) estimator are each consistent and asymptotically normal is justified theoretically by Lin (2000). In either case, asymptotically valid sandwich-based estimators and the nonparametric bootstrap are both commonly implemented approaches to estimate Var(β̂) and conduct inference.
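The interval-partition refinement in (3) amounts to applying the complete-case fit within each interval and summing the coefficients. A self-contained Python sketch (our own naming; the paper's supplement uses R) follows; the per-interval censoring probabilities are again assumed to have been estimated separately.

```python
import numpy as np

def ip_ipw_estimate(Yj, A, L, delta_j, K_hat_j):
    """Interval-partition IPW estimator of beta, per equation (3).

    Yj: (N, J) interval-specific costs; delta_j: (N, J) indicators that
    complete cost data are observed through interval j; K_hat_j: (N, J)
    estimates of P(C >= T*_ij | V_i). Applies the CC weighted least-squares
    fit within each interval and sums the resulting coefficient vectors.
    """
    N, J = Yj.shape
    X = np.column_stack([np.ones(N), A, L])
    beta = np.zeros(X.shape[1])
    for j in range(J):
        w = delta_j[:, j] / K_hat_j[:, j]   # interval-j IPW weights
        XtWX = X.T @ (w[:, None] * X)
        beta += np.linalg.solve(XtWX, X.T @ (w * Yj[:, j]))
    return beta
```

Because subjects contribute to every interval through which they remain uncensored, this estimator typically uses more of the data than the complete-case fit, which is the source of its efficiency gain.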
2.3. Interpreting ITT effects
Regardless of whether the CC or IP approach is used to estimate regression model (1), β1 can be interpreted as the difference in expected cost between two randomly sampled individuals with the same values for L, but differing in baseline treatment status: β1 = E(Y | A = 1, L) − E(Y | A = 0, L). However, only under the assumption of no unmeasured confounding (which specifically requires the potential cost to be independent of A, conditional on L) is β1 equivalent to the ITT effect, ΔITT = E[Y(a=1)] − E[Y(a=0)] (here, Y(a) denotes the hypothetical cost when given baseline treatment A = a). This equivalence is proven in Appendix I of the web-based Supporting Information. What’s more, under proper randomization, the parameter β1 from regression model (1) is consistent for the marginal ITT causal parameter ΔITT regardless of further model misspecification (Tsiatis et al., 2008).
The parameter ΔITT is a particular kind of causal effect that characterizes the difference in mean cost under two hypothetical scenarios in which everyone receives one of two comparator treatments at baseline, disregarding post-baseline treatment changes. If treatment is randomized at baseline, there are no systematic differences in L across treatment groups, and the unadjusted estimator of β1 consistently estimates ΔITT.
Researchers will sometimes use generalized linear models (with, for example, a Gamma outcome) in place of OLS to account for cost skewness. If a difference in marginal means is of scientific interest, it is important that the “identity” link be chosen rather than the default “inverse” link. In cases where ITT effects involving contrasts other than differences may be of interest, it is possible to model cost as a nonlinear function of A and L (Lin et al., 2003), though the equivalence of the conditional association to the corresponding ITT effect is often lost (with some exceptions, such as use of the log-link in which collapsibility holds) even when the assumption of no unmeasured confounding is met. Li et al. (2016) propose the use of inverse probability-of-treatment weights in the regression model as an alternative to conditioning on L so that an ITT contrast can be recovered.
3. Time-Dependent Treatment and Joint Causal Effects
In this section, we summarize the nested g-formula approach (Spieker et al., 2017) for addressing time-dependent treatment and confounding in order to target the JC effect.
3.1. Joint causal effects and g-computation
Continuing the notation from the previous section, let L1, . . . ,LJ denote confounders measured at each of the J time points, and A1, . . . ,AJ the corresponding treatment statuses. For ease of notation, let Āj = (A1, . . . , Aj) denote treatment history through interval j and let Ā ≡ ĀJ. We adopt analogous notation for confounder history and the repeated cost outcomes, noting that Y = ∑j=1J Yj. The JC effect, ΔJC = E[Y(ā)] − E[Y(ā′)], is defined to be the difference in mean cost between two hypothetical treatment regimes ā and ā′. Here, Y(ā) denotes the hypothetical cumulative cost under treatment history ā. This is distinctly different from the ITT effect, which only concerns baseline treatment status.
The g-formula, first developed by Robins (1986), could be used to account for time-varying treatment and confounding when estimating marginal means if cost were measured as a single variable at the end of a study. G-computation refers to the process by which the g-formula is used to estimate those marginal means. In the absence of repeated cost measures, it would be challenging to formulate reasonable assumptions regarding censoring and death in this setting. The g-computation framework has not been previously considered for analysis of cumulative costs; in order to make reasonable assumptions about censoring and death, we formalized a nested g-computation procedure to accommodate these features by means of repeated outcome measures over an interval partition. Figure 1 illustrates how confounders, treatment, and cost influence each other in the setting of repeated cost measures, highlighting the temporal ordering of these variables. Letting Dj denote the indicator of death at the end of interval j (after data have been collected on confounders, treatment, and cost in interval j), the nested g-computation procedure can be described as follows:
1. Select and fit a model for the conditional mean of Yj given prior treatment, confounder, and cost history: E(Yj | Āj, L̄j, Ȳj−1). For example, this could be accomplished with standard ordinary least squares regression or a generalized linear model.

2. Select and fit a model for the distribution of the confounders given previous treatment, confounder, and cost history: p(Lj | Āj−1, L̄j−1, Ȳj−1). For binary confounders, this can be accomplished with, for example, logistic regression; for continuous confounders, this can be accomplished with ordinary least squares regression or generalized linear models.

3. Select and fit a model for the risk (log-linear regression) or odds (logistic regression) of death given previous treatment, confounder, and cost history: P(Dj = 1 | Āj, L̄j, Ȳj).

4. Choose a hypothetical treatment regime of interest, ā (e.g., ā = (1, 1, . . . , 1)). Simulate data from the models fitted in Steps 1 through 3 a large number of times (in the appropriate temporal order, plugging in ā for treatment at each stage) to generate predicted cost values. Note that once a patient dies, he or she does not continue to accumulate further costs. Averaging the predicted total costs from these simulated predictions provides an estimate of E[Y(ā)].
Steps 1 through 3 are implemented on the basis of all available data; that is, individuals contribute observations for as long as they remain uncensored. Cost and confounder models can be fit, for example, through OLS or other more flexible regression techniques. Step 4 identifies a particular marginal mean, and can be implemented for each comparator treatment regime of interest. The most important assumptions implied by this algorithm include: (1) correct model specification; (2) consistency, namely that the potential costs under the observed treatment regime are equal to the observed costs; (3) sequentially ignorable treatment assignment, that is, that treatment at each time point is effectively randomized conditional on observed variable history; and (4) conditionally independent censoring and death, specifically, conditional on observed covariate history. Further details regarding the derivation of this algorithm and the technical assumptions required to identify the JC effect are provided by Spieker et al. (2017). Note that in the total absence of censoring and death, the standard g-computation procedure could be applied; this procedure is provided in Appendix II of the Supporting Information for the case of a single cost outcome.
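The four steps above can be sketched generically as a forward simulation. In the Python sketch below (the paper's supplement is in R; here the fitted models of Steps 1 through 3 are abstracted as user-supplied sampling functions, and all names are our own), a patient who dies simply stops contributing cost.

```python
import numpy as np

def g_compute_mean_cost(regime, sim_L, sim_Y, sim_D, n_sim=100_000, seed=0):
    """Nested g-computation (Steps 1-4) as a generic forward simulation.

    regime: treatment sequence (a_1, ..., a_J) to plug in at each interval.
    sim_L, sim_Y, sim_D: callables (j, hist, rng) -> length-n_sim arrays,
    standing in for the fitted models of Steps 1-3 (confounders, interval
    costs, and death indicators, respectively). hist holds the simulated
    history as a dict of lists of arrays. Returns the estimate of E[Y(a-bar)].
    """
    rng = np.random.default_rng(seed)
    alive = np.ones(n_sim, dtype=bool)
    total = np.zeros(n_sim)
    hist = {"L": [], "A": [], "Y": []}
    for j, a in enumerate(regime):
        hist["L"].append(sim_L(j, hist, rng))    # confounders first ...
        hist["A"].append(np.full(n_sim, a))      # ... then plug-in treatment ...
        y = sim_Y(j, hist, rng)                  # ... then interval cost
        total += np.where(alive, y, 0.0)         # the dead accrue no further cost
        hist["Y"].append(y)
        alive &= ~np.asarray(sim_D(j, hist, rng), dtype=bool)
    return float(total.mean())
```

Running this once per comparator regime and contrasting the resulting means yields an estimate of ΔJC.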
FIGURE 1.
Directed acyclic graph depicting how confounders, cost, and treatment are related in the setting of three time points. For simplicity, it is assumed that variables in one interval depend on observations only from the concurrent and previous interval.
4. Applying Nested G-computation to Data
We now demonstrate how to implement nested g-computation with a simulated data set. The purpose of this section is to provide details on how to implement the method for a particular data set. The code used to generate the data and the corresponding g-computation algorithm are available in the supplementary files, so that readers can replicate the results. For data generation, we specified two possible active treatments that are time-varying in nature. Cancer treatment is one setting in which such patterns arise and in which cost is a central concern: many forms of cancer can be managed with multiple treatments in different patterns to attempt to induce remission or slow progression, and in many instances patients undergo some combination of radiation therapy (RT) and chemotherapy (CT), perhaps alternating between the two. Given the time-varying nature of treatment patterns, this general setting serves as an excellent motivating example for a cost comparison analysis.
Data were generated with three time points, a single continuous time-varying confounder (e.g., a transformed propensity score), Gamma-distributed cost outcomes (in thousands of dollars), and three treatment categories (0 = None, 1 = RT, and 2 = CT). Death and censoring were generated in a time-updating fashion. More details regarding data generation can be found in the supplementary file, Data_Gen.R. This code generates the analytic data set in “.csv” format (Table I highlights key characteristics of the data). These characteristics are not designed to reflect the population characteristics of a specific form of cancer; rather, the simulation provides a data set we could imagine as plausible, solely for purposes of illustration. While we proceed to perform nested g-computation and provide interpretations as we would if these data were real, it is important to note that no clinical conclusions can or should be drawn on the basis of these artificial data.
Table I:
Key characteristics of the simulated working example.

| Characteristic | Description |
| --- | --- |
| Treatment distribution at baseline | None: 17.9%; RT: 52.1%; CT: 30.1% |
| Probability of switching treatment | P(A1 ≠ A2) = 13.5%; P(A2 ≠ A3) = 9.2%; P(A1 ≠ A3) = 20.4% |
| Mean baseline cost | |
| Adjusted censoring risk (ITT) | P(Censored ∣ A1 = None, L1 = l1) = 13.3%; P(Censored ∣ A1 = RT, L1 = l1) = 10.7%; P(Censored ∣ A1 = CT, L1 = l1) = 10.0% |
| Adjusted death risk (ITT) | P(Death ∣ A1 = None, L1 = l1) = 15.9%; P(Death ∣ A1 = RT, L1 = l1) = 10.6%; P(Death ∣ A1 = CT, L1 = l1) = 8.6% |
To implement nested g-computation, we perform the following step-wise procedure. The supplementary file G_Comp.R provides the complete code used to generate the salient results.

1. Estimate baseline models:
   1A. Estimate the mean and standard deviation of the baseline confounder (i.e., parameters from a normal distribution).
   1B. Model baseline cost, conditional on treatment and the confounder, using (for example) Gamma regression with the identity link.
   1C. Model the odds of death (e.g., with logistic regression) after the first interval, conditional on the confounder, treatment, and baseline cost.

2. Estimate follow-up models, conditional on being observed:
   2A. Model follow-up confounders with (for example) OLS, conditional on the prior confounder, treatment, and cost.
   2B. Model follow-up cost, conditional on the prior two confounders, prior two treatments, and prior cost, using (for example) Gamma regression with the identity link.
   2C. Model the odds of death (e.g., with logistic regression) after the second interval, conditional on the prior two confounders, treatments, and costs.

3. Set the number of g-computation iterations, Ng (e.g., 100,000), and select a treatment regime (for example, CT followed by two courses of RT corresponds to ā = (2, 1, 1)).

4. Generate Ng baseline variables:
   4A. Generate Ng normally distributed baseline confounders under the model parameters of Step 1A.
   4B. Generate Ng Gamma-distributed baseline costs under the model parameters of Step 1B, the regime of Step 3, and the confounders of Step 4A.
   4C. Generate Ng death indicators under the model parameters of Step 1C, the regime of Step 3, and the generated confounders/costs of Steps 4A and 4B.

5. Generate Ng follow-up variables:
   5A. Generate Ng confounders under the estimated model parameters of Step 2A, the regime of Step 3, and the generated variables of Step 4.
   5B. Generate Ng Gamma-distributed costs under the estimated model parameters of Step 2B, the regime of Step 3, and the generated variables of Steps 4 through 5A.
   5C. Generate Ng death indicators under the estimated model parameters of Step 2C, the regime of Step 3, and the generated prior variables of Steps 4 through 5B.
   5D. Repeat Steps 5A through 5C to generate the third and final cost, on the basis of the corresponding variables generated in Steps 5A through 5C.

6. Set cost post-death equal to zero (i.e., set Y2 = Y3 = 0 for those who die after the first interval, and set Y3 = 0 for those who die after the second interval). This is merely a computational bookkeeping trick to avoid the need to monitor the time at which participants die in the simulation of the cost outcomes and confounders. Then, define total cost by summing Y1, Y2, and Y3 within each of the Ng simulated observations. The sample mean of these totals is an estimate of E[Y(ā)] for the treatment regime ā.
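Under first-order dependence (as depicted in Figure 1), Steps 3 through 6 can be sketched compactly in Python; the supplement's actual implementation is in R, and the parameter bundle `pars` below stands in for hypothetical fitted coefficients from Steps 1 and 2 rather than estimates from the simulated data set.

```python
import numpy as np

def simulate_regime_mean(regime, pars, n_g=100_000, seed=0):
    """Sketch of Steps 3-6: forward-simulate confounders, Gamma costs with an
    identity-link mean, and logistic death over the three intervals, zeroing
    out post-death costs before averaging. For brevity, each variable depends
    only on the previous interval (a simplification of the full history
    conditioning in Steps 2A-2C); all coefficient names are our own.
    """
    rng = np.random.default_rng(seed)
    expit = lambda x: 1.0 / (1.0 + np.exp(-x))
    alive = np.ones(n_g, dtype=bool)
    total = np.zeros(n_g)
    L = rng.normal(pars["L_mean"], pars["L_sd"], n_g)           # Step 4A
    for a in regime:
        mu = pars["c0"] + pars["cA"] * a + pars["cL"] * L       # identity-link mean
        Y = rng.gamma(pars["shape"], np.maximum(mu, 1e-8) / pars["shape"])
        total += np.where(alive, Y, 0.0)                        # Step 6 bookkeeping
        p_death = expit(pars["d0"] + pars["dA"] * a + pars["dL"] * L)
        alive &= rng.uniform(size=n_g) >= p_death               # Steps 4C/5C
        L = pars["l0"] + pars["lL"] * L + rng.normal(0.0, pars["l_sd"], n_g)  # Step 5A
    return float(total.mean())                                  # estimate of E[Y(regime)]
```

Calling this once per regime of interest (e.g., `(2, 1, 1)` for CT followed by two courses of RT) and contrasting the means gives the estimated JC effects.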
Steps 1 and 2 need only be performed once; Steps 3 through 6 can be cycled through for each comparator treatment. The nonparametric bootstrap can be used to conduct inference. This entails first selecting an appropriate number of replicates, B (e.g., B = 500). For b = 1, . . . ,B, a data set of N subjects is created by resampling subjects from the original data, with replacement. Steps 1 through 6 are then implemented in each resampled data set. Interval estimates can be obtained using the standard deviation of the B bootstrap realizations of the estimated marginal mean.
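The bootstrap procedure just described can be sketched generically. In the Python sketch below (our own naming), `estimator` stands in for the full Steps 1 through 6 pipeline applied to a resampled data set; the test simply uses a sample mean for illustration.

```python
import numpy as np

def bootstrap_se(data, estimator, B=500, seed=0):
    """Nonparametric bootstrap standard error for a g-computation estimate.

    data: (N, p) array of subject-level records (one row per subject);
    estimator: callable mapping a resampled data set to a scalar estimate,
    e.g., the estimated marginal mean E[Y(a-bar)] from Steps 1-6.
    """
    rng = np.random.default_rng(seed)
    N = len(data)
    reps = np.array([estimator(data[rng.integers(0, N, N)]) for _ in range(B)])
    return reps.std(ddof=1)   # SE; a Wald 95% CI is estimate +/- 1.96 * SE
```

Because the B replicates are independent given the original data, this loop parallelizes trivially across cores.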
There are twenty-seven possible comparator treatment regimes (3 treatment categories across each of 3 intervals, i.e., 3^3 = 27), many of which may not be of clinical importance. The data were generated such that receiving no treatment was markedly less safe than RT or CT (as evidenced by comparing risks of death), and so the appropriate course of action in a cost analysis would be to give no weight to hypothetical regimes involving no treatment. For the purposes of illustration, though, we consider the scenario in which participants receive no treatment alongside the eight hypothetical treatment patterns involving RT and CT treatments. Table II provides estimates, standard errors, and 95% confidence intervals for mean cost under each treatment regime considered. Figure 2 depicts the point estimates and 95% confidence intervals across the treatment patterns of most interest (Regimes 2 through 9).
Table II:
Summary of estimated marginal means, bootstrap standard errors, and resulting 95% confidence intervals under nine hypothetical treatment regimes from data example. To highlight the importance of considering cost data alongside clinically important outcomes, the simulated cumulative risk of death is provided.
| # | Regime | Cost Est. | SE | 95% CI | Death Risk (%) | 95% CI |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | (None, None, None) | $34.3k | $0.64k | [$33.0k, $35.5k] | 25.4 | [22.6, 27.9] |
| 2 | (RT, RT, RT) | $65.6k | $0.62k | [$64.3k, $66.8k] | 19.3 | [17.5, 20.6] |
| 3 | (CT, CT, CT) | $62.8k | $0.70k | [$61.4k, $64.1k] | 16.3 | [14.3, 18.2] |
| 4 | (RT, CT, RT) | $63.5k | $1.14k | [$61.3k, $65.8k] | 21.0 | [17.4, 29.7] |
| 5 | (CT, RT, CT) | $64.8k | $0.66k | [$63.6k, $66.1k] | 15.1 | [13.9, 18.2] |
| 6 | (CT, CT, RT) | $66.9k | $1.02k | [$64.8k, $68.9k] | 19.2 | [17.6, 20.6] |
| 7 | (RT, RT, CT) | $61.6k | $0.96k | [$59.7k, $63.4k] | 16.2 | [14.3, 18.2] |
| 8 | (RT, CT, CT) | $64.6k | $1.55k | [$61.5k, $67.6k] | 21.2 | [17.3, 29.6] |
| 9 | (CT, RT, RT) | $63.5k | $1.14k | [$61.3k, $65.7k] | 15.1 | [13.1, 18.3] |
FIGURE 2.
Point estimates and 95% Wald‐based confidence intervals for the marginal mean cost under eight of the comparator regimes considered in Section 4.
That lack of treatment yields the lowest average cost is attributable to the high costs associated with RT and CT, to higher death rates among untreated individuals, and to the dependence structure among confounders, treatment, and cost. Among the remaining treatment patterns, receiving two rounds of RT followed by CT (Regime 7) yields the lowest average cost. While differences between regimes may seem small, total yearly savings can be projected by multiplying the mean cost difference between two hypothetical regimes by the disease incidence. In the United States, there are approximately 61,000 incident cases of endometrial cancer each year; if we imagine that these data came from women with endometrial cancer, then Regime 7 would result in over $73M per year in healthcare savings relative to Regime 3, based on the difference of $1.2k in total medical costs per individual.
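The projected-savings arithmetic is simple enough to verify directly; the figures below are the illustrative ones quoted above, not real epidemiologic estimates.

```python
# Illustrative projection: yearly savings from adopting Regime 7 over Regime 3.
incidence = 61_000                    # approximate yearly incident cases (illustrative)
per_patient_saving = 62.8e3 - 61.6e3  # Regime 3 mean cost minus Regime 7 mean cost ($1.2k)
yearly_savings = incidence * per_patient_saving
# 61,000 x $1,200 = $73,200,000, i.e., just over $73M per year
```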
The focus of this work is to highlight the nested g-computation framework to compare mean costs (and not, in particular, cost effectiveness). Given, though, that differential death risk can be at least partially responsible for cost differences, we must underscore the importance of performing cost analysis in a way that is not misleading. Table II provides estimated cumulative risks of death and 95% confidence intervals by treatment group. Comparing Regimes 2 and 7, we conclude that Regime 7 enjoys both a lower associated cost and a lower risk of death. This type of synergy is a prototypically ideal scenario in a two-group comparison. However, conclusions regarding cost and safety may be at odds with each other (e.g., Regime 7 has a lower average cost than Regime 5, but a higher associated risk of death). Advice regarding how to proceed in policy development in such unfortunate settings cannot be generalized without further insights into clinical effectiveness and other factors.
5. Comparison of Methods
We now compare the inverse probability weighted (IPW) approaches described in Section 2 to the nested g-formula. First, we explore ways in which data features can give rise to discrepancies between the ITT and JC effects. We then highlight the advantages and disadvantages of each approach for estimating its respective target of inference.
5.1. Simulation study
We conduct a simulation study to compare the regression approaches of Lin (2000) to the nested g-formula. The data generation mechanism is similar in spirit to that of Section 4, with two treatment categories at each time point. The comparator treatment regimes of interest are defined to be ā = (1, 1, 1) and ā′ = (0, 0, 0). Hence, ΔITT = E[Y(a1=1)] − E[Y(a1=0)] and ΔJC = E[Y(ā)] − E[Y(ā′)] define the effects of interest.
In the first scenario, treatment was generated with very high correlation between adjacent treatment pairs. Initial costs were taken to be substantively higher for treatment “1,” but only modestly higher at follow-up times. In Scenario 1, the ITT and JC effects are both of relatively comparable magnitude and of the same sign. In the second and third scenarios, treatment assignment was still positively correlated between adjacent pairs, although slightly less so than in the first scenario. In Scenario 2, initial costs were markedly higher for treatment “0,” but modestly lower at follow-up times, resulting in a null JC effect but a modest ITT effect. In Scenario 3, initial costs were modestly higher for treatment “0,” but also modestly lower at follow-up times, resulting in a null ITT effect but a modest JC effect. In the final scenario, the first and second treatments were highly positively correlated, though participants were modestly likely to switch medications by the third observation; initial costs were markedly higher for treatment “0,” but markedly higher for treatment “1” at follow-up. This results in moderate ITT and JC effects in opposite directions. The risk of death was generated comparably across the four scenarios, with approximately 30% of subjects dying before study completion. Supplementary Table A in the Supporting Information (Appendix III) provides a summary of specific simulation characteristics.
One thousand simulations were conducted under sample sizes of N = 500 and N = 1,000, with Ng = 100,000 g-computation iterations and B = 100 bootstrap replicates. Results are summarized in Table III (N = 500); Supplementary Table B of the Supporting Information (Appendix III) presents results for N = 1,000. Results from the IP and CC methods were comparable across the board, with the former yielding slightly greater efficiency, as expected. Hence, we report results only from the IP method and the nested g-formula.
Table III:
Simulation results comparing differences in raw cost between the nested g-formula and Lin’s IP approach for N = 500. Note that the treatment effects are defined as ΔITT = E[Y(a1=1)] − E[Y(a1=0)] and ΔJC = E[Y(ā=(1,1,1))] − E[Y(ā′=(0,0,0))]. Bias and coverage are taken with respect to the target of inference within each method.
(Columns 2 through 6 pertain to Lin’s interval-partition approach; columns 7 through 11 pertain to the nested g-formula.)

| Scenario | ΔITT | Est. | MCSE | SE | CP | ΔJC | Est. | MCSE | SE | CP |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 5.93 | 5.94 | 0.89 | 0.88 | 0.947 | 6.43 | 6.45 | 0.65 | 0.63 | 0.949 |
| 2 | −2.93 | −2.37 | 0.95 | 0.93 | 0.897 | 0.00 | 0.028 | 0.63 | 0.67 | 0.958 |
| 3 | 0.00 | 0.40 | 0.98 | 0.94 | 0.918 | 2.27 | 2.29 | 0.59 | 0.59 | 0.955 |
| 4 | −2.39 | −2.13 | 0.98 | 0.96 | 0.933 | 3.63 | 3.65 | 0.70 | 0.68 | 0.937 |
Importantly, both approaches perform well, showing relatively low bias. The modest bias and undercoverage seen in the ITT approach are likely attributable to the fact that censoring and death risk are time-varying. That is, the main conclusion of this simulation is not that one approach is inherently better than the other in terms of bias; rather, the two approaches target different parameters and are each effective at identifying them. The simulation demonstrates that under realistic hypothetical scenarios, these parameters may have either comparable or markedly different values, depending upon the timing of cost accumulation and the tendency of participants to switch in and out of medications (among other factors). This has important consequences in that care must be taken to identify the target effect of interest before conducting an analysis of cost data. The choice of a target effect will in turn dictate which of the two frameworks should be used.
5.2. Descriptive comparison
Though identifying a suitable target of inference should be scientifically motivated, it is important to be aware of the relative advantages and limitations of each existing framework. Table IV summarizes the properties of the IPW-based regression methods for estimation of ITT effects and the nested g-computation procedure for estimation of JC effects.
Table IV:
Descriptive comparison of the IPW-based regression (Bang & Tsiatis) and nested g-formula approaches.
| Property | IPW-based regression | Nested g-formula |
|---|---|---|
| Type of effect targeted | Intent-to-treat | Joint causal |
| Computation time | Fast | Slow |
| Marginal means? | Sometimes | Always |
| Model cost distribution? | Sometimes | Always |
| Model confounder distribution? | Never | Always |
| Model treatment assignment? | Sometimes | Never |
| Model censoring mechanism? | Always | Never |
| Interval partition required? | No | Yes |
| No unmeasured confounding? | Yes | Yes |
| Censoring risk | Time-stable | Time-updating |
| Death risk | Time-stable | Time-updating |
The nested g-formula is computationally intensive, requiring multiple models to be fit to baseline/follow-up data and subsequent generation of data from those estimated models. Since there generally is no closed-form expression for Var(ΔJC), the bootstrap is typically employed for inferential procedures, further compounding this issue. However, much of the computing can be parallelized to mitigate potential challenges involving run time.
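A minimal sketch of the bootstrap machinery follows; the `delta_jc_hat` function here is a placeholder standing in for the full nested g-computation contrast (which would refit all component models on each resample), and the gamma-distributed cost data are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2017)

def delta_jc_hat(costs_a, costs_b):
    """Placeholder for the nested g-computation contrast; the real
    procedure would refit all component models on each resample."""
    return np.mean(costs_a) - np.mean(costs_b)

# Toy right-skewed cost data under two comparator regimes.
costs_a = rng.gamma(shape=2.0, scale=12.0, size=500)
costs_b = rng.gamma(shape=2.0, scale=10.0, size=500)

B = 100  # bootstrap replicates, matching the simulation study
boot = np.empty(B)
for b in range(B):
    ia = rng.integers(0, len(costs_a), len(costs_a))  # resample with replacement
    ib = rng.integers(0, len(costs_b), len(costs_b))
    boot[b] = delta_jc_hat(costs_a[ia], costs_b[ib])

se = boot.std(ddof=1)  # bootstrap standard error of the contrast
point = delta_jc_hat(costs_a, costs_b)
ci = (point - 1.96 * se, point + 1.96 * se)  # Wald-type interval
print(round(se, 2), [round(c, 1) for c in ci])
```

Because the replicates are mutually independent, the loop parallelizes trivially across cores, which is the run-time mitigation noted above.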
The nested g-formula separately estimates marginal means within comparator treatment regimes; in this way, it is straightforward to implement a suitable contrast (e.g., a difference or a ratio) to define a target joint causal effect of interest. When using the IPW-based regression approaches, the target contrast is tied directly to the choice of a link function in the cost outcome model. For example, exponentiated coefficients from log-linear regression models correspond to mean ratios rather than differences. In this regard, the nested g-formula is more flexible since the choice of a link function can be more data-driven.
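The separate estimation of marginal means and subsequent contrasting can be illustrated with a single-time-point analogue of g-computation; the data-generating values below (one confounder L, binary treatment A, linear cost model) are illustrative assumptions, not the paper's simulation design.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Toy single-time-point setup: confounder L drives both treatment and cost.
L = rng.normal(size=n)
A = rng.binomial(1, 1 / (1 + np.exp(-L)))            # confounded treatment
Y = 50 + 10 * A + 5 * L + rng.normal(scale=5, size=n)  # cost outcome

# Fit the cost model Y ~ A + L by ordinary least squares.
X = np.column_stack([np.ones(n), A, L])
beta = np.linalg.lstsq(X, Y, rcond=None)[0]

# g-computation: average model predictions over the empirical
# distribution of L under each treatment assignment.
mu1 = np.mean(beta[0] + beta[1] * 1 + beta[2] * L)  # marginal mean, all treated
mu0 = np.mean(beta[0] + beta[1] * 0 + beta[2] * L)  # marginal mean, none treated

print(round(mu1 - mu0, 2))  # difference contrast
print(round(mu1 / mu0, 3))  # ratio contrast from the same fitted model
```

Note that both the difference and the ratio come from one fitted model, which is the flexibility described above: the contrast is chosen after estimation rather than being dictated by a link function.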
IPW-based regression approaches are robust to misspecification of the cost distribution when the normal OLS estimating equations are used, though distributional assumptions are sometimes made in pursuit of efficiency gains. The nested g-formula requires specification of models for confounders and costs, though flexible modeling strategies exist that are beyond the scope of this paper. On the other hand, the nested g-formula does not require the censoring mechanism or the treatment assignment mechanism to be modeled. In the context of generating survival weights, a direct relationship between death and censoring is implied by the fact that those who die are exactly the ones who are not censored. Furthermore, a treatment model must be posited for the IPTW-based extension proposed by Li et al. (2016), which is essential to recover marginal means in the setting of non-collapsible link functions or when there are many confounders.
The IPW-based regression approaches may optionally exploit an interval partition to gain efficiency. The nested g-formula, on the other hand, requires repeated cost measures if any participants die or are censored during the course of the study, so that reasonable assumptions regarding censoring and death can be made. Therefore, if interval-based cost data are not available, it may not even be possible to implement the nested g-formula.
Aside from considerations regarding the target of inference, the major advantage of the nested g-formula is that it allows the censoring and death risks to be time-varying based on observed covariates. To date, the use of time-updating censoring and death risks has not been explored in the context of cost outcomes; this merits further exploration for settings in which ITT effects are of interest.
6. Discussion
We have presented the nested g-computation procedure as an approach to analyzing censored cost outcomes. Through a simulated illustrative data example and a simulation study, we have drawn contrasts between the existing work of Bang & Tsiatis (2000) and Lin (2000) and the causal approach of Spieker et al. (2017).
The simulated example underscores the importance of considering cost analyses in proper clinical context. We are currently developing methods that seek to jointly summarize treatment efficacy/safety outcomes and marginal cost outcomes for the purposes of cost-effectiveness analyses. This example further illustrates that, while more computationally intensive than existing IPW-based regression models, the nested g-formula can be implemented by fitting a group of reasonably simple models. We note that different distributions can be posited for confounders and cost outcomes than the ones used in this example. Additionally, in settings where there are multiple confounders, the joint density of the confounders must be estimated.
Of note is that it is not generally appropriate to consider a whole host of treatment regimes as we have in the data example. Doing so induces a problem of multiple comparisons in which inferential procedures become compromised. Though methods exist to ameliorate such challenges (e.g., the Bonferroni correction), the number of potential treatment patterns increases multiplicatively with the number of intervals. Measuring variables over one year on a monthly basis, for example, yields over five hundred thousand comparator treatment regimes. Even ignoring computational infeasibility, applying a correction as conservative as Bonferroni's would almost assuredly preclude the possibility of detecting an association in any modern analytic data set. This problem has been documented; some have suggested the use of marginal structural models, in which a parsimonious relationship is posited between the mean outcome and the treatment regime (Daniel et al., 2013). However, our example demonstrates that an overly parsimonious model (e.g., one that assumes that treatment ordering is irrelevant) may lead to bias. As an alternative to the restrictive assumptions induced by some marginal structural models, we propose first limiting comparator regimes (a priori) to those within the current standard of care. Selecting comparator treatment regimes outside the scope of existing clinical best practices for the purpose of searching for a minimal-cost treatment regime is generally inappropriate. Moreover, the g-computation framework is often considered most useful for constructing point estimates rather than for formal hypothesis testing, as it suffers from the g-null paradox, whereby nominal levels are not achieved unless all parametric assumptions are met.
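To make the multiplicative growth concrete: with k treatment options per interval and m intervals, there are k**m distinct regimes. A three-option treatment measured monthly over one year is an assumed configuration consistent with the order of magnitude quoted above.

```python
# Regime count grows multiplicatively: k options per interval, m intervals.
# (Three options per month is an assumed configuration; the text quotes
# only the resulting order of magnitude.)
k, m = 3, 12
n_regimes = k ** m
print(n_regimes)  # 531441, i.e., over five hundred thousand regimes
```

By contrast, a binary treatment over the same twelve intervals yields only 2**12 = 4,096 regimes; the count is acutely sensitive to the number of treatment options.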
Our simulation study highlights the importance of selecting between the IPW-based regression framework and the nested g-formula on the basis of the desired target of inference. ITT and JC effects are two components of a complete picture regarding the impact of a treatment or therapy. In some cases, they may lead to comparable conclusions regarding contrasts in mean cost (as in Scenario 1), but in other cases they may be seemingly at odds with each other (Scenarios 2, 3, and 4).
ITT effects may be of greater utility at a subject-specific level, or when seeking to provide treatment recommendations to particular subgroups of interest. For example, an individual with a short life expectancy post-baseline may be interested in understanding differences in costs based only on his or her baseline treatment (i.e., irrespective of any hypothetical subsequent changes), rather than differences in costs under the theoretical, likely counterfactual setting in which he or she lives long enough to adhere to a full hypothetical treatment regime.
However, when guidance is lacking and there exist no clearly defined policies, health economists and policy makers may benefit from comparing marginal means across hypothetical comparator treatment regimes. Such comparisons provide direct insight into contrasts in medical costs at the population level and can therefore be used to provide objective measures of long-term cost savings. Especially given the growing costs associated with patient care and the uncertainty regarding the future of health insurance structures in the USA, well-conducted studies with repeated cost measures will play a crucial role in effecting informed, responsible policy decisions.
Supplementary Material
Acknowledgments
This study was supported by NIH Grant R01 GM 112327.
Footnotes
The authors have no conflicts of interest to declare.
References
- Bang H and Tsiatis AA (2000). Estimating medical costs from censored data. Biometrika, 87(2):329–343.
- Daniel R, Cousens S, De Stavola B, Kenward M, and Sterne J (2013). Methods for dealing with time-dependent confounding. Statistics in Medicine, 32(9):1584–1618.
- Davison A and Hinkley D (1997). Bootstrap Methods and their Application. Cambridge University Press, 1st edition.
- Li J, Handorf E, Bekelman J, and Mitra N (2016). Propensity score and doubly robust methods for estimating the effect of treatment on censored cost. Statistics in Medicine, 35(12):1985–1999.
- Lin D (2000). Linear regression analysis of censored medical costs. Biostatistics, 1(1):35–47.
- Lin D (2003). Regression analysis of incomplete medical cost data. Statistics in Medicine, 22(7):1181–1200.
- Nikolova S and Stearns S (2014). The impact of CHIP premium increases on insurance outcomes among CHIP eligible children. BMC Health Services Research, 14(101). doi: 10.1186/1472-6963-14-101.
- Robins J (1986). A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Mathematical Modelling, 7(9–12):1393–1512.
- Rubin D (1978). Bayesian inference for causal effects: the role of randomization. The Annals of Statistics, 6(1):34–58.
- Spieker AJ, Oganisian A, Ko E, Roy JA, and Mitra N (2017). A causal approach to analysis of censored medical costs in the presence of time-varying treatment. arXiv:1705.08742.
- Thorpe KE (2005). The rise in health care spending and what to do about it. Health Affairs, 24(6):1436–1445.
- Tsiatis AA, Davidian M, Zhang M, and Lu X (2008). Covariate adjustment for two-sample treatment comparisons in randomized clinical trials: A principled yet flexible approach. Statistics in Medicine, 27(23):4658–4677.