Abstract
Big data and (deep) machine learning have become powerful tools in digital medicine, but these tools focus mainly on association, whereas intervention in medicine is about causal effects. The average treatment effect has long been studied as a measure of causal effect, under the assumption that the effect size is the same across the whole population. However, no “one-size-fits-all” treatment seems to work in some complex diseases, and treatment effects may vary by patient. Estimating heterogeneous treatment effects (HTE) may therefore have a high impact on developing personalized treatment. Many advanced machine learning models for estimating HTE have emerged in recent years, but there has been limited translational research into the real-world healthcare domain. To fill the gap, we reviewed and compared eleven recent HTE estimation methodologies, including meta-learners, representation learning models, and tree-based models. We performed a comprehensive benchmark experiment based on nationwide healthcare claim data, with an application to Alzheimer’s disease drug repurposing. We discuss challenges and opportunities in HTE estimation for the healthcare domain, aiming to close the gap between innovative HTE models and their deployment to real-world healthcare problems.
Keywords: Causal inference, Target trial, Conditional average treatment effect, Drug development, Deep learning, Machine learning
1. Introduction
Causal inference identifies the causes of an effect. Although randomized experiments (e.g., randomized clinical trials, A/B tests) are the de facto gold standard for identifying causation, they are sometimes economically infeasible or unethical if the intervention harms subjects [1]. Treatment effect estimation using observational data (e.g., real-world data) is an alternative strategy to emulate randomized experiments and infer causation. However, observational data inevitably contain bias. A confounding variable is a variable that influences both exposure to the treatment and the outcome (Fig. 1A). It is one of the major sources of bias and can mislead us into the wrong conclusion that the treatment has an effect on the outcome when it does not [2]. Statistical approaches to reducing such bias in observational data for treatment effect estimation have long been studied in multiple disciplines. For example, the target trial framework in epidemiology and biostatistics has focused on hypothesis testing to infer the average treatment effect by adjusting for confounders via matching or weighting [3–12] (Fig. 1Bc, 1Bd).
Fig. 1.
Illustrations of treatment effects analysis in the medical science field. A. An example of a causal relationship. B. Estimating causal treatment effects from real-world data under the Neyman-Rubin framework. (a) Subjects in RCTs are randomly assigned to a treatment group and a control group, thus the subjects in both groups have similar characteristics. (b) Subjects in real-world data are not randomly assigned to a treatment group and control group due to disease indication. (c) Matching subjects in each group can reduce bias [13]. (d) Weighting subjects by their propensity for treatment can create a comparable pseudo population [14,15] (Details described in S.1.1). (e) Neyman-Rubin causal effect calculation. C. Heterogeneous treatment effects vs. Average treatment effect. Patients are diverse and treatment effects vary. Estimating the average treatment effect (ATE) may oversimplify the heterogeneity of each patient.
However, patients are diverse and treatment effects vary. Decades of drug development in complex diseases have shown that there is no “one-size-fits-all” treatment [16]. Estimating the average treatment effect (ATE) may oversimplify the heterogeneity of each patient. The need for personalized treatment is tremendous. Therefore, it is important to estimate treatment effects for each individual or similar subgroups of patients, which is the so-called heterogeneous treatment effect (HTE) (Fig. 1C). The HTE estimation has transformative potential in personalized medicine by respecting the disease and patients’ heterogeneity.
The HTE estimation is recently gaining attention in econometrics [17] (e.g. uplift modeling), and machine learning (non-parametric HTE) [18], but is rarely investigated in the computational medicine area [19]. To close the gap in this translational effort, we review and compare some recent HTE methodologies and perform benchmark experiments to test the feasibility of the methodologies in emulating clinical trials for personalized treatment development. Our benchmark experiments use nationwide electronic health records with ~ 60 M patients in the US under the target trial protocol. Our scope is within a translational biomedical research perspective, particularly with digital health. This review and benchmark paper adapts notations and naming strategies from various sources including econml [20], causalml [21], Künzel et al. [22], and Bica et al. [19]. For a theoretical and methodological comparison, see [23–25]. For an econometric perspective, see [26]. For a clinical pharmacology perspective, see [19].
2. Preliminaries
2.1. Potential outcome framework
In this paper, we investigate models built under the Neyman-Rubin potential outcome framework [27,28]. Suppose that we have N subjects (i = 1, ⋯, N) with features X. For each subject, T denotes the treatment assignment: T = 1 if the subject is in the treatment group with potential outcome Y(1), and T = 0 if the subject is in the control (placebo) group with potential outcome Y(0). The Neyman-Rubin framework estimates the treatment effect given subject features X (Fig. 1Be) by
τ(x) = E[Y(1) − Y(0) | X = x]     (1)
The potential outcome framework requires several assumptions:
Strong ignorability (or exchangeability). We assume that no unobserved confounders exist, i.e., we observe all variables X affecting the treatment assignment T and the outcomes Y: (Y(1), Y(0)) ⊥ T | X [27,29]. In the language of real-world drug administration, we observe a sufficient set of confounding variables on patient characteristics that determine both the outcome and the administration of the drug.
Positivity. The probability P(T|X) of receiving the treatment is not deterministic, i.e., 0 < P(T|X) < 1 [30]. For example, a patient’s features X do not guarantee with 100 % certainty the administration of a particular drug.
Stable Unit Treatment Value Assumption (SUTVA). Each subject’s potential outcomes remain the same regardless of what treatments the other subjects receive (no interference between subjects) [27]. For example, a patient taking a drug does not affect other patients’ choice of drugs.
2.2. Counterfactual outcome
A fundamental challenge of treatment effect estimation is that it is impossible to observe Y(1) and Y(0) simultaneously. We call the outcome that the subject actually experiences the factual outcome, and the hypothetical outcome under the alternative treatment the counterfactual outcome. A common approach to address the missing counterfactual outcome is to calculate the average of the potential outcomes in the treatment group and the control group separately after randomization: if the treatment assignment is randomized, we can estimate the average treatment effect (ATE) by E[Y(1)] − E[Y(0)]. However, in real-world observational data, patients take drugs based on indication, not at random, so the patients exposed to a drug and those not exposed are not equivalent; the confounding variables create selection bias between the treated and untreated. Several techniques to adjust for confounding variables use the propensity score e(x) = P(T|X), such as matching (Fig. 1Bc), stratification, and inverse weighting (Fig. 1Bd), as well as doubly robust estimation [31] and conditional independence (g-estimation) [32]. More details are in Supplementary Material S.1.1.
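As an illustration, the following minimal sketch on synthetic data (assuming a single confounder and a correctly specified logistic propensity model; all names are our own) shows how inverse propensity weighting recovers the ATE that a naive difference in means gets wrong:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20000

# A single confounder X drives both treatment assignment and outcome.
X = rng.normal(size=(n, 1))
p_true = 1 / (1 + np.exp(-2.0 * X[:, 0]))          # true propensity e(x)
T = rng.binomial(1, p_true)
Y = 1.0 * T + 3.0 * X[:, 0] + rng.normal(size=n)   # true ATE = 1.0

# Naive difference in means is badly biased by confounding.
naive = Y[T == 1].mean() - Y[T == 0].mean()

# Estimate the propensity score, then apply self-normalized (Hajek) IPW.
e = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]
w1, w0 = T / e, (1 - T) / (1 - e)
ipw = np.sum(w1 * Y) / np.sum(w1) - np.sum(w0 * Y) / np.sum(w0)
```

The self-normalized (Hajek) form is used here because it is more stable than raw inverse weights when some propensities are small.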
3. Heterogeneous treatment effects
3.1. Overview
Heterogeneous treatment effect estimation quantifies the treatment effect of individuals or subgroups by accounting for the heterogeneity of patients’ conditions with respect to the outcome while reducing selection bias. The HTE of treatment T ∈ {1, 0} given the patient’s condition X is formulated as Eq. (1), and varies with the subject’s features X. A general recipe for estimating the HTE is to first learn the “nuisance” or “context” (the likelihood of being exposed to treatment T and the expected outcomes Y given subject features X) via arbitrary supervised models (Fig. 2A and Fig. 2B), and then estimate the HTE either by learning a coefficient of a structural equation model or by imputing the missing counterfactual outcomes in non-parametric methods (Fig. 2C). Many HTE estimation methods have been proposed [18,22,33–37]. Among them, double machine learning, meta-learners, representation learning, and causal forests are the most popular and are flexible enough to be implemented in many scenarios. We focus on these four families of methods, summarized in Table 1.
Fig. 2.
A toy example of heterogeneous treatment effects with high-dimensional features. Based on the subject’s feature X, the subject can be exposed to treatment T or not (i.e.,E[T|X]) and has different levels of outcomes (i.e.,E[Y|X]). The treatment effects also vary based on these different contexts or nuisances stemming from the subject’s features.
Table 1.
Comparison of HTE estimation methods.
Models | Characteristics | Treatment type | How is the treatment assignment incorporated? | How is the selection bias handled? |
---|---|---|---|---|
Double machine learning [33] | Learn HTE by the coefficient of a partially linear structural equation Widely used in the econometric community | Continuous, discrete | T is a function of X to learn. | T is treated as a function of X to learn. |
Single learner | Learn single outcome prediction model for both the treated and untreated group (E[Y|X,T]) The treated and untreated group share the same regression models | Continuous, discrete | T is a variable (together with X) to predict outcome | Not considered |
Two-learner | Learn multiple outcome prediction models for each treatment group. | Discrete | Separate outcome prediction model based on T | Not considered |
X-learner [22] | T learner with additional treatment effect estimation regression models for each of the treated and untreated groups. Handle data size imbalance in the treated and untreated group | Discrete | Separate outcome prediction model based on T | Weighted average of each treatment effect estimation based on propensity scores |
BNN [34] | Minimize the distribution distance between the transformed representation of the treated and untreated | Binary | Not considered | Covariate shift |
TARNET [18] | BNN with multi-task heads for each treatment group, without covariate shift | Binary | Separate potential outcome prediction layers on each T | Not considered |
CFR [38] | BNN with multi-task heads for each treatment group | Binary | Separate potential outcome prediction layers on each T | Covariate shift |
DR-CFR [35] | Separate representations for propensity score and outcomes prediction | Binary | Separate potential outcome prediction layers on each T | Covariate shift |
SCIGAN [36] | Estimate counterfactual outcomes via generative adversarial networks. | Continuous, discrete | T is a variable to predict dose-dependent outcome | Covariate shift |
Dragonnet [37] | Predict Y using X that are predictive of T (Sufficiency of Propensity Scores). Regularize the outcome prediction to satisfy augmented IPTW estimator | Discrete | Separate potential outcome prediction layers on each T | Learn neural network to predict outcome satisfying augmented IPTW estimator |
Double sample causal tree [39] | Use an ‘honest’ method to grow a tree. The splitting criterion is to maximize the variance of estimated treatment effects among samples in the training subset. | Discrete | Not considered; only for randomized data, in which treatment assignment is fully randomized | Honest splitting criterion and cross-validation |
Causal forest [40] | Uses multiple causal trees to obtain predictions of treatment effects (ensembling). Point estimates from the model are asymptotically normal and unbiased, allowing confidence intervals to be calculated. | Discrete | Not considered; only for randomized data, in which treatment assignment is fully randomized | Honest splitting criterion and cross-validation |
3.2. Double machine learning
Double machine learning shows how parametric causal inference evolves into semi-parametric causal inference with machine learning. A challenge that classical causal inference faces is the high-dimensional and complex nuisance parameter around the causal estimates. Complex nuisance parameters are often difficult to estimate within a classical semi-parametric framework (e.g., 10 binary features yield 2^10 combinations of features to test whether the treatment effects are distinctive from one another). To address the high dimensionality of features, Chernozhukov et al. propose replacing the semi-parametric nuisance estimation with arbitrary supervised machine learning models, the so-called double machine learning [33]. The double machine learning architecture is composed of two arbitrary supervised machine learning models, which estimate the conditional probability of taking the treatment and the conditional expectation of the outcome, respectively. One can then estimate the HTE by fitting a regression model on the residuals (details in Supplementary Material S.1.2). The architecture was later generalized to the R-learner, which was shown to achieve a quasi-oracle asymptotic error rate.
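The double machine learning recipe can be sketched on synthetic data (a constant-effect partially linear setup of our own construction, not the authors’ implementation): residualize Y and T on X with cross-fitted machine learning models, then regress residual on residual:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(1)
n = 4000
X = rng.normal(size=(n, 5))
p = 1 / (1 + np.exp(-X[:, 0]))                       # confounded assignment
T = rng.binomial(1, p)
Y = 2.0 * T + np.sin(X[:, 0]) + X[:, 1] + rng.normal(size=n)  # true effect 2.0

# Stage 1 (cross-fitted nuisances): E[Y|X] and E[T|X] with flexible ML models.
y_res, t_res = np.zeros(n), np.zeros(n)
for tr, te in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
    y_hat = RandomForestRegressor(random_state=0).fit(X[tr], Y[tr]).predict(X[te])
    t_hat = RandomForestClassifier(random_state=0).fit(X[tr], T[tr]).predict_proba(X[te])[:, 1]
    y_res[te], t_res[te] = Y[te] - y_hat, T[te] - t_hat

# Stage 2: regress outcome residuals on treatment residuals
# (the coefficient of the partially linear structural equation).
theta = LinearRegression(fit_intercept=False).fit(t_res.reshape(-1, 1), y_res).coef_[0]
```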
3.3. Meta-learners
Meta-learners are nonparametric HTE estimation methods that treat HTE estimation for discrete treatments as missing counterfactual outcome imputation. They decompose the HTE estimation into several sub-regression problems in which any arbitrary supervised machine learning model can be utilized. We follow the naming strategy introduced by Künzel et al. [22].
Single-learner (or S-learner), the most basic method in meta-learner, treats the treatment assignment variable T as just another feature (in addition to X) and builds a “single” supervised model to estimate the outcomes Y (Fig. 3Aa).
Two-learner (T-learner) fits separate regression models for the treatment and control groups (Fig. 3Ab). The advantage of the T-learner over the S-learner is that it performs relatively better when the outcome under treatment and the outcome under control are not very similar [22].
X-learner is a variant of the T-learner with extra steps to separate the HTE functions (Fig. 3Ac) [22]. It is named after the “X”-shaped use of training data for counterfactual outcome estimation. The main advantage of the X-learner over the T-learner is that it separates the treatment effect regressions of the treatment and placebo groups so that the HTE estimators can capture information about their differences. This strategy is beneficial when the numbers of treated and untreated subjects are not balanced.
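The three meta-learners above can be sketched in a few lines with scikit-learn base learners (a toy illustration on randomized synthetic data with a known effect; under randomization the propensity score used by the X-learner is simply 0.5):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
n = 3000
X = rng.normal(size=(n, 3))
T = rng.binomial(1, 0.5, size=n)
tau = 1.0 + X[:, 0]                              # heterogeneous effect, known here
Y = X[:, 1] + tau * T + 0.1 * rng.normal(size=n)

# S-learner: one model with T appended as an extra feature.
s = RandomForestRegressor(random_state=0).fit(np.column_stack([X, T]), Y)
tau_s = (s.predict(np.column_stack([X, np.ones(n)]))
         - s.predict(np.column_stack([X, np.zeros(n)])))

# T-learner: separate outcome models per arm.
m1 = RandomForestRegressor(random_state=0).fit(X[T == 1], Y[T == 1])
m0 = RandomForestRegressor(random_state=0).fit(X[T == 0], Y[T == 0])
tau_t = m1.predict(X) - m0.predict(X)

# X-learner: impute each arm's counterfactual, fit effect regressions,
# then blend with the propensity score.
d1 = Y[T == 1] - m0.predict(X[T == 1])           # imputed effects for the treated
d0 = m1.predict(X[T == 0]) - Y[T == 0]           # imputed effects for the controls
g1 = RandomForestRegressor(random_state=0).fit(X[T == 1], d1)
g0 = RandomForestRegressor(random_state=0).fit(X[T == 0], d0)
e = 0.5                                          # propensity under randomization
tau_x = e * g0.predict(X) + (1 - e) * g1.predict(X)
```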
Fig. 3.
Illustrations of different architectures of HTE models (A) A toy example to compare meta-learners in Künzel et al.[22] Metalearners decompose the HTE estimation into several sub-regression problems that any arbitrary supervised machine learning model can be utilized. (B) Illustration of covariate shift problem and counterfactual regression network [18,38]. Balanced representation learning via covariate shift using neural networks [18,38] with the treatment-invariant representation of a patient’s feature X.
Despite their flexibility and simple intuition, meta-learners have several limitations. They are non-parametric models with full flexibility to select any arbitrary supervised machine learning model, which makes it difficult to obtain valid confidence intervals. Also, meta-learners only handle discrete treatments; with multiple discrete treatments, they require a model for each treatment, posing extra computation. For more details, see [22] and the open-source implementations causalml [41] and econml [42].
3.4. Representation learning
Representation learning based on deep neural networks has also been actively used in nonparametric HTE estimation. Recent works focus on learning a representation under covariate shift, such that the feature representations of the treated and untreated follow similar distributions (Fig. 3B), and study the trade-off between balance and predictive power of such algorithms. BNN [34], CFRNet [38], and TARNet [18] propose a family of algorithms to predict the HTE using balanced representations learned from observational data (Fig. 3B), with the last two providing a bound for the HTE estimation error. These theoretical approaches have also been applied to many synthetic and real-world datasets [38].
Specifically, the balanced learning approach finds a representation ϕ : X → R and treatment-specific heads h1 and h0 that minimize the evaluation measure PEHE:

ε_PEHE = ∫_X (τ̂(x) − τ(x))² p(x) dx     (2)

where τ̂(x) = h1(ϕ(x)) − h0(ϕ(x)). Training minimizes a loss function that upper-bounds the PEHE; the loss consists of the expected factual losses of the treated and control outcome regressions plus the distributional distance between ϕ(x) given T = 1 and T = 0 for covariate shift (Fig. 3B).
3.5. Tree-based methods
Another class of models is tree-based models [39,40,43]. Tree-based models, including causal trees and causal forests, are nonparametric models that use recursive splitting criteria to find subgroups in which the sub-samples can be viewed as coming from randomized experiments and the divergence between the outcomes of the treatment and control groups is maximized. One characteristic of tree-based models is that they retain good asymptotic properties, allowing users to conduct solid statistical inference about the models’ point estimates. In addition, tree-based models provide generated rules for further interpretation and external validation [44].
Double sample causal tree, one of the most representative tree methods, uses an ‘honest’ estimation method to grow a tree. It splits the training sub-sample into two parts: one part is used for predicting outcomes, and the other is used to find the splits for the nodes. ‘Honest’ means that for a sample i, the response Yi can be used either for estimating treatment effects within a leaf or for finding the split, but not for both. The propensity tree is another method, which incorporates propensity score estimation in the model to adjust for confounders in observational datasets [40].
Causal forest generates ensembles of many causal trees and estimates treatment effects by averaging the predictions of the ensembled trees. By aggregating results from many trees, causal forests provide more robust estimators and smoother decision rules.
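The honest-splitting idea can be illustrated with a single split on synthetic randomized data (a deliberately crude stand-in for the causal-tree criterion: one half of the data picks the threshold that maximizes the gap between leaf-level effect estimates, and the held-out half estimates the leaf effects):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4000
x = rng.uniform(-1, 1, size=n)
T = rng.binomial(1, 0.5, size=n)            # randomized assignment
tau = np.where(x > 0, 2.0, 0.0)             # effect jumps at x = 0
Y = tau * T + rng.normal(size=n)

# Honesty: one half chooses the split, the other half estimates leaf effects.
split_half = np.arange(n) < n // 2
est_half = ~split_half

def leaf_effect(mask):
    """Difference in mean outcomes between treated and control in a leaf."""
    return Y[mask & (T == 1)].mean() - Y[mask & (T == 0)].mean()

# Choose the threshold maximizing heterogeneity between the two leaves
# on the splitting half (a crude proxy for the causal-tree criterion).
best_c, best_gap = None, -np.inf
for c in np.linspace(-0.9, 0.9, 37):
    gap = abs(leaf_effect(split_half & (x <= c)) - leaf_effect(split_half & (x > c)))
    if gap > best_gap:
        best_c, best_gap = c, gap

# Honest leaf estimates come from the held-out estimation half.
tau_left = leaf_effect(est_half & (x <= best_c))
tau_right = leaf_effect(est_half & (x > best_c))
```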
3.6. Methodology comparison
The HTE estimation methods we discussed have their own strengths and weaknesses. Double machine learning utilizes structural equations to model discrete or continuous treatments with confidence intervals. Meta-learners allow users to use any base learner for outcome prediction and treatment effect regression, so researchers have the autonomy to choose the supervised models that work best for their data. HTE estimation using representation learning is an end-to-end approach in which all tasks (reducing selection bias, predicting potential outcomes, and calculating treatment effects) are seamlessly connected in one neural network framework. Thanks to the neural network’s flexibility as a function approximator, various modeling hypotheses (e.g., multi-head architectures for the respective treatments [18], disentangled representations [35]) can be incorporated and tested.
Let us compare the methods on three criteria: treatment type, treatment assignment, and selection bias (Table 1). For the treatment type, the T-learner and X-learner handle binary treatments, but extending them to multiple discrete treatments is straightforward. In contrast, BNN, CFRNet, and TARNet minimize the distance between the two treatment groups’ representations, posing greater computational challenges when extending from binary to multiple discrete or continuous treatments. Double machine learning and the S-learner, meanwhile, can incorporate continuous treatments. For the treatment assignment variable, most methods for discrete treatments adopt separate outcome prediction (i.e., either separate neural network layers/heads or independent prediction models) to handle the different potential outcomes; this setting is advantageous when the sizes of the treatment groups are not balanced. For selection bias handling, covariate shift is the main approach in representation learning [18]; covariate selection by propensity scores has also been investigated, and weighting by propensity scores is widely used [22].
3.7. Evaluation metric
Directly evaluating the accuracy of estimated HTEs is challenging because the ground-truth treatment effect is never observed in data; randomized experiments are the only way to obtain the ground truth. Researchers have therefore used two indirect measurements: robustness and estimated goodness-of-fit.
Robustness: To evaluate whether the model’s estimation is robust to different data, one can train the same model on the training data and the test data respectively, and then calculate the “estimated” root mean squared error (ERMSE) between the treatment effects estimated from the training data and from the test data. High robustness (low ERMSE) means the model consistently generates similar HTE estimates regardless of the input data, which supports the validity of the estimation.
Estimated goodness-of-fit: To evaluate how accurately the models predict the HTE, the precision of estimating heterogeneous effects (PEHE) is the direct metric [45]. The PEHE is defined as the expected squared difference between the true HTE and the estimated HTE (Eq. (2)). As the true HTE is never observed, Alaa and van der Schaar proposed the influence function PEHE (IF-PEHE), which approximates the true PEHE via “derivatives” of the PEHE functional rather than relying directly on the unobserved counterfactual outcomes [46]. The approximation is composed of two parts: a plug-in estimate and an influence function that compensates for its bias. A well-designed plug-in estimate is easy to train and gives a partial guess at the true PEHE; the remaining bias of the plug-in estimate is approximated by the influence function, which is analogous to the derivative of a function in standard calculus. See details in the paper [46] or Supplementary Material S.1.4.
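Both metrics can be sketched on simulated data, where (unlike in real data) the true HTE is known and the PEHE is directly computable; the T-learner here is just a stand-in estimator, and all names are our own illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)

def make_data(n):
    X = rng.normal(size=(n, 3))
    T = rng.binomial(1, 0.5, size=n)
    tau = X[:, 0]                            # true HTE, known only in simulation
    Y = X[:, 1] + tau * T + 0.1 * rng.normal(size=n)
    return X, T, Y, tau

def fit_t_learner(X, T, Y):
    m1 = RandomForestRegressor(random_state=0).fit(X[T == 1], Y[T == 1])
    m0 = RandomForestRegressor(random_state=0).fit(X[T == 0], Y[T == 0])
    return lambda Xq: m1.predict(Xq) - m0.predict(Xq)

Xtr, Ttr, Ytr, _ = make_data(2000)
Xte, Tte, Yte, tau_te = make_data(2000)

# Robustness (ERMSE): the same model class fit on training vs. test data
# should yield similar HTE estimates on a common evaluation set.
tau_from_train = fit_t_learner(Xtr, Ttr, Ytr)(Xte)
tau_from_test = fit_t_learner(Xte, Tte, Yte)(Xte)
ermse = np.sqrt(np.mean((tau_from_train - tau_from_test) ** 2))

# PEHE: RMSE against the true HTE, computable only because tau is simulated.
pehe = np.sqrt(np.mean((tau_from_train - tau_te) ** 2))
```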
4. Benchmark experiments
We tested the feasibility of HTE estimation in identifying personalized drug effectiveness. We focused on emulating a randomized clinical trial (NCT03991988) [47] that tests the treatment effect of Montelukast on AD. Our emulation is based on a target trial framework [11] that mimics the actual trial’s eligibility criteria, treatment strategy, and observation period using nationwide large-scale claim data. From the emulated trial, we aim to estimate the HTE of Montelukast (as well as other anti-asthma drugs) in reducing AD risk and ultimately evaluate the feasibility of HTE estimation as a new tool for identifying personalized drug effectiveness.
4.1. Background in AD and Montelukast
AD is a progressive neurologic disorder involving the death of brain cells. Chronic inflammation is known to be linked to AD [48,49]. Montelukast is an anti-inflammatory drug for chronic inflammatory conditions such as asthma. Some researchers speculate that Montelukast might reduce AD risk, not only by reducing chronic peripheral inflammation but also by targeting the leukotriene signaling pathway, which mediates various aspects of AD pathology [49]. An ongoing Phase II trial is investigating the treatment effect of Montelukast on AD [47]. However, the average cost of such trials is ~13 million dollars in the U.S. [1], and AD treatment trials have a 99 % failure rate [49]. Emulating RCTs via treatment effect estimation before the actual trials may offer a cost-effective and safe tool to reduce failure rates. However, there are several confounding variables in observational data. Even if Montelukast and AD turn out to have a significant association, the association may or may not be due to a direct preventive effect of Montelukast on AD; it may instead reflect the anti-inflammatory effect of Montelukast on chronic inflammation. In addition, AD develops clinical symptoms through various pathologies (such as neuroinflammation and protein misfolding), so the treatment effect of Montelukast on AD may vary by the pathology a patient has.
4.2. Experimental setting
We defined eligibility criteria, standard-of-care arm, follow-up period, and outcome based on NCT03991988 [47] (Table 2, Supplementary Material S.2.1).
Table 2.
Comparison of our experiment setting and the actual trial NCT03991988 [47].
Randomized clinical trial on Montelukast (NCT03991988) | Our target trial design | |
---|---|---|
Aim | Assess the effects of Montelukast initiation on amyloid and tau accumulation and cognitive symptoms of prodromal AD. | Assess the effect of one type of anti-asthma drug on AD onset prevention from routine clinical care setting in real-world data (2007–2020) |
Eligibility criteria |
|
|
Treatment strategies |
Treatment arm:
|
Treatment arm:
|
Assignment procedure | Random assignment | Reduce selection bias computationally (Propensity score, covariate shift in balanced learning) |
Follow-up period | 2019.09.25 – 2022.10 | Follow-up starts at first records meeting eligibility criteria Follow-up ends at ADRD onset or last observation, whichever occurs first |
Outcome |
|
AD and related dementia onset (PheWas dx codes and medication codes) |
Montelukast is a cysteinyl leukotriene receptor type 1 (CysLT1) antagonist used to treat asthma and allergy symptoms. Animal model studies show Montelukast’s efficacy in reducing amyloid-beta toxicity and neuroinflammation [50,51]. A retrospective study reports an association between Montelukast and reduced use of dementia medicine in older adults, compared with other anti-asthma drugs [52]. Two Phase 2 clinical trials (NCT03402503, NCT03991988) are ongoing to test the effectiveness of Montelukast on AD’s neuropsychological progression.
To emulate clinical trials, we used real-world patient drug claim data. Claim data capture routine clinical care, although there are ongoing debates about whether such real-world healthcare administrative data (including electronic health records and claim data) are suitable sources for inferring treatment effects [53]. We used the Optum Clinformatics® Data Mart subscribed to by UTHealth. It comprises administrative health claims from January 2007 to June 2020 for commercial and Medicare Advantage health plan members. These claims are submitted for payment by providers and pharmacies and are verified, adjudicated, adjusted, and de-identified before inclusion in the Clinformatics® Data Mart. It contains 6.5 billion claims with 7.6 billion diagnosis codes (in ICD-9/10), 2.7 billion medication codes, and 2.5 billion lab results.
We defined the study treatment as leukotriene receptor antagonists (Montelukast, Zileuton, Zafirlukast, and Pranlukast) and the standard-of-care treatment as the remaining active anti-asthma drug classes (beta-2 adrenergic receptor agonists, anticholinergic drugs, xanthines, and corticosteroids). See Supplementary Table S1 for specific drug names and their DrugBank IDs.
We selected cohorts of older adults meeting the eligibility criteria in NCT03991988 [47]. We included 2,740 patients taking the study treatment and 8,545 patients taking standard-of-care treatments, for a final cohort size of 11,285. We randomly split the subjects into training, validation, and test datasets at a 6:2:2 ratio (6,771 for training, 2,257 for validation, and 2,257 for test).
Potential confounding variables that affect both exposure to the study treatment and ADRD onset include age at the start of follow-up, sex, race, and comorbidities. For comorbidities, we converted ICD-9 and ICD-10 diagnosis codes into PheWas codes to increase the clinical relevance of the billing codes [54]. A PheWas code is a hierarchical grouping of ICD codes based on statistical co-occurrence, code frequency, and human review. A total of 242 PheWas diagnosis codes were included in the dataset. We used AD onset as the outcome, as an implicit measure of increased AD risk; AD onset was detected as the presence of either an ADRD diagnosis code or an ADRD medication.
Based on Eq. (1), the HTE is defined as the difference between the estimated AD onset when the patient is assigned the study treatment (leukotriene receptor antagonist) and when the patient is assigned the control (standard-of-care treatments). A negative HTE value means a reduced risk of AD onset due to the study treatment. To evaluate the performance of our HTE estimation models, we measured robustness (ERMSE) and estimated goodness-of-fit (IF-PEHE). The detailed plug-in function setting for IF-PEHE is in Supplementary Material S.2.3.
Some HTE estimation methods require a propensity score, the likelihood that a patient takes the study treatment, to weight their estimates and reduce selection bias. We chose a random forest as the propensity score estimation model (Supplementary Material S.2.4). To estimate the propensity score, we used all available features (the 242 diagnosis codes and three demographic variables). We checked whether the treatment assignment was random or affected by the features by calculating the prediction accuracy of the propensity score model (Supplementary Material S.2.4, Table S2).
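The randomization check described above can be sketched as follows: if a cross-validated propensity model discriminates treated from untreated well above chance (AUC ≈ 0.5), treatment assignment depends on the covariates (the synthetic setup here is illustrative, not our actual claims-data pipeline):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n = 3000
X = rng.normal(size=(n, 10))
T = rng.binomial(1, 1 / (1 + np.exp(-1.5 * X[:, 0])))   # assignment depends on X

# Cross-validated AUC of the propensity model: well above 0.5 signals
# confounded treatment selection that needs adjustment.
auc = cross_val_score(RandomForestClassifier(random_state=0), X, T,
                      cv=3, scoring="roc_auc").mean()

# Comparator with fully randomized assignment: AUC should sit near 0.5.
T_rand = rng.binomial(1, 0.5, size=n)
auc_rand = cross_val_score(RandomForestClassifier(random_state=0), X, T_rand,
                           cv=3, scoring="roc_auc").mean()
```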
Meta-learners require several prediction tasks: i) potential outcome classification models for the treatment and control groups, respectively, and ii) treatment effect regression (for the X-learner and R-learner). We compared logistic regression, random forest, and XGBoost as choices for each prediction task.
4.3. Results
The benchmark experiments show that most HTE estimation models with low variance (S-learners, DML, DragonNet) had better robustness (low ERMSE) and goodness-of-fit (low IF-PEHE). In particular, the S-learner with random forest as the base learner gave the best HTE estimation on both metrics (Table 3). We discuss possible reasons for the strong performance of this simple model in the Discussion section. The representation learning methods generally had low ERMSE but high IF-PEHE. Although they were not competitive with the low-variance models, DragonNet performed best among the representation learning methods on both metrics, because our sparse and noisy input data can benefit from the sufficiency of the propensity score for causal adjustment: since it uses only the information relevant to the treatment, it gives better estimates when many covariates influence the outcome but have no effect on the treatment.
Table 3.
Comparison of various HTE estimation models in a target trial for NCT03991988 using nationwide claim data.
| Category | Model | Learner configuration | Robustness (ERMSE) | Goodness of fit (IF-PEHE) |
|---|---|---|---|---|
| Meta learners | S-learner | Logistic Regression | 0.0225 | 22.12 |
| Meta learners | S-learner | Random Forest | 0.0029 | 9.82 |
| Meta learners | S-learner | XGBoost Classifier | 0.0380 | 19.01 |
| Meta learners | T-learner | Logistic Regression | 0.2933 | 155.63 |
| Meta learners | T-learner | Random Forest | 0.0737 | 21.66 |
| Meta learners | T-learner | XGBoost | 0.2926 | 57.77 |
| Meta learners | X-learner | Outcome learner: Logistic Regression; Effect learner: ElasticNet | 0.0705 | 35.78 |
| Meta learners | X-learner | Outcome learner: Random Forest Classifier; Effect learner: Random Forest Regressor | 0.1147 | 22.46 |
| Meta learners | X-learner | Outcome learner: XGBoost Classifier; Effect learner: XGBoost Regressor | 0.1980 | 79.33 |
| Meta learners | R-learner | Outcome learner: Logistic Regression; Effect learner: ElasticNet (equivalent to DML) | 0.0110 | 22.61 |
| Meta learners | R-learner | Outcome learner: XGBoost Classifier; Effect learner: XGBoost Regressor | 0.5873 | 385.34 |
| Balanced representation learning | CFRNet | Representation learner: deep neural network; Hypothesis learner: deep neural network | 0.0547 | 102.60 |
| Balanced representation learning | DRLearner | Representation learner: three neural networks, with two regression networks (one per treatment arm); Confounder learner: two logistic networks that model the logging policy and design weights for confounder impact | 0.0597 | 152.99 |
| Representation learning (others) | DragonNet | Outcome learner: neural network regressor that learns the outcome as well as propensity scores; Effect learner: neural network regressor | 0.0401 | 53.57 |
| Causal forest | Generalized Random Forest | Single tree estimator: double-sample causal tree | 0.0060 | 18.90 |
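To make the winning approach in Table 3 concrete, the S-learner logic can be sketched in a few lines of scikit-learn. The toy cohort below is synthetic and the hyperparameters are illustrative, not the study's actual pipeline; the sketch only shows the estimation pattern: fit one model on the covariates plus the treatment indicator, then take the difference of predicted outcome probabilities under treatment and control.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic toy cohort (NOT the study data): X = baseline covariates,
# t = treatment indicator, y = binary outcome (e.g., AD onset).
n, p = 2000, 10
X = rng.normal(size=(n, p))
t = rng.integers(0, 2, size=n)
# Outcome with a small treatment effect modified by the first covariate.
y = (rng.random(n) < 0.30 + 0.05 * t * (X[:, 0] > 0)).astype(int)

# S-learner step 1: fit ONE shared model on the covariates plus the
# treatment indicator appended as an extra feature.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(np.column_stack([X, t]), y)

# S-learner step 2: per-patient HTE = difference between the predicted
# outcome probabilities with the treatment indicator set to 1 vs 0.
p1 = model.predict_proba(np.column_stack([X, np.ones(n)]))[:, 1]
p0 = model.predict_proba(np.column_stack([X, np.zeros(n)]))[:, 1]
hte = p1 - p0
```

The same pattern is offered, with more options, by the S-learner classes in the CausalML and EconML packages cited above; the point here is only that a single shared model serves both arms.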
4.4. Important features in the best model
Using the best model (S-learner with Random Forest), we investigated which patient features contribute to high or low HTE. We calculated feature importance scores for the HTE estimates using Shapley values (Fig. 4) [55]. The Shapley value measures the average marginal contribution of a feature, obtained by including or excluding the feature over varying subsets of the other features. Age at baseline was the second most important factor: older patients benefit less from leukotriene receptor antagonists in reducing the risk of AD. On the other hand, leukotriene receptor antagonists have a beneficial treatment effect on patients with disorders of bone and cartilage (PheWas dx: 733) or gout and other crystal arthropathies (PheWas dx: 274). This implies that patients with chronic inflammatory bone and joint disorders are more likely to benefit from leukotriene receptor antagonists in reducing AD risk. Indeed, accumulating evidence suggests significant interplay between peripheral immune activity (e.g., chronic inflammatory bone and joint disorders), blood–brain barrier permeability, microglial activation/proliferation, and AD-related neuroinflammation [56]. In an in vivo study, montelukast (one type of leukotriene receptor antagonist) inhibited inflammation-induced osteoclastogenesis in a calvarial model [57]. This finding warrants in-depth biological investigation.
Fig. 4.
Feature importance values of the best-performing model (S-learner with Random Forest). A negative SHAP value means the feature pushes the HTE negative, implying a reduced AD onset risk due to the study treatment. Features are sorted in decreasing order of importance. Blue points = low feature values; red points = high feature values.
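The "average marginal contribution" idea behind the Shapley attribution can be illustrated with a small Monte Carlo sketch. The data, the surrogate model, and the background-mean masking scheme below are illustrative assumptions (the study itself used the SHAP package [55], which implements far more efficient tree-specific algorithms); the sketch only demonstrates the core definition: average, over random feature orderings, the change in prediction when a feature is revealed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Synthetic stand-in: covariates X and per-patient HTE estimates.
n, p = 500, 4
X = rng.normal(size=(n, p))
hte = 0.8 * X[:, 0] - 0.5 * X[:, 2] + 0.05 * rng.normal(size=n)

# Surrogate model mapping covariates to the estimated HTE.
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, hte)
background = X.mean(axis=0)  # reference values used to "hide" a feature

def shapley_mc(x, predict, background, n_orderings=100, seed=0):
    """Monte Carlo Shapley values for a single instance x.

    For each random feature ordering, the contribution of feature j is
    its marginal effect on the prediction: the prediction after revealing
    j minus the prediction before, with hidden features held at the
    background values.
    """
    rng = np.random.default_rng(seed)
    p = len(x)
    phi = np.zeros(p)
    for _ in range(n_orderings):
        order = rng.permutation(p)
        z = background.copy()
        prev = predict(z.reshape(1, -1))[0]
        for j in order:
            z[j] = x[j]  # reveal feature j
            cur = predict(z.reshape(1, -1))[0]
            phi[j] += cur - prev
            prev = cur
    return phi / n_orderings

phi = shapley_mc(X[0], model.predict, background)
# Efficiency property: the contributions sum to f(x) - f(background).
gap = model.predict(X[:1])[0] - model.predict(background.reshape(1, -1))[0]
```

The efficiency property checked at the end holds exactly here because each ordering's contributions telescope; in practice one would use `shap.TreeExplainer` rather than this brute-force estimator.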
5. Discussion on the HTE estimation methods
We have reviewed and compared recent HTE estimation methods to assess their feasibility for translational research in biomedicine and drug development. Our extensive benchmark experiments compared several state-of-the-art and popular HTE estimation methodologies, including meta-learners, causal trees, and counterfactual representation learning. We found that the simple S-learner with Random Forest estimated the HTE most stably on sparse and noisy drug claim data. Possible reasons why this simple model performs best on our sparse claim data are three-fold. i) A shared function across treatment and control groups: the two groups may share similar mechanisms linking features to the outcome regardless of whether the study treatment was received. Incorporating the treatment assignment as one of the features and building a single shared model for both groups helps the model learn these shared mechanisms, particularly when the features are sparse and the data are not rich. ii) A biologically zero treatment effect: if the study treatment truly has no biological effect, the S-learner is known to be more accurate in estimating a zero HTE [22]; indeed, there is no clear scientific evidence that leukotriene receptor antagonists prevent AD. iii) Avoiding overfitting: the S-learner has few model parameters (low variance in predicted values) and is therefore less prone to overfitting than meta-learners with more parameters (e.g., the X-learner requires three prediction tasks). Our empirical finding is corroborated by prior theoretical investigations of inductive bias in HTE estimation [58]. Based on this finding, we suggest that future HTE estimation models should consider the shared structure between the arms.
In addition, we found several limitations and challenges in data and problem formulation when applying these methods to real-world healthcare data and problems. The methods we compared primarily focus on achieving more robust and accurate HTE estimation through different model architectures, assuming an ideal form of data is given. However, real-world data are incomplete in observation and often lack an unbiased control group for comparison. Here we elaborate on the challenges we identified.
5.1. Unobserved confounding variables
We assumed that our data contain a sufficient set of confounding variables that determine the outcome (e.g., AD onset) and the treatment (e.g., initiation of the study treatment). Caution is needed because this assumption can be violated with real-world drug administrative observational data. Some important confounding variables, such as socioeconomic status, are not included in healthcare administrative data because these data are collected for billing purposes, not for research.
5.2. Variables in healthcare administrative data
HTE estimation requires a large set of variables to capture the various patient conditions that might modify treatment effects. Healthcare administrative data (e.g., EHRs, claim data) are a useful source because they contain diverse comorbid conditions and co-medication patterns. Two data issues arise: sparsity and multimodality. Healthcare administrative data are sparse, noisy, and missing not at random. Although we mitigated the sparsity by mapping all diagnosis codes to PheWas codes [59], which yields compact, dense variables and increases the clinical relevance of the billing-purpose codes, the data were still not rich enough to train data-hungry deep learning models (e.g., CFRNet, DRLearner, DragonNet). Previous work on deep learning with healthcare administrative data [60] shows that the medical event prediction accuracy of deep learning models is only marginally better than that of traditional feature-based baselines. This data sparsity might be a barrier to applying representation learning methods to healthcare administrative data and to general machine learning tasks. To mitigate the sparsity, one can consider feature selection. Extra care is needed when applying feature selection to some HTE models because, unlike general supervised learning, they consist of multiple supervised prediction sub-tasks. Several feature selection methods for HTE estimation have been proposed [61]. We tested these on the best-performing model, the S-learner with Random Forest, to see whether feature selection improved the HTE performance measures; it turned out not to be helpful (Supplementary S.2.5).
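As a concrete illustration of filter-style feature selection for uplift/HTE models, one crude heuristic scores each feature by how differently it correlates with the outcome in the treated versus control arm. This toy criterion is only in the spirit of, and not identical to, the methods of [61], and the data below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data: feature 0 modifies the treatment effect,
# feature 1 only shifts the outcome, the rest are noise.
n, p = 2000, 8
X = rng.normal(size=(n, p))
t = rng.integers(0, 2, size=n)
y = 0.5 * X[:, 1] + t * (0.8 * X[:, 0]) + rng.normal(scale=0.1, size=n)

def uplift_filter_scores(X, t, y):
    """Score each feature by the gap between its outcome correlation
    in the treated arm and in the control arm (a filter-style heuristic:
    effect modifiers should relate to the outcome differently by arm)."""
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        r1 = np.corrcoef(X[t == 1, j], y[t == 1])[0, 1]
        r0 = np.corrcoef(X[t == 0, j], y[t == 0])[0, 1]
        scores[j] = abs(r1 - r0)
    return scores

scores = uplift_filter_scores(X, t, y)
ranking = np.argsort(scores)[::-1]  # feature indices, strongest first
```

Note the weakness of such naive filters: a purely prognostic feature can also receive a nonzero score when the arms differ in outcome variance, which is one reason dedicated uplift feature selection methods [61] are preferable in practice.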
Another challenge is data multimodality. Medication orders and diagnosis codes capture a patient's comorbid conditions from different perspectives. One goal of HTE estimation is to reduce the selection bias induced by non-randomized treatment assignment by incorporating comprehensive multimodal variables. It is a study design choice which modality to include, or how to incorporate both modalities when they follow different distributions. Healthcare experts' knowledge is also critical for selecting the variables most predictive of propensity to treatment.
5.3. Define standard-of-care treatments
Estimating treatment effects from observational data requires defining a virtual placebo (a standard-of-care treatment) against which the study treatment is compared. The target trial framework offers two choices of standard-of-care treatment: active drugs vs. no active drugs [9,11,12]. Defining a "good" standard-of-care treatment is critical: it should be distinct enough from the study treatment to allow comparison, not co-administered with the study treatment (to avoid patients taking both at the same time), and share similar disease indications with the study treatment to avoid confounding. A tradeoff arises in this choice. Active drugs as the standard of care (e.g., other active anti-asthma drugs as the control in our benchmark experiment) help avoid confounding by indication (e.g., all patients in the cohort have asthma), but yield treatment effects of the study treatment that are only marginal relative to the active drugs. No active drugs as the standard of care (e.g., no active anti-asthma drugs as the control) allow estimation of treatment effects through the stronger contrast between study treatment and control, but confounding by indication remains [62,63].
6. Conclusion
Recently, many machine learning models for estimating HTE have been proposed, but there has been limited effort to apply them to real-world healthcare problems. Our methodology review and benchmark study provided an overview of current methods and benchmarked them on the task of emulating a clinical trial: we estimated the HTE of leukotriene receptor antagonists in reducing the risk of AD using nationwide healthcare claim data.
Our study has several limitations. First, our methodology comparison focuses only on non-parametric approaches using machine learning. Parametric HTE estimation methods not discussed in this review include generalized linear models with the stratification-multilevel method or the matching-smoothing method [64]. Also, in our benchmark experiment, we could not set a time zero separating pre-treatment and post-treatment comorbidities because the time of first treatment assignment is ambiguous in our short-term data; this might reduce the estimated treatment effect size. For future investigation, we plan to incorporate interactions between the treatment and post-treatment covariates to capture the underlying causal structure among the variables, which can lead to a more accurate and robust estimator of treatment effects [65].
Nonetheless, we delivered insights that may benefit future research: i) simple models work better when the true treatment effect is close to zero, as is the case in most clinical trials; ii) observational healthcare data, particularly administrative data (e.g., claims, EHRs), are incomplete, so researchers must be cautious about unobserved confounding variables; and iii) using active drugs that share the same indication as the treatment of interest as a virtual placebo may help reduce confounding by indication. As future research directions, we envision that i) it is promising to develop HTE estimation methods that leverage the similar mechanisms of treatment and placebo (e.g., active drugs sharing the same indication), either through a single learner or parameter sharing; ii) it is critical to infuse human knowledge (e.g., PheWas [54]) to complement the incomplete and sparse records in healthcare administrative data; and iii) as healthcare data always contain timestamps, it is worthwhile to incorporate the temporal relationships of the variables when estimating HTE. Our benchmark experiment code is publicly available to facilitate transparent comparison.1 As a future benchmark study, we will investigate the utility of HTE methods on clinical registry and trial data.
Funding
This work was supported by the Cancer Prevention and Research Institute of Texas [RR180012 to X.J.]; the National Institute on Aging [R01AG066749]; the University of Texas System STARs Program; and the University of Texas Health Center startup funds.
Footnotes
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Appendix A. Supplementary material
Supplementary data to this article can be found online at https://doi.org/10.1016/j.jbi.2022.104256.
Data availability
Data will be made available on request.
References
- [1] What is the cost of a clinical trial?, (2021). http://www.sofpromed.com/what-is-the-cost-of-a-clinical-trial/ (accessed October 1, 2021).
- [2] VanderWeele TJ, Shpitser I, On the definition of a confounder, Ann. Stat. 41 (2013) 196.
- [3] Pearl J, Causality: Models, Reasoning, and Inference, Cambridge University Press, 2000.
- [4] Pearl J, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Elsevier, 2014.
- [5] McGough JJ, Faraone SV, Estimating the Size of Treatment Effects: Moving Beyond P Values, Psychiatry 6 (n.d.) 21.
- [6] Ohlsson H, Kendler KS, Applying Causal Inference Methods in Psychiatric Epidemiology: A Review, JAMA Psychiat. 77 (2020) 637–644.
- [7] Parascandola M, Weed DL, Causation in epidemiology, J. Epidemiol. Community Health 55 (2001), 10.1136/jech.55.12.905.
- [8] Zenil H, Kiani NA, Zea AA, Tegnér J, Causal deconvolution by algorithmic generative models, Nat. Mach. Intell. 1 (1) (2019) 58–66.
- [9] Prosperi M, Guo Y, Sperrin M, Koopman JS, Min JS, He X, Rich S, Wang M, Buchan IE, Bian J, Causal inference and counterfactual prediction in machine learning for actionable healthcare, Nat. Mach. Intell. 2 (7) (2020) 369–375.
- [10] Glymour MM, Spiegelman D, Evaluating Public Health Interventions: 5. Causal Inference in Public Health Research-Do Sex, Race, and Biological Factors Cause Health Outcomes?, Am. J. Public Health 107 (2017) 81–85.
- [11] Hernán MA, Robins JM, Using Big Data to Emulate a Target Trial When a Randomized Trial Is Not Available, Am. J. Epidemiol. 183 (2016), 10.1093/aje/kwv254.
- [12] Chen Z, Zhang H, Guo Y, George TJ, Prosperi M, Hogan WR, He Z, Shenkman EA, Wang F, Bian J, Exploring the feasibility of using real-world data from a large clinical data research network to simulate clinical trials of Alzheimer's disease, npj Digital Med. 4 (2021), 10.1038/s41746-021-00452-1.
- [13] Dehejia RH, Wahba S, Propensity score-matching methods for nonexperimental causal studies, Rev. Econ. Stat. 84 (1) (2002) 151–161.
- [14] Wang Y, Shah RD, Debiased Inverse Propensity Score Weighting for Estimation of Average Treatment Effects with High-Dimensional Confounders, arXiv [stat.ME] (2020). http://arxiv.org/abs/2011.08661.
- [15] Austin PC, Stuart EA, Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies, Stat. Med. 34 (28) (2015) 3661–3679.
- [16] Lam B, Masellis M, Freedman M, Stuss DT, Black SE, Clinical, imaging, and pathological heterogeneity of the Alzheimer's disease syndrome, Alzheimers Res. Ther. 5 (2013) 1.
- [17] Chernozhukov V, Chetverikov D, Demirer M, Duflo E, Hansen C, Newey W, Robins J, Double/Debiased Machine Learning for Treatment and Causal Parameters, (2016). http://arxiv.org/abs/1608.00060 (accessed October 1, 2021).
- [18] Shalit U, Johansson FD, Sontag D, Estimating individual treatment effect: generalization bounds and algorithms, (2016). http://arxiv.org/abs/1606.03976 (accessed July 21, 2021).
- [19] Bica I, Alaa AM, Lambert C, van der Schaar M, From Real-World Patient Data to Individualized Treatment Effects Using Machine Learning: Current and Future Methods to Address Underlying Challenges, Clin. Pharmacol. Ther. 109 (2021) 87–100.
- [20] Microsoft, microsoft/EconML, (n.d.). https://github.com/microsoft/EconML (accessed March 25, 2021).
- [21] Chen H, Harinen T, Lee J-Y, Yung M, Zhao Z, CausalML: Python Package for Causal Machine Learning, arXiv [cs.CY] (2020). http://arxiv.org/abs/2002.11631.
- [22] Künzel SR, Sekhon JS, Bickel PJ, Yu B, Metalearners for estimating heterogeneous treatment effects using machine learning, Proc. Natl. Acad. Sci. U.S.A. 116 (10) (2019) 4156–4165.
- [23] Curth A, Svensson D, Weatherall J, van der Schaar M, Really Doing Great at Estimating CATE? A Critical Look at ML Benchmarking Practices in Treatment Effect Estimation, (2021). https://openreview.net/pdf?id=FQLzQqGEAH (accessed December 1, 2021).
- [24] Curth A, van der Schaar M, Nonparametric Estimation of Heterogeneous Treatment Effects: From Theory to Learning Algorithms, in: Banerjee A, Fukumizu K (Eds.), Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, PMLR, 2021, pp. 1810–1818.
- [25] Curth A, van der Schaar M, On Inductive Biases for Heterogeneous Treatment Effect Estimation, arXiv [stat.ML] (2021). http://arxiv.org/abs/2106.03765.
- [26] Jacob D, CATE meets ML – The Conditional Average Treatment Effect and Machine Learning, (2021). http://arxiv.org/abs/2104.09935 (accessed July 22, 2021).
- [27] Rubin DB, Causal Inference Using Potential Outcomes, J. Am. Stat. Assoc. 100 (2005) 322–331, 10.1198/016214504000001880.
- [28] He H, Wu P, Chen D-G (Din), Statistical Causal Inferences and Their Applications in Public Health Research, Springer, 2016.
- [29] Pearl J, Mackenzie D, The Book of Why: The New Science of Cause and Effect, Basic Books, 2018.
- [30] Hernán MA, Robins JM, Estimating causal effects from epidemiological data, J. Epidemiol. Community Health 60 (2006) 578–586.
- [31] Funk MJ, Westreich D, Wiesen C, Stürmer T, Brookhart MA, Davidian M, Doubly robust estimation of causal effects, Am. J. Epidemiol. 173 (2011) 761–767.
- [32] Naimi AI, Cole SR, Kennedy EH, An introduction to g methods, Int. J. Epidemiol. 46 (2017) 756.
- [33] Chetverikov D, Demirer M, Duflo E, Hansen C, Newey WK, Chernozhukov V, Double machine learning for treatment and causal parameters, (2016). 10.1920/wp.cem.2016.4916.
- [34] Johansson FD, Shalit U, Sontag D, Learning Representations for Counterfactual Inference, (2016). http://arxiv.org/abs/1605.03661 (accessed July 21, 2021).
- [35] Hassanpour N, Greiner R, Learning Disentangled Representations for CounterFactual Regression, in: Eighth International Conference on Learning Representations, 2020. http://www.openreview.net/pdf?id=HkxBJT4YvB (accessed July 21, 2021).
- [36] Bica I, Jordon J, van der Schaar M, Estimating the Effects of Continuous-valued Interventions using Generative Adversarial Networks, (2020). http://arxiv.org/abs/2002.12326 (accessed July 21, 2021).
- [37] Shi C, Blei DM, Veitch V, Adapting Neural Networks for the Estimation of Treatment Effects, arXiv [stat.ML] (2019). http://arxiv.org/abs/1906.02120.
- [38] Johansson F, Shalit U, Sontag D, Learning Representations for Counterfactual Inference, in: International Conference on Machine Learning, PMLR, 2016, pp. 3020–3029.
- [39] Athey S, Imbens G, Recursive partitioning for heterogeneous causal effects, Proc. Natl. Acad. Sci. U.S.A. 113 (27) (2016) 7353–7360.
- [40] Wager S, Athey S, Estimation and Inference of Heterogeneous Treatment Effects using Random Forests, J. Am. Stat. Assoc. 113 (523) (2018) 1228–1242.
- [41] Welcome to Causal ML's documentation — causalml documentation, (n.d.). https://causalml.readthedocs.io/en/latest/ (accessed August 4, 2021).
- [42] Welcome to econml's documentation! — econml 0.12.0b5 documentation, (n.d.). https://econml.azurewebsites.net (accessed August 4, 2021).
- [43] Athey S, Wager S, Estimating Treatment Effects with Causal Forests: An Application, Observational Studies 5 (2) (2019) 37–51.
- [44] Nie X, Wager S, Quasi-oracle estimation of heterogeneous treatment effects, Biometrika 108 (2020) 299–319.
- [45] Hill JL, Bayesian Nonparametric Modeling for Causal Inference, J. Comput. Graph. Stat. 20 (2011) 217–240, 10.1198/jcgs.2010.08162.
- [46] Alaa A, van der Schaar M, Validating Causal Inference Models via Influence Functions, in: Chaudhuri K, Salakhutdinov R (Eds.), Proceedings of the 36th International Conference on Machine Learning, PMLR, 2019, pp. 191–201.
- [47] Montelukast Therapy on Alzheimer's Disease, (n.d.). https://clinicaltrials.gov/ct2/show/NCT03991988 (accessed July 8, 2021).
- [48] Bozek A, Jarzab J, Improved activity and mental function related to proper antiasthmatic treatment in elderly patients with Alzheimer's disease, Allergy Asthma Proc. 32 (2011) 341–345, 10.2500/aap.2011.32.3459.
- [49] Michael J, Marschallinger J, Aigner L, The leukotriene signaling pathway: a druggable target in Alzheimer's disease, Drug Discov. Today 24 (2019) 505–516, 10.1016/j.drudis.2018.09.008.
- [50] Lai J, Mei ZL, Wang H, Hu M, Long Y, Miao MX, Li N, Hong H, Montelukast rescues primary neurons against Aβ1–42-induced toxicity through inhibiting CysLT1R-mediated NF-κB signaling, Neurochem. Int. 75 (2014) 26–31, 10.1016/j.neuint.2014.05.006.
- [51] Mansour RM, Ahmed MAE, El-Sahar AE, El Sayed NS, Montelukast attenuates rotenone-induced microglial activation/p38 MAPK expression in rats: Possible role of its antioxidant, anti-inflammatory and antiapoptotic effects, Toxicol. Appl. Pharmacol. 358 (2018) 76–85.
- [52] Grinde B, Engdahl B, Prescription database analyses indicates that the asthma medicine montelukast might protect against dementia: a hypothesis to be verified, Immun. Ageing 14 (2017) 20.
- [53] Beaulieu-Jones BK, Finlayson SG, Yuan W, Altman RB, Kohane IS, Prasad V, Yu K, Examining the Use of Real-World Evidence in the Regulatory Process, Clin. Pharmacol. Ther. 107 (2020) 843–852, 10.1002/cpt.1658.
- [54] Denny JC, Bastarache L, Ritchie MD, Carroll RJ, Zink R, Mosley JD, Field JR, Pulley JM, Ramirez AH, Bowton E, Basford MA, Carrell DS, Peissig PL, Kho AN, Pacheco JA, Rasmussen LV, Crosslin DR, Crane PK, Pathak J, Bielinski SJ, Pendergrass SA, Xu H, Hindorff LA, Li R, Manolio TA, Chute CG, Chisholm RL, Larson EB, Jarvik GP, Brilliant MH, McCarty CA, Kullo IJ, Haines JL, Crawford DC, Masys DR, Roden DM, Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data, Nat. Biotechnol. 31 (12) (2013) 1102–1111.
- [55] Lundberg SM, Lee S-I, A unified approach to interpreting model predictions, in: Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017, pp. 4768–4777.
- [56] Culibrk RA, Hahn MS, The Role of Chronic Inflammatory Bone and Joint Disorders in the Pathogenesis and Progression of Alzheimer's Disease, Front. Aging Neurosci. 12 (2020) 583884.
- [57] Kang J-H, Lim H, Lee D-S, Yim M, Montelukast inhibits RANKL-induced osteoclast formation and bone loss via CysLTR1 and P2Y12, Mol. Med. Rep. 18 (2018) 2387–2398.
- [58] Curth A, van der Schaar M, On inductive biases for heterogeneous treatment effect estimation, Adv. Neural Inf. Process. Syst. (2021). https://proceedings.neurips.cc/paper/2021/hash/8526e0962a844e4a2f158d831d5fddf7-Abstract.html.
- [59] Bastarache L, Using Phecodes for Research with the Electronic Health Record: From PheWAS to PheRS, Annu. Rev. Biomed. Data Sci. 4 (1) (2021) 1–19.
- [60] Rajkomar A, Oren E, Chen K, Dai AM, Hajaj N, Hardt M, Liu PJ, Liu X, Marcus J, Sun M, Sundberg P, Yee H, Zhang K, Zhang Y, Flores G, Duggan GE, Irvine J, Le Q, Litsch K, Mossin A, Tansuwan J, Wang D, Wexler J, Wilson J, Ludwig D, Volchenboum SL, Chou K, Pearson M, Madabushi S, Shah NH, Butte AJ, Howell MD, Cui C, Corrado GS, Dean J, Scalable and accurate deep learning with electronic health records, npj Digital Med. 1 (2018) 1–10.
- [61] Zhao Z, Zhang Y, Harinen T, Yung M, Feature Selection Methods for Uplift Modeling, arXiv [cs.LG] (2020). http://arxiv.org/abs/2005.03447.
- [62] Salas M, Hofman A, Stricker BH, Confounding by indication: an example of variation in the use of epidemiologic terminology, Am. J. Epidemiol. 149 (1999), 10.1093/oxfordjournals.aje.a009758.
- [63] Kyriacou DN, Lewis RJ, Confounding by Indication in Clinical Research, JAMA 316 (2016) 1818–1819.
- [64] Xie Y, Brand JE, Jann B, Estimating Heterogeneous Treatment Effects with Observational Data, Sociol. Methodol. 42 (1) (2012) 314–347.
- [65] Tran C, Zheleva E, Improving Data-driven Heterogeneous Treatment Effect Estimation Under Structure Uncertainty, in: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Association for Computing Machinery, New York, NY, USA, 2022, pp. 1787–1797.