G-Computation and Hierarchical Models for Estimating Multiple Causal Effects From Observational Disease Registries With Irregular Visits

Zach Shahn; Ying Li; Zhaonan Sun; Amrita Mohan; Cristina Sampaio; Jianying Hu

2019 May 6;2019:789–798.


PMCID: PMC6568089 PMID: 31259036

Abstract

Huntington’s Disease (HD) is a neurodegenerative disorder with serious motor, cognitive, and behavioral symptoms. Chorea, a motor symptom of HD characterized by abrupt involuntary movements, is typically treated with tetrabenazine or certain off-label antipsychotics. Clinical trial evidence about the impact of these drugs in the HD population is scant. However, multiple observational HD registries have recently been used with success to model HD progression^1,2 and provide an opportunity to obtain effect estimates in the absence of clinical trials. We use a dataset integrated from four large-scale HD registries to generate evidence on the efficacy of chorea treatments for chorea as well as their impact on other aspects of HD progression. Clinical conclusions are meant only to illustrate our methodological approach. We employ parametric G-computation for causal inference to adjust for confounding and accommodate irregular visits and treatment patterns. We fit Bayesian hierarchical models to the results of multiple related analyses to share strength across studies and handle multiple comparisons concerns.

Introduction

Huntington’s Disease (HD) is a neurodegenerative disorder caused by an abnormal expanded trinucleotide (CAG) repeat in the Huntingtin gene³. HD is characterized by progressive decay of motor and cognitive abilities, accompanied by psychiatric episodes. There are no treatments known to slow disease progression, but certain symptoms can be treated. Chorea is a typical motor symptom of HD. It is characterized by brief, abrupt, irregular, unpredictable, non-stereotyped movements. Chorea is typically treated with tetrabenazine, the only drug approved specifically for HD in the US, or certain off-label antipsychotics. Little is currently known about the comparative effectiveness of common chorea treatments at treating chorea, or about other possible effects these treatments may have in the HD population.

Randomized trials produce gold standard estimates of causal effects. However, few trials have evaluated medications prescribed for chorea in the HD population. Past trials of antipsychotic chorea medications have been very small and yielded mixed results⁴. No sizable randomized trial of chorea medications has examined long-term effects or composite measures of disease state.

In the absence of randomized trials, we might turn to promising sources of observational data to generate evidence about the impact of chorea treatments. Several international HD registries collect rich longitudinal data on HD patients, including medication use and results of repeated clinical exams. Unfortunately, in observational data it is common for treated patients to differ from untreated patients on variables that are associated with the outcome. When this is the case, ‘confounding’ is said to be present, and a naive comparison of the treated to the untreated will yield a biased estimate of the effect of treatment. However, when confounding variables are recorded, it is possible to adjust for them such that estimates of causal effects are similar to what would be obtained from a randomized trial.

We employ parametric G-computation^5,6, a method appropriate to adjust for confounding in longitudinal data when estimating the effects of time-varying treatments. We implement the approach in a way that accommodates outcomes recorded at irregular intervals and treatments given for varying durations between visits. Even with the rich set of variables recorded in the registries, residual unobserved confounding likely remains because (a) treatments are often administered many months after clinical variables were last measured, and (b) clinical measurements are often noisy. We mitigate this residual confounding by defining our cohort of interest in a very restrictive way.

We estimated effects of multiple related drugs on multiple related outcomes. Both to avoid concerns about multiple comparisons and to share strength across related analyses, we fit hierarchical Bayesian models to our effect estimates as an important post-processing step.

A final notable feature of our analysis is our utilization of recent work generating representations of HD progression⁷ to define both outcomes and variables that can be used to adjust for confounding. In both contexts, we exploit the fact that the extracted features produce low dimensional clinically meaningful summaries of patient state. We hope this analysis serves as an illustration of how to extract evidence about a set of related causal questions from irregularly sampled, observational, longitudinal data.

Data

Our dataset was integrated from four large prospective observational studies of HD patients: Enroll-HD⁸, REGISTRY⁹, TRACK-HD/TRACK-ON^10,11, and PREDICT-HD¹². Enroll-HD is a worldwide observational study of Huntington’s Disease families. In this work, we used the ENROLL-IDS-2017-09-R2 version of the Enroll-HD periodic data, which contains data from 14,241 subjects who made their baseline visits prior to September 2017, of whom 10,186 are HD gene expansion carriers (HDGECs), with CAG length ≥ 36. REGISTRY is a multi-center, multi-national observational study managed by the European Huntington’s Disease Network. This study used the REGISTRY-IDS-2017-09-R2 version of REGISTRY data, which consists of 13,348 participants, among whom 7,939 are HDGECs. TRACK-HD is a multinational study of HD that examines clinical and biological findings of disease progression in individuals with pre-manifest HD and early-stage HD. At the baseline visit, 402 participants were enrolled, of whom 274 were HDGECs. TRACK-ON is a follow-up study of TRACK-HD with 133 HDGECs. PREDICT-HD is another longitudinal observational study of subjects who have not yet met criteria for a diagnosis of HD. The PREDICT-HD data used in this study consist of 1,481 participants, of whom 1,145 were HDGECs.

Participants received unified IDs across the four registries. For patients who participated in multiple studies, time gaps between the study baselines were available. Hence, we were able to integrate the datasets. The integrated dataset comprised 25,546 participants, of whom 18,941 were HDGECs. The mean number of visits per participant was 3.12.

Participants entered at various stages of disease progression and were tracked over time for varying numbers of visits. Visits were meant to be approximately annual, but time between visits was somewhat irregular in practice. Each registry collected results from a battery of clinical assessments such as the Unified Huntington’s Disease Rating Scale (UHDRS)¹³ at each visit. Changes in the values of the clinical assessments can reflect HD progression. Therefore, they are used (directly or indirectly) as the outcome measures in this study. Registries also collected timestamped data on concomitant medications and comorbidities, as well as their durations, between visits. A diagram of a hypothetical patient’s data can be found in Figure 1.

Figure 1: Diagram of hypothetical data from a patient who appeared for two visits after baseline.

The concomitant medications in all four studies are encoded using the WHO Drug Dictionary (WHO-DD). The WHO-DD is administered and licensed by the World Health Organization’s (WHO) Uppsala Monitoring Center (UMC)¹⁴. The first six digits of a drug code identify the active ingredient, regardless of salt form or plant part and extract. Therefore, we used these six digits to link medications with the same active ingredient across the four studies.
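The linking rule can be sketched in a few lines of Python. This is a minimal illustration and not the code used in the study: the record layout, the function name `link_by_active_ingredient`, and the drug codes themselves are made up for the example. Only the six-digit prefix rule comes from the text.

```python
from collections import defaultdict

def link_by_active_ingredient(records):
    """Group (study, drug_code) records by the first six digits of the code."""
    groups = defaultdict(list)
    for study, code in records:
        digits = "".join(ch for ch in code if ch.isdigit())  # drop separators
        groups[digits[:6]].append((study, code))
    return dict(groups)

# Hypothetical codes: the first two share a six-digit prefix,
# so they are treated as the same active ingredient.
records = [
    ("Enroll-HD", "012345-01-001"),
    ("REGISTRY", "012345-02-003"),
    ("TRACK-HD", "067890-01-001"),
]
linked = link_by_active_ingredient(records)
```

Keying on the prefix alone means that different salt forms or preparations of one ingredient fall into the same group, which is the behavior described above.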

The indications and comorbidities are coded using different terminologies in different studies. For example, indications are coded with MedDRA in the Enroll-HD study but with ICD-10 in the TRACK-HD study. We used the Unified Medical Language System (UMLS) to map MedDRA codes and ICD-10 codes to common concepts¹⁵. For example, diabetes mellitus, coded as 10012601 in MedDRA and as E14 in ICD-10, maps to the single UMLS concept C0011849.
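As a rough sketch of what this harmonization amounts to, the lookup below maps a (terminology, code) pair to a shared concept identifier. The table holds only the diabetes mellitus example given in the text; the names `CUI_BY_CODE` and `to_concept` are hypothetical, and a real mapping would be loaded from UMLS rather than typed by hand.

```python
# Lookup from (terminology, code) to a UMLS concept identifier.
# Only the diabetes mellitus entries are taken from the text.
CUI_BY_CODE = {
    ("MedDRA", "10012601"): "C0011849",
    ("ICD10", "E14"): "C0011849",
}

def to_concept(terminology, code):
    """Return the shared concept ID, or None when no mapping is known."""
    return CUI_BY_CODE.get((terminology, code))

# Two differently coded records resolve to the same concept.
same_concept = to_concept("MedDRA", "10012601") == to_concept("ICD10", "E14")
```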

Outcome Variables of Interest

We used the mean value of the seven chorea components in the UHDRS¹³ system as a measure of chorea severity at each visit. The severity of bradykinesia (i.e. slowness of movement), a known side effect of many drugs considered, was represented by the Bradykinesia component in the UHDRS system at each study visit.

We also evaluated the effects of chorea medications on more general HD progression. Previous work⁷ used a robust Bayesian latent variable analysis to extract latent factors that can represent heterogeneous HD progression paths based on the periodic clinical assessments. Specifically, the method was applied to motor, functional, and cognitive assessments separately, and the resulting leading factors, referred to as ‘motor1’, ‘function1’, and ‘cognition1’ in the rest of this article, can serve as indicators of overall motor, functional, and cognitive progression.

Finally, [16] proposed a composite measure of motor, cognitive, and functional decline that could characterize HD clinical progression¹⁶. The resulting composite score, referred to as the CUHDRS score, is used as an index to indicate the general severity of HD.

Methods

Defining Causal Estimands of Interest

The field of causal inference is concerned with estimating comparisons of population distributions of outcomes under various counterfactual population interventions. Before undertaking a causal analysis, it is important to precisely define the population and interventions of interest. In our application, we take the population of interest to be HD patients with serious chorea who have not yet initiated a chorea treatment. We define serious chorea as an average score of ≥ 2 across the seven UHDRS chorea components. For each treatment/outcome pair, we seek to estimate the difference in the mean outcome in our population (measured after one and two years) had, contrary to fact, (a) everyone in the population continuously taken the treatment of interest for two years and not taken any other chorea treatment, versus (b) nobody in the population taken any chorea treatment at all for two years. Under assumptions we lay out below, this counterfactual quantity can be consistently estimated from observed data.
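The population definition translates into a simple cohort entry rule: a patient enters at the first visit with mean chorea score ≥ 2, provided no chorea medication has been started by then. The sketch below is illustrative only. The field names `chorea_items` and `prior_chorea_med` and the function `find_entry_visit` are assumptions made for the example, not names from the registries.

```python
from statistics import mean

def find_entry_visit(visits, threshold=2):
    """Index of the first visit with serious chorea and no prior chorea drug.

    Each visit is a dict with the seven UHDRS chorea item scores and a flag
    saying whether any chorea medication was started before that visit.
    Returns None if the patient never qualifies.
    """
    for i, visit in enumerate(visits):
        if visit["prior_chorea_med"]:
            return None  # already treated, so can no longer enter untreated
        if mean(visit["chorea_items"]) >= threshold:
            return i
    return None

# Hypothetical patient: mild chorea at the first visit, serious at the second.
visits = [
    {"chorea_items": [1, 1, 2, 1, 1, 1, 1], "prior_chorea_med": False},
    {"chorea_items": [2, 2, 3, 2, 2, 2, 2], "prior_chorea_med": False},
    {"chorea_items": [3, 2, 3, 2, 2, 3, 2], "prior_chorea_med": True},
]
entry = find_entry_visit(visits)
```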

To formalize, define baseline to be the first visit at which a patient has mean chorea score ≥ 2 and has not yet taken any chorea medication. Let:

t ∈ {1,..., K} denote visit number after baseline, with t = 1 indicating the visit immediately following the baseline visit and K denoting the latest observed visit;
A_t denote treatment of interest in the interim between visits t - 1 and t;
B_t denote use of any chorea treatments other than the treatment of interest A between visits t - 1 and t;
Y_t denote the outcome at visit t, e.g. mean chorea score at visit t;
L_t denote covariates preceding visit t that may influence treatment decisions about A_t and be associated with outcomes at visit t or later;
C_t indicate whether the patient has data for visit t, with C_t = 1 meaning that data is observed;
R_t denote the number of days since the previous visit; and
${\bar{X}}_{t}$ denote X₀,..., X_t and ${\underline{X}}_{t}$ denote X_t,..., X_K for arbitrary time varying variable X.

L₁ contains all baseline information about the patient, such as demographic data and CAG length. L_t may include the outcome variable observed at the previous visit and other clinical variables from the previous visit, as well as information about other drugs and comorbidities between visits t - 1 and t. The treatment variable A_t may be multidimensional in order to capture relevant information about treatment between visits t - 1 and t. We divide A_t into two components: A¹_t, indicating whether treatment was ongoing less than 30 days before visit t, and A²_t, denoting the proportion of the duration between t - 1 and t that was treated. We let B_t be binary and simply indicate whether any non-A chorea treatment was taken at all between t - 1 and t. We use capital letters to denote random variables and corresponding lower case letters to denote specific values that random variables might take.
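To make the two treatment components concrete, the sketch below derives them from timestamped medication intervals for one between-visit period. It is an illustration under stated assumptions: intervals are taken as inclusive (start, end) date pairs, overlapping intervals are counted once, and "ongoing less than 30 days before the visit" is read as the interval ending strictly after the date 30 days before the visit. The function name `treatment_summary` is hypothetical.

```python
from datetime import date, timedelta

def treatment_summary(prev_visit, visit, med_intervals, window_days=30):
    """Return (a1, a2) for the period from prev_visit up to visit.

    a1: 1 if the drug was being taken less than `window_days` before visit.
    a2: proportion of days in [prev_visit, visit) covered by any interval.
    """
    total_days = (visit - prev_visit).days
    treated_days = 0
    for offset in range(total_days):
        day = prev_visit + timedelta(days=offset)
        # any() counts a day once even if intervals overlap
        if any(start <= day <= end for start, end in med_intervals):
            treated_days += 1
    window_start = visit - timedelta(days=window_days)
    a1 = int(any(start < visit and end > window_start
                 for start, end in med_intervals))
    return a1, treated_days / total_days

# Hypothetical patient who starts the drug mid-interval and stays on it.
a1, a2 = treatment_summary(
    date(2015, 1, 1), date(2016, 1, 1),
    [(date(2015, 7, 3), date(2016, 3, 1))],
)
```

Under the intervention of interest both components equal 1, while under no treatment both equal 0; observed patients typically fall in between, which is why the proportion is modeled rather than a single on/off flag.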

We adopt the counterfactual framework for time-varying treatments introduced by [5]. This framework posits that for each patient and visit t, corresponding to each possible treatment history ā_t through visit t are counterfactual outcomes that would have been observed in that individual had they actually received treatment history ā_t. We denote by Y_t(ā_t) the counterfactual outcome at visit t under treatment ā_t. Implicit in the notation Y_t(ā_t) is the assumption that the treatment strategy followed by one patient does not influence the outcome of any other patient.

Under this framework, the difference between the mean outcomes in scenarios (a) and (b) (described in the first paragraph of this section) at one and two years after baseline can be written

E [Y_{1} (a_{1} = (a_{1}^{1} = 1, a_{1}^{2} = 1), c_{1} = 1, r_{1} = 365, b_{1} = 0)] - E [Y_{1} (a_{1} = (a_{1}^{1} = 0, a_{1}^{2} = 0), c_{1} = 1, r_{1} = 365, b_{1} = 0)]

(1)

and

E [Y_{2} ({\bar{a}}_{2} = ({\bar{a}}_{2}^{1} = 1, {\bar{a}}_{2}^{2} = 1), {\bar{c}}_{2} = 1, {\bar{r}}_{2} = 365, {\bar{b}}_{2} = 0)] - E [Y_{2} ({\bar{a}}_{2} = ({\bar{a}}_{2}^{1} = 0, {\bar{a}}_{2}^{2} = 0), {\bar{c}}_{2} = 1, {\bar{r}}_{2} = 365, {\bar{b}}_{2} = 0)]

(2)

respectively. The first term in (1) expressed in words is the expected value of the outcome at visit 1 had everyone in the population taken the treatment of interest for the full period between baseline and visit 1 (i.e. $a_{1}^{2} = 1$ ), (redundantly) taken the treatment of interest less than 30 days before visit 1 (i.e. $a_{1}^{1} = 1$ ), shown up for visit 1 365 days after the baseline visit (i.e. c₁ = 1 and r₁ = 365), and not taken any other chorea medication.

G-computation

[5] showed that under the assumptions

Consistency: Y_{t} = Y_{t} ({\bar{A}}_{t}, {\bar{C}}_{t}, {\bar{R}}_{t}, {\bar{B}}_{t}) \forall t

(3)

Sequential Exchangeability : {\underline{Y}}_{t} ({\bar{a}}_{t}, {\bar{c}}_{t}, {\bar{r}}_{t}, {\bar{b}}_{t}) ⊥ A_{t}, C_{t}, R_{t}, B_{t} | {\bar{L}}_{t}, {\bar{A}}_{t - 1}, {\bar{C}}_{t - 1}, {\bar{R}}_{t - 1}, {\bar{B}}_{t - 1} \forall t, {\bar{a}}_{t}, {\bar{c}}_{t}, {\bar{r}}_{t}, {\bar{b}}_{t}

(4)

each term of (1) and (2) is identified by the g-formula:

E [Y_{t} ({\bar{a}}_{t}, {\bar{c}}_{t} = 1, {\bar{r}}_{t} = 365, {\bar{b}}_{t} = 0)] = \sum_{{\bar{l}}_{t}} {E [Y_{t} | {\bar{A}}_{t} = {\bar{a}}_{t}, {\bar{C}}_{t} = 1, {\bar{R}}_{t} = 365, {\bar{B}}_{t} = 0, {\bar{L}}_{t} = {\bar{l}}_{t}] \times \prod_{m = 1}^{t} f (l_{m} | {\bar{L}}_{m - 1} = {\bar{l}}_{m - 1}, {\bar{A}}_{m - 1} = {\bar{a}}_{m - 1}, {\bar{C}}_{m} = 1, {\bar{R}}_{m - 1} = 365, {\bar{B}}_{m - 1} = 0)}

(5)

where f(·|·) denotes conditional density. Consistency is just a technical assumption necessary to link counterfactual to observed data. It states that the observed outcomes are equal to the counterfactual outcomes corresponding to the observed treatments. Sequential exchangeability states that treatment, censoring, and time between visits are independent of future counterfactual outcomes conditional on recorded patient history. It basically means that there are no unobserved confounders at any time. It is satisfied, for example, if all causes of treatment, visit timing, and censoring that are also associated with the outcome are recorded as in the causal graph in Figure 2. Note that (5) implies that under assumptions (3) and (4) each term in our causal quantities of interest (1) and (2) can be expressed as a function of just the observed data, and therefore our causal quantities of interest can be estimated from data. Estimation proceeds by estimating each term of (1) and (2) separately by approximating the g-formula.

Figure 2: Causal graph depicting a scenario under which the assumptions required for causal inference hold. *U_t* are unobserved variables that affect the outcome but do not directly affect treatment except through observed variables *L_t*.

To approximate the g-formula (5), we first fit regression models for the conditional expectation of the outcome given patient history $E [Y_{t} | {\bar{A}}_{t} = {\bar{a}}_{t}, {\bar{C}}_{t} = 1, {\bar{R}}_{t} = 365, {\bar{B}}_{t} = 0, {\bar{L}}_{t} = {\bar{l}}_{t}]$ and the conditional densities of the confounders given patient history $f (L_{m} | {\bar{L}}_{m - 1}, {\bar{A}}_{m - 1}, {\bar{C}}_{m} = 1, {\bar{R}}_{m - 1}, {\bar{B}}_{m - 1} = 0)$ . In our application, we specified generalized linear models for each of these regressions and, since we always condition on the event that ${\bar{B}}_{t} = 0$ , fit them to data with patients censored as soon as they took a chorea drug other than the drug of interest.

We next used these models to approximate the g-formula through Monte Carlo simulation. Approximation is necessary because the g-formula is a sum (or integral) over all possible patient histories and cannot be computed analytically. We sampled 250 patients with replacement from our study population. For each patient, we set their treatment value A₁ at time 1 to the treatment value of interest a₁ and set the days between the baseline visit and visit 1 (i.e. R₁) to 365. Then, given their observed visit 1 covariates L₁ and the imposed treatment and visit gap, we simulated their visit 2 covariates from our estimated covariate regression $\hat{f} (L_{m} | {\bar{L}}_{m - 1}, {\bar{A}}_{m - 1}, {\bar{C}}_{m} = 1, {\bar{R}}_{m - 1}, {\bar{B}}_{m - 1} = 0)$ . Next, we set their treatment value at visit 2 to the treatment value of interest a₂ and set the gap between visits 1 and 2 to 365 days. Finally, we used our estimated outcome regression $\hat{E} [Y_{t} | {\bar{A}}_{t}, {\bar{C}}_{t} = 1, {\bar{R}}_{t}, {\bar{B}}_{t} = 0, {\bar{L}}_{t}]$ to generate expected outcomes for t = 1 and t = 2 given the simulated patient history. We then averaged the expected outcomes at each time point across the 250 simulated patient histories to generate Monte Carlo estimates of $E [Y_{t} ({\bar{a}}_{t}, {\bar{c}}_{t} = 1, {\bar{r}}_{t} = 365, {\bar{b}}_{t} = 0)]$ for t = 1 and t = 2. Confidence intervals were computed by bootstrap.
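The simulation loop can be illustrated with a toy version in plain Python. Everything here is an assumption made for the sketch: the data are synthetic, there is a single covariate, treatment is a binary on/off flag rather than the two-component variable used in the paper, censoring and other chorea drugs are ignored, visit gaps are in years, and both regressions are ordinary linear models fitted by the helper `ols`. In the synthetic data the true effect of two years of treatment versus none on the visit 2 outcome is -1.0.

```python
import math
import random

def ols(X, y):
    """Least squares via the normal equations, solved by Gauss-Jordan."""
    p = len(X[0])
    M = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(M[k][c]))
        M[c], M[piv] = M[piv], M[c]
        for k in range(p):
            if k != c:
                f = M[k][c] / M[c][c]
                M[k] = [a - f * b for a, b in zip(M[k], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]

def dot(beta, x):
    return sum(b * v for b, v in zip(beta, x))

def simulate_registry(n, rng):
    """Synthetic registry in which sicker patients are more often treated."""
    rows = []
    for _ in range(n):
        l1 = rng.gauss(2.5, 0.4)                      # baseline severity
        r1, r2 = rng.uniform(0.7, 1.5), rng.uniform(0.7, 1.5)  # gaps, years
        a1 = int(rng.random() < 1 / (1 + math.exp(-2 * (l1 - 2.5))))
        l2 = l1 + 0.3 * r1 - 0.5 * a1 + rng.gauss(0, 0.3)
        a2 = int(rng.random() < 1 / (1 + math.exp(-2 * (l2 - 2.5))))
        y2 = l2 + 0.3 * r2 - 0.5 * a2 + rng.gauss(0, 0.3)
        rows.append((l1, r1, a1, l2, r2, a2, y2))
    return rows

def g_computation(rows, a, n_draws=250, seed=0):
    """Monte Carlo estimate of the mean visit 2 outcome with treatment
    fixed to `a` in both intervals and both visit gaps fixed to one year."""
    X_cov = [[1.0, l1, a1, r1] for l1, r1, a1, _, _, _, _ in rows]
    l2_obs = [row[3] for row in rows]
    b_cov = ols(X_cov, l2_obs)                        # covariate model
    sse = sum((l2 - dot(b_cov, x)) ** 2 for x, l2 in zip(X_cov, l2_obs))
    resid_sd = math.sqrt(sse / (len(rows) - len(b_cov)))
    X_out = [[1.0, l2, a2, r2] for _, _, _, l2, r2, a2, _ in rows]
    b_out = ols(X_out, [row[6] for row in rows])      # outcome model
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        l1 = rng.choice(rows)[0]                      # resample a patient
        l2 = dot(b_cov, [1.0, l1, a, 1.0]) + rng.gauss(0, resid_sd)
        total += dot(b_out, [1.0, l2, a, 1.0])        # expected outcome
    return total / n_draws

rows = simulate_registry(2000, random.Random(42))
effect = g_computation(rows, a=1) - g_computation(rows, a=0)
```

Using the same seed for both arms means the two simulated populations share baseline draws and noise, so the contrast is not blurred by Monte Carlo error. A bootstrap interval would wrap the whole procedure, refitting both models on each resample of patients.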

See [6] for a thorough and accessible tutorial on G-computation and an application to the Nurses’ Health Study dataset. Our implementation differs from theirs to handle irregular visits. We handle irregular visits by incorporating time between visits as a component of the treatment and modeling the association of visit gaps with the outcome and covariates.

Confounder Selection

Whether the sequential exchangeability assumption (4) holds depends on which variables we include in L. We want to choose the variables to adjust for, i.e. include in L, such that (4) is most likely to hold. Whether the assumption holds for a given combination of treatment, outcome, and L is not verifiable from data. For each analysis, we chose L using a combination of subject matter knowledge and data-driven heuristics. There are certain variables we adjusted for in analyses of all treatment/outcome pairs: CAP score³, outcome at previous visit, CAG length, age, sex, motor1, cognition1, function1, visit number, days since baseline, category (i.e. manifest or not), chorea score, number of other drugs taken, and number of comorbidities diagnosed. These variables should jointly give a good characterization of underlying disease state and symptom severity, and hopefully fill the role of L in Figure 2. Thus, it would not be too unreasonable to simply adjust for these variables alone. In addition to the above variables, which we call L* for purposes of discussion in this subsection, we also adjusted for variables chosen through an automated screening process in each analysis. We fit boosted trees predictive models for treatment and outcome with all clinical variables from the previous visit, all demographic and baseline variables, and all medication and comorbidity variables as predictors. We then fit a LASSO¹⁷ regression for the outcome Y_t including as predictors A_t, L*_t, the 40 variables with highest importance measure for predicting the treatment, the 40 variables with highest importance measure for predicting the outcome, and all two-way interactions between ( $A_{t}^{1}, A_{t}^{2}$ ) and the other predictors. In the LASSO regression, we specified no penalty for variables in L* or ( $A_{t}^{1}, A_{t}^{2}$ ). Let L′ denote the variables that had non-zero coefficients in the LASSO regression or were included in interactions with non-zero coefficients in the LASSO regression. L′, which includes L*, is the full set of variables we adjusted for, and the outcome model in G-computation included all terms with non-zero coefficients in the LASSO regression.
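The key mechanical detail in this screening step is a LASSO in which some predictors carry no penalty, so they always stay in the model while the remaining candidates must earn their place. A minimal sketch of that idea follows, written as a small coordinate descent solver with a per-variable penalty factor. It is not the software used in the study: the solver, the synthetic data, and the names `lasso_cd` and `penalty_factor` are assumptions for the example, and the boosted-tree screening and interaction terms are omitted.

```python
import random

def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma, returning 0 inside [-gamma, gamma]."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X, y, lam, penalty_factor, n_sweeps=200):
    """Minimize (1/2n)||y - Xb||^2 + lam * sum_j penalty_factor[j] * |b_j|."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * beta[j]
            new = soft_threshold(rho, lam * penalty_factor[j]) / col_ss[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta   # keep residuals in sync
                beta[j] = new
    return beta

# Synthetic data: column 0 is a treatment flag with a small effect,
# column 1 is a real predictor, columns 2-4 are pure noise.
rng = random.Random(1)
X, y = [], []
for _ in range(300):
    a = float(rng.random() < 0.5)
    x = [rng.gauss(0, 1) for _ in range(4)]
    X.append([a] + x)
    y.append(0.05 * a + 1.0 * x[0] + rng.gauss(0, 0.5))

# Penalty factor 0 leaves the treatment column unpenalized.
beta = lasso_cd(X, y, lam=0.15, penalty_factor=[0, 1, 1, 1, 1])
selected = [j for j, b in enumerate(beta) if b != 0.0]
```

Here the treatment column is retained even though its effect is small, because it is exempt from shrinkage, whereas the noise columns are set exactly to zero. The indices in `selected` play the role of the adjustment set described above.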

Cohort Selection Considerations

Our restrictive cohort definition is also part of our approach to reduce confounding. Even if we adjust for all the variables that are both relevant to treatment decisions and associated with outcomes of interest, confounding would still likely remain as a result of two features of our dataset. First, treatment decisions are sometimes made months after the most recent clinical visit, so variables could have changed from their last measured values by the time of the treatment decision. Secondly, clinical measurements can be noisy. Figure 3 explains how each of these issues can lead to biased effect estimates. By restricting our cohort to patients with serious chorea, we mitigate these sources of confounding. In a cohort of patients with serious chorea, unobserved differences between patients at the time of their treatment decisions would not have as large an influence on probability of treatment, since probability of treatment will be relatively high for all. The price of strict population criteria is a reduced sample size and higher standard errors.

Figure 3: Two ways confounding can arise in our data even if all the right variables are collected at each visit. In each diagram, the stars represent the point of treatment decision. In each scenario, two patients appear similar based on clinical measurements. But, due either to measurement noise (Right) or delay from measurement to treatment (Left), one patient is actually worse off at the time of the treatment decision and therefore more likely both to be prescribed treatment and to have a worse outcome at the next visit, making it appear to us as if treatment is actually harmful.

Analyses Conducted

Using G-computation as described above, we estimated the effects of tetrabenazine and off-label antipsychotic chorea drugs on chorea score, bradykinesia score, and composite disease progression summaries CUHDRS, motor1, function1, and cognition1. The off-label antipsychotic treatments we considered were sulpiride, olanzapine, pimozide, quetiapine, risperidone, haloperidol, tiapride, chlorpromazine, and aripiprazole. We estimated the effect of antipsychotics as a group, and also estimated effects of certain individual antipsychotic chorea drugs (i.e. olanzapine, risperidone, and tiapride) commonly taken in our cohort.

We estimate the effect of the intervention ‘take any antipsychotic chorea drug’ by defining $A_{t}^{1} = 1$ if any antipsychotic chorea drug was taken within 30 days of visit t and defining $A_{t}^{2}$ to be the proportion of days between visits t - 1 and t that any antipsychotic chorea drug was taken. We are estimating the effect of everybody taking an antipsychotic chorea medication, with the choice of which one at the discretion of their doctors according to current practice.

Post-processing with Bayesian Hierarchical Models

For each outcome, we used G-computation to produce effect estimates for five treatments (over both one and two years). (See raw effect estimates in Figure 5.) As a post-processing step, we fit the following somewhat oversimplified hierarchical Bayesian model to the effect estimates and standard errors for each outcome and treatment duration:

\hat{δ}_{ris}, \hat{δ}_{ola}, \hat{δ}_{tia}, \hat{δ}_{any}, \hat{δ}_{tet} ~ MVN ((μ_{ris}, μ_{ola}, μ_{tia}, μ_{any}, μ_{tet}), \hat{Σ}_{boot});

(6)

μ_{ris}, μ_{ola}, μ_{tia}, μ_{other} ~ N (μ_{anti}, σ_{anti});

μ_{any} = C_{ris} μ_{ris} + C_{ola} μ_{ola} + C_{tia} μ_{tia} + C_{other} μ_{other};

μ_{anti}, μ_{tet} ~ N (μ, σ); σ, σ_{anti} ~ Uniform (0, 5); μ ~ N (0, 10)

Figure 5: Raw estimated effects and .95 CIs of taking chorea drugs continuously on chorea score, bradykinesia score, and composite summaries of disease state after one and two years compared to no treatment. For Motor1, chorea score, and bradykinesia score, effects < 0 are beneficial. For the other outcomes, effects > 0 are beneficial.

See Figure 4 for a graphical representation of (6). The effect estimates ${\hat{δ}}_{treat}$ (with treat ranging over risperidone, olanzapine, tiapride, any antipsychotic, and tetrabenazine) are assumed to come from a multivariate Gaussian distribution centered at the corresponding true effects μ_treat with covariance matrix equal to its bootstrap estimate ${\hat{Σ}}_{boot}$ . This assumption is justified by the central limit theorem. The true effects of individual antipsychotic chorea treatments are themselves assumed to be independent draws from a common Gaussian distribution with mean μ_anti and standard deviation σ_anti. The true effect of tetrabenazine and the average of the effects of the antipsychotic treatments μ_anti are also assumed to come from a common Gaussian distribution with mean μ and standard deviation σ. We specify that the true effect of ‘any antipsychotic’ μ_any is equal to the weighted average of the individual antipsychotic treatment effects we estimated and the effect μ_other of taking any of the other antipsychotic treatments whose effects we did not estimate individually. The weight given to each individual treatment in the weighted average is the proportion of patients taking any antipsychotic treatment who took that particular treatment, i.e. C_ola = 87/256, C_ris = 43/256, C_tia = 43/256, and C_other = 83/256. We put weakly informative priors on the grand mean of all treatment effects μ, the standard deviation σ of the distribution producing the tetrabenazine and mean antipsychotic true effects μ_tet and μ_anti, and the standard deviation σ_anti of the common distribution of the antipsychotic treatment effects.
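The deterministic link between μ_any and the individual effects is plain arithmetic, and a short check confirms that the four weights sum to one. In the sketch below the user counts are those quoted in the text, while the effect values in `mu` are invented purely to show the calculation.

```python
from fractions import Fraction

# Users of each antipsychotic among the 256 patients taking any antipsychotic.
counts = {"ola": 87, "ris": 43, "tia": 43, "other": 83}
total = sum(counts.values())
weights = {drug: Fraction(n, total) for drug, n in counts.items()}

def mu_any(mu):
    """Weighted average of individual effects, weights = share of users."""
    return sum(weights[drug] * mu[drug] for drug in weights)

# Hypothetical true effects on chorea score (negative = beneficial).
mu = {"ola": -0.4, "ris": -0.8, "tia": -0.5, "other": -0.3}
effect_any = float(mu_any(mu))
```

Because the weights are proportions, μ_any always lies between the smallest and largest individual effect, which is what ties the ‘any antipsychotic’ estimate to the drug-specific ones in the model.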

The assumption that the true effects μ_ris, μ_tia, and μ_ola of the antipsychotic drugs come from a common Gaussian distribution (and are related to the effect μ_any of any antipsychotic through a deterministic function and the parameter μ_other) allows us to ‘share strength’ across estimates of these effects¹⁸. Our estimate of the effect of one treatment informs estimates of the effects of the others because they are assumed to come from the same distribution. The lower the standard error of an effect estimate ${\hat{δ}}_{treatment}$ , the less influence other effect estimates will have on the posterior distribution of its underlying true effect μ_treatment. If the standard error is high (because the treatment was taken by fewer patients or was strongly confounded with the outcome), the model can pull the posterior mean of μ_treatment closer to the posterior means of the other treatment effects and also reduce its posterior variance and narrow its Highest Density Interval (HDI). σ_anti determines how much influence treatment effect estimates can have over each other, with lower values leading to more influence. σ_anti is estimated from data. A similar dynamic holds between the true effect μ_tet of tetrabenazine and the average effect of antipsychotics μ_anti, with σ determining how much influence estimates of the effect of tetrabenazine have on estimates of the effects of antipsychotic treatments and vice versa.
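The shrinkage behavior described here can be seen in the simplest normal-normal case, where the posterior has a closed form. The sketch below is a deliberate simplification of model (6): it treats the group mean and group standard deviation as known rather than estimated, handles one effect at a time, and ignores correlation between estimates. All numbers and the function name `partial_pool` are hypothetical.

```python
import math

def partial_pool(estimate, se, group_mean, group_sd):
    """Posterior mean and sd of a true effect when the estimate is
    N(true effect, se^2) and the true effect is N(group_mean, group_sd^2)."""
    w = group_sd ** 2 / (group_sd ** 2 + se ** 2)   # weight on the raw estimate
    post_mean = w * estimate + (1 - w) * group_mean
    post_sd = math.sqrt(1 / (1 / se ** 2 + 1 / group_sd ** 2))
    return post_mean, post_sd

# Same raw estimate, different standard errors (hypothetical values).
precise = partial_pool(estimate=-0.9, se=0.1, group_mean=-0.5, group_sd=0.3)
noisy = partial_pool(estimate=-0.9, se=0.6, group_mean=-0.5, group_sd=0.3)
```

The noisy estimate is pulled much further toward the group mean than the precise one, and in both cases the posterior standard deviation is smaller than the raw standard error. A smaller `group_sd` strengthens the pull, mirroring the role played by σ_anti.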

These models also help to ameliorate multiple comparisons concerns¹⁹ that arise from scanning raw results as in Figure 5 for ‘significant’ differences between effect estimates. Model (6) brings effect estimates closer together, so that more support from the data is required to pull HDIs apart. Thus, there is less concern that chance alone is responsible for disparities in HDIs produced by a hierarchical model. See [18] and [19] for fuller discussions of hierarchical models for sharing strength and dealing with multiple comparisons.

These models make the simplifying assumption that the effect of ‘any antipsychotic’ is a weighted average of effects of individual antipsychotics with weights proportional to use prevalence. This assumption could be justified if the choice of antipsychotic is not associated with treatment effect, perhaps a reasonable assumption under clinical equipoise.

Results

Figure 5 shows raw G-computation estimates and .95 bootstrap confidence intervals (CIs) of the effects of tetrabenazine and off-label antipsychotic chorea drugs on chorea score, bradykinesia score, and composite disease progression summaries CUHDRS, motor1, function1, and cognition1. Figure 6 displays the posterior mean estimated effects and .95 HDIs output by post-processing with Bayesian hierarchical models. The .95 HDIs were narrower than the .95 CIs, reflecting the benefits of sharing strength across analyses. The post-processed effect estimates were also clustered closer together, with high variance estimates pulled closer to estimates with stronger support from the data.

Figure 6: Post-processed estimated effects and .95 HDIs of taking chorea drugs continuously on chorea score, bradykinesia score, and composite summaries of disease state after one and two years compared to no treatment. For Motor1, chorea score, and bradykinesia score, effects < 0 are beneficial. For the other outcomes, effects > 0 are beneficial.

Tetrabenazine and antipsychotics were estimated to be comparably effective at treating chorea after one year. After two years, tetrabenazine was estimated to be more effective. Risperidone was estimated to be the most effective treatment after one year, though its effect estimate was pulled closer to the other drugs in the post-processed results.

No treatment had a .95 confidence interval or HDI excluding no effect on CUHDRS, an indicator of overall disease progression, but certain drugs were estimated to impact composite summaries of symptom categories. In the raw effect estimates, tetrabenazine was estimated to have a large harmful effect on cognition after two years but not after one year, although the bootstrap CI of the two-year effect was quite wide. Post-processing brought this effect estimate much closer to zero, and the narrower .95 HDI included no effect. Similarly, in the raw results olanzapine was estimated to have a negative effect on function after two years but not after one year, with a large standard error, and this estimated effect was also brought closer to zero by post-processing. Tetrabenazine was also estimated to have a statistically significant beneficial effect on motor1 after both one and two years in both the raw and post-processed results.

Discussion

In the absence of randomized clinical trials, large observational disease registries are promising sources from which to generate evidence about causal effects of medications. Compared to other sources of longitudinal observational data, such as claims data or EHR, they have the advantage of containing repeated measures of clinical variables chosen specifically for their relevance to disease progression, potentially improving confounding adjustment. But, as we discussed, the possibility of confounding still remains, and results from observational studies will inevitably be considered somewhat exploratory. Hence, we should not hesitate to explore, estimating multiple related causal effects.

In this study, we provide an example of how to generate evidence on a set of causal questions concerning treatment effects over time from pooled observational longitudinal disease registries. We began by carefully defining our causal questions and populations of interest. Our restrictive population definition was motivated both by substantive interest and by a desire to mitigate unobserved confounding, at the expense of reduced sample size. We showed how to use parametric G-computation to adjust for confounding and handle the dual complications of irregular intervals between visits and variation in actual treatment patterns among those who were treated. We exploited low-dimensional summaries (motor1, function1, cognition1, and CUHDRS) of a battery of periodic clinical measurements both to effectively adjust for confounding by underlying disease state and to define outcomes that represent underlying disease state. That our effect estimates for chorea and bradykinesia were broadly consistent with clinical knowledge gives some reassurance that we adequately adjusted for confounding.
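The logic of parametric G-computation can be sketched on a toy example. The code below is an illustrative assumption, not the models fit in this study: it simulates two equally spaced visits (so it does not address irregular visit timing), uses simple linear models, and all variable names and coefficients are hypothetical. It shows why a time-varying covariate that is affected by earlier treatment calls for simulating forward under each treatment strategy rather than reading effects off one regression.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: L0, L1 are disease-state summaries, A0, A1 treatment indicators.
# By construction, "always treat" vs "never treat" changes Y by
# -0.3 - 0.4 (direct) + 1.0 * (-0.5) (through L1) = -1.2.
L0 = rng.normal(size=n)
A0 = rng.binomial(1, expit(0.8 * L0))
L1 = 0.7 * L0 - 0.5 * A0 + rng.normal(scale=0.5, size=n)
A1 = rng.binomial(1, expit(0.8 * L1 + 0.5 * A0))
Y = 0.2 * L0 + 1.0 * L1 - 0.3 * A0 - 0.4 * A1 + rng.normal(scale=0.5, size=n)

def ols(X, y):
    """Least-squares fit with intercept; returns coefficients and residual SD."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta, np.std(y - X1 @ beta)

def predict(beta, X):
    return np.column_stack([np.ones(X.shape[0]), X]) @ beta

# Step 1: fit a model for the time-varying covariate and for the outcome.
beta_L1, sd_L1 = ols(np.column_stack([L0, A0]), L1)
beta_Y, _ = ols(np.column_stack([L0, A0, L1, A1]), Y)

def g_formula_mean(a0, a1):
    """Step 2: simulate L1 under the fixed strategy, then average predicted Y."""
    a0v = np.full(n, float(a0))
    a1v = np.full(n, float(a1))
    L1_sim = predict(beta_L1, np.column_stack([L0, a0v]))
    L1_sim = L1_sim + rng.normal(scale=sd_L1, size=n)
    return predict(beta_Y, np.column_stack([L0, a0v, L1_sim, a1v])).mean()

# Step 3: contrast the two strategies.
effect = g_formula_mean(1, 1) - g_formula_mean(0, 0)

# For comparison: summing the treatment coefficients of the outcome regression
# conditions on L1 and so misses the part of the effect that acts through L1.
naive = beta_Y[2] + beta_Y[4]
```

In this simulation `effect` should land near the constructed value of -1.2, while `naive` should land near -0.7, which illustrates the motivation for the forward-simulation step.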

Finally, we fit hierarchical Bayesian models to our effect estimates and standard errors both to improve the precision of our estimates by sharing strength across closely related studies and to address multiple comparisons concerns that might arise if one sifts through results looking for statistically significant differences in effects. The post-processed estimates might be considered more reasonable based on prior knowledge. For example, one year of risperidone treatment is unlikely to be as superior to alternative treatments at reducing chorea as the raw estimates suggest. In future work, we will explore more complex post-processing models that incorporate prior beliefs about how treatment effects are likely to vary over time and across outcomes.
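The shrinkage behavior described above can be illustrated with a simple normal-normal model applied to effect estimates and their standard errors. The sketch below is an empirical-Bayes approximation (a method-of-moments estimate of the between-effect variance), swapped in for brevity; it is not the hierarchical Bayesian model fit in this study, and the function name `partial_pool` and the input numbers are hypothetical.

```python
import numpy as np

def partial_pool(est, se):
    """Shrink noisy estimates toward a common mean (normal-normal model)."""
    est = np.asarray(est, dtype=float)
    se = np.asarray(se, dtype=float)
    k = est.size

    # Method-of-moments estimate of the between-effect variance tau^2.
    w = 1.0 / se**2
    mu_fixed = np.sum(w * est) / np.sum(w)
    q = np.sum(w * (est - mu_fixed) ** 2)
    tau2 = max(0.0, (q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

    # Pooled mean, weighting each estimate by its total variance.
    w_star = 1.0 / (se**2 + tau2)
    mu = np.sum(w_star * est) / np.sum(w_star)

    # Noisier estimates (large se) get pulled harder toward the pooled mean.
    shrink = se**2 / (se**2 + tau2)
    post_mean = shrink * mu + (1.0 - shrink) * est
    post_sd = np.sqrt((1.0 - shrink) * se**2)   # conditional on mu and tau2
    return post_mean, post_sd, mu, tau2

# Hypothetical raw effects of four drugs on one outcome (not the paper's numbers).
est = [-3.0, -0.6, -1.2, -0.4]
se = [0.8, 0.4, 0.5, 0.4]
post_mean, post_sd, mu, tau2 = partial_pool(est, se)
```

In this illustration the first, most extreme and least precise estimate moves furthest toward the pooled mean, which mirrors how an outlying raw estimate is pulled closer to the other drugs after post-processing. Because `post_sd` conditions on the estimated `mu` and `tau2`, it understates uncertainty relative to a full Bayesian fit.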

Table 1: Number of new users of each drug in our restricted chorea cohort.

Treatment	Users
Any Antipsychotic	256
Olanzapine	87
Risperidone	43
Tiapride	43
Tetrabenazine	81

Author contributions statement

Z.S. designed the statistical analysis; Z.Sun, Y.L., and A.M. constructed the dataset; C.S. guided clinical questions and helped conceive the project; J.H. guided analysis and helped conceive the project; all authors reviewed the manuscript.

References

[1] Soumya Ghosh, Zhaonan Sun, Ying Li, Yu Cheng, Amrita Mohan, Cristina Sampaio, Jianying Hu. An exploration of latent structure in observational Huntington’s disease studies.
[2] Zhaonan Sun, Ying Li, Soumya Ghosh, Yu Cheng, Amrita Mohan, Cristina Sampaio, Jianying Hu. A data-driven method for generating robust symptom onset indicators in Huntington’s disease registry data. AMIA Annual Symposium Proceedings. 2017.
[3] Ross CA, Aylward EH, Wild EJ, Langbehn DR, Long JD, Warner JH. Huntington disease: natural history, biomarkers and prospects for therapeutics. Nature Reviews Neurology. 2014;10(4):204–216. doi: 10.1038/nrneurol.2014.24.
[4] Emma Coppen, Raymund Roos. Current pharmacological approaches to reduce chorea in Huntington’s disease. Drugs. 2017;77(1):29–46. doi: 10.1007/s40265-016-0670-4.
[5] James Robins. A new approach to causal inference in mortality studies with a sustained exposure period - application to control of the healthy worker survivor effect. Mathematical Modelling. 1986;7(9-12):1393–1512.
[6] Sarah Taubman, Robins JM, Mittleman MA, Hernan MA. Intervening on risk factors for coronary heart disease: an application of the parametric g-formula. International Journal of Epidemiology. 2009;38(6):1599–1611. doi: 10.1093/ije/dyp192.
[7] Ghosh S, Sun Z, Li Y, Cheng Y, Mohan A, Sampaio C, Hu J. An exploration of latent structure in observational Huntington’s disease studies. AMIA Summits on Translational Science Proceedings. 2017;92.
[8] Mestre T, Fitzer-Attas C, Giuliano J, Landwehrmeyer B, Sampaio C. Enroll-HD: A global clinical research platform for Huntington’s disease (S25.005). Neurology. 2016;86(16 Supplement):S25.005.
[9] Orth M, Handley OJ, Schwenke C. Observing Huntington’s disease: the European Huntington’s Disease Network’s REGISTRY. PLoS Currents. 2011;2(RRN1184). doi: 10.1371/currents.RRN1184.
[10] Tabrizi SJ, Scahill RI, Owen G, Durr A, Leavitt BR, Roos RA. Predictors of phenotypic progression and disease onset in premanifest and early-stage Huntington’s disease in the TRACK-HD study: analysis of 36-month observational data. The Lancet Neurology. 2013;12(7):637–649. doi: 10.1016/S1474-4422(13)70088-7.
[11] Papoutsi M, Labuschagne I, Tabrizi SJ, Stout JC. The cognitive burden in Huntington’s disease: pathology, phenotype, and mechanisms of compensation. Movement Disorders. 2014;29(5):673–683. doi: 10.1002/mds.25864.
[12] Paulsen J, Langbehn D, Stout J, Aylward E, Ross C, Nance M. Detection of Huntington’s disease decades before diagnosis: the PREDICT-HD study. Journal of Neurology, Neurosurgery & Psychiatry. 2008;79(8):874–880. doi: 10.1136/jnnp.2007.128728.
[13] Huntington Study Group. Unified Huntington’s disease rating scale: reliability and consistency. Movement Disorders. 1996;11:136–142. doi: 10.1002/mds.870110204.
[14] WHO Collaborating Centre for International Drug Monitoring, Uppsala, Sweden. World Health Organization (WHO) Drug Dictionary.
[15] Olivier Bodenreider. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Research. 2004;32(suppl_1):D267–D270. doi: 10.1093/nar/gkh061.
[16] Schobel SA, Palermo G, Auinger P, Long JD, Khwaja OS, Trundell D, Cudkowicz M, Hersch S, Sampaio C, Dorsey ER, Leavitt BR, Kieburtz KD, Sevigny JJ, Langbehn DR, Tabrizi SJ. Motor, cognitive, and functional declines contribute to a single progressive factor in early HD. Neurology. 2017;89(24):2495–2502. doi: 10.1212/WNL.0000000000004743.
[17] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B. 1996. pp. 267–288.
[18] Andrew Gelman, Jennifer Hill. Data analysis using regression and multilevel/hierarchical models. Cambridge University Press; 2014.
[19] Andrew Gelman, Jennifer Hill, Masanao Yajima. Why we (usually) don’t have to worry about multiple comparisons. Journal of Research on Educational Effectiveness. 2012;5(2):189–211.

