Abstract
Accurately estimating the effect of an exposure on an outcome requires understanding how variables relevant to a study question are causally related to each other. Directed acyclic graphs (DAGs) are used in epidemiology to understand causal processes and determine appropriate statistical approaches to obtain unbiased measures of effect. Compartmental models (CMs) are also used to represent different causal mechanisms, by depicting flows between disease states on the population level. In this paper, we extend a mapping between DAGs and CMs to show how DAG-derived CMs can be used to compare competing causal mechanisms by simulating epidemiological studies and conducting statistical analyses on the simulated data. Through this framework, we can evaluate how robust simulated epidemiological study results are to different biases in study design and underlying causal mechanisms. As a case study, we simulated a longitudinal cohort study to examine the obesity paradox: the apparent protective effect of obesity on mortality among diabetic ever-smokers, but not among diabetic never-smokers. Our simulations illustrate how study design bias (e.g. reverse causation) can lead to the obesity paradox. Ultimately, we show the utility of transforming DAGs into in silico laboratories within which researchers can systematically evaluate bias, and inform analyses and study design.
Keywords: epidemiological study design, directed acyclic graphs, compartmental models, obesity paradox
1. Introduction
Designing analyses to accurately estimate the effect of an exposure on an outcome requires understanding how variables relevant to a study question are causally related to each other. Directed acyclic graphs (DAGs) are diagrams used in epidemiology to graphically map causes and effects to separate associations due to causality from those due to bias. Compartmental models (CMs) depict parameterized flows between disease states over time [1,2] and can be used to represent mechanisms, i.e. the explicit state transitions or processes sufficient for an exposure to lead to an outcome, underlying disease progression or transmission [3,4]. CMs are often implemented using ordinary differential equations (ODEs), commonly linear ODEs with each variable representing the number of individuals in a distinct state (e.g. healthy versus diseased). Compartmental models have a long history of use in medicine and public health (see [5,6] and references in [7]). Further details on the compartmental modelling framework are given in the electronic supplementary material.
Given the causal nature of both DAGs and CMs, a question arises of whether these two approaches may be linked. Indeed, Ackley et al. provided a formal mapping from the basic building blocks of DAGs (e.g. causality, confounding and selection bias) to CMs [1]. See figure 1 for an example illustration and the electronic supplementary material for a review and more in-depth comparison between DAGs and CMs. Using this mapping, a DAG and CM are defined as ‘corresponding’ if they represent the same conditional independencies. This correspondence represents an exciting new development in linking DAGs and CMs—here we expand this idea to a general framework for study design and sensitivity analysis in practice. This step is necessary to understand how simulating DAGs can provide actionable insight from the relationships between variables in a study and ultimately, inform study design and analyses. Additionally, designing this framework has allowed us to identify and develop approaches to handle practical problems encountered when translating real DAGs into CMs (e.g. combinatorial explosion).
Figure 1.
From left to right: a simple DAG showing causality, wherein exposure E causes outcome D. Next, in CM1, we assume that E and D are both dichotomous, so the corresponding CM will have $2^n$ states (where n = 2 since there are 2 random variables on the DAG). Additionally, D status does not affect E status. The notation $\bar{X}$ denotes not X, so $\bar{E}$ is unexposed. Thus the rates at which individuals become exposed (i.e. go from $\bar{E}$ to E) are the same whether or not they have D. We note that equal rates are denoted by the same parameter value and, if the parameter symbol is not indicated, distinct rates are assumed, i.e. causal effects correspond to unmarked transition rates (in this example, the $\bar{E}\bar{D}$ and $E\bar{D}$ compartments have distinct transition rates for acquiring D, indicating a causal effect of E on D). This CM further asserts that once an individual becomes diseased or exposed, they cannot return to the non-diseased or non-exposed state. In CM2, we see that individuals can move from E to $\bar{E}$, but their D status does not affect the rate at which they transition, as indicated by the equal rates k2 for $E\bar{D}$ to $\bar{E}\bar{D}$ and for ED to $\bar{E}D$. Both CM1 and CM2 would be considered corresponding with the given DAG.
In this paper, we extended the work by Ackley et al. by developing an operationalized workflow which uses the mapping (between DAGs and CMs) to simulate epidemiological studies. We also note some opportunities to simplify this mapping to reduce the combinatorial explosion of CM compartments that results from realistic DAGs (taking advantage of simplifications to the CM that can be included when conditioning on a variable to reflect the make-up of the study population). We illustrated our findings by deriving a CM from a published DAG representing an instance of the obesity paradox, wherein obese ever-smoking diabetics have lower mortality rates than their normal weight counterparts. We examined competing hypotheses underlying the obesity paradox by incorporating different potential biases into our CM and then simulating study data. Our method can be applied to nearly any DAG or study question to gain insight into what underlying causal mechanisms can explain patterns observed in epidemiological data. This insight can be used to reduce bias in study designs and ultimately obtain more accurate effect measures of an exposure on outcome.
2. Methods
2.1. Overview of the obesity paradox
The obesity paradox is the apparent protective effect of obesity on mortality among individuals with chronic diseases such as heart failure, stroke, or diabetes [8–11]. In this analysis, we used an observational study conducted by Preston et al., in which obese, ever-smoking (but not never-smoking) diabetics had lower mortality rates than their normal weight counterparts [9], as inspiration for our simulation study, because it is a clear example of the obesity paradox. We did not use the same dataset or aim to replicate the analysis or results of that study; rather, we extended the published DAG representing the causal processes of interest, and used the statistical analyses in the study to motivate examples of competing causal mechanisms that might be investigated. Figure 2a shows the published DAG from the observational study [9] representing the obesity paradox. The exposure is body mass index (BMI) and is coded as either overweight/obese (BMI ≥ 25 kg m−2) or normal weight (BMI ) and the outcome is mortality. Individuals are considered to have diabetes or prediabetes if their haemoglobin A1c is greater than 5.7%, or if they have been previously diagnosed. Smoking is a common risk factor for diabetes, mortality, and BMI, and is coded as ever-smoking (greater than or equal to 100 cigarettes over the course of an individual’s lifetime) or never-smoking (less than 100 cigarettes). The mortality rates were age-standardized according to the 2010 census using age groups 40–59 and 60–74. For simplicity of notation, we will refer to prediabetics and diabetics as ‘diabetics’ and overweight and obese as ‘obese’.
Figure 2.
All DAGs and corresponding CMs used in our obesity paradox simulation study. DAGs (left column) and corresponding CMs (right column) for each model. (a) Preston et al. [9] DAG and (b) corresponding CM; (c) adding in age-varying mortality rates and (d) corresponding CM; (e) reverse causation and (f) corresponding CM; finally, (g) combined model and (h) corresponding CM. In the DAGs, ‘BMI’ is the exposure and ‘mortality’ is the outcome. ‘Age’ and ‘smoking’ confound the relation between ‘BMI’ and ‘mortality’. The box around ‘diabetes’ indicates that the study population is conditioned on a single stratum of diabetes and ‘=1’ indicates that only individuals with diabetes (coded as 1) are included in the study. Note that this is not specified on the CMs, to simplify the figures. Cachexia is represented by ‘U’, and ‘COPD’ is chronic obstructive pulmonary disease. With respect to the longitudinal DAGs, ‘H’ (history) denotes status before the study, ‘0’ denotes baseline and ‘1’ represents the first time point after baseline. There may be subsequent time points following this (denoted by the ‘···’) that would require a new set of variables. In the CMs, mortality rates are denoted by dotted lines which point to empty space. Rates with no labels (including mortality rates) may all be distinct. For all rate naming conventions, refer to electronic supplementary material for the Preston et al. and age-structured models and for the reverse causation and combined models.
Numerous potential explanations have been proposed for the obesity paradox. Examples include reverse causation [9], confounding, selection bias [9,12], or inaccuracy of BMI as a measure of body composition [13]. In general, underlying causal mechanisms that have not been properly adjusted for in the analysis may cause bias. Other proposed explanations are causal, such as obese individuals receiving better medical treatment [14], or are specific to a chronic disease, e.g. obese individuals may be protected from plaque formation on their arteries through a greater mobilization of endothelial progenitor cells [15].
For the purposes of this study, we will define the obesity paradox based on the qualitative results of the Preston et al. study, i.e. the obesity paradox occurs when obese never-smoking diabetics have higher rates of mortality than normal weight never-smoking diabetics and obese ever-smoking diabetics have lower rates of mortality than normal weight ever-smoking diabetics. We also assumed that comparable individuals who are obese or ever-smokers always have higher mortality rates than their normal weight or never-smoking counterparts, respectively. In other words, we only considered biases in study designs (specifically reverse causation or selection bias) as potential explanations, rather than examining situations where we model obesity as actually being biologically protective.
To obtain an unbiased effect measure of BMI on mortality, we can refer to the structure of the DAG from Preston et al. (figure 2a). Overall, if we assume that there are no other sources of bias in the study, and no other common causes of the variables on the DAG, an unbiased effect estimate would require that we adjust for smoking status. Diabetes is a collider or common effect of smoking and BMI, and a mediator on the path between BMI and mortality. Conditioning on a collider creates a spurious association between its causes [16]; however, adjusting for smoking removes the bias. To account for the fact that we are conditioning on a mediator, we can assume that there are no additional unmeasured confounders and only consider the controlled direct effect of BMI on mortality, i.e. the effect when diabetes is held constant [17]. See electronic supplementary material for more details on determining what to adjust for in the statistical analysis.
2.2. Workflow summary
We propose the following workflow to simulate epidemiological studies and conduct statistical analyses on CMs derived from DAGs:
1. DAG and study design. Design or use an existing DAG representing the causal processes related to a given exposure and outcome, and then design an epidemiological study (alternatively, this method can be used to conduct sensitivity analyses on an existing study in which case one would use the existing study design). Using the DAG, determine which variables will be controlled for in the statistical analysis (see Step 5 below). In our analysis, we started with a published DAG [9].
2. DAG→CM mapping. Derive a CM from the DAG using the mapping described by Ackley et al. [1].
- (a) Because multiple CMs may correspond to the given DAG, decide the appropriate CM based on the chosen study design and realistic mechanisms for the process of interest. In our CM, we used ODEs with exponential transition rates (i.e. exponential waiting time models). In our models, individuals can transition from never-smoking to ever-smoking, but not back to never-smoking. In general, the research question and hypotheses will guide how to correctly derive a CM from a given DAG, since the correspondence between DAGs and CMs is not one to one [1].
- (b) Potentially reduce the state-space for the chosen CM based on the study population and biological processes included (e.g. mortality). In our analysis, the study design conditions on diabetes, so we only track individuals with diabetes, and can therefore simplify the model state-space to only include the diabetic states (as opposed to having a corresponding non-diabetic disease state for each diabetic disease state). Similarly, one could reduce the state-space by not including compartments for individuals who have died (corresponding to the mortality variable in the DAG), but rather only including the unique mortality out-flow rates from each compartment.
3. Simulation and sampling. Simulate the chosen study population using the CM based on predefined ranges of parameter values and initial conditions. In our analyses, we simulated a year-long longitudinal cohort study among diabetics aged 40–74 (this matched the ages of the population in the observational study) for each sampled parameter set.
- (a) Parameter and initial condition values and ranges can be determined based on the mechanism of interest, existing data, the literature, or simply broad ranges that encompass the plausible space of values (as were used in our analysis). Values may be (for example) uniformly sampled from these distributions using Latin hypercube sampling (LHS) [18].
- (b) Simulation of the study using the chosen CM can be implemented in a variety of ways, e.g. as ODEs (for sufficiently large populations) or as a stochastic model. Among smaller populations, it is often necessary to use a stochastic model, because the stochastic variability not captured by ODEs may lead to unexpected findings.
4. Generate simulated data. Generate a simulated dataset based on the outcome of interest and measurement details of the study (e.g. number of follow-up time points, variables measured, potential sampling or measurement error that might be of concern). In our case, because individuals were followed up once at the end of the study, we made a single simulated dataset for the entire study (consisting of person-time and the outcome, mortality, by disease state over the course of the year). For simplicity and because we simulated a very large study (1 000 000 individuals), we did not examine issues of sample size or measurement error. After simulating a study, we subsequently calculated person-time (to estimate time at risk for the study population over the course of the study) and incident mortality by disease state.
5. Analysis and evaluation. Run statistical analyses and/or calculate the causal parameter of interest using the simulated data in Step 4. Analyses may include calculation of a single effect estimate and/or a wide range of statistical regression methods (depending on what analyses are of interest/planned for the study). Next, evaluate the results to examine how the causal relationships and parameters included in the model affect potential biases and patterns of interest in the data. In our analysis, for our outcome, we calculated mortality rate ratios (MRRs) by dividing normal weight mortality rates by obese mortality rates among individuals within different smoking strata and then assessed whether each given model and study design could recreate the obesity paradox. For an example calculation of MRRs, see electronic supplementary material.
6. Revision and exploration. Based on the results of Step 5, potentially alter the study design and/or DAG to explore alternative biases and causal mechanisms, then re-run the workflow. We did this by simulating epidemiological studies assuming different unadjusted study design biases (i.e. reverse causation and selection bias).
In the remainder of this paper, we assess how different underlying causal mechanisms might lead to the obesity paradox to illustrate the utility of this workflow. Example code that we used for our analyses which demonstrates this workflow is available on GitHub: https://github.com/epimath/cm-dag.
2.3. Simulating a longitudinal cohort study
We simulated a year-long cohort study to examine the relationship between obesity and mortality among diabetics aged 40–74. We followed up participants once at the end of the study to calculate person-time and incident mortality by disease state. We started with a population of 1 000 000 people and (for the age-structured models mentioned below) weighted it according to the age group distribution in the 2010 United States (US) census [19]. See electronic supplementary material for more details on age-weighting for our study population.
2.4. Alternative compartmental models
We used four different ODE models to explore how our simulated datasets change with different proposed underlying causal mechanisms. See figure 2 for all DAGs and corresponding CMs. We began with Model 1, a direct conversion of the published DAG from Preston et al. [9] to a CM. See electronic supplementary material for details on how we converted this DAG and reduced the number of compartments on the CM. After following the workflow for Model 1, we explored other possible mechanisms that might lead to the obesity paradox. The other mechanisms used in this case study were inspired by the Preston et al. study and literature, to evaluate whether they can provide a plausible explanation for the obesity paradox—we note this simulation study cannot provide an actual explanation to the obesity paradox, only test whether particular hypothesized mechanisms can potentially generate the obesity paradox. Model 2 incorporated age-varying rates and was age-weighted according to the US census [19]. We split our population into a younger age group (ages 40–59) and an older age group (ages 60–74) and simulated the same model within strata of age. See electronic supplementary material for details on how we incorporated age into the DAG and CM. Model 3 represents reverse causation due to chronic obstructive pulmonary disease (COPD), a co-morbidity associated with diabetes for which smoking is a risk factor that can induce cachexia (loss of weight and muscle mass) and cause higher mortality rates [20–23] (thereby increasing mortality among a subset of normal weight ever-smokers). Individuals with comorbid diabetes and COPD can transition into an ‘unhealthy’ compartment, U. Individuals in U have lost weight due to cachexia and also have higher mortality rates than their normal weight ‘healthy’ counterparts (i.e. normal weight ever-smoking individuals with COPD who have not undergone cachexia). 
See electronic supplementary material for details on the underlying mechanism and how we incorporated reverse causation into the DAG and CM. We note that in this case when mortality is the outcome, reverse causation can more accurately be described as confounding by disease (i.e. disease affects both weight loss and mortality) [24]. However, this type of confounding is often termed ‘reverse causation’ [25,26] (as it is in Preston et al. [9]), and thus to be consistent with Preston et al. we will refer to it as reverse causation. Finally, Model 4 is a combination of Models 2 and 3. See electronic supplementary material for details on how we incorporated age and reverse causation into the DAG and CM.
In all CMs, once individuals die, they cannot move between disease states and we no longer track them. For simplicity, mortality is therefore treated as an outgoing flow from each compartment and is not included in the set of disease states. In CM diagrams, death rates (mortality) are typically drawn as arrows pointing to empty space, a convention we have also used in this study. In figure 2, transition rates that are not labelled are assumed to be distinct. For instance, the mortality rates (denoted by dotted lines) are different for each disease state and represent different causal effects on mortality (e.g. obesity, smoking history).
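As a minimal sketch of how a Model 1 style CM might be written down, the Python below treats mortality as an outflow from each of four living compartments and accumulates person-time and deaths alongside them. It is an illustration under stated assumptions, not the authors' implementation: the paper used R with deSolve's `ode` (lsoda), whereas this sketch swaps in a fixed-step fourth-order Runge-Kutta integrator, and the state labels (`NWN`, `ON`, `NWS`, `OS`), parameter names and parameter values are all hypothetical.

```python
# States: normal weight (NW) or obese (O), never-smoker (N) or ever-smoker (S).
# All individuals are diabetic, so no non-diabetic states are tracked.
STATES = ("NWN", "ON", "NWS", "OS")

def model1_rhs(y, p):
    """Derivatives of the 4 living states, then person-time and deaths per state."""
    nwn, on, nws, os_ = y[:4]
    mu = p["mu"]
    d_living = (
        p["loss_n"] * on - (p["gain_n"] + p["smoke"] + mu["NWN"]) * nwn,
        p["gain_n"] * nwn - (p["loss_n"] + p["smoke"] + mu["ON"]) * on,
        p["smoke"] * nwn + p["loss_s"] * os_ - (p["gain_s"] + mu["NWS"]) * nws,
        p["smoke"] * on + p["gain_s"] * nws - (p["loss_s"] + mu["OS"]) * os_,
    )
    d_person_time = (nwn, on, nws, os_)           # time at risk accrues with size
    d_deaths = tuple(mu[s] * n for s, n in zip(STATES, d_person_time))
    return d_living + d_person_time + d_deaths

def rk4(rhs, y, p, t_end=1.0, steps=1000):
    """Fixed-step RK4 (a stand-in for an adaptive solver such as lsoda)."""
    h = t_end / steps
    for _ in range(steps):
        k1 = rhs(y, p)
        k2 = rhs(tuple(a + 0.5 * h * b for a, b in zip(y, k1)), p)
        k3 = rhs(tuple(a + 0.5 * h * b for a, b in zip(y, k2)), p)
        k4 = rhs(tuple(a + h * b for a, b in zip(y, k3)), p)
        y = tuple(a + h / 6.0 * (b + 2 * c + 2 * d + e)
                  for a, b, c, d, e in zip(y, k1, k2, k3, k4))
    return y

# Hypothetical parameter set (rates per year), within the ranges used in the paper.
params = {
    "gain_n": 0.05, "loss_n": 0.03, "gain_s": 0.04, "loss_s": 0.06, "smoke": 0.02,
    "mu": {"NWN": 0.02, "ON": 0.03, "NWS": 0.04, "OS": 0.05},
}
y0 = (250_000.0,) * 4 + (0.0,) * 8                # 1 000 000 people, split evenly
y1 = rk4(model1_rhs, y0, params)                  # one simulated year
living = dict(zip(STATES, y1[:4]))
person_time = dict(zip(STATES, y1[4:8]))
deaths = dict(zip(STATES, y1[8:12]))
```

Tracking person-time and deaths as extra state variables means the quantities needed for rate calculations come straight out of the integration, with no compartment for the dead.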
2.5. Parameterization of the compartmental model
We aimed to make minimal assumptions about parameter values, to derive generalizable insight into potential mechanisms underlying the obesity paradox. We conducted a sweep of parameters (transition and mortality rates) and initial states (denoted ‘parameter sets’) using LHS [18] to uniformly sample values from predefined ranges [18,27]. Specifically, we allowed all compartment transition rates to vary from 1% to 20% per year. For example, this results in between 1% and 20% of obese ever-smokers becoming normal weight over the course of the 1 year study. Although 20% is unrealistically high (especially in the general population), we intentionally set a large range of parameter values to ensure that we capture realistic ranges and to see if any extreme scenarios might lead to the obesity paradox. Furthermore, we placed no restrictions on the number of individuals starting in each state and only ensured that the total number of individuals across all disease states equaled the study population at the start of the simulation. See electronic supplementary material for more details on the calculation of (and alternative methods to derive) the initial conditions. The initial conditions were determined by random sampling from the space of possible proportions of the total population in each compartment (i.e. we randomly chose the initial compartment values conditioned on the total being the correct total population size). We imposed biologically realistic restrictions on the mortality rates such that ever-smokers have a higher mortality rate than their never-smoking counterparts (i.e. within weight strata), and obese individuals have a higher mortality rate than their normal weight counterparts (i.e. within smoking strata). In the age-structured models, older age group mortality rates for a given disease state were determined by multiplying the younger age group mortality rate of the same state by a multiplier between 1 and 2. 
For instance, the mortality rate for ever-smokers in the older age group is the baseline mortality rate plus the smoking mortality rate add-on value, with this sum then multiplied by the age-varying mortality multiplier. Finally, in the reverse causation models, we derived the mortality rate in the U compartment by multiplying the mortality rate of normal weight healthy ever-smokers with COPD by a cachexia multiplier between 1 and 2 (similar to the age multiplier in Model 2). Note that we incorporate both add-ons and multipliers in our mortality rates to illustrate different simple options for defining rates relative to each other while making minimal assumptions. While these simplifying assumptions ensure that risk factors such as obesity always increase mortality, effects on mortality can be modelled generically as arbitrary different values when there is a causal effect between two states. Overall, each model represents different underlying causal mechanisms, and running a model on a given parameter set represents a single simulated study. See electronic supplementary material for more details on sampling transition and mortality rates for each model and table 1 for all LHS ranges.
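The add-on and multiplier construction described above can be sketched as follows. This is one possible reading rather than the authors' code: the text only spells out the older ever-smoker case, so treating the smoking, obesity and COPD add-ons as additive for the other states is an assumption, and the function names, state labels and numeric values are hypothetical.

```python
def mortality_rates(baseline, smoke_add, obese_add, age_mult=1.0):
    """Per-state mortality rates (per year) for one age group.

    Non-negative add-ons guarantee that ever-smokers and obese individuals
    never have lower mortality than their counterparts; age_mult (1 to 2)
    scales the younger group's rates to obtain the older group's rates.
    """
    return {
        "NWN": age_mult * baseline,
        "ON": age_mult * (baseline + obese_add),
        "NWS": age_mult * (baseline + smoke_add),
        "OS": age_mult * (baseline + smoke_add + obese_add),   # assumed additive
    }

def cachexia_rate(baseline, smoke_add, copd_add, cachexia_mult):
    """Mortality rate in U: the rate of normal weight ever-smokers with COPD
    (assumed here to be baseline + smoking add-on + COPD add-on), scaled by
    the cachexia multiplier (1 to 2)."""
    return cachexia_mult * (baseline + smoke_add + copd_add)

# Hypothetical values drawn from within the table 1 ranges.
young = mortality_rates(baseline=0.02, smoke_add=0.03, obese_add=0.01)
old = mortality_rates(baseline=0.02, smoke_add=0.03, obese_add=0.01, age_mult=1.5)
u_rate = cachexia_rate(baseline=0.02, smoke_add=0.03, copd_add=0.02, cachexia_mult=1.8)
```

Building rates this way enforces the ordering restrictions by construction, so no sampled parameter set needs to be rejected.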
Table 1.
Parameters and Latin hypercube sampling ranges for all models.
| parameters | range | models |
|---|---|---|
| normal weight never-smoking to obese never-smoking | 0.01 to 0.2 | all models |
| obese never-smoking to normal weight never-smoking | 0.01 to 0.2 | all models |
| smoking initiation rate | 0.01 to 0.2 | all models |
| normal weight ever-smoking to obese ever-smoking | 0.01 to 0.2 | all models |
| obese ever-smoking to normal weight ever-smoking | 0.01 to 0.2 | all models |
| COPD incidence rate | 0.01 to 0.2 | Model 3 and combined model |
| normal weight ever-smoking with COPD to obese ever-smoking with COPD | 0.01 to 0.2 | Model 3 and combined model |
| obese ever-smoking with COPD to normal weight ever-smoking with COPD | 0.01 to 0.2 | Model 3 and combined model |
| cachexia initiation rate | 0.01 to 0.2 | Model 3 and combined model |
| baseline mortality rate | 0.01 to 0.1 | all models |
| add-on for smoking | 0 to 0.1 | all models |
| add-on for obesity | 0 to 0.1 | all models |
| age-varying mortality multiplier | 1 to 2 | Model 2 and combined model |
| add-on for COPD | 0 to 0.1 | Model 3 and combined model |
| cachexia (U) multiplier | 1 to 2 | Model 3 and combined model |
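The sampling step can be illustrated with a small, standard-library Latin hypercube sampler over the 'all models' rows of table 1. This is a hedged sketch: the dictionary keys are hypothetical names for the table rows, and drawing initial proportions from normalized exponential variates (uniform on the simplex) is an assumption about what 'random sampling from the space of possible proportions' could look like, not a description of the authors' procedure.

```python
import random

# Ranges copied from table 1 ("all models" rows); key names are illustrative.
RANGES = {
    "gain_never": (0.01, 0.2),    # normal weight -> obese, never-smokers
    "loss_never": (0.01, 0.2),    # obese -> normal weight, never-smokers
    "smoke_init": (0.01, 0.2),    # smoking initiation rate
    "gain_ever": (0.01, 0.2),     # normal weight -> obese, ever-smokers
    "loss_ever": (0.01, 0.2),     # obese -> normal weight, ever-smokers
    "baseline_mu": (0.01, 0.1),   # baseline mortality rate
    "smoke_add": (0.0, 0.1),      # mortality add-on for smoking
    "obese_add": (0.0, 0.1),      # mortality add-on for obesity
}

def latin_hypercube(ranges, n, rng):
    """Return n parameter sets; each parameter's range is split into n
    equal-width strata and every stratum is sampled exactly once."""
    columns = {}
    for name, (lo, hi) in ranges.items():
        unit = [(i + rng.random()) / n for i in range(n)]  # one draw per stratum
        rng.shuffle(unit)                                  # decouple the parameters
        columns[name] = [lo + u * (hi - lo) for u in unit]
    return [{name: columns[name][i] for name in ranges} for i in range(n)]

def random_initial_state(total, n_states, rng):
    """Random compartment sizes that sum to the study population."""
    weights = [rng.expovariate(1.0) for _ in range(n_states)]
    scale = total / sum(weights)
    return [w * scale for w in weights]

rng = random.Random(2020)
parameter_sets = latin_hypercube(RANGES, 5, rng)
initial_state = random_initial_state(1_000_000, 4, rng)
```

Compared with independent uniform draws, the stratification spreads each parameter evenly across its range even when the number of parameter sets is small.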
2.5.1. Data generation and statistical analysis
After running each model with 10 000 randomly sampled parameter sets [28], we calculated person-time and incident deaths per compartment for each study (i.e. for each model and sampled parameter set). See electronic supplementary material for more information on these calculations. Next, we calculated MRRs comparing normal weight to obese individuals within smoking strata to measure the effect of BMI on mortality. As mentioned, to recreate the obesity paradox (as per [9]), the MRRs from a simulated dataset (i.e. study) must simultaneously show normal weight never-smokers with lower mortality rates than their obese counterparts, and normal weight ever-smokers with higher mortality rate than their obese counterparts.
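The MRR calculation and the paradox criterion described above can be sketched in a few lines. The state labels and the death and person-time figures below are made up purely for illustration and are not results from the paper.

```python
def mrr(deaths, person_time, smoking):
    """Mortality rate ratio of normal weight to obese individuals within one
    smoking stratum ("N" = never-smokers, "S" = ever-smokers)."""
    normal_rate = deaths["NW" + smoking] / person_time["NW" + smoking]
    obese_rate = deaths["O" + smoking] / person_time["O" + smoking]
    return normal_rate / obese_rate

def shows_obesity_paradox(deaths, person_time):
    """True when normal weight never-smokers have lower mortality than obese
    never-smokers (MRR < 1) while normal weight ever-smokers have higher
    mortality than obese ever-smokers (MRR > 1)."""
    return mrr(deaths, person_time, "N") < 1 and mrr(deaths, person_time, "S") > 1

# Illustrative counts only (deaths, and person-years at risk, per state).
person_time = {"NWN": 10_000.0, "ON": 10_000.0, "NWS": 10_000.0, "OS": 10_000.0}
paradox_deaths = {"NWN": 200.0, "ON": 300.0, "NWS": 500.0, "OS": 450.0}
no_paradox_deaths = {"NWN": 200.0, "ON": 300.0, "NWS": 400.0, "OS": 450.0}

paradox = shows_obesity_paradox(paradox_deaths, person_time)
no_paradox = shows_obesity_paradox(no_paradox_deaths, person_time)
```

Applying `shows_obesity_paradox` to every simulated study gives the per-model counts of parameter sets that do and do not reproduce the pattern.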
In Model 1, we measured all compartments and calculated the MRRs directly from the simulated data. In Model 2 (age), we initially did not adjust for age as a confounder. Rather, MRRs were calculated by taking the sum of incident deaths divided by the sum of person-time for a given disease state across age groups. As a sensitivity analysis, we adjusted for age by externally standardizing the MRRs to the reference (obese) group [29]. In Model 3 (reverse causation), our study design did not initially adjust for COPD or related complications (i.e. cachexia). Individuals with COPD were therefore measured together with ever-smokers (e.g. in our study population, all normal weight individuals with COPD, including those with cachexia, were measured together with normal weight ever-smokers), and the MRRs were calculated in the same way as for Models 1 and 2. As a sensitivity analysis, we adjusted for reverse causation by excluding all individuals with COPD (including those with cachexia) at baseline (and then ran the study for 1 and 5 years). Finally, in the combined model, we ignored age, COPD and cachexia in our initial analysis, and then adjusted for age only, COPD only and finally, age and COPD.
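To illustrate the difference between the pooled (unadjusted) and age-standardized MRRs in Model 2, the sketch below applies the normal weight group's age-specific rates to the obese group's person-time. Reading 'externally standardizing to the reference (obese) group' this way is an assumption, and all labels and numbers are hypothetical; the example is constructed so that obese ever-smokers are mostly young and normal weight ever-smokers are mostly old.

```python
AGES = ("young", "old")

def crude_rate(deaths, person_time, state):
    """Pooled rate: total deaths over total person-time across age groups."""
    return (sum(deaths[state, a] for a in AGES)
            / sum(person_time[state, a] for a in AGES))

def crude_mrr(deaths, person_time, normal="NWS", obese="OS"):
    return crude_rate(deaths, person_time, normal) / crude_rate(deaths, person_time, obese)

def standardized_mrr(deaths, person_time, normal="NWS", obese="OS"):
    """Deaths expected if the normal weight age-specific rates applied to the
    obese group's person-time, divided by the deaths observed in the obese group."""
    expected = sum(deaths[normal, a] / person_time[normal, a] * person_time[obese, a]
                   for a in AGES)
    observed = sum(deaths[obese, a] for a in AGES)
    return expected / observed

# Within each age group, obese ever-smokers have the higher rate
# (young: 0.040 vs 0.045; old: 0.080 vs 0.090 per person-year).
person_time = {("NWS", "young"): 2000.0, ("NWS", "old"): 8000.0,
               ("OS", "young"): 8000.0, ("OS", "old"): 2000.0}
deaths = {("NWS", "young"): 80.0, ("NWS", "old"): 640.0,
          ("OS", "young"): 360.0, ("OS", "old"): 180.0}

unadjusted = crude_mrr(deaths, person_time)       # about 1.33: apparent paradox
adjusted = standardized_mrr(deaths, person_time)  # about 0.89: paradox removed
```

In this constructed example the pooled ratio exceeds 1 only because the two weight groups have different age distributions; once both are evaluated on the same age distribution the ratio falls below 1.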
All simulations and analyses were conducted in R v. 3.3.3 [30]. Compartmental models were run using the ode function from the ‘deSolve’ package which uses lsoda (an adaptive stiff/non-stiff integrator) [31].
3. Results
Overall, we found that not adjusting for study design bias in our CMs resulted in the obesity paradox. See table 2 and figure 3 for all results.
Table 2.
Results from all analyses.
| models | unadjusted analysis | adjusted analysis |
|---|---|---|
| Model 1: published DAG | no obesity paradox | n.a. |
| Model 2: adding in age-varying mortality rates | obesity paradox occurs—when there are more younger obese or more older normal weight individuals (selective survival bias) | adjusting for age stops the obesity paradox from occurring |
| Model 3: reverse causation | obesity paradox occurs—more than in Model 2 because the mechanism directly affects normal weight individuals (reverse causation bias) | excluding those with COPD at baseline stops the obesity paradox from occurring (for the 1 year, but not the 5 year study) |
| Model 4: combined | obesity paradox occurs—predominant mechanism is reverse causation | interactive effects between biases |
Figure 3.

Results from all model runs and parameter sets. Each plot represents the MRR (mortality rate ratio) comparing normal weight individuals to obese individuals for never-smokers against the corresponding MRR for ever-smokers for each of the 10 000 LH-sampled parameter sets, with each point representing a single simulated study. The obesity paradox occurs when obese never-smoking diabetics have higher rates of mortality than normal weight never-smoking diabetics and obese ever-smoking diabetics have lower rates of mortality than normal weight ever-smoking diabetics. By row: 1. Preston et al. [9]; 2. adding in age-varying mortality rates; 3. reverse causation; and 4. combined model.
In Model 1, we did not see the obesity paradox because the MRRs from the simulated data were simply the ratio of the CM mortality rate parameters (see figure 3a). For instance, the ever-smoker MRR is just the mortality rate of normal-weight diabetic ever-smokers (NWDS) divided by the mortality rate of obese diabetic ever-smokers (ODS) (see electronic supplementary material for more details). Due to the structure of Model 1 and the restrictions we placed on the parameter values, mortality rates for normal weight individuals were always lower than (or at the very least equal to) those of their obese counterparts; therefore, all ever-smoking MRRs were less than or equal to 1. Overall, Model 1 cannot simulate a protective effect of obesity on mortality among diabetic ever-smokers.
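The reason the Model 1 MRRs reduce to ratios of mortality parameters can be written out in one line. The notation here is introduced only for this sketch and is not taken from the paper: $N_i(t)$ is the number of individuals in compartment $i$ at time $t$, $\mu_i$ its constant mortality rate, and $T$ the study length.

```latex
\text{rate}_i
  = \frac{\text{deaths}_i}{\text{person-time}_i}
  = \frac{\int_0^T \mu_i \, N_i(t)\, dt}{\int_0^T N_i(t)\, dt}
  = \mu_i ,
\qquad
\text{MRR}_{\text{ever}}
  = \frac{\mu_{\text{NWDS}}}{\mu_{\text{ODS}}} \le 1
  \quad \text{whenever } \mu_{\text{NWDS}} \le \mu_{\text{ODS}} .
```

Because each measured group is a single compartment with a single mortality rate, the compartment sizes cancel and neither the transition rates nor the initial conditions can move the ratio.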
Next, in Model 2, the obesity paradox did occur in a subset of studies (see figure 3b). Overall, among model runs that resulted in the obesity paradox, there were generally more younger individuals in the obese ever-smoking compartment and/or more older individuals in the normal weight ever-smoking compartment. This caused the mortality effects of age to counterbalance those of obesity, resulting in the obesity paradox. In other words, for the obesity paradox to occur, age-varying mortality must be sufficiently high and work together with the relative age distribution of individuals across disease states. This is analogous to selective survival bias in which obese ever-smoking individuals are more likely to die before they reach older ages, thus there would tend to be more older normal weight ever-smokers than older obese ever-smokers. An illustration of this is the trade-off between the proportion of old versus young individuals who are in the ODS compartment and the relative mortality rate of NWDS versus ODS (shown in figure 4). In the majority of parameter sets that resulted in the obesity paradox, the proportion of individuals in the older age group among all ODS is less than 50%. Additionally, the effect of obesity on mortality is relatively low (i.e. the NWDS mortality rate is consistently similar to the ODS mortality rate in the parameter sets that resulted in the obesity paradox). Finally, as the proportion of individuals in the older age group increases, the effect of obesity on mortality decreases even more in model runs that resulted in the obesity paradox. This is analogous to obesity becoming less risky as individuals age [32]. In the age-standardized sensitivity analysis, no runs resulted in the obesity paradox (results not shown; similar to the baseline model).
Figure 4.
Age-weighting and relative mortality among Model 2 runs. The proportion of obese diabetic ever-smokers (ODS) who are old at the beginning of the simulation is displayed on the x-axis, e.g. if equal to 0.5, half of the individuals in ODS are in the older age group and half are in the younger age group. The relative mortality of normal weight diabetic ever-smokers (NWDS) to ODS is displayed on the y-axis, e.g. if equal to 0.5, the NWDS mortality rate would be half of the ODS mortality rate. Parameter sets that resulted in the obesity paradox are in red and sets that did not result in the obesity paradox are in blue. When NWDS mortality is close to ODS mortality, having ODS be primarily younger can counterbalance the higher mortality due to obesity.
In Model 3 (compared with Model 2), more runs resulted in the obesity paradox (figure 3c). This is because the reverse causation mechanism differentially affects normal weight ever-smoking individuals (compared with obese ever-smoking and normal weight never-smoking individuals). Therefore, the obesity paradox depends on (1) the relative obese and normal weight mortality rates (for both healthy and unhealthy individuals) and (2) the distribution of individuals in healthy and unhealthy compartments. In Model 2, on the other hand, age-related mortality affects normal weight and obese individuals, as well as ever-smoking and never-smoking individuals, in the same manner, and the paradox thus relies on the population distribution across more compartments, i.e. both age groups in each of the ever-smoking obese, ever-smoking normal weight, never-smoking obese and never-smoking normal weight compartments. Because healthy and unhealthy normal weight ever-smokers are measured together in our observational study, the unhealthy mortality rate increases the combined (healthy and unhealthy) normal weight ever-smoking mortality rate such that the overall normal weight ever-smoking mortality rate is higher than the obese ever-smoking mortality rate and the obesity paradox occurs. When examining results among ever-smokers only, the weighting of the overall normal weight mortality rate is revealed in the relative proportion of individuals starting in different disease states (figure 5). For instance, for runs in which the obesity paradox occurs, the mortality rate of unhealthy individuals relative to that of obese ever-smokers increases as fewer normal weight individuals start in the unhealthy compartment.
Figure 5.
Age-weighting and relative mortality among Model 3 runs. The proportion of normal weight individuals who are unhealthy at the beginning of the simulation is displayed on the x-axis. The relative mortality of unhealthy individuals (U) to obese diabetic ever-smokers (ODS) is displayed on the y-axis, i.e. if equal to 2, the U mortality rate would be twice the ODS mortality rate. Parameter sets that resulted in the obesity paradox are in red and sets that did not result in the obesity paradox are in blue. As U mortality becomes substantially higher than ODS mortality, having fewer individuals starting in U is sufficient to counterbalance the higher mortality due to obesity.
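The trade-off shown in figure 5 can be approximated with a simple mixture calculation. The sketch below rests on assumptions that are not part of the study: it treats the observed normal weight ever-smoker mortality rate as a baseline-weighted average of the healthy and unhealthy (U) rates, ignores transitions during follow-up, and uses hypothetical rates and function names. Under these assumptions, it computes the smallest U-to-ODS relative mortality at which the combined normal weight rate exceeds the obese rate.

```python
def combined_nw_rate(p_unhealthy, m_nw, m_u):
    """Observed normal weight mortality when healthy and unhealthy are measured together."""
    return p_unhealthy * m_u + (1.0 - p_unhealthy) * m_nw


def min_relative_u_mortality(p_unhealthy, m_nw, m_o):
    """Smallest m_u / m_o at which the combined normal weight rate reaches m_o.

    Obtained by solving p * m_u + (1 - p) * m_nw = m_o for m_u / m_o.
    """
    return (1.0 - (1.0 - p_unhealthy) * m_nw / m_o) / p_unhealthy


# Hypothetical annual mortality rates: healthy normal weight and obese ever-smokers.
m_nw, m_o = 0.010, 0.015

# Threshold relative mortality for several baseline proportions starting in U.
thresholds = {p: min_relative_u_mortality(p, m_nw, m_o) for p in (0.05, 0.1, 0.2, 0.4)}

for p, ratio in thresholds.items():
    print(f"proportion in U = {p:.2f}: paradox requires U/ODS mortality above {ratio:.2f}")
```

The threshold falls as the proportion starting in U rises, so a small unhealthy subgroup can produce the paradox only if its mortality is much higher than that of obese ever-smokers.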
The results from our sensitivity analyses reveal that excluding individuals with COPD at baseline reduced the number of model runs that result in the obesity paradox to 1 (compared with 3114 in the unadjusted version). If we then run this adjusted study for 5 years to evaluate whether the obesity paradox might emerge over time, only 82 model runs result in the obesity paradox (results not shown). This highlights the importance of inclusion and exclusion criteria in an initial study population in recreating the obesity paradox.
Finally, in the combined model, we found that reverse causation leads to the obesity paradox substantially more often than age-weighting (selective survival). This is evidenced by the fact that when we adjust for age only, the obesity paradox still occurs in 97.1% of the runs in which it originally occurred (3017/3107), while if we adjust for reverse causation only, the obesity paradox occurs in 12.1% (376/3107) of the runs in which it originally occurred. The results from our sensitivity analyses reveal that when we control for both age and COPD, the obesity paradox is avoided almost completely. Interestingly, when we only standardize age or only exclude individuals with COPD, certain parameter sets that did not previously result in the obesity paradox now demonstrate it. This indicates a ‘two wrongs make a right’ interactive effect between these two biases: for instance, if the majority of normal weight individuals are younger, this might counteract the effects of a high proportion of individuals starting in U in the unadjusted model, but if we adjust for age only, the high proportion of individuals starting in U may result in the obesity paradox.
4. Conclusion
We have developed a workflow that can be used to explicitly examine the underlying conditional independencies of DAGs. This method provides a systematic way to quantitatively simulate and evaluate bias and provide insight into the causal relationships between variables in a study. Our workflow can be applied to nearly any study question under standard assumptions (e.g. no faithfulness violations on DAGs [33]). Previous work by Ackley et al. [1] provided a basic mapping from DAGs to CMs with simple examples (e.g. confounding, collider bias). We extended this work by noting opportunities to reduce the state-space (e.g. by removing diabetic compartments) and creating a workflow to use this mapping to simulate epidemiological studies and assess potential biases affecting study results. Finally, we applied this workflow by using and adapting a published DAG [9]. Importantly, although we made explicit parametric assumptions (stronger than those used on a DAG), we allowed for a large range of transition rates and initial conditions such that we made only minimal quantitative assumptions (e.g. we assumed that the maximum proportion of individuals transitioning from one disease state to another is 20% over the course of a year). In future work, parameter values could be informed using study- and setting-specific data if available.
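To make the parameter sweep concrete, the sketch below draws transition rates by Latin hypercube sampling using only the Python standard library. It is an illustration under stated assumptions rather than the code used in this study: the parameter names are hypothetical, and converting the 20% annual cap into an exponential rate via r = -ln(1 - 0.2) is one possible reading of that bound (a direct bound of 0.2 per year on the rate would be another).

```python
import math
import random


def latin_hypercube(n_samples, bounds, rng):
    """Draw n_samples parameter sets by Latin hypercube sampling.

    Each parameter range is split into n_samples equal strata, and every
    stratum is sampled exactly once, independently for each parameter.
    """
    columns = {}
    for name, (low, high) in bounds.items():
        # One uniform draw inside each stratum, then shuffle strata across samples.
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(points)
        columns[name] = [low + u * (high - low) for u in points]
    return [{name: columns[name][i] for name in bounds} for i in range(n_samples)]


# Assumption: with exponential transitions, a proportion q leaving a state
# within one year corresponds to a rate r = -ln(1 - q).
max_rate = -math.log(1.0 - 0.20)

# Hypothetical parameter names; the study's own parameters are given in its supplement.
bounds = {"nw_to_obese": (0.0, max_rate), "obese_to_nw": (0.0, max_rate)}

samples = latin_hypercube(10, bounds, random.Random(1))
for s in samples[:3]:
    print({k: round(v, 4) for k, v in s.items()})
```

Each sampled parameter set would then be passed to the CM, the simulated study analysed, and the run classified by whether the obesity paradox appears.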
Modelling DAGs and conducting simulated studies can provide insight into how to design sound observational studies and analysis plans. For instance, if results from a simulated study do not match expected results, this may imply the existence of unmeasured and/or unadjusted covariates, or interacting biases as we found in our obesity paradox simulation study. Although in this case traditional analyses using DAGs would likely have found the same main sources of bias (i.e. unadjusted covariates), our method also identified additional biases that would not have been easily identified using traditional methods. Examples include the interaction between age and reverse causation, and the finding that, after excluding individuals with COPD at baseline, running the study for 5 years allowed biases that were initially adjusted for to re-emerge, depending on the length of follow-up. Overall, we simulated epidemiological study data in a structured manner based on the conditional independencies of DAGs to test different hypotheses. Additionally, this framework allows us to explore and simulate these biases interactively and over time, observing (for example) how different measurement times, sampling designs, etc., might affect potential biases or impact which variables are most critical to measure.
We successfully recreated the obesity paradox by deriving a compartmental model from a published DAG [9] and then incorporating two different unadjusted biases. In Model 1, we found that direct conversion of the published DAG was not able to recreate the obesity paradox. In Model 2, we incorporated age-varying mortality and found that the relative proportion of individuals in different age groups across disease states can create a selective survival bias causing the obesity paradox. In Model 3, we found that reverse causation caused by an unmeasured disease state can more effectively cause the obesity paradox compared with the age-varying mortality model. The reverse causation mechanism was more effective because it differentially affected normal weight ever-smoking individuals (compared with obese ever-smoking and normal weight never-smoking individuals). Finally, in the combined model, we observed how different biases can interact to cause or prevent the obesity paradox from occurring. Overall, adjusting for biases in these models (sensitivity analyses) made the obesity paradox nearly non-existent, indicating that incorporating bias and not adjusting for it correctly is required to recreate the obesity paradox (assuming the protective effect of obesity is not truly present, and that we have sufficient sample size). Ultimately, even with very general parameter assumptions for our model, we were able to derive insight into what causal mechanisms may drive the obesity paradox. Both selection bias [34] and reverse causation [9,25] have been suggested in previous studies as potential causes of the obesity paradox. Our results suggest that, over the parameter ranges we explored (and for this specific study design and simulated population), reverse causation may be more effective than age-related mortality at generating the obesity paradox; however, more realistic, specified modelling studies are needed to further explore these issues.
Moreover, other study biases likely co-exist and depend on the specific dataset and study design. However, in general, in situations where only a limited number of variables can be measured (e.g. due to logistical constraints), our workflow could be used to identify which biases are more important to account for (i.e. reverse causation in our analyses) and therefore which variables to measure.
Overall, our analyses were primarily meant to illustrate the utility of our workflow and further study would be needed (e.g. with more potential causes or biases, and estimation or sampling of parameters from data) to thoroughly investigate the causes of the obesity paradox. Therefore, we drew our inspiration from the Preston et al. study [9], but did not aim to recreate their results directly. Depending on the goals of the study, our workflow can be used to quantitatively recreate the results of existing studies which can be used to more precisely derive new insight into which study design biases are predominant, how biases might interact, or what combinations of factors lead to a specific scientific conclusion.
Although DAGs alone can (and are often used to) indicate the presence or absence of bias, they do not include the magnitude or direction of effects (although one may be able to infer direction based on the type of bias). A number of methods and software packages have recently been developed to allow the incorporation of different parametric assumptions and direct simulation of DAGs [35–37]. While these methods will in many cases likely lead to the same results as our workflow (and in some cases may be equivalent to a type of CM), there are a few key advantages of translating DAGs into CMs. First, CMs can include transitions between disease states that are not encoded on the DAG, by setting transition rates equal to each other for different compartments (see figure 1 for an example). Additionally, DAGs are restricted by the limitations of statistical analyses (e.g. the assumption of no interference between units [38]), while CMs can be expanded to examine processes that may violate this assumption, such as infectious disease processes, as was illustrated in [1]. There are also limited options to encode different functional form relationships between variables in DAGs. For instance, DAGs do not have a formal way to represent effect modification [39]. Because CMs are more explicit in encoding the natural history of individuals in a dynamic way, they represent causality differently from DAGs. For a given mechanism under consideration, CMs (1) define the steps sufficient to progress to a given outcome, e.g. cachexia, and (2) track the dynamics of the number of individuals in disease states over time. Therefore, CMs represent a population akin to what would be seen in a study or in the real world, which can be advantageous but also requires more detailed information or assumptions to construct. Thus, using CMs can help to develop understanding and intuition for the process under consideration.
This allows the CM to act as a sort of ‘population laboratory’ in which different causal processes and mechanisms can be explored and simulated. Overall, our workflow can be used to simulate epidemiological studies while allowing for the simultaneous incorporation of different types of bias and different underlying causal mechanisms. Although this may also be possible through the direct simulation of DAGs, CMs are highly flexible and intuitively causal. A useful direction for future work will be to compare these different DAG simulation approaches and evaluate the different aspects of a given study/dataset each can be used to explore.
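As a minimal illustration of such a ‘population laboratory’, the sketch below integrates a small linear CM with a fourth-order Runge-Kutta scheme. It is not one of the models from this study: the three-state structure (healthy, unhealthy, dead), the rates and the parameter names are hypothetical. Cumulative deaths and person-time are carried as extra state variables, so that an observed mortality rate can be computed as it would be in a cohort study.

```python
def derivatives(state, params):
    """Right-hand side of a hypothetical linear CM: healthy -> unhealthy -> dead."""
    h, u, deaths, person_time = state
    return [
        -(params["h_to_u"] + params["m_h"]) * h,   # healthy individuals
        params["h_to_u"] * h - params["m_u"] * u,  # unhealthy individuals
        params["m_h"] * h + params["m_u"] * u,     # cumulative deaths
        h + u,                                     # accumulated person-time at risk
    ]


def rk4_step(state, params, dt):
    """Advance the state by one fourth-order Runge-Kutta step of size dt."""
    def shifted(k, scale):
        return [s + scale * ki for s, ki in zip(state, k)]

    k1 = derivatives(state, params)
    k2 = derivatives(shifted(k1, dt / 2), params)
    k3 = derivatives(shifted(k2, dt / 2), params)
    k4 = derivatives(shifted(k3, dt), params)
    return [s + dt / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def simulate(initial, params, years=1.0, steps=1000):
    """Integrate the CM over the follow-up period and return the final state."""
    state = list(initial)
    dt = years / steps
    for _ in range(steps):
        state = rk4_step(state, params, dt)
    return state


# Hypothetical rates (per year) and initial cohort; not taken from the study.
params = {"h_to_u": 0.05, "m_h": 0.01, "m_u": 0.10}
h, u, deaths, person_time = simulate([900.0, 100.0, 0.0, 0.0], params)

# Mortality rate as a study would observe it when healthy and unhealthy are pooled.
observed_rate = deaths / person_time
print(f"deaths = {deaths:.2f}, person-years = {person_time:.2f}, rate = {observed_rate:.4f}")
```

Repeating such a simulation for each exposure group and each sampled parameter set yields the simulated study data on which rate ratios can then be computed.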
Notably, the sufficient-component causes framework has been incorporated into DAGs and can provide detail about underlying mechanisms by illustrating how causes interact for a given outcome to occur [40]. This is akin to detailing the individual steps sufficient for a given outcome in CMs, though this method does not require explicitly separating all steps in the sequence of events for a given outcome. One advantage of this method is that it makes relationships between causes more explicit e.g. conditionally independent or synergistic. In a CM, these relationships may only appear implicitly in the transition rates. Additionally, this method requires the inclusion of all possible sufficient causes, while a CM only includes the mechanisms under consideration.
Limitations of this study include the fact that the DAGs we use are overly simplified (despite our use of a published DAG) and do not represent the complete state of knowledge about the relationships between variables relevant to the study question. We decided to use relatively simple DAGs to more effectively illustrate our workflow. It is simple to make more realistic DAGs by adding demographic characteristics (e.g. race, socio-economic status, access to medical treatment), and including these would simply require vectorizing our equations further (as we did for the extension from Model 1 to Model 2). However, since we are not fitting these models to study data that include these variables, we would have added more parameters to our models without truly adding any information. Because each new DAG variable doubles the number of equations in the CM, this would add complexity without insight. We aimed to strike a balance between realism and parsimony in our models to isolate and examine the qualitative effects of individual causal mechanisms of interest. For instance, the effects of race may counteract the effects of age, leading to overly complicated results (i.e. identifiability issues may obscure the larger point). A potential future direction is to construct larger DAGs from the literature and make simplifying assumptions to reduce the corresponding CM’s dimensionality (such as including only one variable among a collinear set). For instance, if both BMI at baseline and BMI history are included on a given DAG, one could assume that history is a proxy for baseline BMI among, e.g. adults [41], and collapse these two variables into a single BMI variable. The robustness of results to this simplifying assumption can also be explored using our workflow. Relatedly, our workflow could also be used to identify which variable(s) on a DAG are sufficient or necessary to replicate a particular pattern in the data (e.g. by systematically removing variables and simulating the results). Finally, individually based models may be used for study questions requiring more detailed demography. Furthermore, individually based models can be used to explicitly track individuals and their transitions between disease states.
Another weakness is that our crude estimate of person-time (see electronic supplementary material) will not work if the dynamics of the model are very fast. It is possible to calculate person-time precisely by tracking the flows in and out of compartments separately. Lastly, the exponential transitions between disease states used in our CMs were chosen for the sake of simplicity and may not be realistic. To address this simplifying assumption, we also tested incorporating a gamma distributed transition rate from normal weight individuals with COPD to cachexia in the reverse causation model (see electronic supplementary material for model schematic). Ultimately, we found that incorporation of this different functional form reduced the number of parameter sets that resulted in the obesity paradox in the 5-year adjusted analysis (i.e. after excluding individuals with COPD at baseline and running the study for 5 years); however, the overall qualitative conclusions did not change. More generally, changing the formulation of our parameters (e.g. by making the parameters time varying or, alternatively, gamma distributed) is an important topic for exploration, although it would be less likely to affect the overall qualitative conclusions of our particular simulation study, because the factors leading to the obesity paradox (i.e. the restrictions we placed on the relative values of mortality rates and how we measured the variables in our study) would still be the same. Broader distribution types beyond gamma distributions are also possible to incorporate into ODE models (e.g. Erlang mixture distributions and more general families of dwell time distributions [42]), or a more generalized stochastic framework can be used.
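One standard way to obtain a gamma distributed dwell time with integer shape (an Erlang distribution) in an ODE model is the linear chain trick discussed in [42], in which a single compartment is replaced by k sub-compartments in series. The sketch below is an assumption-laden illustration and not the formulation used in the electronic supplementary material: the shape k = 4, the mean dwell time and the function names are hypothetical, and forward Euler integration is used only for brevity.

```python
import math


def chain_derivatives(x, rate):
    """dx_i/dt for k sub-compartments in series, each left at the same rate."""
    inflow = [0.0] + [rate * xi for xi in x[:-1]]
    return [inf - rate * xi for inf, xi in zip(inflow, x)]


def fraction_remaining(k, mean_dwell, t, steps=20000):
    """Fraction of a cohort still in the k-stage state at time t (forward Euler)."""
    rate = k / mean_dwell        # keeps the mean dwell time fixed for any k
    x = [1.0] + [0.0] * (k - 1)  # everyone starts in the first sub-compartment
    dt = t / steps
    for _ in range(steps):
        x = [xi + dt * dxi for xi, dxi in zip(x, chain_derivatives(x, rate))]
    return sum(x)


def erlang_survival(k, mean_dwell, t):
    """Closed-form check: P(dwell time > t) for an Erlang with shape k and rate k/mean_dwell."""
    lam_t = k * t / mean_dwell
    return math.exp(-lam_t) * sum(lam_t ** j / math.factorial(j) for j in range(k))


# Hypothetical comparison: exponential (k = 1) versus Erlang (k = 4), same mean of 2 years.
exp_remaining = fraction_remaining(1, mean_dwell=2.0, t=1.0)
erlang_remaining = fraction_remaining(4, mean_dwell=2.0, t=1.0)
print(f"still in state after 1 year: exponential {exp_remaining:.3f}, Erlang {erlang_remaining:.3f}")
```

For the same mean dwell time, the Erlang chain produces fewer early transitions than the exponential model, which is one way a change of functional form could alter how many individuals reach a downstream state within a short follow-up.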
Strengths of this study include the methodological contributions to using CMs in conjunction with DAGs to understand patterns seen in the data. We extended the mapping that Ackley et al. developed [1] and proposed a method for comparing simulated data with epidemiological study data. This method can be expanded for different types of epidemiological analyses and can also be used for different purposes e.g. relaxing statistical assumptions, multifaceted sensitivity analyses or exploring counterfactual scenarios. We were able to show that the initial, simple DAG presented in the Preston et al. paper did not on its own reproduce the obesity paradox and then proposed alternative mechanisms and DAGs that could recreate the obesity paradox. Furthermore, we gained insight into what hypothetical causal mechanisms could result in the obesity paradox with limited data informing our model. Additionally, conducting the random sweep (i.e. LHS) of the parameters and initial conditions allowed us to account for uncertainty and draw general qualitative conclusions about the structure of the model and its effects on our statistical results. Ultimately, our workflow can help explicate causal mechanisms to explore whether or not DAGs are valid representations of hypotheses in question even when data is limited. Additionally, CMs derived from DAGs can be used as a testing ground for competing causal mechanisms to determine which ones can most closely explain patterns seen in observational study data. This represents a departure from the standard paradigm of fitting CMs to epidemiological data, where instead, here we operationalize causal relationships depicted on the DAG to simulate epidemiological study data.
Additional future research can include other statistical analyses on simulated data. For instance, a Poisson regression model (for count data) can calculate MRRs and can be useful if conditioning on multiple variables (see electronic supplementary material). Alternatively, simulated data can be individuated and other types of regression models can be run. Model parameters can be tuned to quantitatively recreate specific datasets which might be useful for gaining insight into specific study results or a specific target population. Additionally, model parameters can be informed directly from data. For instance, see electronic supplementary material for notes on how to parameterize the mortality rates from data. Similarly, the data collection process itself can be simulated in the compartmental model, allowing one to assess how issues such as measurement error or insufficient power might affect the relationships reflected in the DAG. Finally, the DAG-derived CMs could also be linked with real-world data to accomplish the parameter estimation/inference step itself (i.e. without modelling additional statistical analyses).
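For a single binary exposure, the mortality rate ratio (MRR) from a Poisson regression with a log person-time offset coincides with the crude ratio of the two rates, so the basic calculation can be sketched without a regression library. The example below is illustrative only: the death counts and person-years are hypothetical, the function name is an assumption, and the confidence interval uses the usual large-sample approximation on the log scale.

```python
import math


def mortality_rate_ratio(deaths_exposed, pt_exposed, deaths_ref, pt_ref, z=1.96):
    """Crude MRR (exposed versus reference) with a Wald confidence interval.

    The standard error of log(MRR) is approximated by sqrt(1/d1 + 1/d0),
    where d1 and d0 are the death counts in the two groups.
    """
    rate_exposed = deaths_exposed / pt_exposed
    rate_ref = deaths_ref / pt_ref
    mrr = rate_exposed / rate_ref
    se_log = math.sqrt(1.0 / deaths_exposed + 1.0 / deaths_ref)
    lower = math.exp(math.log(mrr) - z * se_log)
    upper = math.exp(math.log(mrr) + z * se_log)
    return mrr, lower, upper


# Hypothetical simulated counts among ever-smokers: obese versus normal weight.
mrr, lower, upper = mortality_rate_ratio(
    deaths_exposed=45, pt_exposed=2950.0, deaths_ref=70, pt_ref=2900.0
)
print(f"MRR = {mrr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

When conditioning on several variables at once, a full Poisson regression model would be needed in place of this two-group calculation.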
Overall, we presented here a new utility for CMs derived from DAGs: testing hypotheses to understand patterns seen in study data. We also proposed a method to compare simulated data with epidemiological study data that can be used to test competing hypotheses. We used our method to determine that a DAG from the literature was not complete and could not recreate the obesity paradox by itself. We, therefore, simulated two alternative causal mechanisms and derived corresponding DAGs that could recreate the qualitative results of the study. Ultimately, simulating study data by operationalizing the causal relationships on DAGs can provide insight into how to design sound observational studies and analysis plans.
Acknowledgements
We thank Dr Jon Zelner, Department of Epidemiology, University of Michigan, for his helpful comments and advice on the analysis; Dr Nancy Fleischer, Department of Epidemiology, University of Michigan, for her helpful advice about DAGs; and Consulting for Statistics, Computing and Analytics Research (CSCAR), University of Michigan, for their useful advice on the analysis. We would also like to thank Drs Joseph Eisenberg, Rafael Meza and Veronica Berrocal for their thoughtful comments on the dissertation version of this manuscript.
Data accessibility
Example code demonstrating the workflow is available on GitHub: https://github.com/epimath/cm-dag.
Authors' contributions
J.H. and M.C.E. came up with the research plan. J.H. analysed the data and wrote the paper. M.C.E. provided guidance throughout the entire process and provided key edits and feedback on the manuscript.
Competing interests
We declare we have no competing interests.
Funding
This work was supported by the National Institute of General Medical Sciences (grant no. U01GM110712).
References
1. Ackley SF, Mayeda ER, Worden L, Enanoria WT, Glymour MM, Porco TC. 2017. Compartmental model diagrams as causal representations in relation to DAGs. Epidemiol. Methods 6, 20160007. (doi:10.1515/em-2016-0007)
2. Joffe M, Gambhir M, Chadeau-Hyam M, Vineis P. 2012. Causal diagrams in systems epidemiology. Emerg. Themes Epidemiol. 9, 1. (doi:10.1186/1742-7622-9-1)
3. Moolgavkar SH, Knudson AG. 1981. Mutation and cancer: a model for human carcinogenesis. JNCI: J. Natl Cancer Inst. 66, 1037–1052. (doi:10.1093/jnci/66.6.1037)
4. Kermack WO, McKendrick AG. 1927. A contribution to the mathematical theory of epidemics. Proc. R. Soc. Lond. A 115, 700–721. (doi:10.1098/rspa.1927.0118)
5. Levy G, Gibaldi M, Jusko WJ. 1969. Multicompartment pharmacokinetic models and pharmacologic effects. J. Pharm. Sci. 58, 422–424. (doi:10.1002/jps.2600580406)
6. DiStefano JJ III, Fisher DA. 1976. Peripheral distribution and metabolism of the thyroid hormones: a primarily quantitative assessment. Pharmacol. Ther. Part B: Gen. Syst. Pharmacol. 2, 539–570.
7. DiStefano J III. 2015. Dynamic systems biology modeling and simulation. New York, NY: Academic Press.
8. Carnethon MR, et al. 2012. Association of weight status with mortality in adults with incident diabetes. JAMA 308, 581–590.
9. Preston SH, Stokes A. 2014. Obesity paradox: conditioning on disease enhances biases in estimating the mortality risks of obesity. Epidemiology 25, 454. (doi:10.1097/EDE.0000000000000075)
10. Oga EA, Eseyin OR. 2016. The obesity paradox and heart failure: a systematic review of a decade of evidence. J. Obes. 2016, 1–9. (doi:10.1155/2016/9040248)
11. Oesch L, Tatlisumak T, Arnold M, Sarikaya H. 2017. Obesity paradox in stroke—myth or reality? A systematic review. PLoS ONE 12, e0171334. (doi:10.1371/journal.pone.0171334)
12. Banack HR, Kaufman JS. 2015. From bad to worse: collider stratification amplifies confounding bias in the ‘obesity paradox’. Eur. J. Epidemiol. 30, 1111–1114. (doi:10.1007/s10654-015-0069-7)
13. De Schutter A, Lavie CJ, Kachur S, Patel DA, Milani RV. 2014. Body composition and mortality in a large cohort with preserved ejection fraction: untangling the obesity paradox. Mayo Clin. Proc. 89, 1072–1079. (doi:10.1016/j.mayocp.2014.04.025)
14. Schenkeveld L, Magro M, Oemrawsingh RM, Lenzen M, de Jaegere P, van Geuns RJ, Serruys PW, van Domburg RT. 2012. The influence of optimal medical treatment on the obesity paradox, body mass index and long-term mortality in patients treated with percutaneous coronary intervention: a prospective cohort study. BMJ Open 2, e000535. (doi:10.1136/bmjopen-2011-000535)
15. Biasucci LM, Graziani F, Rizzello V, Liuzzo G, Guidone C, De Caterina AR, Brugaletta S, Mingrone G, Crea F. 2010. Paradoxical preservation of vascular function in severe obesity. Am. J. Med. 123, 727–734. (doi:10.1016/j.amjmed.2010.02.016)
16. Hernán MA, Hernández-Díaz S, Robins JM. 2004. A structural approach to selection bias. Epidemiology 15, 615–625. (doi:10.1097/01.ede.0000135174.63482.43)
17. Richiardi L, Bellocco R, Zugna D. 2013. Mediation analysis in epidemiology: methods, interpretation and bias. Int. J. Epidemiol. 42, 1511–1519. (doi:10.1093/ije/dyt127)
18. Stein M. 1987. Large sample properties of simulations using Latin hypercube sampling. Technometrics 29, 143–151. (doi:10.1080/00401706.1987.10488205)
19. Howden LM, Meyer JA. 2011. Age and sex composition. U.S. Census Bureau, 2010 Census Briefs, C2010BR-03, May. See https://www.census.gov/prod/cen2010/briefs/c2010br-03.pdf.
20. Willi C, Bodenmann P, Ghali WA, Faris PD, Cornuz J. 2007. Active smoking and the risk of type 2 diabetes: a systematic review and meta-analysis. JAMA 298, 2654–2664. (doi:10.1001/jama.298.22.2654)
21. Itoh M, Tsuji T, Nemoto K, Nakamura H, Aoshiba K. 2013. Undernutrition in patients with COPD and its treatment. Nutrients 5, 1316–1335. (doi:10.3390/nu5041316)
22. von Haehling S, Anker SD. 2010. Cachexia as a major underestimated and unmet medical need: facts and numbers. J. Cachexia, Sarcopenia Muscle 1, 1–5. (doi:10.1007/s13539-010-0002-6)
23. Mannino DM, Thorn D, Swensen A, Holguin F. 2008. Prevalence and outcomes of diabetes, hypertension and cardiovascular disease in COPD. Eur. Respiratory J. 32, 962–969. (doi:10.1183/09031936.00012408)
24. Flegal KM, Graubard BI, Williamson DF, Cooper RS. 2010. Reverse causation and illness-related weight loss in observational studies of body weight and mortality. Am. J. Epidemiol. 173, 1–9. (doi:10.1093/aje/kwq341)
25. Tobias DK, Pan A, Jackson CL, O’Reilly EJ, Ding EL, Willett WC, Manson JE, Hu FB. 2014. Body-mass index and mortality among adults with incident type 2 diabetes. N. Engl. J. Med. 370, 233–244. (doi:10.1056/NEJMoa1304501)
26. Robins JM. 2008. Causal models for estimating the effects of weight gain on mortality. Int. J. Obes. 32, S15–S41. (doi:10.1038/ijo.2008.83)
27. Marino S, Hogue IB, Ray CJ, Kirschner DE. 2008. A methodology for performing global uncertainty and sensitivity analysis in systems biology. J. Theor. Biol. 254, 178–196. (doi:10.1016/j.jtbi.2008.04.011)
28. Loeppky JL, Sacks J, Welch WJ. 2009. Choosing the sample size of a computer experiment: a practical guide. Technometrics 51, 366–376. (doi:10.1198/TECH.2009.08040)
29. Naing NN. 2000. Easy way to learn standardization: direct and indirect methods. Malaysian J. Med. Sci. 7, 10.
30. R Core Team. 2017. R: a language and environment for statistical computing. R package version 3.3.3. Vienna, Austria: R Foundation for Statistical Computing. See https://www.R-project.org/.
31. Soetaert K, Petzoldt T, Setzer RW. 2010. Solving differential equations in R: package deSolve. J. Stat. Softw. 33, 1–25. (doi:10.18637/jss.v033.i09)
32. Hainer V, Aldhoon-Hainerová I. 2013. Obesity paradox does exist. Diabetes Care 36(Suppl. 2), S276–S281. (doi:10.2337/dcS13-2023)
33. Koski T, Noble J. 2011. Bayesian networks: an introduction. Chichester, UK: John Wiley & Sons.
34. Standl E, Erbach M, Schnell O. 2013. Defending the con side: obesity paradox does not exist. Diabetes Care 36(Suppl. 2), S282–S286. (doi:10.2337/dcS13-2040)
35. Breitling LP. 2010. dagR: a suite of R functions for directed acyclic graphs. Epidemiology 21, 586–587. (doi:10.1097/EDE.0b013e3181e09112)
36. Sofrygin O, van der Laan MJ, Neugebauer R. 2015. simcausal: simulating longitudinal data with causal inference applications. R package version 04. See http://CRAN.R-project.org/package=simcausal.
37. Pornprasertmanit S, Miller P, Schoemann A. 2015. simsem: SIMulated structural equation modeling. R package version 05. See http://CRAN.R-project.org/package=simsem.
38. Halloran ME, Struchiner CJ. 1995. Causal inference in infectious diseases. Epidemiology 6, 142–151. (doi:10.1097/00001648-199503000-00010)
39. Glymour MM. 2006. Using causal diagrams to understand common problems in social epidemiology. In Methods in social epidemiology (eds Oakes M, Kaufman J), pp. 393–428. San Francisco, CA: Jossey-Bass.
40. VanderWeele TJ, Robins JM. 2007. Directed acyclic graphs, sufficient causes, and the properties of conditioning on a common effect. Am. J. Epidemiol. 166, 1096–1104. (doi:10.1093/aje/kwm179)
41. Friedenberg FK, Tang DM, Mendonca T, Vanar V. 2011. Predictive value of body mass index at age 18 on adulthood obesity: results of a prospective survey of an urban population. Am. J. Med. Sci. 342, 371–382. (doi:10.1097/MAJ.0b013e318212127c)
42. Hurtado PJ, Kirosingh AS. 2019. Generalizations of the ‘Linear Chain Trick’: incorporating more flexible dwell time distributions into mean field ODE models. J. Math. Biol. 79, 1831–1883. (doi:10.1007/s00285-019-01412-w)