MR-DoC2: Bidirectional Causal Modeling with Instrumental Variables and Data from Relatives

Luis F S Castro-de-Araujo; Madhurbain Singh; Yi Zhou; Philip Vinh; Brad Verhulst; Conor V Dolan; Michael C Neale

doi:10.1007/s10519-022-10122-x

. 2022 Nov 2;53(1):63–73. doi: 10.1007/s10519-022-10122-x

MR-DoC2: Bidirectional Causal Modeling with Instrumental Variables and Data from Relatives

Luis F S Castro-de-Araujo ^1,^2,^✉, Madhurbain Singh ¹, Yi Zhou ¹, Philip Vinh ¹, Brad Verhulst ³, Conor V Dolan ⁴, Michael C Neale ^1,⁴

PMCID: PMC9823046 PMID: 36322200

Abstract

Establishing causality is an essential step towards developing interventions for psychiatric disorders, substance use and many other conditions. While randomized controlled trials (RCTs) are considered the gold standard for causal inference, they are unethical in many scenarios. Mendelian randomization (MR) can be used in such cases, but importantly both RCTs and MR assume unidirectional causality. In this paper, we developed a new model, MRDoC2, that can be used to identify bidirectional causation in the presence of confounding due to both familial and non-familial sources. Our model extends the MRDoC model (Minică et al. in Behav Genet 48:337–349, 10.1007/s10519-018-9904-4, 2018), by simultaneously including risk scores for each trait. Furthermore, the power to detect causal effects in MRDoC2 does not require the phenotypes to have different additive genetic or shared environmental sources of variance, as is the case in the direction of causation twin model (Heath et al. in Behav Genet 23:29–50, 10.1007/BF01067552, 1993).

Supplementary Information

The online version contains supplementary material available at 10.1007/s10519-022-10122-x.

Keywords: Causality, Pleiotropy, Twin design, Mendelian randomization

Introduction

To understand the mechanisms of diseases and disorders, it is essential to differentiate between correlation and causation. Many research designs and current statistical models are unable to distinguish correlation from causation. While randomized controlled trials (RCTs) are the gold standard for causal inference (Evans and Davey Smith 2015), they are unethical or impractical in many common situations. For example, exposure to a traumatic experience, a possible risk factor for substance abuse, is not amenable to randomization. Mendelian randomization (MR), an instrumental variable method, is an alternative approach to evaluate causality when RCTs are not possible. It has improved our understanding of the etiologies of several conditions (Ohlsson and Kendler 2020). However, MR is based on strong assumptions, which can be difficult to validate. In this article, we consider research designs that mitigate such assumptions.

MR is an important method for examining causality in observational studies (Choi et al. 2020; Katikireddi et al. 2018). It uses genetic variants as instrumental variables to detect and estimate the causal effect of an exposure on an outcome. This method has three main assumptions: (i) exchangeability, the variant is not associated with a confounder in the relation between the exposure and outcome; (ii) no horizontal pleiotropy (exclusion restriction), the variant should affect the outcome exclusively via the exposure; and (iii) relevance, the instruments are sufficiently predictive of the exposure. Most psychiatric and substance use disorders appear to be multifactorial and highly polygenic, i.e., they are subject to the effects of many environmental events and genetic variants (e.g., SNPs) with small effects on the phenotype of interest. As instruments, individual genetic variants are likely to be poorly predictive of the exposure, which renders MR susceptible to weak instrument bias. The bias can be in the direction of the confounded observed association, as previously reported (Burgess et al. 2011; Burgess and Thompson 2013; Hemani et al. 2018). The size and direction of the bias, however, depends on the research study design.

The instrument should not share common causes with the outcome and should affect the outcome exclusively via the exposure, which is known as the no horizontal pleiotropy assumption. The no horizontal pleiotropy assumption seems unlikely to be met in complex traits such as psychiatric disorders or substance use, because the relevant phenotypes and variants are characterized by pervasive comorbidity and pleiotropy. With larger genome-wide association studies (GWAS), more variants are found to have pleiotropic effects, with hundreds being reported (Jordan et al. 2019). In two-sample MR, the data originate from two separate GWAS, so no individual has data on both exposure and outcome. By contrast, in one-sample MR, genotyped individuals are assessed on both traits. In one-sample MR, weak instrument bias tends to be in the direction of the confounded observed association, whereas in two-sample MR, the bias is towards the null (i.e., regression dilution; Hemani et al. 2018).

In MR, the causation is assumed to be unidirectional from exposure to outcome (Hemani et al. 2018; Katikireddi et al. 2018). Some authors have applied MR to investigate directionality of the causal paths (Timpson et al. 2011; Welsh et al. 2010). However, here directionality is investigated by performing MR twice, one test for each direction between the variables of interest. This can be problematic, as the relationship in one direction can interfere with the relationship in the opposite direction. Biology features many such feedback loops of this kind, from homeostatic mechanisms for body temperature or nicotine level (Verhulst et al. 2021) to the mutual potentiation of aggressive behavior and punitive discipline (Smith 2006).

Like RCTs, MR can be applied to infer causality in the presence of confounding, but the approach comes with assumptions that limit its application in some cases. Other designs can be helpful in some cases where MR is not suitable given its assumptions. Here we explored a structural equation model (SEM) implementation of MR, which allows for full background confounding and reciprocal causation. This model combines two twin models, namely the Direction of Causation model and the MR-DoC model (Minică et al. 2018). The Direction of Causation (DoC) model can infer causal relationships by using information from the cross-twin cross-trait correlations, even in cross-sectional studies (Neale and Cardon 1992; Heath et al. 1993; Neale et al. 1994). This model requires two phenotypes, which differ in their MZ correlations, or their DZ correlations, or both. However, to accommodate bidirectional causality, one must assume that confounding factors are limited to either additive genetic, common environment or specific environmental components of variance. In essence, any three of the five potential sources of covariance between the traits may be estimated, but no more (Fig. 1; Duffy and Martin 1994; Maes et al. 2021).

Fig. 1 — Classic DoC model. Path diagram representing a Direction of Causation model for one twin. This is depicting the relationship between two phenotypes, and the causal paths are estimated affording information from the cross-twin cross-trait correlations. Cross-twin covariance between additive genetic effects is 0.5 (not shown) for DZ twins, as DZs are expected to share 50% of the genetic effects. Standard structural equation modeling symbology is used. Circles represent latent variables, whose variances are fixed to unity. Double-headed arrows are covariances or variances, single-headed are the causal regression paths. Squares represent observed variables. A, C and E are the additive, shared and unique environmental effects. This model is not identified with data from MZ and DZ twins. To make it identified one needs to set to zero any two of the five paths ra, rc, re, g₂ and g₁ (Maes et al. 2021)

Minică et al. (2018) combined the classical twin model and the MR model (Heath et al. 1993; Gillespie et al. 2003; Neale et al. 1994), in a model for unidirectional causality called MR-DoC. This model accommodates horizontal pleiotropy, i.e., an effect of the genetic instrument on the outcome that is not mediated by the exposure variable. These alternative pathways are often denoted “direct effects,” but in practice the alternative pathways likely involve many different mediators. Following Burgess et al. (2011), Burgess and Thompson (2013), and Kohler et al. (2011), Minică et al. (2018) used a polygenic score (PS) as the instrument, in an attempt to avoid the weak instrument problem associated with single SNPs. Minică et al. (2018) showed that the model has higher power to detect causality than traditional MR (Minică et al. 2018). Two strong assumptions of the MR-DoC model are unidirectional causation and the absence of within-person (i.e., unshared) environmental confounding. The latter implies that the correlation between the non-familial environmental influences on the exposure and the outcome is zero. In contrast, confounding originating in genetic and shared environmental correlations can be accommodated in the MR-DoC model.

MR-DoC2 extends the MR-DoC model by integrating two polygenic scores (PSs), one for each phenotype. Using two PSs in a twin sample allows us to model bidirectional causation in the presence of full confounding, originating in shared and unshared environmental and additive genetic effects. The ability to estimate bidirectional causality in the presence of full (shared and unshared environmental and additive genetic) confounding requires the assumption of no direct horizontal pleiotropy: we assume no direct effect of PS1 on phenotype 2 (Ph2), nor of PS2 on phenotype 1 (Ph1). However, we note that the model does not imply that PS1 (or PS2) is uncorrelated with Ph2 (Ph1), we return to this issue below. We first present the model, address the issue of parameter identification, and continue to address statistical power.

Methods

All analyses were performed using R version 4.1.1 (Core Team 2021). The models were specified using RAM matrix algebra in OpenMx v2.19.6 (Neale et al. 2016). We established local model identification numerically using the OpenMx function mxCheckIdentification (Hunter et al. 2021). The MR-DoC 2 model is shown in Fig. 2 for an individual twin (rather than a twin pair) to ease presentation. In Fig. 2, the two phenotypes (denoted Ph1 and Ph2) are modeled as follows:

P h 1_{ij} = a_{1} A_{1 i j} + c_{1} C_{1 i j} + e_{1} E_{1 i j} + g_{2} P h 2_{ij} + b_{1} P S 1_{ij},

Fig. 2 — MR-DoC2. Path diagram of the MR-DoC2 model for an individual. The model is locally identified when parameters b₂ and b₄ are fixed (zero). The model includes the effects of additive genetic (A), common environment (C) and specific environment (E) factors for both *Ph1* and *Ph2*, and their effects may correlate (parameters *ra, rc* and re)

P h 2_{ij} = a_{2} A_{2 i j} + c_{2} C_{2 i j} + e_{2} E_{2 i j} + g_{1} P h 1_{ij} + b_{3} P S 2_{ij},

where i denotes twin pair and j denotes individual twin. The parameters a₁, c₁ and e₁ represent the effects of the additive genetic (A), shared environmental (C), and unshared environmental variables (E) on the phenotype Ph1 (a₂, c₂, and e₂, defined analogously as effects of A₂, C₂, and E₂on Ph2). The parameters b₁ and b₃ express the instrumental variable effects on Ph1 and Ph2, respectively. The parameters of main interest are g₁ and g₂, i.e., the bidirectional causal effects. Note that the model includes correlations between the additive genetic and environmental variables (i.e., cor(A₁, A₂) = ra, etc.), which are denoted ra, rc, and re. An important feature of the present model is that it includes the parameter re, as it has been previously shown that the absence of re in a DoC model would introduce bias in the estimates (Rasmussen et al. 2019). Finally, note that the instruments may be correlated (correlation rf in Fig. 2), due to possible linkage disequilibrium. The vector of parameters is denoted θ = [ra, rc, re, rf, a₁, c₁, e₁, a₂, c₂, e₂, g₁, g₂, b₁, b₃, σ_x, σ_y], where σ_x and σ_y are the standard deviations of the PS1 and PS2, respectively. The model’s path coefficients and variances were estimated by maximum likelihood estimation; see details of the simulation next section (Tables 2 and 3, and S1). Note that the model, as shown in Fig. 2, includes the parameters b₂ and b₄, but these parameters were fixed to zero for identification. However, possible combinations of parameters that render the model identified are presented in Table 5. In particular, one cannot freely estimate b₂ and b₄, but either can be estimated if re is fixed to zero (Table 5).

Table 2.

Variance explained in statistical power (non-centrality parameter; NCP) by model parameters

	NCP ( $β$ ²)
	g₁ = 0	g₂ = 0	g₁ = g₂ = 0
b₁	0.365	0.000	0.181
g₁	0.517	0.000	0.289
b₃	0.000	0.365	0.181
g₂	0.000	0.517	0.289
ra	0.000	0.000	0.000
rc	0.000	0.000	0.000
re	0.002	0.002	0.000
rf	0.041	0.041	0.000
a₁	0.000	0.002	0.001
a₂	0.002	0.000	0.001
c₁	0.000	0.002	0.001
c₂	0.002	0.000	0.001
Total R²	0.929	0.929	0.945

Open in a new tab

Linear regression of NCP on the parameter values used for 4096 power analyses in the ACE model

Table 3.

Variance explained in statistical power (non-centrality parameter; NCP) by model parameters

	NCP ( $β$ ²)
	g₁ = 0	g₂ = 0	g₁ = g₂ = 0
b₁	0.460	0.000	0.226
g₁	0.402	0.000	0.222
b₃	0.000	0.460	0.226
g₂	0.000	0.402	0.222
ra	0.001	0.001	0.001
rc	0.000	0.000	0.000
re	0.009	0.009	0.001
rf	0.036	0.036	0.001
a₁	0.000	0.003	0.002
a₂	0.003	0.000	0.002
c₁	0.000	0.000	0.000
c₂	0.000	0.000	0.000
Total R²	0.911	0.911	0.902

Open in a new tab

Linear regression of NCP on the parameter values used for 8748 power analyses in the AE model

Table 5.

Model Identification

ID	Ph1	Ph2	a₁	c₁	e₁	a₂	c₂	e₂	ra	rc	re	rf	b₁	b₂	b₃	b₄	g₁	g₂
FULL
No	fr	fr	fr	fr	fr	fr	fr	fr	fr	fr	fr	fr	fr	fr	fr	fr	fr	fr
No	fr	fr	fr	fr	fr	fr	fr	fr	fr	fr	fr	0	fr	fr	fr	fr	fr	fr
Yes	fr	fr	fr	fr	fr	fr	fr	fr	fr	fr	fr	fr	fr	0	fr	0	fr	fr
C
No	fr	fr	fr	0	fr	fr	0	fr	fr	0	fr	fr	fr	fr	fr	fr	fr	fr
Yes	fr	fr	fr	fr	fr	fr	0	fr	fr	0	fr	fr	fr	0	fr	0	fr	fr
E
Yes	fr	fr	fr	fr	fr	fr	fr	fr	fr	fr	0	fr	fr	fr	fr	0	fr	fr
Yes	fr	fr	fr	fr	fr	fr	fr	fr	fr	fr	0	fr	fr	0	fr	fr	fr	fr
No	fr	fr	fr	fr	fr	fr	fr	fr	fr	fr	0	fr	fr	fr	fr	fr	fr	fr
A
Yes	fr	fr	0	fr	fr	0	fr	fr	0	fr	fr	fr	fr	0	fr	0	fr	fr

Open in a new tab

Configurations of fixed and free parameters that may identify the model in Fig. 2. Each row shows a combination of fixed or free status for the parameters; the ID column indicates whether the model is locally identified. The text ‘fr’ indicates that the parameter is freely estimated, and ‘0’ denotes that the parameter is fixed to zero

Simulation procedure

Statistical power was explored by exact data simulation given various combinations of parameter values (van der Sluis et al. 2008). We used the R function mvrnorm() in the MASS library to simulate exact data (Venables et al. 2002). The power calculation was based on the likelihood ratio (LR) test. Given exact data simulation, parameter estimates of the full (identified) model equal the true values exactly. By subsequently fitting a model with one or more parameters of choice fixed to zero (nested under the true model), we obtain the exact non-centrality parameter (NCP) of the LR test given the sample sizes. The NCP is the difference in the expected value of the LR test statistic under the null and the alternative hypotheses (Verhulst 2017). Given the NCP, the degrees of freedom of the LT test (i.e., the difference in the number of parameters in the null and the alternative hypothesis), and the chosen Type I error rate alpha (e.g., 0.05), we can calculate the power to reject the constraints associated with the alternative hypothesis. The exact data simulation approach is equivalent to the analysis of exact population covariance matrices and mean vectors. While power can be established empirically using simulated data, this is computationally less efficient and offers no advantage above exact data simulation.

The simulation procedure involved: (1) choosing a set of parameter values for the model shown in Fig. 2; (2) exact data simulation, with arbitrary N = 1,000 MZ pairs and N = 1,000 DZ twin pairs; (3) fitting the true model using ML estimation in OpenMx; (4) fitting the false model by fixing one or more parameters to zero and refitting the model; and (5) calculating the NCP and the power to reject the false model restrictions. Type I error rate α was set to 0.05. In the power calculations, we focused on the causal parameters g₁, or g₂, or g₁ and g₂ together, i.e., 1 df tests (g₁ = 0 or g₂ = 0), or a 2 df test (g₁ = g₂ = 0). The 1 df tests are the tests of main interest, because they distinguish unidirectional from bidirectional causation. Note that the 2 df test is also of interest as the full background confounding (ra, rc, and re are all non-zero), implies that Ph1 and Ph2 may correlate, regardless of causal relations. We investigated the power in a factorial design, in which each parameter featured as a factor, with a given number of levels.

Additionally, we calculated the R² (proportion of explained variance) in the regression of the phenotypes (Ph1 and Ph2 in Fig. 2) on their respective PSs, and the correlation between Ph1 and Ph2 (Table 4). Given the NCPs and the parameters, we proceeded by regressing the NCPs on the parameter values to gauge the influence of the parameter values on the NCPs (i.e., the statistical power). In this manner, we can assess which parameters, in addition to the parameters g₁ and g₂, are the most relevant in determining the statistical power. Note that the regression analyses estimate the effect of the parameter values on the NCPs. So they do not directly address statistical power, which depends on the sample sizes, effect sizes, and α error rate. To provide some insight into the power given N = 1000 MZ and N = 1000 DZ, we present power and $R$ ² for a variety of parameter configurations in Tables 2, 3 and 4 and S1.

Table 4.

Exemplary b₁, b₃, g₁ and g₂ values and the power for estimating g₁

b₁	b₃	g₁	g₂	rf	Ph2_Ph1	PS1_Ph1	PS2_Ph2	ncpg₁	powg₁
0.075	0.075	0.06	0.06	0.00	0.214	0.066	0.066	15.97	0.979
0.075	0.025	0.06	0.02	0.00	0.140	0.068	0.023	15.97	0.979
0.075	0.075	0.06	0.02	0.00	0.138	0.068	0.066	15.97	0.979
0.075	0.050	0.06	0.04	0.00	0.180	0.067	0.045	15.97	0.979
0.075	0.075	0.06	0.04	0.00	0.180	0.067	0.066	15.97	0.979
0.025	0.050	0.06	0.02	0.00	0.334	0.022	0.040	4.71	0.583
0.025	0.075	0.06	0.06	0.00	0.417	0.020	0.059	4.71	0.583
0.075	0.025	0.02	0.02	0.25	0.126	0.070	0.026	4.70	0.582
0.025	0.050	0.02	0.02	0.50	0.465	0.025	0.045	1.13	0.186
0.025	0.050	0.02	0.04	0.50	0.508	0.026	0.045	1.13	0.186
0.025	0.075	0.02	0.02	0.50	0.463	0.026	0.065	1.13	0.186
0.025	0.025	0.02	0.06	0.50	0.539	0.024	0.024	1.13	0.186
0.025	0.075	0.02	0.04	0.50	0.506	0.027	0.065	1.13	0.186

Open in a new tab

Based from the same factorial AE design presented in Table 3 and parameters set presented in Table 1. Highest power with extreme positive b₁, b₃, and g₁ values. Lowest power with lowest g₁ and g₂ values. Ncpg₁, non-centrality parameter for rejecting g₁ = 0;powg₁, power for rejecting g₁ = 0; Ph2_Ph1, $R$ ² of the regression of Ph2 on Ph1; PS1_Ph1, R² of the regression of Ph1 on PS1; PS2_Ph2, $R$ ² of the regression of Ph2 on PS2

We executed three factorial designs, i.e., combining all chosen parameters’ values. The first factorial design included full ACE confounding. Each parameter in the parameter vector θ (except x, y, e₁, and e₂) was a factor (a total of 13 parameters), each factor had two levels (Table 1), e₁ was specified as $\sqrt{1 - a_{1}^{2} - c_{1}^{2}}$ and e₂ as $\sqrt{1 - a_{2}^{2} - c_{2}^{2}}$ . Therefore, the final number of cells in the first design was 2¹² = 4096 (Tables 1 and 2).

Table 1.

Parameter levels on the three factorial designs, with respective total number of cells for each design simulation

θ	Design 1 (ACE)	Design 2 (AE)	Design 3 (AE)
b₁	$\sqrt{0.025}$ , $\sqrt{0.05}$	$\sqrt{0.025}$ , $\sqrt{0.05}$ , $\sqrt{0.075}$	$- \sqrt{0.075}$ , $- \sqrt{0.03}$ , $\sqrt{0.03}$ , $\sqrt{0.075}$
b₃	$\sqrt{0.025}$ , $\sqrt{0.05}$	$\sqrt{0.025}$ , $\sqrt{0.05}$ , $\sqrt{0.075}$	$- \sqrt{0.075}$ , $- \sqrt{0.03}$ , $\sqrt{0.03}$ , $\sqrt{0.075}$
g₁	$\sqrt{0.020}$ , $\sqrt{0.05}$	$\sqrt{0.020}$ , $\sqrt{0.04}$ , $\sqrt{0.06}$	$- \sqrt{0.050}$ , $- \sqrt{0.020}$ , $\sqrt{0.050}$ , $\sqrt{0.020}$
g₂	$\sqrt{0.020}$ , $\sqrt{0.05}$	$\sqrt{0.020}$ , $\sqrt{0.04}$ , $\sqrt{0.06}$	$- \sqrt{0.050}$ , $- \sqrt{0.020}$ , $\sqrt{0.050}$ , $\sqrt{0.020}$
ra	0.25, 0.50	0.0, 0.25, 0.50	0.3
rc	0.25, 0.50	0	0
re	0.25, 0.50	0.0, 0.25, 0.50	0.3
rf	0.25, 0.50	0.0, 0.25, 0.50	0.3
a₁	$\sqrt{0.10}$ , $\sqrt{0.25}$	$\sqrt{0.10}$ , $\sqrt{0.25}$	$\sqrt{0.5}$
a₂	$\sqrt{0.10}$ , $\sqrt{0.25}$	$\sqrt{0.10}$ , $\sqrt{0.25}$	$\sqrt{0.3}$
c₁	$\sqrt{0.10}$ , $\sqrt{0.25}$	0	0
c₂	$\sqrt{0.10}$ , $\sqrt{0.25}$	0	0
Total cells	2¹² = 4096	3⁷*2² = 8748	4⁴ = 256

Open in a new tab

The model specification can be seen in Fig. 2. Also, e₁ was specified as $\sqrt{1 - a_{1}^{2} - c_{1}^{2}}$ and e₂ as $\sqrt{1 - a_{2}^{2} - c_{2}^{2}}$ . Table 4 exemplifies rows extracted from Design 3

The second design was based on an AE model, i.e., we set rc = c₁ = c₂=0. The model without C is of interest, because in adults C influences are often negligible or absent (Polderman et al. 2015). The parameters (i.e., factors) had two or three levels, again e₁ was specified as $\sqrt{1 - a_{1}^{2} - c_{1}^{2}}$ and e₂ as $\sqrt{1 - a_{2}^{2} - c_{2}^{2}}$ . (Tables 1 and 3). The number of cells in this design was 3⁷*2² = 8748. Table 4 contains example cells of this design, where the power to reject g₁ = 0 (df = 1) was moderate to high.

We executed a third design, again based on an AE model. In this design, we included negative and positive g₁ and g₂ values, which may be of interest given hypothetical negative causal effects (Table 5). In this design, the factors had one or three levels, e₁ was specified as $\sqrt{1 - a_{1}^{2} - c_{1}^{2}}$ and e₂ as $\sqrt{1 - a_{2}^{2} - c_{2}^{2}}$ , and the number of cells in the first design was 4⁴ = 256 (Tables 1 and 4). Although we focused on the df = 1 tests throughout, we also report the df = 2 tests in Table S1 for this design.

These factorial designs served to determine the contributions of the various parameters to the statistical power to reject g₁ = 0, g2 = 0, and g₁ = g₂ = 0. We selected parameter values such that the phenotypic twin correlations were reasonable (see Table 5) and the predictive strength of the instrument was plausible (Table 4 and S1). Specifically, we found the highest power with positive b₁, b₃, g₁, and g₂ values and the lowest with lowest g₁ and g₂ values. Tables 2 and 3 show results of linear regression analyses in which we regressed the NCP on the parameter values. The $R$ ² of each parameter reveals much of the variance in the statistical power is explained by that parameter. Thus, the contribution of each parameter to the statistical power changes can be quantified and compared.

Results

Power

Based on the simulation results, we identified the following trends concerning the power to reject g₁ = 0, g₂ = 0 and g₁ = g₂ = 0. First, the magnitude of the parameters b₁ and b₃, i.e., the predictive strength of the genetic instruments, are important to the power. This result can be seen in Tables 2 and 3 by considering the R² values of b₁ and b₃. Specifically, the PS regression coefficients, b₁ and b₃, explain around 46% of the variance in the NCP. The causal parameters g₁ and g₂ explain around 40% of the variance in the NCP in the AE models, Table 3. In practice, we found high power (> 0.8, alpha = 0.05, Nmz = Ndz = 1000) in a variety of situations with a range of positive and negative values for b₁, b₃, g₁, and g₂ (Table 4 and S1).

Based on design 2, we found that the correlation between the instruments (rf, Fig. 2) made little difference to the power, as is clear from the $R$ ² in Tables 2 and 3. Also, we found that the background correlations, i.e., the ra, rc and re parameters, are relatively unimportant to the power in the present designs (Tables 2 and 3).

We performed two factorial design analyses (with and without C variance), because we wanted to assess how the presence of C variance would affect the power. In Table 2, which includes C variance, b₁ and b₃ explain 36% of the variance of the NCP, which is less than the variance explained in the absence of C variance in the model (namely, 46% R² in the AE model, Table 2). So, while we see that the parameters related to C have little influence on the power, the absence of C is beneficial. Needless to say, whether C can be omitted in practice will be dictated by the data.

We found that differences in the variance components of Ph1 and Ph2 (i.e., differences in the parameters a₁ vs. a₂ and c₁ vs. c₂) had little effect on the power. This result is at odds with traditional DoC models, in which the statistical power depends heavily on these differences (Gillespie et al. 2003). Strikingly, in the present model, differences between the phenotypes in variance decomposition appear unimportant to the statistical power.

The present results were based on a linear regression of the NCPs on the parameters. We investigated possible non-linearity by including second order polynomials of the parameters. However, we found that the second order terms explained 7.1–8.9% of the variance in the NCPs (g₁ = 0, Tables 2 and 3). From Tables 2 and 3 the R² ranged from 90.2% (NCP regressed on all other parameters) in the AE model to 94.5% in the ACE model.

In an attempt to put the power analyses in perspective to published results, we present the power curves for four realistic scenarios (Fig. 3). The factorial design was performed with values for additive and shared genetic variances based on previous work (see below). The number of twin pairs required to achieve an 80% power of rejecting g₁ = 0 is plotted against an increasing R² for the instrument PS1 (path b₁). Environmental variances were dependent on shared and additive genetic variances so that e₁ was specified as $\sqrt{1 - a_{1}^{2} - c_{1}^{2}}$ and e₂ as $\sqrt{1 - a_{2}^{2} - c_{2}^{2}}$ . Additive and shared variances were considered as follows: (A) alcohol use (a² 49%, c² 10%) (Verhulst et al. 2015) and heart disease (a² 22%, c² 0%) (Wu et al. 2014); (B) BMI (a² 72%, c² 3%) (Rokholm et al. 2011) and major depression (a² 37%, c² 1%) (Scherrer et al. 2003); (C) cannabis use (a² 51%, c² 20%) (Verweij et al. 2010) and schizophrenia (a² 81%, c² 11%) (Sullivan et al. 2003); (D) dyslipidemia (LDL) (a² 60%, c² 28%) (Zhang et al. 2010) and again heart disease (a² 22%, c² 0%) (Wu et al. 2014) (Fig. 3). Figure 3 also includes vertical lines with previously measured estimations of four PSs: Smoking (Pasman et al. 2022), BMI (Furlong and Klimentidis 2020), LDL (Kuchenbaecker et al. 2019), and attention deficit hyperactivity disorder (ADHD) (Demontis et al. 2019). Figure 3 shows that the power to reject g1 = 0 is quite reasonable and well within values that appeared in recent publications.

Discussion

We presented a new twin model based on the MR-DoC twin model (Minica et al. 2018), which we refer to as the MR-DoC2 model. Compared to the MR-DoC model, MR-DoC2 model accommodates full confounding (originating in background A, C, and E effects) and bidirectional causality. The model can be used to investigate the bidirectional causal interrelationship. As usual in twin studies, the 2 × 2 phenotypic covariance matrix was decomposed into A, C, and E covariance matrices. However, in addition, phenotypic reciprocal causal paths were specified, as well as paths from the PSs to phenotypes Ph1 and Ph2.

The MR-DoC2 is suitable for the study of bidirectional causality given A, C, and E confounding, whilst retaining reasonable statistical power. While we have assumed no direct effects of PS1 on Ph2 and PS2 on Ph1, PS1 is correlated with Ph2 and PS2 is correlated with Ph1. Given non-zero rf, the correlation between PS1 and PS2, PS1 is correlated with Ph2 via the rf path (see Fig. 2). The model is based on the assumption that there are no direct effects of the PS1 on the outcome Ph2 (and PS2 on Ph1), but there are indirect associations with Ph2 from PS1 via rf when rf does not equal zero. Pleiotropy also occurs though the causal feedback loop if both g1 and g2 are non-zero. As shown in Table 5, there are situations in which the paths b₂ and b₄ can be estimated, but these involve other constraints. For instance, either b₂ or b₄ (but not both) can be estimated if the unshared environmental correlation, re, is fixed to zero.

As expected, statistical power is higher when a larger proportion of phenotypic variance is explained by the genetic instruments, PS1 and PS2. This is encouraging, because the proportion of trait variance accounted for by SNPs is likely to increase with greater sample sizes and with improvements in the ability to test rare variants’ associations with outcomes and their risk factors. We found the sign of the causal parameters (g₁, g₂) to be relevant to the power to reject g₁ = 0 or g₂ = 0. Negative values (g₁ < 0, g₂ < 0) or a combination of negative and positive values were associated with high power (Table S1). This result is consistent with one previously reported in multivariate twin studies (Evans and Duffy 2004), where power increases with negative genetic correlation.

In cross-sectional DoC modeling, the statistical power to discern direction of causation is greater when the phenotypes have different variance component proportions. That is to say, if the MZ and DZ correlations for exposure equal those for the outcome variable, the model will not be able to detect causal direction (Heath et al. 1993). However, in the present article we have shown that the addition of the PSs permit a larger range of genetic decomposition scenarios and these components (a₁, c₁, e₁, a₂, c₂, e₂) have little effect on the ability to detect causal directionality, which is a remarkable advantage over the original DoC model. More importantly, this model accommodates A, C, and E confounding, improving on previously noted limitations of DoC models (Rasmussen et al. 2019).

We presented a model in which the background covariance structure involved the same sources of variances, i.e., ACE or AE. It may happen that the phenotypes differ in this respect (e.g., ACE and ADE). This poses no problem with respect to model specification. This situation has the advantage of allowing b2 or b4 to be estimated, because the background confounding is necessarily limited to A and E (i.e., ra and re). We did not pursue this here as the combination of ACE and ADE phenotypes seems somewhat uncommon, in part because of the sample size.

Another scenario that is worth noting is when one considers a PS as an imperfect measure of the additive genetic liability, where rf asymptotically tends to ra. We did establish that a constraint ra = m*rf, with fixed m (e.g., m = 1), identified b2 or b4 (direct pleiotropic paths), but not both. If the relationship of ra and rf is of interest, a sensitivity analysis with varied values of the constant m can be set up to evaluate the effect on the causal estimates. In the case m = 1 it is implied that ra = rf.

We note that in Tables 2 and 3, the R² values did not sum to unity. However, the variances for b₁, b₃, g₁, and g₂ account for over 88% of the NCP variance, while the remainder are due to non-linear effects and interactions. We consider these to be too situation-specific and, therefore, of little interest in conducting a power analysis. All results are based on simulations of 1000 MZs and 1000 DZs, however, the code is available in a repository (https://github.com/lf-araujo/mr-doc2), so the reader can perform their own power calculations and include the number of observations that best suits their study.

In the standard DoC model, parameter estimates can be biased if the reliabilities of the two variables differ (Heath et al. 1993). Specifically, the bias in the causal parameters is towards the more reliable variable being the cause of the less reliable one. In additional simulations, we established that unmodeled measurement error (phenotypic reliabilities < 1) resulted in bias in the estimates of the parameters e₁, e₂, and re. While the power to reject g₁ = 0 and/or g₂ = 0 was lower in the presence of measurement error, the actual estimates of the causal parameters g₁ and g₂, and the estimates of the parameters b₁ and b₃ were unaffected by unmodeled measurement error.

As seems inevitable when trying to establish causation in non-experimental settings, some assumptions are necessary. It is important to emphasize that although the model incorporates (indirect) horizontal pleiotropy through rf*b3 or through rf*b1, the absence of the direct horizontal pleiotropy is a required assumption (i.e., the parameters b₂ and b₄ are zero). It represents a portion of the total horizontal pleiotropy (we assume that the total horizontal pleiotropy consists of rf*b3 + rf*b1 + b2 + b4). Violation of this assumption (b2 and b4 not equal 0) results in bias in the causal parameters g₁ and g₂ and in the parameters ra, rc, and re. Specifically, given b₂ > 0 and b₄ > 0, the parameters g₁ and g₂ are overestimated as a consequence of fixing b₂ = b₄ = 0. This overestimation increases the false positive rate, i.e., the rate with which we incorrectly reject g₁ = 0 and/or g₂ = 0. Given b₂ > 0 and b₄ > 0, the parameters ra, rc, and re are underestimated, as a consequence of the overestimation of g₁ and g₂. Furthermore, a mismatch between the phenotype in the GWAS that generated the PS and the phenotype used in MR-DOC2 could alter estimates. As the mismatch between the phenotypes in the discovery GWAS used to create the PRS and the MR-DoC2 phenotype increases, the b1 and b3 paths will become smaller, become weaker instruments, and potentially reduce power. The size of this effect was not pursued in this paper, and will depend on the nature of the mismatch.

The use of instrumental PS is common in Mendelian randomization studies (Burgess et al. 2020; Dudbridge 2021). It has been shown that the use of a PS as an instrument is mathematically equivalent to a weighted mean of the results from individual SNPs (Dudbridge 2021). However, its use also comes with challenges. It plays the role of a stronger instrument, but it is also a composite of variants that may themselves have indirect or direct effects on the outcome. Therefore, using a PS increases the risk of (horizontal) pleiotropy when compared to the use of a single variant. Note that the present model does account for several types of pleiotropy already in the model. First, in Fig. 2, the parameter rf represents the correlation between the two PSs, which may partially account for a correlation between PS1 (the instrument of Ph1) and Ph2. Second, the bidirectional causal paths (parameters between Ph1 and Ph2 also connect PS1 and Ph2 and PS2 and Ph1. A useful feature of the MR-DoC2 model is the inclusion of non-shared (E) confounding (parameter re). The constraint re = 0 is considered a drawback of standard DoC models (Rasmussen et al. 2019), which also applied to the MRDoc model, as presented by Minica et al. (2018).

Multivariate, GxE and DoC approaches have been considered less practical because they require larger data sets to obtain accurate estimates of parameters relating two traits (Gillespie et al. 2003). More recently, however, the availability of studies with very many participants have made these models practical to apply. We have shown that the MR-DoC2 model has moderate power to test bidirectional causation, and therefore is suitable for a range of clinical applications. The next steps in model investigation will include extensions for longitudinal and multivariate data, to provide corroboration of potential causal pathways identified by this modeling.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 18.6 kb)^{(18.6KB, docx)}

Author contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by LFSC-A, MS, YZ, PV, BV, CVD and MCN. The first draft of the manuscript was written by LFSC-A and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Funding

LFSCA is funded by NIH Grant No. 5T32MH-020030 and the Medical Research Council - UK, Grant No. MR/T03355X/1 during the study. MCN was funded by NIH Grant DA-049867.

Data availability

Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

Code availability

Code is available in a repository for replication.

Declarations

Conflict of interest

Luis F. S. Castro-de-Araujo, Madhurbain Singh, Yi Zhou, Philip Vinh, Brad Verhulst, Conor V. Dolan and Michael C. Neale report no conflicts of interest.

Human and Animal Rights and Informed consent

No humans or animals were involved in this study.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Luis F. S. Castro-de-Araujo, Email: luis.araujo@vcuhealth.org

Madhurbain Singh, Email: singhm18@vcu.edu.

Yi Zhou, Email: zhouy33@vcu.edu.

Philip Vinh, Email: vinhpb@vcu.edu.

Brad Verhulst, Email: verhulst@tamu.edu.

Michael C. Neale, Email: michael.neale@vcuhealth.org

References

Burgess S, Thompson SG. Use of allele scores as instrumental variables for Mendelian randomization. Int J Epidemiol. 2013;42:1134–1144. doi: 10.1093/ije/dyt093. [DOI] [PMC free article] [PubMed] [Google Scholar]
Burgess S, Thompson SG, CRP CHD Genetics Collaboration Avoiding bias from weak instruments in Mendelian randomization studies. Int J Epidemiol. 2011;40:755–764. doi: 10.1093/ije/dyr036. [DOI] [PubMed] [Google Scholar]
Burgess S, Smith GD, Davies NM, Dudbridge F, Gill D, Glymour MM, Hartwig FP, Holmes MV, Minelli C, Relton CL, Theodoratou E (2020) Guidelines for performing Mendelian randomization investigations. 10.12688/wellcomeopenres.15555.2
Choi KW, Stein MB, Nishimi KM, Ge T, Coleman JRI, Chen C-Y, Ratanatharathorn A, Zheutlin AB, Dunn EC, Breen G, Koenen KC, Smoller JW. An exposure-wide and Mendelian randomization approach to identifying modifiable factors for the prevention of depression. Am J Psychiatry. 2020 doi: 10.1176/appi.ajp.2020.19111158. [DOI] [PMC free article] [PubMed] [Google Scholar]
Demontis D, Walters RK, Martin J, Mattheisen M, Als TD, Agerbo E, Baldursson G, Belliveau R, Bybjerg-Grauholm J, Bækvad-Hansen M, Cerrato F, Chambert K, Churchhouse C, Dumont A, Eriksson N, Gandal M, Goldstein JI, Grasby KL, Grove J, Gudmundsson OO, Hansen CS, Hauberg ME, Hollegaard MV, Howrigan DP, Huang H, Maller JB, Martin AR, Martin NG, Moran J, Pallesen J, Palmer DS, Pedersen CB, Pedersen MG, Poterba T, Poulsen JB, Ripke S, Robinson EB, Satterstrom FK, Stefansson H, Stevens C, Turley P, Walters GB, Won H, Wright MJ, Andreassen OA, Asherson P, Burton CL, Boomsma DI, Cormand B, Dalsgaard S, Franke B, Gelernter J, Geschwind D, Hakonarson H, Haavik J, Kranzler HR, Kuntsi J, Langley K, Lesch K-P, Middeldorp C, Reif A, Rohde LA, Roussos P, Schachar R, Sklar P, Sonuga-Barke EJS, Sullivan PF, Thapar A, Tung JY, Waldman ID, Medland SE, Stefansson K, Nordentoft M, Hougaard DM, Werge T, Mors O, Mortensen PB, Daly MJ, Faraone SV, Børglum AD, Neale BM. Discovery of the first genome-wide significant risk loci for attention-deficit/hyperactivity disorder. Nat Genet. 2019;51:63. doi: 10.1038/s41588-018-0269-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
Dudbridge F. Polygenic Mendelian randomization. Cold Spring Harb Perspect Med. 2021;11:a039586. doi: 10.1101/cshperspect.a039586. [DOI] [PMC free article] [PubMed] [Google Scholar]
Duffy DL, Martin NG. Inferring the direction of causation in cross-sectional twin data: theoretical and empirical considerations. Genet Epidemiol. 1994;11:483–502. doi: 10.1002/gepi.1370110606. [DOI] [PubMed] [Google Scholar]
Evans DM, Davey Smith G. Mendelian randomization: new applications in the coming age of hypothesis-free causality. Annu Rev Genomics Hum Genet. 2015;16:327–350. doi: 10.1146/annurev-genom-090314-050016. [DOI] [PubMed] [Google Scholar]
Evans DM, Duffy DL. A simulation study concerning the effect of varying the residual phenotypic correlation on the power of bivariate quantitative trait loci linkage analysis. Behav Genet. 2004;34:135–141. doi: 10.1023/B:BEGE.0000013727.15845.f8. [DOI] [PubMed] [Google Scholar]
Furlong MA, Klimentidis YC. Associations of air pollution with obesity and body fat percentage, and modification by polygenic risk score for BMI in the UK biobank. Environ Res. 2020;185:109364. doi: 10.1016/j.envres.2020.109364. [DOI] [PMC free article] [PubMed] [Google Scholar]
Gillespie NA, Zhu G, Neale MC, Heath AC, Martin NG. Direction of causation modeling between cross-sectional measures of parenting and psychological distress in female twins. Behav Genet. 2003;33:14. doi: 10.1023/A:1025365325016. [DOI] [PubMed] [Google Scholar]
Heath AC, Kessler RC, Neale MC, Hewitt JK, Eaves LJ, Kendler KS. Testing hypotheses about direction of causation using cross-sectional family data. Behav Genet. 1993;23:29–50. doi: 10.1007/BF01067552. [DOI] [PubMed] [Google Scholar]
Hemani G, Zheng J, Elsworth B, Wade KH, Haberland V, Baird D, Laurin C, Burgess S, Bowden J, Langdon R, Tan VY, Yarmolinsky J, Shihab HA, Timpson NJ, Evans DM, Relton C, Martin RM, Davey Smith G, Gaunt TR, Haycock PC. The MR-base platform supports systematic causal inference across the human phenome. eLife. 2018;7:e34408. doi: 10.7554/eLife.34408. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hunter MD, Garrison SM, Burt SA, Rodgers JL. The analytic identification of variance component models common to behavior genetics. Behav Genet. 2021;51:425–437. doi: 10.1007/s10519-021-10055-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
Jordan DM, Verbanck M, Do R. HOPS: a quantitative score reveals pervasive horizontal pleiotropy in human genetic variation is driven by extreme polygenicity of human traits and diseases. Genome Biol. 2019;20:222. doi: 10.1186/s13059-019-1844-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
Katikireddi SV, Green MJ, Taylor AE, Davey Smith G, Munafò MR. Assessing causal relationships using genetic proxies for exposures: an introduction to Mendelian randomization. Addict Abingdon Engl. 2018;113:764–774. doi: 10.1111/add.14038. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kohler H-P, Behrman JR, Schnittker J. Social science methods for twins data: integrating causality, endowments, and heritability. Biodemogr Soc Biol. 2011;57:88–141. doi: 10.1080/19485565.2011.580619. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kuchenbaecker K, Telkar N, Reiker T, Walters RG, Lin K, Eriksson A, Gurdasani D, Gilly A, Southam L, Tsafantakis E, Karaleftheri M, Seeley J, Kamali A, Asiki G, Millwood IY, Holmes M, Du H, Guo Y, Kumari M, Dedoussis G, Li L, Chen Z, Sandhu MS, Zeggini E, Group USS. The transferability of lipid loci across African, Asian and European cohorts. Nat Commun. 2019 doi: 10.1038/s41467-019-12026-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
Maes HH, Neale MC, Kirkpatrick RM, Kendler KS. Using multimodel inference/model averaging to model causes of covariation between variables in twins. Behav Genet. 2021;51:82–96. doi: 10.1007/s10519-020-10026-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
Minică CC, Dolan CV, Boomsma DI, de Geus E, Neale MC. Extending causality tests with genetic instruments: an integration of Mendelian randomization with the classical twin design. Behav Genet. 2018;48:337–349. doi: 10.1007/s10519-018-9904-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
Neale MC, Cardon LR. Methodology for genetic studies of twins and families. Dordrecht: Kluwer; 1992. [Google Scholar]
Neale MC, Duffy DL, Martin NG. Direction of causation: reply to commentaries. Genet Epidemiol. 1994;11:463–472. doi: 10.1002/gepi.1370110603. [DOI] [Google Scholar]
Neale MC, Hunter MD, Pritikin JN, Zahery M, Brick TR, Kirkpatrick RM, Estabrook R, Bates TC, Maes HH, Boker SM. OpenMx 2.0: extended structural equation and statistical modeling. Psychometrika. 2016;81:535–549. doi: 10.1007/s11336-014-9435-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ohlsson H, Kendler KS. Applying causal inference methods in psychiatric epidemiology: a review. JAMA Psychiatry. 2020;77:637–644. doi: 10.1001/jamapsychiatry.2019.3758. [DOI] [PMC free article] [PubMed] [Google Scholar]
Pasman JA, Demange PA, Guloksuz S, Willemsen AHM, Abdellaoui A, ten Have M, Hottenga J-J, Boomsma DI, de Geus E, Bartels M, de Graaf R, Verweij KJH, Smit DJ, Nivard M, Vink JM. Genetic risk for smoking: disentangling interplay between genes and socioeconomic status. Behav Genet. 2022;52:92–107. doi: 10.1007/s10519-021-10094-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
Polderman TJC, Benyamin B, de Leeuw CA, Sullivan PF, van Bochoven A, Visscher PM, Posthuma D. Meta-analysis of the heritability of human traits based on fifty years of twin studies. Nat Genet. 2015;47:702–709. doi: 10.1038/ng.3285. [DOI] [PubMed] [Google Scholar]
R Core Team (2021) R: a language and environment for statistical computing
Rasmussen SHR, Ludeke S, Hjelmborg JVB. A major limitation of the direction of causation model: non-shared environmental confounding. Twin Res Hum Genet Off J Int Soc Twin Stud. 2019;22:14–26. doi: 10.1017/thg.2018.67. [DOI] [PubMed] [Google Scholar]
Rokholm B, Silventoinen K, Tynelius P, Gamborg M, Sørensen TIA, Rasmussen F. Increasing genetic variance of body mass index during the Swedish obesity epidemic. PLoS ONE. 2011 doi: 10.1371/journal.pone.0027135. [DOI] [PMC free article] [PubMed] [Google Scholar]
Scherrer JF, Xian H, Bucholz KK, Eisen SA, Lyons MJ, Goldberg J, Tsuang M, True WR. A twin study of depression symptoms, hypertension, and heart disease in middle-aged men. Psychosom Med. 2003;65:548–557. doi: 10.1097/01.PSY.0000077507.29863.CB. [DOI] [PubMed] [Google Scholar]
Smith AB. The state of research on the effects of physical punishment. Soc Policy J N Z. 2006;27:114–127. [Google Scholar]
Sullivan PF, Kendler KS, Neale MC. Schizophrenia as a complex trait: evidence from a meta-analysis of twin studies. Arch Gen Psychiatry. 2003;60:1187–1192. doi: 10.1001/archpsyc.60.12.1187. [DOI] [PubMed] [Google Scholar]
Timpson NJ, Nordestgaard BG, Harbord RM, Zacho J, Frayling TM, Tybjærg-Hansen A, Smith GD. C-reactive protein levels and body mass index: elucidating direction of causation through reciprocal Mendelian randomization. Int J Obes. 2011;35:300–308. doi: 10.1038/ijo.2010.137. [DOI] [PMC free article] [PubMed] [Google Scholar]
van der Sluis S, Dolan CV, Neale MC, Posthuma D. Power calculations using exact data simulation: a UsefulTool for genetic study designs. Behav Genet. 2008;38:202–211. doi: 10.1007/s10519-007-9184-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
Venables WN, Ripley BD, Venables WN. Modern applied statistics with S, statistics and computing. 4. New York: Springer; 2002. [Google Scholar]
Verhulst B. A power calculator for the classical twin design. Behav Genet. 2017;47:255–261. doi: 10.1007/s10519-016-9828-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
Verhulst B, Neale MC, Kendler KS. The heritability of alcohol use disorders: a meta-analysis of twin and adoption studies. Psychol Med. 2015;45:1061–1072. doi: 10.1017/S0033291714002165. [DOI] [PMC free article] [PubMed] [Google Scholar]
Verhulst B, Clark SL, Chen J, Maes HH, Chen X, Neale MC. Clarifying the genetic influences on nicotine dependence and quantity of use in cigarette smokers. Behav Genet. 2021;51:375–384. doi: 10.1007/s10519-021-10056-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
Verweij KJH, Zietsch BP, Lynskey MT, Medland SE, Neale MC, Martin NG, Boomsma DI, Vink JM. Genetic and environmental influences on cannabis use initiation and problematic use: a meta-analysis of twin studies. Addict Abingdon Engl. 2010;105:417. doi: 10.1111/j.1360-0443.2009.02831.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
Welsh P, Polisecki E, Robertson M, Jahn S, Buckley BM, de Craen AJM, Ford I, Jukema JW, Macfarlane PW, Packard CJ, Stott DJ, Westendorp RGJ, Shepherd J, Hingorani AD, Smith GD, Schaefer E, Sattar N. Unraveling the directional link between adiposity and inflammation: a bidirectional Mendelian randomization approach. J Clin Endocrinol Metab. 2010;95:93–99. doi: 10.1210/jc.2009-1064. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wu S-H, Neale MC, Acton AJ, Jr, Considine RV, Krasnow RE, Reed T, Dai J. Genetic and environmental influences on the prospective correlation between systemic inflammation and coronary heart disease death in male twins. Arterioscler Thromb Vasc Biol. 2014;34:2168. doi: 10.1161/ATVBAHA.114.303556. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhang S, Liu,Xin,Necheles J, Tsai H-J, Wang G, Wang B, Xing H, Li Z, Liu,Xue,Zang T, Xu X, Wang X. Genetic and environmental influences on serum lipid tracking: a population-based, longitudinal chinese twin study. Pediatr Res. 2010;68:316–322. doi: 10.1203/PDR.0b013e3181eeded6. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary material 1 (DOCX 18.6 kb)^{(18.6KB, docx)}

Data Availability Statement

Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

Code is available in a repository for replication.

[CR1] Burgess S, Thompson SG. Use of allele scores as instrumental variables for Mendelian randomization. Int J Epidemiol. 2013;42:1134–1144. doi: 10.1093/ije/dyt093. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR2] Burgess S, Thompson SG, CRP CHD Genetics Collaboration Avoiding bias from weak instruments in Mendelian randomization studies. Int J Epidemiol. 2011;40:755–764. doi: 10.1093/ije/dyr036. [DOI] [PubMed] [Google Scholar]

[CR3] Burgess S, Smith GD, Davies NM, Dudbridge F, Gill D, Glymour MM, Hartwig FP, Holmes MV, Minelli C, Relton CL, Theodoratou E (2020) Guidelines for performing Mendelian randomization investigations. 10.12688/wellcomeopenres.15555.2

[CR4] Choi KW, Stein MB, Nishimi KM, Ge T, Coleman JRI, Chen C-Y, Ratanatharathorn A, Zheutlin AB, Dunn EC, Breen G, Koenen KC, Smoller JW. An exposure-wide and Mendelian randomization approach to identifying modifiable factors for the prevention of depression. Am J Psychiatry. 2020 doi: 10.1176/appi.ajp.2020.19111158. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR5] Demontis D, Walters RK, Martin J, Mattheisen M, Als TD, Agerbo E, Baldursson G, Belliveau R, Bybjerg-Grauholm J, Bækvad-Hansen M, Cerrato F, Chambert K, Churchhouse C, Dumont A, Eriksson N, Gandal M, Goldstein JI, Grasby KL, Grove J, Gudmundsson OO, Hansen CS, Hauberg ME, Hollegaard MV, Howrigan DP, Huang H, Maller JB, Martin AR, Martin NG, Moran J, Pallesen J, Palmer DS, Pedersen CB, Pedersen MG, Poterba T, Poulsen JB, Ripke S, Robinson EB, Satterstrom FK, Stefansson H, Stevens C, Turley P, Walters GB, Won H, Wright MJ, Andreassen OA, Asherson P, Burton CL, Boomsma DI, Cormand B, Dalsgaard S, Franke B, Gelernter J, Geschwind D, Hakonarson H, Haavik J, Kranzler HR, Kuntsi J, Langley K, Lesch K-P, Middeldorp C, Reif A, Rohde LA, Roussos P, Schachar R, Sklar P, Sonuga-Barke EJS, Sullivan PF, Thapar A, Tung JY, Waldman ID, Medland SE, Stefansson K, Nordentoft M, Hougaard DM, Werge T, Mors O, Mortensen PB, Daly MJ, Faraone SV, Børglum AD, Neale BM. Discovery of the first genome-wide significant risk loci for attention-deficit/hyperactivity disorder. Nat Genet. 2019;51:63. doi: 10.1038/s41588-018-0269-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR6] Dudbridge F. Polygenic Mendelian randomization. Cold Spring Harb Perspect Med. 2021;11:a039586. doi: 10.1101/cshperspect.a039586. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR7] Duffy DL, Martin NG. Inferring the direction of causation in cross-sectional twin data: theoretical and empirical considerations. Genet Epidemiol. 1994;11:483–502. doi: 10.1002/gepi.1370110606. [DOI] [PubMed] [Google Scholar]

[CR8] Evans DM, Davey Smith G. Mendelian randomization: new applications in the coming age of hypothesis-free causality. Annu Rev Genomics Hum Genet. 2015;16:327–350. doi: 10.1146/annurev-genom-090314-050016. [DOI] [PubMed] [Google Scholar]

[CR9] Evans DM, Duffy DL. A simulation study concerning the effect of varying the residual phenotypic correlation on the power of bivariate quantitative trait loci linkage analysis. Behav Genet. 2004;34:135–141. doi: 10.1023/B:BEGE.0000013727.15845.f8. [DOI] [PubMed] [Google Scholar]

[CR10] Furlong MA, Klimentidis YC. Associations of air pollution with obesity and body fat percentage, and modification by polygenic risk score for BMI in the UK biobank. Environ Res. 2020;185:109364. doi: 10.1016/j.envres.2020.109364. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR11] Gillespie NA, Zhu G, Neale MC, Heath AC, Martin NG. Direction of causation modeling between cross-sectional measures of parenting and psychological distress in female twins. Behav Genet. 2003;33:14. doi: 10.1023/A:1025365325016. [DOI] [PubMed] [Google Scholar]

[CR12] Heath AC, Kessler RC, Neale MC, Hewitt JK, Eaves LJ, Kendler KS. Testing hypotheses about direction of causation using cross-sectional family data. Behav Genet. 1993;23:29–50. doi: 10.1007/BF01067552. [DOI] [PubMed] [Google Scholar]

[CR13] Hemani G, Zheng J, Elsworth B, Wade KH, Haberland V, Baird D, Laurin C, Burgess S, Bowden J, Langdon R, Tan VY, Yarmolinsky J, Shihab HA, Timpson NJ, Evans DM, Relton C, Martin RM, Davey Smith G, Gaunt TR, Haycock PC. The MR-base platform supports systematic causal inference across the human phenome. eLife. 2018;7:e34408. doi: 10.7554/eLife.34408. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR14] Hunter MD, Garrison SM, Burt SA, Rodgers JL. The analytic identification of variance component models common to behavior genetics. Behav Genet. 2021;51:425–437. doi: 10.1007/s10519-021-10055-x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR15] Jordan DM, Verbanck M, Do R. HOPS: a quantitative score reveals pervasive horizontal pleiotropy in human genetic variation is driven by extreme polygenicity of human traits and diseases. Genome Biol. 2019;20:222. doi: 10.1186/s13059-019-1844-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR16] Katikireddi SV, Green MJ, Taylor AE, Davey Smith G, Munafò MR. Assessing causal relationships using genetic proxies for exposures: an introduction to Mendelian randomization. Addict Abingdon Engl. 2018;113:764–774. doi: 10.1111/add.14038. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR17] Kohler H-P, Behrman JR, Schnittker J. Social science methods for twins data: integrating causality, endowments, and heritability. Biodemogr Soc Biol. 2011;57:88–141. doi: 10.1080/19485565.2011.580619. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR18] Kuchenbaecker K, Telkar N, Reiker T, Walters RG, Lin K, Eriksson A, Gurdasani D, Gilly A, Southam L, Tsafantakis E, Karaleftheri M, Seeley J, Kamali A, Asiki G, Millwood IY, Holmes M, Du H, Guo Y, Kumari M, Dedoussis G, Li L, Chen Z, Sandhu MS, Zeggini E, Group USS. The transferability of lipid loci across African, Asian and European cohorts. Nat Commun. 2019 doi: 10.1038/s41467-019-12026-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR19] Maes HH, Neale MC, Kirkpatrick RM, Kendler KS. Using multimodel inference/model averaging to model causes of covariation between variables in twins. Behav Genet. 2021;51:82–96. doi: 10.1007/s10519-020-10026-8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR20] Minică CC, Dolan CV, Boomsma DI, de Geus E, Neale MC. Extending causality tests with genetic instruments: an integration of Mendelian randomization with the classical twin design. Behav Genet. 2018;48:337–349. doi: 10.1007/s10519-018-9904-4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR42] Neale MC, Cardon LR. Methodology for genetic studies of twins and families. Dordrecht: Kluwer; 1992. [Google Scholar]

[CR21] Neale MC, Duffy DL, Martin NG. Direction of causation: reply to commentaries. Genet Epidemiol. 1994;11:463–472. doi: 10.1002/gepi.1370110603. [DOI] [Google Scholar]

[CR22] Neale MC, Hunter MD, Pritikin JN, Zahery M, Brick TR, Kirkpatrick RM, Estabrook R, Bates TC, Maes HH, Boker SM. OpenMx 2.0: extended structural equation and statistical modeling. Psychometrika. 2016;81:535–549. doi: 10.1007/s11336-014-9435-8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR23] Ohlsson H, Kendler KS. Applying causal inference methods in psychiatric epidemiology: a review. JAMA Psychiatry. 2020;77:637–644. doi: 10.1001/jamapsychiatry.2019.3758. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR24] Pasman JA, Demange PA, Guloksuz S, Willemsen AHM, Abdellaoui A, ten Have M, Hottenga J-J, Boomsma DI, de Geus E, Bartels M, de Graaf R, Verweij KJH, Smit DJ, Nivard M, Vink JM. Genetic risk for smoking: disentangling interplay between genes and socioeconomic status. Behav Genet. 2022;52:92–107. doi: 10.1007/s10519-021-10094-4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR25] Polderman TJC, Benyamin B, de Leeuw CA, Sullivan PF, van Bochoven A, Visscher PM, Posthuma D. Meta-analysis of the heritability of human traits based on fifty years of twin studies. Nat Genet. 2015;47:702–709. doi: 10.1038/ng.3285. [DOI] [PubMed] [Google Scholar]

[CR26] R Core Team (2021) R: a language and environment for statistical computing

[CR27] Rasmussen SHR, Ludeke S, Hjelmborg JVB. A major limitation of the direction of causation model: non-shared environmental confounding. Twin Res Hum Genet Off J Int Soc Twin Stud. 2019;22:14–26. doi: 10.1017/thg.2018.67. [DOI] [PubMed] [Google Scholar]

[CR28] Rokholm B, Silventoinen K, Tynelius P, Gamborg M, Sørensen TIA, Rasmussen F. Increasing genetic variance of body mass index during the Swedish obesity epidemic. PLoS ONE. 2011 doi: 10.1371/journal.pone.0027135. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR29] Scherrer JF, Xian H, Bucholz KK, Eisen SA, Lyons MJ, Goldberg J, Tsuang M, True WR. A twin study of depression symptoms, hypertension, and heart disease in middle-aged men. Psychosom Med. 2003;65:548–557. doi: 10.1097/01.PSY.0000077507.29863.CB. [DOI] [PubMed] [Google Scholar]

[CR30] Smith AB. The state of research on the effects of physical punishment. Soc Policy J N Z. 2006;27:114–127. [Google Scholar]

[CR31] Sullivan PF, Kendler KS, Neale MC. Schizophrenia as a complex trait: evidence from a meta-analysis of twin studies. Arch Gen Psychiatry. 2003;60:1187–1192. doi: 10.1001/archpsyc.60.12.1187. [DOI] [PubMed] [Google Scholar]

[CR32] Timpson NJ, Nordestgaard BG, Harbord RM, Zacho J, Frayling TM, Tybjærg-Hansen A, Smith GD. C-reactive protein levels and body mass index: elucidating direction of causation through reciprocal Mendelian randomization. Int J Obes. 2011;35:300–308. doi: 10.1038/ijo.2010.137. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR33] van der Sluis S, Dolan CV, Neale MC, Posthuma D. Power calculations using exact data simulation: a UsefulTool for genetic study designs. Behav Genet. 2008;38:202–211. doi: 10.1007/s10519-007-9184-x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR34] Venables WN, Ripley BD, Venables WN. Modern applied statistics with S, statistics and computing. 4. New York: Springer; 2002. [Google Scholar]

[CR35] Verhulst B. A power calculator for the classical twin design. Behav Genet. 2017;47:255–261. doi: 10.1007/s10519-016-9828-9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR36] Verhulst B, Neale MC, Kendler KS. The heritability of alcohol use disorders: a meta-analysis of twin and adoption studies. Psychol Med. 2015;45:1061–1072. doi: 10.1017/S0033291714002165. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR37] Verhulst B, Clark SL, Chen J, Maes HH, Chen X, Neale MC. Clarifying the genetic influences on nicotine dependence and quantity of use in cigarette smokers. Behav Genet. 2021;51:375–384. doi: 10.1007/s10519-021-10056-w. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR38] Verweij KJH, Zietsch BP, Lynskey MT, Medland SE, Neale MC, Martin NG, Boomsma DI, Vink JM. Genetic and environmental influences on cannabis use initiation and problematic use: a meta-analysis of twin studies. Addict Abingdon Engl. 2010;105:417. doi: 10.1111/j.1360-0443.2009.02831.x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR39] Welsh P, Polisecki E, Robertson M, Jahn S, Buckley BM, de Craen AJM, Ford I, Jukema JW, Macfarlane PW, Packard CJ, Stott DJ, Westendorp RGJ, Shepherd J, Hingorani AD, Smith GD, Schaefer E, Sattar N. Unraveling the directional link between adiposity and inflammation: a bidirectional Mendelian randomization approach. J Clin Endocrinol Metab. 2010;95:93–99. doi: 10.1210/jc.2009-1064. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR40] Wu S-H, Neale MC, Acton AJ, Jr, Considine RV, Krasnow RE, Reed T, Dai J. Genetic and environmental influences on the prospective correlation between systemic inflammation and coronary heart disease death in male twins. Arterioscler Thromb Vasc Biol. 2014;34:2168. doi: 10.1161/ATVBAHA.114.303556. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR41] Zhang S, Liu,Xin,Necheles J, Tsai H-J, Wang G, Wang B, Xing H, Li Z, Liu,Xue,Zang T, Xu X, Wang X. Genetic and environmental influences on serum lipid tracking: a population-based, longitudinal chinese twin study. Pediatr Res. 2010;68:316–322. doi: 10.1203/PDR.0b013e3181eeded6. [DOI] [PubMed] [Google Scholar]

PERMALINK

MR-DoC2: Bidirectional Causal Modeling with Instrumental Variables and Data from Relatives

Luis F S Castro-de-Araujo

Madhurbain Singh

Yi Zhou

Philip Vinh

Brad Verhulst

Conor V Dolan

Michael C Neale

Abstract

Supplementary Information

Introduction

Fig. 1.

Methods

Fig. 2.

Table 2.

Table 3.

Table 5.

Simulation procedure

Table 4.

Table 1.

Results

Power

Fig. 3.

Discussion

Supplementary Information

Author contributions

Funding

Data availability

Code availability

Declarations

Conflict of interest

Human and Animal Rights and Informed consent

Footnotes

Contributor Information

References

Associated Data

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases