Abstract
Much research effort is invested in attempting to determine causal influences on disease onset and progression to inform prevention and treatment efforts. However, this is often dependent on observational data that are prone to well-known limitations, particularly residual confounding and reverse causality. Several statistical methods have been developed to support stronger causal inference. However, a complementary approach is to use design-based methods for causal inference, which acknowledge sources of bias and attempt to mitigate these through the design of the study rather than solely through statistical adjustment. Genetically informed methods provide a novel and potentially powerful extension to this approach, accounting by design for unobserved genetic and environmental confounding. No single approach will be free from bias. Instead, we should seek and combine evidence from multiple methodologies that each bring different (and ideally uncorrelated) sources of bias. If the results of these different methodologies align (or triangulate), then we can be more confident in our causal inference. To be truly effective, this should ideally be done prospectively, with the sources of evidence specified in advance, to protect against one final source of bias: our own cognitions, expectations, and fondly held beliefs.
[I]n a question like this truth is only to be had by laying together many varieties of error.
—Virginia Woolf, A Room of One's Own
The ongoing debate regarding the robustness of many published research findings highlights some of the challenges inherent in science: How can we know that we have the right answer? Certainly there is growing evidence that a number of factors may conspire to reduce confidence in any given result—sample sizes that are too small to provide precise estimates, overenthusiastic interrogation of data to achieve sufficiently small P-values to enable publication, failure to conduct replication studies or report null results, and so on (Munafò et al. 2017). However, even if all of these issues were to be addressed—if studies were sufficiently well powered, routinely subject to replication, analyzed according to prespecified analysis plans, and transparently reported—this would not be enough. We might achieve far more robust observations, but we still might not know that we are right in the inferences we draw from those observations. This is simply because results may be the artifact of various biases—in the study design or the data—that mean that the apparent explanation for the observation is not the correct one.
This issue is particularly important in the context of causal inference as opposed to prediction (see Box 1; Shmueli 2010). Well-described problems of reverse causality and residual confounding mean that, in observational epidemiology, we can never be certain that our observations represent true causal pathways. Whether we simply want to determine whether a causal pathway is operating, or to quantify a causal effect (i.e., estimate the magnitude of the effect correctly), conventional observational epidemiology has a long history of providing very precise but often wrong answers (Egger et al. 1998; Gage et al. 2016). Many associations are genuine and robust, but do not represent causal pathways. In other words, the apparent precision offered by the application of statistical tools may lead us astray and tempt us into ignoring the causal uncertainty that remains after the application of any single method. Ultimately, inference remains a matter of judgment, and this requires a broader perspective (Gigerenzer et al. 1989).
BOX 1.
Prediction and causal inference may appear superficially similar; both may rely on regression-based approaches to identify an association between X and Y. However, they differ in the fundamental question that each attempts to answer.
In the case of prediction, we are interested in obtaining the best predictions given the data. We want to predict one variable, and how we model this (i.e., what variables are included) is not of particular interest; all that matters is how well the model predicts the outcome. Polygenic scores have proved to be valuable in a prediction context (Lewis and Vassos 2017).
In the case of causal inference, we are interested in identifying factors that causally affect the outcome, and perhaps in precisely estimating the magnitude of that causal effect. In other words, we want to understand the relationship between two variables: the exposure and the outcome.
In the former, multiple variables are evaluated in terms of whether they add predictive ability over other variables, not whether they are causal. In the latter, the same variables may be problematic because they may produce spurious associations, or distort the magnitude of true causal effects between the exposure and outcome of interest (i.e., are confounding variables).
The relative value of data analyses and study design in causal inference has been (somewhat rhetorically) debated over many years (Robins 1999; Rosenbaum 1999; Rubin 2008; Pearl 2009). While there is a clear distinction at the extremes of the continuum—from performing multivariable analyses of a dataset that happens to be available to carrying out a meticulous randomized controlled trial—many inferential methods lie between these two extremes. An important aspect of inference relates to the power of incorporating exogenous perturbators into study design, such as the use of policy changes or the causal anchoring of transmitted genotypes (Davey Smith 2012). Many writers have suggested that considering a formal causal inference framework will disclose those situations in which a particular method is unable to produce reliable inference, because the data are inadequate or a defensible directed acyclic graph (DAG) cannot be drawn (Tennant et al. 2019).
However, it is rare to see this reported as a reason why any particular inference strategy has not been applied. An example relates to the highly replicable inverse association of circulating high-density lipoprotein cholesterol (HDL-c) and coronary heart disease (CHD), which was seen for every data analytical strategy that was applied (Davey Smith and Phillips 2020), placing it among the most robust epidemiological findings. However, it was argued that conventional analysis could not provide an accurate inference given the very high correlation between HDL-c and other potential causes of CHD (Phillips and Davey Smith 1991). Subsequent to this, many large randomized controlled trials testing a variety of HDL-c raising agents produced universally null findings in relation to CHD, as did Mendelian randomization (MR) studies (Davey Smith and Phillips 2020). Here, use of exogenous perturbators produced findings in strong agreement, but distinct from those produced by a range of data analytic strategies. We recognize that there is no clear dividing line between analysis and design strategies, and that this is a contested area that our intervention will not resolve. However, we maintain that it is a useful heuristic when presenting a broad overview of triangulation as it relates to strategies in causal inference.
There is a widespread notion that falsification represents the cornerstone of science. In fact, few studies are explicitly designed to falsify a theory or proposition. Scientists follow a process closer to what Peter Lipton described as “inference to the best explanation” (Lipton 2004; Munafò and Davey Smith 2018). This rests on abductive, rather than deductive, reasoning. It is linked to the notion of consilience proposed by Whewell (1840), which is essentially the idea that strong theories emerge from the integration of multiple lines of evidence. Triangulation as an approach to inference takes this as a starting point, but is more deliberate; it intentionally (and ideally prospectively) uses different methodologies, each one of which brings unique (and ideally unrelated) sources of bias, and applies the most rigorous standards of causal inference both within individual approaches and in the integration of evidence across these.
The application of triangulation within the context of etiological epidemiology has been presented in detail elsewhere (Lawlor et al. 2016), and its relationship with other approaches to causal inference discussed (Krieger and Davey Smith 2016). Here we briefly review the logic of triangulation, and provide specific examples of study designs that can be used in a triangulation framework, with a particular emphasis on genetically informed designs. We argue that adding genetics into a triangulation framework has particular value because of the plausibility that any biases associated with these approaches will be different from those of other study design approaches. We then discuss the potential for quantitative triangulation, and how to resolve a lack of convergence of results from different designs, and conclude with future directions, including a brief review of emerging genetically informed methods that may be able to further contribute to triangulation approaches.
TRIANGULATION IN THEORY
In etiological epidemiology, triangulation rests on “… integrating results from several different approaches” (Munafò and Davey Smith 2018). However, this integration can in reality be done either retrospectively or prospectively. The former, arguably, is what scientists have always done when selecting, for the discussion of their own research, evidence that supports the inferences they wish to make (although not necessarily considering the sources, nature, and direction of possible bias). This, of course, is itself prone to strong confirmation bias, that is, the tendency (consciously or unconsciously) to seek out, or selectively emphasize, information that conforms with our preexisting beliefs (Nickerson 1998). Therefore, for triangulation to be valuable we should, as Bernoulli argued, “… pay attention not only to those arguments that serve to prove a thing, but also to all those that can be adduced for the contrary, so that, when both groups have been properly weighed, it may be established which arguments preponderate” (Bernoulli 2006; Weisberg 2014).
Triangulation, therefore, should be done prospectively; it is a design consideration, rather than merely the selective use of evidence from different sources after the fact. It requires the use of at least two different methodologies with differing key sources of potential bias. Because biases will lead to either the overestimation or the underestimation of causal effects, these sources of bias should differ in the direction of bias they introduce. The results from these methodologies are then compared. If they align and support the same conclusion, we can be more confident in that conclusion, whereas if they differ, we need to attempt to better understand the sources of bias and how they are influencing the observed results (Fig. 1). For each methodology, we need to acknowledge key sources of bias, and consider their potential impact so that we can be explicit about the expected direction of all sources of potential bias wherever possible (Munafò and Davey Smith 2018).
As discussed above, the different approaches that are used to support causal inference may be broadly divided into two kinds: those that use analytical (i.e., statistical) methods to deal with confounding and other sources of bias to arrive at a causal estimate, and those that use design-based approaches to do so (see Table 1).
Table 1.
Study design | Strengths versus statistical adjustment | Sources of bias/assumptions | Examples |
---|---|---|---|
Statistical adjustment for confounders | Not applicable | No unmeasured confounders; no measurement error in assessed confounders; correctly specified model (this applies to all methods to a greater or lesser degree) | Vitamin E supplement use as an exposure appearing to be strongly protective against coronary heart disease (CHD) (Rimm et al. 1993; Stampfer et al. 1993). |
Randomized controlled trials | Randomization leads to no systematic confounding | The intervention does not have effects except through changes in the exposure of interest | Vitamin E supplement use, despite apparently strong and replicable evidence from naive observational analyses, did not lower CHD risk in a large number of randomized controlled trials (RCTs) (Eidelman et al. 2004) |
Natural experiments | No systematic confounding | Differences on characteristics that may confound any observed association, or misclassification of exposure that relates to the naturally occurring exposure | Raising of the school leaving age in the United Kingdom is associated with reduced cigarette smoking and a number of related adverse health outcomes (Davies et al. 2018b) |
Instrumental variables (including Mendelian randomization) | No systematic confounding | Instrument not truly associated with the exposure (or only weakly associated, leading to weak instrument bias), is associated with confounders, or the exclusion restriction criterion is violated | Using genetic variants associated with homocysteine levels, no association with CHD is found, consistent with results of RCTs (Clarke et al. 2012) |
Multiple control groups/cross-context comparison | Reveals context-specific confounding | Assumes that the sources of bias in the different groups are different, and would produce different associations, which may not be directly testable | Breastfeeding is differentially socially patterned in high versus low-/middle-income countries, but consistently associated with offspring intelligence (Brion et al. 2011) |
Positive and negative controls | Reveals existence of potential unmeasured confounding | Assumes (in the case of negative controls) that the exposure cannot plausibly be associated with the outcome (or vice versa), which is not directly testable | Smoking is robustly associated with suicide even after adjustment, which is plausibly causal, but is also equally strongly associated with homicide, which is not (Davey Smith et al. 1992) |
Discordant siblings | Robust to fixed maternal effects that could confound the association | Assumes that any misclassification of the exposure or the outcome is similar across siblings, and there is little or no individual-level confounding | When siblings discordant for maternal smoking in pregnancy are compared, this does not appear to influence tobacco dependence in adult offspring (Thapar et al. 2009) |
Children of twins [a] | Robust to genetic confounding (for between-MZ maternal twin pair analysis), or confounding by other factors shared between MZ twins | Assumes no unmeasured confounding by factors that differ between twins | When MZ twins discordant for smoking are compared, this indicates a causal effect of smoking during pregnancy on offspring birth weight (Magnus et al. 1985) |
MZ, monozygotic; DZ, dizygotic.
[a] Using this method, a comparison of between-MZ and between-DZ twin analyses also allows estimation of the extent of genetic confounding.
Multivariable regression analysis is the most commonly used analytical approach, and requires the assumption that all confounders are accurately measured and controlled for (which in practice is rarely, if ever, true), that participation in the study is not influenced by factors that may generate spurious associations (i.e., selection bias), and that misclassification of the exposure is not related to the outcome (and vice versa). In practice, it is difficult to test that these assumptions hold. More importantly, unmeasured or residual confounding is likely to be extremely common, if only because measurement of confounders is never without error. Extensions to this analytical approach include matching methods (Stuart 2010), g methods, and target trial emulation (Hernán and Robins 2016).
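The point that residual confounding can survive adjustment whenever a confounder is measured with error can be illustrated with a small simulation. The sketch below is illustrative only: the data-generating model, the variable names, and the effect sizes are assumptions chosen for the example and are not taken from any study cited here. The exposure has no causal effect on the outcome, yet adjusting for an error-prone measure of the confounder leaves a clear non-null association.

```python
import random

def ols_two_predictors(x1, x2, y):
    """Return (b1, b2) from an OLS regression of y on x1 and x2 with an intercept."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det

def ols_slope(x, y):
    """Return the slope from a simple regression of y on x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

rng = random.Random(1)  # fixed seed so the sketch is reproducible
n = 20000

# Hypothetical model: a confounder drives both exposure and outcome,
# and the exposure has NO causal effect on the outcome.
confounder = [rng.gauss(0, 1) for _ in range(n)]
exposure = [u + rng.gauss(0, 1) for u in confounder]
outcome = [u + rng.gauss(0, 1) for u in confounder]
# The confounder as an investigator would observe it: with measurement error.
confounder_measured = [u + rng.gauss(0, 1) for u in confounder]

naive = ols_slope(exposure, outcome)                                          # about 0.5
adjusted_true = ols_two_predictors(exposure, confounder, outcome)[0]          # about 0
adjusted_noisy = ols_two_predictors(exposure, confounder_measured, outcome)[0]  # about 1/3

print(f"unadjusted: {naive:.3f}")
print(f"adjusted for true confounder: {adjusted_true:.3f}")
print(f"adjusted for mismeasured confounder: {adjusted_noisy:.3f}")
```

Under these assumed variances, adjustment for the mismeasured confounder removes only part of the spurious association, and the remaining estimate would still be reported with a narrow confidence interval in a sample of this size.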
Design-based approaches do not rely on the assumption that all confounders are accurately measured and controlled for, and instead attempt to address confounding principally through design, rather than statistical adjustment. Examples include the use of multiple control or comparison groups, positive and negative controls, and cross-contextual comparisons (which can help to understand the nature of any confounding), the use of instrumental variables, MR, regression discontinuity, and between-sibling designs (which attempt to reduce or eliminate confounding), and randomized controlled trials (i.e., experimental methods). We describe these below. These approaches still make assumptions of course, and are subject to potential bias, but in triangulation these are accepted and acknowledged, and offset by the use of multiple approaches (including analytical approaches).
Other sources of evidence may include evidence from model systems (e.g., studies in rodents), but these may be too far removed to allow direct comparison with studies in humans. Nevertheless, in some cases, so-called incommensurable sources of evidence such as this can be powerful; for example, the demonstration that low-density lipoprotein (LDL) cholesterol contributes to the development of atherosclerosis in model systems helped to confirm its role in human cardiovascular disease (Emini Veseli et al. 2017). Both analytical and study design methods can also apply sensitivity analyses that may identify bias, or demonstrate robustness (Box 2).
BOX 2.
Maternal smoking during pregnancy has been associated with offspring birth weight for many decades (Lowe 1959); but, as with smoking and other health outcomes, the causal nature of this association was disputed. Here we are not reviewing the (interesting) history of this debate; rather, we document how findings from various approaches to strengthening causal inference can be triangulated to produce an overall evidence base that is considerably more robust than that from any individual study alone (Davey Smith 2008; Richmond et al. 2014).
- Observational studies: A large number of observational studies have demonstrated that maternal smoking during pregnancy is associated with birth weight of offspring.
- Cross-contextual comparisons: Observational studies carried out in different contexts, in which the confounding of maternal smoking with socioeconomic position and related factors differs, consistently demonstrate the same direction and approximate magnitude of association when they are adequately powered.
- Negative control studies: Negative control studies using paternal smoking as a factor that may be associated with confounders to the same degree as maternal smoking, but cannot have a direct intrauterine effect of the same magnitude, demonstrate considerably larger associations of maternal than paternal smoking with offspring birth weight (Davey Smith 2008). A second negative control, maternal smoking either before or after pregnancy but not during pregnancy, does not relate to offspring birth weight to the same extent as maternal smoking during pregnancy.
- Within-sibship studies: Studies of mothers who smoked during at least one pregnancy, and did not smoke during at least one other pregnancy, find average birth weight differences between the offspring born following maternal smoking and their siblings who were not exposed to antenatal smoke (Kuja-Halkola et al. 2014).
- Children of twins: Comparison of the offspring of female monozygotic (MZ) twin pairs in which one mother smokes and the other does not finds lower birth weight for the offspring of the former (Magnus et al. 1985).
- Mendelian randomization (MR): A genetic variant that relates to heavier smoking carried by mothers is associated with lower birth weight of offspring. This association is limited to mothers who smoke, strongly suggesting that the effect of the variant is due to its influence on maternal smoking (Tyrrell et al. 2012). MR can be conceptualized within the instrumental variables (IVs) framework. An IV is a measure that relates to the exposure of interest and is only related to the outcome of interest through this association (i.e., the IV is not associated with confounding factors and the IV has no direct influence on the outcome).
- Nongenetic IVs: Other nongenetic IVs for maternal smoking behavior can be used. For example, one study used different levels of cigarette taxes across the United States, which influence smoking levels, and demonstrated a birth weight–lowering effect of smoking (Evans and Ringel 1999).
- Randomized controlled trials (RCTs): In RCTs, a group of mothers who are smoking before or during pregnancy are randomized to an intervention aimed at reducing smoking. There is evidence that such an intervention leads to higher birth weight among offspring of the mothers randomized to the intervention (Hamilton 2001).
Box 2 data reproduced with permission from Krieger and Davey Smith (2016).
Critically, methods that rely on statistical adjustment to control confounding are likely to have similar (or at least related) sources of bias when applied to a single dataset, or to similar datasets (e.g., cohorts drawn from the same population, so that confounding structures are likely to be similar). In contrast, design-based approaches (including the use of analytical methods across different datasets drawn from different populations) are more likely to have different sources and, potentially, directions of bias. Across design-based approaches, the aim is to select approaches or settings in which biases can be expected to be orthogonal, meaning that knowledge of the true bias present in one approach is uninformative about the nature of the bias present in the alternative approach. Therefore, if both give highly similar results, we can be more confident that the result is correct, simply because the chance that those very different potential sources of bias would align to give similar results is presumably small. By extension, the more methods we can triangulate, the more confident we can be in our results.
STUDY DESIGNS USED IN TRIANGULATION
Various threats to validity may result in studies failing to reveal causal effects accurately (Matthay and Glymour 2020). Conventional observational epidemiological methods that use statistical adjustment can certainly be used in a triangulation framework, but should be complemented by one or more additional methods that rely on design-based approaches to address potential confounding.
Randomized Controlled Trials
The RCT is often considered the gold standard for causal inference, and represents the most common approach that uses study design as the basis for causal inference. Nevertheless, an RCT is also still prone to potential bias, because it rests on the assumption that the groups are similar except with respect to the intervention. This assumption may be violated, for example, due to failure to implement successful randomization (e.g., through lack of concealment of the random allocation), and differential loss to follow-up between groups. And, of course, an RCT is not always ethical or feasible. However, other design-based methods that can be applied to observational data exist.
Instrumental Variables
An instrumental variable is a variable that is robustly associated with an exposure of interest, but is not confounded with the outcome, and has no direct effect on the outcome (Angrist and Pischke 2009). The key assumptions are therefore that the instrument is robustly associated with the exposure (the relevance criterion), is not associated with confounders (the exchangeability criterion), and is not associated with the outcome other than via its association with the exposure (the exclusion restriction criterion). Potential sources of bias include the instrument not truly being associated with the exposure (see Fig. 2), the presence of confounding, or the exclusion restriction criterion being violated. If the association of the instrument with the exposure is weak, this may lead to so-called weak instrument bias (Davies et al. 2018a). Natural experiments (see below) are sometimes considered as an instrumental variable approach, but here we describe them separately.
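The logic of the three criteria can be made concrete with a simulated example. The following is a minimal sketch under assumed conditions: the instrument is valid by construction (relevant, independent of the confounder, and with no direct effect on the outcome), and all names and effect sizes are hypothetical. It uses the ratio (Wald) estimator, which divides the instrument-outcome covariance by the instrument-exposure covariance; this is one common instrumental variable estimator, chosen here for simplicity.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)

rng = random.Random(2)  # fixed seed so the sketch is reproducible
n = 20000
true_effect = 0.3  # assumed causal effect of exposure on outcome

instrument = [rng.gauss(0, 1) for _ in range(n)]
confounder = [rng.gauss(0, 1) for _ in range(n)]  # unobserved in practice
# Relevance: the instrument shifts the exposure.
exposure = [0.5 * z + u + rng.gauss(0, 1) for z, u in zip(instrument, confounder)]
# Exclusion restriction: the instrument enters the outcome only through the exposure.
outcome = [true_effect * x + u + rng.gauss(0, 1) for x, u in zip(exposure, confounder)]

# Conventional regression slope, biased upward by the unobserved confounder.
naive_estimate = cov(exposure, outcome) / cov(exposure, exposure)
# Ratio (Wald) estimator.
iv_estimate = cov(instrument, outcome) / cov(instrument, exposure)

print(f"naive: {naive_estimate:.3f}, IV: {iv_estimate:.3f}, truth: {true_effect}")
```

In this setup the naive slope is far from the assumed effect while the ratio estimate recovers it, at the cost of lower precision. Shrinking the coefficient of 0.5 toward zero makes the denominator unstable, which is the weak instrument problem described above.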
Natural Experiments
Natural experiments compare populations who experience a “natural” exposure, leading to “quasi-random” exposure. One example is where exposure is determined by whether a particular continuous variable exceeds a specific threshold (leading to a regression discontinuity analysis) (Shadish et al. 2001; Morgan 2013). The key assumption is that the populations compared are similar (e.g., with respect to the underlying confounding structure) except for the naturally occurring exposure. Comparing potential confounding variables either side of the “exposure” within a regression discontinuity analysis can be used to explore this assumption (although it can never be tested with certainty). Potential sources of bias include differences in characteristics that may confound any observed association, or differential misclassification of exposure according to the level of the naturally occurring exposure. It is worth noting that, in many cases, these exogenous perturbators (e.g., natural disasters) may influence classification of the outcome, which will bias estimates. Steiner and colleagues present a range of DAGs for a number of quasi-experimental designs, including regression discontinuity and instrumental variables (Steiner et al. 2017).
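A regression discontinuity analysis can be sketched in a few lines. The example below is an illustration under assumed conditions, not a reproduction of any cited analysis: exposure switches on when a hypothetical running variable crosses a cutoff, the outcome otherwise varies smoothly with the running variable, and the effect is estimated as the gap between two straight lines fitted within a bandwidth on either side of the cutoff. The bandwidth, the linear fits, and all names are assumptions made for the example.

```python
import random

def fit_line(x, y):
    """Return (intercept, slope) from a simple least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

rng = random.Random(3)  # fixed seed so the sketch is reproducible
n = 20000
cutoff = 0.0
bandwidth = 0.5   # assumed window around the cutoff used for estimation
true_jump = 1.0   # assumed causal effect of being above the cutoff

running = [rng.uniform(-1, 1) for _ in range(n)]
outcome = [
    2.0 * r + (true_jump if r >= cutoff else 0.0) + rng.gauss(0, 1)
    for r in running
]

# Keep observations within the bandwidth, centred on the cutoff.
below = [(r - cutoff, y) for r, y in zip(running, outcome) if cutoff - bandwidth <= r < cutoff]
above = [(r - cutoff, y) for r, y in zip(running, outcome) if cutoff <= r <= cutoff + bandwidth]

intercept_below, _ = fit_line([r for r, _ in below], [y for _, y in below])
intercept_above, _ = fit_line([r for r, _ in above], [y for _, y in above])

# The estimated effect is the difference between the two fitted lines at the cutoff.
rd_estimate = intercept_above - intercept_below
print(f"estimated jump at cutoff: {rd_estimate:.3f} (assumed truth: {true_jump})")
```

The same two-sided fit can be applied to a measured covariate in place of the outcome; an apparent jump in a covariate at the cutoff would suggest that the populations either side are not comparable.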
Different Confounding Structures
Consistency of observed associations across samples with different confounding structures can raise confidence that the association reflects a causal effect (Brion et al. 2011). One approach to this is to use multiple control groups within a case-control design, where bias generated by the control groups is anticipated to be in different directions (Yoon et al. 2011). This assumes that the sources of bias in the control groups are different, and would produce different associations. A related approach is the use of cross-context comparisons, where results across multiple populations in which confounding structures are different are compared. This assumes that the bias introduced by confounding will be different across contexts, so that congruent results are more likely to reflect causal effects (Brion et al. 2011).
Positive and Negative Controls
The use of positive and negative controls—common in preclinical experimental research—can be applied to both exposures and outcomes (Davey Smith 2008; Lipsitch et al. 2010). A positive control exposure is one that is known to be causally related to the outcome, and can be used to ensure the population sampled generates credible associations that would be expected. In contrast, a positive control outcome is one that is known to be causally related to the exposure. For example, smoking and lung cancer can be used, with smoking as a positive control exposure in a lung cancer outcome study, or lung cancer as a positive control outcome in a smoking exposure study.
A negative control exposure, on the other hand, is one that is not plausibly causally related to the outcome, while a negative control outcome is one that is not plausibly causally related to the exposure. For example, smoking is robustly associated with risk of suicide, which is plausibly causal, but is also equally strongly associated with homicide, which is not. Here, risk of homicide constitutes a negative control outcome, and the observed association of smoking and homicide therefore casts doubt on a causal interpretation of the observed association of smoking and suicide, particularly given that they are similar in magnitude (Davey Smith et al. 1992).
This approach has been used quite widely in the context of intrauterine exposures, using paternal behavior (e.g., smoking) as a negative control exposure for the exposure of interest (in this case, maternal smoking during pregnancy), on the assumption that confounding structures will be similar for both parents, but any biological effects of paternal smoking on the developing fetus will be negligible compared with any effects of maternal smoking (see below).
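The reasoning behind a negative control exposure can be illustrated by simulation. The sketch below assumes a deliberately symmetric and hypothetical model: a shared family-level confounder influences maternal and paternal smoking equally, only maternal smoking has a causal effect on the outcome, and both exposures are entered in the same regression. The mutual adjustment, the effect sizes, and the variable names are assumptions for illustration.

```python
import random

def ols_two_predictors(x1, x2, y):
    """Return (b1, b2) from an OLS regression of y on x1 and x2 with an intercept."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det

rng = random.Random(4)  # fixed seed so the sketch is reproducible
n = 20000
true_maternal_effect = -0.5  # assumed intrauterine effect (arbitrary units)

family_confounder = [rng.gauss(0, 1) for _ in range(n)]  # unobserved in practice
maternal = [f + rng.gauss(0, 1) for f in family_confounder]
paternal = [f + rng.gauss(0, 1) for f in family_confounder]  # no causal effect
birth_weight = [
    true_maternal_effect * m - 0.3 * f + rng.gauss(0, 1)
    for m, f in zip(maternal, family_confounder)
]

b_maternal, b_paternal = ols_two_predictors(maternal, paternal, birth_weight)
print(f"maternal: {b_maternal:.3f}, paternal (negative control): {b_paternal:.3f}")
```

In this symmetric setup the paternal coefficient (about -0.1) reflects confounding alone, and the maternal coefficient (about -0.6) is the assumed causal effect plus a similar amount of residual confounding. A maternal association clearly larger than the paternal one is therefore what a genuine intrauterine effect would be expected to produce, provided the assumption of similar confounding for both parents holds.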
Discordant Siblings
A related approach is the use of siblings discordant for an exposure (Keyes et al. 2013). For example, two siblings born to a mother who smoked during one pregnancy but not the other provide information on the intrauterine effects of tobacco exposure, while controlling for observed and unobserved familial confounding (i.e., shared environmental confounders and 50% of genetic confounding). An extension of this approach is the use of MZ twins within a discordant-sibling design, for example, two MZ twins, one who smokes and one who does not. This controls for 100% of genetic confounding (Keyes et al. 2013; McAdams et al. 2020). These approaches assume that any misclassification of the exposure or the outcome is similar across siblings, and there is little or no individual-level confounding. Sibling studies cannot account for nonfamilial (i.e., nonshared) environmental confounding (e.g., MZ twins discordant for smoking may still differ in terms of other health behaviors such as alcohol consumption), so that statistical adjustment for these observed sources of potential confounding may still be important (Pingault et al. 2018). These approaches can be used in conjunction with genetic data (see below).
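How a within-family comparison removes shared confounding can be shown with a short simulation. This is a sketch under assumed conditions: each hypothetical family contributes two siblings, a family-level factor shared by both affects exposure and outcome, and there is no individual-level confounding. Differencing within sibling pairs cancels the shared factor. All names and effect sizes are illustrative.

```python
import random

def slope(x, y):
    """Slope from a simple least-squares regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

rng = random.Random(5)  # fixed seed so the sketch is reproducible
n_families = 10000
true_effect = -0.2  # assumed causal effect of exposure on outcome

pooled_x, pooled_y, diff_x, diff_y = [], [], [], []
for _ in range(n_families):
    shared = rng.gauss(0, 1)  # familial confounder, identical for both siblings
    siblings = []
    for _ in range(2):
        x = shared + rng.gauss(0, 1)
        y = true_effect * x + 2.0 * shared + rng.gauss(0, 1)
        siblings.append((x, y))
        pooled_x.append(x)
        pooled_y.append(y)
    # Within-pair differences: the shared familial term cancels out.
    diff_x.append(siblings[0][0] - siblings[1][0])
    diff_y.append(siblings[0][1] - siblings[1][1])

# Ignoring family structure: confounded by the shared factor.
naive_estimate = slope(pooled_x, pooled_y)
# Discordant-sibling estimate: regression of differences through the origin.
within_estimate = sum(a * b for a, b in zip(diff_x, diff_y)) / sum(a * a for a in diff_x)

print(f"naive: {naive_estimate:.3f}, within-sibship: {within_estimate:.3f}")
```

Here the pooled estimate even has the wrong sign, whereas the within-pair estimate recovers the assumed effect. Adding an individual-level confounder to the simulation would bias the within-pair estimate too, which is the limitation noted above for nonshared confounding.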
Incommensurable Evidence
Another class of approach to causal inference that can be applied within a triangulation framework is so-called incommensurable sources of evidence (Hey 2015). These are fundamentally different sources of evidence, including in vitro studies and studies of cell systems or animal models. These methodologies may allow the experimental manipulation of exposures where an RCT in a human sample would not be ethical or feasible, and can be particularly powerful when interrogating mechanistic pathways between an exposure and an outcome. However, these rely on the assumption that the results obtained (e.g., in a rodent model) will apply in human populations.
TRIANGULATION IN GENETICALLY INFORMED DESIGNS
Many of the study designs described above are reasonably widely used (some more than others). However, with the advent of genomic data in large cohort studies, a range of related methods that leverage the properties of genetic inheritance have been developed to aid causal inference (Pingault et al. 2018). Many are variants of the methods described above, but the incorporation of genomic information can address some of the biases that impact on the basic nongenetic methods, or provide alternative approaches that provide a richer portfolio of designs that lend themselves to triangulation. This is very much an area under active development, leading in some cases to different analytical approaches within a single overarching study design method that in turn allows triangulation within a method (e.g., various forms of pleiotropy-robust MR methods; see below), as well as across methods (e.g., other analysis- and design-based approaches).
Mendelian Randomization
MR is now a widely used genetically informed design-based method for causal inference, which is often implemented through an instrumental variable analysis (relying on the same three core assumptions) (see Fig. 2; Richmond and Davey Smith 2020), where genetic variants serve as the instrumental variables (or proxies) for the exposure of interest (Davey Smith and Ebrahim 2003; Davies et al. 2018a). Violation of the exclusion restriction criterion due to horizontal (or biological) pleiotropy, that is, the direct effect of genetic variants on multiple phenotypes, is the main likely source of bias, and for this reason a number of extensions to the foundational method have been developed that are robust to horizontal pleiotropy (Hemani et al. 2018; Hekselman and Yeger-Lotem 2020). Population stratification is another potential source of bias, which may require focusing on an ethnically homogeneous population, adjusting for genetic principal components that reflect different population subgroups, or using family-based designs (see below). MR has been described in detail elsewhere, including the now wide range of pleiotropy-robust methods that allow for triangulation within MR designs (Davies et al. 2018a).
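When several genetic variants are available, per-variant ratio estimates are often combined. The sketch below shows one widely used way of doing this with summary statistics, a fixed-effect inverse-variance weighted (IVW) estimate with first-order weights. The three variants and all numbers are invented for illustration, and the calculation assumes that every variant is a valid instrument, so it offers no protection against horizontal pleiotropy.

```python
import math

# Hypothetical summary statistics for three independent variants:
# variant-exposure effect, variant-outcome effect, and SE of the variant-outcome effect.
beta_exposure = [0.10, 0.20, 0.15]
beta_outcome = [0.028, 0.066, 0.045]
se_outcome = [0.010, 0.020, 0.015]

# Per-variant ratio (Wald) estimates of the causal effect.
ratio_estimates = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]

# Fixed-effect IVW estimate: a weighted average of the ratios,
# with weights (beta_exposure / se_outcome)^2.
weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
ivw_estimate = sum(w * r for w, r in zip(weights, ratio_estimates)) / sum(weights)
ivw_se = math.sqrt(1.0 / sum(weights))

print("ratio estimates:", [round(r, 3) for r in ratio_estimates])
print(f"IVW estimate: {ivw_estimate:.4f} (SE {ivw_se:.4f})")
```

Marked disagreement between the per-variant ratio estimates is one informal signal that some variants may be violating the exclusion restriction, which is the situation the pleiotropy-robust extensions are designed to address.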
Other Extensions
Other extensions to conventional design-based approaches to causal inference exist. Indeed, many of the methods described above can be incorporated into MR designs. This might include the use of negative control outcomes (Sanderson et al. 2020), or the use of MR across contexts (which may require the use of different genetic instruments, given that common variants associated with an exposure may differ in allele frequency across populations, and therefore not be suitable for use in some populations). Multivariable MR methods, which allow the unique contribution of multiple exposures to be examined, and mediating pathways explored, now exist (Sanderson 2020). Family-based designs (including the discordant-sibling approach described above) can be particularly powerful (McAdams et al. 2020; Thapar and Rice 2020). Critically, these methods can incorporate MR. The MR direction of causation (DOC) method, for example, combines the principles of MR with the DOC twin method from biometric genetics (Hwang et al. 2020). Extensions to the MR approach also include the demonstration of how particular exposures influence the biology of the specific tissues relevant to the disease being studied (Taylor et al. 2019; Hekselman and Yeger-Lotem 2020; Porcu et al. 2020), effectively a form of incommensurable evidence.
EXAMPLES OF TRIANGULATION IN PRACTICE
We are now seeing a number of studies published that explicitly use a triangulation framework for causal inference, many of which leverage these genetically informed designs, and their orthogonal biases, for comparison with other study designs. For example, using genetic variants associated with educational attainment as a proxy for years of education in an MR framework may be subject to bias due to horizontal pleiotropy. On the other hand, using the natural experiment created by raising the school leaving age to investigate the causal effect of years of education in a regression discontinuity framework may be subject to the vagaries of other societal changes that happened around the same time. Below we illustrate this with selected examples—of educational attainment, breastfeeding, cigarette smoking, and folate supplementation—that highlight the strength of the approach, and the complementarity of different nongenetic and genetically informed methodologies.
Educational Attainment
The example of years of education, discussed above, performs well when investigating whether educational attainment protects against smoking. MR studies suggest that increased educational attainment reduces the likelihood of smoking initiation, and both reduces heaviness of smoking among smokers and increases the likelihood of smoking cessation (Gage et al. 2018). Multivariable MR methods indicate that this is in fact due to educational attainment, rather than general cognitive ability (Sanderson et al. 2019). Our confidence in the underlying result is strengthened considerably when we see that similar results are obtained when the natural experiment of the raising of the school leaving age from 15 to 16 in 1972 in the United Kingdom is used instead (Davies et al. 2018b).
Breastfeeding
Another example of apparently successful triangulation concerns the effect of breastfeeding on offspring general cognitive ability. In high-income countries, breastfeeding is socially patterned such that more affluent mothers are more likely to breastfeed their offspring. Unsurprisingly, therefore, breastfeeding is associated with higher general cognitive ability. However, in low- and middle-income countries, the social patterning is the opposite, and yet the association between breastfeeding and general cognitive ability remains positive, despite these very different underlying confounding structures (Brion et al. 2011). Our confidence in this result is further strengthened by the positive effect on offspring cognitive ability observed in a trial of breastfeeding promotion (Kramer et al. 2008) and by evidence from siblings discordant for breastfeeding (Evenhouse and Reilly 2005). No single result is itself compelling, but their combination begins to be.
Cigarette Smoking
Smoking, particularly exposure in utero, provides many examples of effective triangulation (see Box 2). Brand and colleagues combined MR (using a single genetic variant associated with heaviness of smoking) and a negative control analysis comparing maternal and paternal smoking during pregnancy (Evenhouse and Reilly 2005). MR analyses showed an effect of smoking in pregnancy on fetal growth, while in the negative control analysis there was no evidence of an association between paternal smoking and fetal growth. Other genetically informed designs support this conclusion, including comparison of siblings discordant for maternal smoking in pregnancy (Kuja-Halkola et al. 2014), and studies of offspring of MZ twin mothers, only one of whom smoked during pregnancy (Magnus et al. 1985). Another, particularly novel, approach to triangulation by Thapar and colleagues (2009) compared offspring conceived via in vitro fertilization, who were either genetically related (fertilized eggs implanted in the social mother) or genetically unrelated (fertilized eggs implanted in a surrogate mother) to the woman who underwent the pregnancy. If smoking in pregnancy has a direct (intrauterine) causal effect on offspring outcomes, the association should be observed in both groups, regardless of whether mother and offspring are related. Smoking in pregnancy was associated with offspring birth weight in both groups. However, attention deficit hyperactivity disorder (ADHD) symptoms were higher only in related pairs, and not in unrelated pairs, suggesting inherited effects rather than a direct causal effect of smoking on this outcome (and confirming the sensitivity of the method). Thapar and colleagues have triangulated results from many methods to show that findings converge to support the conclusion that maternal smoking during pregnancy does not impact offspring ADHD (Thapar and Rice 2020).
Folate Supplementation
Observational evidence suggests an apparent dose–response effect of plasma homocysteine and folate intake—which reduces homocysteine—on CHD, leading some to suggest that “any widespread increase in folate intake will have a favorable impact on CHD rates” (Rimm et al. 1998). However, an initial MR study indicated an adverse effect, but with strong evidence of publication bias (Lewis et al. 2005), while evidence from RCTs was null (Martí-Carvajal et al. 2009). This literature highlights the distorting effects of publication bias against null results. When Clarke and colleagues reviewed and meta-analyzed the unpublished literature on MTHFR genotype (strongly associated with homocysteine levels) and CHD—prospectively identifying large studies that had not yet conducted the necessary MR analyses, or obtained the necessary genotype data to do so—they found that the results aligned with those of RCTs (see Fig. 3), suggesting no causal effect (Clarke et al. 2012). This highlights the importance, where possible, of triangulating evidence from published and unpublished sources, as well as across methodologies (in this case MR studies and RCTs) to support robust inference.
Combining Multiple Approaches
Increasingly, studies are combining results from a range of methodologies, even if only supplementing an approach that uses statistical adjustment to control confounding with a design-based approach, or using multiple data sources that might be subject to different sources of bias (e.g., via selection into the study). For example, Crawford and colleagues present evidence using three methodologies (two prospective nested case-control studies, a meta-analysis of prospective studies, and MR of individual level and summary data) to show that higher morning plasma cortisol levels increase the risk of cardiovascular disease (Crawford et al. 2019). Different genetic methodologies can also be combined. Arumäe and colleagues used MR and twin data to determine the best-fitting causal model for effects of body mass index (BMI) on personality and vice versa (Arumäe et al. 2020). Observational and experimental evidence can also be integrated. Thompson and colleagues used MR (including evidence from a range of sensitivity analyses) to explore the effect of maternal circulating 25(OH)D (vitamin D) and calcium on offspring birth weight, and combined this with evidence from RCTs, treating randomization to supplementation with vitamin D or to calcium as instrumental variables. The authors concluded that maternal circulating 25(OH)D does not influence offspring birth weight, although the evidence for maternal circulating calcium was less clear (Thompson et al. 2019).
Considering Sources of Bias
The example of breastfeeding shows how we are required to think carefully about potential sources of bias when applying a triangulation framework to our inferences. Cross-contextual comparisons can be considered to be an encoding of the causal invariance assumption; that is, while many things may change across situations, causes remain the same. However, each proposition can be wrong in certain circumstances; for example, smoking is most likely invariant in its effect on lung cancer, but the effects of education on BMI may vary (having a positive causal effect in low- and middle-income countries, but a negative causal effect in high-income countries). Similarly, the effect of education on BMI can change over time within the same country. In the past, higher educational attainment was associated with higher BMI in the United Kingdom, whereas now the opposite is true. For upstream causes like education, causation will be context dependent but is no less “causal” because of this.
TRIANGULATION AND MAGNITUDE OF CAUSAL EFFECT
Synthesizing evidence from multiple studies can be challenging, and particularly when each uses a different design. In principle, however, evidence synthesis methods could be applied in the context of triangulation of evidence. Most triangulation examples to date, including those described above, use synthesis approaches that are essentially qualitative. Such qualitative approaches attempt to evaluate informally both the strength and quality of evidence (including the nature and magnitude of potential bias) across different studies. They focus on determining whether a causal effect is present or absent, although typically without quantifying the strength of evidence for this. Crude quantitative approaches may be used to supplement these qualitative syntheses, such as simple vote counting (i.e., how many studies support a conclusion vs. do not support it). However, we caution against vote counting based on statistical significance, since statistical significance might not be reached, even when a causal effect is present, if the sample size is insufficiently large. Slightly more sophisticated is the statistical combination of P-values from the distinct lines of evidence to obtain an overall measure of strength of evidence (Becker 1994). Such an approach has been used particularly in the triangulation of statistically independent inferences arising from a single data source, which have been termed evidence factors (Karmakar et al. 2020).
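To make the contrast between vote counting and the statistical combination of P-values concrete, the sketch below implements Fisher's method, one of the standard combination procedures, using only the Python standard library. The function name and the P-values are hypothetical. The calculation assumes that the P-values are statistically independent and test hypotheses in a consistent direction, which will often not hold for lines of evidence drawn from overlapping data.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P-values with Fisher's method.

    The statistic -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom under the joint null. For even degrees of freedom
    the upper tail has the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals x / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical P-values from three independent lines of evidence.
p_values = [0.08, 0.06, 0.11]

# Vote counting on statistical significance finds no "supporting" study...
votes = sum(p < 0.05 for p in p_values)
# ...whereas the combined evidence against the joint null is stronger.
combined = fisher_combined_p(p_values)
print(f"Studies significant at 0.05: {votes} of {len(p_values)}")
print(f"Fisher combined P-value: {combined:.4f}")
```

The example shows why vote counting on significance can mislead: none of the three hypothetical studies crosses the conventional threshold, yet together they provide noticeably stronger evidence than any one alone.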
A more challenging possibility is to use different study design methods to estimate the magnitude of a causal effect, which we might term quantitative triangulation. While meta-analytic methods are well understood for combining results of multiple studies deemed sufficiently similar and sufficiently free of bias, neither similarity of research question nor freedom from bias might routinely be expected in a triangulation context. Thus, more sophisticated evidence synthesis methods are needed.
It will usually be the case that different approaches target (possibly subtly) different causal effects. For example, MR studies of homocysteine levels and CHD represent long-term effects of circulating levels, whereas RCTs examine short-term effects of folate supplementation. If such different causal effects are to be combined meaningfully, causal models will be needed to connect the quantities being estimated by the different studies involved in the triangulation, potentially leading to the use of multiparameter evidence synthesis (Ades and Sutton 2005).
Because the assumption that different study designs are subject to different biases is central to the triangulation approach, quantitative triangulation will require that these biases are embraced in the synthesis. At their simplest, studies with biases believed to operate in different directions might allow us to put boundaries on the likely magnitude of the causal effect. More ambitiously, Bayesian approaches in which likely biases are represented using prior distributions pose one way forward (Turner et al. 2009), as do methods that estimate or adjust for bias (Greenland 2005). For example, our understanding of how biases impact on the magnitude of causal effect estimates in genetically informed designs such as MR is improving through a combination of theoretical developments, empirical evidence, and simulation studies (Taylor et al. 2014).
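The two ideas in this paragraph can be sketched in a few lines of Python. The first function brackets the causal effect using two estimates believed to be biased in opposite directions. The second and third apply a deliberately simplified additive bias model, in which each estimate is shifted by an assumed mean bias and its variance is inflated by the uncertainty in that bias, before inverse-variance pooling. This is only loosely in the spirit of the bias-adjustment approaches cited above and is not the procedure of any specific paper; all names and numbers are hypothetical, and the estimates are assumed to be on a common scale and to target the same causal effect.

```python
import math


def bracket_effect(estimate_a, estimate_b):
    """Interval spanned by two estimates assumed to be biased in opposite directions."""
    return min(estimate_a, estimate_b), max(estimate_a, estimate_b)


def adjust_for_bias(estimate, se, bias_mean, bias_sd):
    """Shift an estimate by its assumed mean bias and widen its standard error.

    Assumes an additive bias with a normal prior that is independent of
    the sampling error (a strong simplification).
    """
    return estimate - bias_mean, math.sqrt(se ** 2 + bias_sd ** 2)


def pool_inverse_variance(estimates, ses):
    """Fixed-effect inverse-variance weighted average."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


# Hypothetical inputs: (estimate, SE, assumed bias mean, assumed bias SD).
studies = {
    "observational": (0.30, 0.04, 0.10, 0.03),   # assumed biased upward
    "sibling design": (0.15, 0.06, -0.03, 0.04),  # assumed biased downward
}

low, high = bracket_effect(studies["observational"][0], studies["sibling design"][0])
adjusted = [adjust_for_bias(*values) for values in studies.values()]
pooled, pooled_se = pool_inverse_variance(
    [est for est, _ in adjusted], [se for _, se in adjusted]
)
print(f"Bracketed effect: {low:.2f} to {high:.2f}")
print(f"Bias-adjusted pooled estimate: {pooled:.3f} (SE {pooled_se:.3f})")
```

The conclusions from such a calculation are only as good as the assumed bias distributions, so in practice these would need to be justified (for example, through elicitation, empirical evidence, or simulation) and varied in sensitivity analyses.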
LACK OF CONVERGENCE
We have largely focused on examples that might be considered “successful” triangulation, in other words when results converge and support a strong conclusion. However, this will of course not always be the case.
A lack of convergence can, in fact, be powerful; it provides an opportunity to review the evidence provided by different approaches, synthesize this evidence, and (in particular) attempt to identify the reasons for nonconvergence. This last step may, itself, identify potential explanations for nonconvergence that can be tested in further research (again, planned prospectively). For example, Liu and colleagues found evidence for bidirectional causal effects between BMI and ADHD using MR and within-family polygenic score methods, but not using twin difference methods. One possible explanation for these differences is that the methods capture exposures over different timescales, a possibility that could be explicitly tested in future research (Liu et al. 2020).
Similarly, a successful quantitative triangulation would conclude that the studies “fit together” after allowing for differences in research questions and for likely biases. Conclusions about the magnitudes of specific causal effects can then be extracted or derived from the analysis. However, inconsistent findings across different sources of evidence would suggest that the triangulation framework is inadequate, meaning that either some biases have not been entertained or some differences in research questions have not been allowed for. A statistical synthesis may quantify this conflict, although it is unlikely to pinpoint the exact source of the problem. However, consideration of the likely biases in the studies included may suggest which other genetically informed designs may be informative to resolve the conflict, drawing on the considerations in Table 1.
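One familiar way in which a statistical synthesis can quantify conflict between sources of evidence is through heterogeneity statistics. The sketch below computes Cochran's Q and the I² statistic for a set of estimates, assuming they have already been placed on a common scale (and, where relevant, adjusted for likely biases). The function name and inputs are hypothetical, and these statistics are offered as one possible summary rather than as the method used in any of the studies discussed here.

```python
def heterogeneity(estimates, ses):
    """Cochran's Q, its degrees of freedom, and I^2 for a set of estimates."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    # Q is the weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    # I^2 is the share of total variation attributed to between-source conflict.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i_squared


# Hypothetical estimates of the same causal effect from three designs.
q, df, i_squared = heterogeneity([0.25, 0.22, 0.02], [0.05, 0.06, 0.05])
print(f"Q = {q:.2f} on {df} df, I^2 = {i_squared:.0%}")
```

A large Q relative to its degrees of freedom flags that the sources disagree by more than sampling error would suggest, but, as noted above, it does not identify which source or which unmodeled bias is responsible.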
FUTURE DIRECTIONS
Other genetically informed methods that can be implemented in a triangulation framework will continue to emerge. One extension to MR is to use this approach to directly examine reverse causality, for example, to determine whether biomarkers associated with disease outcomes result from development of the disease, rather than being on the causal pathway to disease (Holmes and Davey Smith 2019). Thus, the approach can be useful in separating purely predictive biomarkers from causal biomarkers. A further potential extension is to use MR to detect and potentially allow better statistical adjustment for confounding. For example, MR can establish that underlying factors such as educational attainment and BMI influence vitamin D levels (Revez et al. 2020). When vitamin D is being investigated as an exposure, this will mean that if the outcome under consideration is also influenced by educational attainment and BMI, confounding will exist; in this case, MR could appropriately estimate the magnitude of this confounding and relate it to the observational associations. This can contribute to understanding differences that exist between conventional observational studies and RCTs in particular settings, for example. Confounding induced by reverse causation can be particularly pervasive and of substantial magnitude, such as in the investigation of BMI or alcohol consumption and health outcomes, when early stages of the disease process can influence the exposure. Investigation of such reverse causation could use formal MR or, with greater power and in an exploratory manner, polygenic scores to uncover the presence of such potential confounding. These approaches only scratch the surface of the ways in which molecular genetic data can be used to uncover and estimate the effects of confounding, and further developments are anticipated.
It is also important to distinguish triangulation, particularly as a prospective effort, from what might superficially appear similar—the use of different sources of evidence to support a conclusion (e.g., a causal interpretation). The danger with the more common and conventional approach, of course, is that it is prone to cherry-picking. To avoid this particular source of bias, we would ideally consider in advance which methodologies might give reliable evidence in relation to a particular question, and specify these in a protocol that could be preregistered in advance of data collection (or at least data analysis). In the case of questions where evidence already exists and can be synthesized, this again should be collected in a systematic way according to a prespecified protocol.
Prospective triangulation remains relatively rare in etiological epidemiology, although Clarke and colleagues provide an example of this within the context of folate supplementation (Clarke et al. 2012). This may in part be because, as long as researchers are incentivized to produce as many papers as possible, the incentive will be to publish each potential component of a triangulation study as an individual paper. This leads to the risk that initial results may be contradicted in subsequent papers using different approaches, leading to a fragmented literature, and allowing (conscious or unconscious) citation bias, whereby researchers selectively cite studies that support a particular position or conclusion (de Vries et al. 2018). If we are to improve the quality of our inference through the use of triangulation, we may need to consider how to incentivize these weightier papers that combine multiple methods and rely on larger teams of authors. This may also require us to place less emphasis on authorship, and more on the contributions of individual members of the wider study team (for example, by adopting the CRediT contributorship taxonomy; casrai.org/credit). More generally, other open research practices, such as sharing data and code according to FAIR principles (Wilkinson et al. 2016), will support the inclusion of published research findings in quantitative triangulation efforts.
CONCLUSIONS
The limits of naive observational studies are well known, and there are many examples of incorrect causal inference based on such evidence (Davey Smith et al. 2020). While sophisticated statistical methodologies exist to support stronger causal inference in observational studies, they (like all methodologies) rely on assumptions, which, in some cases, are untestable. Rather than pursue ever more sophisticated analyses, an alternative (and arguably simpler) approach is effectively to accept that sources of bias—both known and unknown—will always be present. The more diverse methodologies that can be applied, which are each subject to different and (ideally) uncorrelated sources of bias, the more confident we can be in our causal inference if the results of these align.
Methods to synthesize and quantify evidence from different study designs are emerging. However, we must be careful to only push quantification to the limit of when it is useful and not beyond, when it can obscure. This will require careful consideration of how to incorporate, in particular, incommensurable evidence. Demonstrating mechanisms in support of causal inference is potentially powerful, but we need to be wary of reverting to post hoc appeals to biological plausibility. In some ways, this reflects the fundamental question of whether the causal question is “Why?” or “How?”; the fact is that both need considering and somehow synthesizing.
It is also worth noting that one major source of bias rests within individual scientists: our enthusiasm to discover novel findings and to publish these results in widely read journals, and our unwillingness to admit defeat and let go of our fondly held theories or beliefs. For this reason, triangulation will only be truly effective if it is a prospective effort, rather than the post hoc collection of evidence chosen (consciously or unconsciously) to support the findings of an initial study. The examples we provide above do not, unfortunately, meet this rather higher bar. Going forward, we should seek to prespecify (and ideally preregister on a third-party platform such as the Open Science Framework [osf.io]) our triangulation framework, noting the sources of evidence and methodologies that we will bring to bear to answer a causal question.
ACKNOWLEDGMENTS
M.R.M. and G.D.S. are programme leads within the MRC Integrative Epidemiology Unit at the University of Bristol (MC_UU_00011/1, MC_UU_00011/7). Both have published on the broad topic of research reproducibility, including triangulation, and may benefit in some small way from citations accumulated by their published work. We are grateful to Jeremy Labrecque, Jean-Baptiste Pingault, and Rebecca Richmond for their helpful comments.
Footnotes
Editors: George Davey Smith, Rebecca Richmond, and Jean-Baptiste Pingault
Additional Perspectives on Combining Human Genetics and Causal Inference to Understand Human Disease and Development available at www.perspectivesinmedicine.org
REFERENCES
*Reference is also in this collection.
- Ades AE, Sutton AJ. 2005. Multiparameter evidence synthesis in epidemiology and medical decision-making: current approaches. J R Stat Soc Ser A Stat Soc 169: 5–35. 10.1111/j.1467-985X.2005.00377.x
- Angrist JD, Pischke JS. 2009. Mostly harmless econometrics: an empiricist's companion. Princeton University Press, Princeton, NJ.
- Arumäe K, Briley D, Colodro-Conde L, Mortensen E, Jang K, Ando J, Kandler C, Sørensen TIA, Dagher A, Mõttus R, et al. 2020. Two genetic analyses to elucidate causality in the associations between body mass index and psychological traits. NutriXiv 10.31232/osf.io/q8ehr
- Becker BJ. 1994. Combining significance levels. In The handbook of research synthesis (ed. Cooper H, Hedges LV), pp. 215–230. Russell Sage, New York.
- Bernoulli J. 2006. The art of conjecturing together with letter to a friend on sets in court tennis. Johns Hopkins University Press, Baltimore.
- Brion MJA, Lawlor DA, Matijasevich A, Horta B, Anselmi L, Araújo CL, Menezes AMB, Victora CG, Davey Smith G. 2011. What are the causal effects of breastfeeding on IQ, obesity and blood pressure? Evidence from comparing high-income with middle-income cohorts. Int J Epidemiol 40: 670–680. 10.1093/ije/dyr020
- Clarke R, Bennett DA, Parish S, Verhoef P, Dötsch-Klerk M, Lathrop M, Xu P, Nordestgaard BG, Holm H, Hopewell JC, et al. 2012. Homocysteine and coronary heart disease: meta-analysis of MTHFR case-control studies, avoiding publication bias. PLoS Med 9: e1001177. 10.1371/journal.pmed.1001177
- Crawford AA, Soderberg S, Kirschbaum C, Murphy L, Eliasson M, Ebrahim S, Davey Smith G, Olsson T, Sattar N, Lawlor DA, et al. 2019. Morning plasma cortisol as a cardiovascular risk factor: findings from prospective cohort and Mendelian randomization studies. Eur J Endocrinol 181: 429–438. 10.1530/EJE-19-0161
- Davey Smith G. 2008. Assessing intrauterine influences on offspring health outcomes: can epidemiological studies yield robust findings? Basic Clin Pharmacol Toxicol 102: 245–256. 10.1111/j.1742-7843.2007.00191.x
- Davey Smith G. 2012. The Wright stuff: genes in the interrogation of correlation and causation. Eur J Pers 26: 391–413.
- Davey Smith G, Ebrahim S. 2003. Mendelian randomization: can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol 32: 1–22. 10.1093/ije/dyg070
- Davey Smith G, Phillips AN. 2020. Correlation without a cause: an epidemiological odyssey. Int J Epidemiol 49: 4–14. 10.1093/ije/dyaa016
- Davey Smith G, Phillips AN, Neaton JD. 1992. Smoking as “independent” risk factor for suicide: illustration of an artifact from observational epidemiology? Lancet 340: 709–712. 10.1016/0140-6736(92)92242-8
- Davey Smith G, Holmes MV, Davies NM, Ebrahim S. 2020. Mendel's laws, Mendelian randomization and causal inference in observational data: substantive and nomenclatural issues. Eur J Epidemiol 35: 99–111. 10.1007/s10654-020-00622-7
- Davies NM, Holmes MV, Davey Smith G. 2018a. Reading Mendelian randomisation studies: a guide, glossary, and checklist for clinicians. BMJ 362: k601. 10.1136/bmj.k601
- Davies NM, Dickson M, Davey Smith G, van den Berg GJ, Windmeijer F. 2018b. The causal effects of education on health outcomes in the UK Biobank. Nat Hum Behav 2: 117–125. 10.1038/s41562-017-0279-y
- de Vries YA, Roest AM, de Jonge P, Cuijpers P, Munafò MR, Bastiaansen JA. 2018. The cumulative effect of reporting and citation biases on the apparent efficacy of treatments: the case of depression. Psychol Med 48: 2453–2455. 10.1017/S0033291718001873
- Egger M, Schneider M, Davey Smith G. 1998. Spurious precision? Meta-analysis of observational studies. BMJ 316: 140–144. 10.1136/bmj.316.7125.140
- Eidelman RS, Hollar D, Hebert PR, Lamas GA, Hennekens CH. 2004. Randomized trials of vitamin E in the treatment and prevention of cardiovascular disease. Arch Intern Med 164: 1552–1556. 10.1001/archinte.164.14.1552
- Emini Veseli B, Perrotta P, De Meyer GRA, Roth L, Van der Donckt C, Martinet W, De Meyer GRY. 2017. Animal models of atherosclerosis. Eur J Pharmacol 816: 3–13. 10.1016/j.ejphar.2017.05.010
- Evans WN, Ringel JS. 1999. Can higher cigarette taxes improve birth outcomes? J Public Econ 72: 135–154. 10.1016/S0047-2727(98)00090-5
- Evenhouse E, Reilly S. 2005. Improved estimates of the benefits of breastfeeding using sibling comparisons to reduce selection bias. Health Serv Res 40: 1781–1802. 10.1111/j.1475-6773.2005.00453.x
- Gage SH, Munafò MR, Davey Smith G. 2016. Causal inference in developmental origins of health and disease (DOHaD) research. Annu Rev Psychol 67: 567–585. 10.1146/annurev-psych-122414-033352
- Gage SH, Bowden J, Davey Smith G, Munafò MR. 2018. Investigating causality in associations between education and smoking: a two-sample Mendelian randomization study. Int J Epidemiol 47: 1131–1140. 10.1093/ije/dyy131
- Gigerenzer G, Swijtink Z, Porter T, Daston L, Beatty J, Krüger L (eds.). 1989. The empire of chance: how probability changed science and everyday life. Cambridge University Press, Cambridge.
- Greenland S. 2005. Multiple-bias modelling for analysis of observational data. J R Stat Soc Ser A Stat Soc 168: 267–306. 10.1111/j.1467-985X.2004.00349.x
- Hamilton BH. 2001. Estimating treatment effects in randomized clinical trials with non-compliance: the impact of maternal smoking on birthweight. Health Econ 10: 399–410. 10.1002/hec.629
- Hekselman I, Yeger-Lotem E. 2020. Mechanisms of tissue and cell-type specificity in heritable traits and diseases. Nat Rev Genet 21: 137–150. 10.1038/s41576-019-0200-9
- Hemani G, Bowden J, Davey Smith G. 2018. Evaluating the potential role of pleiotropy in Mendelian randomization studies. Hum Mol Genet 27: R195–R208. 10.1093/hmg/ddy163
- Hernán MA, Robins JM. 2016. Using big data to emulate a target trial when a randomized trial is not available. Am J Epidemiol 183: 758–764. 10.1093/aje/kwv254
- Hey SP. 2015. Robust and discordant evidence: methodological lessons from clinical research. Phil Sci 81: 55–75.
- Holmes MV, Davey Smith G. 2019. Can Mendelian randomization shift into reverse gear? Clin Chem 65: 363–366. 10.1373/clinchem.2018.296806
- *Hwang LD, Davies NM, Warrington NM, Evans DM. 2020. Integrating family-based and Mendelian randomization designs. Cold Spring Harb Perspect Med 10.1101/cshperspect.a039503
- Karmakar B, Small DS, Rosenbaum PR. 2020. Using evidence factors to clarify exposure biomarkers. Am J Epidemiol 189: 243–249. 10.1093/aje/kwz263
- Keyes KM, Davey Smith G, Susser E. 2013. On sibling designs. Epidemiology 24: 473–474. 10.1097/EDE.0b013e31828c7381
- Kramer MS, Aboud F, Mironova E, Vanilovich I, Platt RW, Matush L, Igumnov S, Fombonne E, Bogdanovich N, Ducruet T, et al. 2008. Breastfeeding and child cognitive development: new evidence from a large randomized trial. Arch Gen Psychiatry 65: 578–584. 10.1001/archpsyc.65.5.578
- Krieger N, Davey Smith G. 2016. The tale wagged by the DAG: broadening the scope of causal inference and explanation for epidemiology. Int J Epidemiol 45: 1787–1808.
- Kuja-Halkola R, D'Onofrio BM, Larsson H, Lichtenstein P. 2014. Maternal smoking during pregnancy and adverse outcomes in offspring: genetic and environmental sources of covariance. Behav Genet 44: 456–467. 10.1007/s10519-014-9668-4
- Lawlor DA, Tilling K, Davey Smith G. 2016. Triangulation in aetiological epidemiology. Int J Epidemiol 45: 1866–1886. 10.1093/ije/dyw314
- Lewis CM, Vassos E. 2017. Prospects for using risk scores in polygenic medicine. Genome Med 9: 96. 10.1186/s13073-017-0489-y
- Lewis SJ, Ebrahim S, Davey Smith G. 2005. Meta-analysis of MTHFR 677C→T polymorphism and coronary heart disease: does totality of evidence support causal role for homocysteine and preventive potential of folate? BMJ 331: 1053. 10.1136/bmj.38611.658947.55
- Lipsitch M, Tchetgen ET, Cohen T. 2010. Negative controls: a tool for detecting confounding and bias in observational studies. Epidemiology 21: 383–388. 10.1097/EDE.0b013e3181d61eeb
- Lipton P. 2004. Inference to the best explanation, 2nd ed. Routledge, London.
- Liu C, Schoeler T, Davies NM, Peyre H, Lim KX, Barker E, Llewellyn C, Dudbridge F, Pingault JB. 2020. Are there causal relationships between ADHD and BMI? Evidence from multiple genetically informed designs. medRxiv 10.1101/2020.04.16.20067918
- Lowe CR. 1959. Effect of mothers’ smoking habits on birth weight of their children. Br Med J 2: 673–676. 10.1136/bmj.2.5153.673
- Magnus P, Berg K, Bjerkedal T, Nance WE. 1985. The heritability of smoking behaviour in pregnancy, and the birth weights of offspring of smoking-discordant twins. Scand J Soc Med 13: 29–34. 10.1177/140349488501300104
- Martí-Carvajal AJ, Solà I, Lathyris D, Salanti G. 2009. Homocysteine lowering interventions for preventing cardiovascular events. Cochrane Database Syst Rev 2009: CD006612.
- Matthay EC, Glymour MM. 2020. A graphical catalog of threats to validity: linking social science with epidemiology. Epidemiology 31: 376–384. 10.1097/EDE.0000000000001161
- *McAdams TA, Rijsdijk FV, Zavos HM, Pingault JB. 2020. Twins and causal inference: leveraging nature's experiment. Cold Spring Harb Perspect Med 10.1101/cshperspect.a039552
- Morgan MS. 2013. Nature's experiments and natural experiments in the social sciences. Phil Soc Sci 43: 341–357. 10.1177/0048393113489100
- Munafò MR, Davey Smith G. 2018. Robust research needs many lines of evidence. Nature 553: 399–401. 10.1038/d41586-018-01023-3
- Munafò MR, Nosek BA, Bishop DVM, Button KS, Chambers CD, Percie du Sert N, Simonsohn U, Wagenmakers EJ, Ware JJ, Ioannidis JPA. 2017. A manifesto for reproducible science. Nat Hum Behav 1: 0021. 10.1038/s41562-016-0021
- Nickerson RS. 1998. Confirmation bias: a ubiquitous phenomenon in many guises. Rev Gen Psychol 2: 175–220. 10.1037/1089-2680.2.2.175
- Pearl J. 2009. Causality: models, reasoning and inference, 2nd ed. Cambridge University Press, New York.
- Phillips AN, Davey Smith G. 1991. How independent are “independent” effects? Relative risk estimation when correlated exposures are measured imprecisely. J Clin Epidemiol 44: 1223–1231. 10.1016/0895-4356(91)90155-3
- Pingault JB, O'Reilly PF, Schoeler T, Ploubidis GB, Rijsdijk F, Dudbridge F. 2018. Using genetic data to strengthen causal inference in observational research. Nat Rev Genet 19: 566–580. 10.1038/s41576-018-0020-3
- *Porcu E, Sjaarda J, Lepik K, Carmeli C, Darrous L, Sulc J, Mounier N, Kutalik Z. 2020. Causal inference methods to integrate omics and complex traits. Cold Spring Harb Perspect Med 10.1101/cshperspect.a040493
- Revez JA, Lin T, Qiao Z, Xue A, Holtz Y, Zhu Z, Zeng J, Wang H, Sidorenko J, Kemper KE, et al. 2020. Genome-wide association study identifies 143 loci associated with 25 hydroxyvitamin D concentration. Nat Commun 11: 1647. 10.1038/s41467-020-15421-7
- *Richmond RC, Davey Smith G. 2020. Mendelian randomization: concepts and scope. Cold Spring Harb Perspect Med 10.1101/cshperspect.a040501
- Richmond RC, Al-Amin A, Davey Smith G, Relton CL. 2014. Approaches for drawing causal inferences from epidemiological birth cohorts: a review. Early Hum Dev 90: 769–780. 10.1016/j.earlhumdev.2014.08.023
- Rimm EB, Stampfer MJ, Ascherio A, Giovannucci E, Colditz GA, Willett WC. 1993. Vitamin E consumption and the risk of coronary heart disease in men. N Engl J Med 328: 1450–1456. 10.1056/NEJM199305203282004
- Rimm EB, Willett WC, Hu FB, Sampson L, Colditz GA, Manson JE, Hennekens C, Stampfer MJ. 1998. Folate and vitamin B6 from diet and supplements in relation to risk of coronary heart disease among women. JAMA 279: 359–364. 10.1001/jama.279.5.359
- Robins JM. 1999. Comment on Rosenbaum 1999. Stat Sci 14: 281–293.
- Rosenbaum PR. 1999. Choice as an alternative to control in observational studies. Stat Sci 14: 259–304.
- Rubin DB. 2008. For objective causal inference, design trumps analysis. Ann Appl Stat 2: 808–840. 10.1214/08-AOAS187
- *Sanderson E. 2020. Multivariable Mendelian randomization and mediation. Cold Spring Harb Perspect Med 10.1101/cshperspect.a038984
- Sanderson E, Davey Smith G, Bowden J, Munafò MR. 2019. Mendelian randomisation analysis of the effect of educational attainment and cognitive ability on smoking behaviour. Nat Commun 10: 2949. 10.1038/s41467-019-10679-y
- Sanderson E, Richardson TG, Hemani G, Davey Smith G. 2020. The use of negative control outcomes in Mendelian randomisation to detect potential population stratification or selection bias. bioRxiv 10.1101/2020.06.01.128264
- Shadish WR, Cook TD, Campbell DT. 2001. Experimental and quasi-experimental designs for generalized causal inference, 2nd ed. Cengage, Farmington Hills, MI.
- Shmueli G. 2010. To explain or to predict? Stat Sci 25: 289–310. 10.1214/10-STS330
- Stampfer MJ, Hennekens CH, Manson JE, Colditz GA, Rosner B, Willett WC. 1993. Vitamin E consumption and the risk of coronary disease in women. N Engl J Med 328: 1444–1449. 10.1056/NEJM199305203282003
- Steiner PM, Kim Y, Hall CE, Su D. 2017. Graphical models for quasi-experimental designs. Sociol Methods Res 46: 155–188. 10.1177/0049124115582272
- Stuart EA. 2010. Matching methods for causal inference: a review and a look forward. Stat Sci 25: 1–21. 10.1214/09-STS313
- Taylor AE, Davies NM, Ware JJ, VanderWeele T, Davey Smith G, Munafò MR. 2014. Mendelian randomization in health research: using appropriate genetic variants and avoiding biased estimates. Econ Hum Biol 13: 99–106. 10.1016/j.ehb.2013.12.002
- Taylor K, Davey Smith G, Relton CL, Gaunt TR, Richardson TG. 2019. Prioritizing putative influential genes in cardiovascular disease susceptibility by applying tissue-specific Mendelian randomization. Genome Med 11: 6. 10.1186/s13073-019-0613-2
- Tennant PWG, Harrison WJ, Murray EJ, Arnold KF, Berrie L, Fox MP, Gadd SC, Keeble C, Ranker LR, Textor J, et al. 2019. Use of directed acyclic graphs (DAGs) in applied health research: review and recommendations. medRxiv 10.1101/2019.12.20.19015511
- *Thapar A, Rice F. 2020. Family-based designs that disentangle inherited factors from pre- and postnatal environmental exposures: in vitro fertilization, discordant sibling pairs, maternal versus paternal comparisons, and adoption designs. Cold Spring Harb Perspect Med 10.1101/cshperspect.a038877
- Thapar A, Rice F, Hay D, Boivin J, Langley K, van den Bree M, Rutter M, Harold G. 2009. Prenatal smoking might not cause attention-deficit/hyperactivity disorder: evidence from a novel design. Biol Psychiatry 66: 722–727. 10.1016/j.biopsych.2009.05.032
- Thompson WD, Tyrrell J, Borges MC, Beaumont RN, Knight BA, Wood AR, Ring SM, Hattersley AT, Freathy RM, Lawlor DA. 2019. Association of maternal circulating 25(OH)D and calcium with birth weight: a Mendelian randomisation analysis. PLoS Med 16: e1002828. 10.1371/journal.pmed.1002828
- Turner RM, Spiegelhalter DJ, Smith GCS, Thompson SG. 2009. Bias modelling in evidence synthesis. J R Stat Soc Ser A Stat Soc 172: 21–47. 10.1111/j.1467-985X.2008.00547.x
- Tyrrell J, Huikari V, Christie JT, Cavadino A, Bakker R, Brion MJA, Geller F, Paternoster L, Myre R, Potter C, et al. 2012. Genetic variation in the 15q25 nicotinic acetylcholine receptor gene cluster (CHRNA5-CHRNA3-CHRNB4) interacts with maternal self-reported smoking status during pregnancy to influence birth weight. Hum Mol Genet 21: 5344–5358. 10.1093/hmg/dds372
- Weisberg HI. 2014. Willful ignorance: the mismeasure of uncertainty. Wiley, Hoboken, NJ.
- Whewell W. 1840. The philosophy of the inductive sciences, founded upon their history. John W. Parker, London.
- Wilkinson MD, Dumontier M, Aalbersberg IJJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten JW, Bonino da Silva Santos L, Bourne PE, et al. 2016. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3: 160018. 10.1038/sdata.2016.18
- Yoon FB, Huskamp HA, Busch AB, Normand SLT. 2011. Using multiple control groups and matching to address unobserved biases in comparative effectiveness research: an observational study of the effectiveness of mental health parity. Stat Biosci 3: 63–78. 10.1007/s12561-011-9035-4