Skip to main content
PLOS One logoLink to PLOS One
. 2018 Oct 11;13(10):e0204755. doi: 10.1371/journal.pone.0204755

Inferring causal pathways among three or more variables from steady-state correlations in a homeostatic system

Suraj Chawla 1, Anagha Pund 1, Vibishan B 1, Shubhankar Kulkarni 1, Manawa Diwekar-Joshi 1, Milind Watve 1,*
Editor: Massimiliano Ruscica2
PMCID: PMC6181337  PMID: 30307959

Abstract

Cross-sectional correlations between two variables have limited implications for causality. We examine here whether it is possible to make causal inferences from steady-state data in a homeostatic system with three or more inter-correlated variables. Every putative pathway between three variables makes a set of differential predictions that can be tested with steady state data. For example, among 3 variables, A, B and C, the coefficient of determination, rAC2 is predicted by the product of rAB2 and rBC2 for some pathways, but not for others. Residuals from a regression line are independent of residuals from another regression for some pathways, but positively or negatively correlated for certain other pathways. Different pathways therefore have different prediction signatures, which can be used to accept or reject plausible pathways using appropriate null hypotheses. The type 2 error reduces with sample size but the nature of this relationship is different for different predictions. We apply these principles to test the classical pathway leading to a hyperinsulinemic normoglycemic insulin-resistant, or pre-diabetic, state using four different sets of epidemiological data. Currently, a set of indices called HOMA-IR and HOMA-β are used to represent insulin resistance and glucose-stimulated insulin response by β cells respectively. Our analysis shows that if we assume the HOMA indices to be faithful indicators, the classical pathway must in turn be rejected. In effect, among the populations sampled, the classical pathway and faithfulness of the HOMA indices cannot be simultaneously true. The principles and example shows that it is possible to infer causal pathways from cross sectional correlational data on three or more correlated variables.

1. Introduction

1.1. Overview

In the field of biomedicine, the nature of causality, and the use of correlations as an evidence for causality are much debated [14]. There have been many attempts to develop sound methods to address questions of causal inference from correlational data which include Hill criteria [5], path analysis [69] the use of instrumental variables [10], Granger causality [11], Rubin causal model [12], or additive noise models [13]. Hill criteria are a set of common sense criteria useful in avoiding misguided inferences. Path analysis generally assumes a direction of causality, usually directed acyclic paths, and while it is useful in determining the contributions of different causal pathways to a process or a resultant variable, its application to pathways with loops and cycles is difficult. Methods like Granger causality depend upon the assumption that the cause always precedes effect and that the variables show some degree of chaos or turbulence, so that there are notable events like sudden peaks in the variables that can be tracked using longitudinal data. In evolved systems in which predictive adaptive responses are possible, the assumption that cause always precedes effect is questionable. Another class of methods like Propensity Score matching based on the Rubin causal model works well to estimate the effect of a causal factor but does not take into account unobserved factors. The Rubin Causal Model also incorporates the structural equations model as it includes non-parametric forms [14,15]. Models like additive noise can suggest the direction of the arrow of causality between two variables, but they require the assumption that either A causes B, or B causes A, without any confounding, looping or circularity [16,17].

More specifically, here we look at homeostatic systems which are extremely common in fields such as physiology. Homeostatic systems pose a unique problem for causal inferences. Causal inference can be based on time-series analysis with longitudinal data[18]. Longitudinal data are of little use however, if the time taken to reach equilibrium is smaller than the observational window, or if the system is already in a steady-state. Most homeostatic systems have negative feedback or some loop structures, because of which methods assuming acyclic causal paths or freedom from confounding are not applicable.

Although the use of correlations to infer causality is doubted, intervention experiments are generally taken as a convincing evidence of causality. However, causality in steady-state can be substantially different than causality in a perturbed-state and inferences from a perturbation experiment may not be applicable to steady-state causality. This necessitates a set of tools to infer causality in a steady state which is independent of perturbing interventions. We argue in this paper that it is possible to infer causal relationships among three or more variables from cross-sectional data in a homeostatic system in which the variables and their relationships are stable in time.

1.2. Motivation

Our motivation and the need for this tool came from some debated causal pathways in the pathophysiology of type 2 diabetes (T2D). According to the classical view, obesity-induced insulin resistance is primary, and rise in insulin levels is a compensatory response to insulin resistance, mediated by raised levels of glucose [19,20]. This is contested [21], with increasing evidence suggesting that hyperinsulinemia precedes insulin resistance [2226]. There is also evidence of neuronal signals affecting insulin production on the one hand, and controlling glucose production by the liver partly independent of insulin on the other. Therefore, the causal relationship between insulin resistance, hyperinsulinemia and hyperglycemia needs to be re-examined (reviewed by [21]).

Elucidation of the causal pathway for a pre-diabetic or diabetic state is critical at the clinical level because the current approaches to medication are designed assuming one particular pathway, and have largely failed to cure diabetes. If it is possible to determine causality reliably, it can potentially change diabetes medicine. It has long been recognized that levels of glucose and insulin are under homeostatic control, and that fasting is a steady state [2730]. With substantial data available on fasting levels of glucose and insulin from different populations, along with many other variables, a tool for inferring causality from a set of inter-correlated steady-state variables would help understand, and thereby better control, T2D.

Beyond the specific problem of causality in pre-diabetes, a set of methods that can help infer causality from steady-state data will find a large number of applications, not only in physiology and disease, but in many other areas of science. Although our investigations began with the pathophysiology of diabetes, the emerging principles are generalizable and valuable for inferential statistics in general.

1.3. Organization of the paper

Since the question addressed here and the approach suggested is complex we have organized the paper in the following sections. Section 2 describes the underlying principle and basic assumptions and spells out the nomenclature used in the rest of the paper. Section 3 details the development of the theoretical framework that is the central element of this work. We discuss first, possible alternative causal pathways between three inter-correlated variables, followed by the distinction between causal and regression equations that will underlie the subsequent analysis. In the interest of brevity, in the main text we restrict ourselves to reproducing only the final analytical prediction for each pathway, classified into acyclic and cyclic pathways, adding a few lines where pathway-specific predictions necessitate discussion. Complete proofs and derivations are instead relegated to the supplementary text. Section 4 discusses the testing of pathway predictions against suitable null hypotheses, and Section 5 describes findings regarding the sensitivity and robustness of the analytical predictions derived so far. Section 6 provides a general description of potential applications of the above theoretical framework. Section 7 encompasses a comprehensive demonstration of application of the framework to a real-life question in diabetes research. We first discuss in some detail, the current view on causal factors in diabetic and pre-diabetic states, before moving on to the dual feedback model that forms the main hypothesis to be tested. We analyze the system according to the methods detailed in previous sections, and present several novel observations along with their significance to diabetes research.

2. Methods

2.1. Overview

We show here that when three inter-correlated variables are considered together with two or more causal arrows connecting them to make a causal pathway, each of the possible pathways makes a set of differential predictions by which the pathways can be differentiated from each other. Our approach to develop a method of inferring causality from cross sectional regression correlation parameters comprises following steps:

  1. Development of a theoretical framework-
    1. We first list the perceived possible hypothetical causal pathways among three variables.
    2. For each pathway, we write a set of causal equations arising out of the hypothetical pathways. Steady-state solutions of these equations lead to a set of four general, and a few pathway-specific predictions. Each pathway therefore has a unique combination of such predictions or a prediction signature by which it can be differentiated from other pathways.
  2. Testing the pathways by simulations-
    1. We test, using simulated data generated from assumed causal pathways, the conditions under which the predictions can be used to accept or reject a pathway reliably.
    2. Based on these results, we suggest ways of handling multivariate data and infer causal networks among them.
  3. Demonstration of a specific example-
    1. We apply this logic to the specific case of pre-diabetes to examine the pathway classically thought to give rise to this condition.

2.2. Baseline assumptions and nomenclature

We consider three variables labelled A, B and C. Additional variables if needed to describe a pathway will be labelled X, Y and so on. All causal relationships represented by a single arrow are assumed to be linear, and all primary input variables are assumed to be normally distributed. In a given operation, the slopes of causal pathways are assumed to be constant; the errors in causal pathways are assumed to be distributed normally, with means of zero and a constant standard deviations, and no covariance with each other. We assume that the errors are caused by variation in individual responses, and that a given individual’s response is consistent in time sufficiently long to reach a steady state. Thus, the errors are randomized over the population, but for a given individual, they are constant in time. We assume no measurement error in the baseline models. Since all our predictions are related to correlation coefficients and regression slopes, we will ignore the intercepts for the sake of simplicity in deriving many of the predictions.

3. Development of a theoretical framework

3.1. The possible pathways

A variety of cyclic and acyclic pathways can exist in three variables. Fig 1 shows the simple primary pathways that are possible, and more can certainly be constructed by combinations of the primary pathways. It is also possible to consider permutations of the three variables; for example, the linear pathway among three variables can itself be written in six different ways. Here, we restrict to the primary pathways assuming a fixed sequence of the three variables denoted by A, B and C. The principles that we derive from this set of primary pathways can be extended to more complex pathways.

Fig 1. Possible primary causal pathways between three variables.

Fig 1

More complex pathways can be visualized by combinations of the primary ones.

3.2. Causal equations versus regression equations

Based on hypothesized pathways, we can write specific causal equations for each. The causal equations are derived from the hypothesized pathway, while the regression equations can be obtained from the given cross-sectional data using regression and correlation analysis. Our causal equations are similar to the structural equations of [15]. However, they differ in their interpretation and treatment. In structural equations, the left hand terms are effects and right hand terms are causes, and the two cannot be algebraically transferred without changing causal interpretations. In our approach, after finding equilibrium solutions, we can carry out algebraic operations freely in order to obtain testable predictions. The parameters of the regression equation are not necessarily identical to those of the causal equations (Table 1). For example, for a hypothesized pathway Y = mX + C, m is the causal slope, while the regression slope would be underestimated if there is post-effect variability in X [31], and such a bias in the slope is important in making and testing predictions. In the following section, we show that the parameters of causal equations hold pathway-specific relationships with the parameters of regression equations based on which, pathway-specific predictions about the regression correlation parameters can be made.

Table 1. List of abbreviations used.

Abbreviation Term Remarks
A, B, C Test variables We have access from data
Á, B´, Ć Mean of A, B and C respectively We calculate from data
X, Y Unknown variables Which affect test variables. We do not have access to these from data.
Mij Slopes of regression of i on j e.g. Mba is calculated as Mba=cov(B,A)var(A)
Kij Intercept of regression of i on j e.g. Kba is calculated as Kba=B´MbaA
Eij Residuals of regression of i on j e.g. Eba is calculated as
Eba = BMbaAKba
rij Correlation coefficient between i and j e.g. rij is calculated as
rij=ei.ejei2ej2
m1,m2,m3 Slopes in causal equations including the degradation rates We do not have access to these in data
e1,e2,e3 Error distribution in causal equations, assumed to be normal with mean zero and standard deviations sde1 etc. These are post-effect errors of the causal relationships which may get incorporated in pre-effect errors of a subsequent effect
ea,eb,ec Net variability in A, B and C e.g. ea is calculated as ea = AÁ
k1,k2 Intercepts in causal equations
d1,d2 Degradation /destruction rate constants in causal equations Especially necessary in cyclic pathways

Parameters of causal equations are denoted by small letters and those of regression equations by capital letters.

For ensuring steady-state, we assume that a given variable has a rate of formation/increase and a rate of degradation/decrease. If the rate of degradation is positively dependent, or the rate of formation negatively dependent on the standing level, then the variable invariably reaches a steady state determined by the set of input parameters. Such steady states are characteristic of homeostatic systems, and this principle is central to our methods.

For example, in a linear pathway A → B → C, dBdt=m1Ad1B and dCdt=m2Bd2C. At a steady-state the net change in any variable is zero. Therefore, the steady-state levels of B and C will be B=m1d1A=m1A and C=m2d2B=m2B respectively.

In simple cases, we need not explicitly include the rates of degradation in the equations but directly use parameters, m1 and m2. For pathways involving loops and feedback, the relationships between variables are more complex, and therefore necessitate the use of degradation constants in the causal equations for ensuring steady-states.

Causal and regression parameters can be analytically related, and these relationships allow the formation of regression-based predictions for each pathway based on the known relationships between the corresponding causal parameters. Here, we put the two together and describe four general predictions across all pathways and formulate a null hypothesis for each. In addition, there are certain pathway specific predictions that will be discussed along with the description of the corresponding pathway. The four general predictions are:

  1. Whether rAC2 can be estimated from the product of rAB2 and rBC2.

  2. Whether slope Mca can be estimated from the product of the slopes Mba and Mcb.

  3. Whether the residuals of the regression of B on A (Eba) are correlated with those of C on B (Ecb): The errors or residuals in a regression are assumed to be random independent errors. However, we will show below that if there are loops, convergent or confounding elements in a pathway, Eba and Ecb do not remain independent. Based on the nature of dependence between Eba and Ecb, presence of, and possible nature of the loops and convergence can be inferred.

  4. a. Whether correction for A improves or reduces the correlation of B with C, i.e. whether rEbaC2 is greater or lesser than rBC2.

    b. Whether the extent to which rEbaC2 is greater or lesser than rBC2 can be predicted by rAB2.

3.3. Pathway-wise analytical predictions

We will now state how each of the pathways makes specific predictions. For detailed formal proofs and derivations refer to S1 Text. Each putative pathway makes specific predictions about the above and accordingly the causal relationships can be accepted or rejected. For example, whether obesity leads to altered lipid profiles that ultimately lead hyperinsulinemia is one possible pathway. In an alternative view hyperinsulinemia leads to obesity.

Pathways P1 to P5 represent acyclic pathways and P6 and P7 represents pathways with loops. In pathways with loops, since there is a cyclic dependence between the variables we begin with differential equations with variable-specific constant rates of disintegration that assure steady states. This set of equations is then used to derive equilibrium solutions.

3.3.1. Linear pathway (P1)

The causal equations for a linear pathway are:

A=input=A´+ea
B=m1A+e1+k1
C=m2B+e2+k2

Where ea,e1,e2 are not correlated.

Regression parameters can be derived from the causal equations as follows. Since in regression of B on A, the slope = cov (A, B)/var A,

Mba=eaebea2=ea(m1ea+e1)ea2=m1ea2ea2=m1
Mcb=ecebeb2=(m2eb+e2)ebeb2=m2eb2eb2=m2
Mca=eceaea2=(m2m1ea+m2e1+e2)eaea2=m2m1ea2ea2=m2m1
Eba=ebMbaea=m1ea+e1m1ea=e1
Ecb=ecMcbeb=m2eb+e2m2eb=e2
Eca=ecMcaea=m2eb+e2m2m1ea=m2e1+e2

For linear equations, there is little difference between the causal equations and regression equations (Table 2). The regression equations therefore become

B=MbaA+Eba+Kba=m1A+e1+k1
C=McaA+Eca+Kca=m2B+e2+k2
C=McaA+Eca+Kca=m1m2A+(m2e1+e2)+(m2k1+k2)
Table 2. Relationship between the causal and regression equations for linear pathway.
Slopes Errors
Mba = m1 Eba = e1
Mcb = m2 Ecb = e2
Mca = m1m2 Eca = m2e1 + e2

Prediction R1: Based on the equations above and Table 2, it can be shown that rACrABrBC = 0 (See S1 Text ‘Linear pathway: Prediction R1: Proof 1’ for formal proof).

Prediction R2: From Table 2, it is obvious that the slope Mca can be predicted from the product McbMba; McaMcbMba = m1m2m2m1 = 0.

Prediction R3: From Table 2, as there is no covariance between e1 and e2, rEbaEcb2=re1e22=0

Prediction R4: For a linear pathway, it can be shown that

(a) rBC>rEbaC and further, (b) rBC2rEbaC2rBC2=rAB2

3.3.2. Radiating pathway (P2)

The causal equations for this model would be

A=m1B+e1+k1
B=input=B´+eb
C=m2B+e2+k2

Note that the relationship between causal parameters and regression parameters is substantially different in this pathway than the linear pathway (Table 3). For example, the causal slope is 1m1, but there is an underestimation of the slope during regression which is predicted exactly by rAB2.

Table 3. Relationship between the causal and regression equations for radiating pathway.
Slopes Errors
Mba=1m1rAB2 Eba=eb(1rAB2)1m1rAB2e1
Mcb = m2 Ecb = e2
Mca=m2m1rAB2 Eca=m2eb(1rAB2)m2m1rAB2e1+e2

However, this difference is not detectable from cross-sectional data alone. Therefore, the standard four testable predictions of this pathway remain similar to the linear pathway. We will describe later that differentiating between pathways P1 and P2 is possible using a different strategy.

Prediction R1: From Table 3, it can be shown that rAC2rAB2rBC2=0.

Prediction R2: From Table 3; we see slope Mca can be predicted from the product McbMba

Prediction R3: From Table 3, cov(e1,e2)=0rEbaEcb2=re1e22=0

Prediction R4: As formally shown in S1 Text (‘Radiating pathway: Prediction R4: Proof 2’), rEbaC<rBC and rBC2rEbaC2rBC2=rAB2

3.3.3. Convergent pathway (P3)

The causal equations for this model would be

A=input=A´+ea
B=m1A+m2C+e1+k1
C=input=C´+ec

where ea,e1 and ec are uncorrelated.

Regression parameters derived from the causal equations are given in Table 4.

Table 4. Relationship between the causal and regression equations for convergent pathway.
Slopes Errors
Mba = m1 Eba = e1 + m2ec
Mcb=1m2rBC2 Ecb=ec(1rBC2)1m2rBC2(m1ea+e1)
Mca = 0 Eca = ec

There are two pathway specific predictions for the convergent pathway, shared only by the different cause pathway. Firstly, we expect no correlation between A and C from this pathway, unless there are additional external pathways linking the two. The other unique feature of this pathway is that both A and C have independent causal influence on B. As a result, the effect of A adds to the error in the correlation between B and C and similarly, the effect of C contributes to the error in the correlation between A and B. As a result, rAB2+rBC2 cannot be greater than 1, as shown below:

rAB2+rBC2=m12ea2eb2+m22ec2eb2=m12ea2+m22ec2eb2

eb2=m12ea2+m22ec2+e12, so

rAB2+rBC2<1

This prediction is so robust that if rAB2+rBC2>1, the convergent pathway can be rejected right away. Since we assume A and C to be independent input variables we assume no correlation between them. However, if they are correlated due to some cause other than this pathway, only then rAB2+rBC2 can be greater than 1.

Prediction R1: Unlike pathways P1 and P2, for the convergent pathway, it can be seen that rAC2rAB2.rBC2<0.

Prediction R2: Since the expected slope Mca is zero, and both Mba and Mcb are non-zero, their product is not a predictor of Mca.

|Mca||McbMba|<0as|Mca|=0

Prediction R3: The correlation rEbaEcb is predicted to have the same sign as Mcb.

Prediction R4: It can be shown that (a) rBC<rEbaC and further, (b) rEbaC2rBC2rEbaC2=rAB2.

Because this expression differs from R4 (b) of the earlier pathways, we can use a more generalized form for R4 (b) as |rBC2rEbaC2|max(rBC2,rEbaC2)=rAB2

3.3.4. Common cause pathway (P4)

he causal equations for this model would be X=input=X´+ex

A=m1X+e1+k1
B=m2X+e2+k2
C=m3X+e3+k3

where exe1e2e3 are not correlated.

It needs to be noted that e1e2e3 are important in defining this pathway. If e2 is negligible the pathway approximates to the radiating pathway with B being the mediator between A and C. Similarly, at small e1, A becomes the mediator and at small e3, C becomes the mediator in a radiating pathway. For the way we have defined our predictions, e2 is the most important error term in this pathway (Table 5).

Table 5. Relationship between the causal and regression equations for common cause pathway.
Slopes Errors
Mba=m2m1rAX2 Eba=m2ex(1rAX2)+e2rAX2m2m1e1
Mcb=m3m2rBX2 Ecb=m3ex(1rBX2)+e3rBX2m3m2e2
Mca=m3m1rAX2 Eca=m3ex(1rAX2)+e3rAX2m3m1e1

One very special feature of this pathway is that qualitatively it is highly symmetric with respect to all the three variables A, B and C. This means that any permutation of them does not change the qualitative nature of any prediction. This can be used as a pathway specific prediction and a distinct signature for this pathway.

Prediction R1: It can be shown that rAC2>rAB2.rBC2 for this pathway.

Prediction R2: |Mca| − McbMba > 0

Prediction R3: The sign of the correlation rEbaEcb is decided by the signs of m1 and m2. When both have the same signs rEbaEcb are negative and when they have opposing signs rEbaEcb is positive. In other words, the correlation multiplied by the sign of Mca is always negative.

Prediction R4: For this pathway (a) rEbaC<rBC and (b) rBC2rEbaC2rBC2rAB2

3.3.5. Single different cause pathway (P5a)

The causal equations for this model would be

X=input=X´+ex
A=input=A´+ea
B=m1A+m2X+e1+k1
C=m3X+e2+k2

where exeae1e2 are not correlated. Regression parameters derived are in Table 6.

Table 6. Relationship between the causal and regression equations for single different cause pathway.
Slopes Errors
Mba = m1 Eba = m2ex + e1
Mcb=m3m2rBX2 Ecb=m3ex+e2m3rBX2(m1eam2+ex+e1m2)
Mca = 0 Eca = m3ex + e2m1ea

Two specific predictions of this pathway shared only by the convergent pathway (P3) are that rAC2=0 and that rAB2+rBC2<1.

Prediction R1: rAC2rAB2.rBC2<0

Prediction R2: |Mca| < McbMba

Prediction R3: The correlation rEbaEcb is predicted to have the same sign as Mcb.

Prediction R4: (a) rBC>rEbaC and further (b |rBC2rEbaC2|max(rBC2,rEbaC2)=rAB2 is true.

3.3.6. Double different causes pathway (P5b)

The causal equations for this model would be

X=input=X´+ex
Y=input=Y´+ey
A=m1X+e1+k1
B=m2X+m3Y+e2+k2
C=m3Y+e3+k3

where exeye1e2e3 are not correlated. Regression parameters derived are in Table 7.

Table 7. Relationship between the causal and regression equations for double different cause pathway.
Slopes Errors
Mba=m2m1rAX2 Eba=m2ex(1rAX2)+m3ey+e2m2m1rAX2e1
Mcb=m4m3rBY2 Ecb=m4ey(1rBY2)+e3m4m3rBY2(m2ex+e2)
Mca = 0 Eca = m4ey + e3

Two specific predictions of this pathway shared only by the convergent pathway (P3) are that rAC2=0 and that rAB2+rBC2<1.

Prediction R1: rAC2rAB2rBC2<0

Prediction R2: |Mca| < McbMba

Prediction R3: The correlation rEbaEcb is predicted to have the same sign as Mcb.

Prediction R4: (a) rBC>rEbaC and further (b) |rBC2rEbaC2|max(rBC2,rEbaC2)=rAB2 is true.

It can be seen that all predictions of pathways P5a and P5b are identical and henceforth we will treat both of them in a single group as pathway P5.

3.3.7. Positive or negative feedback pathway (P6)

The causal equations for this model would be

A=input
dBdt=m1Ad1B+e1+mfC+k1
dCdt=m2Bd2C+e2+k2

where eae1e2 are not correlated, and d1 and d2 are positive. For a consistent definition of feedback, we assume m1 and m2 to always be positive, and that the sign of mf decides whether it is a positive or negative feedback loop. Feedback loops depend crucially on the relative strength of the forward versus backward causation. If the feedback term, i.e. the effect of C on B is weak, it approximates to a linear pathway, and if the forward term i.e. effect of B on C is weak, it approximates to a convergent pathway. Therefore, the predictions of linear or convergent pathways can be expected if the forward or feedback links respectively are weak. Additionally, however, the negative feedback pathway is associated with a problem of definition. If the feedback effect of C on B is stronger than the effect of B on C, the signs of the slope in the causal and regression equations could be opposite, implying that while m2 is positive, Mcb could become negative. This happens when

Mcb=m2d2+e1e2eb2<0i.e.|mfe22eb21(d1d2m2mf)|>m2d2

This results in a paradoxical transformation of a causally negative feedback into an apparent positive feedback since the sign of the slope and that of the feedback effect is the same. Further, when the negative feedback is much stronger than the forward effect, the predictions of convergent pathway are more applicable than the predictions of negative feedback pathway. At equilibrium where both dBdt and dCdt=0, the equilibrium concentrations of B and C are given by

B=m1d2(d1d2m2mf)A+d2e1+mfe2d1d2m2mf+mfk2+k1d2d1d2m2mf
C=m2d2B+e2d2+k2d2

For simplification we take m1=m1d2(d1d2m2mf),e1=d2e1+mfe2d1d2m2mf and k1=mfk2+k1d2d1d2m2mf

B=m1A+e1+k1

Similarly, m2=m2d2,e2=e2d2 and k2=k2d2

C=m2B+e2+k2

It should be noted that e1=d2e1+mfe2d1d2m2mf and e2=e2d2 share e2, and would therefore co-vary. The sign of this covariance is decided by the sign of mf, i.e. whether the feedback is positive or negative.

Regression parameters can be derived from the above as in Table 8.

Table 8. Relationship between the causal and regression equations for positive or negative feedback pathway.
Slopes Errors
Mba=m1=m1d2(d1d2m2mf) Eba=e1=d2e1+mfe2d1d2m2mf
Mcb=m2+e1e2eb2=m2d2+e1e2eb2 Ecb=e2e1e2eb2eb=e2d2cov(d2e1+mfe2d1d2m2mf)(e2d2)eb2eb
Mca=m2m1=m2d2m1d2(d1d2m2mf) Eca=m2e1+e2=m2d2d2e1+mfe2d1d2m2mf+e2d2

Prediction R1: When the feedback is negative rAC2>rAB2rBC2

The reverse applies for positive feedbacks where rAC2<rAB2rBC2

Prediction R2: In negative feedback Mca > McbMba and in positive feedback |Mca| < McbMba.

Prediction R3: The sign of this correlation will be decided by the sign of mf which is negative for negative feedback and positive for positive feedback.

Prediction R4: (a) It can be shown that for negative feedbacks rEbaC<(rBC). For positive feedback the prediction is more conditional. The inequality rEbaC<(rBC) will be true above a threshold rBC. For smaller rBC it is difficult to make a definite prediction. (b) for negative feedback rBC2rEbaC2rBC2>rAB2 and for positive feedback rBC2rEbaC2rBC2<rAB2 is true above a threshold rBC.

3.3.8. Positive or negative feed-forward pathway (P7)

The causal equations for this model would be

A=input=A´+ea
A=m1X+e1+k1
dBdt=m1Ad1B+e1+k1
dCdt=m2B+m3Ad2C+e2+k2

where eae1e2 are not correlated. At equilibrium

B=m1d1A+e1d1+k1d1=m1A+e1+k1 taking m1=m1d1,e1=e1d1 and, k1=k1d1

C=m1m2+d1mfd1d2A+(m2d1d2e1+1d2e2)
C=m1m2+mfd1d2m1Bmfm1d2e1+e2d2+k2d2

We will take m2=m1m2+mfd1d2m1e2=mfm1d2e1+e2d2

Note that since e1 decides both e1 and e2, the covariance between e1 and e2 will be mfm1d2e1, which will be positive when mf is negative i.e. for negative feed-forward, and negative when mf positive i.e. positive feed-forward is.

For simplifying the definition of feed-forward, we assume m1 and m2 to be positive and the sign of mf decides whether the feed-forward is positive or negative; a negative feed-forward pathway is once again associated with a problem of definition. If the feed-forward effect of A on C is stronger than that through B, and if their signs are opposite, the signs of slope in the causal and regression equations could be opposites. That is, if mfd1 > m1m2 then Mcb can be negative although the causal relationship between B and C is positive. This results in a paradoxical transformation of a causally negative feed-forward pathway into an effectively positive feed-forward pathway as the product MbaMcb and Mca both have the same sign.

Note that while all the expressions are the same as in feedback pathways (Table 9), the differences lie in the meanings of e1e2,m2 etc.

Table 9. Relationship between the causal and regression equations for positive or negative feed-forward pathway.
Slopes Errors
Mba=m1=m1d1 Eba=e1=e1d1
Mcb=m2+e1e2eb2=m1m2+mfd1d2m1+e1e2eb2 Ecb=e2e1e2eb2eb=mfm1d2e1+e2d2e1e2eb2eb
Mca=m2m1=m1m2+mfd1d2m1.m1d1 Eca=m2e1+e2=m1m2+mfd1d2m1.e1d1+mfm1d2e1+e2d2

Prediction R1: For positive feed-forward rAC2>rAB2rBC2, and for negative feed-forward rAC2<rAB2rBC2, but under conditions in which the result mimics positive feedback, rAC2>rAB2rBC2. This is the condition when a causally negative feed-forward transforms into an apparent positive feed-forward.

Prediction R2: When mf is positive, McbMba < Mca. In the case of negative feed-forward McbMba > Mca but under conditions in which a negative feed-forward transforms into a positive feed-forward, McbMba < m1m2 = Mca.

Prediction R3: For positive feed-forward, we expect a negative correlation, and for negative feed-forward, a positive correlation. Therefore, for positive feed-forward, the correlation rEbaEcb will have the opposite sign of that of Mcb. For a negative feed-forward pathway, under conditions when it transforms into an effective positive feed-forward correlation, rEbaEcb will have the opposite sign of that of Mcb.

Prediction R4: (a) In the case of positive feed-forward rEbaC<rBC and in the case of negative feed-forward prediction is conditional. Under the conditions when a causally negative feed-forward becomes apparently positive feed-forward, the prediction of positive feed-forward is true. When a negative feed-forward is effective rEbaC<rBC will be true above a threshold rAB.

(b) For positive feed-forward rBC2rEbaC2rBC2>rAB2. For negative feed-forward a universal prediction cannot be made. When the effect is that of a positive feedback the prediction of positive feedback is true, otherwise rBC2rEbaC2rBC2<rAB2.

4. Testing against the null hypotheses

Testing in data needs to be different for equality and inequality predictions. The prediction can serve as the null model wherever equality is predicted but needs to be treated as an alternative hypothesis wherever inequality is predicted. For pathways that predict equality, a two-tailed probability is used, and for pathways predicting one-way inequality, a one tailed test is used. Numerical simulations were run using the causal pathway equations for each of the pathways P1 to P7 to generate data, assuming the errors to be distributed normally around a mean zero. In the results of simulations reported below, the convention consistently followed through Figs 2 and 3 is that if H0 is true, it is indicated by green, while H1 being true is indicated by red and H2 being true by yellow.

Fig 2. Analytical signatures for each pathway.

Fig 2

Summarizing the analytical signature for each pathway in a color code where green represents acceptance of the null hypotheses H0, and red and yellow represent the acceptance of H1 and H2 respectively. Asterisks indicate conditional prediction e.g.*rBC2 above a threshold, **rAB2 above a threshold.

Fig 3. Simulation results of all pathways against all predictions.

Fig 3

For all acyclic pathways and predictions from R1 to R4a, the result of every simulation run is plotted as a point with rAB2 and rBC2 on X and Y axes respectively. Green represents null hypothesis true, red for H1 and yellow for H2.

R1: H0: rAC2=rAB2rBC2, H1: rAC2<rAB2.rBC2 and H2: rAC2>rAB2rBC2. Since every correlation coefficient is associated with a confidence interval, to test the null hypothesis, we check that the confidence interval of rAC2rAB2rBC2 includes zero.

R2: H0: Mca = McbMba (green), H1: Mca < McbMba (red) and H2: Mca > McbMba (yellow).

R3: Since the signs of causal slopes m1 and m2 are allowed to be positive or negative in the models, the correlation coefficients are multiplied by the sign of the slope Mcb.

H0: rEbaEcb2.sign(Mcb)=0 (green), H1: rEbaEcb2.sign(Mcb)<0 (red) and H2: rEbaEcb2.sign(Mcb)>0 (yellow).

R4a: H0: rBC2=rEbaC2 (green), H1: rBC2<rEbaC2 (red) and H2: rBC2>rEbaC2 (yellow).

R4b: H0: |rBC2rEbaC2|max(rBC2,rEbaC2)=rAB2, H1: |rBC2rEbaC2|max(rBC2,rEbaC2)<rAB2, H2: |rBC2rEbaC2|max(rBC2,rEbaC2)>rAB2. Since the prediction is about whether |rBC2rEbaC2|max(rBC2,rEbaC2) is predicted by rAB2, in the simulations results reported below we show a scatter plot between the two where good predictions lie along the diagonal and failure of prediction strays away from it.

The results match very well with the predictions in Fig 2. For converging and different cause pathways, a pathway specific prediction is that the sum of the two correlation coefficients never exceeds unity. This is also evident in the simulation results.

For feedback and feed-forward pathways predictions from R1 to R4a, the X axis is rBC and Y is m2sde1mfsde2 which reflects the relative strength of the feedback or feed-forward term as compared to the forward relation between B and C. The feed effect is strong when the ratio is close to zero and weak moving away from it. For negative feedback and feed-forward the Y axis goes from -1 to 0 and for positive feedback and feed-forward from 0 to 1. With negative feedback and feed-forward, there is an apparent conversion to positive feedback and feed-forward respectively under a set of conditions. When this happens rBC becomes negative and the predictions of positive feedback and feed-forward respectively apply. It can also be seen that when the ratio is close to zero, predictions of converging pathway hold true.

Predictions from R4b for all pathways are shown as scatter plots with |rBC2rEbaC2|max(rBC2,rEbaC2) and rAB2 When they are predicted equal, most points lie along the diagonal. Wherever inequality is predicted, they are on one side of the diagonal.

Rejection due to overfitting inequality: For all inequality predictions, overfitting is possible. For example, if we expect that rAC2>rAB2rBC2, it is also possible that rAC2 is too large than what can be predicted by the pathway under consideration. It is possible to test this either analytically using equations derived for the corresponding predictions (see S1 Text), or using simulations only if the parameters of the causal equations are known. If parameter estimates for the pathways are known from independent empirical sources, it should be possible to test over-fitting inequality. We will illustrate this with real life data later in the section ‘Testing specific pathways and questions: The case of pre-diabetes’.

5. Sensitivity and robustness of predictions

We used simulations to test the sensitivity and reliability of the analytical predictions. The simulations were run using the causal pathway equations for each of the pathways P1 to P7 to generate data, assuming the errors to be distributed normally around a mean zero. Up to 10000 simulations are run, with each run using randomly drawn parameters and error standard deviations from a given range (see S1 Text ‘Simulations used for testing the sensitivity and robustness of the predictions’ for details). The error standard deviation ranges were selected such that the coefficients of determination were well spread between zero and one. The generated data were then used to test the predictions of the corresponding pathway. Simulations used in this section are not based on real life data and are mainly employed to test the reliability of the predictions over a range of regression correlation coefficients.

5.1. Prediction sensitivity depends on sample size

Figs 2 and 3 show that simulations generally follow the analytical predictions quite well, but with certain limitations. Many of the predictions, particularly when H1 and H2 are expected to be true, work well above threshold values of r2. When either rAB2 or rBC2 or both are small, the null hypothesis fails to get rejected. This threshold of sensitivity can be reduced by increasing sample size (n) (Fig 4). For example, in the case of prediction 1 for the convergent pathway, simulations showed that type 2 error reduced in a threshold relationship with the product of rAB2 and rBC2 and the point of inflection reduced exponentially with sample size (Fig 4B). Although qualitatively it can be said without doubt that statistical power would increase with sample size, the effect of sample size on statistical power appears to be different for different pathway-prediction combinations (Fig 4). In the case of cyclic pathways, many predictions are conditional as described above, and that is clearly reflected in the simulations. The agreement between predictions and simulation results is weaker for a few specific pathway-prediction combinations, in the sense that they work in a narrow range of conditions. This was seen in case of P5 (different cause) prediction from R4b, and P7 (negative feed-forward) prediction from R2 (Fig 4). This implies that we need to be conservative in rejecting pathways in such pathway-prediction combinations, particularly when the correlation coefficients are small. Further, wherever the predictions are themselves the null models, its rejection will naturally become conservative at low correlations. However, for inequality predictions, where the null hypothesis is equality, failure of rejecting the null hypothesis should not be taken as rejection of the prediction when correlations are weak. When we take such a conservative approach, rejection of a prediction can be confidently taken to mean rejection of a pathway whereas failure to reject has limited implications particularly when the correlations are weaker.

Fig 4.

Fig 4

Effect of n on the reliability of prediction. A: Simulation results at n=100 and n=500 in three pathway-prediction combinations: Colour codes similar to Fig 3. Note that when correlation coefficients are small the null hypothesis fails to get rejected leading to type 2 error. The parameter area over which the simulation results match the prediction increases with n. B. For prediction 1 of convergent pathway type 2 error shows a threshold relationship with the product of rAB2 or rBC2 and the point of inflection of the threshold relationship decreases exponentially with sample size.

5.2. Robustness of predictions

It can be seen from Table 10 that each pathway makes a set of predictions by which some pathways can be differentiated from others. However, some have an identical set of predictions among the general predictions described so far. Table 10 shows that there are 6 different signatures among 9 primary pathways. Some of the predictions are conditional, and therefore, it may not always be possible to differentiate between pathways. For example, some predictions do not work for very small r2 values. Feedback is not distinguishable from linear pathways unless the feedback arrow is sufficiently strong. Such limitations are common to all statistical tools, and they need to be used and interpreted in light of the appropriate context and conditions.

Table 10. Summary of predictions of all pathways considered.

Prediction/
Rule →
Pathway ↓
R1 R2 R3 R4 a R4 b Pathway specific prediction
P1 linear r2ACr2AB.r2BC = 0 |Mca|= |Mba.Mcb| rEba,Ecb= 0 rEbc,C/rBC < 1 |rEbaC2rBC2|max(rEbaC2,rBC2)=rAB2
P2 radiating r2ACr2AB.r2BC = 0 |Mca|= |Mba.Mcb| rEba,Ecb= 0 rEbc,C /rBC < 1 |rEbaC2rBC2|max(rEbaC2,rBC2)=rAB2
P3 convergent r2ACr2AB.r2BC < 0 |Mca|< |Mba.Mcb| rEba,Ecb> 0 rEbc,C /rBC > 1 |rEbaC2rBC2|max(rEbaC2,rBC2)=rAB2 rAC=0,
r2AB+r2BC<1
P4 common cause r2ACr2AB.r2BC > 0 |Mca|> |Mba.Mcb| rEba,Ecb< 0 rEbc,C /rBC < 1 |rEbaC2rBC2|max(rEbaC2,rBC2)>=rAB2 Symmetry around A,B,C
P5 different cause r2ACr2AB.r2BC < 0 |Mca|< |Mba.Mcb| rEba,Ecb> 0 rEbc,C /rBC > 1 |rEbaC2rBC2|max(rEbaC2,rBC2)=rAB2 rAC=0
r2AB+r2BC<1
P6 feedback
Negative
r2ACr2AB.r2BC > 0 |Mca|> |Mba.Mcb| rEba,Ecb< 0 rEbc,C /rBC < 1 |rEbaC2rBC2|max(rEbaC2,rBC2)>=rAB2
P6 feedback positive r2ACr2AB.r2BC < 0 |Mca|< |Mba.Mcb| rEba,Ecb> 0 rEbc,C /rBC < 1 |rEbaC2rBC2|max(rEbaC2,rBC2)>=<rAB2
P7 feed-forward negative r2ACr2AB.r2BC < 0 |Mca|< |Mba.Mcb| rEba,Ecb> 0 rEbc,C /rBC < 1 |rEbaC2rBC2|max(rEbaC2,rBC2)<=rAB2
P7 feed-forward positive r2ACr2AB.r2BC > 0 |Mca|> |Mba.Mcb| rEba,Ecb< 0 rEbc,C /rBC < 1 |rEbaC2rBC2|max(rEbaC2,rBC2)>=rAB2

Note that there are 6 distinct signatures among 9 pathways. Pathways with identical predictions are shaded with the same color. There is some redundancy between the predictions. For example, the results of R1, R2 and R3 are tightly correlated. We feel that the redundancy serves to validate and reinforce the predictions. Also, when there are complex pathways arising through combinations of primary pathways, different predictions show differential departures from the primary pathway predictions. Therefore, all predictions are useful in spite of some redundancy.

5.3. Sensitivity analysis

The predictions derived and tabulated above are based on the typical assumptions of mainstream statistics that the input variables are distributed normally and that all causal links are linear. However, it is important to ask how critical these assumptions are for the predictions to work. In experimental biology, the input variable is often designed to have uniform intervals and is not normally distributed. A moderate deviation from linearity is also common in physiological and other biological systems. If the predictions are too sensitive to these assumptions, they may prove to be of limited use in real-life. We used Monte-Carlo simulations to assess whether the predictions work under moderate deviations from these assumptions. When the input variables were selected randomly from a uniform rather than a Gaussian distribution, all predictions worked with nearly the same differentiating ability (data not shown). Similarly, when non-parametric Spearman ranked correlations were used instead of Pearson’s correlations, the correlation related predictions (from R1 and R4) worked similarly (data not shown). This demonstrates that the tools are not too sensitive to the assumptions of normality of input variable, linearity of relationships, and parametric or non-parametric nature of correlations.

6. Applications of the method

6.1. Accepting or rejecting pathways using real-life data

Two approaches are possible by which the predictions of a pathway can be tested using real-life data.

(i) Based on confidence intervals of correlations and regression slopes: The null hypotheses for every prediction can be tested using calculation of confidence intervals of regression correlation parameters. Simulations have shown that except when the underlying correlation coefficients are too low, this approach can be reliably used to test the predictions. The sensitivity of predictions depends upon the sample size as well as the position in the parameter space (Fig 3). It is likely therefore that at smaller sample sizes, or at lower rAB2 or rBC2, pathways that predict H1 or H2 may fail to get support even if true. On the other hand, at lower rAB2 or rBC2 if a pathway predicts H0 to be true and the null hypothesis gets rejected, the rejection can be highly reliable. (ii) Monte-Carlo simulation approach: An alternative approach, which will be more conservative in rejecting pathways, is the Monte-Carlo approach. Assuming a specific pathway to be true, it is possible to back calculate the causal equation parameters from the regression correlation parameters obtained in the data (Tables 2 to 9). For pathways such as negative or positive feedback, it is not possible to estimate all causal parameters from regression parameters. In such cases, if empirical estimates of one or a few causal parameters can be obtained, the remaining causal parameters can be worked back. Using the estimated parameters of causal equations, Monte-Carlo simulations can be run to obtain the probabilities of getting the observed results. This approach can be particularly useful when the correlations obtained in the data are weak, and a conservative inference is preferred.

6.2. Distinguishing between pathways with identical signatures

From the predictions summarized in Table 10, it can be seen that some pathways share prediction signatures. For example, the linear pathway cannot be distinguished from radiating or convergent, is indistinguishable from different cause. There are three possible ways of resolving between pathways with similar signatures:

  1. Swapping variables: In the generalized predictions, common cause pathway and negative feedback pathway have the same predictions. However, the predictions of the common cause pathway are symmetric around A, B and C, and flipping the positions of the three variables does not alter the predictions, which is not the case with negative feedback pathway.

  2. Involving a fourth variable whose causal relationship with at least one of the triad is already known, or

  3. Involving more variables to cross validate pathways.

We will discuss (ii) and (iii) in a different context below.

6.3. Inferring causality between two variables

It is extremely difficult to infer causal relationship between two correlated variables. Although some solutions have been suggested, their applicability is limited [15, 16]. However, it is possible to infer the causal relationship between two variables if we have data on a third variable that is correlated with one or both of them with known causality. For example, in men, testosterone levels and muscle strength are correlated, but the direction of the causal arrow might be uncertain since testosterone can increase muscle mass, while [32] exercise can also induce a testosterone response [33,34]. The causal relationship can be revealed in cross-sectional data if we use chronological age as a third variable. Neither testosterone nor muscle mass decides the chronological age, but age may affect one or both the variables. If age shows significant correlation with one or both the variables, the predictions from different possible pathways can be tested using the set of predictions as described. By testing these predictions, it should be possible to determine the causal relationship between muscle mass and testosterone.

6.4. Inferring causal pathways with three variables

To infer causal pathways within three intercorrelated variables, three alternative approaches are possible. The first approach is to test and resolve between preconceived hypothetical pathways. It is likely that prior knowledge or some insights into mechanisms allow us to start with a few plausible alternative pathways. It is possible to perceive more complex pathways by combinations of the primary pathways that we considered in this paper. For example, a pathway may contain both feedback and feed-forward elements. Such complex or combinational pathways can be used to make a set of predictions by the analytical approach described above and testing these predictions can resolve between pathways. If we do not have such preconceived pathways, it would be necessary to consider all possible combinations of pathways between the three variables and make differential predictions from each of them. In such cases, we must also consider permutations of the variables. At the end, it may not be possible to ascertain a single unique causal pathway since the prediction signatures of some of them may be identical. Nevertheless, it would still be possible to reject some pathways based on their prediction signature. In addition, if available, we can involve a fourth variable correlated to one or more of the three, if there is some pre-existing knowledge about its causal relationship.

6.5. Inferring causal networks with more than three variables

In complex systems, often there are large causal networks. In such networks, combinations of 3 membered motifs can be identified. Out of the possible pathways among three variables some can be rejected using analysis of the three variables. Bringing in a fourth one can provide additional insights which can be used for cross checking or validating our first set of inferences. In complex causal networks, there can be many such cross check and validation possibilities. For large networks algorithms requiring massive computational power may be needed that may pin down one or a few network structures from the large number of possible ones using combinations of three-member motifs and cross validation facility among the motifs.

7. Testing specific pathways and questions: The case of pre-diabetes

Apart from some common pathways described above, it is possible that real life problems have some added complications due to which, the standard solution of testing a fixed set of predictions may not be sufficient. However, one can apply similar foundational principles to handle such pathway-specific questions. We will illustrate this using a classical hypothesis that attempts to explain a human physiological state designated as an insulin resistant, hyperinsulinemic, normoglycemic, pre-diabetic state. In this state, the plasma levels of fasting insulin (FI) are raised although fasting glucose (FG) remains normal. The classical interpretation of this state (Fig 5A) is that a rise in insulin resistance, presumably as a result of obesity, is primary. Insulin resistance interferes with insulin-induced glucose uptake by muscle and other insulin-dependent tissues. The reduced uptake raises plasma glucose levels. The raised plasma glucose induces extra insulin secretion so that plasma insulin levels go up. The extra insulin compensates for insulin resistance and normalizes glucose level. As a result, the fasting steady state of an insulin-resistant individual is characterized by raised FI and normal FG. At steady state, insulin resistance is measured by the index, HOMA-IR (defined as Insulin(IU).Glucose(mgdL)405), and the β cell response to glucose, by the index, HOMA β (defined as 360.Insulin(IU)Glucose(mgdL)63). Both the indices are based on the assumption of a steady state. Indices based on ratios are tricky in that they can result into many spurious correlations. These correlations are not useful to make any functional inference about the system. However, we show below that since different causal pathways can give rise to different correlations, such correlations can be used to infer causal pathways.

Fig 5.

Fig 5

Possible pathways between insulin resistance, FG and FI. a) A simplified single feedback pathway that approximates the negative feedback pathway P6. b) A null model assuming FG and FI to be independent and HOMA-IR a derived construct. c) An improvised null model with an external causal factor influencing FG and FI. d) The classically perceived pathway with dual feedback from glucose and insulin.

A logical flaw in this interpretation is that, after the glucose levels return to normal, there is no reason why FI should remain high. Insulin has a short half-life of about 6 minutes [30,35] and therefore a steady state level can be achieved quite fast; 12 hour fasting should be sufficient to achieve such a steady state. Therefore, a steady state in which FI is raised but FG remains normal is not well explained by the classical theory. In spite of this flaw, the main stream thinking in this field has held on to this interpretation for over four decades, and the indices HOMA-IR and HOMA- β continue to be commonly used in epidemiological research. Challenges to this causal interpretation come from the arguments and evidence that rise in FI precedes insulin resistance [2226,36]. Therefore, there is a need to reexamine the classical causal pathway. We will test this pathway based on our interpretations of the interrelationships of the regression-correlation parameters.

The pathway in question is more complex than the basic set of pathways P1 to P7. For regulation of glucose production by the liver and glucose uptake by tissues, there is a dual negative feedback. One feedback is exerted by glucose itself, which enhances tissue uptake and suppresses liver glucose production. The other feedback operates through insulin, which facilitates glucose uptake by insulin-dependent tissues and suppresses liver glucose production. If we ignore the direct glucose feedback and assume that feedback regulation operates only through insulin, then there is a single negative feedback. Thus the pathway can be simplified to the negative feedback pathway P6. If we incorporate dual feedback, as the equations show below, the relationship between insulin resistance and FG is not strictly linear. We could therefore use the standard set of predictions of a negative feedback model, assuming a single feedback. Alternatively, we can use the dual feedback model, and apply simulations to make and test predictions, since empirical estimates for most of the parameters are available from experiments (see S2 Text).

However, the main problem in testing these pathways is that we have no direct measure of insulin resistance. HOMA-IR and HOMA-β are believed to measure insulin resistance and β cell response respectively, but they are derived from the other two variables, which makes the problem tricky and circular. We approach the problem using more than one set of assumptions. (i) First, we test the dual feedback pathway (Fig 5D) assuming HOMA-IR and HOMA-β to faithfully represent insulin resistance and β cell response respectively. (ii) Then, we examine the constraints laid down by deriving these two parameters from the other two variables. (iii) In comparison, we use a null model (Fig 5B) in which the classical pathway is not true, there is no relationship between FG and FI, and HOMA-IR and HOMA-β are artificial constructs derived from the two measured variables and may not reflect any real phenomenon. We also test the typical convergent model, in which FG and FI determine HOMA-IR. (iv)Using some oversimplification, ignoring non-linearity of the model and assuming that HOMA-IR and HOMA-β are faithful indicators, we test the classical predictions of the negative feedback pathway (Fig 5A) as described earlier. We use epidemiological data on FG and FI measurements in four populations to test the classical causal pathway using our approach.

As a general note, we would like to add that the HOMA indices being ratio measures could be confounding factor. It is known that ratio indices can behave unpredictably in different parameter ranges, and can therefore generate spurious correlations. In our view, this strengthens the case for using the underlying causal relationships between the variables to interpret these correlations in the appropriate context.

7.1. Data sources

We used four data sets of sample studies by two research groups. All the four sets contain individuals with and without overt type 2 diabetes. Since we are addressing the prediabetic state here we have taken the non-diabetic subset of n individuals from the four samples. (i) Coronary Risk of Insulin Sensitivity in Indian Subjects (CRISIS) Study, Pune, India [37] (n=558). (ii) Pune Maternal Nutrition Study (PMNS), Pune, India [38] (n=299). (iii and iv) Newcastle Heart Project (NHP), England, [39] which has data on populations of two different ethnic origins namely European white (n=595) and south Asian (n=413). (iv) Pune Maternal Nutrition Study (PMNS), Pune, India [38] (n=299). We received the data from the two principal investigators of the studies. Data sources (i) and (ii) were received from Prof. Chittaranjan Yajnik and data sources (iii) and (iv) were received from Prof. Raj Bhopal. All data have been anonymized, and all the predictions have been tested independently in all the four data sets.

7.2. The dual feedback model (Fig 5D)

We assume that the standing plasma glucose level is a result of baseline rate of glucose production by the liver; suppression of this production as well as muscle glucose pickup which is proportional to the standing glucose level (direct glucose feedback); the insulin mediated suppression as well as uptake (insulin mediated feedback) and individual variability. The standing insulin levels are a result of glucose stimulated insulin secretion on the one hand and insulin degradation on the other. Thus, the causal equations can be written as

dGdt=LK1.GISENS.K2.I+e1
dIdt=K3.Gd.I+e2

Where G and I are plasma levels of glucose and insulin respectively, and FG and FI are the fasting steady state levels of the same. K1 denotes the rate constant for negative feedback of glucose on liver glucose production and tissue glucose uptake; K2 denotes the rate constant for insulin-mediated feedback which is proportional to ISENS, the insulin sensitivity of tissues; K3 is the rate constant for glucose-induced insulin release; and d, the rate of insulin degradation.

At steady state, we get

FG=d.(L+e1)ISENS.K2.e2K1.d+ISENS.K2.K3
FI=K3.FG+e2d

It can be seen that the steady state glucose level is a function of insulin resistance (IR) which is a reciprocal of insulin sensitivity. Using the reciprocal, we can write

FG=IR.d.(L+e1)K2.e2IR.K1.d+K2.K3

Thus the relationship between FG and IR is non-linear and follows a saturation curve.

7.3. Testing the model

Testing the pathways by the four different approaches described above:

7.3.1. Approach 1

Assuming classical pathway and faithful indices: The following predictions of the classical pathway depicted in Fig 5D and modeled above are testable.

  1. HOMA-IR, FG and FI should be positively correlated to each other. This prediction is true in all the four data sets except that the correlations between FG and FI are weak in all the four data sets. In terms of the variance explained (range 2.6 to 4.9%) FG and FI are poorly related (Table 11). The glucose homeostasis model expects a positive correlation between FG and FI. It is important to realize this since in the classical thinking, a prediabetic state is characterized by increased insulin but normal glucose levels. If the compensatory insulin response is mediated through glucose, it is impossible to have a raised FI without a proportionate rise in FG. In the pathway predictions, a positive correlation between FG and FI is expected independent of the feedback loop. However the classical thinking tries to explain a hyperinsulinemic normoglycemic state achieved through this pathway. The poor correlation between FG and FI, and a large coefficient of variation in FI compared to FG indicates that a normoglycemic hyperinsulinemic state may indeed be achieved, but whether the classical pathway offers a sound explanation for this state is the question. In an insulin resistant state, the level of FI can increase by about 10-fold the normal. However, the difference between the lower and upper limit of glucose in a pre-diabetic state is less than 1.5-fold. To achieve a tenfold increase in the effect resulting from a 1.5 fold increase in the causal variable, the slope needs to be of the order of 7 to 8. However in the data, the regression slope ranges between 0.05 and 0.2 (Table 12). Therefore the variance in FI is unlikely to be caused by variance in glucose following insulin resistance. Therefore, we need to conclude that most of the variation in FI appears to be random error independent of insulin resistance.

  2. By the steady state equations, the slope of the regression of FI on FG should be K3/d. Empirical estimates for both K3 and d are available (see S2 Text) and therefore this prediction can be tested. The empirical estimates are K3 = 0.08 microIU.mg/min and d = 0.15/min respectively, and thereby the expected slope is 0.533. In all the four data sets, the slopes are significantly smaller than the ones predicted from the empirical estimates (0.05 to 0.2). Thus, apart from a mismatch between the slope required to cause the observed variation in FI and actual slopes, the slopes expected from the empirical estimates of parameters and those obtained in regression also do not match. The latter mismatch by itself may not be sufficient to reject the pathway since a large measurement error in the X variable, i.e. FG can lead to underestimation of regression slope, but this explanation implies that a substantial part of variation in glucose is independent of insulin resistance, and is akin to random error with respect to the hypothetical causal pathway.

  3. HOMA-β in our assumption represents K3. However K3 is a constant in our model, and although it may have some variability in the population, it is uncorrelated with the three variables of concern. Therefore, HOMA-β should show no significant correlation with FG, FI and HOMA-IR. However, in all the four data sets HOMA-β is significantly positively correlated with FI, but negatively correlated with FG and positively correlated with HOMA-IR.

  4. In a negative feedback pathway rAC2>rAB2.rBC2. Qualitatively this inequality is true for HOMA-IR, FG and FI in the data. However, simulations show that there is overfitting of the inequality. rAC2 in all the four sets of data are substantially higher than the distribution obtained in the simulations (Fig 6). The correlation between FI and HOMA-IR is far greater than that predicted by the simulations, leading to an overfitting rejection.

Table 11. Fasting glucose and fasting insulin in the four datasets.
Data source FG mean (s.d.) FI mean (s.d.)
CRISIS (n=558) 94 (18.70) 7.58 (1.82)
KEMPMNS (n=299) 86.35 (11.21) 11.84 (10.82)
NHP-SA (n=413) 98.86 (7.37) 11.11 (9.26)
NHP-EU (n=595) 99.90 (6.40) 8.41 (5.29)
Table 12. Correlations between FG, FI, HOMA-IR and HOMA beta in the four data sets; r is the Pearson’s correlation coefficient, M is the slope of the corresponding regression, and ρ is the Spearman’s rank correlation.
FG:FI FG:HOMA IR FG:HOMA beta FI:HOMA IR FI:HOMA beta HOMA IR:HOMA beta
r(M) ρ r ρ r ρ r ρ R ρ r ρ
CRISIS (n=558) 0.217
(0.055)
0.284 0.525 0.465 -0.209 -0.403 0.931 0.975 0.232 0.698 0.312 0.543
KEM PMNS(n=299) 0.198
(0.191)
0.241 0.317 0.352 -0.219 -0.259 0.985 0.990 0.532 0.811 0.633 0.746
NHP-SA (n=413) 0.163
(0.206)
0.282 0.236 0.464 -0.198 -0.145 0.995 0.990 0.763 0.875 0.833 0.810
NHP-EU (n=595) 0.223
(0.185)
0.261 0.309 0.464 -0.064 -0.055 0.994 0.993 0.891 0.933 0.904 0.886
Fig 6. Frequency distribution of correlation coefficients in simulations of the classical pathway leading to prediabetic state: Bars represent the distribution of Pearson’s correlations obtained in 10000 runs of simulations.

Fig 6

The arrows indicate Pearson’s correlations in the four sets of empirical data.

Thus if we assume the two HOMA indices to faithfully represent insulin resistance and beta cell response respectively, then classical pathway needs to be rejected owing to mismatches with many of its predictions.

The distribution generated by simulations matches well with the real life correlations for true IR-FG (grey bars and arrows), FG-FI (red bars and arrows), and the product of the two (purple bars and arrows). The correlation between true IR and FI is greater than the product as predicted by the pathway (green bars, we do not have empirical estimates of these correlations) but the correlation between HOMA-IR and FI (blue bars and arrows) is substantially greater than the predicted leading to an overfitting rejection. This indicates that either HOMA-IR as currently calculated is substantially different from true insulin resistance or the pathway get rejected based on this prediction.

7.3.2. Approach 2

Effects of deriving HOMA-IR and HOMA-β from FG and FI: Since HOMA-IR and HOMA-β are not independently measured but derived from FG and FI measurements, some correlations will follow from the derivations themselves. The overfitting anomaly observed above can be explained as an artifact coming out of the calculation of HOMA-IR. However, some other anomalies remain unexplained. Here we are assuming that the classical pathway is true and therefore, FI is a linear function of FG. If FI is represented as m.FG + e, HOMA-IR will be correlated to FG2. Similarly, HOMA-β should be represented as m.FG/(FG – 63)+e. Under normal physiological range, FG > 63 and therefore HOMA-β is a decreasing function of FG. As a result both FI and HOMA-IR should be negatively correlated to HOMA-β. Simulations of the pathway results in a negative correlation between HOMA-IR and HOMA-β as long as the errors are small to moderate. These expectations do not match the empirical data, in which FI and HOMA-IR have significant positive correlations with HOMA-β. Thus, accepting the classical pathway with some allowance for artifacts coming out of the derived variables is not sufficient to explain the empirical correlations.

7.3.3. Approach 3

Testing the predictions of the null model: If FG and FI are independent of each other and have some variance around a mean, HOMA-IR is expected to be positively correlated with both since it is a product of the two. FI should be positively correlated with HOMA-β, but FG should be negatively correlated with HOMA-β. In the HOMA-IR- HOMA-β relationship, FI is in the numerator of both. FG is in the numerator of HOMA-IR but in the denominator of HOMA-β. Nevertheless, since the coefficient of variation of FI is substantially greater than that of FG, FI is expected to dominate the relationship and result in a positive correlation between HOMA-IR and HOMA- β. All these predictions are observed in the data (Table 12). The mismatch of the null model with the data is that it assumes FG and FI to be independent and uncorrelated. In all the four sets of data, there is a significant but weak correlation between the two. The r2 ranges from 0.026 to 0.049, and thus not more than 5% of variance is explained by the relationship.

If we consider FG and FI to be independent and HOMA-IR and HOMA-β derived from them, they constitute a convergent pathway that can be tested by the pathway predictions. It can be seen that predictions from R1, R2 and R3 of the convergent pathway are accepted. However, prediction from R4 and the pathway-specific prediction are rejected (Tables 12 and 13). These rejections can be explained by the positive correlation between FG and FI. We have seen in the analysis of pathway P3 that if A and C are positively correlated then rAB2+rBC2 can be greater than 1. The rejection of the null model suggests that there is a relationship between FG and FI, but does not indicate whether it comes from the classical pathway or through any other source as in Fig 5C.

Table 13. Testing the four predictions for the null model, which is a convergent pathway (Fig 5B),; A=Glucose, B=HOMA IR, C=Insulin.
Prediction R1
r2AC <r2AB.r2BC
CI of r2AC CI of r2AB.r2BC Accepted/Rejected
Lower Upper Lower Upper
CRISIS 0.013 0.081 0.180 0.300 Accepted
KEM PMNS -0.004 0.082 0.035 0.162 Not rejected
NHP-SA -0.004 0.057 0.013 0.098 Not rejected
NHP-EU 0.016 0.084 0.050 0.139 Not rejected
Prediction R2
Mca<Mba.Mcb
CI of Mca CI of Mba.Mcb Accepted/Rejected
Lower Upper Lower Upper
CRISIS -0.053 0.163 0.104 0.146 Accepted
KEM PMNS -0.185 0.568 0.195 0.413 Accepted
NHP-SA -0.199 0.611 0.1761 0.418 Accepted
NHP-EU -0.178 0.548 0.190 0.320 Accepted
Prediction R3
rEbaEcb>0
CI of rEbaEcb Accepted/Rejected
Lower Upper
CRISIS 0.390 0.521 Accepted
PMNS 0.118 0.333 Accepted
NHP-SA 0.081 0.268 Accepted
NHP-EU 0.178 0.328 Accepted
Prediction 4a
rEbaC>rBC

CI rEbaC CI rBC Accepted/Rejected
Lower Upper Lower Upper
CRISIS 0.953 0.965 0.918 0.941 Accepted
KEM PMNS 0.966 0.978 0.982 0.988 Rejected
NHP-SA 0.980961 0.987 0.994 0.995 Rejected
NHP-EU 0.968 0.977 0.993 0.995 Rejected

7.3.4. Approach 4

If we ignore the non-linearity of the model and assume HOMA-IR and HOMA-β to faithfully represent insulin resistance and beta cell response, we may use the 4 predictions of the standard negative feedback pathway. It is seen that predictions from R1, R2 and R3 are accepted but the outcome of prediction from R4 is complex (Table 14). After correcting for the effect of HOMA-IR, the FG-FI correlation should be weakened and that difference would be predicted by the correlation between HOMA-IR and FG. However instead of weakening, the FG-FI correlation becomes negative. Because of the strong positive correlation between HOMA-IR and FI, correcting for HOMA-IR subtracts from every value of FG, a quantity proportionate to FI, leading to a negative correlation between the corrected FG and FI. Additionally, simulations of the pathway show that if true insulin resistance is assumed to be correlated to FG by the same order as HOMA-IR, the correlation of true insulin resistance with FI is far less than that between HOMA-IR and FI (a result similar to Fig 6 and therefore not separately shown). Thus, there is an overfitting rejection of prediction from R1 as well. Rejection of this pathway based on two predictions is due to the unrealistically strong correlation between HOMA-IR and FI, which comes from the calculation of HOMA-IR itself.

Table 14. Testing the four predictions for a simplified form of the classical negative feedback pathway (Fig 5A); (A= HOMA IR, B=glucose, C=insulin).
Prediction R1
r2AC >r2AB.r2BC
CI r2AC CI of r2AB.r2BC Accepted/Rejected
Lower Upper Lower Upper
CRISIS 0.845 0.886 0.003 0.300 Accepted
KEM PMNS 0.9652 0.978 -0.001 0.161 Accepted
NHP-SA 0.988 0.99 -4.8695E-05 0.006 Accepted
NHP-EU 0.986 0.990 0.001 0.012 Accepted
Prediction R2
Mca>Mba.Mcb
CI Mca CI of Mba.Mcb Accepted/Rejected
Lower Upper Lower Upper
CRISIS -3.242 9.967 0.222 0.641 Accepted
KEM PMNS 0.083 0.299 0.080 0.587 Rejected
NHP-SA 0.085 0.326 0.038 0.334 Accepted
NHP-EU -3.766 11.581 0.132 0.457 Accepted
Prediction R3
rEbaEcb<0
Lower rEbaEcb Upper rEbaEcb Accepted/Rejected
CRISIS -0.574 -0.452 Accepted
PMNS -0.414 -0.210 Accepted
NHP-SA -0.325 -0.142 Accepted
NHP-EU -0.379 -0.234 Accepted
Prediction R4a
rEbaC<rBC
CI rEbaC CI rBC Accepted/Rejected
Lower Upper Lower Upper
CRISIS -0.392 -0.243 0.136 0.294 Rejected
PMNS -0.231 -0.008 0.087 0.305 Rejected
NHP-SA -0.168 0.023 0.068 0.256 Rejected
NHP-EU -0.167 -0.008 0.145 0.298 Rejected

We need to examine now to what extent HOMA-IR faithfully represents the true insulin resistance because if it does, the classical pathway certainly gets rejected. This can be examined in the simulations since the true insulin resistance is an input variable and HOMA-IR can be calculated as an outcome of the simulations. We see that HOMA-IR is correlated well with true insulin resistance when both e1 and e2 are close to zero (Fig 7). As the errors increase, the correlation becomes weaker. In the data, we do not have access to e1 and e2 but since the FG-FI correlation also becomes weaker with e2, we can look at how HOMA-IR represents true insulin resistance at different levels of FG-FI correlation. It can be seen that as FG-FI correlation becomes weak, HOMA-IR correlation with the true insulin resistance also becomes weak (Fig 7), but this relationship is affected by e1. When e1 is close to zero, i.e. almost all the variation in FG is explained by variation in true insulin resistance, even at low FG-FI correlation, HOMA-IR represents true insulin resistance fairly well, their correlation ranging between 0.58 and 0.7. On the other hand if we assume e1 to be large i.e. most of the variation in FG is due to random error or effects independent of insulin action, HOMA-IR is poorly correlated with true insulin resistance, the correlation coefficient declining to 0.2. Thus if we assume that the variance in FG is mainly caused by insulin resistance, then we have to reject the classical pathway leading to hyperinsulinemia. Alternatively, it is likely that the classical pathway is true but HOMA-IR does not represent true insulin resistance and that most of the variation in FG is not caused by insulin resistance. The substantially lower than expected slope of the FG-FI regression suggests large random errors in FG making the second interpretation more likely. In any case the classical pathway and the faithfulness of HOMA indices cannot be simultaneously true, and we have to reject at least one of them.

Fig 7.

Fig 7

The reliability of HOMA-IR as an index of true insulin resistance. The pathway simulations were carried out at a standard deviation of e1=1 (blue dots) and 10 (red dots). The FG-FI correlation weakens with increase in e2 which also affects the correlation between true IR and HOMA-IR. It can be seen that HOMA-IR is a reliable indicator of insulin resistance when e1 is small, but at large e1 it is a poor indicator as suggested by a weak correlation with true insulin resistance.

Results of the four alternative approaches to analyze the classical pathway and the null model converge on the inference that the null model is rejected on the basis of a weak but significant correlation between FG and FI. But the weak correlation in FG and FI is not adequately explained by the classical pathway owing to multiple mismatches and rejection of many of its predictions. The pathway rejection may be partially saved by saying that HOMA-IR and HOMA- β are not good indicators of insulin resistance and beta cell response and that we do not have access to true insulin resistance to test the predictions. However the FG-FI regression slope also has a large mismatch with expectations derived from the variance in FI as well as from empirical estimates of K3 and d. Therefore, it seems more likely that FG and FI are related by causes other than the classical pathway, and HOMA-IR and HOMA- β are derived artificial constructs that do not represent any real life phenomena.

There are a number of real life interpretations of the pathway in Fig 5C. Autonomic inputs from the nervous system are known to affect both insulin secretion and liver glucose production, which might be represented by the common cause arrows of Fig 5C. Alternatively, a small error in data collection can also result in the observed FG-FI correlation. The fasting sampling is done by instructing the subjects to have no food or drink after the last evening meal. However, if even a small proportion of subjects happen to consume bed tea an hour or two before sampling, their glucose as well as insulin levels could be slightly elevated simultaneously. This can result in a weak positive correlation between FG and FI in the data. Since the fasting state is based on the honesty of the subjects and there is no independent monitoring, this source of error cannot be ignored. Thus, there are more than one possible reasons for external factors causing a weak correlation between FG and FI, and the correlation is not sufficient to support the classical pathway in the presence of multiple other mismatches.

It should be noted that the correlational patterns in the four data sets used are remarkably similar although they come from populations differing in location, ethnicity and culture. It would be important to see whether the same correlational patterns are observed in other populations as well, but we can be confident in rejecting the classical pathway at least in the populations sampled.

7.4. What can type 2 diabetes research gain from our analysis

Putting the results together, it can safely be concluded that HOMA-IR and HOMA-β appear to be artificial constructs and reflect very marginally, if at all, the true insulin resistance and β cell response in a steady state. Before our approach for testing a causal pathway was available, there was no way to test whether HOMA-IR and HOMA-β truly represent the intended states of the system. Because of this limitation, insulin resistance was a circular argument. The inability of insulin to regulate glucose was assumed to be because of insulin resistance, but insulin resistance was measured as the inability of insulin to regulate glucose. This circularity had made the hypothesis of insulin resistance and compensatory hyperinsulinemia non-falsifiable. Our approach to pathway predictions breaks the circularity, and makes it possible to test whether the insulin resistance and glucose-mediated compensatory hyperinsulinemia hypothesis is supported by epidemiological data. At least in the populations tested, many serious anomalies in the classical pathway leading to a hyperinsulinemic, normoglycemic, insulin resistance prediabetic state are exposed. Conservatively we can argue that since HOMA-IR and HOMA-β do not represent insulin resistance and β cell response faithfully, and we do not have alternative measures for them, it may not be possible to clearly reject the classical pathway, but the data clearly show that even if true, the classical pathway has a very limited role in deciding FG, FI and their inter-relationship. Both the steady state levels have a large component of error or effects independent of the pathway under consideration. Although our analysis is restricted to the prediabetic state at present, establishing causality in the prediabetic state has implications for the over diabetic state. According to classical thinking, a failure of compensatory insulin response leads to diabetic hyperglycemia. Since our analysis questions the compensatory insulin response itself, the pathway leading to hyperglycemia is also in question. Even a highly conservative inference would demand rethinking of the causal process leading to diabetes.

It should be noted that although the HOMA indices may not reflect what they are intended to reflect, they do not cease to be useful. They may still reflect on the population variability in the glucose and insulin responses and many attempts to analyze this variability and classify diabetes have used the HOMA indices insightfully [40,41]. Therefore although the interpretations might change, the HOMA indices may still serve a useful purpose.

Doubts about the classical pathway are raised independently by experiments using insulin receptor knockouts or insulin suppression. Muscle-specific insulin receptor knockouts show altered glucose tolerance curves but normal fasting insulin [42]. Insulin suppression experiments do not result in elevated fasting glucose [4345]. Inactivation of insulin degrading enzyme raises steady state insulin levels but does not decrease glucose levels [46,47]. These experiments have already challenged the classical pathway. Thus, there are multiple reasons to doubt the classical pathway. On the other hand, a number of factors other than the mutual effects of FG and FI are known to affect insulin response as well as glucose homeostasis [4854], but these factors have not been integrated into the mainstream glucose homeostasis models. We do not intend here to test all possible alternative pathways deciding FG and FI. But our study lays down a set of methods by which this can be done, once the pathway hypotheses are clearly spelt out and the causal variables are measured. An important contribution of our methods is that physiological causal pathways can be evaluated based on epidemiological data, which is potentially a very important tool in understanding complex disorders. Experimental biology reveals what can happen in a system, but what does happen at the population level is better revealed by epidemiological data. Therefore, discerning causal signatures of pathophysiological pathways in epidemiological data is likely to be an important breakthrough.

8. Conclusions

Making causal inferences from cross sectional correlational data is a long-standing problem. A correlation between two variables does not give reliable information about causal relations. However, we demonstrate here, in the context of steady state homeostatic systems, using mathematical proofs as well as simulations from causal pathways that, in a set of three or more correlated variables, it is possible to test causal hypotheses based on the interrelationships of regression-correlation parameters. This is potentially a highly valuable tool in making causal inferences from cross sectional data in several fields.

Using this set of principles, we tested the classical causal assumption behind the hyperinsulinemic, normoglycemic, insulin resistant or pre-diabetic state. The analysis showed that this causal pathway and the measures of insulin resistance and insulin response were not supported by epidemiological data. Thus, the objections raised recently to the classical causal pathway are validated and alternative causal pathways that already have substantial experimental evidence need to be integrated in the mainstream clinical thinking.

Supporting information

S1 Text. Deriving predictions for regression parameters based on causal equations and calculation of causal parameters for simulations.

(DOCX)

S2 Text. Empirical estimates for parameters used in simulations.

(DOCX)

Acknowledgments

We thank Chittaranjan Yajnik for making the CRISIS and PMNS data sets available to us. We would also like to thank Louise Hayes and Raj Bhopal for making the data from NHP studies available. We thank Subhash Lele and Anil Gore for useful comments on an earlier draft manuscript.

Data Availability

All relevant data are within the paper, supporting information and previously published studies (ref 37-39).

Funding Statement

This work was supported by Rajiv Gandhi Science and Technology Commission (RGSTC), Maharashtra State.

References

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 Text. Deriving predictions for regression parameters based on causal equations and calculation of causal parameters for simulations.

(DOCX)

S2 Text. Empirical estimates for parameters used in simulations.

(DOCX)

Data Availability Statement

All relevant data are within the paper, supporting information and previously published studies (ref 37-39).


Articles from PLoS ONE are provided here courtesy of PLOS

RESOURCES