A model of gene-gene and gene-environment interactions and its implications for targeting environmental interventions by genotype

Helen M Wallace

doi:10.1186/1742-4682-3-35

2006 Oct 9;3:35.

PMCID: PMC1629012 PMID: 17029623

Abstract

Background

The potential public health benefits of targeting environmental interventions by genotype depend on the environmental and genetic contributions to the variance of common diseases, and the magnitude of any gene-environment interaction. In the absence of prior knowledge of all risk factors, twin, family and environmental data may help to define the potential limits of these benefits in a given population. However, a general methodology to analyze twin data is required because of the potential importance of gene-gene interactions (epistasis), gene-environment interactions, and conditions that break the 'equal environments' assumption for monozygotic and dizygotic twins.

Method

A new model for gene-gene and gene-environment interactions is developed that abandons the assumptions of the classical twin study, including Fisher's (1918) assumption that genes act as risk factors for common traits in a manner necessarily dominated by an additive polygenic term. Provided there are no confounders, the model can be used to implement a top-down approach to quantifying the potential utility of genetic prediction and prevention, using twin, family and environmental data. The results describe a solution space for each disease or trait, which may or may not include the classical twin study result. Each point in the solution space corresponds to a different model of genotypic risk and gene-environment interaction.

Conclusion

The results show that the potential for reducing the incidence of common diseases using environmental interventions targeted by genotype may be limited, except in special cases. The model also confirms that the importance of an individual's genotype in determining their risk of complex diseases tends to be exaggerated by the classical twin studies method, owing to the 'equal environments' assumption and the assumption of no gene-environment interaction. In addition, if phenotypes are genetically robust, because of epistasis, a largely environmental explanation for shared sibling risk is plausible, even if the classical heritability is high. The results therefore highlight the possibility – previously rejected on the basis of twin study results – that inherited genetic variants are important in determining risk only for the relatively rare familial forms of diseases such as breast cancer. If so, genetic models of familial aggregation may be incorrect and the hunt for additional susceptibility genes could be largely fruitless.

Background

Some geneticists have predicted a genetic revolution in healthcare, involving a future in which individuals take a battery of genetic tests, at birth or later in life, to determine their individual 'genetic susceptibility' to disease [1,2]. In theory, once the risk of particular combinations of genotype and environmental exposure is known, medical interventions (including lifestyle advice, screening or medication) could then be targeted at high-risk groups or individuals, with the aim of preventing disease [3].

However, there are also many critics of this strategy, who argue that it is likely to be of limited benefit to health [4-8]. One area of debate concerns the proportion of cases of a given common disease that might be avoided by targeting environmental or lifestyle interventions to those at high genotypic risk. Known genetic risk factors have to date shown limited utility in this respect [9]. However, some argue that combinations of multiple genetic risk factors may prove more useful in the future [10].

There are two possible approaches to considering this issue. The 'bottom-up' approach seeks to identify individual genetic and environmental risk factors and their interactions and quantify the risks. However, this approach is limited by the difficulties in establishing the statistical validity of genetic association studies and of quantifying gene-gene and gene-environment interactions: see, for example, [11-14].

A 'top-down' approach instead considers risks at the population level using twin and family studies and data on the importance of environmental factors in determining a trait. However, analysis of twin data is usually limited by the assumptions made in the classical twin study [15], including that: (i) there are no gene-gene interactions (epistasis); (ii) there are no gene-environment interactions; (iii) the effects of environmental factors shared by twins are independent of zygosity (the 'equal environments' assumption). These assumptions have all been individually explored and shown to be important in influencing the conclusions drawn from twin and family data [16-18]. In addition, the magnitude of any gene-environment interaction is critically important in determining the utility of targeting environmental interventions by genotype [19]. Although a general methodology to analyze twin data without making these assumptions has been developed, the algebra becomes intractable once multiple loci are involved [17]. This is problematic because, for common diseases, the impacts of multiple genetic variants, and potentially the whole genetic sequence, on disease susceptibility (here called 'genotypic risk') may be important.

The four-category model of population risks developed by Khoury and others [19] is a useful starting point for a top-down analysis of genetic prediction and prevention. It allows the merits of a targeted intervention strategy (which seeks to reduce the exposure of the high-risk genotype group only) to be explored, and can readily be extended to include more than four risk categories [10]. However, this model's use to date has been limited to bottom-up consideration of single genetic variants or to studying hypothetical examples of multiple variants. The four-category model is limited by the assumption of no confounders, which means it is applicable to only a subset of possible models of gene-gene and gene-environment interaction. However, situations where the 'no confounders' assumption is valid are arguably most likely to be of relevance to public health.

The aim of this paper is to combine the four-category model with population level data from twin, family and environmental studies, without adopting the classical twin model assumptions. This model of gene-gene and gene-environment interactions is then used to implement a 'top-down' approach to quantifying the utility of genetic 'prediction and prevention'.

Method

The four-category model

Consider a population divided into genotypic or environmental risk categories for a given trait (Figure 1a and 1b). The fraction of the population in the 'high environmental risk' group (designated by subscript e) is ε, and this subpopulation is at risk r_e. The remainder of the population is at risk r_oe. The fraction of the population in the 'high genotypic risk' group (designated by the subscript g) is γ, and this subpopulation is at risk r_g, with the remainder of the population at risk r_og. The total risk r_t for this trait in this population is then given by:

r_t = γr_g + (1-γ)r_og (1)

or by:

r_t = εr_e + (1-ε)r_oe (2)

The same population can alternatively be divided into four categories, making a four-category model (Figure 1c) with risks R_oo, R_oe, R_go and R_ge. Table 1 shows the risk categories in this model.

Table 1.

The four category model: risks and cases for a population of size N.

Category	Risk in category	Number of people in category	Number of cases in category
ge (high-risk genotype/high-risk exposure)	R_ge	γεN	γεR_ge N
go (high-risk genotype/low-risk exposure)	R_go	γ(1-ε)N	γ(1-ε)R_go N
oe (low-risk genotype/high-risk exposure)	R_oe	ε(1-γ)N	ε(1-γ)R_oe N
oo (low-risk genotype/low-risk exposure)	R_oo	(1-ε)(1-γ)N	(1-ε)(1-γ)R_oo N
Total		N	r_t N

The risks are related to the previous definitions by:

r_g = εR_ge + (1-ε)R_go (3)

r_og = εR_oe + (1-ε)R_oo (4)

r_e = γR_ge + (1-γ)R_oe (5)

r_oe = γR_go + (1-γ)R_oo (6)

The category risks R remain constant in different populations (i.e. as ε and γ vary), provided there are no confounders. This assumption restricts the model to special cases of gene-gene and gene-environment interaction. Note that for a single genetic variant, r_g corresponds to the penetrance of the variant, and that in general (provided R_ge ≠ R_go) this varies with the proportion of the population in the high exposure group, ε, as has been observed [20,21].

The total risk for the given trait is given by:

r_t = γεR_ge + γ(1-ε)R_go + ε(1-γ)R_oe + (1-ε)(1-γ)R_oo (7)

The subpopulation of cases has different characteristics from the general population: for example, it contains a higher proportion of people from the 'ge' subgroup. The relative risk for a person drawn randomly from a subpopulation with the same genotypic and environmental characteristics as the cases, RR^cases, is given by summing over the categories in Table 1 the relative risk of each category (R/r_t), weighted by the proportion of cases that fall in that category:

$RR^{cases} = \frac{γεR_{ge}^{2} + γ(1-ε)R_{go}^{2} + ε(1-γ)R_{oe}^{2} + (1-ε)(1-γ)R_{oo}^{2}}{r_{t}^{2}}$ (8)

Similarly, the relative risk for a person drawn randomly from a subpopulation with the same genotypic characteristics as the cases (but with the environmental characteristics of the general population) is:

$RR_{gen}^{cases} = \frac{γr_{g}^{2} + (1-γ)r_{og}^{2}}{r_{t}^{2}}$ (9)

The relative risk for a person drawn randomly from a subpopulation with the same environmental characteristics as the cases (but with the genotypic characteristics of the general population) is:

$RR_{env}^{cases} = \frac{εr_{e}^{2} + (1-ε)r_{oe}^{2}}{r_{t}^{2}}$ (10)
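As a numerical illustration of Equations (1) to (10), the following minimal Python sketch evaluates the marginal risks, the total risk and the three case-based relative risks for one parameter set. The function names and all input values are hypothetical, chosen only for illustration; they do not come from the paper or its additional file.

```python
def marginal_risks(gamma, eps, R_ge, R_go, R_oe, R_oo):
    """Marginal risks r_g, r_og, r_e, r_oe from the four category risks."""
    r_g = eps * R_ge + (1 - eps) * R_go        # high genotypic risk group
    r_og = eps * R_oe + (1 - eps) * R_oo       # low genotypic risk group
    r_e = gamma * R_ge + (1 - gamma) * R_oe    # high exposure group
    r_oe = gamma * R_go + (1 - gamma) * R_oo   # low exposure group
    return r_g, r_og, r_e, r_oe


def total_risk(gamma, eps, R_ge, R_go, R_oe, R_oo):
    """Total population risk r_t, Equation (7)."""
    return (gamma * eps * R_ge + gamma * (1 - eps) * R_go
            + eps * (1 - gamma) * R_oe + (1 - eps) * (1 - gamma) * R_oo)


def case_relative_risks(gamma, eps, R_ge, R_go, R_oe, R_oo):
    """RR^cases, RR^cases_gen and RR^cases_env, Equations (8) to (10)."""
    r_g, r_og, r_e, r_oe = marginal_risks(gamma, eps, R_ge, R_go, R_oe, R_oo)
    r_t = total_risk(gamma, eps, R_ge, R_go, R_oe, R_oo)
    rr = (gamma * eps * R_ge**2 + gamma * (1 - eps) * R_go**2
          + eps * (1 - gamma) * R_oe**2
          + (1 - eps) * (1 - gamma) * R_oo**2) / r_t**2
    rr_gen = (gamma * r_g**2 + (1 - gamma) * r_og**2) / r_t**2
    rr_env = (eps * r_e**2 + (1 - eps) * r_oe**2) / r_t**2
    return rr, rr_gen, rr_env


# Hypothetical inputs (illustrative only, not data from the paper).
params = dict(gamma=0.2, eps=0.3, R_ge=0.4, R_go=0.1, R_oe=0.1, R_oo=0.05)
r_g, r_og, r_e, r_oe = marginal_risks(**params)
r_t = total_risk(**params)
rr, rr_gen, rr_env = case_relative_risks(**params)
print(r_t, rr, rr_gen, rr_env)
```

For these inputs the three routes to r_t (Equations (1), (2) and (7)) agree, which is a quick consistency check on any chosen set of category risks.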

Population attributable fractions

Provided there are no confounders, the population attributable fraction (PAF^E_e) due to the presence of the high exposure (E) in the high exposure population subgroup (e) may be defined as:

$PAF_{e}^{E} = \frac{ε(r_{e} - r_{oe})}{r_{t}} = \frac{ε\{γ(R_{ge} - R_{go}) + (1-γ)(R_{oe} - R_{oo})\}}{r_{t}}$ (11)

If the trait is a disease, PAF^E_e is the proportion of cases that could be avoided if an environmental intervention (such as a lifestyle change or reduction in exposure) succeeds in moving everyone in the 'high environmental risk' group to the 'low environmental risk' category, as shown in Figure 1b.

The targeted population attributable fraction (PAF^E_ge) may be defined as the proportion of cases that could be avoided by targeting the same environmental intervention at the 'high genotypic + high environmental risk' subgroup only (the 'ge' subgroup), as shown in Figure 1c. Again assuming no confounders, it is given by:

$PAF_{ge}^{E} = \frac{εγ(R_{ge} - R_{go})}{r_{t}}$ (12)

Note that PAF^E_ge differs from PAF_ge as defined by Khoury & Wagener [19]. The latter implicitly assumes that both environmental and genetic risk factors are reduced and thus is inappropriate for assessing the merits of a targeted environmental intervention. PAF^E_ge as defined here is instead equivalent to the targeted attributable fraction (AF_T) defined by Khoury et al. [10]. To avoid confusion, the notation adopted here specifies both the nature of the intervention (environmental, denoted by superscript E) and the target subpopulation (the 'ge' subgroup, at both high genotypic and high environmental risk). Thus, the proportion of cases that would be avoided were it possible to move the 'high genotypic risk' subgroup to 'low genotypic risk' (as shown in Figure 1a) is written as PAF^G_g, given by:

$PAF_{g}^{G} = \frac{γ(r_{g} - r_{og})}{r_{t}} = \frac{γ\{ε(R_{ge} - R_{oe}) + (1-ε)(R_{go} - R_{oo})\}}{r_{t}}$ (13)

Although in practice it is not possible to change the genotype of the population, the parameter PAF^G_g is nevertheless useful in the calculations that follow.

Measures of utility

Khoury et al. [10] define the Population Impact (PI) as:

$PI = \frac{PAF_{ge}^{E}}{PAF_{e}^{E}}$ (14)

PI is one possible measure of the usefulness of targeting the environmental intervention (E) at the 'ge' subgroup. It measures the proportion of cases avoided by targeting the 'high genotypic + high environmental risk' subgroup (the 'ge' subgroup), compared to the proportion avoided by applying the environmental intervention to the whole 'high environmental risk' group. PI has the property:

0 ≤ PI ≤ 1 (15)

and has its maximum value when PAF^E_ge = PAF^E_e. However, as a measure of the utility of genotyping, PI has the disadvantage that it takes no account of the proportion of the population γ in the high genotypic risk group. This means PI = 1 when γ = 1 simply because the whole population is then in the high genotypic risk group, although using genotyping to target environmental interventions is more likely to be useful if PI = 1 and γ is also small.

Therefore, consider an alternative utility parameter U_ge, defined by:

$U_{ge} = \frac{PAF_{ge}^{E}}{PAF_{e}^{E}} - γ = \frac{γ(1-γ)[(R_{ge} - R_{go}) - (R_{oe} - R_{oo})]}{[γ(R_{ge} - R_{go}) + (1-γ)(R_{oe} - R_{oo})]}$ (16)

which has the property

-γ ≤ U_ge ≤ (1-γ) (17)

U_ge tends to 1 only if PI = 1 and γ is also small. It is a measure of the utility of using genotyping to target the environmental intervention at the 'ge' subgroup, compared to randomly selecting the same proportion γ of the population to receive the intervention. U_ge is positive if those at high genotypic risk have more to gain than those at low genotypic risk from the intervention ((R_ge - R_go) ≥ (R_oe - R_oo)) and negative if they have less to gain from the intervention. This reflects the fact that targeting those who have least to gain through an intervention is worse than using random selection in terms of its impact on population health.
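The attributable fractions and the two utility measures can be evaluated directly from the four category risks. The sketch below is a minimal illustration of Equations (7), (11) to (14) and (16); the function names and the input values are hypothetical and are not taken from the paper.

```python
def attributable_fractions(gamma, eps, R_ge, R_go, R_oe, R_oo):
    """Return (PAF^E_e, PAF^E_ge, PAF^G_g) from Equations (11) to (13)."""
    r_t = (gamma * eps * R_ge + gamma * (1 - eps) * R_go
           + eps * (1 - gamma) * R_oe + (1 - eps) * (1 - gamma) * R_oo)
    paf_e = eps * (gamma * (R_ge - R_go)
                   + (1 - gamma) * (R_oe - R_oo)) / r_t
    paf_ge = eps * gamma * (R_ge - R_go) / r_t
    paf_g = gamma * (eps * (R_ge - R_oe)
                     + (1 - eps) * (R_go - R_oo)) / r_t
    return paf_e, paf_ge, paf_g


def population_impact(paf_ge, paf_e):
    """PI, Equation (14)."""
    return paf_ge / paf_e


def utility(paf_ge, paf_e, gamma):
    """U_ge, first form of Equation (16)."""
    return paf_ge / paf_e - gamma


# Hypothetical inputs (illustrative only).
gamma, eps = 0.2, 0.3
R_ge, R_go, R_oe, R_oo = 0.4, 0.1, 0.1, 0.05

paf_e, paf_ge, paf_g = attributable_fractions(gamma, eps, R_ge, R_go, R_oe, R_oo)
pi = population_impact(paf_ge, paf_e)
u_ge = utility(paf_ge, paf_e, gamma)

# Second (closed) form of Equation (16), used as a cross-check.
u_ge_closed = (gamma * (1 - gamma) * ((R_ge - R_go) - (R_oe - R_oo))
               / (gamma * (R_ge - R_go) + (1 - gamma) * (R_oe - R_oo)))
print(paf_e, paf_ge, paf_g, pi, u_ge)
```

In this example targeting the 'ge' subgroup (20% of the population) would avoid about 60% of the cases avoidable by the untargeted intervention, so U_ge is positive and lies within the bounds of Equation (17).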

Note that even if genotyping is better than random selection, other types of test that are more useful may be available [22]; a population-based approach still has the potential to reduce more cases of disease [9,19,23]; and such targeting also has broader psychological and social implications. Therefore a positive U_ge does not necessarily imply that genotyping is the best means of selecting a subpopulation to target, or that a targeted approach is necessarily effective or socially acceptable. Note also that the measure U_ge applies only to interventions that are considered applicable to the whole population (such as smoking cessation) and neglects other relevant issues such as cost-effectiveness and the burden of disease [24]. In addition, it is necessary to consider the magnitude of the Population Attributable Fraction, PAF^E_e, before proposing this approach. This is because both PI and U_ge may tend to unity even if only a small proportion of cases can be avoided by means of environmental interventions.

Limits on parameters

Consider only populations where r_g ≥ r_og and r_e ≥ r_oe for all values of ε and γ. Then the risks in the four-category model must be ordered such that:

1 ≥ R_ge ≥ R_oe ≥ R_oo ≥ 0 (18)

and

R_ge ≥ R_go ≥ R_oo (19)

Using the known relationships (Equations (11), (13) and (16)) between PAF^E_e, PAF^G_g, U_ge and the risks R_oo, R_go, R_oe and R_ge leads to the limits on the utility parameter U_ge shown in Table 2. These conditions also ensure that PAF^E_e, PAF^G_g and PAF^E_ge are all positive. The two remaining inequalities (R_ge ≤ 1 and R_oo ≥ 0) are considered later, where they are used to derive limits on the proportion of the population in the 'high genotypic risk' group, γ. This step is not possible at this stage because PAF^E_e, PAF^G_g and PAF^E_ge are themselves dependent on γ.

Table 2.

Constraints on model parameters

Condition	Limits on U_ge	Limits on γ	Limits on p^DZ_g	Limits on f_ge
R_oe ≥ R_oo	U_ge ≤ (1-γ)	γ ≤ γ_maxge, where $γ_{\max ge} = \frac{1}{1 + \frac{V_{ge}}{V_{e}}}$		
R_go ≥ R_oo	$U_{ge} \leq (1-γ) \frac{PAF_{g}^{G}}{PAF_{e}^{E}}$		$p_{g}^{DZ} \leq p_{g\max}^{DZ}$	$f_{ge} \leq \frac{1}{PAF_{e}^{E}}$
R_ge ≥ R_go	U_ge ≥ -γ	γ ≥ γ_neg, where $γ_{neg} = \frac{1}{1 + \frac{V_{e}}{V_{ge}}}$		
R_ge ≥ R_oe	$U_{ge} \geq -(1-γ) \frac{ε PAF_{g}^{G}}{(1-ε) PAF_{e}^{E}}$		$p_{g}^{DZ} \leq p_{gneg}^{DZ}$	$f_{ge} \geq -\frac{ε}{(1-ε) PAF_{e}^{E}}$
R_ge ≤ 1		γ ≥ γ_minge, where $γ_{\min ge} = \frac{1}{1 + \frac{F_{1}^{2}}{(V_{g}/r_{t}^{2})}}$		
R_oo ≥ 0		γ ≤ γ_o, where $γ_{o} = \frac{1}{1 + F_{2}^{2} (V_{g}/r_{t}^{2})}$		

The twin and familial risks model

Data from studies of monozygotic and dizygotic twins are commonly used to estimate the genetic and environmental variances V_g and V_e of a trait. Here, the aim is to use twin and other data to estimate the possible magnitudes of the population attributable fractions and measures of utility defined above. To do this it is necessary to estimate V_g, V_e and the variance due to gene-environment interaction, V_ge. The standard methodology for twin data analysis is inappropriate because it assumes V_ge = 0.

First note that we are interested in the extent to which relatives share risk categories (which may be either environmental or genotypic, or both), rather than a particular genetic variant. The probability that a relative of a proband is also a case depends on the extent to which their environmental and genotypic risks are correlated with those of the proband. Rather than adopting a specific form for the genetic model, define p^rel_g as the correlation in genotypic risk category (g) between relatives of type denoted by the superscript 'rel'. The parameter p^rel_g is the probability that the genotypic risk category (high or low) is identical by descent.

For monozygotic (MZ) twins, assumed to share their entire genome, p^MZ_g = 1. For dizygotic (DZ) twins and other siblings, who share on average half their genome, p^DZ_g = p^sib_g = 1/2 for a single allele model (dominant Mendelian disorder) or an additive polygenic model. For a two allele model (recessive Mendelian disorder) or the dominance term of a polygenic model (in which multiple pairs of alleles interact), p^DZ_g = p^sib_g = 1/4. Here, allowing for the possibility of multiple gene-gene interactions (epistasis), require only that:

$1/2 \geq p_{g}^{DZ} \geq 0$ (20)

The meaning of p^DZ_g and its relationship to the polygenic risk model first adopted by Ronald Fisher in 1918 is discussed further below.

Similarly, define p^rel_e as the correlation in environmental risk category (e) between relatives of type 'rel', requiring only that:

$1 \geq p_{e}^{rel} \geq 0$ (21)

Assume that p^rel_g and p^rel_e are independent (so that there is no genotype-environment correlation) and that risks within a category are randomly distributed. The relative risk for a relative of type 'rel' may then be written:

$λ_{rel} = (1 - p_{g}^{rel})(1 - p_{e}^{rel}) + p_{g}^{rel}(1 - p_{e}^{rel}) RR_{gen}^{cases} + (1 - p_{g}^{rel}) p_{e}^{rel} RR_{env}^{cases} + p_{g}^{rel} p_{e}^{rel} RR^{cases}$ (22)

Substituting for the relative risks RR^cases_gen, RR^cases_env and RR^cases using Equations (8), (9) and (10) leads (after some algebra) to:

$λ_{rel} - 1 = p_{g}^{rel} \frac{V_{g}}{r_{t}^{2}} + p_{e}^{rel} \frac{V_{e}}{r_{t}^{2}} + p_{g}^{rel} p_{e}^{rel} \frac{V_{ge}}{r_{t}^{2}}$ (23)

where

$\frac{V_{e}}{r_{t}^{2}} = \frac{(1-ε)}{ε} [PAF_{e}^{E}]^{2}$ (24)

$\frac{V_{g}}{r_{t}^{2}} = \frac{(1-γ)}{γ} [PAF_{g}^{G}]^{2}$ (25)

$\frac{V_{ge}}{r_{t}^{2}} = \frac{(1-ε)}{εγ(1-γ)} [U_{ge} PAF_{e}^{E}]^{2}$ (26)

Note that if the G-E interaction component of the variance, V_ge, is zero, the utility of targeting the environmental intervention by genotype, U_ge, is also zero (Equation (26)), because those at high genotypic risk have no more to gain from the intervention than those at low genotypic risk (R_ge - R_go = R_oe - R_oo).

Equation (23) can also be derived more formally using matrix methods (Appendix A).
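The equivalence of Equations (22) and (23) can be checked numerically for any chosen set of category risks. The following Python sketch does this for one hypothetical parameter set: it computes the recurrence risk once from the case-based relative risks (Equations (8) to (10) and (22)) and once from the scaled variances (Equations (23) to (26)). All names and values are assumptions made for illustration only.

```python
def model_quantities(gamma, eps, R_ge, R_go, R_oe, R_oo):
    """Relative risks (Eqs 8-10) and scaled variances (Eqs 24-26)."""
    r_g = eps * R_ge + (1 - eps) * R_go
    r_og = eps * R_oe + (1 - eps) * R_oo
    r_e = gamma * R_ge + (1 - gamma) * R_oe
    r_oe = gamma * R_go + (1 - gamma) * R_oo
    r_t = gamma * r_g + (1 - gamma) * r_og
    rr = (gamma * eps * R_ge**2 + gamma * (1 - eps) * R_go**2
          + eps * (1 - gamma) * R_oe**2
          + (1 - eps) * (1 - gamma) * R_oo**2) / r_t**2
    rr_gen = (gamma * r_g**2 + (1 - gamma) * r_og**2) / r_t**2
    rr_env = (eps * r_e**2 + (1 - eps) * r_oe**2) / r_t**2
    paf_e = eps * (r_e - r_oe) / r_t
    paf_g = gamma * (r_g - r_og) / r_t
    paf_ge = eps * gamma * (R_ge - R_go) / r_t
    u_ge = paf_ge / paf_e - gamma
    v_e = (1 - eps) / eps * paf_e**2                       # V_e / r_t^2
    v_g = (1 - gamma) / gamma * paf_g**2                   # V_g / r_t^2
    v_ge = ((1 - eps) / (eps * gamma * (1 - gamma))
            * (u_ge * paf_e)**2)                           # V_ge / r_t^2
    return rr, rr_gen, rr_env, v_g, v_e, v_ge


def lambda_eq22(p_g, p_e, rr, rr_gen, rr_env):
    """Recurrence risk from Equation (22)."""
    return ((1 - p_g) * (1 - p_e) + p_g * (1 - p_e) * rr_gen
            + (1 - p_g) * p_e * rr_env + p_g * p_e * rr)


def lambda_eq23(p_g, p_e, v_g, v_e, v_ge):
    """Recurrence risk from Equation (23)."""
    return 1 + p_g * v_g + p_e * v_e + p_g * p_e * v_ge


# Hypothetical inputs (illustrative only).
rr, rr_gen, rr_env, v_g, v_e, v_ge = model_quantities(
    gamma=0.2, eps=0.3, R_ge=0.4, R_go=0.1, R_oe=0.1, R_oo=0.05)
lam_a = lambda_eq22(0.5, 0.6, rr, rr_gen, rr_env)
lam_b = lambda_eq23(0.5, 0.6, v_g, v_e, v_ge)
print(lam_a, lam_b)
```

The two values agree to rounding error, and the same check also confirms that V_g/r_t^2 = RR^cases_gen - 1 and V_e/r_t^2 = RR^cases_env - 1 for this parameter set.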

The gene-environment interaction factor and remaining inequalities

Without loss of generality, define the gene-environment interaction factor f_ge such that:

$\frac{V_{ge}}{r_{t}^{2}} = f_{ge}^{2} \frac{V_{g}}{r_{t}^{2}} \cdot \frac{V_{e}}{r_{t}^{2}}$ (27)

and choose its sign so that (combining Equations (24), (25) and (26)):

$U_{ge} = f_{ge} \sqrt{γ(1-γ) \frac{V_{g}}{r_{t}^{2}}}$ (28)

U_ge is zero if f_ge = 0 (i.e. for an additive G-E model, with no G-E interaction), but for a given γ and V_g, U_ge increases with increasing gene-environment interaction factor, f_ge. For a fixed f_ge and genetic variance component V_g, U_ge is maximum when γ = 1/2, i.e. when half the population is in the high genotypic risk group, provided solutions with γ = 1/2 exist (see also below: cases where γ_maxge < 1/2).
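The dependence of U_ge on γ in Equation (28) can be illustrated with a short scan. This is a minimal sketch: the function name, the grid of γ values and the inputs f_ge = 1 and V_g/r_t^2 = 0.3 are hypothetical, and the scan ignores the additional limits on γ discussed elsewhere in the text.

```python
import math


def utility_from_factor(f_ge, gamma, v_g):
    """U_ge from Equation (28); v_g is the scaled variance V_g / r_t^2."""
    return f_ge * math.sqrt(gamma * (1 - gamma) * v_g)


# Scan gamma on a grid for hypothetical f_ge and V_g / r_t^2.
gammas = [i / 100 for i in range(1, 100)]
values = [utility_from_factor(1.0, g, 0.3) for g in gammas]
best_gamma = gammas[values.index(max(values))]
print(best_gamma, max(values))
```

Because γ(1-γ) peaks at γ = 1/2, the scan returns that value regardless of the (positive) f_ge and V_g chosen.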

Using the definitions of V_e, V_g and V_ge (Equations (24), (25) and (26)) and the remaining inequalities, R_ge ≤ 1 and R_oo ≥ 0, two limits can be derived on the proportion of the population in the 'high genotypic risk' group, γ (see Table 2).

Scoping studies

The general system of equations represented by Equation (23) may be simplified where data exist from monozygotic twins, dizygotic twins and other siblings, such that λ_DZ > λ_sib. This implies that environmental risks are more strongly correlated in dizygotic twins than in other siblings, p^DZ_e > p^sib_e. Remembering that p^MZ_g = 1 and p^sib_g = p^DZ_g, three independent equations for the relative risk in monozygotic twins, dizygotic twins and siblings may then be written:

$λ_{MZ} - 1 = \frac{V_{g}}{r_{t}^{2}} + p_{e}^{MZ} \frac{V_{e}}{r_{t}^{2}} + p_{e}^{MZ} \frac{V_{ge}}{r_{t}^{2}}$ (29)

$λ_{DZ} - 1 = p_{g}^{DZ} \frac{V_{g}}{r_{t}^{2}} + p_{e}^{DZ} \frac{V_{e}}{r_{t}^{2}} + p_{g}^{DZ} p_{e}^{DZ} \frac{V_{ge}}{r_{t}^{2}}$ (30)

$λ_{sib} - 1 = p_{g}^{DZ} \frac{V_{g}}{r_{t}^{2}} + p_{e}^{sib} \frac{V_{e}}{r_{t}^{2}} + p_{g}^{DZ} p_{e}^{sib} \frac{V_{ge}}{r_{t}^{2}}$ (31)

To solve, assume the recurrence risks λ are known (see Appendix B and [25]) and define:

$R_{MD} = \frac{λ_{MZ} - 1}{λ_{DZ} - 1}$ (32)

$R_{SD} = \frac{λ_{sib} - 1}{λ_{DZ} - 1}$ (33)

with

R_MD ≥ 1 (34)

and

0 ≤ R_SD ≤ 1. (35)

Note that if R_SD = 1, Equations (30) and (31) are identical, p^DZ_e = p^sib_e, and more relatives are needed to obtain solutions, except in the special case where there is no environmental variance (see below: no environmental variance).

In addition, define the variable parameters (assumed unknown):

$c_{MD} = \frac{p_{e}^{MZ}}{p_{e}^{DZ}}$ (36)

$c_{SD} = \frac{p_{e}^{sib}}{p_{e}^{DZ}}$ (37)

with

c_MD ≥ 1 (38)

and

0 ≤ c_SD ≤ 1. (39)

For λ_DZ > 1 and R_SD < 1 the simultaneous Equations (29), (30) and (31) can then be solved to give:

$\frac{V_{g}}{r_{t}^{2}} = \frac{(λ_{DZ} - 1)}{p_{g}^{DZ}} \cdot \frac{(R_{SD} - c_{SD})}{(1 - c_{SD})}$ (40)

$\frac{V_{e}}{r_{t}^{2}} = \frac{(λ_{DZ} - 1)}{p_{e}^{DZ} c_{MD} (1 - p_{g}^{DZ})} \left[\frac{(c_{MD} - 1)(1 - R_{SD})}{(1 - c_{SD})} + (1 - p_{g}^{DZ} R_{MD})\right]$ (41)

$\frac{V_{ge}}{r_{t}^{2}} = \frac{(λ_{DZ} - 1)}{p_{e}^{DZ} p_{g}^{DZ} c_{MD} (1 - p_{g}^{DZ})} \left[\frac{(1 - c_{MD} p_{g}^{DZ})(1 - R_{SD})}{(1 - c_{SD})} - (1 - p_{g}^{DZ} R_{MD})\right]$ (42)

provided $p_{g}^{DZ}$ ≠ 0, $p_{e}^{DZ}$ ≠ 0 and c_SD ≠ 1 (see also below).
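Equations (29) to (31) are linear in the three scaled variances once p^DZ_g, p^DZ_e, c_MD and c_SD are fixed, so they can be solved by direct elimination. The sketch below is a hedged illustration rather than the paper's spreadsheet implementation: it generates recurrence risks from hypothetical variances, recovers the variances by elimination, and compares the genetic term with the closed form of Equation (40). All function names and input values are assumptions.

```python
def recurrence_risks(v_g, v_e, v_ge, p_g_dz, p_e_dz, c_md, c_sd):
    """Forward model: lambda_MZ, lambda_DZ, lambda_sib from Eqs (29)-(31)."""
    p_e_mz = c_md * p_e_dz     # Equation (36)
    p_e_sib = c_sd * p_e_dz    # Equation (37)
    lam_mz = 1 + v_g + p_e_mz * v_e + p_e_mz * v_ge
    lam_dz = 1 + p_g_dz * v_g + p_e_dz * v_e + p_g_dz * p_e_dz * v_ge
    lam_sib = 1 + p_g_dz * v_g + p_e_sib * v_e + p_g_dz * p_e_sib * v_ge
    return lam_mz, lam_dz, lam_sib


def solve_variances(lam_mz, lam_dz, lam_sib, p_g_dz, p_e_dz, c_md, c_sd):
    """Invert Eqs (29)-(31) by elimination.

    Requires p_g_dz not in {0, 1}, p_e_dz != 0 and c_sd != 1.
    Returns the scaled variances (V_g, V_e, V_ge) / r_t^2.
    """
    # (30) minus (31) isolates s = V_e + p_g_dz * V_ge.
    s = (lam_dz - lam_sib) / (p_e_dz * (1 - c_sd))
    # Back-substitute into (30) for the genetic term.
    v_g = (lam_dz - 1 - p_e_dz * s) / p_g_dz
    # (29) gives t = V_e + V_ge.
    t = (lam_mz - 1 - v_g) / (c_md * p_e_dz)
    v_ge = (t - s) / (1 - p_g_dz)
    v_e = t - v_ge
    return v_g, v_e, v_ge


# Hypothetical "true" values used to generate the recurrence risks.
truth = dict(v_g=0.5, v_e=0.4, v_ge=0.2)
shape = dict(p_g_dz=0.4, p_e_dz=0.5, c_md=1.2, c_sd=0.8)
lam_mz, lam_dz, lam_sib = recurrence_risks(**truth, **shape)
v_g, v_e, v_ge = solve_variances(lam_mz, lam_dz, lam_sib, **shape)

# Closed form for the genetic term, Equation (40).
r_sd = (lam_sib - 1) / (lam_dz - 1)
v_g_eq40 = ((lam_dz - 1) / shape["p_g_dz"]
            * (r_sd - shape["c_sd"]) / (1 - shape["c_sd"]))
print(v_g, v_e, v_ge, v_g_eq40)
```

In practice p^DZ_e, c_MD and c_SD are not observed, which is why the text goes on to treat them as free parameters and to map a solution space rather than a single solution.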

For situations in which a targeted intervention is under consideration, the population attributable fraction PAF^E_e and exposure ε are likely to be known, allowing V_e to be treated as an input variable. However, p^DZ_e is usually unknown, since environmental correlations are often difficult to measure. Therefore, it is useful to eliminate p^DZ_e from Equations (41) and (42), leading to:

$\frac{V_{ge}}{V_{e}} = \frac{\left\{\frac{p_{g}^{DZ}}{p_{g\min}^{DZ}} - 1\right\} \frac{(R_{SD} - c_{SD})}{(1 - c_{SD})}}{p_{g}^{DZ} R_{MD} (p_{gtop}^{DZ} - p_{g}^{DZ})}$ (43)

where

$p_{gtop}^{DZ} = \frac{1}{R_{MD}} \left\{1 + \frac{(c_{MD} - 1)(1 - R_{SD})}{(1 - c_{SD})}\right\}$ (44)

and

$p_{g\min}^{DZ} = \frac{(R_{SD} - c_{SD})}{\{R_{MD}(1 - c_{SD}) - c_{MD}(1 - R_{SD})\}}$ (45)

Equations (27), (40) and (43) allow the gene-environment interaction factor f_ge to be written as:

$f_{ge}^{2} = \frac{\left\{\frac{p_{g}^{DZ}}{p_{g\min}^{DZ}} - 1\right\}}{(λ_{DZ} - 1) R_{MD} (p_{gtop}^{DZ} - p_{g}^{DZ})}$ (46)

The parameter p^DZ_g, which defines the form of the genetic model, is then given by:

$\frac{p_{g}^{DZ}}{p_{g\min}^{DZ}} = \frac{1 + f_{ge}^{2} (λ_{DZ} - 1) R_{MD} p_{gtop}^{DZ}}{1 + f_{ge}^{2} (λ_{DZ} - 1) R_{MD} p_{g\min}^{DZ}}$ (47)

For known R_MD, R_SD and λ_DZ a solution space can now be mapped, which includes all possible variances consistent with the data and with the inequalities derived above.
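Equations (44) to (47) translate directly into code, which makes it straightforward to move between the genetic model parameter p^DZ_g and the interaction factor f_ge. The following is a minimal sketch under stated assumptions: the function names are illustrative, and the recurrence risks (λ_MZ = 1.86, λ_DZ = 1.44, λ_sib = 1.392) and the values c_MD = 1.2, c_SD = 0.8 are hypothetical rather than taken from any data set.

```python
def p_g_top(r_md, r_sd, c_md, c_sd):
    """Upper limit on p^DZ_g (where V_e = 0), Equation (44)."""
    return (1 + (c_md - 1) * (1 - r_sd) / (1 - c_sd)) / r_md


def p_g_min(r_md, r_sd, c_md, c_sd):
    """Lower limit on p^DZ_g (where V_ge = 0), Equation (45)."""
    return (r_sd - c_sd) / (r_md * (1 - c_sd) - c_md * (1 - r_sd))


def f_ge_squared(p_g, lam_dz, r_md, r_sd, c_md, c_sd):
    """Squared interaction factor for a given p^DZ_g, Equation (46)."""
    top = p_g_top(r_md, r_sd, c_md, c_sd)
    low = p_g_min(r_md, r_sd, c_md, c_sd)
    return (p_g / low - 1) / ((lam_dz - 1) * r_md * (top - p_g))


def p_g_from_f(f_sq, lam_dz, r_md, r_sd, c_md, c_sd):
    """p^DZ_g for a given squared interaction factor, Equation (47)."""
    top = p_g_top(r_md, r_sd, c_md, c_sd)
    low = p_g_min(r_md, r_sd, c_md, c_sd)
    k = f_sq * (lam_dz - 1) * r_md
    return low * (1 + k * top) / (1 + k * low)


# Hypothetical recurrence risks and environmental correlation ratios.
lam_mz, lam_dz, lam_sib = 1.86, 1.44, 1.392
c_md, c_sd = 1.2, 0.8
r_md = (lam_mz - 1) / (lam_dz - 1)    # Equation (32)
r_sd = (lam_sib - 1) / (lam_dz - 1)   # Equation (33)

low = p_g_min(r_md, r_sd, c_md, c_sd)
top = p_g_top(r_md, r_sd, c_md, c_sd)
f_sq = f_ge_squared(0.4, lam_dz, r_md, r_sd, c_md, c_sd)
p_back = p_g_from_f(f_sq, lam_dz, r_md, r_sd, c_md, c_sd)
print(low, top, f_sq, p_back)
```

For these inputs p^DZ_g = 0.4 lies between p^DZ_gmin and p^DZ_gtop, and applying Equation (47) to the resulting f_ge^2 returns the starting value of p^DZ_g, as expected for a pair of mutually inverse formulas.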

Requiring the variances to be positive leads to the additional conditions on p^DZ_g and c_SD shown in Table 3.

Table 3.

Further constraints on model parameters

Condition	Limits on p^DZ_g	Limits on c_SD
V_e ≥ 0	$p_{g}^{DZ} \leq p_{gtop}^{DZ}$	
V_ge ≥ 0	$p_{g}^{DZ} \geq p_{g\min}^{DZ}$	
V_g ≥ 0		c_SD ≤ R_SD
γ_max ≥ γ_min		If λ_MD > y_e + 1, require c_SD ≥ c_SDm, where $c_{SDm} = 1 - \frac{(λ_{DZ} - 1)(1 - R_{SD})[c_{MD} + f_{ge}^{2}(λ_{DZ} - 1) R_{MD} + y_{e} f_{ge}^{2}(c_{MD} - 1)]}{[1 + f_{ge}^{2}(λ_{DZ} - 1)][(λ_{DZ} - 1) R_{MD} - y_{e}]}$

The limits on U_ge shown in Table 2 set limits on the range of gene-environment interaction models such that:

$-\frac{ε}{(1-ε) PAF_{e}^{E}} \leq f_{ge} \leq \frac{1}{PAF_{e}^{E}}$ (48)

Noting that f_ge = 0 corresponds to p^DZ_g = p^DZ_gmin (Equation (46)), this implies that, for U_ge ≥ 0, the solution space may be defined by:

$p_{g\min}^{DZ} \leq p_{g}^{DZ} \leq p_{g\max}^{DZ}$ (49)

where p^DZ_gmax is given by Equation (47) with f_ge = 1/PAF^E_e.

For U_ge ≤ 0, the solution space may be defined by:

$p_{g\min}^{DZ} \leq p_{g}^{DZ} \leq p_{gneg}^{DZ}$ (50)

where p^DZ_gneg is given by Equation (47) with f_ge = -ε/[(1-ε)PAF^E_e].

The remaining limits on U_ge lead to the additional conditions on the range of γ values (the proportion of the population in the high risk group) shown in Table 2. These conditions on γ may be written:

γ_min ≤ γ ≤ γ_max (51)

where (noting that γ_maxge = γ_o when f_ge = 1):

$γ_{\max} = \begin{cases} γ_{\max ge} & \text{for } f_{ge} \geq 1 \\ γ_{o} & \text{for } f_{ge} \leq 1 \end{cases}$ (52)

and (noting that γ_minge = γ_neg when f_ge = -r_t/(1-r_t)):

$γ_{\min} = \begin{cases} γ_{\min ge} & \text{for } f_{ge} \geq -r_{t}/(1 - r_{t}) \\ γ_{neg} & \text{for } f_{ge} \leq -r_{t}/(1 - r_{t}) \end{cases}$ (53)

Two transition lines can therefore be defined such that p^DZ_g = p^DZ_gt when f_ge = 1 and p^DZ_g = p^DZ_gnegt when f_ge = -r_t/(1-r_t). The values of p^DZ_gt and p^DZ_gnegt may be calculated using Equation (47).

The full range of gene-environment interaction models specified by f_ge (within the limits given by Equation (48)) and the corresponding range of γ values are summarized in Table 4. Note that the risk distribution associated with f_ge = 1 corresponds to a multiplicative model of gene-environment interaction. If f_ge ≥ 1, solutions with population impact PI = 1 may exist (i.e. with PAF^E_ge = PAF^E_e), provided the proportion of the population in the high risk genotypic group takes the maximum value consistent with the data (γ = γ_maxge). For lower values of f_ge, solutions with PI = 1 cannot exist.

Table 4.

Limits on the gene-environment interaction factor (f_ge) and the proportion of the population in the high-genotypic risk group (γ).

Gene-environment interaction model	Interaction factor f_ge	Risk distribution: R_go	Risk distribution: R_ge	Risk distribution: R_oo	Risk distribution: R_oe	Utility U_ge	Maximum γ_max	Minimum γ_min
Genetic effect in high-exposure group only	1/PAF^E_e	R_oo	R_ge	R_oo	R_oe	Positive	γ_maxge (where PAF^E_ge = PAF^E_e; PI = 1; and U_ge = 1-γ)	γ_minge (where R_ge = 1)
Multiplicative	1	R_go	R_go R_oe/R_oo	R_oo	R_oe		γ_maxge = γ_o (where PAF^E_ge = PAF^E_e; R_oo = 0; and PAF^G_g = 1)	
Additive	0	R_go	R_go + R_oe - R_oo	R_oo	R_oe	Zero	γ_o (where R_oo = 0)	
Reverse multiplicative	-r_t/(1-r_t)	R_go	(1-R_go)(1-R_oe)/(1-R_oo)	R_oo	R_oe	Negative		γ_neg = γ_minge (where PAF^E_ge = 0 and R_ge = 1)
Genetic effect in low-exposure group only	-ε/[(1-ε)PAF^E_e]	R_go	R_oe	R_oo	R_oe			γ_neg (where PAF^E_ge = 0 and PI = 0)

One additional condition is necessary for solutions to exist, namely:

γ_max ≥ γ_min (54)

This condition is always met if

λ_MD ≤ y_e + 1 (55)

where

$y_{e} = \begin{cases} F_{1}/f_{ge} & \text{for } f_{ge} \geq 1 \\ F_{1}/F_{2} & \text{for } 1 \geq f_{ge} \geq -r_{t}/(1 - r_{t}) \\ -F_{2}/f_{ge} & \text{for } f_{ge} \leq -r_{t}/(1 - r_{t}) \end{cases}$ (56)

and F_1 and F_2 are given by:

$F_{1} = \frac{\left[\left(\frac{1 - r_{t}}{r_{t}}\right) - \left(\frac{1-ε}{ε}\right) PAF_{e}^{E}\right]}{\left[1 + f_{ge} \left(\frac{1-ε}{ε}\right) PAF_{e}^{E}\right]} = \frac{(1 - r_{e})}{[r_{t} + f_{ge}(r_{e} - r_{t})]}$ (57)

$F_{2} = \frac{(1 - PAF_{e}^{E})}{(1 - f_{ge} PAF_{e}^{E})}$ (58)

However, if λ_MD is greater than this, the requirement γ_max ≥ γ_min further restricts the values of c_SD that lie within the solution space (Table 3).

If V_e and ε are known, a solution space can now be mapped for p^DZ_g and f_ge with known input data from twin and sibling studies (λ_MZ, λ_DZ and λ_sib), for a given c_MD and all values of c_SD within the assumed range. The boundaries of the solution space are determined by the limits on f_ge given by Equation (48), the condition γ_max ≥ γ_min (Equation (54)), and the requirement that p^DZ_g is less than or equal to 1/2 (Equation (20)); no other condition on the genetic model is specified a priori. For each genetic risk model and gene-environment interaction model in the solution space, defined by p^DZ_g and f_ge respectively, the variances V_g and V_ge can then be calculated, as can γ_max and γ_min. For a chosen γ value in the allowed range, U_ge can then be calculated from Equation (28).

The model code is available as [Additional file 1: heritability12.xls].

Note that the condition p^DZ_g ≤ 1/2 may also be rewritten using Equation (47), so that:

$p_{g}^{DZ} \leq 1/2 \Rightarrow \frac{(p_{g\min}^{DZ} - \frac{1}{2})}{p_{g\min}^{DZ}} \leq R_{MD} f_{ge}^{2} (λ_{DZ} - 1)(1/2 - p_{gtop}^{DZ})$ (59)

which is always met if

$p_{gtop}^{DZ} \leq 1/2$ (60)

Before mapping the solution space, first consider some special cases and a comparison of the model with the classical twin studies approach.

Special cases

1. No genetic variance

If V_g = 0, Equation (27) implies that V_ge = 0 also. Equations (29), (30) and (31) then give:

R_SD = c_SD (61)

and

R_MD = c_MD (62)

Under the usual assumption that c_MD = 1 (the 'equal environments' assumption), this is the well-known result that genetic variance can be zero only when the concordance in monozygotic and dizygotic twins is the same (leading to R_MD = 1). However, if the equal environments assumption is not met (c_MD > 1), values of R_MD greater than 1 do not necessarily imply that a genetic component to the variance exists (see, for example, [18]).

2. No environmental variance

If V_e = 0, Equation (27) implies that V_ge = 0 also. Equations (29), (30) and (31) then give:

R_SD = 1 (63)

and

$R_{MD} = 1/p_{g}^{DZ}$ (64)

For a purely genetic model with no environmental variance, Equation (64) implies that if R_MD > 2, p^DZ_g < 1/2. This is consistent with Risch's finding [16] that neither an additive genetic model nor a single dominant gene model (both with p^DZ_g = 1/2) can fit the data for conditions such as schizophrenia (which has an R_MD value significantly greater than 2).

3. Classical twin study assumptions

Assuming no gene-environment interaction (V_ge = 0); an additive genetic risk model (p^DZ_g = 1/2); and the 'equal environments' assumption (c_MD = 1) in Equations (29), (30) and (31) gives:

$\frac{V_{g}}{r_{t}^{2}} = 2(λ_{MZ} - λ_{DZ})$ (65)

This is the classical twin study result, assuming the dominance term of the genetic variance is negligible. Note that, if R_MD = 2, the classical solution implies that the environmental variance terms in Equations (29) to (31) are zero and shared sibling risk is due entirely to shared genes.

4. No correlation in genotypic risk in siblings (p^DZ_g = 0)

Equation (20) allows p^DZ_g to tend to zero. Substituting p^DZ_g = 0 in Equations (29), (30) and (31) and using the definition of the gene-environment interaction factor (Equation (28)) gives:

R_SD = c_SD (66)

and

$\frac{V_{g}}{r_{t}^{2}} = \frac{(\lambda_{DZ} - 1) (R_{MD} - c_{MD})}{[1 + f_{ge}^{2} c_{MD} (\lambda_{DZ} - 1)]} \qquad (67)$

Note that, from Equations (30) and (31), p^DZ_g = 0 corresponds to a purely environmental explanation for shared sibling risks (although there may remain a genetic component to shared risks in monozygotic twins, from Equation (29)). The solution p^DZ_g = 0 may not exist in reality; however, the solution at this limit is of interest because low values of p^DZ_g are plausible.

Also, note that if f_ge = 0 (no gene-environment interaction) and c_MD = 1 (the 'equal environments' assumption), the genetic variance V_g given by Equation (67) is half the classical twin study result (Equation (65)).
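The factor of one half can be checked numerically. The sketch below evaluates Equations (65) and (67) side by side; it assumes R_MD = (λ_MZ − 1)/(λ_DZ − 1), which matches the example values used later in the Results but is not restated here, and the function names are hypothetical. The breast cancer recurrence risks are taken from Table 7.

```python
def vg_classical(lambda_mz: float, lambda_dz: float) -> float:
    """Equation (65): V_g / r_t^2 under the classical twin study assumptions."""
    return 2.0 * (lambda_mz - lambda_dz)


def vg_no_sibling_correlation(lambda_mz: float, lambda_dz: float,
                              c_md: float = 1.0, f_ge: float = 0.0) -> float:
    """Equation (67): V_g / r_t^2 in the limit p^DZ_g = 0."""
    r_md = (lambda_mz - 1.0) / (lambda_dz - 1.0)  # assumed definition of R_MD
    numerator = (lambda_dz - 1.0) * (r_md - c_md)
    denominator = 1.0 + f_ge ** 2 * c_md * (lambda_dz - 1.0)
    return numerator / denominator


# Breast cancer recurrence risks from Table 7.
classical = vg_classical(4.09, 2.51)
limit = vg_no_sibling_correlation(4.09, 2.51)
print(f"classical: {classical:.2f}, p^DZ_g = 0 limit: {limit:.2f}")

# A non-zero interaction factor lowers the estimate further.
print(f"with f_ge = 1: {vg_no_sibling_correlation(4.09, 2.51, f_ge=1.0):.2f}")
```

Setting c_md above 1 or f_ge away from zero in the second function shows how relaxing either classical assumption reduces the genetic variance below this half-classical value.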

5. Cases where γ_max = γ_min

If the line γ_max = γ_min exists within the solution space, some special cases may arise with risk distributions of particular interest (including, for example, a solution with R_ge = 1 and all other risks zero). These special cases and the conditions that they meet are shown in Table 5.

Table 5.

Special cases with γ_max = γ_min for U_ge ≥ 0

Group	R_g0	R_ge	R₀₀	R_0e	Conditions	Population impact and Utility
Special cases with γ_max = γ_min	R_g0	1	R₀₀	R₀₀	γ_minge = γ_maxge (R_ge = 1 and PAF_ge = PAF_e); f_ge ≥ 1	PI = 1; U_ge = 1-γ
Special cases with γ_max = γ_min	R_g0	1	0	R_0e	γ_minge = γ₀ (R_ge = 1; R₀₀ = 0); 0 ≤ f_ge ≤ 1	0 ≤ PI ≤ 1; U_ge = PI-γ
Special cases with γ_max = γ_min and specific G-E interaction models	R₀₀	1	R₀₀	R₀₀	γ_minge = γ_maxge (R_ge = 1 and PAF_ge = PAF_e); f_ge = 1/PAF_e	PI = 1; U_ge = 1-γ
Special cases with γ_max = γ_min and specific G-E interaction models	R_g0	1	0	0	γ_minge = γ₀ = γ_maxge (R_ge = 1; R₀₀ = 0; PAF_ge = PAF_e); f_ge = 1	PI = 1; U_ge = 1-γ
Special cases with γ_max = γ_min and specific G-E interaction models	1-R_0e	1	0	R_0e	γ_minge = γ₀ (R_ge = 1; R₀₀ = 0); f_ge = 0	PI = γ; U_ge = 0
Special cases with γ_max = γ_min and all risks 0 or 1	1	1	1	1	r_t = 1; PAF_e = 0	Undefined (PAF_ge = 0)
Special cases with γ_max = γ_min and all risks 0 or 1	0	1	0	0	r_t = γε; PAF_e = 1	PI = 1; U_ge = 1-γ
Special cases with γ_max = γ_min and all risks 0 or 1	1	1	0	0	r_t = γ; PAF_e = 0	Undefined (PAF_ge = 0)
Special cases with γ_max = γ_min and all risks 0 or 1	0	1	0	1	r_t = ε; PAF_e = 1	PI = γ; U_ge = 0


6. Cases where γ_maxge < 1/2

Equation (27) shows that for a fixed gene-environment interaction factor f_ge and genetic variance component V_g, the utility U_ge is maximum when γ = 1/2, i.e. when half the population is in the high genotypic risk group, provided this solution exists. However, if γ_max < 1/2, utility is maximum when γ = γ_max. As a smaller proportion of the population is then targeted, these solutions are of particular interest. Because solutions with population impact PI = 1 may exist when 1 ≤ f_ge ≤ 1/PAF^E_e if γ = γ_maxge (Table 4), it is of interest to identify the area of the solution space with γ_maxge < 1/2. Maximum utility is then obtained when γ = γ_maxge (where PI = 1 and U_ge = 1-γ_maxge). The condition is:

$\gamma_{\max ge} < 1/2 \Rightarrow p_{g}^{DZ} > p_{gx}^{DZ} \qquad (68)$

where p^DZ_gx is given by:

$R_{MD} (1 - c_{SD}) (p_{gx}^{DZ})^{2} + [(1 - c_{SD}) (R_{MD} - 1) - (2 c_{MD} - 1) (1 - R_{SD})] p_{gx}^{DZ} - (R_{SD} - c_{SD}) = 0 \qquad (69)$

Solving Equation (69) for p^DZ_gx allows the region of the solution space where γ_maxge < 1/2 to be defined.
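Equation (69) is a quadratic in p^DZ_gx, so it can be solved directly. The sketch below is one possible implementation, not the spreadsheet code supplied with the paper: the function name is hypothetical, and because this section does not state which root is the relevant one, all real roots are returned and the caller is left to keep those in the admissible range 0 ≤ p^DZ_gx ≤ 1/2. The inputs in the example are the illustrative values R_MD = 1.2, R_SD = 0.5 and c_MD = 1 used later in the Results, with an arbitrary c_SD = 0.25.

```python
import math


def p_gx_roots(r_md: float, r_sd: float, c_md: float, c_sd: float) -> list:
    """Real roots of Equation (69), a*p^2 + b*p + c = 0, in ascending order."""
    a = r_md * (1.0 - c_sd)
    b = (1.0 - c_sd) * (r_md - 1.0) - (2.0 * c_md - 1.0) * (1.0 - r_sd)
    c = -(r_sd - c_sd)
    if a == 0.0:
        # The quadratic term vanishes when c_SD = 1; the equation is then linear.
        return [] if b == 0.0 else [-c / b]
    discriminant = b * b - 4.0 * a * c
    if discriminant < 0.0:
        return []
    root = math.sqrt(discriminant)
    return sorted({(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)})


roots = p_gx_roots(r_md=1.2, r_sd=0.5, c_md=1.0, c_sd=0.25)
admissible = [p for p in roots if 0.0 <= p <= 0.5]
print("roots:", roots)
print("roots within 0 <= p <= 1/2:", admissible)
```

Repeating the call over a grid of c_SD values between 0 and R_SD would trace the boundary p^DZ_gx across the solution space.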

7. Cases where the 'equal environments' assumption holds (c_MD = 1)

In the special case where the 'equal environments' assumption holds (c_MD = 1, and hence p^DZ_gtop = 1/R_MD), Equation (60) simplifies to give R_MD ≥ 2. Equation (59) also simplifies to give:

$p_{g}^{DZ} \leq 1/2 \Rightarrow c_{SD} \geq c_{1} \qquad (70)$

where

$c_{1} = 1 - \frac{(1 - R_{SD}) [1 + f_{ge}^{2} (\lambda_{DZ} - 1) (2 - R_{MD})]}{(2 - R_{MD}) [1 + f_{ge}^{2} (\lambda_{DZ} - 1)]} \qquad (71)$

Meeting the condition p^DZ_g ≤ 1/2 at c_SD = 0 then requires:

$R_{MD} \geq 2 - \frac{(1 - R_{SD})}{[1 + f_{ge}^{2} (\lambda_{DZ} - 1) R_{SD}]} \qquad (72).$

It follows that if c_MD = 1, solutions with p^DZ_g = 1/2 (an additive genetic model) and positive utility exist only when the following condition holds for R_MD:

$R_{MD} \leq 2 - \frac{(1 - R_{SD})}{[1 + (\lambda_{DZ} - 1) R_{SD} / (PAF_{e}^{E})^{2}]} \qquad (73).$

Further, all three classical twin study assumptions (c_MD = 1, p^DZ_g = 1/2 and f_ge = 0) can be met only for values of R_MD that are low enough to satisfy:

1 + R_SD ≥ R_MD > 1 (74).

If R_MD lies within this range, the classical twin study gives one possible solution; however, other solutions also exist. All alternative solutions favour a less 'genetic' and more 'environmental' explanation for shared sibling risks (i.e. they have higher values of c_SD). If R_MD is greater than 1+R_SD, all three assumptions of the classical twin study cannot be met simultaneously.
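The thresholds in Equations (71) to (74) are simple enough to evaluate directly. The sketch below does so for the illustrative inputs used later for the general model solutions (λ_DZ = 3, PAF^E_e = 0.5, R_MD = 1.2, R_SD = 0.5). The function names are hypothetical, and Equation (71) is only evaluated for R_MD < 2, where its denominator is positive.

```python
def c1(r_md: float, r_sd: float, lambda_dz: float, f_ge: float) -> float:
    """Equation (71): lowest c_SD compatible with p^DZ_g <= 1/2 when c_MD = 1."""
    a = f_ge ** 2 * (lambda_dz - 1.0)
    return 1.0 - (1.0 - r_sd) * (1.0 + a * (2.0 - r_md)) / ((2.0 - r_md) * (1.0 + a))


def r_md_min_at_zero_c_sd(r_sd: float, lambda_dz: float, f_ge: float) -> float:
    """Equation (72): smallest R_MD for which p^DZ_g <= 1/2 holds at c_SD = 0."""
    return 2.0 - (1.0 - r_sd) / (1.0 + f_ge ** 2 * (lambda_dz - 1.0) * r_sd)


def r_md_max_additive(r_sd: float, lambda_dz: float, paf_e: float) -> float:
    """Equation (73): largest R_MD allowing p^DZ_g = 1/2 with positive utility."""
    return 2.0 - (1.0 - r_sd) / (1.0 + (lambda_dz - 1.0) * r_sd / paf_e ** 2)


def classical_assumptions_compatible(r_md: float, r_sd: float) -> bool:
    """Equation (74): 1 < R_MD <= 1 + R_SD."""
    return 1.0 < r_md <= 1.0 + r_sd


print("c_1 at f_ge = 0:", c1(1.2, 0.5, 3.0, f_ge=0.0))
print("Eq. (72) bound at f_ge = 0:", r_md_min_at_zero_c_sd(0.5, 3.0, f_ge=0.0))
print("Eq. (73) bound:", r_md_max_additive(0.5, 3.0, paf_e=0.5))
print("Eq. (74) satisfied:", classical_assumptions_compatible(1.2, 0.5))
```

At f_ge = 0 the bound from Equation (72) reduces to 1 + R_SD, which is the upper limit in Equation (74).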

Comparison with the classical twins approach

Table 6 summarizes the differences between the classical twin studies approach and the method adopted here.

Table 6.

Comparison with classical twin study

	Classical twin study	Twins + siblings model
Genetic model	Additive and dominance terms only: V^DZ_g = 1/2 V_A + 1/4 V_D	Variable: V^DZ_g = p^DZ_g V_g with 0 ≤ p^DZ_g ≤ 1/2

Shared twin environments	Equal environments assumption: c_MD = 1	Variable: 1 ≤ c_MD ≤ R_MD; c_MD = R_MD implies V_g = 0

Shared sibling environments	Siblings not included.	Variable: 0 ≤ c_SD ≤ R_SD. Familial aggregation may be due to genes (c_SD = 0) or environment (c_SD = R_SD).

Gene-environment interactions	None	Variable: V_ge = f²_ge · V_g · V_e / r²_t, with -ε/(1-ε)PAF_e ≤ f_ge ≤ 1/PAF_e

Gene-environment correlations	None	None

Method	Total phenotypic variance given by: V_P = V_g + V_e. V_P is input and a single solution for V_e and V_g calculated. Heritabilities are given by: H² = V_g/V_P; h² = V_A/V_P	V_e and ε are input and V_g and V_ge calculated, for a chosen c_MD and all possible values of f_ge and p^DZ_g. Method is not valid if R_SD = 1.


A central feature of the model is that it abandons Fisher's assumption [26] that genes act as risk factors for common traits in a manner necessarily dominated by an additive polygenic term. In his historic 1918 paper, Fisher synthesized Mendelian inheritance with Darwin's theory of evolution by showing that the genetic variance of a continuous trait could be decomposed into additive and non-additive components [26,27]. Following Fisher, the classical twin study analysis depends on writing the genetic component of a trait as a convergent series of terms, consisting of an additive term (the sum of contributions of individual alleles at each locus) plus a smaller dominance term (the sum of contributions from pairs of alleles at each locus) and – usually neglected – epistatic terms (involving potentially multiple interactions between alleles at multiple loci) [15]. Often the additive term is assumed to dominate the series (equivalent to assuming p^DZ_g= 1/2).

Fisher saw his polygenic model as "abandon[ing] the strictly Mendelian mode of inheritance, and treat[ing] Galton's 'particulate inheritance' in almost its full generality" [26]. However, it can be argued that Fisher's model is flawed in so far as it fails to distinguish between the function of alleles and the properties of traits [4,28]. In particular, epistasis (although referred to here as 'gene-gene interaction') is not strictly an interaction between genes, but can be shown to depend on the structure and interdependence of metabolic pathways [28].

The alternative model adopted here is based on correlations in risk categories for a trait (which may be either environmental or genetic, or both), rather than single or multiple genetic variants. Adopting Porteous' critique [28], there is no a priori biological reason why the parameter p^DZ_g (the probability that the genotypic risk category of a dizygotic twin pair is identical by descent) cannot take any value between 1/2 (its value if the additive model holds) and zero. Low p^DZ_g can then be understood to mean either a situation in which Fisher's polygenic model [26] is dominated by negative (synergistic) epistatic terms (for example, p^DZ_g = 1/2ⁿ implies that interactions between n deleterious alleles are necessary to produce a phenotypic effect), or, more meaningfully, a situation in which human phenotypes are biologically robust to individual genetic variants [29]. Thus, in the extreme case where numerous genetic variants combine to influence a trait through the interdependence of metabolic pathways, the trait may be highly correlated in monozygotic twins (who share all the genetic variants) but not correlated at all (p^DZ_g = 0) in dizygotic twins or siblings (who share only half the relevant variants by descent). Although p^DZ_g = 0 may not be realistic, low values of p^DZ_g are plausible, and may even be typical of complex diseases.

The classical twin study assumptions (see above) allow a single solution to be calculated from the under-determined system of simultaneous Equations (29), (30) and (31). However, in the absence of prior knowledge about the form of the genetic model, the presence or absence of gene-environment interactions, and the validity of the 'equal environments' assumption, the approach adopted here is more rigorous.

Results

General model solutions

First consider the behaviour of the model when the 'equal environments' assumption holds and hence c_MD= 1 (as described above).

Figures 2, 3 and 4 show the possible solution spaces for an arbitrary set of plausible input parameters satisfying the requirement R_MD < 1+R_SD necessary for the classical twin study solution to exist. In Figure 2 the gene-environment interaction factor f_ge and hence utility, U_ge, are both positive and in Figure 3 they are negative. The horizontal axis shows c_SD/R_SD, which is zero if shared sibling risk is due to shared genetic factors only and 1 if shared sibling risk is due to shared environmental factors only. The vertical axis shows p^DZ_g, which is 1/2 if the additive genetic model holds, but may reduce to zero if epistasis dominates and the phenotype is robust to genetic variation. The three curved solid lines represent three models of gene-environment (G-E) interaction: an additive G-E model (i.e. no gene-environment interaction, f_ge = 0); a multiplicative G-E model (f_ge = 1); and maximum G-E interaction (f_ge = 1/PAF^E_e). The possible solution spaces are shaded grey. Each point in each shaded solution space corresponds to a given genetic model (defined by p^DZ_g) and a given G-E interaction model (defined by f_ge). Figure 4 plots the entire solution space (including both negative and positive utility) by transforming the horizontal axis to represent the G-E interaction parameter, f_ge. Although the classical twin model can fit the data, an infinite number of other solutions corresponding to different genetic and gene-environment interaction models also exist. In this example, the line γ_max = γ_min lies outside the solution space and no solutions exist with γ_maxge < 1/2.

**Figure 2. Example model solution space with R_MD < 1+R_SD and U_ge ≥ 0**. Input parameters: λ_MZ = 3.4, λ_DZ = 3, λ_sib = 2, ε = 0.2, PAF^E_e = 0.5, c_MD = 1, r_t = 0.1. Hence R_MD = 1.2, R_SD = 0.5.

**Figure 3. Example model solution space with R_MD < 1+R_SD and U_ge ≤ 0**. Input parameters as for Figure 2.

**Figure 4. Example full model solution space with R_MD < 1+R_SD**. Input parameters as for Figure 2, with the solution space transformed so that f_ge is on the horizontal axis.

For higher values of R_MD, the curves defining the solution space are shifted downwards [see Additional files 2 to 9], so that the line f_ge = 0 (corresponding to no gene-environment interaction) lies entirely below the line p^DZ_g = 1/2 (corresponding to an additive genetic model). The classical twin study solution does not exist, but many other combinations of genetic and gene-environment interaction models may fit the data.

When c_MD > 1, lines of constant f_ge no longer decrease monotonically to zero, and are also shifted upwards, so that solutions with strong G-E interactions are no longer possible [see Additional files 10 to 12].

Example applications using twin, sibling and environmental data

Input values

Consider example applications of the model for male lung cancer, female breast cancer and schizophrenia. The model input variables used are shown in Table 7.

Table 7.

Input variables

Condition	λ_MZ	λ_DZ	λ_sib	ε	PAF^E_e	r_t
Breast cancer	4.09	2.51	2.01	0.62	0.15	0.036

Lung cancer	6.27	6.14	3.16	0.15	0.86	0.017

Schizophrenia (low V_e)	52.1	14.2	8.6	0.62	0.15	0.01

Schizophrenia (high V_e)	52.1	14.2	8.6	0.15	0.86	0.01


The recurrence risks, λ, and total risks, r_t, for breast and lung cancer are those calculated by Risch [30], based on Scandinavian twin data reported by Lichtenstein et al. [31] (involving more than 44,000 twin pairs) and Swedish familial data reported by Dong and Hemminki [32] (involving more than 2 million families). The proportion of the population exposed, ε, and population attributable fraction, PAF^E_e, for breast cancer are taken from those reported by Rockhill et al. [33] for a US population. Although strictly speaking these values may not be appropriate for a Scandinavian population, and include a component due to family history that may be (at least partly) genetic, they give a low V_e, consistent with the known environmental risk factors for breast cancer, and results are not sensitive to these input values (because V_e is so small). For lung cancer, it is assumed that 15% of the Scandinavian population smokes and that 86% of lung cancer cases could be avoided if they did not (giving a risk of lung cancer in smokers of 10%).
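The 10% figure for smokers can be reproduced from the Table 7 inputs. The sketch below assumes the usual definition of the population attributable fraction, PAF = (r_t − r_unexposed)/r_t, and that the total risk is the exposure-weighted average r_t = ε·r_exposed + (1 − ε)·r_unexposed; neither relation is restated in this section, and the function name is illustrative.

```python
def risk_in_exposed(r_t: float, eps: float, paf_e: float) -> float:
    """Risk in the exposed group implied by total risk, exposure and PAF.

    Assumes PAF = (r_t - r_unexposed) / r_t and
    r_t = eps * r_exposed + (1 - eps) * r_unexposed.
    """
    r_unexposed = r_t * (1.0 - paf_e)
    return (r_t - (1.0 - eps) * r_unexposed) / eps


# Lung cancer inputs from Table 7: r_t = 0.017, eps = 0.15, PAF^E_e = 0.86.
r_smokers = risk_in_exposed(r_t=0.017, eps=0.15, paf_e=0.86)
print(f"risk of lung cancer in smokers: {r_smokers:.1%}")
```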

The recurrence risks λ, and total risk, r_t, for schizophrenia are those used by Risch [16], based on European data summarized by McGue et al. [34]. More recent twin studies for schizophrenia have given variable results and this example should be treated as illustrative only. Further, environmental exposures and population attributable fractions are unknown for schizophrenia. Two exploratory sets of results are therefore reported, using data consistent with a low environmental variance (based on the values used for breast cancer), and high environmental variance (based on the values used for smoking and lung cancer).

Detailed results for the three diseases are shown in [Additional files 13 to 33]. The key findings are outlined below.
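Before turning to the individual diseases, it may help to see the risk ratios implied by Table 7. The sketch below assumes R_MD = (λ_MZ − 1)/(λ_DZ − 1) and R_SD = (λ_sib − 1)/(λ_DZ − 1), definitions that reproduce the values quoted for the example solution space (R_MD = 1.2, R_SD = 0.5) but are not restated in this section. It then applies the range in Equation (74) to indicate whether all three classical twin study assumptions could hold at once. Variable and function names are illustrative.

```python
# Recurrence risks from Table 7: (lambda_MZ, lambda_DZ, lambda_sib).
TABLE_7 = {
    "Breast cancer": (4.09, 2.51, 2.01),
    "Lung cancer": (6.27, 6.14, 3.16),
    "Schizophrenia": (52.1, 14.2, 8.6),
}


def risk_ratios(lambda_mz: float, lambda_dz: float, lambda_sib: float) -> tuple:
    """Return (R_MD, R_SD) under the assumed ratio definitions."""
    r_md = (lambda_mz - 1.0) / (lambda_dz - 1.0)
    r_sd = (lambda_sib - 1.0) / (lambda_dz - 1.0)
    return r_md, r_sd


for condition, lambdas in TABLE_7.items():
    r_md, r_sd = risk_ratios(*lambdas)
    compatible = 1.0 < r_md <= 1.0 + r_sd  # Equation (74)
    print(f"{condition}: R_MD = {r_md:.2f}, R_SD = {r_sd:.2f}, "
          f"classical assumptions compatible: {compatible}")
```

Under these assumed definitions only the lung cancer inputs fall inside the range of Equation (74), in line with the disease-specific results that follow.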

Breast cancer results

For breast cancer, the PAF^E_e associated with known environmental factors is low. The value of the model is therefore less in calculating the utility of targeted environmental interventions than in exploring the solution space for a complex disease with R_MD close to 2.

Although strictly speaking the classical twin study solution (with an additive genetic model, p^DZ_g = 1/2, and an additive G-E model, f_ge = 0) does not exist as a solution, it might lie within the margin of error of the data. However, an infinite number of other models could also fit the data. The classical twin model result always overestimates the genetic component of the variance, which reduces as the gene-environment interaction factor f_ge increases, and also as p^DZ_g decreases (i.e. as epistatic terms begin to dominate the genetic model). These alternative models imply that shared environmental factors may partially explain familial aggregation of breast cancer. This contrasts with the classical twin method result (see earlier), which for R_MD = 2 leads to the inevitable conclusion that shared sibling risk must be due solely to shared genes [35].

In theory, a model with p^DZ_g = 0 (where shared sibling risk is due entirely to shared environmental factors) could fit the data. However, for breast cancer the existence of known mutations that significantly increase risk (particularly mutations in the BRCA1 and BRCA2 genes, which are relatively rare) rules out this solution. Although it is not possible to subtract out the effect of these mutations from the model, it is possible to show that they could be sufficient to explain the twin data if a G-E interaction also exists. For example, one possible solution consistent with the data could involve one or more dominant genes (p^DZ_g = 1/2), a strong G-E interaction (f_ge = 1/PAF^E_e), but a largely environmental explanation for shared sibling risk (say c_SD/R_SD = 0.9). This solution implies that the genetic component of the variance is less than a fifth of the classical twin study result, which could be low enough to be explained by mutations in the BRCA1 and BRCA2 genes alone [35]. If this model were correct it would have important implications for women with such mutations, but would not contribute significantly to reducing the incidence of breast cancer in the population as a whole, because the affected proportion of the population γ would be rather small. Other solutions, involving different genetic models with lower p^DZ_g, and/or less gene-environment interaction, are also possible.

The line γ_max = γ_min does not occur within the solution space for breast cancer; however, in some circumstances the lines γ_max and γ_min may be rather close together. This suggests that, although as expected there is always a trade-off between selecting a small proportion (γ_min) of the population with a high Positive Predictive Value (PPV), or a larger proportion of the population (γ_max) with a higher Population Impact (PI) [19], some possible solutions could exist for breast cancer where the PPV and PI are both relatively high. Further, γ_max is often less than 1/2, so that, in these regions of the possible solution space, maximum utility might be obtained by targeting less than 50% of the population. However, known environmental factors for breast cancer are often not amenable to intervention and other possible solutions, with low, zero or negative utility, also exist.

Lung cancer results

For lung cancer, all the possible solutions imply that shared sibling risk is largely due to shared environmental factors (smoking) because solutions occur only when c_SD/R_SD is close to 1. Unlike for breast cancer, the line γ_max = γ_min lies outside the solution space, even for negative f_ge, as does the area of solutions with γ_maxge < 1/2. However, the classical twin study solution, with f_ge = 0 and p^DZ_g = 1/2, clearly lies within the solution space.

Although the classical twin model again provides an upper limit to the genetic component of the variance, even the classical result indicates that the risk of lung cancer is dominated by smoking in this population and the variance has at most a small genetic component.

Unlike the breast cancer example, γ_max and γ_min are always far apart, suggesting a strong trade-off between a high Positive Predictive Value (R_ge) for a genotypic test and a high Population Impact (PI) for a targeted intervention. This means that a genotypic test that predicts which smokers will get lung cancer cannot exist. To predict all cases of lung cancer in smokers (i.e. to obtain PI = 1), 95% or more of the population would have to be in the high genotypic risk group, and the predictive value of such a test would be very low.

Because the genetic component of the variance is so small, it follows that the utility of genetic 'prediction and prevention' (measured by U_ge) is also small (from Equation (28)). Utility is maximum when γ = 1/2, but even then values are low. The maximum utility of genotyping occurs when about 60% of cases could be prevented by targeting the 50% of smokers at high genotypic risk. However, other possible solutions have zero or negative utility.

Schizophrenia results

For schizophrenia, the classical twin study solution (with f_ge = 0, p^DZ_g = 1/2 and c_MD = 1) cannot fit the data. If the 'equal environments' assumption holds, neither a single dominant gene (p^DZ_g = 1/2), nor an additive polygenic model (also with p^DZ_g = 1/2), nor a single recessive gene (p^DZ_g = 1/4) can explain the twin and family data, consistent with Risch's 1990 findings [16]. This may suggest that the genetic model for schizophrenia is likely to be dominated by epistatic terms. However, if gene-environment interactions are important, it is also possible that a recessive gene, combined with at least multiplicative G-E interaction (p^DZ_g = 1/4 and f_ge = 1 or higher), could explain the data.

The possible solution spaces include purely genetic explanations for shared sibling risk (at c_SD/R_SD= 0), or purely environmental ones (at c_SD/R_SD= 1, applicable if p^DZ_g= 0).

Assuming a small environmental component to the variance, there is no region of the solution space for which γ_maxge < 1/2, suggesting that the utility of targeted environmental interventions under these assumptions is likely to be low. However, if the environmental component of the variance is assumed to be much larger, the available solution space changes dramatically, because the line γ_max = γ_min now constrains the solution space to a much smaller area, which excludes solutions with no G-E interaction (f_ge = 0). Special solutions may exist along the line γ_max = γ_min, as shown in Table 5. Because the environmental factors contributing to schizophrenia are unknown, it is impossible to draw any conclusions about the potential benefits of targeting environmental interventions at those at high genotypic risk.

Because prenatal development is thought to be important in schizophrenia, it is plausible that monozygotic twins are more likely to share environmental risk factors than dizygotic twins are. Breaking the 'equal environments' assumption changes the shape of the solution space significantly, and, assuming a small environmental component to the variance, only limited G-E interactions are now possible (the multiplicative G-E model, f_ge= 1, lies largely outside the solution space). The utility of targeting environmental interventions by genotype is then likely to be low. However, in these circumstances it is possible that an additive genetic model (p^DZ_g= 1/2) with some G-E interaction, or a recessive gene (p^DZ_g= 1/4) with no G-E interaction, could explain the data.

Discussion

If Fisher's polygenic model [26] is abandoned, along with the usual twin study assumption that there are no gene-environment interactions, the four-category model developed by Khoury and others can be combined with twin, family and environmental data to implement a 'top down' approach to assessing the utility of targeting environmental/lifestyle interventions by genotype. Scoping studies, valid when R_SD≠ 1, provide a first step to modelling the health of populations [23].

Abandoning Fisher's assumption that the polygenic model is necessarily dominated by an additive term can be justified by the growing evidence that phenotypic effects can result from the synergistic action of alleles in many genes [36]. For example, Bardet-Biedl Syndrome, historically assumed to be a recessive trait, has been shown to involve three interacting mutations at two loci in some patients (implying that p^DZ_g= 1/8) and, more recently, an additional locus has been identified that can also interact to change disease severity and symptoms [37]. Both positive and negative gene-environment interactions have also been observed in human diseases, although there are difficulties in confirming their statistical validity [38,39].

The model also allows the impact of the much criticised 'equal environments' assumption to be explored.

A number of conclusions can be drawn about the merits of the classical twin study and the utility of genetic 'prediction and prevention'.

Firstly, the model confirms that the classical twin study solution is not always valid and gives at best an upper limit to the genetic component of the variance of a trait. The importance of the 'equal environments' assumption and of gene-environment interactions has previously been recognised [17,18]; however, less attention has been paid to the potential role of gene-gene interactions (epistasis). For larger values of R_MD (greater than 1+R_SD), observed for conditions such as schizophrenia, the model generalizes Risch's findings [16] to show that the three assumptions of the classical twin model cannot all be satisfied simultaneously. For intermediate R_MD values, observed for conditions such as breast cancer (for which R_MD is approximately 2), the model illustrates that the conclusion drawn from classical twin studies, that familial aggregation is due entirely to shared genetic factors, may be erroneous. This raises the possibility – previously rejected on the basis of twin study results [35] – that genetic variants are important in determining risk only for the relatively rare familial forms of cancer. If so, genetic models of familial aggregation (for example [40]) may be incorrect and the hunt for additional susceptibility genes could be largely fruitless. Existing published findings might then reflect prevailing bias, rather than true associations [14].

Secondly, the model confirms that the potential for reducing the incidence of common diseases using environmental/lifestyle interventions targeted by genotype may be limited [7] by:

(i) the low importance of genetic differences in determining the risk of some conditions (for example, lung cancer);

(ii) the complexity of gene-gene and gene-environment interactions and/or lack of knowledge of environmental factors (for example, schizophrenia).

Targeting environmental/lifestyle interventions at those at 'high genotypic risk' can be of high utility only in specific circumstances. The utility of targeting environmental interventions by genotype (compared to randomly selecting the same number of people from the population) is zero if there is no gene-environment interaction. Utility can also be negative in the presence of a negative interaction (i.e. if the people at high genotypic risk have less to gain by the intervention than people at low genotypic risk). The finding that utility increases with gene-environment interaction is consistent with Khoury and Wagener [19] but the relationship is considerably clarified by the adoption here of different measures of the population attributable fraction associated with a targeted intervention (PAF^E_ge) and of utility (U_ge). Further, by formally introducing constraints on the model (for example, that risks are positive and do not exceed 100%), it is possible to demonstrate that both the gene-environment interaction factor and utility have maximum values, which cannot be exceeded for a given data set.

The lung cancer example is apparently trivial but also of critical importance. The R_MD value for lung cancer is close to 1, and neither the Scandinavian data used here [31], nor earlier US studies [41], have identified a significant heritable component. It follows from Equation (27) that if the genetic component of the variance, V_g, is zero, V_ge (the G-E component of the variance) is also zero and using genotyping to target an intervention such as smoking cessation is therefore of zero utility (no better than randomly selecting the same number of individuals). This approximate conclusion is confirmed by the results presented for lung cancer, which show extremely low utility. The detailed calculations may at first sight seem unnecessary, particularly because smoking causes multiple diseases and targeting smoking cessation on the basis of lung cancer risk alone is therefore ill-advised. However, the idea that a genetic test will one day predict which smokers get lung cancer has been widely promoted in the literature and has driven much research aimed at identifying the supposed 'genes for lung cancer' [42]. The results presented here strongly suggest that there will never be a genetic or genotypic test that predicts which smokers will get lung cancer, because the genetic component of the variance is not high enough.

Finally, the model illustrates the argument of Terwilliger and Weiss [11] that the potential for population biobanks to quantify risks for complex disease is limited by a 'multiple testing' problem caused by the large number of genetic and gene-environment interaction models that could fit existing data. Each point in each solution space described above represents a different combination of a genetic risk model (defined by p^DZ_g) and a G-E interaction model (defined by f_ge). Further, any given value of p^DZ_g may be obtained by an infinite number of different combinations of different alleles acting through multiple biological pathways. Because the number of hypotheses that could be tested is essentially infinite, sample sizes necessary to quantify the risks (R_oo, R_go, R_oe and R_ge) could "plausibly be larger than the number of people that have ever lived" [11].

The model has several limitations. Measurements of shared sibling risk (λ_sib) are needed from the same population as twin data, and the scoping studies are only valid for λ_DZ> λ_sib, implying that environmental risks are more strongly correlated in dizygotic twins than other siblings. Some data exist to support this assumption for smoking [43] but for other exposures its validity is usually unknown. However, the model does not reduce to the classical twin study solution if this condition is not met: instead, data from more relatives are needed. In principle the model could, and should, be expanded to include data from more relatives, other data (such as migration study data), more risk categories and error terms. However, the number of unknown parameters will then increase, unless more data are available to quantify exposures (which change from generation to generation) and to estimate the extent to which environments are correlated between different types of relative.

Treating exposure and environmental variance (or population attributable fraction) as input data is also problematic when the effects of environmental factors on risk are often unknown. Further, the simple nature of the model (with one environmental axis) cannot adequately represent the complexity of environmental (including socio-economic) causes of disease. However, if targeting environmental interventions by genotype is to be considered, this implies that at least something is known (or expected to be learned) about environmental factors, such as particular exposures, that are amenable to intervention.

The assumption of no gene-environment correlation will often hold (for example it is rather implausible that the same genes strongly influence both lung cancer risk and nicotine addiction), but is not necessarily always true. Adult lactose intolerance is an example of a condition with a strong gene-environment interaction where targeted intervention to avoid drinking milk may be of high utility. However, the model is invalid for lactose intolerance unless exposures are applied equally to the population studied because, in general, people who are lactose intolerant may be less likely to drink milk (a gene-environment correlation) owing to the unpleasant symptoms.

A more fundamental problem is caused by the assumptions that: (i) the risks R_oo, R_go, R_oe and R_ge are inherent properties of a given trait within a given population (with a given γ and ε) and that there are therefore no confounders; and (ii) risks are randomly distributed within these categories.

These assumptions, although often made, are implausible in many situations. The assumption of no confounders means that the model can only represent a subset of the potential models of gene-gene and gene-environment interaction described by more complex models (for example [17]). It is unlikely to be met if multiple genetic factors interact with multiple environmental ones [44]. Although this may well render the results presented here invalid, such complexity is likely to reduce the utility of targeting by genotype, rather than enhance it. Hence, situations where the 'no confounders' assumption at least approximately holds are those most likely to be of relevance to public health.

The second assumption neglects the fact that for most exposures there is a gradient in risk, with higher exposure meaning higher risk, and that the same may also be true of genetic factors. This means that increasing the number of categories in the model will increase V_e (see [45]) and perhaps V_g. Further, these subcategories may be differently correlated between relatives (for example, the twin of a heavy smoker may be more likely to be a heavy smoker than a light one). If so, a relative of a proband may not be representative of their allocated risk category in the four-category model and Equation (22) then becomes invalid.
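The effect of adding categories can be illustrated with a small numerical sketch. It uses the population variance of risk across environmental categories as a simple stand-in for V_e (the paper's own definition of V_e is given earlier and is not reproduced here), and all fractions and risks below are hypothetical. The exposed group is split into lighter and heavier exposure subgroups with the same average risk, so the only change is the finer categorisation.

```python
def risk_variance(fractions, risks):
    """Population variance of risk across categories.

    fractions: share of the population in each category (assumed to sum to 1).
    risks: risk of disease in each category.
    """
    mean_risk = sum(f * r for f, r in zip(fractions, risks))
    return sum(f * (r - mean_risk) ** 2 for f, r in zip(fractions, risks))


# Hypothetical two-category model: 20% exposed, with risk 0.25 versus 0.05.
two_cat = risk_variance([0.8, 0.2], [0.05, 0.25])

# Same population with the exposed group split into light and heavy exposure;
# the average risk among the exposed is still 0.25.
three_cat = risk_variance([0.8, 0.1, 0.1], [0.05, 0.15, 0.35])

print(f"two categories:   {two_cat:.4f}")
print(f"three categories: {three_cat:.4f}")
```

In this sketch the finer categorisation can only add variance, because the variation in risk within the original exposed group is now counted rather than averaged away.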

More broadly, these assumptions make the model, like the classical twin model, essentially deterministic: it assumes that all the factors contributing to correlations in risk between relatives are perfectly known and are either environmental or genetic. Retention of these assumptions here may be problematic and could limit the applicability of the results. Nevertheless, all the other questionable assumptions of the classical twin model have been simultaneously removed.

Conclusion

The model shows that the potential for reducing the incidence of common diseases using environmental interventions targeted by genotype may be limited, except in special cases. The model also confirms that the importance of an individual's genotype in determining their risk of complex diseases tends to be exaggerated by the classical twin studies method, owing to the 'equal environments' assumption and the assumption of no gene-environment interaction. In addition, if phenotypes are genetically robust, because of epistasis, a largely environmental explanation for shared sibling risk is plausible, even if the classical heritability is high. The model therefore highlights the possibility – previously rejected on the basis of twin study results – that inherited genetic variants are important in determining risk only for the relatively rare familial forms of diseases such as breast cancer. If so, genetic models of familial aggregation may be incorrect and the hunt for additional susceptibility genes could be largely fruitless.

Competing interests

The author(s) declare that they have no competing interests.

Appendix A: formal derivation of equation (23)

Equation (23) may be derived more formally by extending the matrix method of Li and Sacks [46].

Define the probability that an affected proband is in genotypic risk category z and environmental risk category w as P_zw and assume that risks are randomly distributed within categories. Using the definitions of the four-category model given in Table 1, a vector P may be defined:

$P = \begin{pmatrix} P_{oo} \\ P_{oe} \\ P_{go} \\ P_{ge} \end{pmatrix} = \begin{pmatrix} (1 - \varepsilon)(1 - \gamma) R_{oo} / r_{t} \\ \varepsilon (1 - \gamma) R_{oe} / r_{t} \\ \gamma (1 - \varepsilon) R_{go} / r_{t} \\ \gamma \varepsilon R_{ge} / r_{t} \end{pmatrix} \quad (A1)$

A risk vector R may also be defined:

$R = \begin{pmatrix} R_{oo} \\ R_{oe} \\ R_{go} \\ R_{ge} \end{pmatrix} \quad (A2)$

Now define G_xy as the conditional probability P(relative is in genotypic risk category y | proband is in genotypic risk category x). Similarly, define E_xy as the conditional probability P(relative is in environmental risk category y | proband is in environmental risk category x). Using the definitions of p^rel_g and p^rel_e given in Section 2.5, matrices G and E may be written such that:

$G^{\mathrm{rel}} = \begin{pmatrix} G_{oo} & G_{og} \\ G_{go} & G_{gg} \end{pmatrix} = \begin{pmatrix} p_{g}^{\mathrm{rel}} + (1 - \gamma)(1 - p_{g}^{\mathrm{rel}}) & \gamma (1 - p_{g}^{\mathrm{rel}}) \\ (1 - \gamma)(1 - p_{g}^{\mathrm{rel}}) & p_{g}^{\mathrm{rel}} + \gamma (1 - p_{g}^{\mathrm{rel}}) \end{pmatrix} \quad (A3)$

$E^{\mathrm{rel}} = \begin{pmatrix} E_{oo} & E_{oe} \\ E_{eo} & E_{ee} \end{pmatrix} = \begin{pmatrix} p_{e}^{\mathrm{rel}} + (1 - \varepsilon)(1 - p_{e}^{\mathrm{rel}}) & \varepsilon (1 - p_{e}^{\mathrm{rel}}) \\ (1 - \varepsilon)(1 - p_{e}^{\mathrm{rel}}) & p_{e}^{\mathrm{rel}} + \varepsilon (1 - p_{e}^{\mathrm{rel}}) \end{pmatrix} \quad (A4)$

Finally, define X_ab-cd as the conditional probability P(relative is in risk category cd | proband is in risk category ab), where the risk categories are as defined in Table 1 (for example, risk category 'ge' implies high genotypic and high environmental risk). Provided p^rel_g and p^rel_e are independent (there are no gene-environment correlations), the gene-environment interaction matrix M^rel_ge may be written as:

$M_{ge}^{\mathrm{rel}} = \begin{pmatrix} X_{oo-oo} & X_{oo-oe} & X_{oo-go} & X_{oo-ge} \\ X_{oe-oo} & X_{oe-oe} & X_{oe-go} & X_{oe-ge} \\ X_{go-oo} & X_{go-oe} & X_{go-go} & X_{go-ge} \\ X_{ge-oo} & X_{ge-oe} & X_{ge-go} & X_{ge-ge} \end{pmatrix} = \begin{pmatrix} G_{oo} E_{oo} & G_{oo} E_{oe} & G_{og} E_{oo} & G_{og} E_{oe} \\ G_{oo} E_{eo} & G_{oo} E_{ee} & G_{og} E_{eo} & G_{og} E_{ee} \\ G_{go} E_{oo} & G_{go} E_{oe} & G_{gg} E_{oo} & G_{gg} E_{oe} \\ G_{go} E_{eo} & G_{go} E_{ee} & G_{gg} E_{eo} & G_{gg} E_{ee} \end{pmatrix} \quad (A5)$

Then the risk in a relative of the proband is given by:

$\lambda_{\mathrm{rel}} r_{t} = P \cdot (M_{ge}^{\mathrm{rel}} R) \quad (A6)$

After some algebra, this yields equation (23).
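The matrix calculation in (A1) to (A6) can be sketched numerically as follows. This is an illustrative sketch rather than the author's Twincal macro: the function names and the example inputs are hypothetical, the total risk r_t is taken to be the population-weighted mean of the four category risks (as implied by P summing to one), and the lower-right entries of G and E are taken as p + γ(1 − p) and p + ε(1 − p) so that each row sums to one. With categories ordered (oo, oe, go, ge), the matrix in (A5) is the Kronecker product of G and E.

```python
import numpy as np


def transition(p_rel, frac_high):
    """2x2 matrix of P(relative's category | proband's category).

    Rows index the proband (low, high), columns the relative (low, high).
    With probability p_rel the relative shares the proband's category;
    otherwise the relative is drawn at random from the population.
    """
    frac_low = 1.0 - frac_high
    return np.array([
        [p_rel + frac_low * (1 - p_rel), frac_high * (1 - p_rel)],
        [frac_low * (1 - p_rel), p_rel + frac_high * (1 - p_rel)],
    ])


def relative_risk_ratio(risks, gamma, eps, p_g, p_e):
    """Recurrence risk ratio lambda_rel from (A6).

    risks: (R_oo, R_oe, R_go, R_ge); gamma, eps: population fractions at
    high genotypic and high environmental risk; p_g, p_e: the sharing
    probabilities p^rel_g and p^rel_e for the relative type considered.
    """
    R = np.asarray(risks, dtype=float)
    # Population fraction in each category, ordered (oo, oe, go, ge).
    weights = np.array([
        (1 - gamma) * (1 - eps),
        (1 - gamma) * eps,
        gamma * (1 - eps),
        gamma * eps,
    ])
    r_t = weights @ R                # total population risk
    P = weights * R / r_t            # (A1): category of an affected proband
    M = np.kron(transition(p_g, gamma), transition(p_e, eps))  # (A5)
    return P @ (M @ R) / r_t         # (A6) divided by r_t


# Hypothetical inputs, chosen only to exercise the calculation.
example_risks = (0.05, 0.10, 0.20, 0.40)
print(relative_risk_ratio(example_risks, gamma=0.1, eps=0.2, p_g=0.5, p_e=0.5))
```

Two limiting cases give a quick check on the sketch: with p_g = p_e = 0 the relative is effectively a random member of the population and the ratio is 1, while with p_g = p_e = 1 the relative always shares the proband's category.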

Appendix B: calculating recurrence risks for twins

The sibling recurrence risk λ_sib is often available directly from familial studies. For twins the recurrence risks, if not reported, may be calculated from the case-wise concordance (Cc):

λ_MZ = Cc_MZ/r_t (B1)

λ_DZ = Cc_DZ/r_t (B2)

where, if there is complete ascertainment of all affected twins in a population,

Cc = 2C/(2C + D) (B3)

and C is the number of concordant and D the number of discordant pairs [25].
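Equations (B1) to (B3) amount to a two-step calculation, sketched below. The function names and the pair counts are hypothetical, and the sketch assumes complete ascertainment, as required for (B3).

```python
def casewise_concordance(concordant_pairs, discordant_pairs):
    """Case-wise concordance Cc = 2C / (2C + D), equation (B3)."""
    return 2 * concordant_pairs / (2 * concordant_pairs + discordant_pairs)


def twin_recurrence_risk(concordant_pairs, discordant_pairs, total_risk):
    """Recurrence risk ratio lambda = Cc / r_t, equations (B1) and (B2)."""
    return casewise_concordance(concordant_pairs, discordant_pairs) / total_risk


# Hypothetical counts: 10 concordant and 80 discordant pairs, with r_t = 0.1.
lambda_twin = twin_recurrence_risk(10, 80, total_risk=0.1)
print(lambda_twin)
```

The same function serves for MZ and DZ twins; only the pair counts differ.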

Supplementary Material

Additional File 1

Gene-gene and gene-environment interaction model. Contains the Visual Basic macro (Twincal), input and output datasheets, and charts used to calculate the solutions described in the text. The program is run by entering parameters in the 'Inputs' sheet and clicking the 'Run' button. Note that for the final chart ('fe') the number of categories on the horizontal axis changes depending on the environmental input parameters ε and PAF^E_e. If these parameters are changed, it is therefore necessary to delete the lower part of the output sheet before running the model and, after the run, to redraw the chart using the source data option from the chart. All other charts are drawn automatically. The line γ_max = γ_min is calculated exactly for the chart 'fe' but is approximated in the charts 'pgdz' and 'pgdzneg' using Newton's method with an initial guess for f_ge (f0) and a step (fet). For some input parameters it may be necessary to change these values by editing the Visual Basic code (Twincal) to obtain a valid solution.

File format: xls (814.5KB)

Additional File 2

Supplementary Figure 1: Example model solution space with R_MD = 1.7 and U_ge ≥ 0. Model solution space with U_ge ≥ 0 for the same input parameters as Figure 2, apart from λ_MZ = 4.4.

File format: bmp (73.2KB)

Additional File 3

Supplementary Figure 2: Example model solution space with R_MD = 1.8 and U_ge ≥ 0. Model solution space with U_ge ≥ 0 for the same input parameters as Figure 2, apart from λ_MZ = 4.6.

File format: bmp (73.2KB)

Additional File 4

Supplementary Figure 3: Example model solution space with R_MD = 1.95 and U_ge ≥ 0. Model solution space with U_ge ≥ 0 for the same input parameters as Figure 2, apart from λ_MZ = 4.9.

File format: bmp (73.2KB)

Additional File 5

Supplementary Figure 4: Example model solution space with R_MD = 2.1 and U_ge ≥ 0. Model solution space with U_ge ≥ 0 for the same input parameters as Figure 2, apart from λ_MZ = 5.2.

File format: bmp (73.2KB)

Additional File 6

Supplementary Figure 5: Example full solution space with R_MD = 1.7. Full model solution space for the same input parameters as Figure 5, transformed so that f_ge is on the horizontal axis.

File format: bmp (71.2KB)

Additional File 7

Supplementary Figure 6: Example full solution space with R_MD = 1.8. Full model solution space for the same input parameters as Figure 6, transformed so that f_ge is on the horizontal axis.

File format: bmp (71.2KB)

Additional File 8

Supplementary Figure 7: Example full solution space with R_MD = 1.95. Full model solution space for the same input parameters as Figure 7, transformed so that f_ge is on the horizontal axis.

File format: bmp (71.2KB)

Additional File 9

Supplementary Figure 8: Example full solution space with R_MD = 2.1. Full model solution space for the same input parameters as Figure 8, transformed so that f_ge is on the horizontal axis.

File format: bmp (71.2KB)

Additional File 10

Supplementary Figure 9: Example model solution with c_MD > 1 and U_ge ≥ 0. Input parameters: λ_MZ = 5.2, λ_DZ = 3, λ_sib = 2, ε = 0.2, PAF^E_e = 0.5, c_MD = 2, r_t = 0.1.

File format: bmp (1.7MB)

Additional File 11

Supplementary Figure 10: Example model solution with c_MD > 1 and U_ge ≥ 0. Input parameters as for Figure 13.

File format: bmp (88.1KB)

Additional File 12

Supplementary Figure 11: Example full solution space with c_MD > 1. Full model solution space for the same parameters as Figure 13, transformed so that f_ge is on the horizontal axis.

File format: bmp (71.2KB)

Additional File 13

Supplementary Figure 12: Breast cancer solution space with U_ge ≥ 0. Input parameters are as shown in Table 5, with c_MD = 1. The solution space is shown (shaded) for positive f_ge, assuming the 'equal environments' assumption holds (c_MD = 1). The darker shaded area shows the part of the solution space for which γ_maxge < 1/2. Utility U_ge is at its maximum when γ = 1/2 except within this darker shaded area.

File format: bmp (75.1KB)

Additional File 14

Supplementary Figure 13: Breast cancer variances with f_ge = 0. Input parameters as for Figure 16. Additive model of G-E interaction (f_ge = 0). Variance components are genetic (V_g) or environmental (V_e).

File format: bmp (66.4KB)

Additional File 15

Supplementary Figure 14: Breast cancer variances with f_ge = 1. Input parameters as for Figure 16. Multiplicative G-E interaction model (f_ge = 1). Variance components are genetic (V_g), environmental (V_e) or due to gene-environment interaction (V_ge).

File format: bmp (66.4KB)

Additional File 16

Supplementary Figure 15: Breast cancer variances with f_ge = 1/PAF^E_e. Input parameters as for Figure 16. Maximum G-E interaction model (f_ge = 1/PAF^E_e). Variance components are genetic (V_g), environmental (V_e) or due to gene-environment interaction (V_ge).

File format: bmp (66.4KB)

Additional File 17

Supplementary Figure 16: Breast cancer γ values with f_ge = 0. Input parameters as for Figure 16. The proportion of the population in the 'high genotypic risk' group, γ, may take any value in the shaded area. γ_min occurs when R_ge = 1, i.e. when the Positive Predictive Value (PPV) of being in the 'ge' subgroup is 100%. γ_max occurs when R_oo = 1 for an additive G-E model and solutions with a Population Impact of 100% (PI = 1) cannot exist.

File format: bmp (67.3KB)

Additional File 18

Supplementary Figure 17: Breast cancer γ values with f_ge = 1. Input parameters as for Figure 16. The proportion of the population in the 'high genotypic risk' group, γ, may take any value in the shaded area. A solution with a Population Impact of 100% (PI = 1) may exist if γ = γ_max.

File format: bmp (62.1KB)

Additional File 19

Supplementary Figure 18: Breast cancer γ values with f_ge = 1/PAF^E_e. Input parameters as for Figure 16. The proportion of the population in the 'high genotypic risk' group, γ, may take any value in the shaded area. A solution with a Population Impact of 100% (PI = 1) may exist if γ = γ_max.

File format: bmp (63.5KB)

Additional File 20

Supplementary Figure 19: Breast cancer solution space with U_ge ≤ 0. Input parameters are as for Figure 16. The solution space is shown for negative f_ge (where the utility of targeting environmental interventions at the high genotypic risk group is negative, U_ge ≤ 0). Solutions exist only in the shaded area where γ_max ≥ γ_min.

File format: bmp (59.4KB)

Additional File 21

Supplementary Figure 20: Breast cancer: full solution space. Input parameters are as for Figure 16. The same solution space as Figures 16 and 23 is shown (shaded), transformed so that the G-E interaction factor is plotted on the horizontal axis. Again, each point in the shaded solution space represents a genetic model defined by p^DZ_g and a G-E interaction model defined by f_ge. The area of solutions with γ_maxge < 1/2 is highlighted with darker shading. The classical twin study solution lies on the vertical axis (f_ge = 0) at the point p^DZ_g = 1/2, and is slightly outside the solution space.

File format: bmp (72.7KB)

Additional File 22

Supplementary Figure 21: Lung cancer solution space with U_ge ≥ 0. Input parameters are as shown in Table 5, with c_MD = 1.

File format: bmp (58.6KB)

Additional File 23

Supplementary Figure 22: Lung cancer variances with f_ge = 0. Input parameters as for Figure 25. Note that the horizontal axis has been expanded to show high values of c_SD/R_SD only.

File format: bmp (66.4KB)

Additional File 24

Supplementary Figure 23: Lung cancer variances with f_ge = 1. Input parameters as for Figure 25. Note that the horizontal axis has been expanded to show high values of c_SD/R_SD only.

File format: bmp (62.7KB)

Additional File 25

Supplementary Figure 24: Lung cancer variances with f_ge = 1/PAF^E_e. Input parameters as for Figure 25. Note that the horizontal axis has been expanded to show high values of c_SD/R_SD only.

File format: bmp (62.7KB)

Additional File 26

Supplementary Figure 25: Lung cancer γ values for f_ge = 1. Input parameters as for Figure 25. The proportion of the population in the 'high genotypic risk' group, γ, may take any value in the shaded area.

File format: bmp (58.6KB)

Additional File 27

Supplementary Figure 26: Lung cancer γ values for f_ge = 1/PAF^E_e. Input parameters as for Figure 25. The proportion of the population in the 'high genotypic risk' group, γ, may take any value in the shaded area.

File format: bmp (62.7KB)

Additional File 28

Supplementary Figure 27: Lung cancer U_ge values for f_ge = 1. Input parameters as for Figure 25. The utility parameter, U_ge, may take any value in the shaded area, but is maximum when γ = 1/2.

File format: bmp (62.7KB)

Additional File 29

Supplementary Figure 28: Lung cancer U_ge values for f_ge = 1/PAF^E_e. Input parameters as for Figure 25. The utility parameter, U_ge, may take any value in the shaded area, but is maximum when γ = 1/2.

File format: bmp (58.6KB)

Additional File 30

Supplementary Figure 29: Lung cancer: full solution space. Input parameters as for Figure 25.

File format: bmp (1.7MB)

Additional File 31

Supplementary Figure 30: Schizophrenia U_ge ≥ 0, small environmental variance and c_MD ≥ 1. Input parameters are as shown in Table 5, with ε = 0.62, PAF^E_e = 0.15 and c_MD = 1.

File format: bmp (58.6KB)

Additional File 32

Supplementary Figure 31: Schizophrenia U_ge ≥ 0, small environmental variance and c_MD > 1. Input parameters are as shown in Table 5, with ε = 0.62, PAF^E_e = 0.15 and c_MD = 3.8.

File format: bmp (58.6KB)

Additional File 33

Supplementary Figure 32: Schizophrenia U_ge ≥ 0, large environmental variance and c_MD = 1. Input parameters are as shown in Table 5, with ε = 0.15, PAF^E_e = 0.86 and c_MD = 1.

File format: bmp (58.6KB)

Acknowledgements

The author is grateful to the Joseph Rowntree Charitable Trust for funding the completion of this work.

References

1. Collins FS. Shattuck Lecture – medical and societal consequences of the Human Genome Project. New Engl J Med. 1999;341:28–37. doi: 10.1056/NEJM199907013410106.
2. Bell J. The new genetics in clinical practice. BMJ. 1998;316:618–620. doi: 10.1136/bmj.316.7131.618.
3. Collins FS, McKusick VA. Implications of the Human Genome Project for medical science. J Am Med Assoc. 2001;285:540–544. doi: 10.1001/jama.285.5.540.
4. Strohman RC. The coming Kuhnian revolution in biology. Nat Biotechnol. 1997;15:194–200. doi: 10.1038/nbt0397-194.
5. Holtzman NA, Marteau TM. Will genetics revolutionize medicine? New Engl J Med. 2000;343:141–144. doi: 10.1056/NEJM200007133430213.
6. Vineis P, Schulte P, McMichael AJ. Misconceptions about the use of genetic tests in populations. Lancet. 2001;357:709–712. doi: 10.1016/S0140-6736(00)04136-2.
7. Baird P. The Human Genome Project, genetics and health. Community Genet. 2001;4:77–80. doi: 10.1159/000051161.
8. Cooper RS, Psaty BM. Genetics and medicine: distraction, incremental progress, or the dawn of a new age? Ann Intern Med. 2003;138:576–580. doi: 10.7326/0003-4819-138-7-200304010-00014.
9. Vineis P, Ahsan H, Parker M. Genetic screening and occupational and environmental exposures. Occup Environ Med. 2004;62:657–662. doi: 10.1136/oem.2004.019190.
10. Khoury MJ, Yang Q, Gwinn M, Little J, Flanders WD. An epidemiologic assessment of genetic profiling for measuring susceptibility to common diseases and targeting interventions. Genet Med. 2004;6:38–47. doi: 10.1097/01.gim.0000105751.71430.79.
11. Terwilliger JD, Weiss KM. Confounding, ascertainment bias, and the blind quest for a genetic 'fountain of youth'. Ann Med. 2003;35:532–544. doi: 10.1080/07853890310015181.
12. Ioannidis JPA, Ntzani EE, Trikalinos TA, Contopoulos-Ioannidis DG. Replication validity of genetic association studies. Nat Genet. 2001;29:306–309. doi: 10.1038/ng749.
13. Cordell HJ, Clayton DG. Genetic association studies. Lancet. 2005;366:1121–1131. doi: 10.1016/S0140-6736(05)67424-7.
14. Ioannidis J. Why most published research findings are false. PLoS Med. 2005;2:e124. doi: 10.1371/journal.pmed.0020124.
15. Layzer D. Heritability analyses of IQ scores: science or numerology? Science. 1974;183:1259–1266. doi: 10.1126/science.183.4131.1259.
16. Risch N. Linkage strategies for genetically complex traits. I. Multilocus models. Am J Hum Genet. 1990;46:222–228.
17. Guo S-W. Gene-environment interaction and the mapping of complex traits: some statistical models and their interpretation. Hum Hered. 2000;50:286–303. doi: 10.1159/000022931.
18. Hopper JL. Why 'common environmental effects' are so uncommon in the literature. In: Spector TD, Snieder H, MacGregor AJ, editors. Advances in twin and sib-pair analysis. London: Greenwich Medical Media Ltd; 2000.
19. Khoury MJ, Wagener DK. Epidemiological evaluation of the use of genetics to improve the predictive value of disease risk factors. Am J Hum Genet. 1995;56:835–844.
20. Lewis SJ, Brunner EJ. Methodological problems in genetic association studies of longevity – the apolipoprotein E gene as an example. Int J Epidemiol. 2004;33:962–970. doi: 10.1093/ije/dyh214.
21. Tryggvadottir L, Sigvaldason H, Olafsdottir GH, Jonasson JG, Jonsson T, Tulinius H, Eyfjord JE. Population-based study of changing breast cancer risk in Icelandic BRCA2 mutation carriers, 1920–2000. J Natl Cancer Inst. 2006;98:116–122. doi: 10.1093/jnci/djj012.
22. Humphries S, Ridker PM, Talmud PJ. Genetic testing for cardiovascular disease susceptibility: a useful clinical management tool or possible misinformation? Arterioscler Thromb Vasc Biol. 2004;24:628–636. doi: 10.1161/01.ATV.0000116216.56511.39.
23. Rose G. Sick individuals and sick populations. Int J Epidemiol. 1985;14:32–38. doi: 10.1093/ije/14.1.32.
24. Khoury MJ, Jones K, Grosse SD. Quantifying the health benefits of genetic tests: The importance of a population perspective. Genet Med. 2006;8:191–195. doi: 10.1097/01.gim.0000206278.37405.25.
25. MacGregor AJ. Practical approaches to account for bias and confounding in twin data. In: Spector TD, Snieder H, MacGregor AJ, editors. Advances in twin and sib-pair analysis. London: Greenwich Medical Media Ltd; 2000.
26. Fisher RA. The correlation between relatives on the supposition of Mendelian inheritance. Trans R Soc Edinb. 1918;52:399–433.
27. Hopper JL. Variance components for statistical genetics: applications in medical research to characteristics related to human diseases and health. Stat Methods Med Res. 1993;2:199–223. doi: 10.1177/096228029300200302.
28. Porteous JW. A rational treatment of Mendelian genetics. Theor Biol Med Model. 2004;1:6. doi: 10.1186/1742-4682-1-6.
29. Azevedo RBR, Lohaus R, Srinivasan S, Dang KK, Burch CL. Sexual reproduction selects for robustness and negative epistasis in artificial gene networks. Nature. 2006;440:87–90. doi: 10.1038/nature04488.
30. Risch N. The genetic epidemiology of cancer: interpreting family and twin studies and their implications for molecular genetic approaches. Cancer Epidemiol Biomarkers Prev. 2001;10:733–741.
31. Lichtenstein P, Holm NV, Verkasalo PK, Iliadou A, Kaprio J, Koskenvuo M, Pukkala E, Skytthe A, Hemminki K. Environmental and heritable factors in the causation of cancer. New Engl J Med. 2000;343:78–85. doi: 10.1056/NEJM200007133430201.
32. Dong C, Hemminki K. Modification of cancer risks in offspring by sibling and parental cancers from 2,112,616 nuclear families. Int J Cancer. 2001;92:144–150. doi: 10.1002/1097-0215(200102)9999:9999<::AID-IJC1147>3.0.CO;2-C.
33. Rockhill B, Weinberg CR, Newman B. Population attributable fraction estimation for established breast cancer risk factors: considering the issues of high prevalence and unmodifiability. Am J Epidemiol. 1998;147:826–833. doi: 10.1093/oxfordjournals.aje.a009535.
34. McGue M, Gottesman II, Rao DC. The transmission of schizophrenia under a multifactorial threshold model. Am J Hum Genet. 1983;35:1161–1178.
35. Easton DF. How many more breast cancer predisposition genes are there? Breast Cancer Res. 1999;1:14–17. doi: 10.1186/bcr6.
36. Badano JL, Katsanis N. Beyond Mendel: an evolving view of human genetic disease transmission. Nat Rev Genet. 2002;3:779–789. doi: 10.1038/nrg910.
37. Badano JL, Leitch CC, Ansley SJ, May-Simera H, Lawson S, Lewis RA, Beales PL, Dietz HC, Fisher S, Katsanis N. Dissection of epistasis in oligogenic Bardet-Biedl syndrome. Nature. 2006;439:326–330. doi: 10.1038/nature04370.
38. Taioli E, Zocchetti C, Garte S. Models of interaction between metabolic genes and environmental exposure in cancer susceptibility. Environ Health Perspect. 1998;106:67–70. doi: 10.1289/ehp.9810667.
39. Hunter DJ. Gene-environment interactions in human diseases. Nat Rev Genet. 2005;6:287–298. doi: 10.1038/nrg1578.
40. Antoniou AC, Pharoah PDP, McMullan G, Day NE, Stratton MR, Peto J, Ponder BJ, Easton DF. A comprehensive model for familial breast cancer incorporating BRCA1, BRCA2 and other genes. Br J Cancer. 2002;86:76–83. doi: 10.1038/sj.bjc.6600008.
41. Braun MM, Caporaso NE, Page WF, Hoover RN. A cohort study of twins and cancer. Cancer Epidemiol Biomarkers Prev. 1995;4:469–473.
42. Hall W, Madden P, Lynskey M. The genetics of tobacco use: methods, findings and policy implications. Tob Control. 2002;11:119–124. doi: 10.1136/tc.11.2.119.
43. Vink JM, Willemsen G, Boomsma DI. The association of current smoking behavior with the smoking behavior of parents, siblings, friends and spouses. Addiction. 2003;98:923–931. doi: 10.1046/j.1360-0443.2003.00405.x.
44. Taioli E, Garte S. Covariates and confounding in epidemiologic studies using metabolic gene polymorphisms. Int J Cancer. 2002;100:97–100. doi: 10.1002/ijc.10448.
45. Guo S. The behaviors of some heritability estimators in the complete absence of genetic factors. Hum Hered. 1999;49:215–228. doi: 10.1159/000022878.
46. Li CC, Sacks L. The derivation of joint distribution and correlation between relatives by the use of stochastic matrices. Biometrics. 1954;10:347–360. doi: 10.2307/3001590.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Additional File 1

Click here for file^{(814.5KB, xls)}

Additional File 2

Supplementary Figure 1: Example model solution space with R_MD= 1.7 and U_ge≥ 0. Model solution space with U_ge≥ 0 for the same input parameters as Figure 2, apart from λ_MZ= 4.4.

Click here for file^{(73.2KB, bmp)}

Additional File 3

Supplementary Figure 2: Example model solution space with R_MD= 1.8 and U_ge≥ 0. Model solution space with U_ge≥ 0 for the same input parameters as Figure 2, apart from λ_MZ= 4.6.

Click here for file^{(73.2KB, bmp)}

Additional File 4

Supplementary Figure 3: Example model solution space with R_MD= 1.95 and U_ge≥ 0. Model solution space with U_ge≥ 0 for the same input parameters as Figure 2, apart from λ_MZ= 4.9.

Click here for file^{(73.2KB, bmp)}

Additional File 5

Supplementary Figure 4: Example model solution space with R_MD= 2.1 and U_ge≥ 0. Model solution space with U_ge≥ 0 for the same input parameters as Figure 2, apart from λ_MZ= 5.2.

Click here for file^{(73.2KB, bmp)}

Additional File 6

Supplementary Figure 5: Example full solution space with R_MD= 1.7. Full model solution space for the same input parameters as Figure 5, transformed so that f_geis on the horizontal axis.

Click here for file^{(71.2KB, bmp)}

Additional File 7

Supplementary Figure 6: Example full solution space with R_MD= 1.8. Full model solution space for the same input parameters as Figure 6, transformed so that f_geis on the horizontal axis.

Click here for file^{(71.2KB, bmp)}

Additional File 8

Supplementary Figure 7: Example full solution space with R_MD= 1.95. Full model solution space for the same input parameters as Figure 7, transformed so that f_geis on the horizontal axis.

Click here for file^{(71.2KB, bmp)}

Additional File 9

Supplementary Figure 8: Example full solution space with R_MD= 2.1. Full model solution space for the same input parameters as Figure 8, transformed so that f_geis on the horizontal axis.

Click here for file^{(71.2KB, bmp)}

Additional File 10

Supplementary Figure 9: Example model solution with c_MD> 1 and U_ge≥ 0. Input parameters: λ_MZ= 5.2, λ_DZ= 3, λ_sib= 2, ε = 0.2, PAF^E_e= 0.5, c_MD= 2, r_t= 0.1.

Click here for file^{(1.7MB, bmp)}

Additional File 11

Supplementary Figure 10: Example model solution with c_MD> 1 and U_ge≥ 0. Input parameters as for Figure 13.

Click here for file^{(88.1KB, bmp)}

Additional File 12

Supplementary Figure 11: Example full solution space with c_MD> 1. Full model solution space for the same parameters as Figure 13, transformed so that f_geis on the horizontal axis.

Click here for file^{(71.2KB, bmp)}

Additional File 13

Click here for file^{(75.1KB, bmp)}

Additional File 14

Click here for file^{(66.4KB, bmp)}

Additional File 15

Click here for file^{(66.4KB, bmp)}

Additional File 16

Click here for file^{(66.4KB, bmp)}

Additional File 17

Click here for file^{(67.3KB, bmp)}

Additional File 18

Click here for file^{(62.1KB, bmp)}

Additional File 19

Click here for file^{(63.5KB, bmp)}

Additional File 20

Click here for file^{(59.4KB, bmp)}

Additional File 21

Click here for file^{(72.7KB, bmp)}

Additional File 22

Supplementary Figure 21: Lung cancer solution space with U_ge≥ 0. Input parameters are as shown in Table 5, with c_MD= 1.

Click here for file^{(58.6KB, bmp)}

Additional File 23

Supplementary Figure 22: Lung cancer variances with f_ge= 0. Input parameters as for Figure 25. Note that the horizontal axis has been expanded to show high values of c_SD/R_SDonly.

Click here for file^{(66.4KB, bmp)}

Additional File 24

Supplementary Figure 23: Lung cancer variances with f_ge= 1. Input parameters as for Figure 25. Note that the horizontal axis has been expanded to show high values of c_SD/R_SDonly.

Click here for file^{(62.7KB, bmp)}

Additional File 25

Supplementary Figure 24: Lung cancer variances with f_ge= 1/PAF^E_e. Input parameters as for Figure 25. Note that the horizontal axis has been expanded to show high values of c_SD/R_SDonly.

Click here for file^{(62.7KB, bmp)}

Additional File 26

Click here for file^{(58.6KB, bmp)}

Additional File 27

Click here for file^{(62.7KB, bmp)}

Additional File 28

Supplementary Figure 27: Lung cancer U_gevalues for f_ge= 1. Input parameters as for Figure 25. The utility parameter, U_ge, may take any value in the shaded area, but is maximum when γ = 1/2.

Click here for file^{(62.7KB, bmp)}

Additional File 29

Click here for file^{(58.6KB, bmp)}

Additional File 30

Supplementary Figure 29: Lung cancer: full solution space. Input parameters as for Figure 25.

Click here for file^{(1.7MB, bmp)}

Additional File 31

Supplementary Figure 30: Schizophrenia U_ge≥ 0, small environmental variance and c_MD≥ 1. Input parameters are as shown in Table 5, with ε = 0.62, PAF^E_e= 0.15 and c_MD= 1.

Click here for file^{(58.6KB, bmp)}

Additional File 32

Supplementary Figure 31: Schizophrenia U_ge≥ 0, small environmental variance and c_MD> 1. Input parameters are as shown in Table 5, with ε = 0.62, PAF^E_e= 0.15 and c_MD= 3.8.

Click here for file^{(58.6KB, bmp)}

Additional File 33

Supplementary Figure 32: Schizophrenia U_ge≥ 0, large environmental variance and c_MD= 1. Input parameters are as shown in Table 5, with ε = 0.15, PAF^E_e= 0.86 and c_MD= 1.

Click here for file^{(58.6KB, bmp)}

