Inequality in genetic cancer risk suggests bad genes rather than bad luck

Mats Julius Stensrud; Morten Valberg

doi:10.1038/s41467-017-01284-y

2017 Oct 27;8:1165. doi: 10.1038/s41467-017-01284-y

PMCID: PMC5660094 PMID: 29079851

Abstract

Heritability is often estimated by decomposing the variance of a trait into genetic and other factors. Interpreting such variance decompositions, however, is not straightforward. In particular, there is an ongoing debate on the importance of genetic factors in cancer development, even though heritability estimates exist. Here we show that heritability estimates contain information on the distribution of absolute risk due to genetic differences. The approach relies on the assumptions underlying the conventional heritability of liability model. We also suggest a model unrelated to heritability estimates. By applying these strategies, we describe the distribution of absolute genetic risk for 15 common cancers. We highlight the considerable inequality in genetic risk of cancer using different metrics, e.g., the Gini Index and quantile ratios which are frequently used in economics. For all these cancers, the estimated inequality in genetic risk is larger than the inequality in income in the USA.

Cancer heritability estimates can be obtained via decomposing trait variance into genetic and other factors. Here, the authors obtain the distribution of absolute genetic risk for 15 common cancers, and they use a number of metrics to show that the genetic risk varies considerably across individuals.

Introduction

There are several approaches to quantify the contribution of heritable factors to disease^1,2. A straightforward strategy is using familial recurrence risks, e.g., the recurrence risk in monozygotic co-twins, given a co-twin is affected (λ _M), or the recurrence risk in a pair of siblings (λ _S)³. If the relative risk in relatives of affected individuals is different from 1, family related factors influence the risk. Indeed, it has been argued that the majority of such factors are most likely genetic^3–5. The familial risk estimates may have immediate interest for relatives of affected individuals. These estimates are simple predictors of the individual disease risk, and they may be particularly useful when few other risk factors are known. However, these familial risk estimates per se do not yield accurate information about the magnitude and inequality of genetic and environmental risk⁶. Nor do they indicate the relative importance of heritable, common environmental and other factors. The familial risk estimates are purely observational, and do not have a causal interpretation.

Heritability, on the other hand, allows for comparison between heritable and other factors: The heritability denotes the fraction of the variation of the trait that is due to genetic differences². These estimates are characteristics of the population under study, and cannot be immediately generalised to other populations. To interpret the heritability, we must make assumptions about the underlying causal structure, i.e. we must define a causal model^1,7.

Heritability is often used to evaluate the importance of genetic effects, but the interpretation is not always easy. Intuitively, a large heritability may correspond to a large variability in absolute genetic risk. Nevertheless, it is not straightforward to see how the absolute genetic risk distribution depends on heritability. Indeed, for cancer development the contribution of genetic, environmental factors and chance is debated^{8–19, 34, 36}, despite the access to heritability data²⁰.

To better understand the importance of heritable factors, we obtain the distribution of absolute risks due to genetic differences. After estimating the absolute genetic risk distribution, we study the fundamental inequality in cancer risk across individuals, using e.g. Nordic twin data for 15 common cancers²⁰. Our analysis suggests that genetic differences lead to substantial inequality in the risk of several cancers.

Results

Deriving the distribution of absolute genetic risk

Human diseases are often considered to be dichotomous traits; you are either affected or unaffected. For such traits, the heritability of liability is frequently used to study inheritance². The concept implies that every individual has a liability to disease, which is the sum of e.g. several genetic and environmental components. Usually the liability is assumed to be normally distributed in the population, and a threshold on the liability scale determines whether an individual acquires the disease. Hence, the standard liability model is usually interpreted as a threshold model^7,21. This model allows for the decomposition of the variance into genetic and environmental components. It is appealing, because the variance on the liability scale does not depend on the disease prevalence. Furthermore, the normally distributed liability may have some justification in the central limit theorem; if we believe that the liability of a trait is due to several additive genetic and environmental factors, the liability may approximately follow a normal distribution.

In the 1970s a mathematically equivalent interpretation of the threshold model was described, which is based on the genetic liability ι _G, i.e., the liability solely due to genotype²². In the Methods section, we have derived the risk of disease given ι _G, which we denote Y. Indeed, we express the distribution of Y to study how the genetic risk varies on an individual level. Wray et al.²³ use some similar concepts to see that the probit model fits with real, observed family data²⁴. Here, we will use summary estimates of the heritability h ² from twin studies to derive the distribution of Y for 15 common cancers. When the absolute risk distribution is derived, we can obtain various measures of the genetic inequality in risk.

Exploring inequality in risk for 15 cancers

Mucci et al.²⁰ recently reported heritability estimates for 15 common cancers based on the heritability of liability model, using data from Nordic twin registries. We will apply the sampling algorithm described in the Methods section to derive the distribution of absolute risk for these 15 cancers. To illustrate this, Fig. 1 shows the estimated genetic risk distribution for the 4 most common cancers. We interpret the genetic risk as the individual life-time risk of disease, given that the individual’s genetic make-up was known, but the environmental exposure unknown. The interpretation relies on the assumptions underlying the heritability of liability model, e.g. that genetic factors and the environmental factors are independent on the liability scale.

Fig. 1 — Genetic risk distribution for four common cancers. The distribution of risk due to genetic differences is displayed for four common cancers, using heritability and prevalence data from Nordic twin registries²⁰

By obtaining the risk distributions, we are able to explore the genetic contribution to disease risk. To do this, we will suggest some useful summary measures.

Gini index

First, we use the Lorenz curve and its summary measure, the Gini index. Although rarely used in medicine and epidemiology, this metric adequately describes the variation in disease risk^25,26. Importantly, it allows for comparison across measurement scales; the Gini index depends neither on the cumulative risk of a disease in a population (or the total size of an economy) nor on the size of the population itself. It only relies on the relative mean absolute difference between individuals²⁶. Crudely, the Gini index is a number between 0 and 1, describing the inequality in disease risk across individuals. More precisely, the Lorenz curve is represented by a function L(S), in which S is a cumulative proportion of the population, ordered from lowest to highest risk, and L(S) is the fraction of the total risk that is carried by S. For example, if the risk is equal among subjects in the population, the fraction of risk carried by any 50% of the population would be L(0.5) = 0.5, which means that the Lorenz curve is a straight line. The Gini index is a ratio describing the deviation from this straight line, which can be interpreted as a coefficient of deviation in risk, either on the absolute or the relative scale²⁶. A formal mathematical derivation is found in the Methods section.

In our context, a Gini index of 0 means that everybody has the same genetic risk of a particular cancer, whereas a Gini index of 1 implies maximum inequality in risk across individuals. The Gini index is widely used in economics and demography, e.g., to study inequality in income and wealth. In Fig. 2, we show the Gini index for 15 common cancers. The Gini index is derived by using the heritability h² and life-time risk estimates from a recent Nordic twin study²⁰. The red dashed line denotes the Gini index of income in the USA, using data from the World Bank²⁷. Interestingly, the plot reveals a major inequality in cancer risk for the common cancers. For all specific cancers, the inequality in genetic risk seems to be larger than the inequality in income in the USA. We also studied the genetic risk of cancer overall, using the heritability of acquiring any type of cancer. This heritability estimate is lower than those for the individual cancers²⁰, which is expected because a factor increasing the risk of a particular cancer does not necessarily increase the risk of other cancers. Still, the Gini index of acquiring any type of cancer was almost as large as the Gini index for income in the USA.

Fig. 2 — Gini indices for 15 common cancers. The Gini indices with 95% confidence intervals are derived by using data from Nordic twin registries²⁰. The red dashed line marks the Gini index of income in the USA

We have displayed the relation between the Gini index and the heritability (Fig. 3a), and the relation between the Gini index and the observed relative risk in monozygotic co-twins of affected individuals (λ _M) (Fig. 3b). The areas of the circles are proportional to the life-time risk of the cancers. The three different measures of genetic contribution are related, but not co-linear, indicating that they capture non-overlapping information about the risk of disease. In particular, for cancer sites with similar heritability, the Gini index is relatively larger for the rarer sites.

Quantile ratios

Alternatively, we may study the inequality in risk by using a quantile ratio. The population is partitioned into subsets according to quantiles of genetic risk, and we may estimate the ratio of affected individuals in the highest risk partition compared to the lowest risk partition. This metric is also frequently used to compare incomes in economics, e.g., the 20:20 ratio (RR_20:20), which compares the 20% richest with the 20% poorest of a population. Table 1 shows the RR_20:20 of genetic risk, which highlights a substantial difference in risk across subgroups; those in the highest 20 percentile carry substantially more of the disease burden than those in the lowest 20 percentile. In comparison, RR_20:20 for income is ~5 in the UK and ~9 in the USA²⁸.
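The 20:20 ratio can be sketched numerically under the liability threshold model described in the Methods section. The sketch below is an illustration, not the authors' code: the function names `genetic_risk_grid` and `quantile_ratio` are hypothetical, the genetic liability is evaluated on a deterministic grid of quantiles instead of random draws, and the inputs are the prostate cancer values from Table 1 (h² = 57%, AR = 10.5%).

```python
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard normal: Phi and its inverse


def genetic_risk_grid(h2, prevalence, n=100_000):
    """Genetic risks at n evenly spaced quantiles of the genetic liability.

    Assumes the liability threshold model; the result is sorted ascending
    because the risk is an increasing function of the genetic liability.
    """
    threshold = STD_NORMAL.inv_cdf(1.0 - prevalence)  # Phi^{-1}(q)
    sd_genetic = h2 ** 0.5
    sd_rest = (1.0 - h2) ** 0.5
    risks = []
    for i in range(n):
        u = (i + 0.5) / n                         # midpoint quantile level
        l_g = sd_genetic * STD_NORMAL.inv_cdf(u)  # genetic liability
        risks.append(STD_NORMAL.cdf((l_g - threshold) / sd_rest))
    return risks


def quantile_ratio(sorted_risks, share=0.2):
    """Mean risk in the upper `share` of subjects over the lower `share`."""
    k = round(len(sorted_risks) * share)
    return fmean(sorted_risks[-k:]) / fmean(sorted_risks[:k])


risks = genetic_risk_grid(h2=0.57, prevalence=0.105)
rr_20_20 = quantile_ratio(risks)
print(f"RR_20:20 for prostate cancer: {rr_20_20:.0f}")
```

Under these assumptions the ratio should be of the same order as the prostate entry in Table 1. The grid of quantiles is a design choice that avoids Monte Carlo noise in the lower tail, where the risks are very small.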

Table 1.

Summary measures of the genetic risk of 15 common cancers

	Obtained from Mucci et al.²⁰
Site	h² (%)	env² (%)	AR (%)	λ_M	GC_h²	GC_beta	RR_20:20	RR_interv
Overall cancer	33.0	0.0	32.4	1.4	0.37	0.37	9	0.64
Head and neck	9.0	26.0	0.8	7.5	0.45	0.84	12	0.55
Stomach	22.0	6.0	1.1	6.2	0.64	0.81	66	0.34
Colon	15.0	16.0	2.9	3.8	0.48	0.71	16	0.51
Rectum and anus	14.0	10.0	1.9	3.5	0.49	0.68	17	0.50
Lung	18.0	24.0	3.2	5.5	0.52	0.80	22	0.48
Melanoma	58.0	0.0	1.2	16.3	0.90	0.93	69,946	0.05
Non-Melanoma	43.0	0.0	1.9	7.6	0.79	0.85	997	0.17
Breast	31.0	16.0	9.4	3.0	0.55	0.66	36	0.45
Corpus uteri	27.0	0.0	2.0	3.5	0.66	0.69	87	0.33
Ovary	39.0	0.0	1.6	5.4	0.78	0.79	610	0.19
Prostate	57.0	0.0	10.5	3.6	0.71	0.73	689	0.26
Testis	37.0	24.0	0.5	27.6	0.83	0.96	1339	0.13
Kidney	38.0	0.0	0.8	8.4	0.81	0.85	1049	0.15
Bladder, other urinary organs	30.0	0.0	2.2	4.5	0.68	0.75	120	0.30
Leukemia, other	57.0	0.0	0.6	25.3	0.92	0.95	181350	0.03


Summary measures of genetic cancer risk are displayed. Here, h² denotes heritability of liability estimates, and env² denotes the contribution of common environmental factors to the variance of liability. AR denotes the absolute risk of cancer, and λ_M denotes the recurrence risk in monozygotic co-twins. The four leftmost columns are obtained from Mucci et al.²⁰ GC_h² denotes the Gini index derived from the heritability method. GC_beta denotes the Gini index derived from the beta distribution. RR_20:20 denotes the ratio of mean risks from the upper vs the lower 20 percentile. RR_interv describes the relative risk after an intervention in which those in the upper 20 percentile are manipulated to achieve the average risk in the lower 20 percentile.

A hypothetical intervention

Related to quantile ratios, we may estimate the effect of hypothetical interventions on particular risk groups. Suppose, for example, that we were able to reduce the genetic risk of each individual in the upper 20 percentile to the average risk in the lowest 20 percentile. This question could be relevant for public health professionals, because it suggests the potential benefit of identifying and subsequently intervening on high-risk populations.

We could calculate the relative risk of such interventions, assuming that the environment is left unaltered. Indeed, this relative risk is immediately obtained from the cumulative risk distribution. Let y ₂₀ denote the 20 percentile of genetic risk and let y ₈₀ denote the 80 percentile. Then

\mathrm{RR}_{\mathrm{interv}} = \frac{\int_{0}^{y_{80}} y f_{Y} (y) d y + \int_{0}^{y_{20}} y f_{Y} (y) d y}{E (Y)} .

Relative risk estimates after such hypothetical interventions are found in Table 1. Indeed, these risk estimates also suggest a major contribution of genes to disease development; if we, e.g., were able to reduce the risk of prostate cancer in the upper 20 percentile to the average risk in the lower 20 percentile, we would reduce the number of cancers by a proportion of 1 − 0.26 = 0.74.
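As a rough numerical illustration of this relative risk, the sketch below evaluates the two integrals on a deterministic grid of genetic liabilities under the liability threshold model of the Methods section. The function names are hypothetical, and the inputs are the prostate cancer values from Table 1 (h² = 57%, AR = 10.5%). Since each subject in the upper 20 percentile receives the mean risk of the lower 20 percentile, the total risk of the treated group equals the total risk of the lower group, which is the second integral in the numerator.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def genetic_risk_grid(h2, prevalence, n=100_000):
    """Sorted genetic risks at n evenly spaced quantiles of the genetic liability."""
    threshold = STD_NORMAL.inv_cdf(1.0 - prevalence)  # Phi^{-1}(q)
    sd_genetic = h2 ** 0.5
    sd_rest = (1.0 - h2) ** 0.5
    return [
        STD_NORMAL.cdf(
            (sd_genetic * STD_NORMAL.inv_cdf((i + 0.5) / n) - threshold) / sd_rest
        )
        for i in range(n)
    ]


def intervention_relative_risk(sorted_risks, share=0.2):
    """Relative risk after moving the upper `share` to the mean risk of the lower `share`."""
    n = len(sorted_risks)
    k = round(n * share)
    untouched_total = sum(sorted_risks[: n - k])  # subjects below y_80 keep their risk
    treated_total = sum(sorted_risks[:k])         # k subjects at the lower-group mean
    return (untouched_total + treated_total) / sum(sorted_risks)


risks = genetic_risk_grid(h2=0.57, prevalence=0.105)
rr_interv = intervention_relative_risk(risks)
print(f"RR_interv for prostate cancer: {rr_interv:.2f}")
```

Under these assumptions the result should be close to the prostate entry of RR_interv in Table 1.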

Using different sources of heritability data

Heritability data may not only be derived from twin studies. Genome-wide association studies (GWAS) allow for the calculation of heritability estimates without relying on family structures^29,30. These estimates account for the variability due to genetic variants tagged by single-nucleotide polymorphisms (SNPs), usually with a population frequency above 1–5%. Such array heritability estimates are therefore considered to be lower bounds of the overall heritability, but may yield important information about the inequality in risk due to genetic variants associated with common SNPs. Lu et al.²⁹ estimated array heritability for a range of cancers, highlighting that array estimates capture approximately half the heritability from older twin studies. We may immediately apply our approaches to explore the inequality in cancer risk due to genetic variants tagged by SNPs. This could yield insight into, e.g., the benefit of targeting genetic variants tagged by SNPs in future interventions. In Fig. 4, we display the Gini indices derived from the array heritability estimates in Lu et al.²⁹, again highlighting the substantial inequality in genetic risk.

Fig. 4 — Gini indices derived from array heritability estimates. Gini indices with 95% confidence intervals are calculated from array heritability estimates derived from Lu et al.²⁹ The black boxes are based on array heritability removing loci with known association with the cancers. The red dashed line marks the Gini index of income in the USA

Alternative to the threshold model

Although frequently used, the assumptions of the heritability of liability model are not necessarily satisfied¹. Considering the liability to be normally distributed is convenient and may agree with the central limit theorem, but testing this assumption is usually infeasible in practice^7,24, and it may not be robust if the genetic risk is determined by few, rare genes¹. When using twin data, we usually assume no gene-environment interaction on the liability scale^1,31, and we consider monozygotic and dizygotic twins to share the same amount of environmental factors. Another issue is the confidence intervals of heritability and common environmental components, which are often wide even when hundreds of thousands of individuals are included in the study²⁰.

Until now we have based our results on the heritability of liability assumptions. We may, however, suggest a different approach that does not rely on the concept of heritability. We achieve this by assuming that the risk due to both heritable factors and common environment follows a parametric distribution. We let this distribution be the beta distribution, which allows for a wide range of shapes of the risk distribution and is bounded by 0 and 1. Importantly, in this model the risk distribution is uniquely defined by the observed recurrence risk (e.g., λ_M) and the disease prevalence⁶. We use the beta model to investigate the risk distribution due to the total effect of genes and shared environment. That is, this measure will capture the maximum inequality in risk due to genes and shared environment. Hence, we would generally expect inequality measures from this approach, e.g., the Gini index, to be larger in magnitude than the heritability-based estimates. Intuitively, the differences should be relatively large if the shared environmental component is substantial, and relatively small if the shared environmental component is minor. In Table 1, the Gini indices from the beta model (GC_beta) are shown together with the Gini indices from the heritability model ($GC_{h^{2}}$). The Gini indices from the beta model are generally larger than the estimates from the heritability model. As expected, the discrepancy is larger for the cancers with larger shared environmental components, which may be obtained from twin data as the fraction of the variance on the liability scale due to shared environment²⁰ (env² in Table 1). A plot similar to Fig. 2 including the beta Gini estimates is found in Fig. 5. For the cancers that were studied in both Mucci et al.²⁰ and Lu et al.²⁹, we have also compared twin heritability, array heritability and the estimates derived in this section (Fig. 6).

Fig. 5 — Gini indices from the beta distribution. Gini indices with 95% confidence intervals are displayed for the twin estimates in Fig. 2 (blue) together with estimates from the alternative beta distribution (red). The red dashed line marks the Gini index of income in the USA

Fig. 6 — Comparing Gini indices from different risk distributions. The Gini indices with 95% confidence intervals displayed in Figs. 2, 4 and 5 are shown together. Only the cancers that were reported in both Mucci et al.²⁰ and Lu et al.²⁹ are included. The red dashed line marks the Gini index of income in the USA

We may also use similar derivations for other distributions than the beta distribution. In particular, a distribution equal to f _Y(y) in Eq. (4) of the Methods section could be derived directly by using estimates of λ _M and the life-time disease risk. Then, we replace h ² by $h_{env}^{2}$ in Eq. (4), and we let $h_{env}^{2}$ be a parameter that determines the shape of f _Y(y). Indeed, we may interpret $h_{env}^{2}$ as the fraction of variance on the liability scale due to genes and common environment.

Discussion

The contribution of heritable factors to major diseases is debated^14,16. The antagonising views may arise due to ambiguous use of terminology and misinterpretation of model assumptions^3,18,19,32. To gain deeper insight into the importance of genetic factors in cancer development, we have studied the absolute genetic risk distribution, under explicitly defined models. Thereby we can use measures of inequality that may be easier to understand than heritability itself, e.g. the Gini index and the 20:20 ratio. These measures may be particularly desirable, because comparisons across scales can be made. Indeed, these measures are widely used in economics and demography, and they have also been successfully applied in biology previously³³.

Our results suggest that 15 common cancers show a major inequality in the genetic susceptibility to disease. As a curious comparison, we show that the inequality in cancer risk is larger than the income inequality in the USA. We must emphasise, however, that our main results are based on the basic assumptions of the heritability estimates. In particular, we cannot immediately extrapolate the results outside the study populations.

Nevertheless, the major inequalities in risk suggest that many cancer cases are preventable in principle¹⁸. Even though preventative strategies are lacking today, our analysis therefore suggests that undiscovered targets for interventions may exist, at least in theory. The information on risk inequality may be useful for public health professionals and other decision makers, when prioritising future prevention strategies and research projects. In particular, being able to identify high-risk individuals, and target these individuals for genetic or environmental interventions could be cost-effective strategies.

Fundamentally, our results put the debated role of chance in cancer development into perspective^8,34,35: Irrespective of the definition of chance and the role of randomness in cancer development, we show that the genetic risk varies considerably across individuals. This points to major genetic variability in the individual risk of acquiring cancer. These findings do not contradict the results by either Tomasetti et al.^34,36 or Wu et al.¹⁴ Rather, Tomasetti et al.³⁴ suggest that the cancer incidence at a site is strongly correlated with the number of baseline stem cell divisions at this site. Thereby they study heterogeneity between sites. We rather study heterogeneity within a cancer site, and suggest that environmental and genetic factors lead to major differences between individuals. Despite the seemingly random nature of stem cell mutations, there may be currently unknown processes, which vary across individuals, that influence the risk of particular cancers. Some individuals may be loaded with considerably higher risk than others, due to genetic or common environmental factors. We may denote these individuals as “unlucky”. However, it is not necessarily sensible to assume that they are unlucky due to fundamentally random events¹⁹.

Methods

Deriving the distribution of absolute risk

We will show how the absolute genetic risk distribution is derived from the liability threshold model. To do this, we use the conventional assumptions of the liability model. Let the liability L ~ N(μ = 0, σ ² = 1) be the sum of several components, and let Φ(z) denote the cumulative standard normal distribution. An individual is affected by disease X with life-time risk Pr(X = 1) = 1 − q if

L \geq Φ^{- 1} (q) .

To obtain estimates of h ², it is usually assumed that L has a genetic component

L_{G} ~ N (μ = 0, σ^{2} = h^{2}),

which is independent of the other components. We aim to find

\Pr (X = 1 ∣ι_{G}) = y .

We define L _E = L − L _G, which is the component of L not determined by genotype. Usually, L _G and L _E are assumed to be independent, and therefore L _E ~ N(0, 1 − h ²). Let L _G =ι _G. Then,

L ∣ ι_{G} ~ N (ι_{G}, 1 - h^{2}) .

We are now able to express the probability of disease, given the genetic liability

g (ι_{G}) = P (X = 1 ∣ι_{G}) = P (L > Φ^{- 1} (q) ∣ι_{G}) = P (L∣ ι_{G} > Φ^{- 1} (q)) = Φ (\frac{ι_{G} - Φ^{- 1} (q)}{\sqrt{1 - h^{2}}}) .

This relation has been graphically illustrated by Smith²¹ and a mathematical expression was suggested by Mendell and Elston²². Due to the probit relation between ι_G and the absolute risk in Eq. (1), the liability threshold model has also been denoted a probit model²³.

We are interested in how y varies among individuals in the population. Hence, we view Y = g(L _G) as a random variable and let g ⁻¹(Y) = L _G. Then

\begin{matrix} g (L_{G}) = Φ (\frac{L_{G} - Φ^{- 1} (q)}{\sqrt{1 - h^{2}}}) \\ g^{- 1} (Y) = Φ^{- 1} (Y) \sqrt{1 - h^{2}} + Φ^{- 1} (q) \end{matrix}

Simulating the distribution of Y

Equation (1) allows us to simulate the distribution of Y for a particular disease. To do this, we simply draw a Gaussian variable with mean 0 and variance h² for each subject, which represents the genetic liability, and then transform this variable into an absolute risk. The procedure can be described more formally by the following algorithm:

1. Obtain h ² and the population life-time prevalence 1 − q of the disease, e.g. from published data.

2. For each i in (1, …, n), draw the individual genetic liability ι_{G,i} from a normal distribution

L_{G, i} ~ N (μ = 0, σ^{2} = h^{2})

3. For each i, calculate the genetic risk y _i from Eq. (1)

y_{i} = Φ (\frac{ι_{G, i} - Φ^{- 1} (q)}{\sqrt{1 - h^{2}}})
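The three steps above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' R code; the function name `simulate_genetic_risk`, the seed and the sample size are arbitrary choices, and the example inputs are the breast cancer values from Table 1 (h² = 31%, AR = 9.4%).

```python
import random
from statistics import NormalDist, fmean


def simulate_genetic_risk(h2, prevalence, n, seed=0):
    """Draw n absolute genetic risks y_i under the liability threshold model."""
    std_normal = NormalDist()
    # Step 1: heritability h2 and life-time prevalence 1 - q are given.
    threshold = std_normal.inv_cdf(1.0 - prevalence)  # Phi^{-1}(q)
    sd_genetic = h2 ** 0.5
    sd_rest = (1.0 - h2) ** 0.5
    rng = random.Random(seed)
    risks = []
    for _ in range(n):
        # Step 2: genetic liability drawn from N(0, h2).
        l_g = rng.gauss(0.0, sd_genetic)
        # Step 3: probit transform of the liability into an absolute risk.
        risks.append(std_normal.cdf((l_g - threshold) / sd_rest))
    return risks


risks = simulate_genetic_risk(h2=0.31, prevalence=0.094, n=100_000)
print(f"mean simulated risk: {fmean(risks):.3f}")
```

A simple sanity check is that the mean of the simulated risks should be close to the life-time prevalence, since E(Y) = 1 − q.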

Derivation of the distribution of Y

We may also express the distribution of Y algebraically. The probability density of Y is expressed as

f_{Y} (y) = f_{L_{G}} (g^{- 1} (y)) \times \frac{d g^{- 1} (y)}{d y} = f_{L_{G}} (g^{- 1} (y)) \frac{1}{g' (g^{- 1} (y))},

where $f_{L_{G}}$ denotes the probability density function of $L_{G} ~ N (0, h^{2})$ . Furthermore

g' (g^{- 1} (y)) = \frac{1}{\sqrt{2 π} \sqrt{1 - h^{2}}} \times e^{- \frac{{(\frac{g^{- 1} (y) - Φ^{- 1} (q)}{\sqrt{1 - h^{2}}})}^{2}}{2}} = \frac{1}{\sqrt{2 π} \sqrt{1 - h^{2}}} \times e^{- \frac{{(Φ^{- 1} (y))}^{2}}{2}} .

Finally we plug into Eq. (3) to find

f_{Y} (y) = \frac{\sqrt{1 - h^{2}}}{\sqrt{h^{2}}} e^{- \frac{g^{- 1} {(y)}^{2}}{2 h^{2}}} e^{\frac{{(Φ^{- 1} (y))}^{2}}{2}} = \frac{\sqrt{1 - h^{2}}}{\sqrt{h^{2}}} e^{- \frac{{(Φ^{- 1} (y) \sqrt{1 - h^{2}} + Φ^{- 1} (q))}^{2}}{2 h^{2}} + \frac{{(Φ^{- 1} (y))}^{2}}{2}} .

By the definition of Y, we have that

E (Y) = E_{L_{G}} (P (X = 1 ∣ι_{G})) = P (X = 1) = 1 - q .

The variance of Y can be found numerically by solving

VAR (Y) = E (Y^{2}) - E {(Y)}^{2} = \int_{0}^{1} y^{2} f_{Y} (y) d y - {(1 - q)}^{2} .

These derivations allow us to study how the absolute risk due to genetic differences is distributed in the population.
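The closed-form density can be checked numerically. The sketch below is an illustration under the same model assumptions: it codes f_Y(y) as derived above and evaluates the moments ∫ y^k f_Y(y) dy with the substitution y = Φ(z), which is a choice made here to keep the integrand smooth near 0 and 1 and is not taken from the paper. The function names and grid settings are hypothetical, and the inputs are the breast cancer values from Table 1.

```python
from math import exp, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def genetic_risk_density(y, h2, q):
    """Density f_Y(y) of the absolute genetic risk, as derived above."""
    z = STD_NORMAL.inv_cdf(y)
    threshold = STD_NORMAL.inv_cdf(q)
    liability = z * sqrt(1.0 - h2) + threshold  # g^{-1}(y)
    return sqrt((1.0 - h2) / h2) * exp(-liability ** 2 / (2.0 * h2) + z ** 2 / 2.0)


def risk_moment(k, h2, q, n=4000, z_max=6.0):
    """Midpoint approximation of the integral of y^k f_Y(y) over (0, 1).

    Uses y = Phi(z), dy = phi(z) dz, and truncates z to [-z_max, z_max].
    """
    dz = 2.0 * z_max / n
    total = 0.0
    for i in range(n):
        z = -z_max + (i + 0.5) * dz
        y = STD_NORMAL.cdf(z)
        total += y ** k * genetic_risk_density(y, h2, q) * STD_NORMAL.pdf(z) * dz
    return total


h2, q = 0.31, 1.0 - 0.094
total_mass = risk_moment(0, h2, q)        # should be close to 1
mean_risk = risk_moment(1, h2, q)         # should be close to 1 - q
variance = risk_moment(2, h2, q) - (1.0 - q) ** 2
print(total_mass, mean_risk, variance)
```

The first two moments give a check on the derivation: the density should integrate to 1 and have mean 1 − q.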

Theoretic derivation of the Gini index

We will present a formal definition of the Gini index as a function of the Lorenz curve. Let f_Y and F_Y be the probability density function (pdf) and cumulative distribution function (cdf) of Y, respectively. The Lorenz curve of the distribution of Y is defined as

L (x) = \frac{1}{E (Y)} \int_{0}^{x} t f_{Y} (t) d t, 0 \leq x \leq 1 .

The Gini index of the distribution of Y is then defined as

G_{Y} = 2 \int_{0}^{1} (F_{Y} - L (F_{Y})) d F_{Y} = 2 \int_{0}^{1} (F_{Y} (x) - L (x)) f_{Y} (x) d x .

The last equality (the integral limits) follows since f _Y has support [0,1]. In general, the Gini index of the distribution of Y may easily be found using numerical integration.

For a Beta(α, β) distributed variable, the Gini index is explicitly given as

G_{Beta} = \frac{2 B (2 α, 2 β)}{α B {(α, β)}^{2}},

where B is the beta function³⁷.
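For a sample of risks, e.g. simulated values of Y, the Lorenz curve and the Gini index can be approximated directly from the ordered values. The sketch below uses the standard rank-based sample formula for the Gini index, which is an assumption made here and not necessarily how the authors computed it; the function names are hypothetical. As a check, a uniform risk on (0, 1) is a Beta(1, 1) variable, for which the closed-form expression above gives 2B(2, 2)/B(1, 1)² = 1/3.

```python
def lorenz_curve(risks, points=5):
    """Pairs (population share, share of total risk) for the ordered sample."""
    ordered = sorted(risks)
    total = sum(ordered)
    n = len(ordered)
    curve = []
    for j in range(points + 1):
        k = round(n * j / points)  # number of lowest-risk subjects included
        curve.append((k / n, sum(ordered[:k]) / total))
    return curve


def gini_index(risks):
    """Rank-based sample Gini index: 0 means equal risk, values near 1 extreme inequality."""
    ordered = sorted(risks)
    n = len(ordered)
    weighted = sum(rank * value for rank, value in enumerate(ordered, start=1))
    return 2.0 * weighted / (n * sum(ordered)) - (n + 1) / n


equal_risks = [0.1] * 1000
uniform_risks = [(i + 0.5) / 1000 for i in range(1000)]  # grid mimicking Uniform(0, 1)

print(gini_index(equal_risks))    # no inequality
print(gini_index(uniform_risks))  # compare with the Beta(1, 1) value 1/3
print(lorenz_curve(uniform_risks))
```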

Risk due to heritable factors and shared family environment

We assume that the risk of a particular cancer varies continuously across individuals in the population. More precisely, let X_i be a binary variable taking value 1 if a subject is affected and 0 if a subject is unaffected. The probability of developing cancer in individual i, p_i = P(X_i = 1), is drawn from a distribution f(p_i) with support [0,1] and mean μ = E(p_i). Let f(p_i) follow a parametric beta distribution, which allows for a range of shapes. To completely specify f(p_i), we must define E(p_i) and VAR(p_i). We find E(p_i) using published data on the life-time incidence of the disease I_life. To derive an estimate of VAR(p_i), we make use of studies on monozygotic (MZ) twins. Following the terminology of Risch³, let λ_r denote the risk ratio of a relative of an affected individual. We assume that p_i is equal in a pair of MZ twins. We interpret p_i as the risk of disease due to heritable factors and shared family environment. Then we find λ_M, the risk ratio for disease given that an MZ co-twin is affected

\begin{aligned}
λ_{M} &= \frac{P (X_{2} = 1 ∣ X_{1} = 1)}{P (X_{i} = 1)} = \frac{P (X_{2} = 1, X_{1} = 1)}{P (X_{i} = 1) P (X_{1} = 1)} \\
&= \frac{P (X_{2} = 1, X_{1} = 1)}{P {(X_{i} = 1)}^{2}}, \quad \text{since } P (X_{1} = 1) = P (X_{i} = 1) \\
&= \frac{E (p_{i}^{2})}{E {(p_{i})}^{2}}, \quad \text{since } P (X_{1} = 1) = P (X_{2} = 1) \\
&= 1 + \frac{VAR (p_{i})}{E {(p_{i})}^{2}} .
\end{aligned}

Using estimates of λ _M from MZ twin studies, we can find

VAR (p_{i}) = (λ_{M} - 1) E {(p_{i})}^{2} \approx ({\hat{λ}}_{M} - 1) I_{life}^{2} .

Hence, under these assumptions we can completely specify the distribution of risk in the population, f(p _i), if estimates of the cumulative incidence (I _life) and the twin recurrence risk (λ _M) are available. We may interpret this as follows: Each subject obtains a risk (probability of developing disease) due to genetic factors and common environment. Then, this probability, combined with unmeasured individual factors and chance, determines whether the subject gets the disease.
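To make the specification concrete, the sketch below matches a beta distribution to the mean I_life and the variance (λ_M − 1) I_life² by the method of moments, and then evaluates the closed-form Gini index for a Beta(α, β) variable given in the Methods. Moment matching is the natural reading of the text but is stated here as an assumption, and the function names are hypothetical. The inputs are the breast cancer values from Table 1 (λ_M = 3.0, AR = 9.4%).

```python
from math import exp, lgamma


def beta_parameters(lambda_m, lifetime_risk):
    """Method-of-moments (alpha, beta) from the MZ recurrence risk ratio and life-time risk."""
    mean = lifetime_risk
    variance = (lambda_m - 1.0) * mean ** 2
    if not 0.0 < variance < mean * (1.0 - mean):
        raise ValueError("no beta distribution matches these moments")
    concentration = mean * (1.0 - mean) / variance - 1.0  # alpha + beta
    return mean * concentration, (1.0 - mean) * concentration


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def gini_beta(alpha, beta):
    """Closed-form Gini index 2 B(2a, 2b) / (a B(a, b)^2) of a Beta(a, b) variable."""
    return 2.0 * exp(log_beta(2.0 * alpha, 2.0 * beta) - 2.0 * log_beta(alpha, beta)) / alpha


alpha, beta = beta_parameters(lambda_m=3.0, lifetime_risk=0.094)
gini = gini_beta(alpha, beta)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}, Gini = {gini:.2f}")
```

With these inputs the Gini index should land close to the GC_beta entry for breast cancer in Table 1. The moment condition in `beta_parameters` amounts to requiring λ_M · I_life < 1.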

Indeed, we can use exactly the same approach to specify the probit liability distribution in the main text. Then, we use Expression (4) as a parameterisation of the probit liability distribution, with parameters E(y) = 1 − q and $h_{env}^{2}$ . Here, we have replaced h² by $h_{env}^{2}$ in Eq. (5), because it no longer denotes heritability. Rather, $h_{env}^{2}$ is the fraction of the trait variance on the liability scale due to both heritable factors and common environment. Mathematically, $h_{env}^{2}$ is a shape parameter of the probit liability distribution. Then, we combine Expressions (3) and (5) to

\begin{aligned}
(λ_{M} - 1) E {(p_{i})}^{2} - E (Y^{2}) + E {(Y)}^{2} &= 0 \\
(λ_{M} - 1) {(1 - q)}^{2} - \int_{0}^{1} y^{2} f_{Y} (y) d y + {(1 - q)}^{2} &= 0 \\
(λ_{M} - 1) {(1 - q)}^{2} - \int_{0}^{1} y^{2} \frac{\sqrt{1 - h_{env}^{2}}}{\sqrt{h_{env}^{2}}} e^{- \frac{{(Φ^{- 1} (y) \sqrt{1 - h_{env}^{2}} + Φ^{- 1} (q))}^{2}}{2 h_{env}^{2}} + \frac{{(Φ^{- 1} (y))}^{2}}{2}} d y + {(1 - q)}^{2} &= 0 .
\end{aligned}

Indeed, Expression (8) can be solved numerically to find $h_{env}^{2}$ .
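One way to carry out this numerical solution is sketched below. It is not the authors' implementation: E(Y²) is computed by integrating over the standardised liability u instead of over y (the two integrals are equal by a change of variable), and the root is found by bisection, relying on E(Y²) increasing with $h_{env}^{2}$. Function names, grid size and tolerance are arbitrary choices, and the inputs are the breast cancer values from Table 1 (λ_M = 3.0, AR = 9.4%).

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def second_moment(h2_env, q, n=2000, u_max=8.0):
    """E(Y^2) by midpoint integration over the standardised liability u ~ N(0, 1)."""
    threshold = STD_NORMAL.inv_cdf(q)
    sd_shared, sd_rest = sqrt(h2_env), sqrt(1.0 - h2_env)
    du = 2.0 * u_max / n
    total = 0.0
    for i in range(n):
        u = -u_max + (i + 0.5) * du
        y = STD_NORMAL.cdf((sd_shared * u - threshold) / sd_rest)
        total += y * y * STD_NORMAL.pdf(u) * du
    return total


def implied_lambda_m(h2_env, q):
    """Recurrence risk ratio E(Y^2) / E(Y)^2 implied by h2_env and q."""
    return second_moment(h2_env, q) / (1.0 - q) ** 2


def solve_h2_env(lambda_m, q, tol=1e-10):
    """Bisection on (0, 1) for the h2_env whose implied lambda_M matches the input."""
    lo, hi = 1e-9, 1.0 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if implied_lambda_m(mid, q) < lambda_m:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


q = 1.0 - 0.094
h2_env = solve_h2_env(lambda_m=3.0, q=q)
print(f"h2_env for breast cancer: {h2_env:.3f}")
```

A solution exists only when λ_M lies between 1 and 1/(1 − q), the values implied by $h_{env}^{2}$ approaching 0 and 1.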

Numeric results

To derive our numeric estimates, we have used the results from Tables 2 and 3 in Mucci et al.²⁰ and Table 2 in Lu et al.²⁹ Confidence intervals were obtained by inserting the confidence bounds reported in Mucci et al.²⁰ and Lu et al.²⁹ into our expressions for genetic risk. For the beta distribution, we used the confidence intervals in Table 2 in Mucci et al.²⁰ for recurrence risks in monozygotic twins. All our numeric results were obtained by two independent approaches: numeric integration of analytic expressions, and simulations. Both approaches yielded the same results.

Code availability

The computer code for all the calculations was written in R version 3.3.2 using RStudio version 1.0.136. This computer code is available in Supplementary Data 1.

Data availability

We have solely used data that are readily available in previously published articles^20,29.

Acknowledgements

This work was partially supported by the Norwegian Cancer Society, grant number 4493570, and the Nordic Cancer Union, grant number 186031. We thank Odd O. Aalen for his valuable comments to the manuscript.

Author contributions

M.J.S. conceived the study. M.J.S. and M.V. performed and interpreted the data analysis. M.J.S. and M.V. drafted and critically revised the article. M.J.S. and M.V. approved the final version of the manuscript.

Competing interests

The authors declare no competing financial interests.

Footnotes

Electronic supplementary material

Supplementary Information accompanies this paper at doi:10.1038/s41467-017-01284-y.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Risch N. The genetic epidemiology of cancer. Cancer Epidemiol. Biomarkers Prev. 2001;10:733–741.
2. Visscher PM, Hill WG, Wray NR. Heritability in the genomics era—concepts and misconceptions. Nat. Rev. Genet. 2008;9:255–266. doi: 10.1038/nrg2322.
3. Risch N. Linkage strategies for genetically complex traits. I. Multilocus models. Am. J. Hum. Genet. 1990;46:222–228.
4. Khoury MJ, Beaty TH, Liang K-Y. Can familial aggregation of disease be explained by familial aggregation of environmental risk factors? Am. J. Epidemiol. 1988;127:674–683. doi: 10.1093/oxfordjournals.aje.a114842.
5. Aalen OO. Modelling the influence of risk factors on familial aggregation of disease. Biometrics. 1991;47:933–945. doi: 10.2307/2532650.
6. Valberg M, Stensrud MJ, Aalen OO. The surprising implications of familial association in disease risk. arXiv preprint arXiv:1707.00014 (2017).
7. Tenesa A, Haley CS. The heritability of human disease: estimation, uses and abuses. Nat. Rev. Genet. 2013;14:139–149. doi: 10.1038/nrg3377.
8. Tomasetti C, Vogelstein B. Cancer risk: role of environment—response. Science. 2015;347:729–731. doi: 10.1126/science.aaa6592.
9. Tomasetti C, Vogelstein B. Musings on the theory that variation in cancer risk among tissues can be explained by the number of divisions of normal stem cells. arXiv preprint arXiv:1501.05035 (2015).
10. Thomas F, Roche B, Ujvari B. Intrinsic versus extrinsic cancer risks: the debate continues. Trends Cancer. 2016;2:68–69. doi: 10.1016/j.trecan.2016.01.004.
11. Couzin-Frankel J. The bad luck of cancer. Science. 2015;347:12–12. doi: 10.1126/science.347.6217.12.
12. Weinberg C, Zaykin D. Is bad luck the main cause of cancer? J. Natl Cancer I. 2015;107:djv125. doi: 10.1093/jnci/djv125.
13. Luzzatto L, Pandolfi PP. Causality and chance in the development of cancer. N. Engl. J. Med. 2015;373:84–88. doi: 10.1056/NEJMsb1502456.
14. Wu S, Powers S, Zhu W, Hannun YA. Substantial contribution of extrinsic risk factors to cancer development. Nature. 2016;529:43–47. doi: 10.1038/nature16166.
15. Noble R, Kaltz O, Hochberg ME. Peto’s paradox and human cancers. Phil. Trans. R. Soc. B. 2015;370:20150104. doi: 10.1098/rstb.2015.0104.
16. Noble RJ, Kaltz O, Nunney L, Hochberg ME. Overestimating the role of environment in cancers. Cancer Prev. Res. 2016;9:773–776. doi: 10.1158/1940-6207.CAPR-16-0126.
17. Tarone RE. RE: is bad luck the main cause of cancer? J. Natl Cancer I. 2015;107:djv227. doi: 10.1093/jnci/djv227.
18. Smith GD, Relton CL, Brennan P. Chance, choice and cause in cancer aetiology: individual and population perspectives. Int. J. Epidemiol. 2016;45:605–613. doi: 10.1093/ije/dyw224.
19. Stensrud MJ, Strohmaier S, Valberg M, Aalen OO. Can chance cause cancer? A causal consideration. Eur. J. Cancer. 2017;75:83–85. doi: 10.1016/j.ejca.2016.12.022.
20. Mucci LA, et al. Familial risk and heritability of cancer among twins in nordic countries. JAMA. 2016;315:68–76. doi: 10.1001/jama.2015.17703.
21. Smith C. Recurrence risks for multifactorial inheritance. Am. J. Hum. Genet. 1971;23:578.
22. Mendell NR, Elston R. Multifactorial qualitative traits: genetic analysis and prediction of recurrence risks. Biometrics. 1974;30:41–57. doi: 10.2307/2529616.
23. Wray NR, Goddard ME. Multi-locus models of genetic risk of disease. Genome Med. 2010;2:10. doi: 10.1186/gm131.
24. Visscher PM, Wray NR. Concepts and misconceptions about the polygenic additive model applied to disease. Hum. Hered. 2016;80:165–170. doi: 10.1159/000446931.
25. Mauguen A, Begg CB. Using the Lorenz curve to characterize risk predictiveness and etiologic heterogeneity. Epidemiology. 2016;27:531–537. doi: 10.1097/EDE.0000000000000499.
26. Lee W-C. Characterizing exposure–disease association in human populations using the Lorenz curve and Gini index. Stat. Med. 1997;16:729–739. doi: 10.1002/(SICI)1097-0258(19970415)16:7<729::AID-SIM491>3.0.CO;2-A.
27. World Bank. World Bank Gini Indices (2017). URL: http://data.worldbank.org/indicator/SI.POV.GINI.
28. World Bank. Quintile of income from http://data.worldbank.org (2017).
29. Lu Y, et al. Most common ‘sporadic’ cancers have a significant germline genetic component. Hum. Mol. Genet. 2014;23:6112–6118. doi: 10.1093/hmg/ddu312.
30. Sampson JN, et al. Analysis of heritability and shared heritability based on genome-wide association studies for 13 cancer types. J. Natl Cancer I. 2015;107:djv279. doi: 10.1093/jnci/djv279.
31. Benchek PH, Morris NJ. How meaningful are heritability estimates of liability? Hum. Genet. 2013;132:1351–1360. doi: 10.1007/s00439-013-1334-z.
32. Wray NR, Yang J, Goddard ME, Visscher PM. The genetic interpretation of area under the ROC curve in genomic profiling. PLoS Genet. 2010;6:e1000864. doi: 10.1371/journal.pgen.1000864.
33. Wittebolle L, et al. Initial community evenness favours functionality under selective stress. Nature. 2009;458:623–626. doi: 10.1038/nature07840.
34. Tomasetti C, Li L, Vogelstein B. Stem cell divisions, somatic mutations, cancer etiology, and cancer prevention. Science. 2017;355:1330–1334. doi: 10.1126/science.aaf9011.
35. Nowak MA, Waclaw B. Genes, environment, and “bad luck”. Science. 2017;355:1266–1267. doi: 10.1126/science.aam9746.
36. Tomasetti C, Vogelstein B. Variation in cancer risk among tissues can be explained by the number of stem cell divisions. Science. 2015;347:78–81. doi: 10.1126/science.1260825.
37. Pham-Gia T, Turkkan N. Determination of the Beta distribution from its Lorenz curve. Math. Comput. Model. 1992;16:73–84. doi: 10.1016/0895-7177(92)90008-9.


