Abstract
We critically examined existing approaches for the estimation of the excess familial risk of cancer which can be attributed to identified common genetic risk variants and propose an alternative, more straightforward approach for calculating this proportion using well-established epidemiological methodology. We applied the underlying equations of the traditional approaches and the new epidemiological approach for colorectal cancer (CRC) in a large population-based case-control study in Germany with 4447 cases and 3480 controls, who were recruited from 2003–2016 and for whom interview, medical and genomic data were available. Having a family history of CRC (FH) was associated with a 1.77-fold risk increase in our study population (95% CI 1.52–2.07). Traditional approaches yielded estimates of the FH-associated risk explained by 97 common genetic variants from 9.6% to 23.1%, depending on various assumptions. Our alternative approach resulted in smaller and more consistent estimates of this proportion, ranging from 5.4% to 14.3%. Commonly employed methods may lead to strongly divergent and possibly exaggerated estimates of excess familial risk of cancer explained by associated known common genetic variants. Our results suggest that familial risk and risk associated with known common genetic variants might reflect two complementary major sources of risk.
Keywords: genetic epidemiology, familial risk, common genetic variants, risk proportion, excess risk
Introduction
In the era of genome-wide association studies (GWAS), many single nucleotide polymorphisms (SNPs) have been found to be associated with a higher risk for various types of cancer1–3. With the increasing sample size of GWAS consortia, the power to detect common variants with small effects has rapidly increased, and discovery of several SNPs at once is now the rule rather than the exception (e.g.4, 5).
Many SNP discovery studies included estimates of how much of the familial risk of cancer can be attributed to previously known and newly discovered genetic variants. They mostly employed an equation first proposed by Cox et al. in 2007, or slight modifications thereof6–9, which includes two major components:
the relative risk attributable to a given SNP, commonly denoted as $\lambda^{*} = \dfrac{p\,(p r_2 + q r_1)^2 + q\,(p r_1 + q)^2}{\left(p^2 r_2 + 2pq\,r_1 + q^2\right)^2}$, where $p$ is the population frequency of the minor allele, $q = 1 - p$, and $r_1$ and $r_2$ are the relative risks for heterozygotes and rare homozygotes relative to common homozygotes, and
the overall familial relative risk estimated from epidemiological studies, commonly denoted as $\lambda_0$. The share of the familial risk attributable to the SNP is then obtained as $\log(\lambda^{*}) / \log(\lambda_0)$. If applied to multiple independent SNPs in a multiplicative model, the numerator consists of the sum of the $\log(\lambda^{*})$ across SNPs.
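The traditional calculation can be illustrated with a minimal Python sketch. It assumes the commonly cited form of the Cox et al. equation for the SNP-specific familial relative risk and the log-ratio form of the explained proportion; the function names and the three SNPs (allele frequencies and relative risks) are hypothetical and serve only as an illustration, not as data from any study.

```python
import math

def lambda_star(p, r1, r2):
    """Familial relative risk attributable to one SNP.

    p: frequency of the risk (minor) allele; r1, r2: relative risks of
    heterozygotes and rare homozygotes versus common homozygotes.
    """
    q = 1.0 - p
    numerator = p * (p * r2 + q * r1) ** 2 + q * (p * r1 + q) ** 2
    denominator = (p ** 2 * r2 + 2 * p * q * r1 + q ** 2) ** 2
    return numerator / denominator

def proportion_explained(snps, lambda_0):
    """Sum of log(lambda*) over independent SNPs, divided by log(lambda_0)."""
    total = sum(math.log(lambda_star(p, r1, r2)) for p, r1, r2 in snps)
    return total / math.log(lambda_0)

# Hypothetical SNPs under a multiplicative model (r2 = r1 ** 2):
# (risk allele frequency, RR heterozygotes, RR rare homozygotes)
example_snps = [(0.30, 1.10, 1.21), (0.45, 1.05, 1.1025), (0.15, 1.08, 1.1664)]

for lambda_0 in (2.4, 1.77):
    share = proportion_explained(example_snps, lambda_0)
    print(f"assumed familial RR {lambda_0}: {100 * share:.2f}% explained")
```

Because the assumed familial relative risk enters only through the denominator, the same set of SNPs yields a larger explained share whenever a smaller familial relative risk is assumed.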
Although this approach seems straightforward, there are a number of issues that deserve critical discussion.
First, the relative risk estimates typically come from the large GWAS that detected the SNPs, and will - if winner’s curse is not appropriately addressed - typically be lower in independent samples, which may lead to substantial overestimation of the proportion of FH risk explained by the SNPs.
Second, if the contribution of multiple SNPs is not completely independent but these SNPs are in some linkage disequilibrium, the numerator of the equation may be substantially overestimated. On the other hand, if too restrictive “inclusion criteria” are employed that would consider only totally independent SNPs, the complementary information which correlated SNPs may still convey would be lost, leading to potential underestimation of the numerator.
Third, the estimator for relative risk associated with FH in the denominator is commonly taken from pooled estimates from epidemiological studies and may differ from the relative risk in the populations used to derive the genetic risks.
Fourth, the implicit partitioning of the excess risk of family history into some proportion that is explained by known genetic variants and some proportion that is explained by yet to be identified genetic factors neglects the fact that a substantial proportion of familial aggregation may be due to other reasons, such as familial aggregation of environmental or lifestyle risk factors.
Fifth, carriers of common, low penetrance risk alleles are not restricted to persons with a FH. In fact, given the limited proportion of persons with a FH, the common, low penetrance risk alleles occur more often in persons without FH. These risk alleles are hence not restricted to familial risk, but in fact convey risk independent of FH.
In this article, we propose an alternative, straightforward “epidemiological approach”, using well-established epidemiological methodology that is unaffected by these concerns. We will use colorectal cancer (CRC), the third most common cancer globally10, whose heritability was estimated to be 35%11, as an example to demonstrate our approach. Among other factors, family history (FH) has been identified as a major CRC risk factor12. In the past decade, more and more SNPs associated with CRC risk have been discovered by GWAS (e.g.13–16). Risk increases associated with single SNPs are mostly very small, with odds ratios (OR) for the risk alleles ranging between 1 and 1.1, but polygenic scores based on multiple SNPs were shown to be highly predictive of CRC risk17–20.
Methods
Alternative epidemiological approach
In the proposed approach both genomic data and familial risk are derived from the same dataset. The underlying concept is graphically depicted in Supplementary Figure 1. The concept reflects the causal role of both genetic and environmental factors for CRC risk. The association of FH with CRC risk reflects clustering of both types of factors within families. Based on this model, we propose to estimate the proportion of the CRC risk that is associated with having a FH (which is most commonly defined as FH in a first-degree relative) that can be explained by common genetic variants by
$$\mathrm{Prop(SNPs)} = \frac{RR_b - RR_a}{RR_b - 1},$$
where $RR_a$ is the relative risk (RR) for FH which is adjusted for common genetic variants, $RR_b$ is the RR for FH which is not adjusted for common genetic variants, and both RRs are adjusted for environmental factors affecting CRC risk. Should the genetic variants not explain any of the FH-associated risk, $RR_a$ would equal $RR_b$, in which case Prop(SNPs) would be 0. Should the genetic variants completely (100%) explain the excess CRC risk for a FH beyond the excess risk explained by environmental factors, $RR_a$ would equal 1 and Prop(SNPs) would be 1. In practice, one would expect $RR_a$ to be between 1.0 and $RR_b$ and Prop(SNPs) to be between 0 and 1 (100%). Analogous calculations could be made using the log(RRs) rather than the RRs of both unadjusted and adjusted FH risk estimates (results are presented in the supplement). In case-control studies, RRs are commonly approximated by odds ratios (ORs).
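As a small illustration, the proportion can be computed from the two relative risks with a few lines of Python. The function name `prop_explained` is hypothetical, and the log-scale variant shown is an assumed reading of the "analogous calculation" (null value 0 instead of 1), not a formula quoted from the supplement.

```python
import math

def prop_explained(rr_unadjusted, rr_adjusted, log_scale=False):
    """Share of the FH-associated excess risk explained by the genetic variants.

    rr_unadjusted: RR (or OR) for FH without adjustment for genetic variants.
    rr_adjusted:   RR (or OR) for FH with adjustment for genetic variants.
    """
    if rr_unadjusted <= 1.0:
        raise ValueError("the unadjusted RR must exceed 1 for an excess risk to exist")
    if log_scale:
        # Assumed analogue on the log scale, where the null value is 0.
        return (math.log(rr_unadjusted) - math.log(rr_adjusted)) / math.log(rr_unadjusted)
    return (rr_unadjusted - rr_adjusted) / (rr_unadjusted - 1.0)

# Illustrative input: FH odds ratios of 1.77 (unadjusted) and 1.70 (GRS-adjusted)
print(f"{100 * prop_explained(1.77, 1.70):.1f}%")
print(f"{100 * prop_explained(1.77, 1.70, log_scale=True):.1f}%")
```

Estimates computed from rounded ORs in this way will differ slightly from estimates computed from unrounded regression coefficients.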
Study population
Data for the current analyses were taken from the DACHS study (Darmkrebs: Chancen der Verhütung durch Screening), which has been described in detail elsewhere21, 22. Briefly, DACHS is an ongoing population-based case-control study in southwest Germany. Patients with a first diagnosis of CRC aged at least 30 years are eligible for participation. Recruitment is conducted by all 22 hospitals in the study area which offer first line treatment to patients with CRC. Controls are randomly selected from population registries using frequency matching with respect to sex, age and county of residence. This analysis was based on 4447 cases and 3480 controls who were recruited from 2003 to 2016 and for whom genomic data were available. The ethics committees of the Medical Faculty at the University of Heidelberg and the Medical Chambers of Baden-Württemberg and Rhineland-Palatinate approved the study. Written informed consent was obtained from all participants.
Data collection
Standardized in-person interviews were conducted with both cases (typically during their hospital stay) and controls (at their homes) by trained interviewers. Detailed information about the participants’ family history and a variety of other risk and preventive factors was collected, and blood or buccal samples were taken. All CRC cases were histologically confirmed.
Genotyping
DNA was extracted from blood samples (in 99.1% of participants) or from buccal cells (in 0.9% of participants) using conventional methods. Details about genotyping and imputation are provided in Supplementary Table 1.
Identification and selection of SNPs for the genetic risk score
A literature review was conducted to identify SNPs reported to be associated with a higher CRC risk in persons of European descent, as described in detail elsewhere20. Of 105 identified SNPs (Supplementary Table 2), 97 could be reliably measured or imputed. For six SNPs, the risk allele in our sample was not the same as reported in the respective discovery study (rs1957636, rs72647484, rs7259371, rs2696839, rs11884596, rs2516420). The correlation between the SNPs measured with the correlation coefficient from the ‘genetics’ package in R is depicted in Supplementary Figure 2.
Statistical analyses
A genetic risk score (GRS) was derived with various approaches. It was calculated both unweighted, as the simple sum of risk alleles, and weighted, as the sum of risk alleles with weights equal to the log of the per-risk-allele OR reported in the discovery study. We furthermore applied various linkage disequilibrium (LD) thresholds for inclusion of SNPs in the GRS (no LD threshold, D’≥0.95, D’≥0.5, D’≥0.1, or max. one SNP per locus); among SNPs in LD, only the SNP most significantly associated with CRC in logistic regression models within our sample was included. This resulted in different numbers of included SNPs (97, 90, 80, 71 and 59, respectively). Additionally, both continuous GRS and categorized GRS were analyzed, with GRS categories defined by percentiles of the GRS distribution in controls: 0–10, 10–20, 20–40, 40–60, 60–80, 80–90, 90–100.
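The construction of the score can be sketched as follows. This is a simplified illustration only: the function names, the nearest-rank percentile rule and the toy inputs are assumptions, and LD-based selection of SNPs is not covered.

```python
import math

def genetic_risk_score(allele_counts, per_allele_or=None):
    """Sum of risk alleles (0, 1 or 2 per SNP), optionally weighted by log(OR)."""
    if per_allele_or is None:
        return float(sum(allele_counts))
    return sum(n * math.log(o) for n, o in zip(allele_counts, per_allele_or))

def percentile_cutoffs(control_scores, percentiles=(10, 20, 40, 60, 80, 90)):
    """Cut-offs taken from the GRS distribution in controls (nearest-rank rule)."""
    ordered = sorted(control_scores)
    n = len(ordered)
    # Integer ceiling of pct * n / 100 gives the rank; subtract 1 for the index.
    return [ordered[max(0, (pct * n + 99) // 100 - 1)] for pct in percentiles]

def grs_category(score, cutoffs):
    """Index 0..6 of the percentile band (0 = lowest band, 6 = highest band)."""
    return sum(score > c for c in cutoffs)

labels = ["Very low", "Low", "Low-medium", "Medium", "Medium-high", "High", "Very high"]

# Hypothetical participant with three SNPs and hypothetical per-allele ORs
counts, odds_ratios = [2, 1, 0], [1.10, 1.05, 1.20]
print(genetic_risk_score(counts), genetic_risk_score(counts, odds_ratios))

# Hypothetical control scores 1..100, used only to derive the cut-offs
cutoffs = percentile_cutoffs(list(range(1, 101)))
print(labels[grs_category(50, cutoffs)])
```

Deriving the cut-offs from controls only means that each band contains a fixed share of controls by construction, so any shift towards higher bands among cases is directly visible.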
Proportions of FH risk explained by common genetic variants were first estimated according to four traditional approaches, using the formulas given in the respective papers. Estimates were derived for four different ORs for FH and for five different inclusion criteria for the common genetic variants, using relative risk estimates and risk allele frequencies for the genetic variants either from the discovery studies or from the DACHS study.
Next, we employed the newly suggested epidemiological approach and estimated the proportion of FH risk explained by common genetic variants by $(OR_b - OR_a)/(OR_b - 1)$, where $OR_a$ and $OR_b$ are the ORs for FH with and without adjustment for common genetic variants, both estimated from multiple logistic regression models as outlined above. Different methods of handling the genetic information (GRS vs. separate variables) and, again, different LD thresholds were used for calculating the proportion of explained familial risk. Confidence intervals were computed using bootstrapping methods (n=1000).
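The bootstrap step can be sketched with the Python standard library alone. Note the substitution: instead of multiple logistic regression, this sketch uses a crude OR for FH as the unadjusted estimate and a Mantel-Haenszel OR stratified by GRS category as the adjusted estimate, with no adjustment for environmental factors. All names and counts are hypothetical and the output does not reproduce any result of the study.

```python
import random

def two_by_two(records):
    """Counts [a, b, c, d]: cases with/without FH, then controls with/without FH."""
    cells = [0, 0, 0, 0]
    for case, fh, _ in records:
        cells[(0 if case else 2) + (0 if fh else 1)] += 1
    return cells

def crude_or(records):
    """OR for FH, not adjusted for the GRS stratum."""
    a, b, c, d = two_by_two(records)
    return (a * d) / (b * c)

def mantel_haenszel_or(records):
    """OR for FH pooled over GRS strata (Mantel-Haenszel estimator)."""
    strata = {}
    for rec in records:
        strata.setdefault(rec[2], []).append(rec)
    num = den = 0.0
    for stratum_records in strata.values():
        a, b, c, d = two_by_two(stratum_records)
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

def proportion_explained(records):
    or_b = crude_or(records)            # without adjustment for the GRS
    or_a = mantel_haenszel_or(records)  # with adjustment for the GRS
    return (or_b - or_a) / (or_b - 1)

def bootstrap_ci(records, statistic, n_boot=1000, alpha=0.05, seed=1):
    """Percentile bootstrap CI: resample participants with replacement."""
    rng = random.Random(seed)
    n = len(records)
    estimates = sorted(statistic(rng.choices(records, k=n)) for _ in range(n_boot))
    lower = estimates[int(alpha / 2 * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

def expand(stratum, a, b, c, d):
    """Turn 2x2 counts into one (case, fh, stratum) record per participant."""
    return ([(1, 1, stratum)] * a + [(1, 0, stratum)] * b
            + [(0, 1, stratum)] * c + [(0, 0, stratum)] * d)

# Hypothetical counts for two GRS strata (not data from the DACHS study)
records = expand("low", 30, 170, 30, 370) + expand("high", 80, 320, 20, 180)
point = proportion_explained(records)
low, high = bootstrap_ci(records, proportion_explained)
print(f"{100 * point:.1f}% (95% CI {100 * low:.1f}% to {100 * high:.1f}%)")
```

Resampling whole participants and recomputing both ORs in every replicate keeps the correlation between the adjusted and unadjusted estimates, which a CI derived from the two separate OR intervals would ignore.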
Data availability
Data will be made available upon reasonable request.
Results
Table 1 shows some main characteristics of the study population used for this analysis. Sex and age were equally distributed among cases and controls, reflecting matching. FH in a first-degree relative was more common in cases than controls (13.7% vs. 10.0%, p-value <.0001), and a higher proportion of cases had a higher GRS compared to controls (p-value <.0001). Figure 1 depicts the distribution of the GRS among cases and controls with relation to FH. A major difference in the distribution of the GRS was seen between cases and controls, but not between participants with and without FH, neither in cases nor in controls.
Table 1.
| Characteristic | Group | Cases (n = 4447) | Controls (n = 3480) | p-value^a |
|---|---|---|---|---|
| Sex | Female | 1749 (39.3) | 1344 (38.6) | 0.5206 |
| | Male | 2698 (60.7) | 2136 (61.4) | |
| Age | <50 | 220 (5.0) | 146 (4.2) | 0.6458^b |
| | 50–59 | 681 (15.3) | 521 (15.0) | |
| | 60–69 | 1333 (30.0) | 1080 (31.0) | |
| | 70–79 | 1519 (34.2) | 1165 (33.5) | |
| | ≥80 | 694 (15.6) | 568 (16.3) | |
| Personal history of colonoscopy | | 1127 (25.4) | 2027 (58.3) | <.0001 |
| Family history | No FH | 3487 (78.4) | 2896 (83.2) | <.0001 |
| | SDR only^c | 352 (7.9) | 236 (6.8) | |
| | FDR^d | 608 (13.7) | 348 (10.0) | |
| Genetic risk score^e | Very low | 274 (6.2) | 348 (10.0) | <.0001 |
| | Low | 335 (7.5) | 348 (10.0) | |
| | Low-medium | 739 (16.6) | 696 (20.0) | |
| | Medium | 880 (19.8) | 696 (20.0) | |
| | Medium-high | 961 (21.6) | 696 (20.0) | |
| | High | 611 (13.7) | 348 (10.0) | |
| | Very high | 647 (14.6) | 348 (10.0) | |
| Cancer stage | I | 1035 (23.3) | - | - |
| | II | 1343 (30.2) | - | |
| | III | 1423 (32.0) | - | |
| | IV | 647 (14.5) | - | |
Note: values are expressed as n (%)
Abbreviations: FDR, first-degree relative; FH, family history; SDR, second-degree relative
a: Test for differences between cases and controls
b: P-value for t-test with continuous variable
c: FH in any SDR only
d: FH in at least one FDR
e: Classification of GRS: Very low, ≤10th percentile; Low, 10th-20th percentile; Low-medium, 20th-40th percentile; Medium, 40th-60th percentile; Medium-high, 60th-80th percentile; High, 80th-90th percentile; Very high, >90th percentile; GRS generated with weighted risk alleles (weights equaling the beta-coefficient as found in the respective discovery study) of SNPs not in LD (cut-off 0.95)
Traditional estimates of the proportion of FH-associated risk that can be statistically explained by common genetic variants are shown in Table 2. The estimated proportions ranged widely, depending on the combination of the employed criteria (LD, assumed OR for FH, estimation approach). Estimates of the proportion explained by the genetic variants were generally much lower when they were based on the relative risk estimates and risk allele frequencies from the DACHS study rather than the discovery studies, but substantial variation was observed even within both groups of estimates, ranging from 9.6 to 23.1% and from 14.2 to 42.5%, respectively.
Table 2.
Estimated proportion of explained family history risk [%], based on risk estimates for SNPs from the discovery studies or from the DACHS study^a. Columns with a D’ threshold exclude SNPs in LD at that threshold.

| Assumed risk increase by family history (odds ratio) | Approach | Discovery study: all SNPs^b (NSNPs=97) | Discovery study: D’≥0.95 (NSNPs=90) | Discovery study: D’≥0.5 (NSNPs=80) | Discovery study: D’≥0.1 (NSNPs=71) | Discovery study: max. 1 SNP per locus (NSNPs=59) | DACHS study^a: all SNPs^b (NSNPs=97) | DACHS study^a: D’≥0.95 (NSNPs=90) | DACHS study^a: D’≥0.5 (NSNPs=80) | DACHS study^a: D’≥0.1 (NSNPs=71) | DACHS study^a: max. 1 SNP per locus (NSNPs=59) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2.4 | Cox 2007 [9] | 25.1 | 21.2 | 18.4 | 15.7 | 14.2 | 15.0 | 13.9 | 11.9 | 11.2 | 10.5 |
| | Pharoah 2008 [7] | 29.9 | 25.5 | 22.0 | 18.8 | 17.0 | 15.8 | 14.5 | 12.4 | 11.7 | 10.9 |
| | Zheng 2013 [8] | 27.8 | 24.0 | 20.7 | 17.6 | 16.0 | 15.0 | 13.9 | 11.9 | 11.2 | 10.5 |
| | Jenkins 2016 [6] | 26.4 | 22.5 | 19.5 | 16.7 | 15.1 | 14.0 | 12.8 | 11.0 | 10.3 | 9.6 |
| 2.2 | Cox 2007 [9] | 27.8 | 23.5 | 20.3 | 17.5 | 15.7 | 16.7 | 15.4 | 13.2 | 12.5 | 11.7 |
| | Pharoah 2008 [7] | 31.2 | 26.6 | 23.0 | 19.7 | 17.8 | 16.5 | 15.1 | 13.0 | 12.2 | 11.4 |
| | Zheng 2013 [8] | 30.9 | 26.6 | 23.0 | 19.6 | 17.8 | 16.7 | 15.4 | 13.2 | 12.4 | 11.7 |
| | Jenkins 2016 [6] | 29.3 | 25.0 | 21.6 | 18.5 | 16.7 | 15.5 | 14.2 | 12.2 | 11.5 | 10.7 |
| 2.0 | Cox 2007 [9] | 31.6 | 26.7 | 23.2 | 19.9 | 17.9 | 19.0 | 17.5 | 15.1 | 14.2 | 13.3 |
| | Pharoah 2008 [7] | 32.7 | 27.9 | 24.1 | 20.6 | 18.7 | 17.3 | 15.9 | 13.6 | 12.8 | 12.0 |
| | Zheng 2013 [8] | 35.1 | 30.3 | 26.2 | 22.3 | 20.3 | 19.0 | 17.5 | 15.1 | 14.2 | 13.3 |
| | Jenkins 2016 [6] | 33.3 | 28.5 | 24.6 | 21.0 | 19.0 | 17.7 | 16.2 | 13.9 | 13.0 | 12.2 |
| 1.77^c | Cox 2007 [9] | 38.4 | 32.5 | 28.1 | 24.1 | 21.7 | 23.1 | 21.3 | 18.3 | 17.2 | 16.2 |
| | Pharoah 2008 [7] | 34.8 | 29.7 | 25.7 | 21.9 | 19.8 | 18.4 | 16.9 | 14.5 | 13.6 | 12.7 |
| | Zheng 2013 [8] | 42.7 | 36.8 | 31.7 | 27.0 | 24.6 | 23.1 | 21.2 | 18.3 | 17.2 | 16.2 |
| | Jenkins 2016 [6] | 40.5 | 34.6 | 29.9 | 25.5 | 23.1 | 21.5 | 19.7 | 16.9 | 15.8 | 14.8 |

a: Estimates based on odds ratios and allele frequencies as observed in the DACHS study
b: In the DACHS study, 97 out of 105 SNPs could be analyzed (see Supplementary Table 2)
c: FH estimate obtained in the DACHS study
In the DACHS study, the OR for FH derived from a model adjusted for the matching variables sex and age and a number of environmental factors was 1.77 (95% confidence interval (CI) 1.52 to 2.07), and this OR estimate was only slightly altered by additional adjustment for the GRS, regardless of how restrictive inclusion of SNPs was with respect to LD (Table 3). The proportion of FH associated risk explained by common genetic variants could be quantified as between 6.4% (95% CI: 1.5 to 12.3%) and 8.9% (95% CI: 4.0 to 14.8%) by the epidemiological approach. Very similar results were obtained in sensitivity analyses using the log(ORs) instead of the ORs (Supplementary Table 3).
Table 3.
| FH of CRC^1 | No FH | FH in a FDR | % of explained familial risk (95% CI)^2 |
|---|---|---|---|
| Cases, n (%) | 3487 (85.2) | 608 (14.8) | - |
| Controls, n (%) | 2896 (89.3) | 348 (10.7) | - |
| Model 1, OR (95% CI)^a | Ref. | 1.77 (1.52 to 2.07) | - |
| Model 2, OR (95% CI)^b | Ref. | 1.70 (1.46 to 1.99) | 8.94 (4.02–14.81) |
| Model 3, OR (95% CI)^c | Ref. | 1.70 (1.46 to 1.99) | 8.94 (3.16–15.45) |
| Model 4, OR (95% CI)^d | Ref. | 1.71 (1.46 to 2.00) | 7.71 (2.18–13.91) |
| Model 5, OR (95% CI)^e | Ref. | 1.71 (1.47 to 2.00) | 7.38 (2.52–13.33) |
| Model 6, OR (95% CI)^f | Ref. | 1.72 (1.47 to 2.01) | 6.42 (1.50–12.27) |
Abbreviations: CI, confidence interval; CRC, colorectal cancer; FDR, first-degree relative; FH, family history; OR, odds ratio; Ref., reference
1: Participants with a FH of CRC in a second-degree relative were excluded for this analysis (n=588)
2: 95% confidence intervals for explained familial risk were calculated using bootstrapping methods (n=1000)
a: Regression model 1 adjusted for sex, age, education, smoking, hormone replacement therapy among women, BMI and previous colonoscopy
b–f: Regression models 2–6 adjusted for the same variables as model 1 and additionally for the genetic risk score:
b: continuous, weighted, no LD threshold
c: continuous, weighted, LD threshold D’≥0.95
d: continuous, weighted, LD threshold D’≥0.5
e: continuous, weighted, LD threshold D’≥0.1
f: continuous, weighted, max. 1 SNP per locus
In the application of our alternative approach, we considered a number of design options, such as (1) adjustment for all common genetic variants as single variables or (2) adjustment for common genetic variants through one variable, i.e. a GRS. We calculated the FH-associated risk proportion explained by common genetic variants with both approaches, and furthermore with a large variety of GRS model options and they all consistently yielded estimates in a low range between 5 and 14% (Table 4, Supplementary Table 3).
Table 4.
Explained familial risk in % (95% CI), by linkage disequilibrium threshold for inclusion of SNPs

| SNPs included | GRS type | GRS weighting | No LD threshold (NSNPs=97) | D’≥0.95 (NSNPs=90) | D’≥0.5 (NSNPs=80) | D’≥0.1 (NSNPs=71) | Max. 1 SNP per locus (NSNPs=59) |
|---|---|---|---|---|---|---|---|
| as separate variables | | | 13.8 (2.2–23.2) | 14.3 (3.3–23.2) | 13.8 (3.3–22.2) | 13.2 (3.3–21.3) | 10.2 (1.3–18.3) |
| as a GRS | categorized^a | unweighted | 10.6 (4.8–16.9) | 10.9 (4.8–17.0) | 8.9 (3.4–15.2) | 7.3 (1.9–13.8) | 6.3 (0.8–12.5) |
| | | weighted^b | 8.6 (3.9–14.5) | 8.9 (3.6–14.9) | 8.1 (2.8–14.7) | 7.2 (1.3–12.2) | 5.4 (0.8–11.4) |
| | continuous | unweighted | 11.6 (6.1–18.7) | 11.0 (5.0–18.5) | 9.3 (3.6–16.2) | 8.3 (2.8–15.1) | 7.2 (1.6–14.1) |
| | | weighted^b | 8.9 (4.0–14.8) | 8.9 (3.2–15.5) | 7.7 (2.2–13.9) | 7.4 (2.5–13.3) | 6.4 (1.5–12.3) |
Please note: 95% confidence intervals of explained familial risk were calculated using bootstrapping methods (n=1000)
Abbreviations: GRS, genetic risk score; OR, odds ratio
a: Categories defined by percentiles of distribution in controls: 0–10, 10–20, 20–40, 40–60, 60–80, 80–90, 90–100
b: Weights equal to log(OR) from discovery study
Discussion
Our analyses support suggestions from theoretical considerations that traditional approaches commonly employed in GWAS might overestimate the proportions of familial risk that can be explained by common genetic variants, due to the underlying assumptions. Furthermore, estimates obtained by these approaches strongly varied even with relatively minor variations in the inclusion criteria of SNPs. By contrast, our proposed epidemiological approach yielded much lower, but consistent and robust estimates of the familial proportion explained by known common genetic variants.
Traditional approaches of calculating the proportion of familial risk explained by common genetic variants have been widely employed in the GWAS literature (e.g.3, 5, 15, 23–26). Manuscripts reporting on newly identified susceptibility loci commonly estimate the incremental and total proportion of familial risk explained by the respective SNPs. For example, estimates published so far of the familial risk for CRC allegedly explained by common genetic variants ranged from ~6% for ten SNPs3, to ~8% for 20 SNPs under observation15, to ~12% for 76 SNPs5, to ~22%6 for 45 SNPs in a simulation study. Most studies took some LD measure into account5, 6, while others simply added up contributions of SNPs without LD-pruning (e.g.15). However, as shown by our analyses, such estimates may be quite sensitive to the underlying assumptions. In our example, the estimates obtained by traditional approaches had a wide range. In particular, the following patterns were observed:
First, all estimates were substantially higher when the relative risk estimates and the risk allele frequencies of the discovery study rather than those of the DACHS study were employed, which might to some degree reflect the well-known phenomenon of “winner’s curse”27. The obtained relative risk estimates and the risk allele frequencies are subject to variation within populations, which might contribute to differing explained risk proportions.
Second, all of the traditional estimates were strongly dependent on the inclusion criteria of SNPs, being approximately 50% higher for the least restrictive approach (including all 97 SNPs) compared to the most restrictive approach (59 SNPs). Using a less restrictive cutoff might lead to overestimation of the explained familial risk by ignoring the redundancy of the risk information conveyed by SNPs in high linkage disequilibrium.
Third, the traditional estimates strongly depend on the assumed risk increase by family history, a direct consequence of the underlying equations, and may be biased if the risk associated with family history in the analyzed study differs from the assumed familial risk.
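A short worked calculation may make this dependence explicit. It assumes the log-ratio form of the traditional estimator, $\sum_i \log\lambda^{*}_i / \log\lambda_0$, in which the numerator does not involve the assumed familial risk $\lambda_0$; approaches using a different functional form need not scale in this way.

```latex
\frac{\mathrm{Prop}(\lambda_0 = 1.77)}{\mathrm{Prop}(\lambda_0 = 2.4)}
  = \frac{\log 2.4}{\log 1.77}
  \approx \frac{0.875}{0.571}
  \approx 1.53
```

This is in line with the Cox 2007 rows of Table 2 for all 97 SNPs with discovery-study estimates, where 25.1% at an assumed OR of 2.4 corresponds to 38.4% at an OR of 1.77 (ratio of about 1.53).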
Although our suggested alternative approach is less susceptible to violations of these assumptions, possible disadvantages of the “epidemiological approach” also have to be kept in mind. Foremost, it requires availability of epidemiological data, especially information on FH of CRC and possible confounding risk factors. For our approach the risk estimate of having a FH of CRC needs to be obtained from the same dataset from which the genetic data is analyzed. Many large genomic datasets might lack this kind of information, but could nevertheless conduct our proposed approach in subsets for which FH information is available. Furthermore, some of the considerations that are mentioned above might also apply for our approach, such as the issue of winner’s curse if the cohort under investigation was included in deriving the GRS without appropriate correction for overoptimism or the appropriate threshold of the linkage disequilibrium. However, estimates of the familial risk explained by known genetic variants were found to be much less affected by choice of the threshold for the linkage disequilibrium with the proposed epidemiological approach than with the traditional approaches (range for epidemiological approach 5.4–14.3% vs. range for traditional approaches 9.6–23.1%).
It has to be kept in mind that our analysis, like most previous analyses that estimated explained familial risk, was restricted to common variants. In principle, however, analogous approaches could also be applied in analyses additionally encompassing rare, highly penetrant germline mutations.
A specific strength of our study was the availability and use of comprehensive genetic and environmental data from the DACHS study, one of the largest case-control studies on CRC, with an unselected population-based multi-center recruitment.
In summary, we suggest an alternative, straightforward approach based on standard epidemiological methodology to estimate the proportion of FH-associated risk of cancer that is attributable to common genetic risk variants. Application of this approach in a large population-based case-control study on CRC supports suggestions that this proportion may be substantially smaller than previously assumed, and that estimates from traditional approaches are highly dependent on SNP pruning methods, the assumed risk for having a FH of CRC and the number of identified SNPs. On the other hand, the relevant prevalence of those variants in people both with and without FH implies that, for the example of CRC, polygenic risk scores based on the common genetic variants identified to date explain quite a substantial proportion of overall risk far beyond FH in the total population. Rather than reflecting a major subcomponent of familial risk, common genetic variants appear to reflect substantial complementary risk. Conversely, rather than exclusively or even primarily reflecting genetic factors, familial aggregation of risks may also reflect to a large extent other components, such as shared environmental or lifestyle factors28. While the alternative epidemiological approach was illustrated using CRC as an example, it should be equally applicable to other cancer outcomes, and, in fact, any other disease outcome.
Novelty and Impact:
Traditional approaches are prone to overestimate, for various reasons, the proportion of familial risk of cancer that can be explained by the common genetic variants identified to date.
We offer an alternative approach that is not affected by such overestimation and illustrate our point using data from a large case-control study on colorectal cancer.
The suggested easy-to-implement alternative approach may help to prevent major overestimation of the proportion of familial risk explained by common genetic variants.
Funding:
This work was supported by grants from the German Research Council (BR 1704/6-1, BR 1704/6-3, BR 1704/6-4, BR 1704/6-6, CH 117/1-1), the German Federal Ministry of Education and Research (01KH0404, 01ER0814, 01ER0815, 01ER1505A, 01ER1505B and 01GL1712), and the US National Institutes of Health (U01-CA185094).
Abbreviations:
- CRC: colorectal cancer
- CI: confidence interval
- FH: family history
- GRS: genetic risk score
- GWAS: genome-wide association study
- OR: odds ratio
- SNP: single nucleotide polymorphism
Footnotes
Disclosure: The authors have nothing to disclose.
References
1. Eeles RA, Olama AA, Benlloch S, Saunders EJ, Leongamornlert DA, Tymrakiewicz M, Ghoussaini M, Luccarini C, Dennis J, Jugurnauth-Little S, Dadaev T, Neal DE, et al. Identification of 23 new prostate cancer susceptibility loci using the iCOGS custom genotyping array. Nat Genet 2013;45: 385–91, 91e1–2.
2. Goh CL, Schumacher FR, Easton D, Muir K, Henderson B, Kote-Jarai Z, Eeles RA. Genetic variants associated with predisposition to prostate cancer and potential clinical implications. J Intern Med 2012;271: 353–65.
3. Houlston RS, Webb E, Broderick P, Pittman AM, Di Bernardo MC, Lubbe S, Chandler I, Vijayakrishnan J, Sullivan K, Penegar S, Carvajal-Carmona L, Howarth K, et al. Meta-analysis of genome-wide association data identifies four new susceptibility loci for colorectal cancer. Nat Genet 2008;40: 1426–35.
4. Michailidou K, Lindström S, Dennis J, Beesley J, Hui S, Kar S, Lemaçon A, Soucy P, Glubb D, Rostamianfar A, Bolla MK, Wang Q, et al. Association analysis identifies 65 new breast cancer risk loci. Nature 2017;551: 92.
5. Schmit SL, Edlund CK, Schumacher FR, Gong J, Harrison TA, Huyghe JR, Qu C, Melas M, Van Den Berg DJ, Wang H, Tring S, Plummer SJ, et al. Novel Common Genetic Susceptibility Loci for Colorectal Cancer. J Natl Cancer Inst 2018.
6. Jenkins MA, Makalic E, Dowty JG, Schmidt DF, Dite GS, MacInnis RJ, Ait Ouakrim D, Clendenning M, Flander LB, Stanesby OK, Hopper JL, Win AK, et al. Quantifying the utility of single nucleotide polymorphisms to guide colorectal cancer screening. Future Oncol 2016;12: 503–13.
7. Pharoah PD, Antoniou AC, Easton DF, Ponder BA. Polygenes, risk prediction, and targeted prevention of breast cancer. N Engl J Med 2008;358: 2796–803.
8. Zheng W, Zhang B, Cai Q, Sung H, Michailidou K, Shi J, Choi JY, Long J, Dennis J, Humphreys MK, Wang Q, Lu W, et al. Common genetic determinants of breast-cancer risk in East Asian women: a collaborative study of 23 637 breast cancer cases and 25 579 controls. Hum Mol Genet 2013;22: 2539–50.
9. Cox A, Dunning AM, Garcia-Closas M, Balasubramanian S, Reed MWR, Pooley KA, Scollen S, Baynes C, Ponder BAJ, Chanock S, Lissowska J, Brinton L, et al. A common coding variant in CASP8 is associated with breast cancer risk. Nat Genet 2007;39: 352–8.
10. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2018.
11. Lichtenstein P, Holm NV, Verkasalo PK, Iliadou A, Kaprio J, Koskenvuo M, Pukkala E, Skytthe A, Hemminki K. Environmental and heritable factors in the causation of cancer--analyses of cohorts of twins from Sweden, Denmark, and Finland. N Engl J Med 2000;343: 78–85.
12. Lowery JT, Ahnen DJ, Schroy PC 3rd, Hampel H, Baxter N, Boland CR, Burt RW, Butterly L, Doerr M, Doroshenk M, Feero WG, Henrikson N, et al. Understanding the contribution of family history to colorectal cancer risk and its clinical implications: A state-of-the-science review. Cancer 2016;122: 2633–45.
13. Dunlop MG, Dobbins SE, Farrington SM, Jones AM, Palles C, Whiffin N, Tenesa A, Spain S, Broderick P, Ooi L-Y, Domingo E, Smillie C, et al. Common variation near CDKN1A, POLD3 and SHROOM2 influences colorectal cancer risk. Nat Genet 2012;44: 770–6.
14. Tomlinson IPM, Webb E, Carvajal-Carmona L, Broderick P, Howarth K, Pittman AM, Spain S, Lubbe S, Walther A, Sullivan K, Jaeger E, Fielding S, et al. A genome-wide association study identifies colorectal cancer susceptibility loci on chromosomes 10p14 and 8q23.3. Nat Genet 2008;40: 623–30.
15. Al-Tassan NA, Whiffin N, Hosking FJ, Palles C, Farrington SM, Dobbins SE, Harris R, Gorman M, Tenesa A, Meyer BF, Wakil SM, Kinnersley B, et al. A new GWAS and meta-analysis with 1000Genomes imputation identifies novel risk variants for colorectal cancer. Sci Rep 2015;5: 10442.
16. Peters U, Jiao S, Schumacher FR, Hutter CM, Aragaki AK, Baron JA, Berndt SI, Bezieau S, Brenner H, Butterbach K, Caan BJ, Campbell PT, et al. Identification of Genetic Susceptibility Loci for Colorectal Tumors in a Genome-Wide Meta-analysis. Gastroenterology 2013;144: 799–807.e24.
17. Hsu L, Jeon J, Brenner H, Gruber SB, Schoen RE, Berndt SI, Chan AT, Chang-Claude J, Du M, Gong J, Harrison TA, Hayes RB, et al. A model to determine colorectal cancer risk using common genetic susceptibility loci. Gastroenterology 2015;148: 1330–9.e14.
18. Frampton MJ, Law P, Litchfield K, Morris EJ, Kerr D, Turnbull C, Tomlinson IP, Houlston RS. Implications of polygenic risk for personalised colorectal cancer screening. Ann Oncol 2015;27: 429–34.
19. Weigl K, Thomsen H, Balavarca Y, Hellwege JN, Shrubsole MJ, Brenner H. Genetic Risk Score Is Associated With Prevalence of Advanced Neoplasms in a Colorectal Cancer Screening Population. Gastroenterology 2018;155: 88–98.e10.
20. Weigl K, Chang-Claude J, Knebel P, Hsu L, Hoffmeister M, Brenner H. Strongly enhanced colorectal cancer risk stratification by combining family history and genetic risk score. Clin Epidemiol 2018;10: 9.
21. Brenner H, Chang-Claude J, Jansen L, Knebel P, Stock C, Hoffmeister M. Reduced Risk of Colorectal Cancer Up to 10 Years After Screening, Surveillance, or Diagnostic Colonoscopy. Gastroenterology 2014;146: 709–17.
22. Brenner H, Chang-Claude J, Seiler CM, Rickert A, Hoffmeister M. Protection from colorectal cancer after colonoscopy: a population-based, case-control study. Ann Intern Med 2011;154: 22–30.
23. Broderick P, Carvajal-Carmona L, Pittman AM, Webb E, Howarth K, Rowan A, Lubbe S, Spain S, Sullivan K, Fielding S, Jaeger E, Vijayakrishnan J, et al. A genome-wide association study shows that common alleles of SMAD7 influence colorectal cancer risk. Nat Genet 2007;39: 1315–7.
24. Jaeger E, Webb E, Howarth K, Carvajal-Carmona L, Rowan A, Broderick P, Walther A, Spain S, Pittman A, Kemp Z, Sullivan K, Heinimann K, et al. Common genetic variants at the CRAC1 (HMPS) locus on chromosome 15q13.3 influence colorectal cancer risk. Nat Genet 2008;40: 26–8.
25. Tomlinson I, Webb E, Carvajal-Carmona L, Broderick P, Kemp Z, Spain S, Penegar S, Chandler I, Gorman M, Wood W, Barclay E, Lubbe S, et al. A genome-wide association scan of tag SNPs identifies a susceptibility variant for colorectal cancer at 8q24.21. Nat Genet 2007;39: 984–8.
26. Tomlinson IP, Carvajal-Carmona LG, Dobbins SE, Tenesa A, Jones AM, Howarth K, Palles C, Broderick P, Jaeger EE, Farrington S, Lewis A, Prendergast JG, et al. Multiple common susceptibility variants near BMP pathway loci GREM1, BMP4, and BMP2 explain part of the missing heritability of colorectal cancer. PLoS Genet 2011;7: e1002105.
27. Zhong H, Prentice RL. Correcting “winner’s curse” in odds ratios from genomewide association findings for major complex human diseases. Genet Epidemiol 2010;34: 78–91.
28. Tian Y, Kharazmi E, Sundquist K, Sundquist J, Brenner H, Fallah M. Familial colorectal cancer risk in half siblings and siblings: nationwide cohort study. BMJ 2019;364: l803.