Abstract
Background
Before SNP-based risk can be incorporated into colorectal cancer (CRC) screening, the ability of SNPs to estimate CRC risk for persons with and without a family history of CRC, and the screening implications, need to be determined.
Methods
We estimated the association with CRC of a 45 SNP-based risk using 1181 cases and 999 controls, and its correlation with CRC risk predicted from detailed family history. We estimated the predicted change in the distribution across predefined risk categories, and implications for recommended screening commencement age, from adding SNP-based risk to family history.
Results
The inter-quintile risk ratio for colorectal cancer risk of the SNP-based risk was 3.28 (95% CI 2.54 – 4.22). SNP-based and family history-based risks were not correlated (r = 0.02). For persons with no first-degree relatives with CRC, screening could commence 4 years earlier for women (5 years for men) in the highest quintile of SNP-based risk. For persons with two first-degree relatives with CRC, screening could commence 16 years earlier for men and women in the highest quintile, and 7 years earlier for the lowest quintile.
Conclusions
This 45-SNP panel, in conjunction with family history, can identify people who could benefit from earlier screening. Risk reclassification by 45 SNPs could inform targeted screening for CRC prevention, particularly in clinical genetics settings when mutations in high-risk genes cannot be identified. Yet to be determined are cost-effectiveness, resource requirements, community, patient and clinician acceptance, and feasibility, with potential ethical, legal and insurance implications.
Keywords: Colorectal cancer, risk prediction, single nucleotide polymorphism, prediction model, family history
INTRODUCTION
Categorisation of people by their colorectal cancer (CRC) risk can guide risk-based prevention, including screening. Family history of the disease is a well-established risk factor for CRC; accordingly screening guidelines recommend that screening be greater for those with a family history compared to those without a family history of the disease.(1) This increased screening could be by modality (e.g. colonoscopy vs faecal occult blood test [FOBT]), age at which screening commences (younger vs older) or frequency (e.g. two yearly vs every 10 years), or a combination of these in equipoise, based on the cost-effective use of limited resources and safety. For persons with a strong family history, efforts to identify the heritable basis of the disease can involve germline mutation screening and testing such as the DNA mismatch repair genes responsible for Lynch syndrome. In this clinical setting, current practice is finding that for a substantial proportion of such families no cancer predisposing mutation can be identified, leaving no other option but to offer screening based on the average risk of CRC based on the cancer family history alone. Family history is a blunt measure of increased risk, as even within groups of persons with the same family history there is substantial variation in the risk of CRC.(2) This suggests the existence of other genetic risk factors which, if identified, could be used to further stratify risk allowing for a more appropriate screening, compared with that recommended based on family history alone.
Recently, genome-wide association studies (GWASs) have found multiple single nucleotide polymorphisms (SNPs) associated with the risk of CRC. Although each SNP is associated with only a small increment in risk, combining these SNPs has the potential to improve risk estimation. The CRC risk based on the first 10 SNPs discovered was sufficient to identify the 0.4% of the population whose risk exceeded the threshold (5% over a 10-year period) for which colonoscopy screening, rather than the less invasive FOBT, would be recommended.(3) Since that report, more independent SNPs associated with CRC risk have been identified, and risk gradients associated with 95 independent SNPs have been published.(4)
In 2015, we published a theoretical evaluation of the 45 independent SNPs identified from a systematic literature review that had been internally or externally validated to be associated with CRC risk for persons of European ancestry.(5) We predicted that the 20% of the population with the highest number of risk alleles would be at 1.8-times the risk of persons with average number of risk alleles. Consequently, this group would be predicted to attain average population age-specific risk approximately 9 years earlier than would someone with the average number of risk alleles. We also predicted that the 45 CRC-risk associated SNPs identified in the literature could explain about 22% of the familial component of CRC risk.(5)
To evaluate this theoretical SNP-based risk, and to determine its clinical utility, we have conducted a validation study using a population-based sample of CRC cases and controls, and assessed its ability to improve risk classification and change recommended CRC screening of people compared with classification based on family history alone.
MATERIALS AND METHODS
Study sample
We used CRC cases and controls from the Colon Cancer Family Registry, which has been described in detail previously.(6) Cases were persons with invasive cancer of the colon or rectum identified from population-based cancer registries in the Puget Sound region of the state of Washington in the USA, Ontario in Canada, and Victoria in Australia. Controls were persons who had not had a diagnosis of CRC randomly selected from the general population by using Medicare and Driver’s License files (Washington, USA), telephone subscriber lists (Ontario, Canada), or electoral rolls (Victoria, Australia).
For the estimation of a SNP-based risk, we used 1,181 cases and 999 controls who underwent genome-wide testing from a GWAS using the Illumina Human1M v1 or Illumina Human1M-Duo v3.0 platform. Given that the original purpose of the GWAS was to identify new CRC susceptibility genes, cases were preferentially selected to be aged younger than 50 years at diagnosis with a 10% sampling from all other ages at diagnoses, and controls were preferentially selected to not have a family history of CRC. Cases were tested for germline mutations in the DNA mismatch repair genes and MUTYH, and all mutation carriers were excluded. Informed consent was obtained from all study participants, and the study protocol was approved by the institutional research ethics review board at each study site.
Estimation of SNP-based risk
For each case and control, we estimated an individual SNP-based risk based on the 45 SNPs that we previously selected(5,7) as being associated with CRC risk from a search of the literature. Using the approach of Mealiffe et al.,(8) we estimated for each of the 45 SNPs the odds ratio (OR) per risk allele and risk allele frequency (p) assuming independent and additive risks on the log OR scale. For each SNP, we calculated the population average risk as $\mu = (1 - p)^2 + 2p(1 - p)\,\mathrm{OR} + p^2\,\mathrm{OR}^2$. Weighted risk values (so that the population average risk was equal to 1) were calculated as $1/\mu$, $\mathrm{OR}/\mu$ and $\mathrm{OR}^2/\mu$ for the three genotypes defined by number of risk alleles (0, 1, or 2). The overall SNP-based risk for each individual was then calculated by multiplying the weighted risk values for each of the 45 SNPs (Supplementary Table 1). We assumed that the age-specific incidence of colorectal cancer for each quintile of SNP-based risk was a multiple of the average population age-specific incidence (USA population), where the multiple was approximately equal to the hazard ratio for each quintile of SNP-based risk.
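The weighting and multiplication steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis code used in the study: the function names and the three-SNP panel are hypothetical, and the odds ratios and allele frequencies shown are made-up values, not those in Supplementary Table 1.

```python
from math import prod


def genotype_weights(odds_ratio, risk_allele_freq):
    """Weighted risk values for 0, 1 and 2 risk alleles at one SNP.

    The values are scaled by mu so that the population average risk is 1
    under Hardy-Weinberg genotype frequencies.
    """
    p = risk_allele_freq
    mu = (1 - p) ** 2 + 2 * p * (1 - p) * odds_ratio + p ** 2 * odds_ratio ** 2
    return (1 / mu, odds_ratio / mu, odds_ratio ** 2 / mu)


def snp_based_risk(genotypes, snps):
    """Multiply the weighted risk values across SNPs.

    genotypes: risk-allele counts (0, 1 or 2), one per SNP.
    snps: (odds ratio per risk allele, risk allele frequency) pairs.
    """
    if len(genotypes) != len(snps):
        raise ValueError("one genotype is required per SNP")
    return prod(genotype_weights(or_, p)[g] for g, (or_, p) in zip(genotypes, snps))


# Hypothetical three-SNP panel (illustrative values only).
example_snps = [(1.10, 0.30), (1.05, 0.50), (1.20, 0.10)]
print(snp_based_risk([0, 1, 2], example_snps))
```

Because each SNP's weights average to 1 in the population, a person's overall value can be read as a risk relative to the population average.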
Association of SNP-based risk with CRC
The SNP-based risk was log transformed for all analyses. We estimated the association between SNP-based risk and CRC by applying multiple logistic regression to the case and control data. We adjusted for age group, sex and recruitment site.
We assessed the risk gradient, and hence the discrimination in risk between cases and controls, by estimating the change in odds per adjusted standard deviation (OPERA). OPERA interprets risk estimates by adjusting the standard deviation for the other factors taken into account by design and analysis.(9) We also estimated risk discrimination by the inter-quintile risk ratio (average CRC risk for those in the top 20% of the population for the SNP score divided by average risk for those in the bottom 20% of the population for the SNP score). The inter-quintile risk ratio was derived by raising the OPERA estimate to the power 2.8. Given the deliberate deficit of controls with a family history, the estimated SNP-based association was adjusted down by 4% based on the theoretical gradient of polygenic risk in the population (see details in Appendix).(10) For prediction in terms of age- and sex-specific population incidences, we used data from the Surveillance, Epidemiology, and End Results Program Cancer Statistics.(11)
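The factor of 2.8 can be motivated as the distance, in standard deviations, between the mean of the top and the mean of the bottom quintile of a normally distributed score. The following sketch illustrates that reasoning; it assumes a normally distributed score and a log-linear association, and the function names are hypothetical.

```python
from math import exp, pi, sqrt

# z-value that cuts off the top 20% of a standard normal distribution.
Z_80 = 0.8416212335729143


def mean_z_top_quintile():
    """Mean of a standard normal variable truncated to its top 20%."""
    density_at_cut = exp(-Z_80 ** 2 / 2) / sqrt(2 * pi)
    return density_at_cut / 0.2


def inter_quintile_risk_ratio(opera):
    """Raise the odds ratio per adjusted SD to the top-to-bottom quintile gap."""
    gap_in_sd = 2 * mean_z_top_quintile()  # approximately 2.8
    return opera ** gap_in_sd


print(round(2 * mean_z_top_quintile(), 2))
print(inter_quintile_risk_ratio(1.53))
```

With an OPERA of 1.53 this gives a value close to the reported inter-quintile risk ratio of 3.28.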
We assessed the extent to which the SNPs were dependent on familial risk by estimating the Pearson’s correlation (r) between the family history-based and SNP-based risks using the cases in the GWAS dataset. For the estimation of family history-based risk, we calculated the lifetime absolute risk (probability) of CRC predicted by a mixed major gene–polygenic model.(12) The model estimates each person’s risk of CRC using detailed family history data. It considers, for each relative, the age at diagnosis of CRC as well as their relationship to the proband, age at last living or age at death, and their high-risk gene mutation status, if known.
We also assessed the ability to reclassify the recommended age at commencement of screening by including the SNP-based risk. The 5-year risk of CRC for the average person (without a previous diagnosis of CRC) in the USA is approximately 0.3% at age 50 years,(13) which is the age that guidelines recommend screening to commence in many countries including the USA.(14) Using the age-specific incidences for each quintile of SNP-based risk, we estimated the ages at which the average woman and man in the highest and lowest quintile of SNP-based risk met this 0.3% risk threshold.
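As an illustration of this threshold calculation, the sketch below finds the youngest age at which the 5-year risk reaches a given threshold when incidence is a fixed multiple of the population incidence. The incidence curve `toy_incidence` is a hypothetical exponential function chosen only so that the 5-year risk at age 50 is roughly 0.3%; it is not the SEER data used in the study, and the function names are assumptions.

```python
from math import exp


def toy_incidence(age):
    """HYPOTHETICAL annual CRC incidence per person (not SEER data)."""
    return 5e-4 * exp(0.09 * (age - 50))


def five_year_risk(age, hazard_ratio, incidence=toy_incidence):
    """Risk of CRC over the 5 years from `age`, using yearly hazard steps."""
    cumulative_hazard = sum(hazard_ratio * incidence(age + k) for k in range(5))
    return 1 - exp(-cumulative_hazard)


def age_at_threshold(hazard_ratio, threshold, incidence=toy_incidence,
                     ages=range(20, 91)):
    """Youngest age at which the 5-year risk reaches the threshold, or None."""
    for age in ages:
        if five_year_risk(age, hazard_ratio, incidence) >= threshold:
            return age
    return None


# Threshold defined as the 5-year risk of an average-risk person at age 50.
threshold = five_year_risk(50, 1.0)
for hr in (0.5, 1.0, 1.8):
    print(hr, age_at_threshold(hr, threshold))
```

A hazard ratio above 1 moves the threshold age below 50, and a hazard ratio below 1 moves it above 50; the size of the shift depends on how steeply incidence rises with age.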
Stata version 14.2(15) was used for all statistical analyses unless otherwise specified. All statistical tests were two-sided, and P-values less than 0.05 were considered nominally statistically significant. Detailed statistical methods for the calculation of risk distributions are provided in the Appendix.
RESULTS
Cases and controls were balanced for sex, cases were distributed almost evenly across participating regions, controls were predominantly from Canada, and cases had on average 1.8 more risk alleles than the controls (Table 1). The OR per adjusted standard deviation for SNP-based risk was 1.53 (95% CI 1.39 – 1.67, p <0.001). The corresponding inter-quintile risk ratio was 3.28 (2.54 – 4.22). The correlation between the SNP-based risk and the family history-based risk was r = 0.02.
Table 1.
Characteristics of cases and controls used for estimation of the SNP-based risk of colorectal cancer
Characteristic | Cases (n=1,181): N | Cases: (%) | Controls (n=999): N | Controls: (%) |
---|---|---|---|---|
Sex | | | | |
Women | 567 | (48.0) | 521 | (52.1) |
Men | 614 | (52.0) | 478 | (47.9) |
Country | | | | |
Canada | 417 | (35.3) | 501 | (50.2) |
Australia | 320 | (27.1) | 189 | (18.9) |
USA | 444 | (37.6) | 309 | (30.9) |
Age (years) | | | | |
<40 | 111 | (9.4) | 48 | (4.8) |
40–49 | 450 | (38.1) | 113 | (11.3) |
50–59 | 263 | (22.3) | 304 | (30.4) |
60–69 | 232 | (19.6) | 305 | (30.5) |
≥70 | 125 | (10.6) | 229 | (22.9) |
Mean (SD), years | 53.0 | (11.4) | 59.9 | (11.0) |
Risk alleles* | | | | |
Mean (SD) | 42.7 | (4.3) | 40.9 | (4.4) |
Median (range) | 42 | (30–57) | 41 | (25–54) |
*Possible range is 0–90 (two alleles for each of the 45 SNPs).
Figure 1 shows the distribution of lifetime risk of CRC to age 80 years for the US population by SNP-based risk and family history categories. Persons with no first-degree relative with CRC constitute 90% of the population and have an average lifetime CRC risk to age 80 years of 4.0%, which is 10% lower than the population average risk of 4.4%. Of persons with no first-degree relative with CRC, those in the highest quintile for SNP-based risk have an average risk of 7.0% (~75% higher than those with the average SNP-based risk). Of persons with no first-degree relative with CRC and in the highest quintile for SNP-based risk, approximately 1 in 4 (27%) have a ‘low’ CRC risk (lifetime risk less than 2%) and approximately 1 in 40 (2.6%) have a ‘very high’ CRC risk (lifetime risk of 30% or greater). Those with no first-degree relative with CRC and in the lowest quintile for SNP-based risk have an average risk of 2%. Of these, two-thirds (67%) have a ‘low’ CRC risk and only 1 in 700 (0.14%) have a ‘very high’ risk (Table 2 and Figure 1).
Figure 1.
The distribution of lifetime risk of colorectal cancer (CRC), i.e., cumulative risk to age 80 years for the US population, by three categories of SNP-based risk and three categories of family history. Risks are shown for those in the lowest quintile of SNP-based risk (left column), those at average risk (centre column), and those in the highest quintile of SNP-based risk (right column), for those with no first-degree relative with CRC (top row), those with one first-degree relative with CRC (middle row), and those with two or more first-degree relatives with CRC (bottom row).
Table 2.
The distribution of lifetime risk of colorectal cancer (CRC) i.e., cumulative risk to age 80 years for the US population, by categories of family history of CRC, separately for persons in the lowest quintile, average, and highest quintile of SNP-based risk.
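Lifetime CRC risk | No first-degree relative with CRC: lowest quintile | No first-degree relative with CRC: average | No first-degree relative with CRC: highest quintile | One first-degree relative with CRC: lowest quintile | One first-degree relative with CRC: average | One first-degree relative with CRC: highest quintile | Two first-degree relatives with CRC: lowest quintile | Two first-degree relatives with CRC: average | Two first-degree relatives with CRC: highest quintile |
---|---|---|---|---|---|---|---|---|---|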
Average risk | 2% | 4% | 7% | 5% | 8% | 14% | 9% | 15% | 23% |
Low (<2%) | 67% | 46% | 27% | 39% | 21% | 9% | 19% | 8% | 3% |
Intermediate (2–12%) | 31% | 48% | 58% | 52% | 59% | 53% | 59% | 51% | 36% |
High (12–30%) | 2% | 5% | 13% | 7% | 16% | 27% | 18% | 28% | 35% |
Very high (>30%) | 0% | 1% | 3% | 1% | 4% | 11% | 5% | 13% | 27% |
At the other end of the scale of family history, 2% of the US population have two or more first-degree relatives with CRC, with an average lifetime CRC risk of 4 times that for the general population. For persons with two first-degree relatives with CRC, the average risk is 15%. But for those in the highest quintile for SNP-based risk, the average risk is 23%. Of these, only 1 in 40 (2.6%) have a ‘low’ CRC risk while approximately 1 in 4 (27%) have a ‘very high’ CRC risk. In contrast, even though they have a strong family history, those in the lowest quintile of SNP-based risk have an average risk of 9%, with only 1 in 5 (19%) having a ‘low’ risk and 1 in 20 (5%) having a ‘very high’ risk (Table 2 and Figure 1).
The addition of a SNP-based risk to the family history-based risk identified persons younger than age 50 who had a 5-year CRC risk at least as high as that of an average 50-year-old, which is approximately 0.3% (Figure 2). For women with no first-degree relatives with CRC and in the highest quintile for SNP-based risk, CRC risk is at least this high by the age of 46 (i.e. 4 years younger), with a corresponding age for men of 45 years (5 years younger). For those with one first-degree relative with CRC but in the lowest quintile of SNP-based risk, this risk threshold is reached at a similar age to the general population, at 51 years for women and 48 years for men. For those with two first-degree relatives with CRC and in the highest quintile for SNP-based risk, the 0.3% CRC risk is reached at age 34 years for women and men (i.e. 16 years younger), and for those in the lowest quintile, the threshold is reached at ages 44 and 43 years for women and men, respectively (Figure 2 and Table 3).
Figure 2.
Cumulative risk of colorectal cancer (CRC) for the US population by three categories of SNP-based risk (lowest quintile, average, highest quintile) and two categories of family history (no first-degree relatives with CRC labelled “No family history” and two first-degree relatives with CRC).
Table 3.
Age (years) at which a person’s 5-year risk of colorectal cancer (CRC) reaches or exceeds the threshold of 0.3%, by quintile status of the 45 SNP-based risk and categories of family history of CRC, separately for women and men.
Family history | SNP-based risk | Women | Men |
---|---|---|---|
All | Highest quintile | 46 | 44 |
All | Lowest quintile | 63 | 58 |
No first-degree relatives with CRC | Highest quintile | 46 | 45 |
No first-degree relatives with CRC | Lowest quintile | 65 | 60 |
One first-degree relative with CRC | Highest quintile | 40 | 39 |
One first-degree relative with CRC | Lowest quintile | 51 | 48 |
Two first-degree relatives with CRC | Highest quintile | 34 | 34 |
Two first-degree relatives with CRC | Lowest quintile | 44 | 43 |
DISCUSSION
CRC risk estimation is important because screening can result in prevention, or at least early intervention, through early detection and removal of precancerous lesions and CRC, with more effective treatment of CRC at an earlier stage. Screening comprises different modalities, including FOBT and colonoscopy, and use should be tailored to the level of personal risk. FOBT is inexpensive and safe, but may be less sensitive for pre-cancerous polyps; while colonoscopy is expensive, invasive and carries its own risks (bleeding, perforation and infrequently even death). We have shown that risk information from 45 independent SNPs adds to the risk information from using family history alone. Importantly, on an individual level, SNP-based risk estimation can lead to a risk category that is higher or lower than when estimated solely through family history.
Based on a population of the same size and sex- and age-distribution, and the same sex- and age-specific CRC incidence as the United States, we have estimated the number of people who would screen earlier and later if their SNP-based risk was used to determine screening starting age (as provided in Table 3). If screening starting age was based on SNP-based risk, we estimate that 1.68 million women could begin screening at age 46 (4 years younger than the general population) and 2.48 million men could begin screening at age 44 (6 years younger than the general population) as they are in the highest quintile of SNP-based risk. Approximately 2,700 of these people would be diagnosed with CRC, i.e., cancers that would not have been screened for if SNP-based risk was not used to guide screening start age. This equates to screening approximately 1,500 people for every CRC. However, delaying screening for the lowest quintile for SNP-based risk would miss a screening opportunity when many CRCs would be occurring.
For the vast majority of the population who have no first-degree family history of CRC, many guidelines recommend that screening should commence at age 50 years (commonly by biennial FOBT or 10-yearly colonoscopy). Our analyses suggest that the SNP-based risk, especially when combined with family history, can identify subsets that would be recommended to have a higher level of screening, perhaps starting at a younger age. Of the 90% of the population with no parent, sibling or child with CRC, use of SNPs could identify the 20% with the highest SNP-based risk (i.e. highest quintile), who have an average CRC risk of 7%, which is 75% higher than the average risk, and for whom screening would be recommended to commence 4 to 5 years earlier.
For persons with a family history of CRC, many guidelines recommend an increased level of screening (e.g. five-yearly colonoscopy) and screening to begin earlier (e.g. from age 40 for those with two first-degree relatives with colorectal cancer).(16) These people are often referred to familial cancer clinics for genetic screening for mutations in major CRC susceptibility genes including the DNA mismatch repair genes. If identified, high-risk gene carriers can be offered a higher level of screening (e.g., annual colonoscopy). Unfortunately, due in part to the rarity of mutation carriers in these genes (even for persons with a family history), in current practice a mutation cannot be identified in the majority of patients screened because they have a family history. Our analyses show that, within family history categories, SNP-based risk assessment can identify persons who belong in lower or higher risk categories. For persons with a strong family history, such as two first-degree relatives with CRC, the ability of the SNP-based risk assessment to reclassify risk is even more apparent. Those in the top quintile for SNP-based risk have an average lifetime CRC risk of 23%, and would be recommended to commence screening 16 years earlier than the general population. This starting screening age of 34 applies to people who are in the highest quintile of SNP-based risk and who have two first-degree relatives with colorectal cancer. The combination of these two risk factors (SNPs and strong family history), when multiplied by the average population colorectal cancer risk, means that the 5-year colorectal cancer risk reaches 0.3% by age 34. This threshold of risk is equivalent to the average population risk at age 50, when screening is recommended for the general population, thus justifying our suggestion for screening to begin 16 years earlier (age 34) for those who have both risk factors, which is 6 years earlier than the recommended age of 40 based on having two first-degree relatives with colorectal cancer.(16) With the same strong family history, the 20% with the lowest SNP-based risk had an average CRC risk of 9%, and they would be recommended to commence screening 6–7 years earlier. Just over one quarter of them will be at high or very high CRC risk.
We found that the CRC risk based on 45 SNPs was not appreciably correlated with the family history-based risk, which means the increased risks due to family history and SNPs are virtually independent and their associations are likely multiplicative (as has been found for breast cancer(17)). Therefore, both are important risk factors to consider in order to estimate CRC risk. This also means that the 45 SNPs explain little of the reason why CRC aggregates in families, or why CRC in a relative is associated with an increased risk of CRC.
A potential limitation of our study is that we used a case-control dataset in which the controls were selected for not having a family history. We therefore had to reduce the observed SNP associations by 4% to account for the controls being over-sampled for not having a family history.
There will be many more yet to be discovered independent SNPs associated with the risk of CRC, and there could be interactions between SNPs within and across different genes.(18,19) A SNP-based risk prediction model is likely to perform better when these SNPs are discovered, for example by using larger sample sizes or by fine mapping genomic regions of interest such as those identified by novel approaches such as DEPTH,(20) and included in risk prediction models. Analytic approaches that extract more information from genotyping data by, for example, using machine learning to consider all SNPs that lead to an improvement in risk prediction,(21) especially by focusing on pathways or SNP–SNP interactions, might also produce better SNP-based risk prediction.
To explain the, on average, 2-fold increased CRC risk associated with having one first-degree relative with CRC, mathematical models predict that the familial component of CRC risk must have a very large variance, so large that the risk for persons in the upper quartile would be at least 20 times the risk for persons in the lower quartile.(10) This study has shown that, by using risk information based on both SNPs and detailed family history, a non-trivial proportion of this variance can be explained, and we have quantified the ability to differentiate between persons at low risk (much less than population average risk) and those at increased risk, across a very wide range. Given that CRC can be effectively prevented by screening, and mortality from the disease reduced by early detection, risk assessment based on SNPs together with other risk factors including family history enables the possibility of precision prevention and screening to substantially lower the impact of CRC.(2)
Our modelling exercise only considers family history and SNP-based risk as we have focussed this paper on inherited risks. However, other factors do contribute to CRC risk (e.g. lifestyle factors) and a full risk-based assessment for screening could include these factors in addition to family history and SNPs which would result in a greater risk discrimination.(22)
If new guidelines on screening were to adopt SNP-based risk assessment, our study suggests that the screening guidelines for CRC would be substantially altered: (a) For those with two first-degree relatives with CRC, screening would commence at age 34 years for both women and men in the highest quintile for SNP-based risk, and at age 43–44 years for men and women in the lowest quintile; (b) For those with one first-degree relative with CRC, screening would commence at age 39–40 years for men and women in the highest quintile of SNP-based risk, and at age 48–51 years for men and women in the lowest quintile; and (c) For those with no first-degree relatives with CRC, screening would commence at age 45–46 years for men and women in the highest quintile, and at age 60–65 years for men and women in the lowest quintile (Table 3).
While this is an important first step, we acknowledge that many issues beyond the scope of this study would need to be resolved before SNP-based risk is incorporated into standard of care for CRC screening. These include assessment of cost-effectiveness, resource requirements, community, patient and clinician acceptance and feasibility of incorporation within existing screening programs, with potential ethical, legal and insurance implications. Cost-effectiveness implications of this research are important for screening programs, especially as a new personalised risk-based approach to screening is designed to optimise risk vs benefit compared with conventional approaches to (moderate/high) risk and colonoscopy. Any improvement as promised with the current approach is an important contribution to public health and risk management.
In conclusion, we have shown that risk information from considering the 45 SNPs and a detailed family history together can result in substantial reclassification of risk category for various levels of family history, including those without a family history. It is therefore important to include both family history and SNP assessment when estimating CRC risk. This new risk measure could inform targeted screening and prevention, for the general population younger than age 50, and in the clinical setting for those in whom a high-risk gene mutation cannot be identified.
Supplementary Material
Supplementary Table 1. For each of the 45 SNPs, the association with colorectal cancer per risk allele (odds ratio) adjusted for age group, and risk allele frequencies for controls.
Acknowledgements
The authors thank all study participants of the Colon Cancer Family Registry and staff for their contributions to this project.
Funding: This work was supported by grant UM1 CA167551 from the National Cancer Institute and through cooperative agreements with the following Colon Cancer Family Registry sites: Australasian Colorectal Cancer Family Registry (U01 CA074778 and U01/U24 CA097735); Ontario Familial Colorectal Cancer Registry (U01/U24 CA074783); and Seattle Colorectal Cancer Family Registry (U01/U24 CA074794). The genome wide association studies (GWAS) were supported by grants U01 CA 122839, R01 CA143237 and U19 CA148107.
Additional support for case ascertainment was provided from the Surveillance, Epidemiology and End Results (SEER) Program of the National Cancer Institute to Fred Hutchinson Cancer Research Center (Control Nos. N01-CN-67009 and N01-PC-35142, and Contract No. HHSN2612013000121), the following U.S. state cancer registries: AZ, CO, MN, NC, NH, and by the Victorian Cancer Registry, Australia and the Ontario Cancer Registry, Canada.
This work was also supported by grant R01CA170122 from NIH, and Centre for Research Excellence grant APP1042021 and Program Grant APP1074383 from the National Health and Medical Research Council (NHMRC), Australia. MAJ is a NHMRC Senior Research Fellow. AKW is a NHMRC Career Development Fellow. DDB is a NHMRC R.D. Wright Career Development Fellow and a University of Melbourne Research at Melbourne Accelerator Program (R@MAP) Senior Research Fellow. JLH is a NHMRC Senior Principal Research Fellow.
Competing interests: Genetype Pty Ltd have provided research support to MAJ and JLH and provided salary support for GSD. The other authors have no conflict of interest to declare with respect to this manuscript.
Appendix
Risk distributions
The distribution of the risk under a genetic mixed model, which incorporates major gene and an unmeasured polygene components,(23,24) was calculated as follows.
The polygene component modelled the combined effect on colorectal cancer (CRC) susceptibility of a large number of independent genetic loci that individually have small, multiplicative effects on the incidence of CRC. Therefore the polygene was modelled as a normally distributed random variable $X \sim N(\mu, \sigma^2)$ with mean $\mu$ and standard deviation $\sigma$, and the incidence of CRC at age t years for a person with polygene X = x was modelled as $e^{x}\,\mathrm{HR}_{\mathrm{major}}\,\lambda_0(t)$, where $\mathrm{HR}_{\mathrm{major}}$ is the hazard ratio corresponding to the person’s genotype at the major gene (which is assumed to be independent of X) and $\lambda_0(t)$ is the population incidence at age t years. Note that we therefore must take $\mu = -\sigma^2/2$ to ensure that the population incidence $\lambda_0(t)$ is the average incidence for persons with $\mathrm{HR}_{\mathrm{major}} = 1$.
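The constraint on the mean follows from the expectation of a log-normal variable. As a short worked step, using only the definitions above: the average incidence multiplier for persons with a major gene hazard ratio of 1 is the expectation of $e^{X}$, which must equal 1.

```latex
\mathbb{E}\left[e^{X}\right]
  = \exp\!\left(\mu + \tfrac{\sigma^{2}}{2}\right) = 1
\quad\Longleftrightarrow\quad
\mu = -\tfrac{\sigma^{2}}{2}.
```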
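The age-specific CRC incidence is the hazard function of the random variable T giving the age at CRC diagnosis of a person who is randomly selected from the population. Therefore the cumulative risk to age 80 years for someone with polygene X = x and a major gene hazard ratio of $\mathrm{HR}_{\mathrm{major}}$ is

$$\mathrm{Pen}(x) = 1 - S_0(80)^{\,e^{x}\,\mathrm{HR}_{\mathrm{major}}},$$

where $S_0(80) = \exp\left(-\int_0^{80} \lambda_0(t)\,dt\right)$ is the survival probability to age 80 years for someone at population risk. Inverting this equation therefore gives the cumulative distribution function of Pen(X) since, for any c between 0 and 1, we have

$$P\big(\mathrm{Pen}(X) \le c\big) = \Phi\!\left(\ln\!\left[\frac{\ln(1 - c)}{\mathrm{HR}_{\mathrm{major}}\,\ln S_0(80)}\right]\right),$$

where $\Phi$ is the cumulative distribution function for $X \sim N(-\sigma^2/2, \sigma^2)$.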
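To make the calculation concrete, the sketch below evaluates the proportion of people in each lifetime risk category under this model. It assumes the cumulative risk takes the proportional hazards form 1 − S0(80)^(exp(x)·HRmajor), which follows from the incidence model described above. The value used for the population survival probability S0(80) is an assumption (one minus the 4.4% population lifetime risk quoted in the Results), and the function names are hypothetical, so the output is illustrative rather than a reproduction of Table 2.

```python
from math import erf, log, sqrt


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of N(mean, sd^2)."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))


def risk_cdf(c, sigma, s0_80, hr_major=1.0):
    """P(lifetime risk <= c) when the polygene X ~ N(-sigma^2/2, sigma^2)."""
    if not 0 < c < 1:
        raise ValueError("c must lie strictly between 0 and 1")
    # Polygene value at which the cumulative risk equals c.
    x_cut = log(log(1 - c) / (hr_major * log(s0_80)))
    return normal_cdf(x_cut, -sigma ** 2 / 2, sigma)


def category_proportions(sigma, s0_80, hr_major=1.0, cuts=(0.02, 0.12, 0.30)):
    """Proportions with low, intermediate, high and very high lifetime risk."""
    low, mid, high = (risk_cdf(c, sigma, s0_80, hr_major) for c in cuts)
    return {
        "low": low,
        "intermediate": mid - low,
        "high": high - mid,
        "very high": 1 - high,
    }


# ASSUMPTION: S0(80) approximated as 1 - 0.044 for illustration only.
proportions = category_proportions(sigma=1.124, s0_80=1 - 0.044)
for name, value in proportions.items():
    print(f"{name}: {value:.1%}")
```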
In this paper, we took the population incidence λ0 (t) to be the age-specific CRC incidence for the USA 1998–2002, averaged over sex.(25) For the main analyses we also took a polygenic standard deviation of σ = 1.124, since this corresponds under Pharoah’s formula(26) to a familial relative risk of 1.88, which is the residual familial relative risk for CRC after accounting for the effects of 45 known, CRC-associated SNPs.(5)
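As a check on these numbers, and assuming that Pharoah's formula takes the usual log-normal form in which the familial relative risk for first-degree relatives is $\exp(\sigma^2/2)$ (first-degree relatives share half of the polygenic variance), the stated standard deviation reproduces the stated familial relative risk:

```latex
\mathrm{FRR} = \exp\!\left(\tfrac{\sigma^{2}}{2}\right)
  = \exp\!\left(\tfrac{1.124^{2}}{2}\right)
  = \exp(0.632) \approx 1.88 .
```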
Adjustment for oversampling of controls without a family history in the Colon Cancer Family Registry Phase I GWAS dataset that was used for SNP-based risk estimation
Controls were preferentially selected for GWAS Phase-I testing if they had no family history of CRC, so we would expect them to have a slightly different distribution of the 45 SNP score K compared with controls if they had been recruited regardless of family history. Simulations were used to assess the differences between the distributions of K in these two groups of controls, and the effect of these differences on our final odds ratio estimates.
We considered a nuclear family consisting of two parents (individuals i = 1 and i = 2) and two children (individuals i = 3 and i = 4). As above, we decomposed the polygene $X_i = K_i + U_i$ for person i in the nuclear family into a component $K_i$ due to the 45 known SNPs and an independent component $U_i$ due to all other, unknown genetic variants that are associated with CRC. We assumed $K_i \sim N(-\sigma_K^2/2, \sigma_K^2)$ and $U_i \sim N(-\sigma_U^2/2, \sigma_U^2)$ with $\sigma_U = 0.6011$ and $\sigma_K = 1.124$, as above. We also assumed that the joint distribution of $K_1, K_2, K_3, K_4$ followed a multivariate normal distribution with the correlation between parents being 0 and the correlation between all other pairs of (first-degree) family members being 0.5, and similarly for $U_1, U_2, U_3, U_4$. As above (but with $\mathrm{HR}_{\mathrm{major}} = 1$), we assumed that if individual i has $K_i = k$ and $U_i = u$ then he or she has a lifetime CRC risk of $1 - S_0(80)^{\exp(k + u)}$, where $S_0(80)$ is the survival probability to age 80 years for someone at population risk.
We then used these assumptions to simulate the known SNP scores K1, K2, K3, K4, the unknown variant scores U1, U2, U3, U4 and the (lifetime) affected statuses of all four family members, and we did this for 10,000,000 nuclear families. The distribution of the known variant score K4 for individual i = 4 (one of the children) was calculated firstly for all families with no affected family members and secondly for all families where individual i = 4 was unaffected. The means of K4 in these two types of families were −0.385 and −0.361, respectively, and the standard deviations of K4 in these two types of families were both 0.60. Therefore the known variant score K4 for controls with a null family history was shifted by −0.021 compared to controls ascertained regardless of family history, which is −0.021/0.60 = −0.035 standard deviations. But for a risk factor which, in cases and controls, is normally distributed with the same standard deviation and a difference in means of Δ standard deviations, it can be shown that the odds ratio is exp (Δ).(27) Our simulations show that our ascertainment criterion is causing us to overestimate Δ by 0.035, so we are therefore overestimating the odds ratio by a factor of exp (0.035) = 1.04, i.e. by 4%.
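The simulation can be sketched as follows, as a scaled-down illustration rather than a reproduction of the analysis. Several ingredients are assumptions, because the corresponding equations are not shown in this text:

- the lifetime risk is taken to be 1 − exp(−H0·exp(k + u)), with a hypothetical baseline cumulative hazard H0 = 0.05;
- each component is given mean −σ²/2;
- the known-SNP component is given standard deviation 0.6011 (consistent with the simulated standard deviation of 0.60 for K4) and the residual component 1.124;
- only 50,000 families are simulated, so the printed numbers will differ from those reported above.

```python
import math
import random

SIGMA_K = 0.6011            # SD of the known 45-SNP component (assumed assignment)
SIGMA_U = 1.124             # SD of the residual polygenic component (assumed assignment)
BASELINE_CUM_HAZARD = 0.05  # hypothetical value, not taken from the paper


def family_scores(sigma: float, rng: random.Random) -> list:
    """Scores for [parent 1, parent 2, child 3, child 4].

    Parents are independent. Each child is the mid-parent value plus independent
    noise with variance sigma^2 / 2, which gives variance sigma^2 for every member
    and correlation 0.5 for every parent-child and sibling pair.
    """
    mean = -sigma ** 2 / 2
    p1 = rng.gauss(mean, sigma)
    p2 = rng.gauss(mean, sigma)
    mid_parent = (p1 + p2) / 2
    noise_sd = sigma / math.sqrt(2)
    return [p1, p2, rng.gauss(mid_parent, noise_sd), rng.gauss(mid_parent, noise_sd)]


def lifetime_risk(k: float, u: float) -> float:
    """Assumed proportional-hazards form for the lifetime CRC risk."""
    return 1.0 - math.exp(-BASELINE_CUM_HAZARD * math.exp(k + u))


def simulate(n_families: int, seed: int = 1):
    """Return (mean K4 | no one affected, mean K4 | child 4 unaffected, SD of the latter)."""
    rng = random.Random(seed)
    k4_none_affected, k4_child_unaffected = [], []
    for _ in range(n_families):
        ks = family_scores(SIGMA_K, rng)
        us = family_scores(SIGMA_U, rng)
        affected = [rng.random() < lifetime_risk(k, u) for k, u in zip(ks, us)]
        if not affected[3]:
            k4_child_unaffected.append(ks[3])
            if not any(affected):
                k4_none_affected.append(ks[3])
    mean_none = sum(k4_none_affected) / len(k4_none_affected)
    mean_child = sum(k4_child_unaffected) / len(k4_child_unaffected)
    var_child = sum((k - mean_child) ** 2 for k in k4_child_unaffected) / len(k4_child_unaffected)
    return mean_none, mean_child, math.sqrt(var_child)


def odds_ratio_inflation(mean_none: float, mean_child: float, sd: float) -> float:
    """exp(|shift| / SD): the factor by which the per-SD odds ratio is overestimated."""
    return math.exp(abs(mean_none - mean_child) / sd)


if __name__ == "__main__":
    m_none, m_child, sd = simulate(50_000)
    print("shift in mean K4:", m_none - m_child)
    print("odds ratio inflation factor:", odds_ratio_inflation(m_none, m_child, sd))
```

Generating each child from the mid-parent value plus independent noise is one convenient way to obtain the stated correlation structure without a matrix library. Sampling directly from the 4 × 4 multivariate normal would be equivalent.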
Footnotes
Declarations
Disclaimer: The content of this manuscript does not necessarily reflect the views or policies of the National Cancer Institute or any of the collaborating centres in the Colon Cancer Family Registry, nor does mention of trade names, commercial products, or organizations imply endorsement by the US Government or the Colon Cancer Family Registry. Authors had full responsibility for the design of the study, the collection of the data, the analysis and interpretation of the data, the decision to submit the manuscript for publication, and the writing of the manuscript.
Publisher's Disclaimer: This Author Accepted Manuscript is a PDF file of an unedited peer-reviewed manuscript that has been accepted for publication but has not been copyedited or corrected. The official version of record that is published in the journal is kept up to date and so may therefore differ from this version.
References
- 1. Win AK, Ait Ouakrim D, Jenkins MA. Risk profiling: familial colorectal cancer. Cancer Forum 2014;38(1):15–25.
- 2. Hopper JL. Disease-specific prospective family study cohorts enriched for familial risk. Epidemiol Perspect Innov 2011;8(1):2 doi 10.1186/1742-5573-8-2.
- 3. Dunlop MG, Tenesa A, Farrington SM, Ballereau S, Brewster DH, Koessler T, et al. Cumulative impact of common genetic variants and other risk factors on colorectal cancer risk in 42,103 individuals. Gut 2013;62(6):871–81 doi 10.1136/gutjnl-2011-300537.
- 4. Huyghe JR, Bien SA, Harrison TA, Kang HM, Chen S, Schmit SL, et al. Discovery of common and rare genetic risk variants for colorectal cancer. Nat Genet 2019;51(1):76–87 doi 10.1038/s41588-018-0286-6.
- 5. Jenkins MA, Makalic E, Dowty JG, Schmidt DF, Dite GS, MacInnis RJ, et al. Quantifying the utility of single nucleotide polymorphisms to guide colorectal cancer screening. Future Oncol 2016;12(4):503–13 doi 10.2217/fon.15.303.
- 6. Newcomb PA, Baron J, Cotterchio M, Gallinger S, Grove J, Haile R, et al. Colon Cancer Family Registry: an international resource for studies of the genetic epidemiology of colon cancer. Cancer Epidemiol Biomarkers Prev 2007;16(11):2331–43 doi 10.1158/1055-9965.EPI-07-0648.
- 7. Stanesby O, Jenkins M. Comparison of the efficiency of colorectal cancer screening programs based on age and genetic risk for reduction of colorectal cancer mortality. Eur J Hum Genet 2017;25(7):832–8 doi 10.1038/ejhg.2017.60.
- 8. Mealiffe ME, Stokowski RP, Rhees BK, Prentice RL, Pettinger M, Hinds DA. Assessment of clinical validity of a breast cancer risk model combining genetic and clinical information. J Natl Cancer Inst 2010;102(21):1618–27 doi 10.1093/jnci/djq388.
- 9. Hopper JL. Odds per adjusted standard deviation: comparing strengths of associations for risk factors measured on different scales and across diseases and populations. Am J Epidemiol 2015;182(10):863–7 doi 10.1093/aje/kwv193.
- 10. Hopper JL, Carlin JB. Familial aggregation of a disease consequent upon correlation between relatives in a risk factor measured on a continuous scale. Am J Epidemiol 1992;136(9):1138–47.
- 11. Howlader N, Noone AM, Krapcho M, Garshell J, Miller D, Altekruse SF, Kosary CL, Yu M, Ruhl J, Tatalovich Z, Mariotto A, Lewis DR, Chen HS, Feuer EJ, Cronin KA (eds). SEER Cancer Statistics Review, 1975–2011, National Cancer Institute; Bethesda, MD, http://seer.cancer.gov/csr/1975_2011/browse_csr.php?sectionSEL=6&pageSEL=sect_06_table.10.html based on November 2013 SEER data submission, posted to the SEER Web site, April 2014.
- 12. Win AK, Jenkins MA, Dowty JG, Antoniou AC, Lee A, Giles GG, et al. Prevalence and penetrance of major genes and polygenes for colorectal cancer. Cancer Epidemiol Biomarkers Prev 2017;26(3):404–12 doi 10.1158/1055-9965.EPI-16-0693.
- 13. Howlader N, Noone A, Krapcho M, Garshell J, Miller D, Altekruse S, et al., editors. SEER Cancer Statistics Review, 1975–2011. Bethesda, MD: National Cancer Institute; 2014.
- 14. US Preventive Services Task Force, Bibbins-Domingo K, Grossman DC, Curry SJ, Davidson KW, Epling JW Jr., et al. Screening for colorectal cancer: US Preventive Services Task Force recommendation statement. JAMA 2016;315(23):2564–75 doi 10.1001/jama.2016.5989.
- 15. StataCorp. Stata Statistical Software: Release 14. College Station, TX: StataCorp LP; 2015.
- 16. Lowery JT, Ahnen DJ, Schroy PC 3rd, Hampel H, Baxter N, Boland CR, et al. Understanding the contribution of family history to colorectal cancer risk and its clinical implications: a state-of-the-science review. Cancer 2016;122(17):2633–45 doi 10.1002/cncr.30080.
- 17. Dite GS, MacInnis RJ, Bickerstaffe A, Dowty JG, Allman R, Apicella C, et al. Breast cancer risk prediction using clinical models and 77 independent risk-associated SNPs for women aged under 50 years: Australian Breast Cancer Family Registry. Cancer Epidemiol Biomarkers Prev 2016;25(2):359–65 doi 10.1158/1055-9965.EPI-15-0838.
- 18. Frampton MJ, Law P, Litchfield K, Morris EJ, Kerr D, Turnbull C, et al. Implications of polygenic risk for personalised colorectal cancer screening. Ann Oncol 2016;27(3):429–34 doi 10.1093/annonc/mdv540.
- 19. Tenesa A, Dunlop MG. New insights into the aetiology of colorectal cancer from genome-wide association studies. Nat Rev Genet 2009;10(6):353–8.
- 20. MacInnis RJ, Schmidt DF, Makalic E, Severi G, FitzGerald LM, Reumann M, et al. Use of a novel nonparametric version of DEPTH to identify genomic regions associated with prostate cancer risk. Cancer Epidemiol Biomarkers Prev 2016;25(12):1619–24 doi 10.1158/1055-9965.EPI-16-0301.
- 21. Wei Z, Wang W, Bradfield J, Li J, Cardinale C, Frackelton E, et al. Large sample size, wide variant spectrum, and advanced machine-learning technique boost risk prediction for inflammatory bowel disease. Am J Hum Genet 2013;92(6):1008–12 doi 10.1016/j.ajhg.2013.05.002.
- 22. Frampton M, Houlston RS. Modeling the prevention of colorectal cancer from the combined impact of host and behavioral risk factors. Genet Med 2017;19(3):314–21 doi 10.1038/gim.2016.101.
- 23. Antoniou AC, Pharoah PDP, McMullan G, Day NE, Ponder BAJ, Easton D. Evidence for further breast cancer susceptibility genes in addition to BRCA1 and BRCA2 in a population-based study. Genet Epidemiol 2001;21(1):1–18.
- 24. Cannings C, Thompson E, Skolnick M. Probability functions on complex pedigrees. Adv Appl Prob 1978;10(1):26–61.
- 25. Curado MP, Edwards B, Shin HR, Storm H, Ferlay J, Heanue M, et al., editors. Cancer Incidence in Five Continents, Vol. IX. Lyon, France: International Agency for Research on Cancer; 2007.
- 26. Pharoah PDP, Antoniou A, Bobrow M, Zimmern RL, Easton DF, Ponder BAJ. Polygenic susceptibility to breast cancer and implications for prevention. Nat Genet 2002;31(1):33–6.
- 27. Wentzensen N, Wacholder S. From differences in means between cases and controls to risk stratification: a business plan for biomarker development. Cancer Discov 2013;3(2):148–57.
- 28. Al-Tassan NA, Whiffin N, Hosking FJ, Palles C, Farrington SM, Dobbins SE, et al. A new GWAS and meta-analysis with 1000Genomes imputation identifies novel risk variants for colorectal cancer. Sci Rep 2015;5:10442 doi 10.1038/srep10442.
- 29. Peters U, Jiao S, Schumacher FR, Hutter CM, Aragaki AK, Baron JA, et al. Identification of genetic susceptibility loci for colorectal tumors in a genome-wide meta-analysis. Gastroenterology 2013;144(4):799–807 e24 doi 10.1053/j.gastro.2012.12.020.
- 30. Houlston RS, Cheadle J, Dobbins SE, Tenesa A, Jones AM, Howarth K, et al. Meta-analysis of three genome-wide association studies identifies susceptibility loci for colorectal cancer at 1q41, 3q26.2, 12q13.13 and 20q13.33. Nat Genet 2010;42(11):973–7 doi 10.1038/ng.670.
- 31. Schumacher FR, Schmit SL, Jiao S, Edlund CK, Wang H, Zhang B, et al. Genome-wide association study of colorectal cancer identifies six new susceptibility loci. Nat Commun 2015;6:7138 doi 10.1038/ncomms8138.
- 32. Real LM, Ruiz A, Gayan J, Gonzalez-Perez A, Saez ME, Ramirez-Lorca R, et al. A colorectal cancer susceptibility new variant at 4q26 in the Spanish population identified by genome-wide association analysis. PLoS One 2014;9(6):e101178 doi 10.1371/journal.pone.0101178.
- 33. Schmit SL, Schumacher FR, Edlund CK, Conti DV, Raskin L, Lejbkowicz F, et al. A novel colorectal cancer risk locus at 4q32.2 identified from an international genome-wide association study. Carcinogenesis 2014;35(11):2512–9 doi 10.1093/carcin/bgu148.
- 34. Jia WH, Zhang B, Matsuo K, Shin A, Xiang YB, Jee SH, et al. Genome-wide association analyses in East Asians identify new susceptibility loci for colorectal cancer. Nat Genet 2013;45(2):191–6 doi 10.1038/ng.2505.
- 35. Dunlop MG, Dobbins SE, Farrington SM, Jones AM, Palles C, Whiffin N, et al. Common variation near CDKN1A, POLD3 and SHROOM2 influences colorectal cancer risk. Nat Genet 2012;44(7):770–6 doi 10.1038/ng.2293.
- 36. Tomlinson IP, Webb E, Carvajal-Carmona L, Broderick P, Howarth K, Pittman AM, et al. A genome-wide association study identifies colorectal cancer susceptibility loci on chromosomes 10p14 and 8q23.3. Nat Genet 2008;40(5):623–30 doi 10.1038/ng.111.
- 37. Tomlinson I, Webb E, Carvajal-Carmona L, Broderick P, Kemp Z, Spain S, et al. A genome-wide association scan of tag SNPs identifies a susceptibility variant for colorectal cancer at 8q24.21. Nat Genet 2007;39(8):984–8 doi 10.1038/ng2085.
- 38. Zanke BW, Greenwood CM, Rangrej J, Kustra R, Tenesa A, Farrington SM, et al. Genome-wide association scan identifies a colorectal cancer susceptibility locus on chromosome 8q24. Nat Genet 2007;39(8):989–94 doi 10.1038/ng2089.
- 39. Kocarnik JD, Hutter CM, Slattery ML, Berndt SI, Hsu L, Duggan DJ, et al. Characterization of 9p24 risk locus and colorectal adenoma and cancer: gene-environment interaction and meta-analysis. Cancer Epidemiol Biomarkers Prev 2010;19(12):3131–9 doi 10.1158/1055-9965.EPI-10-0878.
- 40. Zhang B, Jia W-H, Matsuda K, Kweon S-S, Matsuo K, Xiang Y-B, et al. Large-scale genetic study in East Asians identifies six new loci associated with colorectal cancer risk. Nat Genet 2014;46(6):533–42 doi 10.1038/ng.2985.
- 41. Wang H, Burnett T, Kono S, Haiman CA, Iwasaki M, Wilkens LR, et al. Trans-ethnic genome-wide association study of colorectal cancer identifies a new susceptibility locus in VTI1A. Nat Commun 2014;5:4613 doi 10.1038/ncomms5613.
- 42. Tenesa A, Farrington SM, Prendergast JGD, Porteous ME, Walker M, Haq N, et al. Genome-wide association scan identifies a colorectal cancer susceptibility locus on 11q23 and replicates risk loci at 8q24 and 18q21. Nat Genet 2008;40(5):631–7.
- 43. Tomlinson IP, Carvajal-Carmona LG, Dobbins SE, Tenesa A, Jones AM, Howarth K, et al. Multiple common susceptibility variants near BMP pathway loci GREM1, BMP4, and BMP2 explain part of the missing heritability of colorectal cancer. PLoS Genet 2011;7(6):e1002105 doi 10.1371/journal.pgen.1002105.
- 44. COGENT Study, Houlston RS, Webb E, Broderick P, Pittman AM, Di Bernardo MC, et al. Meta-analysis of genome-wide association data identifies four new susceptibility loci for colorectal cancer. Nat Genet 2008;40(12):1426–35 doi 10.1038/ng.262.
- 45. Ryan BM, Wolff RK, Valeri N, Khan M, Robinson D, Paone A, et al. An analysis of genetic factors related to risk of inflammatory bowel disease and colon cancer. Cancer Epidemiol 2014;38(5):583–90 doi 10.1016/j.canep.2014.07.003.
- 46. Broderick P, Carvajal-Carmona L, Pittman AM, Webb E, Howarth K, Rowan A, et al. A genome-wide association study shows that common alleles of SMAD7 influence colorectal cancer risk. Nat Genet 2007;39(11):1315–7 doi 10.1038/ng.2007.18.
Associated Data
Supplementary Materials
Supplementary Table 1. For each of the 45 SNPs, the association with colorectal cancer per risk allele (odds ratio) adjusted for age group, and risk allele frequencies for controls.