Author manuscript; available in PMC: 2016 Jan 1.
Published in final edited form as: Cancer Epidemiol Biomarkers Prev. 2015 Apr 2;24(7):1121–1129. doi: 10.1158/1055-9965.EPI-14-0317

Risk Analysis of Prostate Cancer in PRACTICAL, a Multinational Consortium, Using 25 Known Prostate Cancer Susceptibility Loci

Ali Amin Al Olama 1,56, Sara Benlloch 1,56, Antonis C Antoniou 1,56, Graham G Giles 2,3,55, Gianluca Severi 2,55,*, David E Neal 4,5,55, Freddie C Hamdy 6,55, Jenny L Donovan 7,55, Kenneth Muir 8,9,55, Johanna Schleutker 10,11,55, Brian E Henderson 12,55, Christopher Haiman 12,55, Fredrick R Schumacher 12,55, Nora Pashayan 1,13,55, Paul DP Pharoah 1,55, Elaine A Ostrander 14,55, Janet L Stanford 15,16,55, Jyotsna Batra 17,55, Judith A Clements 17,55, Suzanne K Chambers 18,19,20,55, Maren Weischer 21,55, Børge G Nordestgaard 21,55, Sue A Ingles 12,55, Karina D Sorensen 22,55, Torben F Orntoft 22,55, Jong Y Park 23,55, Cezary Cybulski 24,55, Christiane Maier 25,55, Thilo Doerk 26,55, Joanne L Dickinson 27,55, Lisa Cannon-Albright 28,29,55, Hermann Brenner 30,31,55, Timothy R Rebbeck 32,55, Charnita Zeigler-Johnson 33, Tomonori Habuchi 34,55, Stephen N Thibodeau 35,55, Kathleen A Cooney 36,55, Pierre O Chappuis 37,55, Pierre Hutter 38,55, Radka P Kaneva 39,55, William D Foulkes 40,55, Maurice P Zeegers 41,55, Yong-Jie Lu 42,55, Hong-Wei Zhang 43,55, Robert Stephenson 44, Angela Cox 45, Melissa C Southey 46, Amanda B Spurdle 47, Liesel FitzGerald 48, Daniel Leongamornlert 49, Edward Saunders 49, Malgorzata Tymrakiewicz 49, Michelle Guy 49, Tokhir Dadaev 49, Sarah J Little 49, Koveela Govindasami 49, Emma Sawyer 49, Rosemary Wilkinson 49, Kathleen Herkommer 50, John L Hopper 3, Aritaya Lophatonanon 9,10, Antje E Rinckleb 25, Zsofia Kote-Jarai 49,56, Rosalind A Eeles, on behalf of The UK Genetic Prostate Cancer Study Collaborators/British Association of Urological Surgeons’ Section of Oncology; on behalf of The UK ProtecT Study Collaborators; on behalf of The PRACTICAL Consortium 49,51,52,53,54,56, Douglas F Easton 1,56
PMCID: PMC4491026  NIHMSID: NIHMS676863  EMSID: EMS62831  PMID: 25837820

Abstract

Background

Genome-wide association studies have identified multiple genetic variants associated with prostate cancer (PrCa) risk which explain a substantial proportion of familial relative risk. These variants can be used to stratify individuals by their risk of PrCa.

Methods

We genotyped 25 PrCa susceptibility loci in 40,414 individuals and derived a polygenic risk score (PRS). We estimated empirical odds ratios (ORs) for PrCa associated with different risk strata defined by the PRS, and derived age-specific absolute risks of developing PrCa by PRS stratum and family history.

Results

The PrCa risk for men in the top 1% of the PRS distribution was 30.6-fold (95% CI 16.4–57.3) that of men in the bottom 1%, and 4.2-fold (95% CI 3.2–5.5) the median risk. The absolute risk of PrCa by age 85 was 65.8% for a man with a family history in the top 1% of the PRS distribution, compared with 3.7% for a man with a family history in the bottom 1%. The PRS was only weakly correlated with serum PSA level (correlation=0.09).

Conclusions

Risk profiling can identify men at substantially increased or reduced risk of PrCa. The effect size, measured by OR per unit PRS, was higher in men at younger ages and in men with family history of PrCa. Incorporating additional newly identified loci into a PRS should improve the predictive value of risk profiles.

Impact

We demonstrate that risk profiling based on SNPs can identify men at substantially increased or reduced risk, which could have useful implications for targeted prevention and screening programs.

Keywords: Prostate Cancer risk, Genetic and Molecular Epidemiology, Genitourinary Cancers: Prostate, Risk Assessment, Methodology, Modelling and biostatistics, Methodology for SNP data analysis, Statistical methods in Genetics

Introduction

Genome-wide association studies (GWAS) have identified multiple common genetic variants associated with prostate cancer (PrCa) risk. The risks associated with individual variants are generally modest, but in combination their effects may be substantial and may provide the basis for targeted prevention (1). Because the individual effects are modest, however, large studies are required to estimate them precisely. To facilitate this estimation, we genotyped 25 PrCa susceptibility SNPs in studies from the PRACTICAL consortium. PRACTICAL is an international PrCa consortium that includes more than 78 studies, comprising men of European, Asian or African ancestry, with a combined dataset of over 130,000 samples (http://practical.ccge.medschl.cam.ac.uk/). In the current analysis, we used data on 31,833 cases and controls from 24 studies in PRACTICAL and 8,581 samples from the replication stage of a GWAS (“GWAS stage 3”). Sixteen of the 25 SNPs used in this study were identified through studies that included PRACTICAL (2–4), and nine were identified by other GWAS (5–10).

Materials and Methods

Samples

The current analysis was restricted to individuals of European ancestry, based on self-reported ethnicity; samples of non-European ancestry were excluded. Data were contributed from 25 studies in PRACTICAL and from GWAS stage 3. Twenty-five SNPs were genotyped specifically for this analysis in 31,833 cases and controls in PRACTICAL phase III, unless genotype data were already available. We also included four studies from GWAS stage 3, conducted in the United Kingdom and Australia, comprising a further 8,581 cases and controls (11). In this replication stage, 1,536 SNPs were genotyped, including the 25 susceptibility SNPs analysed here. The two datasets were combined to give a total of 40,414 samples (20,288 cases and 20,126 controls). Three studies (MCCS, PFCS and UKGPCS) that were included in GWAS stage 3 also contributed genotyping of additional samples for PRACTICAL phase III (Table 1, Supplementary Table 1 and Supplementary Notes). Studies provided a minimum core dataset that included disease status, age at diagnosis/observation and ethnicity. Twenty-two studies provided data on family history and eighteen provided data on Gleason score.

Table 1.

Total numbers of cases and controls used in the analyses

| Study | Controls^a | Cases^a | Total^a |
|---|---|---|---|
| GWAS Stage 3 | 4,076 | 4,505 | 8,581 |
| PRACTICAL | 16,050 | 15,783 | 31,833 |
| Total | 20,126 | 20,288 | 40,414 |
| Total^b | 18,343 | 16,643 | 34,986 |

^a Analyses were restricted to men of European ancestry (see text).

^b Total after excluding 5 studies that oversampled cases with family history.

Where studies included more than one individual from the same family, only the index case was included, so that the analyses were based on unrelated men. For analyses of the polygenic risk score (PRS), we also excluded 5 studies (MAYO, PCFS, TASPRAC, ULM and UTAH) that oversampled cases with a family history of PrCa. This reduced the total number of samples to 34,986 (16,643 cases and 18,343 controls). All studies were approved by the relevant ethics committees.

Eighty-nine percent (31,150) of the samples had information on age at diagnosis (age at interview/blood draw for controls). The mean age at diagnosis for cases was 64 years, slightly higher than the mean age at interview/blood draw for controls (58 years; Supplementary Table 2a). Family history information was available for 21,209 (60.6%) samples; among these, 10.7% of controls and 18.2% of cases had a family history of PrCa. Before excluding studies that oversampled familial cases, these percentages were 12.9% and 22.6%, respectively (Supplementary Tables 2a and 2b).

Genotyping

Genotyping was performed in two experiments; these were subject to separate QC procedures appropriate to the platforms used, before the data were combined for statistical analysis. In PRACTICAL phase III, samples from 2 studies were genotyped by Sequenom, while 22 study sites used the 5′ exonuclease assay (TaqMan™) on the ABI Prism 7900HT sequence detection system according to the manufacturer’s instructions. Primers and probes were supplied directly by Applied Biosystems as Assays-By-Design™. Assays at all sites included at least four negative controls and 2–5% duplicates on each 384-well plate. Quality control guidelines were followed by all participating groups as previously described (4). In addition, all sites genotyped 16 CEPH samples. We excluded individuals who were not successfully genotyped for at least 80% of the SNPs attempted. Data on a given SNP for a given site were also excluded if they failed any of the following QC criteria: SNP call rate >95%; no deviation from Hardy-Weinberg equilibrium in controls at P<0.00001; and <2% discordance between genotypes in duplicate samples and in the CEPH control samples. Cluster plots for SNPs that were close to failing any of the QC criteria were re-examined centrally.

GWAS stage 3 genotypes were generated using an Illumina GoldenGate assay. All SNPs for this analysis passed the QC filters used for this experiment: call rate >95%, minor allele frequency in controls >1%, and genotype frequencies in controls consistent with Hardy-Weinberg equilibrium (no deviation at P<0.00001). Duplicate concordance was 99.99% (11).

Statistical methods

We used combined data across all studies for the analysis. We assessed the association between each SNP and PrCa using a 1-degree-of-freedom Cochran-Armitage trend test, stratified by study. Odds ratios (ORs) and 95% confidence intervals (95% CIs) for the association between each genotype and cancer risk, and for genotypes at pairs of SNPs, were estimated using unconditional logistic regression, with study included as a covariate. Both per-allele and genotype-specific ORs were estimated. Heterogeneity in the OR estimates among studies was evaluated using a likelihood ratio test, by comparison with a model in which separate ORs were estimated for each study.
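As an illustration of the single-SNP tests described above, the sketch below computes the 1-df Cochran-Armitage trend statistic and crude genotype-specific ORs from a single 2×3 table of genotype counts. It is a simplified, unstratified sketch: the analyses in this paper stratified the trend test by study and estimated ORs by logistic regression with study as a covariate, which this code does not do. The function names and the example counts are hypothetical.

```python
import math


def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """1-df Cochran-Armitage trend test for a single (unstratified) 2x3 table.

    cases, controls: genotype counts for 0, 1 and 2 copies of the minor allele.
    Returns (z, two-sided P). Positive z means cases carry more minor alleles.
    """
    r = sum(cases)                                   # total cases
    s = sum(controls)                                # total controls
    n = r + s
    col = [a + b for a, b in zip(cases, controls)]   # genotype totals
    # Score statistic: weighted differences between case and control counts
    t = sum(w * (cases[i] * s - controls[i] * r) for i, w in enumerate(weights))
    # Variance of t under the null hypothesis of no association
    var = (r * s / n) * (
        sum(w * w * col[i] * (n - col[i]) for i, w in enumerate(weights))
        - 2 * sum(weights[i] * weights[j] * col[i] * col[j]
                  for i in range(3) for j in range(i + 1, 3))
    )
    z = t / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2))


def crude_genotype_ors(cases, controls):
    """Unadjusted heterozygote and rare-homozygote ORs vs. common homozygotes."""
    het = (cases[1] * controls[0]) / (cases[0] * controls[1])
    hom = (cases[2] * controls[0]) / (cases[0] * controls[2])
    return het, hom


# Hypothetical counts (0, 1, 2 minor alleles), not data from this study
z, p = cochran_armitage_trend(cases=(50, 100, 80), controls=(100, 100, 40))
het_or, hom_or = crude_genotype_ors(cases=(50, 100, 80), controls=(100, 100, 40))
print(f"z = {z:.2f}, P = {p:.2g}, het OR = {het_or:.2f}, hom OR = {hom_or:.2f}")
```

A stratified version would sum the score statistics and variances across studies before forming z.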

Modification of the ORs by disease aggressiveness and family history was assessed by using both family history (Yes vs. No) and Gleason score (<8 vs. ≥8) as binary variables. A test for association between SNP genotype at a locus and Gleason score as an ordinal variable was also performed, using polytomous regression. Modification of the ORs by age was assessed using a case-only analysis, assessing the association between age and SNP genotype in the cases using polytomous regression. The associations between SNP genotypes and PSA level were assessed using linear regression, after log-transformation of PSA level to correct for skewness.

Contribution to Familial Risk

The contribution of the known SNPs to the familial risk of PrCa, under a multiplicative model, was computed using the formula:

$$\frac{\sum_k \log \lambda_k}{\log \lambda_0}$$

where $\lambda_0$ is the observed familial risk to first-degree relatives of PrCa cases, assumed to be 2 (12), and $\lambda_k$ is the familial relative risk due to locus $k$, given by:

$$\lambda_k = \frac{p_k r_k^2 + q_k}{(p_k r_k + q_k)^2}$$

where $p_k$ is the frequency of the risk allele for locus $k$, $q_k = 1 - p_k$, and $r_k$ is the estimated per-allele odds ratio (13).
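The two formulas above can be evaluated directly from a SNP's allele frequency and per-allele OR. The sketch below is a minimal illustration, assuming the multiplicative model stated in the text and λ0 = 2. It uses three MAF/OR pairs taken from Table 2 and is not intended to reproduce the 18% figure reported for all 25 loci. Because the expression for λk is unchanged when p is replaced by 1 − p and r by 1/r, the minor-allele frequency and minor-allele OR can be used directly, even when the minor allele is protective. The function names are hypothetical.

```python
import math


def locus_familial_rr(p, r):
    """lambda_k = (p r^2 + q) / (p r + q)^2 for allele frequency p and per-allele OR r."""
    q = 1.0 - p
    return (p * r * r + q) / (p * r + q) ** 2


def fraction_of_familial_risk(loci, lambda0=2.0):
    """sum_k log(lambda_k) / log(lambda_0): share of the familial log-risk explained."""
    return sum(math.log(locus_familial_rr(p, r)) for p, r in loci) / math.log(lambda0)


# (MAF, per-allele OR) for rs1447295, rs10993994 and rs16901979 from Table 2
example_loci = [(0.11, 1.41), (0.39, 1.24), (0.03, 1.56)]
for p, r in example_loci:
    print(f"p={p:.2f}, r={r:.2f}: lambda_k = {locus_familial_rr(p, r):.4f}")
print(f"fraction of familial risk (3 loci): {fraction_of_familial_risk(example_loci):.2%}")
```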

To evaluate the evidence for interactions between pairs of SNPs, we used a likelihood ratio test to assess departures from a multiplicative model, comparing, for each pair of SNPs, a model with and a model without the interaction term. The interaction term was the product of the allele doses for the two SNPs, giving a 1-degree-of-freedom test for interaction. Based on the assumption of a log-additive model, we constructed a PRS from the summed genotypes weighted by the per-allele log-odds ratios for each SNP, estimated by logistic regression as described above. Thus, for each individual j we derived:

$$\text{Score}_j = \sum_{i=1}^{N} \beta_i g_{ij}$$

where:

  • $N$: number of SNPs (25)

  • $g_{ij}$: allele dose at SNP $i$ (0, 1 or 2) for individual $j$

  • $\beta_i$: per-allele log-odds ratio of SNP $i$

Missing genotypes for an individual were replaced with the mean genotype of the corresponding SNP, computed separately for cases and controls. A sensitivity analysis restricted to samples with complete genotype data gave very similar results (data not shown). We then standardised the PRS by dividing by the overall standard deviation of the PRS in the controls.
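The score construction just described has three steps: a weighted sum of allele doses, mean imputation of missing genotypes separately in cases and controls, and scaling by the standard deviation in controls. The sketch below is a minimal version of those steps. It assumes genotypes are held as plain Python lists with None marking a missing call. The use of the sample standard deviation (`statistics.stdev`) is also an assumption, because the text does not say which estimator was used. Function names and toy data are hypothetical.

```python
import statistics


def mean_doses(genotypes, is_case):
    """Mean allele dose per SNP, computed separately for cases and controls,
    ignoring missing (None) calls."""
    n_snps = len(genotypes[0])
    means = {}
    for status in (True, False):
        rows = [g for g, c in zip(genotypes, is_case) if c == status]
        means[status] = [
            statistics.mean([r[i] for r in rows if r[i] is not None])
            for i in range(n_snps)
        ]
    return means


def standardised_prs(genotypes, is_case, betas):
    """Score_j = sum_i beta_i * g_ij, divided (not centred) by the control SD."""
    means = mean_doses(genotypes, is_case)
    raw = []
    for g, c in zip(genotypes, is_case):
        doses = [m if d is None else d for d, m in zip(g, means[c])]
        raw.append(sum(b * d for b, d in zip(betas, doses)))
    sd_controls = statistics.stdev([s for s, c in zip(raw, is_case) if not c])
    return [s / sd_controls for s in raw]


# Toy example: 4 men, 2 SNPs, one missing call (None)
genotypes = [[0, 1], [2, None], [1, 1], [0, 0]]
is_case = [True, True, False, False]
betas = [0.1, 0.2]  # hypothetical per-allele log-ORs
print(standardised_prs(genotypes, is_case, betas))
```

Because the score is only divided by the control SD, its control mean need not be zero. This is consistent with the control mean of 0.104 reported in the Results.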

The risk of PrCa was estimated for percentile groups of the PRS distribution (<1%, 1–10%, 10–25%, 25–75% [defined here as “median risk”], 75–90%, 90–99% and >99%) and per standard deviation, with the PRS fitted as a continuous covariate. We evaluated the fit of the combined risk score to a log-linear model using a likelihood ratio test. This test compared the model with the PRS fitted as a continuous covariate against a model in which separate parameters were estimated for each percentile group, both adjusted for age at diagnosis and family history.
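The seven percentile groups can be assigned by computing cut points from the control distribution and locating each score among them. The sketch below uses Python's `statistics.quantiles` (default "exclusive" method) for the control percentiles. The exact percentile definition used in the original analysis is not stated, so assignments at the boundaries may differ. Names and the evenly spaced control scores are hypothetical.

```python
import bisect
import statistics

PERCENTILE_EDGES = (1, 10, 25, 75, 90, 99)
GROUP_LABELS = ("<1%", "1-10%", "10-25%", "25-75%", "75-90%", "90-99%", ">99%")


def control_cutpoints(control_scores):
    """PRS values at the 1st, 10th, 25th, 75th, 90th and 99th control percentiles."""
    q = statistics.quantiles(control_scores, n=100)  # 99 cut points; q[0] = 1st percentile
    return [q[p - 1] for p in PERCENTILE_EDGES]


def risk_group(score, cutpoints):
    """Risk group k = 1..7; group 4 (25-75%) is the 'median risk' reference."""
    return bisect.bisect_right(cutpoints, score) + 1


# Hypothetical control scores, evenly spread for illustration
controls = [i / 100 for i in range(1000)]
cuts = control_cutpoints(controls)
for s in (-1.0, 5.0, 99.0):
    k = risk_group(s, cuts)
    print(s, k, GROUP_LABELS[k - 1])
```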

We used likelihood ratio tests to evaluate the evidence for interactions between PRS and age at diagnosis/observation, between PRS and family history, and between family history and age at diagnosis/observation, comparing in each case a model with and a model without the interaction term. Effect sizes by family history were compared using a case-only analysis. Analyses were performed using Stata 13.

The relative risk estimates were used to obtain estimates of the absolute risk of PrCa by PRS category and family history. Since we observed evidence for an interaction between PRS and age, we used models both with and without a PRS × age interaction term. Absolute risks were constrained such that the age-specific incidences, averaged over all categories of PRS and family history, were consistent with the age-specific incidences of PrCa for the UK population for 2012 (http://ci5.iarc.fr/CI5plus) (14). The model was adjusted for age at diagnosis (age <55, 55–59, 60–64, 65–70 and 70+). The age-specific incidences for each SNP profile category were derived following the procedure described by Antoniou et al. (15, 16), adjusted to allow for competing causes of death.

For this purpose, we categorised the PRS into seven risk groups ($k$ = 1 to 7), based on percentiles in the controls: <1%, 1–10%, 10–25%, 25–75%, 75–90%, 90–99% and >99%. We found no evidence for an interaction between PRS and family history of PrCa (P=0.49), and therefore assumed that family history and PRS are independently predictive of PrCa risk. Under this model, the PrCa incidence $\lambda_{kh}(t)$ at age $t$ for an individual in risk group $k$ and family history group $h$ ($h=1$ with family history, $h=0$ without) was assumed to follow a model of the form $\lambda_{kh}(t) = \lambda_0(t)\exp(\beta_{kh})$, where $\lambda_0(t)$ is the baseline PrCa incidence and $\exp(\beta_{kh})$ is the risk ratio for risk group $k$ and family history group $h$, relative to the baseline category ($h=0$, $k=1$), approximated by the odds ratio estimates from the logistic regression analysis. To obtain the baseline incidence $\lambda_0(t)$, we constrained the PrCa incidence averaged over all risk groups to agree with the population age-specific PrCa incidences $\mu(t)$ (the incidence of PrCa at age $t$ per 100,000 individuals in the UK (14)). The baseline incidence can then be obtained for each age as:

$$\lambda_0(t) = \mu(t)\,\frac{p_0\sum_k f_k S_{k0}(t-1) + p_1\sum_k f_k S_{k1}(t-1)}{p_0\sum_k f_k S_{k0}(t-1)\exp(\beta_{k0}) + p_1\sum_k f_k S_{k1}(t-1)\exp(\beta_{k1})}$$

Here $p_0$ is the probability of having no family history in the population (89.26% in the controls in this dataset) and $p_1 = 1 - p_0$ is the probability of having a family history (10.74% in the controls in this dataset). $f_k$ is the frequency of SNP profile risk group $k$ ($f_1 = 0.01$, $f_2 = 0.09$, $f_3 = 0.15$, $f_4 = 0.50$, $f_5 = 0.15$, $f_6 = 0.09$, $f_7 = 0.01$). $S_{kh}(t)$ is the probability of remaining free of PrCa up to age $t$ for men in risk group $k$ and family history group $h$. It can be derived from the incidence rates $\lambda_{kh}$ at ages below $t$ using $S_{kh}(t) = \exp\left(-\int_0^t \lambda_{kh}(u)\,du\right)$. Since, by definition, $S_{kh}(0) = 1$ for all $k$ and $h$, the above equation can be solved recursively, starting at age $t = 0$, to obtain the baseline incidences and hence the age-specific PrCa incidences $\lambda_{kh}(t)$ for each group. We then computed the absolute risk by age $t$ for each risk group, adjusting for mortality from other causes, using the formula:

$$\int_0^t S_{kh}(u)\,\lambda_{kh}(u)\,S_c(u)\,du$$

where $S_c(t) = \exp\left(-\int_0^t \mu_c(u)\,du\right)$ is the probability of not dying from another cause by age $t$, based on the age-specific mortality rates $\mu_c(t)$. The rates $\mu_c(t)$ were estimated from all-cause death rates per 100,000 individuals for England and Wales (http://www.ons.gov.uk/ons/index.html) and the PrCa death rate per 100,000 individuals in the UK in 2012 (14).
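The recursion for λ0(t) and the absolute-risk integral can be sketched at single-year ages. This is a minimal illustration that assumes rates per person-year (not per 100,000) which are constant within each year of age. Under that assumption, S_kh(t−1) is the exponential of the hazard accumulated before age t, and the integral becomes a sum. The group frequencies, family-history proportion and ORs come from the text and Table 3. The population incidence and other-cause mortality in the example, however, are invented flat rates rather than the UK 2012 or England and Wales figures, and all function names are hypothetical.

```python
import math


def solve_baseline_incidence(mu, f, p1, beta):
    """Solve for lambda_0(t), one year of age at a time.

    mu   : population PrCa incidence per person-year at ages 0, 1, ..., T-1
    f    : frequencies f_k of the PRS risk groups (summing to 1)
    p1   : proportion of men with a family history (p0 = 1 - p1)
    beta : beta[h][k] = log risk ratio for family-history group h and PRS group k,
           relative to the baseline category (h=0, k=1), so beta[0][0] == 0
    """
    p = (1.0 - p1, p1)
    n_groups = len(f)
    cum_hazard = [[0.0] * n_groups for _ in range(2)]  # hazard accumulated before age t
    lambda0 = []
    for mu_t in mu:
        num = den = 0.0
        for h in range(2):
            for k in range(n_groups):
                w = p[h] * f[k] * math.exp(-cum_hazard[h][k])  # p_h * f_k * S_kh(t-1)
                num += w
                den += w * math.exp(beta[h][k])
        lam0_t = mu_t * num / den
        lambda0.append(lam0_t)
        for h in range(2):  # lambda_kh(t) = lambda_0(t) * exp(beta_kh)
            for k in range(n_groups):
                cum_hazard[h][k] += lam0_t * math.exp(beta[h][k])
    return lambda0


def absolute_risk(lam_kh, mu_c, age):
    """Sum over years u < age of S_kh(u) * lambda_kh(u) * S_c(u)."""
    risk = 0.0
    hazard_pc = hazard_c = 0.0
    for u in range(age):
        s_kh = math.exp(-hazard_pc)  # still free of PrCa at the start of year u
        s_c = math.exp(-hazard_c)    # not yet dead from another cause
        risk += s_kh * lam_kh[u] * s_c
        hazard_pc += lam_kh[u]
        hazard_c += mu_c[u]
    return risk


F = [0.01, 0.09, 0.15, 0.50, 0.15, 0.09, 0.01]        # f_1 .. f_7 from the text
OR_PRS = [1, 2.98, 4.59, 7.23, 12.13, 16.70, 30.63]   # Table 3, vs. bottom 1%
OR_FH = 2.52                                          # Table 3, family history
P1 = 0.1074                                           # family history in controls
beta = [[math.log(o) for o in OR_PRS],
        [math.log(o * OR_FH) for o in OR_PRS]]

mu = [0.0] * 40 + [0.002] * 45      # INVENTED population incidence, ages 0-84
mu_c = [0.001] * 50 + [0.03] * 35   # INVENTED other-cause mortality, ages 0-84

lam0 = solve_baseline_incidence(mu, F, P1, beta)
for h, k, label in [(1, 6, "top 1%, family history"),
                    (0, 0, "bottom 1%, no family history")]:
    lam_kh = [l0 * math.exp(beta[h][k]) for l0 in lam0]
    print(f"{label}: absolute risk by age 85 = {absolute_risk(lam_kh, mu_c, 85):.3f}")
```

The baseline incidence rises with age even when μ(t) is flat. This happens because high-risk groups are depleted faster, which lowers the average relative risk among men who are still unaffected.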

Results

All 25 SNPs showed evidence of association with PrCa (P=0.02 to P=1.4×10^−46), with effect sizes consistent with previous reports. The largest per-allele OR estimate was 1.56 (95% CI 1.44–1.68), for rs16901979 on 8q24 (Table 2). For each of the 24 autosomal SNPs, the effect size was larger for rare homozygotes than for heterozygotes, and the estimates were consistent with a multiplicative (log-additive) model. There was no evidence of heterogeneity among studies (Table 2).
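Under a multiplicative (log-additive) model, the rare-homozygote OR should be roughly the square of the heterozygote OR. The short check below applies this to four SNPs from Table 2. It is an informal illustration that ignores confidence intervals, not the formal test used in the analysis. For rs1447295, for example, 1.41² ≈ 1.99, compared with an observed homozygote OR of 2.01.

```python
# (SNP, heterozygote OR, rare-homozygote OR) copied from Table 2
GENOTYPE_ORS = [
    ("rs1447295", 1.41, 2.01),
    ("rs10993994", 1.21, 1.56),
    ("rs6983267", 0.80, 0.67),
    ("rs16901979", 1.55, 2.39),
]


def expected_hom_or(het_or):
    """Homozygote OR implied by a log-additive model: OR_hom = OR_het ** 2."""
    return het_or ** 2


for snp, het, hom in GENOTYPE_ORS:
    print(f"{snp}: observed hom OR {hom:.2f}, het OR squared {expected_hom_or(het):.2f}")
```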

Table 2.

Summary results for the 25 SNPs in the PRACTICAL and GWAS Stage 3 datasets (men of European ancestry).

| Marker^a | Chr / nearby gene | Alleles^b | Position^c | MAF^d | Per-allele OR^e (95% CI) | Het OR^e,f (95% CI) | Hom OR^e,g (95% CI) | P-value (trend)^h | P-value (heterogeneity)^i |
|---|---|---|---|---|---|---|---|---|---|
| rs721048 | 2 / EHBP1 | C/T | 63131731 | 0.18 | 1.11 (1.07–1.16) | 1.09 (1.04–1.15) | 1.32 (1.17–1.48) | 9.8×10^−8 | 0.13 |
| rs1465618 | 2 / THADA | G/A | 43553949 | 0.215 | 1.07 (1.03–1.11) | 1.08 (1.03–1.13) | 1.14 (1.03–1.26) | 1.9×10^−4 | 0.39 |
| rs12621278 | 2 / ITGA6 | A/G | 173311553 | 0.06 | 0.75 (0.70–0.80) | 0.76 (0.71–0.82) | 0.38 (0.24–0.58) | 4.9×10^−17 | 0.57 |
| rs2660753 | 3 / Unknown | G/A | 87110674 | 0.10 | 1.12 (1.06–1.18) | 1.12 (1.06–1.19) | 1.32 (1.09–1.61) | 1.2×10^−5 | 0.73 |
| rs17021918 | 4 / PDLIM5 | G/A | 95562877 | 0.35 | 0.88 (0.85–0.91) | 0.86 (0.83–0.90) | 0.80 (0.74–0.85) | 6.7×10^−15 | 0.39 |
| rs12500426 | 4 / PDLIM5 | G/T | 95514609 | 0.46 | 1.10 (1.06–1.13) | 1.11 (1.06–1.18) | 1.20 (1.12–1.28) | 4.8×10^−8 | 0.54 |
| rs7679673 | 4 / TET2 | C/A | 106061534 | 0.40 | 0.88 (0.85–0.90) | 0.87 (0.83–0.91) | 0.77 (0.72–0.82) | 1.0×10^−16 | 0.08 |
| rs9364554 | 6 / SLC22A3 | C/T | 160833664 | 0.29 | 1.10 (1.06–1.14) | 1.12 (1.07–1.18) | 1.18 (1.09–1.27) | 4.8×10^−8 | 0.85 |
| rs10486567 | 7 / JAZF1 | G/A | 27976563 | 0.23 | 0.85 (0.82–0.89) | 0.86 (0.81–0.91) | 0.72 (0.63–0.81) | 4.5×10^−12 | 0.21 |
| rs6465657 | 7 / LMTK2 | A/G | 97816327 | 0.46 | 1.10 (1.06–1.13) | 1.09 (1.04–1.15) | 1.21 (1.13–1.28) | 3.4×10^−9 | 0.32 |
| rs1447295 | 8 / Unknown | G/T | 128485038 | 0.11 | 1.41 (1.35–1.48) | 1.41 (1.34–1.49) | 2.01 (1.69–2.41) | 1.4×10^−46 | 0.50 |
| rs6983267 | 8 / Unknown | C/A | 128413305 | 0.49 | 0.82 (0.79–0.85) | 0.80 (0.76–0.85) | 0.67 (0.63–0.72) | 2.3×10^−35 | 0.61 |
| rs16901979 | 8 / Unknown | G/T | 128124916 | 0.03 | 1.56 (1.44–1.68) | 1.55 (1.43–1.69) | 2.39 (1.47–3.86) | 3.8×10^−28 | 0.29 |
| rs2928679 | 8 / SLC25A37 | C/T | 23438975 | 0.48 | 1.04 (1.01–1.07) | 1.03 (0.97–1.09) | 1.08 (1.01–1.16) | 0.02 | 0.10 |
| rs1512268 | 8 / NKX3.1 | G/A | 23526463 | 0.43 | 1.13 (1.10–1.17) | 1.13 (1.08–1.19) | 1.29 (1.21–1.37) | 2.6×10^−16 | 0.19 |
| rs4962416 | 10 / CTBP2 | A/G | 126696872 | 0.28 | 1.04 (1.01–1.08) | 1.03 (0.98–1.08) | 1.11 (1.02–1.21) | 0.02 | 0.68 |
| rs10993994 | 10 / MSMB | G/A | 51549496 | 0.39 | 1.24 (1.20–1.28) | 1.21 (1.15–1.27) | 1.56 (1.46–1.66) | 7.9×10^−41 | 0.36 |
| rs7931342 | 11 / Unknown | C/A | 68994497 | 0.50 | 0.84 (0.81–0.86) | 0.86 (0.82–0.91) | 0.70 (0.65–0.74) | 4.8×10^−27 | 0.86 |
| rs7127900 | 11 / Unknown | G/A | 2233574 | 0.19 | 1.23 (1.18–1.28) | 1.24 (1.18–1.30) | 1.47 (1.32–1.65) | 6.3×10^−26 | 0.63 |
| rs4430796 | 17 / HNF1B | A/G | 36098040 | 0.48 | 0.81 (0.79–0.84) | 0.81 (0.77–0.85) | 0.66 (0.62–0.71) | 2.7×10^−38 | 0.79 |
| rs11649743 | 17 / HNF1B | G/A | 36074979 | 0.19 | 0.88 (0.85–0.92) | 0.88 (0.83–0.92) | 0.79 (0.70–0.90) | 5.6×10^−10 | 0.25 |
| rs1859962 | 17 / Unknown | T/G | 69108753 | 0.48 | 1.17 (1.14–1.21) | 1.22 (1.15–1.28) | 1.38 (1.30–1.47) | 3.7×10^−24 | 0.19 |
| rs2735839 | 19 / KLK2/KLK3 | G/A | 51364623 | 0.15 | 0.81 (0.77–0.85) | 0.82 (0.78–0.86) | 0.62 (0.53–0.73) | 1.1×10^−19 | 0.06 |
| rs5759167 | 22 / BIL/TTLL1 | G/T | 43500212 | 0.50 | 0.84 (0.82–0.87) | 0.83 (0.79–0.87) | 0.71 (0.67–0.76) | 3.4×10^−28 | 0.87 |
| rs5945619 | X / NUDT11 | T/C | 51241672 | 0.36 | 1.13 (1.10–1.16) | – | 1.28 (1.22–1.35) | 1.9×10^−20 | 0.10 |

^a dbSNP rs number.

^b Major/minor allele, based on the frequencies in controls in the PRACTICAL phase III data.

^c Build 37 position.

^d MAF in controls in the combined European dataset.

^e OR = odds ratio (minor allele) from a logistic regression using all European samples, stratified by study, with no adjustment.

^f OR in heterozygotes, relative to major allele homozygotes.

^g OR in minor allele homozygotes, relative to major allele homozygotes.

^h Cochran-Armitage test for trend.

^i Heterogeneity P-value among studies.

Gleason score was available for 15,107 (74.5%) of the cases used in the analyses; of these, 2,139 had a score of 8 or higher and 12,968 had a score below 8. One SNP, rs1447295 on chromosome 8, showed a larger effect size with increasing grade (P=0.001), while four SNPs (rs17021918, rs1512268, rs7127900 and rs2735839) showed larger effect sizes with decreasing grade (P<0.02; Supplementary Table 3).

Thirteen of the SNPs (rs1465618, rs7679673, rs10486567, rs1447295, rs6983267, rs16901979, rs10993994, rs7931342, rs7127900, rs4430796, rs11649743, rs1859962 and rs5759167) showed a higher per-allele OR for cases with a family history of PrCa than for those without (P<0.05). No SNP showed a difference in the opposite direction, consistent with predictions under a polygenic model (17) (Supplementary Table 3).

Data on serum PSA level were available for 3,922 controls from 6 studies. Six SNPs (rs1447295, rs6983267, rs1512268, rs10993994, rs7127900 and rs2735839) were associated with PSA concentration at P<0.03. For rs1447295, the association with PSA was in the opposite direction to the PrCa risk association, whereas for the other five SNPs it was in the same direction (Supplementary Table 4).

Seven SNPs (rs1465618, rs12621278, rs10993994, rs7127900, rs1859962, rs2735839 and rs5945619) showed evidence of a trend in the per-allele ORs with age; in each case the effect size was larger for cases diagnosed at younger ages (Supplementary Table 5).

The combined effect of all pairs of SNPs was evaluated through a logistic regression model that included each pair of SNPs and an interaction term. The interaction term was significant at the P<0.05 level for 29 pairs (out of 300 possible pairs), compared with 15 expected by chance, and at the P<0.01 level for 12 pairs, compared with 3 expected by chance. However, no pair was significant at the P<0.05 level after a Bonferroni correction for the number of tests (nominal significance threshold P=1.6×10^−4; Supplementary Table 6).
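The expected counts and the Bonferroni threshold quoted above follow from 25 × 24 / 2 = 300 SNP pairs. The sketch below reproduces that arithmetic. As a rough gauge of the excess, it also computes binomial tail probabilities for the observed counts. The binomial calculation assumes the 300 tests are independent, which is only an approximation because pairs that share a SNP are correlated, and it was not part of the original analysis.

```python
import math


def binomial_upper_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_snps = 25
n_pairs = math.comb(n_snps, 2)          # 300 pairwise tests
bonferroni_threshold = 0.05 / n_pairs   # about 1.67e-4
expected_at_05 = n_pairs * 0.05         # 15 pairs expected at P<0.05 under the null
expected_at_01 = n_pairs * 0.01         # 3 pairs expected at P<0.01 under the null

# Rough gauge of the excess, ASSUMING the 300 tests are independent
# (they are not: pairs sharing a SNP are correlated).
p_excess_05 = binomial_upper_tail(n_pairs, 0.05, 29)   # 29 pairs observed
p_excess_01 = binomial_upper_tail(n_pairs, 0.01, 12)   # 12 pairs observed

print(n_pairs, f"{bonferroni_threshold:.2e}", f"{expected_at_05:.0f}", f"{expected_at_01:.0f}")
print(f"P(>=29 at 0.05) = {p_excess_05:.1e}, P(>=12 at 0.01) = {p_excess_01:.1e}")
```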

Under the assumption that these 25 SNPs combined approximately multiplicatively to alter the risk of PrCa, we constructed a PRS for 16,643 cases and 18,343 controls based on the estimated per-allele ORs of the 25 SNPs, standardised by the standard deviation in controls. The standardised PRS had a mean of 0.651 (range −3.81 to 5.36; SD 0.98) in cases and a mean of 0.104 (range −4.05 to 4.15; SD 1) in controls. The standardised PRS was strongly associated with disease risk (OR per unit PRS 1.74, 95% CI 1.70–1.78). The OR per unit increase in the standardised PRS declined with age, from 1.76 (95% CI 1.62–1.92) in cases diagnosed before age 55 to 1.48 (95% CI 1.37–1.60) in cases diagnosed at age 70+ (P=2.6×10^−4, Supplementary Table 5).

The OR per unit increase in PRS was larger for men with a family history of PrCa (1.79 vs. 1.70; P=1.8×10^−4, Supplementary Table 3). We found no evidence of an interaction between PRS and family history (P=0.49) or between age at diagnosis and family history (P=0.11), but there was some evidence for an interaction between PRS and age at diagnosis (P=0.003).

There was no evidence of a difference in the OR per unit PRS according to Gleason score (OR=1.75 for GS<8 vs. OR=1.65 for GS≥8) after adjusting for age at diagnosis and family history (P=0.37; Supplementary Table 3). The correlation between PSA and the PRS was weak, both in controls (correlation=0.09) and in cases (correlation=0.02).

When the PRS was categorised by percentile, the top 1% of the population had an estimated OR of 30.6 (95% CI 16.4–57.3) compared with the bottom 1%, and an OR of 4.2 (95% CI 3.2–5.5) compared with the median population risk (defined as the 25–75% risk group). The bottom 1% of the population had an estimated OR of 0.14 (95% CI 0.08–0.24) compared with the median risk (Table 3). After allowing for an interaction between PRS and age, the OR for the top 1% of the population, relative to the median risk group, decreased from 5.6 for men aged <55 years to 3.8 for men aged 70+ years (Supplementary Tables 7 and 8). There was no difference in fit between the model with the PRS as a continuous covariate and the model with separate parameters for percentiles of the PRS (P=0.24). In particular, the ORs for the top 1% and the bottom 1% of the population predicted by a log-linear model did not differ from those observed.
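The comparison with a log-linear model can also be illustrated analytically. Assume the standardised PRS Z is normally distributed with unit variance in controls and that the log OR is linear in Z with the reported OR per unit of 1.74. The mean relative risk within a percentile band then follows from the identity E[e^{βZ}; a<Z<b] = e^{β²/2}[Φ(b−β) − Φ(a−β)]. The sketch below uses this identity to compute model-predicted ORs for each percentile group relative to the 25–75% group. The normality assumption and the function names are illustrative; the analysis itself compared fitted models using a likelihood ratio test.

```python
import math


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_ppf(prob, lo=-10.0, hi=10.0):
    """Inverse standard normal CDF by bisection (adequate for a sketch)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if normal_cdf(mid) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def band_mean_risk(beta, lower_pct, upper_pct):
    """E[exp(beta*Z) | Z in percentile band] / exp(beta**2 / 2), for Z ~ N(0, 1)."""
    a = normal_ppf(lower_pct / 100) if lower_pct > 0 else -math.inf
    b = normal_ppf(upper_pct / 100) if upper_pct < 100 else math.inf
    return (normal_cdf(b - beta) - normal_cdf(a - beta)) / (normal_cdf(b) - normal_cdf(a))


def predicted_or(or_per_sd, band, reference=(25, 75)):
    """Model-predicted OR for a percentile band relative to the reference band."""
    beta = math.log(or_per_sd)
    return band_mean_risk(beta, *band) / band_mean_risk(beta, *reference)


BANDS = [(0, 1), (1, 10), (10, 25), (25, 75), (75, 90), (90, 99), (99, 100)]
for band in BANDS:
    print(band, round(predicted_or(1.74, band), 2))
```

Under these assumptions, the sketch gives about 4.35 for the top 1% (observed 4.24) and about 0.23 for the bottom 1% (observed 0.14, 95% CI 0.08–0.24), both relative to the 25–75% group.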

Table 3.

Odds ratios for PrCa by percentile of the PRS and family history.

| PRS group (percentile) | OR^a,b (95% CI) | OR^a,c (95% CI) |
|---|---|---|
| <1% | 1 (baseline) | 0.14 (0.08–0.24) |
| 1–10% | 2.98 (1.66–5.35) | 0.41 (0.36–0.47) |
| 10–25% | 4.59 (2.58–8.17) | 0.63 (0.57–0.70) |
| 25–75% | 7.23 (4.08–12.80) | 1 (baseline) |
| 75–90% | 12.13 (6.83–21.54) | 1.68 (1.54–1.83) |
| 90–99% | 16.70 (9.38–29.72) | 2.31 (2.09–2.56) |
| ≥99% | 30.63 (16.36–57.34) | 4.24 (3.24–5.53) |
| Family history | 2.52 (2.29–2.78) | 2.52 (2.29–2.78) |

^a ORs obtained by fitting PRS group, family history and age at diagnosis jointly.

^b ORs compared to men in the 1st percentile as baseline.

^c ORs compared to men in the 25th–75th percentile as baseline.

To estimate the absolute risk of PrCa for different risk groups defined by the combined genotypes at the 25 PrCa susceptibility loci, we fitted a logistic regression model that included parameters for the PRS (in 7 categories) together with family history of PrCa. The model was fitted once with (Supplementary Table 7) and once without (Table 3) a PRS × age at diagnosis interaction term. We used both models (adjusted for age at diagnosis and family history) to estimate effect sizes for the PRS. We then used the UK age-specific incidences of PrCa (0 to 85+ years) (14) to estimate age-specific absolute risks of PrCa in the general population, allowing for competing causes of death. These risks were estimated for fourteen risk groups defined by PRS and family history (seven PRS risk groups × two family history groups; see Methods). Based on this analysis, the absolute risk of PrCa by age 85 was 65.8% (67.1% in a model not allowing for interaction) for a man with a family history of PrCa in the top 1% of the risk distribution, and 3.65% (3.67% in a model not allowing for interaction) for a man with a family history in the lowest 1%. The absolute risk for a man with no family history of PrCa was 35.0% (36.1% in a model not allowing for interaction) in the top 1% of the risk distribution and 1.46% (1.47% in a model not allowing for interaction) in the lowest 1%. By comparison, the estimated absolute risk for a man in the 25–75% category was 10.2% in the absence of a family history of PrCa, and 23.7% for a man with a family history (Figures 1 and 2, Supplementary Figures 1 and 2).

Figure 1. Absolute risk of PrCa by age in men with family history.

Figure 2. Absolute risk of PrCa by age in men with no family history.

Discussion

These results demonstrate that risk profiling based on SNPs can identify men at substantially increased or reduced risk of PrCa. We derived a PRS based on a sum of SNP genotypes, weighted by their per-allele log ORs. The estimated ORs for the highest and lowest 1% of the population (4.2 and 0.14, respectively) were consistent with those predicted under a simple polygenic model in which the log OR increases linearly with the PRS. We also showed that the effect size, measured by OR per unit PRS, was higher at younger ages. As expected, the majority of loci, and the PRS, showed a stronger effect for familial cases. In a logistic regression model, both PRS and family history were independently associated with PrCa risk. The OR due to family history was attenuated after adjustment for the PRS (from 2.63 to 2.50), as expected given that family history is, at least in part, a reflection of genetic susceptibility. However, the degree of attenuation (5% on a log-scale) was markedly less than 18%, the estimated contribution of these 25 loci to the familial risk of PrCa estimated based on their ORs and allele frequencies in this study (see methods). The reason for this difference is unclear but might reflect interactions between the known susceptibility loci summarised in the PRS and other factors influencing family history.
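The 5% attenuation quoted above appears to be measured on the log-OR scale, that is, as one minus the ratio of the adjusted to the unadjusted log OR for family history. A two-line check with the ORs given in the text is consistent with that reading:

```python
import math

or_unadjusted, or_adjusted = 2.63, 2.50   # family-history OR before / after PRS adjustment
attenuation = 1 - math.log(or_adjusted) / math.log(or_unadjusted)
print(f"attenuation on the log scale: {attenuation:.1%}")   # about 5%
```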

To investigate the added value of the PRS, we first estimated the absolute risk for men with a family history without using their PRS information, and then repeated the same procedure after adding their PRS information. The absolute risk of PrCa by age 85 for a man with a family history was estimated to be 26.5% when PRS information was ignored. When PRS information was incorporated, the absolute risk by age 85 for such a man ranged, depending on his PRS risk group, from 3.67% (bottom 1% of the risk distribution) to 67.1% (top 1% of the risk distribution; Supplementary Figures 1 and 3). These observations indicate that family history and the PRS independently influence risk and can be combined to provide stronger discrimination.

Chatterjee et al. derived theoretical estimates of the predictive performance of polygenic models for ten complex traits or common diseases, including PrCa, using published estimates for individual SNPs (18). They estimated that ~7% of the population would be at twofold risk or greater for PrCa. We estimated, empirically, that the (average) risk to men in the 90–99% category of the PRS was 2.41-fold relative to the population median, or approximately 2-fold relative to the population mean. However, this is an average risk over the 90–99% category, so the percentile of the PRS at which the risk exceeds twofold will be above the 90th. Based on the estimated log(OR) per standardised PRS, approximately 6% of men will have a risk greater than twofold, very close to the estimate of Chatterjee et al. (18).
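The ~6% figure can be reproduced under simple assumptions. The first is that the standardised PRS Z is normal with unit variance. The second is that relative risk is exp(βZ) with OR 1.74 per unit. The third is that risk is expressed relative to the population mean of exp(βZ), which for a normal Z equals exp(β²/2). A man's risk then exceeds twofold when βZ − β²/2 > log 2. These assumptions and the function names are illustrative.

```python
import math


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def fraction_above_fold(or_per_sd, fold=2.0):
    """Fraction of men whose risk exceeds `fold` x the population-average risk,
    assuming Z ~ N(0, 1) and relative risk exp(beta * Z)."""
    beta = math.log(or_per_sd)
    # population mean of exp(beta * Z) is exp(beta**2 / 2)
    z_threshold = (math.log(fold) + beta ** 2 / 2) / beta
    return 1.0 - normal_cdf(z_threshold)


print(f"{fraction_above_fold(1.74):.1%} of men above twofold risk")
```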

These results show that genetic risk profiling using SNPs could be useful in defining men at high risk for the disease for targeted prevention and screening programs. The benefits of screening, relative to the costs, will be most favourable among men at higher risk. If, for example, the benefit-cost ratio is favourable for screening men at a greater than two-fold risk, the PRS provides an effective method for identifying such men.

While these analyses demonstrate the value of SNPs for risk prediction, a risk model could be improved in various ways. The analyses presented here are based on the first 25 loci identified as associated with PrCa. Recently, however, additional loci have been identified (13, 19), and more than 100 common susceptibility loci are now known. In total, these loci increase the estimated proportion of the familial risk explained to 33% (19). Incorporating all known loci into a PRS should improve the predictive value of risk profiles.

Additionally, the analyses presented here consider family history as a binary (yes/no) covariate. It is known that the risk of PrCa depends on both the number of affected relatives and their ages. MacInnis et al. (12, 20) have shown using segregation analysis that the familial aggregation of PrCa can be modelled as the combined effect of a recessive allele and a polygenic component. They further showed that the polygenic component can be partitioned into a component due to measured SNPs and an unmeasured component. This approach should provide more powerful prediction, particularly in families with multiple cases of the disease. Finally, it is known that serum or urine PSA level is associated with PrCa risk, with the association persisting for several decades. Although some of the risk SNPs are also related to PSA level in the expected direction, PSA level is only weakly correlated with the PRS. This indicates that incorporating PSA level, and potentially other markers such as MSMB (21), into a risk algorithm should further improve discrimination (22).

The absence of clear differences in the relative risks associated with the SNPs by disease aggressiveness, even in this very large study, is striking. We found no convincing evidence that the predictive value of the PRS differs by disease aggressiveness: the effect size was slightly higher for less aggressive disease, but the difference was small (1.75 vs. 1.65). This contrasts with the clear differences in SNP associations by disease pathology seen in other cancers, such as breast and ovarian cancer. It indicates that aggressive and non-aggressive disease, at least as defined by Gleason score, share a common aetiology with respect to these genetic risk factors.
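A simple way to judge whether two odds ratios such as 1.75 and 1.65 differ is a z-test on the difference of their logarithms. The sketch below is an illustration only. The odds ratios are those quoted above, but the standard errors are hypothetical because they are not given in this section, and the function name is illustrative. The test treats the two estimates as independent. When aggressive and non-aggressive cases are compared with the same controls, the estimates are positively correlated, so this version is conservative. A polytomous or case-only analysis would be the more appropriate formal test.

```python
import math
from statistics import NormalDist


def or_heterogeneity_test(or_a: float, se_log_a: float,
                          or_b: float, se_log_b: float):
    """Two-sided z-test for a difference between two odds ratios.

    Treats the two log-OR estimates as independent; with shared controls they
    are positively correlated, so this version overstates the SE (conservative).
    Returns (z statistic, two-sided p-value).
    """
    diff = math.log(or_a) - math.log(or_b)
    se = math.sqrt(se_log_a ** 2 + se_log_b ** 2)
    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# ORs from the text; the standard errors of log OR are hypothetical.
z, p = or_heterogeneity_test(1.75, 0.03, 1.65, 0.04)
```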

Analysis of pairwise combinations of SNPs did not identify any clear departure from a multiplicative model after adjustment for multiple testing. We did, however, find more interactions significant at the P<0.01 level than would be expected by chance. This suggests that interactions on the multiplicative scale are likely to exist but that their effect sizes are small, and that very large sample sizes, such as those in this collaborative study, will be required to identify and characterise them. If such interactions can be identified reliably, they may improve the predictive value of risk profiling and also provide insights into the biological interactions between the underlying risk variants.
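The multiple-testing arithmetic behind this paragraph can be sketched as follows. The sketch assumes that all 25 SNPs were tested pairwise, giving C(25, 2) = 300 pairs. At P<0.01 about 3 significant pairs would then be expected by chance, and a Bonferroni-corrected threshold for a family-wise level of 0.05 would be 0.05/300. The binomial tail probability gives a rough measure of how surprising an observed excess is. It is only approximate, because pairs that share a SNP are not independent tests. The observed count used below is hypothetical, and the function names are illustrative.

```python
from math import comb


def pairwise_interaction_summary(n_snps: int, alpha: float = 0.01,
                                 family_alpha: float = 0.05):
    """Number of SNP pairs, expected chance hits at `alpha`, and a Bonferroni threshold."""
    n_pairs = comb(n_snps, 2)
    expected = n_pairs * alpha
    bonferroni = family_alpha / n_pairs
    return n_pairs, expected, bonferroni


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p); treats the pairwise tests as independent."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_pairs, expected, bonferroni = pairwise_interaction_summary(25)
observed = 8  # hypothetical count of pairs with P < 0.01, for illustration only
p_excess = binomial_upper_tail(observed, n_pairs, 0.01)
```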

Supplementary Material

1
2
3
4
5

Acknowledgments

We thank all the patients and control men who took part in this study. We also thank the members of the studies that participated in this phase of the PRACTICAL consortium: Esther John, Amit Joshi and Ahva Shahabi, Joanne L Dickinson, James R. Marthick, Mariana C. Stern, Roman Corral, David M A Wallace, Alan Doherty, R I Bhatt, K Subramonian, John Arrand, Louise Flanagan, Sita Ann Bradley, The UK Genetic Prostate Cancer Study Collaborators (www.icr.ac.uk/ukgpcs), Prasad Bollina, Sue Bonnington, Lynne Bradshaw, James Catto, Debbie Cooper, Liz Down, Andrew Doble, Alan Doherty, Garrett Durkan, Emma Elliott, David Gillatt, Pippa Herbert, Peter Holding, Joanne Howson, Mandy Jones, Roger Kockelbergh, Rajeev Kumar, Peter Holding, Howard Kynaston, Athene Lane, Teresa Lennon, Norma Lyons, Hing Leung, Malcolm Mason, Hilary Moody, Philip Powell, Alan Paul, Stephen Prescott, Derek Rosario, Patricia O'Sullivan, Pauline Thompson, Lynne Bradshaw, Sarah Tidball, Paul M. Brown, Anne George, Gemma Marsden, Athene Lane, Michael Davis, Stephen Edwards, Cyril Fisher, Charles Jameson, Elizabeth Page, John Pedersen, Joanne Aitken, Robert A. Gardiner, Srilakshmi Srinivasan, Felicity Lose, Mary-Anne Kedda, Kimberly Alexander, Tracy O’Mara, Gail Risbridger, Wayne Tilley, Lisa Horvarth, Peter Heathcote, Glenn Wood, Greg Malone, Hema Samaratunga, Pamela Saunders, Allison Eckert, Trina Yeadon, Kris Kerr, Angus Collins, Megan Turner, Simon J. Foote, James R. Marthick, Andrea Polanowski, Rebekah M McWhirter, Terrence Dwyer, Christopher L. Blizzard, Elenko Popov, Darina Kachakova, Atanaska Mitkova, Teodora Goranova, Gergana Stancheva, Olga Beltcheva, Rumyana Dodova, Aleksandrina Vlahova, Tihomir Dikov, Svetlana Christova, Michael Borre, Peter Klarskov, Sune F Nielsen, Peter Iversen, Andreas Røder, Stig E Bojesen, Aida Karina Dieffenbach, Manuel Luedeke, Mark Schrader, Josef Hoegel, Walther Vogel, Liisa Määttänen, Teuvo Tammela, Anssi Auvinen, Lori Tillmans, Shaun Riska, Liang Wang, Dan Stram, Laurence N. Kolonel, Julio Pow-Sang, Hyun Y. Park, Selina Radlein, Maria Rincon, Babu Zachariah (Supplementary Notes).

Financial Support: D.F. Easton was the recipient of CR-UK grant C1287/A10118, R.A. Eeles was the recipient of CR-UK grant C5047/A10692, and B.E. Henderson was the recipient of NIH grant 1U19CA148537-01.

Footnotes

Conflict of Interest: There is no conflict of interest.

Reference List

  • 1. Pashayan N, Duffy SW, Chowdhury S, Dent T, Burton H, Neal DE, et al. Polygenic susceptibility to prostate and breast cancer: implications for personalised screening. Br J Cancer. 2011;104:1656–1663. doi: 10.1038/bjc.2011.118.
  • 2. Amin Al Olama A, Kote-Jarai Z, Giles GG, Guy M, Morrison J, Severi G, et al. Multiple loci on 8q24 associated with prostate cancer susceptibility. Nat Genet. 2009;41:1058–1060. doi: 10.1038/ng.452.
  • 3. Eeles RA, Kote-Jarai Z, Amin Al Olama A, Giles GG, Guy M, Severi G, et al. Identification of seven new prostate cancer susceptibility loci through a genome-wide association study. Nat Genet. 2009;41:1116–1121. doi: 10.1038/ng.450.
  • 4. Eeles RA, Kote-Jarai Z, Giles GG, Amin Al Olama A, Guy M, Jugurnauth SK, et al. Multiple newly identified loci associated with prostate cancer susceptibility. Nat Genet. 2008;40:316–321. doi: 10.1038/ng.90.
  • 5. Amundadottir LT, Sulem P, Gudmundsson J, Helgason A, Baker A, Agnarsson BA, et al. A common variant associated with prostate cancer in European and African populations. Nat Genet. 2006;38:652–658. doi: 10.1038/ng1808.
  • 6. Gudmundsson J, Sulem P, Manolescu A, Amundadottir LT, Gudbjartsson D, Helgason A, et al. Genome-wide association study identifies a second prostate cancer susceptibility variant at 8q24. Nat Genet. 2007;39:631–637. doi: 10.1038/ng1999.
  • 7. Gudmundsson J, Sulem P, Rafnar T, Bergthorsson JT, Manolescu A, Gudbjartsson D, et al. Common sequence variants on 2p15 and Xp11.22 confer susceptibility to prostate cancer. Nat Genet. 2008;40:281–283. doi: 10.1038/ng.89.
  • 8. Gudmundsson J, Sulem P, Steinthorsdottir V, Bergthorsson JT, Thorleifsson G, Manolescu A, et al. Two variants on chromosome 17 confer prostate cancer risk, and the one in TCF2 protects against type 2 diabetes. Nat Genet. 2007;39:977–983. doi: 10.1038/ng2062.
  • 9. Haiman CA, Patterson N, Freedman ML, Myers SR, Pike MC, Waliszewska A, et al. Multiple regions within 8q24 independently affect risk for prostate cancer. Nat Genet. 2007;39:638–644. doi: 10.1038/ng2015.
  • 10. Thomas G, Jacobs KB, Yeager M, Kraft P, Wacholder S, Orr N, et al. Multiple loci identified in a genome-wide association study of prostate cancer. Nat Genet. 2008;40:310–315. doi: 10.1038/ng.91.
  • 11. Kote-Jarai Z, Amin Al Olama A, Giles GG, Severi G, Schleutker J, Weischer M, et al. Seven prostate cancer susceptibility loci identified by a multi-stage genome-wide association study. Nat Genet. 2011;43:785–791. doi: 10.1038/ng.882.
  • 12. MacInnis RJ, Antoniou AC, Eeles RA, Severi G, Guy M, McGuffog L, et al. Prostate cancer segregation analyses using 4390 families from UK and Australian population-based studies. Genet Epidemiol. 2010;34:42–50. doi: 10.1002/gepi.20433.
  • 13. Eeles RA, Amin Al Olama A, Benlloch S, Saunders EJ, Leongamornlert DA, Tymrakiewicz M, et al. Identification of 23 new prostate cancer susceptibility loci using the iCOGS custom genotyping array. Nat Genet. 2013;45:385–391. doi: 10.1038/ng.2560.
  • 14. Ferlay J, Soerjomataram I, Dikshit R, Eser S, Mathers C, Rebelo M, et al. Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012. Int J Cancer. 2014. doi: 10.1002/ijc.29210. Epub 2014 Sep 16.
  • 15. Antoniou AC, Beesley J, McGuffog L, Sinilnikova OM, Healey S, Neuhausen SL, et al. Common breast cancer susceptibility alleles and the risk of breast cancer for BRCA1 and BRCA2 mutation carriers: implications for risk prediction. Cancer Res. 2010;70:9742–9754. doi: 10.1158/0008-5472.CAN-10-1907.
  • 16. Antoniou AC, Pharoah PD, McMullan G, Day NE, Ponder BA, Easton D. Evidence for further breast cancer susceptibility genes in addition to BRCA1 and BRCA2 in a population-based study. Genet Epidemiol. 2001;21(1):1–18. doi: 10.1002/gepi.1014.
  • 17. Antoniou AC, Easton DF. Polygenic inheritance of breast cancer: implications for design of association studies. Genet Epidemiol. 2003;25(3):190–202. doi: 10.1002/gepi.10261.
  • 18. Chatterjee N, Wheeler B, Sampson J, Hartge P, Chanock SJ, Park JH. Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies. Nat Genet. 2013;45(4):400–405. doi: 10.1038/ng.2579.
  • 19. Amin Al Olama A, Kote-Jarai Z, Berndt SI, Conti DV, Schumacher F, Han Y, et al. A meta-analysis of 87,040 individuals identifies 23 new susceptibility loci for prostate cancer. Nat Genet. 2014;46:1103–1109. doi: 10.1038/ng.3094. PMID: 25217961.
  • 20. MacInnis RJ, Antoniou AC, Eeles RA, Severi G, Amin Al Olama A, McGuffog L, et al. A risk prediction algorithm based on family history and common genetic variants: application to prostate cancer with potential clinical impact. Genet Epidemiol. 2011. doi: 10.1002/gepi.20605.
  • 21. Whitaker HC, Kote-Jarai Z, Ross-Adams H, Warren AY, Burge J, George A, et al. The rs10993994 risk allele for prostate cancer results in clinically relevant changes in microseminoprotein-beta expression in tissue and urine. PLoS One. 2010;5:e13363. doi: 10.1371/journal.pone.0013363.
  • 22. Aly M, Wiklund F, Xu J, Isaacs WB, Eklund M, D'Amato M, et al. Polygenic risk score improves prostate cancer risk prediction: results from the Stockholm-1 cohort study. Eur Urol. 2011;60:21–28. doi: 10.1016/j.eururo.2011.01.017.
