This is a preprint.

It has not yet been peer reviewed by a journal.

medRxiv [Preprint]. 2023 May 15:2023.05.12.23289860. [Version 1] doi: 10.1101/2023.05.12.23289860

Evaluating Approaches for Constructing Polygenic Risk Scores for Prostate Cancer in Men of African and European Ancestry

Burcu F Darst 1,2,*,¥, Jiayi Shen 1,¥, Ravi K Madduri 3, Alexis A Rodriguez 3, Yukai Xiao 3, Xin Sheng 1, Edward J Saunders 4, Tokhir Dadaev 4, Mark N Brook 4, Thomas J Hoffmann 5, Kenneth Muir 6, Peggy Wan 1, Loic Le Marchand 7, Lynne Wilkens 7, Ying Wang 8, Johanna Schleutker 9, Robert J MacInnis 10,11, Cezary Cybulski 12, David E Neal 13,14,15, Børge G Nordestgaard 16,17, Sune F Nielsen 16,17, Jyotsna Batra 18,19, Judith A Clements 18,19; Australian Prostate Cancer BioResource 19,20, Henrik Grönberg 21, Nora Pashayan 22,23, Ruth C Travis 24, Jong Y Park 25, Demetrius Albanes 26, Stephanie Weinstein 26, Lorelei A Mucci 27, David J Hunter 28, Kathryn L Penney 29, Catherine M Tangen 30, Robert J Hamilton 31,32, Marie-Élise Parent 33,34, Janet L Stanford 2,35, Stella Koutros 26, Alicja Wolk 36,37, Karina D Sørensen 38,39, William J Blot 40,41, Edward D Yeboah 42,43, James E Mensah 42,43, Yong-Jie Lu 44, Daniel J Schaid 45, Stephen N Thibodeau 46, Catharine M West 47, Christiane Maier 48, Adam S Kibel 49, Géraldine Cancel-Tassin 50,51, Florence Menegaux 52, Esther M John 53,54,55, Eli Marie Grindedal 56, Kay-Tee Khaw 57, Sue A Ingles 58, Ana Vega 59,60,61, Barry S Rosenstein 62,63, Manuel R Teixeira 64,65; NC-LA PCaP Investigators 66,67,68, Manolis Kogevinas 69,70,71,72, Lisa Cannon-Albright 73,74, Chad Huff 75, Luc Multigner 76, Radka Kaneva 77, Robin J Leach 78, Hermann Brenner 79,80,81, Ann W Hsing 82, Rick A Kittles 83, Adam B Murphy 84, Christopher J Logothetis 85, Susan L Neuhausen 86, William B Isaacs 87, Barbara Nemesure 88, Anselm J Hennis 88,89, John Carpten 90, Hardev Pandha 91, Kim De Ruyck 92, Jianfeng Xu 93, Azad Razack 94, Soo-Hwang Teo 95; Canary PASS Investigators 2,96, Lisa F Newcomb 2,96, Jay H Fowke 97, Christine Neslund-Dudas 98, Benjamin A Rybicki 98, Marija Gamulin 99, Nawaid Usmani 100,101, Frank Claessens 102, Manuela Gago-Dominguez 103,104, Jose Esteban Castelao 105, Paul A Townsend 106,107, Dana C Crawford 108, Gyorgy Petrovics 109, Graham Casey 110, Monique J Roobol 111, Jennifer F Hu 112, Sonja I Berndt 26, Stephen K Van Den Eeden 113,114, Douglas F Easton 23, Stephen J Chanock 26, Michael B Cook 26, Fredrik Wiklund 21, John S Witte 115, Rosalind A Eeles 116,117, Zsofia Kote-Jarai 116, Stephen Watya 118, John M Gaziano 119,120,121, Amy C Justice 122,123, David V Conti 1, Christopher A Haiman 1
PMCID: PMC10246022  PMID: 37292833

Abstract

Genome-wide polygenic risk scores (GW-PRS) have been reported to have better predictive ability than PRS based on genome-wide significance thresholds across numerous traits. We compared the predictive ability of several GW-PRS approaches to a recently developed PRS of 269 established prostate cancer risk variants from multi-ancestry GWAS and fine-mapping studies (PRS269). GW-PRS models were trained using a large and diverse prostate cancer GWAS of 107,247 cases and 127,006 controls used to develop the multi-ancestry PRS269. Resulting models were independently tested in 1,586 cases and 1,047 controls of African ancestry from the California/Uganda Study and 8,046 cases and 191,825 controls of European ancestry from the UK Biobank and further validated in 13,643 cases and 210,214 controls of European ancestry and 6,353 cases and 53,362 controls of African ancestry from the Million Veteran Program. In the testing data, the best performing GW-PRS approach had AUCs of 0.656 (95% CI=0.635–0.677) in African and 0.844 (95% CI=0.840–0.848) in European ancestry men and corresponding prostate cancer OR of 1.83 (95% CI=1.67–2.00) and 2.19 (95% CI=2.14–2.25), respectively, for each SD unit increase in the GW-PRS. However, compared to the GW-PRS, in African and European ancestry men, the PRS269 had larger or similar AUCs (AUC=0.679, 95% CI=0.659–0.700 and AUC=0.845, 95% CI=0.841–0.849, respectively) and comparable prostate cancer OR (OR=2.05, 95% CI=1.87–2.26 and OR=2.21, 95% CI=2.16–2.26, respectively). Findings were similar in the validation data. This investigation suggests that current GW-PRS approaches may not improve the ability to predict prostate cancer risk compared to the multi-ancestry PRS269 constructed with fine-mapping.


Prostate cancer is the second leading cause of cancer deaths among men in the US, with incidence rates being highest in men of African ancestry1,2. Earlier identification of men with increased risk of prostate cancer across diverse populations has the potential to reduce the stark health disparities of this disease. We recently performed a large and diverse genome-wide association study (GWAS) of prostate cancer in men from African, European, East Asian, and Hispanic populations3. By performing ancestry-specific and multi-ancestry GWAS and fine-mapping analyses, this investigation revealed 269 GWAS-defined prostate cancer risk variants used to develop a multi-ancestry polygenic risk score (PRS269). The PRS269 was highly predictive of prostate cancer risk across populations3 and has since been validated in additional independent multi-ancestry studies4. However, genome-wide PRS (GW-PRS) approaches, which include variants across the genome that do not reach genome-wide statistical significance thresholds, have been reported to have better predictive performance than standard pruning and thresholding PRS of known variants across numerous complex traits, including schizophrenia, coronary artery disease, atrial fibrillation, type 2 diabetes, inflammatory bowel disease, breast cancer, and colorectal cancer5–9.

In this investigation, we compared the predictive ability of GW-PRS for prostate cancer to the multi-ancestry PRS269 of established prostate cancer risk variants. GW-PRS models were trained using summary statistics from the studies used to construct the multi-ancestry PRS269, which included 107,247 cases and 127,006 controls from European (85,554 cases and 91,972 controls), African (10,368 cases and 10,986 controls), East Asian (8,611 cases and 18,809 controls), and Hispanic (2,714 cases and 5,239 controls) populations3. Three recent GW-PRS approaches were evaluated: LDpred2 (ref. 10), PRS-CSx (ref. 11), and EB-PRS (ref. 12), using the 1.1 million HapMap3 panel variants13 recommended by these approaches. This panel included 44 of the 269 variants; every other autosomal prostate cancer risk variant was within 800 kb of, and correlated with (median r=0.99, ranging from 0.31–1.00 in any given population), at least one of the 1.1 million HapMap3 variants (Supplemental Methods and Tables S1–S2). For comparison, each model was trained using previously estimated multi-ancestry weights and population-specific weights from the GWAS summary statistics3. GW-PRS models were tested in African ancestry men from the California/Uganda Study (CA/UG Study; 1,586 cases and 1,047 controls) and European ancestry men from the UK Biobank (8,046 cases and 191,825 controls; Supplemental Methods). Additional validation was performed in 6,353 cases and 53,362 controls of African ancestry and 13,643 cases and 210,214 controls of European ancestry from the Million Veteran Program14 (MVP; Supplemental Methods).
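As an illustration of what a set of PRS "weights" does, the following minimal sketch computes a score as a weighted sum of risk-allele dosages and then rescales it to SD units. It is not the authors' pipeline: the function names, the toy genotypes, and the toy weights are hypothetical, and treating the weights as per-allele log odds ratios and standardizing against a reference sample are assumptions made for the example.

```python
from statistics import mean, pstdev

def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0 to 2) for one individual.

    Assumption: each weight is a per-allele log odds ratio.
    """
    if len(dosages) != len(weights):
        raise ValueError("one weight per variant is required")
    return sum(d * w for d, w in zip(dosages, weights))

def standardize(scores, reference):
    """Express scores in SD units of a reference distribution."""
    mu = mean(reference)
    sd = pstdev(reference)
    return [(s - mu) / sd for s in scores]

# Hypothetical toy data (not from the study): 3 variants, 4 men.
weights = [0.10, 0.25, -0.05]
genotypes = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [2, 2, 0],
]

raw = [polygenic_risk_score(g, weights) for g in genotypes]
z = standardize(raw, raw)  # here the sample is its own reference
```

Under this view, the PRS269 and the GW-PRS approaches differ in which variants enter the sum (269 versus roughly 1.1 million) and in how the weights are estimated, not in the form of the score itself.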

In the CA/UG and UK Biobank testing datasets, the best performing GW-PRS approach was PRS-CSx with multi-ancestry weights, with an area under the curve (AUC) of 0.656 (95% CI=0.635–0.677) in African and 0.844 (95% CI=0.840–0.848) in European ancestry men (Supplemental Methods, Figure 1, and Table S3). Each SD unit increase in PRS was associated with 1.83-fold higher odds of prostate cancer (95% CI=1.67–2.00) in men of African ancestry and 2.19-fold higher odds of prostate cancer (95% CI=2.14–2.25) in men of European ancestry (Supplemental Methods, Figure 1, and Table S4). However, compared to PRS-CSx, the PRS269 had higher or nearly identical AUCs in both African (0.679, 95% CI=0.659–0.700) and European (0.845, 95% CI=0.841–0.849) ancestry men, and the PRS269 was associated with 2.05-fold higher odds (95% CI=1.87–2.26) and 2.21-fold higher odds (95% CI=2.16–2.26) of prostate cancer in African and European ancestry men, respectively (Figure 1, Table S3, and Table S4). Findings were consistent when investigating extreme PRS distributions, with similar prostate cancer OR observed for the PRS269 and the best performing GW-PRS (PRS-CSx) when comparing African and European ancestry men in the highest PRS decile (90–100%) to those in the average 40–60% PRS category (Supplemental Methods, Figure S1, and Table S5).
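The two headline metrics can be sketched in a few lines. The example below is an assumption-laden illustration, not the analysis code used in the study: it computes the AUC as the probability that a randomly chosen case has a higher PRS than a randomly chosen control, and it estimates the OR per SD from an unadjusted one-covariate logistic regression fitted by Newton-Raphson. Any covariate adjustment described in the Supplemental Methods is omitted, and all names and data are hypothetical.

```python
import math

def auc(case_scores, control_scores):
    """P(random case scores higher than random control); ties count 1/2."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

def odds_ratio_per_sd(z_scores, is_case, iterations=25):
    """Fit logit P(case) = b0 + b1 * z by Newton-Raphson; return exp(b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for z, y in zip(z_scores, is_case):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * z)))
            w = p * (1.0 - p)
            g0 += y - p          # gradient with respect to b0
            g1 += (y - p) * z    # gradient with respect to b1
            h00 += w             # entries of the information matrix
            h01 += w * z
            h11 += w * z * z
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return math.exp(b1)

# Hypothetical PRS values, assumed to be already in SD units.
z_scores = [-1.5, -0.5, 0.0, 0.5, 1.0, -1.0, 0.2, 0.8, 1.5, 2.0]
is_case = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

cases = [z for z, y in zip(z_scores, is_case) if y == 1]
controls = [z for z, y in zip(z_scores, is_case) if y == 0]

toy_auc = auc(cases, controls)
toy_or = odds_ratio_per_sd(z_scores, is_case)
```

The pairwise AUC is quadratic in sample size, so at the scale of the UK Biobank or MVP a rank-based computation would be preferable; the quadratic form is kept here only because it mirrors the definition directly.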

Figure 1.

Comparison of PRS performance in the CA/UG Study and the UK Biobank testing data. PRS performance is evaluated using area under the curve (AUC) estimated in men of A) African and C) European ancestry and OR of prostate cancer for each SD increase in PRS in men of B) African and D) European ancestry.

Similarly, in the validation MVP study, the best performing GW-PRS approach was PRS-CSx with multi-ancestry weights; however, the PRS269 performed either better or similarly with regard to AUC (AUC=0.656 [95% CI=0.649–0.663] versus AUC=0.624 [95% CI=0.617–0.632] in African ancestry men; AUC=0.694 [95% CI=0.690–0.699] versus AUC=0.692 [95% CI=0.687–0.696] in European ancestry men; Figure 2 and Table S3). Likewise, the PRS269 was associated with prostate cancer OR that were comparable to or larger than the OR estimated with PRS-CSx (OR=1.77, 95% CI=1.72–1.82 versus OR=1.59, 95% CI=1.54–1.63 in African ancestry men; OR=1.99, 95% CI=1.95–2.02 versus OR=1.97, 95% CI=1.93–2.01 in European ancestry men; Figure 2 and Table S4). OR calculated for African and European men in the top PRS decile were also comparable across the PRS269 and PRS-CSx (Figure S2 and Table S5). In the testing and validation datasets, model performance was similar for both PRS269 and GW-PRS approaches when using either multi-ancestry or population-specific weights (Figures 1–2, Figures S1–S2, and Tables S3–S5).
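The top-decile comparison can be illustrated with a crude, unadjusted calculation: men in the highest PRS decile (90–100%) are contrasted with men in the 40–60% category. The sketch below is hypothetical throughout. In particular, defining the percentile cut points from the distribution in controls is an assumption made for the example, and the function name and data are invented; the study's actual definition is in the Supplemental Methods.

```python
from statistics import quantiles

def decile_odds_ratio(case_scores, control_scores):
    """Crude OR for the top PRS decile versus the 40-60% category.

    Assumption: percentile cut points come from the control distribution.
    """
    cuts = quantiles(control_scores, n=10)  # 9 cut points: 10%, ..., 90%
    p40, p60, p90 = cuts[3], cuts[5], cuts[8]

    def count(scores, lo, hi):
        return sum(1 for s in scores if lo <= s < hi)

    top_cases = count(case_scores, p90, float("inf"))
    top_controls = count(control_scores, p90, float("inf"))
    mid_cases = count(case_scores, p40, p60)
    mid_controls = count(control_scores, p40, p60)
    return (top_cases / top_controls) / (mid_cases / mid_controls)

# Hypothetical toy scores (not from the study).
control_scores = list(range(1, 21))
case_scores = [5, 9, 11, 15, 19, 19.5, 20, 21]

toy_decile_or = decile_odds_ratio(case_scores, control_scores)
```

With real data, a regression model with the PRS category as a factor would usually replace this two-by-two calculation so that confidence intervals and covariate adjustment are available.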

Figure 2.

Comparison of PRS performance in the MVP validation data. PRS performance is evaluated using area under the curve (AUC) estimated in men of A) African and C) European ancestry and OR of prostate cancer for each SD increase in PRS in men of B) African and D) European ancestry.

Findings from this investigation suggest that current GW-PRS approaches do not outperform the multi-ancestry PRS269 for overall prostate cancer risk prediction. For several other diseases, GW-PRS have been shown to perform better than PRS of known variants5–9; however, these PRS are typically constructed from a pruning and thresholding approach within European ancestry individuals rather than a fine-mapping approach across diverse populations. As such, the performance observed for our prostate cancer PRS269 may be due to identifying GWAS risk variants from a multi-ancestry GWAS and fine-mapping study, along with the use of the same multi-ancestry GWAS to construct the GW-PRS3. It is also possible that the unique genetic architecture of prostate cancer contributes to the high performance of the PRS269 across populations, as prostate cancer is one of the most heritable cancers15,16 and has been estimated to display a greater distribution of variants with larger effect sizes than other cancers with similar GWAS sample sizes9.

We have previously shown that GW-PRS including variants with weaker statistical evidence of association in both European and African ancestry men (based on lenient P-value thresholds down to 1.0×10−5) resulted in lower PRS performance3. Likewise, it was recently reported that a GW-PRS constructed from the 1 million variants most strongly associated with prostate cancer risk led to results comparable to those of a GW-PRS based on HapMap3 variants17, further suggesting that GW-PRS approaches may not be improved by selecting a large number of variants weakly associated with prostate cancer risk. Last, a European-ancestry derived PRS of 110 established literature-curated prostate cancer risk variants was previously found to perform better than both a GW-PRS and a standard pruning and thresholding PRS18. These findings, in conjunction with the present study, suggest that the current multi-ancestry and fine-mapped PRS269 performs at least as well as current GW-PRS approaches, which has important clinical implications. While genotyping a few hundred versus millions of variants to construct PRS is currently logistically easier and more cost-effective, genome-wide genotyping may be optimal in the future to enable the evaluation of PRS across many traits. However, our findings do not imply that the multi-ancestry PRS269 has reached optimal performance; increasing the sample size of non-European ancestry men in the discovery GWAS, particularly African ancestry men, where we and others have observed that the PRS has lower performance than in other populations3,4, will be important to improve genetic risk prediction of prostate cancer. The multi-ancestry PRS269 is an effective risk stratification tool for prostate cancer, and its clinical utility in screening and early detection warrants investigation.

Supplementary Material

Supplement 1
media-1.docx (223.4KB, docx)
Supplement 2
media-2.xlsx (107.8KB, xlsx)

Acknowledgements

This work was supported by the National Cancer Institute at the National Institutes of Health (grant numbers U19 CA214253 to C.A.H., R01 CA257328 to C.A.H., U19 CA148537 to C.A.H., R01 CA165862 to C.A.H., and R00 CA246063 to B.F.D.), the Prostate Cancer Foundation (grants 21YOUN11 to B.F.D. and 20CHAS03 to C.A.H.), an award from the Andy Hill Cancer Research Endowment Distinguished Researchers Program (B.F.D.), a Fred Hutch/University of Washington SPORE Career Enhancement Program award (B.F.D.), and the Million Veteran Program-MVP017. This research has been conducted using the UK Biobank Resource under application number 42195. This research is based on data from the Million Veteran Program, Office of Research and Development and the Veterans Health Administration. This publication does not represent the views of the Department of Veterans Affairs or the United States Government.

Footnotes

Competing Interests Statement

The authors have no conflicts of interest to disclose.

References

  • 1. Howlader N., Noone A.M., Krapcho M., Miller D., Brest A., Yu M., Ruhl J., Tatalovich Z., Mariotto A., Lewis D.R., et al. (2020). SEER Cancer Statistics Review, 1975–2017. National Cancer Institute.
  • 2. Kolonel L.N., Henderson B.E., Hankin J.H., Nomura A.M., Wilkens L.R., Pike M.C., Stram D.O., Monroe K.R., Earle M.E., and Nagamine F.S. (2000). A multiethnic cohort in Hawaii and Los Angeles: baseline characteristics. Am J Epidemiol 151, 346–357. 10.1093/oxfordjournals.aje.a010213.
  • 3. Conti D.V., Darst B.F., Moss L.C., Saunders E.J., Sheng X., Chou A., Schumacher F.R., Olama A.A.A., Benlloch S., Dadaev T., et al. (2021). Transancestry genome-wide association meta-analysis of prostate cancer identifies new susceptibility loci and informs genetic risk prediction. Nat Genet 53, 65–75. 10.1038/s41588-020-00748-0.
  • 4. Plym A., Penney K.L., Kalia S., Kraft P., Conti D.V., Haiman C., Mucci L.A., and Kibel A.S. (2021). Evaluation of a Multiethnic Polygenic Risk Score Model for Prostate Cancer. J Natl Cancer Inst. 10.1093/jnci/djab058.
  • 5. International Schizophrenia Consortium, Purcell S.M., Wray N.R., Stone J.L., Visscher P.M., O’Donovan M.C., Sullivan P.F., and Sklar P. (2009). Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460, 748–752. 10.1038/nature08185.
  • 6. Khera A.V., Chaffin M., Aragam K.G., Haas M.E., Roselli C., Choi S.H., Natarajan P., Lander E.S., Lubitz S.A., Ellinor P.T., and Kathiresan S. (2018). Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat Genet 50, 1219–1224. 10.1038/s41588-018-0183-z.
  • 7. Thomas M., Sakoda L.C., Hoffmeister M., Rosenthal E.A., Lee J.K., van Duijnhoven F.J.B., Platz E.A., Wu A.H., Dampier C.H., de la Chapelle A., et al. (2020). Genome-wide Modeling of Polygenic Risk Score in Colorectal Cancer Risk. Am J Hum Genet 107, 432–444. 10.1016/j.ajhg.2020.07.006.
  • 8. Dikilitas O., Schaid D.J., Kosel M.L., Carroll R.J., Chute C.G., Denny J.A., Fedotov A., Feng Q., Hakonarson H., Jarvik G.P., et al. (2020). Predictive Utility of Polygenic Risk Scores for Coronary Heart Disease in Three Major Racial and Ethnic Groups. Am J Hum Genet 106, 707–716. 10.1016/j.ajhg.2020.04.002.
  • 9. Zhang Y.D., Hurson A.N., Zhang H., Choudhury P.P., Easton D.F., Milne R.L., Simard J., Hall P., Michailidou K., Dennis J., et al. (2020). Assessment of polygenic architecture and risk prediction based on common variants across fourteen cancers. Nat Commun 11, 3353. 10.1038/s41467-020-16483-3.
  • 10. Prive F., Arbel J., and Vilhjalmsson B.J. (2020). LDpred2: better, faster, stronger. Bioinformatics. 10.1093/bioinformatics/btaa1029.
  • 11. Ruan Y., Feng Y.A., Chen C., Lam M., Initiatives S.G.A., Sawa A., Martin A.R., Qin S., Huang H., and Ge T. (preprint). Improving Polygenic Prediction in Ancestrally Diverse Populations. medRxiv.
  • 12. Song S., Jiang W., Hou L., and Zhao H. (2020). Leveraging effect size distributions to improve polygenic risk scores derived from summary statistics of genome-wide association studies. PLoS Comput Biol 16, e1007565. 10.1371/journal.pcbi.1007565.
  • 13. International HapMap Consortium, Altshuler D.M., Gibbs R.A., Peltonen L., Altshuler D.M., Gibbs R.A., Peltonen L., Dermitzakis E., Schaffner S.F., Yu F., et al. (2010). Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58. 10.1038/nature09298.
  • 14. Gaziano J.M., Concato J., Brophy M., Fiore L., Pyarajan S., Breeling J., Whitbourne S., Deen J., Shannon C., Humphries D., et al. (2016). Million Veteran Program: A mega-biobank to study genetic influences on health and disease. J Clin Epidemiol 70, 214–223. 10.1016/j.jclinepi.2015.09.016.
  • 15. Mucci L.A., Hjelmborg J.B., Harris J.R., Czene K., Havelick D.J., Scheike T., Graff R.E., Holst K., Moller S., Unger R.H., et al. (2016). Familial Risk and Heritability of Cancer Among Twins in Nordic Countries. JAMA 315, 68–76. 10.1001/jama.2015.17703.
  • 16. Sampson J.N., Wheeler W.A., Yeager M., Panagiotou O., Wang Z., Berndt S.I., Lan Q., Abnet C.C., Amundadottir L.T., Figueroa J.D., et al. (2015). Analysis of Heritability and Shared Heritability Based on Genome-Wide Association Studies for Thirteen Cancer Types. J Natl Cancer Inst 107, djv279. 10.1093/jnci/djv279.
  • 17. Prive F., Aschard H., Carmi S., Folkersen L., Hoggart C., O’Reilly P.F., and Vilhjalmsson B.J. (2022). Portability of 245 polygenic scores when derived from the UK Biobank and applied to 9 ancestry groups from the same cohort. Am J Hum Genet 109, 12–23. 10.1016/j.ajhg.2021.11.008.
  • 18. Yu H., Shi Z., Lin X., Bao Q., Jia H., Wei J., Helfand B.T., Zheng S.L., Duggan D., Lu D., et al. (2020). Broad- and narrow-sense validity performance of three polygenic risk score methods for prostate cancer risk assessment. Prostate 80, 83–87. 10.1002/pros.23920.

Articles from medRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints