Abstract
Implementation of polygenic risk scores (PRS) may improve disease prevention and management but poses several challenges: the construction of clinically valid assays, interpretation for individual patients, and the development of clinical workflows and resources to support their use in patient care. For the ongoing Veterans Affairs Genomic Medicine at Veterans Affairs (GenoVA) Study we developed a clinical genotype array-based assay for six published PRS. We used data from 36,423 Mass General Brigham Biobank participants and adjustment for population structure to replicate known PRS–disease associations and published PRS thresholds for a disease odds ratio (OR) of 2 (ranging from 1.75 (95% CI: 1.57–1.95) for type 2 diabetes to 2.38 (95% CI: 2.07–2.73) for breast cancer). After confirming the high performance and robustness of the pipeline for use as a clinical assay for individual patients, we analyzed the first 227 prospective samples from the GenoVA Study and found that the frequency of PRS corresponding to published OR > 2 ranged from 13/227 (5.7%) for colorectal cancer to 23/150 (15.3%) for prostate cancer. In addition to the PRS laboratory report, we developed physician- and patient-oriented informational materials to support decision-making about PRS results. Our work illustrates the generalizable development of a clinical PRS assay for multiple conditions and the technical, reporting and clinical workflow challenges for implementing PRS information in the clinic.
Subject terms: Risk factors, Population genetics, Disease prevention
The first report from the prospective GenoVA Study provides preliminary insights into the development of a polygenic risk score assay in a clinical setting and discusses the challenges of generating, interpreting and reporting results.
Main
Genome-wide association studies (GWAS) have identified thousands of genomic variants significantly associated with a range of common complex human diseases1,2. Given that the risk conferred by an individual common variant is usually insignificantly small, investigators have aggregated risk alleles across the genome into genetic risk scores to provide a single measure of genetic association for a given trait due to known common variant effects. Although the earliest genetic scores consisted only of variants meeting genome-wide significance3–5, recent computational and methodological advances have leveraged the summary statistics of all available variants from increasingly larger GWAS to calculate polygenic risk scores (PRS)6–9. For some diseases, a PRS in the upper tail of the distribution approximates risks equivalent to those conferred by established clinical risk factors and by genetic variants associated with monogenic disease7,10. Although PRS are typically derived from weights from cross-sectional GWAS of prevalent disease cases and controls, further work has demonstrated their potential to estimate the risk of incident disease11–14.
Suitable clinical implementation of PRS is now an area of active research across many disease areas15–17. The translation of PRS from discovery to the clinic can be conceptualized as having at least three necessary phases (Fig. 1): the first phase relates to epidemiology and statistical genetics, in which PRS are developed and validated in large cohorts and improved with advances in statistical methods; the second phase involves the laboratory, in which laboratory geneticists must develop an analytically and clinically valid pipeline for calculating, interpreting and reporting PRS results for an individual patient; and the third phase involves patient care, in which a treating physician makes medical decisions after putting a patient’s PRS results into the larger clinical context, which involves non-genetic risk factors, comorbidities and patient preferences. The first phase has seen significant methodological advances18 but challenges for the second and third phases remain.
A key assumption underlying the laboratory phase is that a laboratory can develop and implement a valid clinical assay and interpretation pipeline to report PRS results for an individual patient. The development of a clinical assay from a published PRS is not trivial, and significant barriers to the process persist. First, uncertainty exists about whether commonly used, cost-effective genotyping arrays and clinical imputation pipelines can calculate a PRS for an individual with the analytic validity expected of a clinical assay, as opposed to one that is adequate for research. Second, laboratories must implement methods to account for the reduced validity of most PRS in patients of non-European and admixed ancestry19,20. This limitation applies both to the calculation of the PRS itself for an individual patient and to its clinical interpretation, given that published effect sizes are from populations of primarily European ancestry19. Third, laboratories must make several decisions about the content and format of a clinical PRS report, including decisions about where the laboratory’s role as an interpretative service ends and where the role of the treating physician in patient care begins. In the patient care phase, there remain unanswered questions about the information and support that physicians need when contextualizing the PRS results of an individual patient to make clinical decisions, and how those decisions affect patient outcomes.
In the Genomic Medicine at Veterans Affairs (GenoVA) Study (ClinicalTrials.gov identifier: NCT04331535) we have developed processes to advance the laboratory and patient care phases of the clinical translation of PRS. The GenoVA Study is a clinical trial in which patients and their primary care physicians receive a clinical PRS laboratory report on five diseases commonly screened for and initially managed in primary care: coronary artery disease (CAD), type 2 diabetes mellitus (T2D), atrial fibrillation (AFib), colorectal cancer (CRCa), and either prostate cancer (PrCa) in male patients or breast cancer (BrCa) in female patients. Because the objectives of the GenoVA Study are to observe how PRS impact existing disease screening and diagnosis paradigms and enable increased detection of undiagnosed prevalent or newly incident disease, eligible patients have no known diagnoses of the target diseases and are aged 50–70 years, an age range during which much guideline-recommended screening and diagnosis of new disease occurs. Here, we describe the processes created in the GenoVA Study to develop and validate a genotype array-based clinical assay and report for six PRS and to support their effective translation into clinical care by the treating physicians.
Results
Replication of published PRS
Sample characteristics
To demonstrate the accuracy of a prospective PRS pipeline, we first wanted to ensure that we could implement published PRS effectively. We used data from 36,423 Mass General Brigham Biobank (MGBB) participants to replicate the performance of PRS for the six target diseases (Supplementary Table 1). The mean (s.d.) age of MGBB participants was 58.8 (17.1) years (range, 9–106 years), 19,719 (54.1%) were female, and 5,706 (15.7%) were of reported race other than white (white, n = 30,716 (84.3%); Black, n = 1,807 (5.0%); Asian, n = 786 (2.2%); and other or unknown race, n = 3,113 (8.5%) as determined from electronic health record data). Case counts ranged from 392 for CRCa to 3,554 for CAD. Figure 2a shows the counts of participants with one or multiple target diseases. The most common disease co-occurrences were the combinations of CAD and T2D (n = 641) and CAD and AFib (n = 495).
Unadjusted and adjusted PRS distributions
We identified PRS from large GWAS for the six target diseases, for which the summary statistics (base files with alleles and weights) were publicly available from the Polygenic Score Catalog21 (AFib, CAD, T2D, BrCa) or the Cancer PRSWeb (CRCa, PrCa)22 as of 26 December 2019. Supplementary Table 2 lists the number of single-nucleotide polymorphisms (SNPs) in the base file for each of the six published PRS, ranging from 81 SNPs for CRCa23 to 6,917,436 for T2D7, and the subsets of these available as directly genotyped or imputed data from each of three arrays used for MGBB participants, demonstrating minimal loss of information compared with the original published PRS. As shown in Fig. 3, when using the published weights to calculate standardized PRS (PRSstd-raw, see Methods) we observed marked variation in the distribution of each PRS by reported race in the MGBB, most notably in AFib, CAD, and T2D. For example, only 1.7% of white MGBB participants (516/30,716) but almost all of the Black MGBB participants (88.9%, 1,606/1,807) had PRSstd-raw above the published threshold associated with an odds ratio (OR) = 2 for T2D in the 2018 study by Khera et al.7 (Supplementary Tables 3 and 4). The use of residualized, population structure-adjusted, standardized PRS (PRSstd-adj, see Methods) minimized this variation (Fig. 3), such that, for example, 8.6% of white MGBB participants (2,651/30,716) and 4.2% of Black MGBB participants (75/1,807) had a T2D PRSstd-adj above the published OR > 2 threshold. The distributions of PRSstd-adj were well aligned when examined by genotyping batch, decile of age, and sex (Extended Data Figs. 1–3).
Replication of PRS–disease association
As shown in Fig. 4, quantile of PRSstd-adj was highly correlated with log(odds) of disease across the six phenotypes in the MGBB, with correlation coefficients ranging from 0.68 for CRCa to 0.95 for T2D. Extended Data Figs. 4–7 show the correlation of PRSstd-adj quantile and log(odds) of disease in the reported racial groups separately. Our analyses also replicated the published PRS thresholds corresponding to OR > 2. As shown in Table 1, at the published PRSstd-adj thresholds we observed OR ranging from 1.75 (95% CI: 1.57–1.95) for T2D to 2.38 (95% CI: 2.07–2.73) for BrCa in MGBB participants overall. Except for T2D, the 95% confidence interval of the replicated OR for all diseases either included or, in the case of BrCa and AFib, exceeded a point estimate of 2. Results were consistent in analyses restricted to white participants but were variable in other groups, largely because of the small number of disease cases in certain racial subgroups. In 22 of 24 analyses stratified by reported race, subjects with PRSstd-adj above the published OR > 2 thresholds had higher odds of disease than those below these thresholds. In the MGBB overall, the prevalence of a high-risk PRSstd-adj ranged from 5.4% for CRCa to 13.2% for PrCa (in men). Figure 2b illustrates the number of participants with PRSstd-adj above the published OR > 2 threshold for one or more of the target diseases. Of note, similar to the disease co-occurrences observed in MGBB participants, the most common co-occurrences of high-risk PRSstd-adj were the combinations of CAD and T2D (n = 333) and CAD and AFib (n = 211).
Table 1.
Disease | High risk (%)a | OR overall | OR white | OR Black | OR Asian | OR Other/Unknown |
---|---|---|---|---|---|---|
OR (95% CI)b (n/n, n/n)c | OR (95% CI)b (n/n, n/n)c | OR (95% CI)b (n/n, n/n)c | OR (95% CI)b (n/n, n/n)c | OR (95% CI)b (n/n, n/n)c | ||
BrCa | 8.6 |
2.38 (2.07–2.73) (286/1,400, 1,427/16,606) |
2.39 (2.07–2.76) (270/1,156, 1,318/13,495) |
2.24 (0.97–5.15) (7/73, 43/1004) |
0.51 (0.07–3.9) (1/33, 24/405) |
2.35 (1.08–5.1) (8/138, 42/1,702) |
CRCa | 5.4 |
2.37 (1.74–3.24) (46/1,913, 346/34,117) |
2.29 (1.65–3.19) (41/1,646, 312/28,717) |
4.11 (1.17–14.48) (3/83, 15/1706) |
0 (0–NaN) (0/35, 7/744) |
3.30 (0.73–14.88) (2/149, 12/2,950) |
PrCa | 13.1 |
2.22 (1.98–2.48) (498/1,698, 1,693/12,813) |
2.31 (2.05–2.59) (468/1,448, 1,544/11,017) |
1.39 (0.74–2.59) (14/71, 74/521) |
2.58 (0.5–13.28) (2/36, 6/279) |
1.41 (0.78–2.58) (14/143, 69/996) |
AFib | 8.3 |
2.37 (2.12–2.64) (450/2,589, 2,282/31,101) |
2.40 (2.14–2.69) (422/2,179, 2,101/26,014) |
1.47 (0.72–3.01) (9/137, 71/1590) |
2.00 (0.57–7.03) (3/62, 17/704) |
2.28 (1.32–3.94) (16/211, 93/2,793) |
CAD | 9.8 |
1.86 (1.69–2.05) (562/3,018, 2,991/29,851) |
1.91 (1.73–2.12) (503/2,459, 2,680/25,074) |
1.41 (0.86–2.29) (21/177, 125/1484) |
3.96 (1.79–8.76) (9/51, 31/695) |
1.47 (0.97–2.22) (29/331, 155/2,598) |
T2D | 8.4 |
1.75 (1.57–1.95) (439/2,612, 2,924/30,447) |
1.93 (1.71–2.17) (367/2,284, 2,159/25,906) |
1.21 (0.7–2.09) (18/57, 358/1374) |
1.07 (0.37–3.08) (4/49, 52/681) |
1.58 (1.14–2.19) (50/222, 355/2,486) |
High-risk PRS, defined here as a standardized, adjusted PRS (PRSstd-adj) associated with OR > 2 for disease in the original publication. aProportion of MGBB participants exceeding the literature-derived OR > 2 threshold for each disease. bObserved OR (95% CI) in up to 36,423 MGBB participants in the overall cohort and by race reported in the MGBB. c(ncaseshigh-risk PRS/ncontrolshigh-risk PRS, ncaseswithout high-risk PRS/ncontrolswithout high-risk PRS). NaN, not a number.
Prospective PRS assay
Sensitivity and specificity of array and imputation
The replication results above supported the development of a genotype array-based clinical assay for PRS and secondary findings from the American College of Medical Genetics and Genomics v2.0 list (ACMG SF v2.0)24. To determine the performance of the arrays used in the prospective assay and of the imputation pipeline, we used three reference Genome In A Bottle (GIAB) samples (NA12878, NA24385 and NA24631, Supplementary Table 5)25. Sensitivity and positive predictive value (PPV) for single-nucleotide variants (SNV) were > 99.7% on average, with lower performance for indels (sensitivity, 96.3%; PPV, 97.8%). Of note, although sensitivity in the ACMG SF v2.0 regions was high (96.2%), PPV was low (63.6%) due to these regions having an excess of poorly performing rare variants26,27.
As expected, sensitivity and PPV decreased for imputed data, especially for indels (SNV sensitivity, 98.0%; SNV PPV, 97.5%; indel sensitivity, 92.8%; indel PPV, 90.7%) (Supplementary Table 5). NA12878 was not evaluated for imputation accuracy because it is present in the imputation reference dataset from the 1000 Genomes Project and has artificially high imputation accuracy. To further evaluate imputation accuracy, we compared genome sequencing data to array data for 22 diverse samples. Analytical performance was lower in this dataset than in the GIAB high-confidence data (~3% reduction in performance for sensitivity and PPV, Supplementary Table 6).
Performance of prospective PRS assay
For the GIAB samples, PRSstd-adj was robust across different array versions and consistent with results from whole genome sequence (WGS) data; all three GIAB samples were below the high-risk threshold (OR > 2) for all diseases in all methods (Supplementary Table 7). In evaluating the 22 samples with WGS and prospective array data, PRSstd-adj scores were similarly concordant, particularly for AFib, CAD and T2D (Extended Data Fig. 8). Additionally, 108/110 high-risk status classifications were concordant in this dataset (98.2% agreement; Matthews correlation coefficient, 0.84; P < 0.001), with the two discordant values (one in CAD and one in CRCa) being very close to the high-risk threshold (Supplementary Table 8). Finally, we compared nine individuals with high-risk PRS for 10 diseases identified in the MGBB genotyping data to their PRS risk status using the prospective assay (one individual at high risk for AFib, one individual at high risk for BrCa, three individuals at high risk for CAD, three individuals at high risk for CRCa, one individual at high risk for PrCa and one individual at high risk for T2D). All PRS categories were consistent across the two different arrays used for MGBB genotyping and for the clinical assay (Supplementary Table 9).
Clinical PRS report
We then developed a PRS laboratory report consistent in format and content with other clinical genetic test reports (Supplementary File 1)28–30. That is, it includes a description of the test performed and a prominently displayed summary of important findings and their interpretations. Subsequent sections of the report give more detail about the results, including, for each disease, general population prevalence and a brief summary of the GWAS from which the PRS was derived. Sections on methodology and literature references are at the end of the report. The report also reflects several choices made during its development. A graphic highlights in red the disease(s) for which the patient has increased polygenic disease risk, as defined by a PRS corresponding to a published OR > 2 for disease, mirroring both a common threshold from Mendelian genetics31 and the effect sizes for disease risk factors already considered in current clinical care32–36. Any PRS not categorized as high risk is described as conferring average risk. Monogenic disease variants and PRS results are reported separately, without comment on any possible interaction between a monogenic result and a relevant PRS (for example, an average-risk BrCa PRS and a pathogenic variant in BRCA1 associated with hereditary breast and ovarian cancer). Illustrating the boundary where the role of the clinical laboratory ends (phase 2 of Fig. 1) and the role of the treating physician begins (phase 3), the laboratory report does not include information about absolute disease risk or the role of other, non-genetic factors in disease risk, and it is not directive in its recommendations for clinical management of high-risk results.
Clinical processes and supportive materials
In recognition that physicians and patients require additional guidance in contextualizing high-risk PRS results, the GenoVA Study has developed processes and materials to support the clinical use of PRS. A genetic counselor contacts each patient with a high-risk PRS result or monogenic disease variant to discuss the result’s health significance and offer guidance for a conversation to have with their physician. All patients and their primary care physicians receive a copy of the laboratory report, and each patient with at least one high-risk PRS result is additionally given patient-oriented educational materials about the relevant disease(s) (Supplementary File 2). The patient’s primary care physician also receives a copy of physician-oriented educational materials to support their decision-making about PRS (Supplementary File 3). Given the current state of the evidence, the physician materials note that professional guidelines do not recommend specific changes to general screening or prevention recommendations based on PRS results, but these materials are updated over the course of the study as evidence accrues to support distinct recommendations.
Results from the first 227 prospective samples
As of 21 October 2021, 227 GenoVA trial participants have been assayed using the prospective PRS pipeline from two primary sample types (130 blood, 97 saliva). Of these, 108 participants (48%) self-report as white race and non-Hispanic/Latinx ethnicity, and 78 (34%) identify as women. In this preliminary sample of trial enrollees, the proportions of participants whose PRS are above the study threshold for high risk are consistent with those observed in the MGBB, ranging from 5.7% for CRCa to 15.3% for PrCa (Table 2). Two actionable ACMG SF v2.0 variants have been identified and confirmed in the first 227 enrollees (BRCA1:NM_007294 c.2748delT (p.Asn916LysfsX84), likely pathogenic; BRCA2:NM_000059 c.3545_3546delTT (p.Phe1182X), pathogenic). The reporting of these results to trial participants and their physicians is underway. The study will determine whether PRS implementation affects clinical management and enables the detection of undiagnosed prevalent cases and incident cases during the observation period.
Table 2.
BrCa | CRCa | PrCa | AFib | CAD | T2D | |
---|---|---|---|---|---|---|
Total analyzed, n | 77 | 227 | 150a | 227 | 227 | 227 |
Average risk, n (%) | 67 (87.0) | 214 (94.3) | 127 (84.7) | 203 (89.4) | 211 (92.9) | 210 (92.5) |
High risk, n (%) | 10 (13.0) | 13 (5.7) | 23 (15.3) | 24 (10.6) | 16 (7.1) | 17 (7.5) |
Results from the first 227 GenoVA participants. High-risk PRS, defined here as PRSstd-adj associated with OR > 2 for disease in the original publication. All other results are considered as average risk.
aOne participant with male sex identifies as female.
Discussion
Bridging two significant gaps between PRS development and clinical implementation, we developed a clinical genotyping array-based assay for six PRS and a process to report the results to patients and primary care physicians. The PRS were robust across multiple genotyping arrays and imputation pipelines. The distributions of unadjusted PRS varied by reported race in a large biobank, impeding clinical validation, but adjustment for population structure enabled the replication of published PRS–disease associations. These results supported the development of a population structure-adjusted pipeline for PRS calculation and reporting for individual patients, now implemented in a clinical trial of PRS testing along with patient and physician educational materials and genetic counseling support.
The development and implementation of our PRS assay and report illustrate key choices that laboratories must make in what we term phase 2 of the PRS implementation pathway. First, for each target disease, we had to choose the specific PRS to implement among multiple publicly available options (that is, PRS developed and validated by others in phase 1)21,22. Considerations include the performance of the PRS in both the published discovery and replication cohorts in addition to the population that the laboratory is interested in targeting. Guidelines are emerging on what defines high-quality PRS reporting37, and this improved transparency should help laboratories to select appropriate PRS from the many available. Second, we chose to use a genotype array-based approach instead of genome sequencing. Like genotyping, low-coverage genome sequencing technology is also relatively low cost38. We chose the Illumina GDA because its widespread use in the All of Us Research Program39, eMERGE Consortium15 and other projects optimizes the likelihood that it will be a well-supported genotyping platform for future improvements, and enhances the generalizability of our methods to other institutions looking to implement clinical PRS testing. Third, although published methods can adjust for population structure in large cohorts of people40,41, these methods are not immediately applicable for correcting a PRS for a prospectively genotyped individual patient, whose sample is at best part of a small clinically analyzed batch with insufficient data for robust population structure adjustment. Correction thus requires additional decisions about how to adjust for population structure and which reference to use. We chose to impute data against 1000 Genomes Project phase 3 data and to project each new individual patient sample onto the principal components from the MGBB. Other laboratories may choose to impute against the larger TOPMed (Trans-Omics for Precision Medicine) population42, although issues of genome build discrepancy and regulatory prohibition against sending patient data to external research servers are limitations. Fourth, once a platform is selected, a clinical laboratory must determine the benchmarks that define an analytically valid PRS assay. We chose to verify the PRS performance in our laboratory to determine the appropriate parameters for our assay; calculate the analytical performance of the genotyping array and imputation pipeline using both well-characterized reference samples and individual level genome data; and calculate the robustness and performance of the PRS using genome data and multiple array platforms from both reference and individual samples. This multi-step approach helped ensure the accuracy of the data going into the PRS as well as the final performance of the PRS itself.
We also made numerous choices in how to report PRS results and interpretations to patients and physicians. We decided to report a dichotomous PRS interpretation (that is, high risk versus average risk) instead of a continuous result (for example, percentile rank, relative risk or absolute risk). We have previously described the trade-offs of these approaches, including the need for actionability thresholds; transparency about the limitations of PRS, particularly in underrepresented populations; and the absence of validated predictions models that incorporate both PRS and other clinical risk factors43. For the GenoVA Study we favored a dichotomous result to indicate a possible clinical action threshold to the treating physician. We chose OR > 2 to define high polygenic risk, consistent with effect sizes of traditional risk factors considered for the target diseases32–36. Another laboratory may use the methods we describe to produce measures of continuous risk or of categorical risk at different thresholds thought to be clinically meaningful, which will probably vary among the diseases for which they choose to implement PRS. Estimating absolute disease risk (for example, with the BOADICEA model for breast cancer44 or the Pooled Cohort Equations for atherosclerotic cardiovascular disease45) may be considered the gold standard for risk stratification, but validated absolute risk models are not available for most diseases and require patient information (for example, mammographic breast density and blood pressure) that is often unavailable to the interpreting laboratory. Drawing on other examples from primary care, we chose not to include directive clinical recommendations on the PRS laboratory report itself, instead assigning such activities to phase 3 of the PRS implementation pathway, supported by informational materials and genetic counseling support. We note that, for example, although a laboratory reports the results of a patient’s low-density lipoprotein cholesterol and reference range for the assay, it is the treating physician who contextualizes that result with the patient’s other characteristics to decide whether to offer cholesterol-lowering therapy.
The question of how to support physician management of PRS results without under- or overselling the potential benefits of PRS is controversial, given the lack of prospective data showing that the clinical use of PRS improves patient outcomes. In this early era of PRS implementation, the most prudent course of action is probably to develop educational and consultant resources, such as those used in the GenoVA Study, to present transparently the evidence for and limitations of PRS interpretations without being overly prescriptive in their recommendations. Given the participant age range and choice of diseases in the GenoVA Study, we anticipate that most physician actions will fall within already clinically acceptable practices (for example, more frequent hemoglobin A1c screening for T2D or favoring colonoscopy screening over fecal immunochemical testing for CRCa screening). Stronger evidence of benefit will be needed to justify actions that deviate more significantly from accepted practice, such as screening starting at much younger ages or requiring more invasive or expensive procedures. As they do in all areas of medicine, physicians will need to use available evidence and clinical judgment to make the best decisions with their patients. The GenoVA Study is collecting data on what physicians do with PRS results and their preferences for how they can be supported in this decision-making.
Although other laboratories are developing PRS assays in both clinical and research settings and have reported the aggregate performance of these PRS in a population, including biobanks or customers of direct-to-consumer companies15,38,46–48, none has described the development and validation of a clinical, population structure-adjusted assay for prospectively tested individuals. Although the eMERGE consortium and other studies are actively developing trans-ancestry PRS for a number of common diseases15,49, we report, here, a single clinical assay for population structure-adjusted PRS for multiple diseases. And while other laboratories may make different decisions about the number of disease PRS they choose to implement, whether and how to compare the performance of multiple available PRS for each disease, and the format of the clinical PRS report, our work provides a framework for how a laboratory can clinically validate and implement a prospective PRS suitable for an individual patient.
Much has been written about the reduced validity of most PRS in populations of non-European ancestry, due to their use of non-causal loci and effect sizes from GWAS in predominantly European discovery cohorts19,20,50,51. As we await larger datasets from more diverse populations and the methodological advances that will improve the performance of trans-ancestry PRS10,15,49, a clinical laboratory looking to develop a PRS assay for a given disease has the following options: (1) postpone implementation, as done by some commercial laboratories;52,53 (2) implement separate ancestry-specific published PRS only in those ancestral groups from which they were derived and validated; or (3) implement a single PRS that aims for applicability across ancestry groups and report transparently any applicable limitations in the underlying evidence and its interpretation for specific individuals or ancestral groups. Because the second option requires the assignment of an individual patient to a specific ancestry group, either before or during PRS analysis, and, problematically, risks the inequitable provision of PRS to some populations but not to others, we chose the third option for the GenoVA Study and implemented a single method of adjustment for population structure. After doing so, we observed that the chosen PRS threshold corresponding to OR > 2 generally identified subjects at higher risk of disease across reported race in the MGBB replication cohort. The magnitude and precision of this effect did vary by reported race, probably due to two factors: small numbers of MGBB cases for certain diseases in certain racial groups; and real differences in the ability of these PRS to correlate with disease risk in non-European ancestry groups, as has been observed even in well-developed trans-ancestry PRS10,54. Methodological advances that leverage local ancestry or GWAS summary statistics from multiple diverse populations will improve the performance of PRS across ancestry groups55,56. In the meantime, we have developed a clinically validated PRS assay, the application of which in diverse ancestry groups is defensible but the results of which, nonetheless, have limitations. These limitations are clearly presented in a clinical laboratory report (phase 2), which can then be contextualized by the physician for each individual (phase 3). Applying population-level data to individual patient care represents both the science and art of medical practice, particularly when the individual patient is not well represented in the available data57,58.
In conclusion, data from increasingly larger and more diverse populations, coupled with computational advances, are propelling PRS into consideration for clinical implementation. We have shown that laboratory assay development and PRS reporting to patients and physicians are feasible (but non-trivial) next phases in PRS implementation. As the performance of PRS continues to improve, particularly for individuals of underrepresented ancestry groups, the implementation processes we describe can serve as generalizable models for laboratories and health systems looking to realize the potential of PRS for improved patient health.
Methods
Selection of PRS for implementation
We identified large GWAS for the six target diseases for which the summary statistics (base files with alleles and weights) were freely available from the Polygenic Score (PGS) Catalog21 (AFib, CAD, T2D, BrCa) or the Cancer PRSWeb (CRCa, PrCa)22 as of 26 December 2019. For the three cardiometabolic diseases (AFib, CAD and T2D) we chose the PRS derived from the UK Biobank in Khera et al. 2018 (ref. 7): for AFib, the PGS Catalog Publication (PGP) ID is PGP000006 and the PGS ID is PGS000016; for CAD, the PGP ID is PGP000006 and the PGS ID is PGS000013; and for T2D the PGP ID is PGP000006 and the PGS ID is PGS000014. For the three cancers we chose PRS derived from the largest published GWAS at the time: for BrCa we used Michailidou et al. 2017 and Mavaddat et al. 2019 (PGS ID = PGS000007, PGP ID = PGP000002) (ref. 59,60); for CRCa we used Huyghe et al. 2019 (PRSWEB_PHECODE153_CRC-Huyghe_PT_MGI_20191112, PRS tuning parameter: 3.98107170553497e-07) (ref.23); and for PrCa we used Schumacher et al. 2018 (PRSWEB_PHECODE185_Pca- PRACTICAL_LASSOSUM_MGI_20191112, PRS tuning parameter: s0.5_Lambda0.00695192796177561) (ref. 61).
Replication of published PRS
Population and sample
Given that the GenoVA Study is enrolling participants from eastern Massachusetts, USA, we used data from the Mass General Brigham (formerly Partners Healthcare) Biobank (MGBB)62 to evaluate the performance of the selected PRS in a similar population and workflow for our study and assay. MGBB participants were not included in the published derivation and validation studies for the PRS used. In brief, MGBB was launched in 2010 with the initial goal of collecting DNA, plasma, and serum samples from 75,000 patients from Brigham and Women’s Hospital, Massachusetts General Hospital, and other MGBB-affiliated healthcare facilities, and obtaining patient consent for the linkage between biospecimen data, medical record data and survey data. We use the terms ‘race’ and ‘ethnicity’ to refer to social constructs often used in healthcare operations and biomedical research to evaluate and address disparities between populations. Racial categories of participants in the MGBB (for example, white or Asian) are derived from electronic health record (EHR) data. For the present analysis we collapsed reported race in MGBB into four categories: Asian, Black, white, and other/unknown. Race and ethnicity of GenoVA Study participants were collected through EHR data and self-report and categorized using the five racial categories (American Indian or Alaska Native, Asian, Black or African American, Native Hawaiian or Other Pacific Islander, and white) and two ethnic categories (Hispanic/Latinx and Not Hispanic/Latinx) required by US federal data collection standards. We use the term ‘ancestry’ to describe the genetic construct describing inheritance of variants from global ancestral populations.
Disease phenotyping
We used validated computed phenotypes from MGBB to define case and control status for each of the six diseases (Supplementary Table 1). Validated MGB phenotypes are available for CAD (PPV = 95%), AFib (PPV = 94%), T2D (PPV = 95%), and colorectal (PPV = 100%), breast (PPV = 95%) and prostate cancer (PPV = 100%)63–65. For each disease, ‘caseness’ was defined as prevalent disease on 16 December 2019. For subgroup analyses, participant age was determined on 16 December 2019 or at death, if earlier. Only women and men were assigned case or control status for breast and prostate cancer, respectively.
Genotyping and imputation
We used genotype data from the 36,423 MGBB participants with available genotyping data as of 16 December 2019. Genotyping was performed using standard processing described previously on one of three Illumina Infinium genotyping arrays: (1) a pre-release version developed by the Multi-Ethnic Genotyping Array Consortium (Multi-Ethnic Genotyping Array (MEGA), n = 4,924); (2) an expanded version of this pre-commercial array (Expanded Multi-Ethnic Genotyping Array (MEGAEX), n = 5,345); and (3) the final commercial version (Multi-Ethnic Global (MEG), n = 26,157). The MEGA, MEGAEX and MEG arrays consisted of 1.39, 1.74 and 1.78 million probes, respectively66. For MEGA and MEGAEX data, only probes found in the commercial version of the array (MEG) were used in the present analysis. Quality control for the genotyping requires samples to have at least a 99% call rate and concordant sex between the EHR and what is computed from the array data. We used existing MGBB imputed data generated by batching sets of ~5,000 participants and imputing against the 1000 Genomes Project phase 3 data using the Michigan Imputation Server67 (https://imputationserver.sph.umich.edu/index.html#!), with ShapeIT (v2.r790) (ref. 68) used for phasing and Minimac3 used for imputation with default settings. Sets of imputed variants were compared with the base files for each PRS to ensure sufficient representation of probes (Supplementary Table 2) (ref. 67).
Calculation of PRS and adjustment for population structure
Unadjusted raw PRS (PRSraw) for each disease were calculated using PLINK (v.2.0a) by taking the product of the count of risk alleles and the risk allele weight at each locus in the PRS and then summing across available risk loci. The loci included in each PRS, the risk alleles and the corresponding weights were downloaded from the PGS Catalog or Cancer PRSWeb. A population structure-adjusted PRS was calculated for each disease, using a previously described approach40 implementing principal components analysis to compute adjusted residualized PRS for each disease. Principal components were calculated using all genotyped MGBB participants and a set of 16,385 of 16,443 previously reported ancestry-informative SNPs69. For each disease we then fit a linear model for PRSraw as a function of the first four principal components in controls for that disease (PRSraw ~ PC1 + PC2 + PC3 + PC4) in R (v.4.0.3). We then applied this model to calculate a predicted PRS (PRSpred) for each disease in all cases and controls. Residualized, population structure-adjusted PRS (PRSadj) were then computed for each individual for each disease as the difference between the raw and the predicted PRS (PRSraw − PRSpred). For PRSraw, values were standardized (PRSstd-raw) using the mean and standard deviation in the MGBB of the PRSraw values (Supplementary Table 3). Similarly, PRSstd-adj was computed using the mean and standard deviation in the MGBB of the PRSadj values (Supplementary Table 3). The distributions of PRSstd-raw and PRSstd-adj by genotype array, sex, age deciles and reported race were compared among all subjects using the density function in R (v.4.0.3).
PRS–disease association
The association of PRSstd-adj with the odds of disease was replicated in MGBB participants using the six disease phenotypes described above. For each PRS and disease, odds of disease (ncases/ncontrols) were calculated for each of 50 PRS quantiles. For race-stratified analyses, PRS deciles were used if too few cases were available for analysis across 50 quantiles. To visualize the PRS–disease associations, we plotted the log(odds) of disease against the mean PRSstd-adj in each quantile. Correlation was measured with Pearson correlation coefficients using RStudio (v.1.1.383) with R (v.4.0.3).
PRS threshold for high risk
We set a predicted polygenic OR > 2 to identify individuals at high polygenic risk for each disease, mirroring both a common threshold from Mendelian genetics31 and the effect sizes for disease risk factors already considered in current clinical care32–36. To operationalize this OR > 2 threshold, we compared standardized PRS Z scores for each individual to a disease-specific cut off 𝛕, based on previously published estimates of the change in odds of disease per standard deviation change in the PRS (Supplementary Table 3). Specifically, 𝛕 = ln(2)/ln(ORs.d.), where 2 is the target OR threshold defining high risk and ORs.d. is the estimated multiplicative change in odds per standard deviation change in the PRS. Assuming that the published ORs.d. accurately captures the relationship between PRS and disease, the odds of disease for individuals with standardized PRS Z score = 𝛕 are twofold that of individuals with a median PRS Z score. These standardized PRS thresholds were used to assign individual patients to risk categories as described below (PRS calculation for clinical assay for individual samples).
Clinical PRS assay for individual samples
Based on the results of the above methods, we developed and validated a genotype array-based clinical assay for PRS, in addition to secondary findings from the ACMG v2.0 list (ACMG SF v2.0, Fig. 1)24. We include additional variants identified by the ACMG or other organizations as important secondary findings as updated recommendations accrue70.
Validation samples
Replicates of each of three reference samples from GIAB25 maintained by the National Institute of Standards and Technology were included in the validation assay: NA12878 × 9, NA24631 × 6 and NA24385 × 6. Analytical performance (sensitivity and PPV for presence or absence of variant sites) was determined in the benchmarking regions (v3.3.2). In addition, we included 22 samples with polymerase chain reaction-free genome sequencing data (described below) and 9 samples with high-risk PRS for one of the six diseases as determined by the MGBB data, including one individual with high-risk PRS for two diseases. To test the sensitivity of the secondary finding analysis, we genotyped 20 samples with previously identified pathogenic or likely pathogenic variants in the ACMG SF v2.0 list.
Genotyping and imputation
Validation samples were genotyped according to manufacturer-standard workflows on either a pre-commercial release of the Illumina Global Diversity Array (GDA-PC) or the final commercial release of the Global Diversity Array (GDA). The Illumina-specific files containing called genotypes in AA/AB/BB format (GTC files) generated by genotype array were converted to variant call format (VCF) using a modified version of the gtc2vcf script from Illumina. All samples required an overall call rate of greater than 98.5%. Imputation was performed using updated software, with EAGLE v2.4.1 (ref. 71) for phasing and Minimac4 (ref. 67) for imputation using the 1000 Genomes Project phase 3 dataset. Importantly, monomorphic sites were not removed during the imputation process due to the small batch sizes used in the prospective assay.
PRS calculation for clinical assay for individual samples
PRSraw was calculated for each sample as described above. To determine PRSadj, unadjusted PRS (PRSraw) were first calculated for each individual sample as described for the overall MGBB cohort. For each individual, the eigenvariable, eigenvalue and frequency output from the MGBB principal components analysis were used to project each new individual sample onto the MGBB principal components, using the following command in PLINK v.2.0a:72
plink2 —pfile individual_data —read-freq ref_pcs.acount —score ref_pcs.eigenvec.allele 2 5 header-read no-mean-imputation variance-standardize —score-col-nums 6-15 —out new_projection
The resulting projected principal components were then scaled to match the MGBB principal components by taking the square root of the eigenvalue and then multiplying by 2. The scaled principal components (PCs) were fitted into the linear model for each disease developed in the MGBB data to obtain PRSpred:
BrCa: PRSpred = 17.609341*PC1 − 4.146935*PC2 + 5.335144*PC3 + 3.833931*PC4 − 0.421679
CRCa: PRSpred = −13.659121*PC1 + 6.411109*PC2 − 2.483703*PC3 − 6.869127*PC4 + 6.131384
PrCa: PRSpred = 23.441147*PC1 + 13.724771*PC2 − 9.528270*PC3 + 4.118756*PC4 + 11.506243
AFib: PRSpred = 9.6269881*PC1 − 3.2878238*PC2 − 6.6519006*PC3 − 3.0149108*PC4 + 32.4067610
CAD: PRSpred = −6.1974327*PC1 − 3.6757094*PC2 − 1.3488677*PC3 − 1.3490566*PC4 + 18.0582457
T2D: PRSpred = 26.4700782*PC1 − 7.4283370*PC2 + 9.3782116*PC3 + 1.6994457*PC4 + 55.6998719
PRSadj was then calculated as the difference between PRSraw and PRSpred. Standardized, adjusted PRS values (PRSstd-adj) were calculated using the mean and standard deviation of PRSadj in MGBB and compared against the PRS threshold corresponding to OR > 2 as determined from the original publications (Supplementary Table 3). Any PRSstd-adj result above the PRS threshold corresponding to OR > 2 was categorized as high polygenic risk.
Genome sequencing
We selected 22 diverse samples that had previously undergone clinical whole genome sequencing to determine the robustness of PRS across different platforms. Genome sequencing was performed at the Clinical Research Sequencing Platform of the Broad Institute using polymerase chain reaction-free library construction and sequencing on an Illumina NovaSeq with two 150 bp paired-end reads with ≥95% of bases covered at ≥20-fold. Reads were aligned to GRCh37 using the Burrows–Wheeler Aligner (BWA v.0.7.15)73 and variant calls were made using HaplotypeCaller from the Genomic Analysis Tool Kit (GATK v.4.0.3.0)74,75. PRSraw, PRSstd-raw, PRSadj and PRSstd-adj were calculated as above for the other prospective samples. As stated above, these 22 samples were also analyzed on the GDA-PC array to compare PRS between genome sequencing and array. The difference between the sequence-based and array-based PRS were visualized, and dichotomous risk classifications were formally compared using the Matthews correlation coefficient76.
Identification of actionable variants associated with monogenic disease
Variants from the original genotyping VCF were annotated and filtered to the 59 genes suggested for screening of secondary findings as recommended by the ACMG (ACMG SF v2.0)24 to find: (1) variants previously identified as disease causing by the MGB Laboratory for Molecular Medicine; (2) variants classified as pathogenic or likely pathogenic in ClinVar with a minor allele frequency (MAF) < 0.1%; (3) variants classified as a disease-causing mutation in the Human Gene Mutation Database with a MAF < 0.03%; and (4) loss-of-function variants (nonsense, frameshift, canonical splice-site, and initiating methionine variants) with a MAF < 0.1% in genes in which that is a disease mechanism. Clinical variant classification was carried out in accordance with the criteria set by the guidelines by the ACMG and the Association of Molecular Pathology77, with disease-specific modifications as recommended by the Clinical Genome Resource Expert Panels78.
Prospectively enrolled trial participants
The assay described above is now in use in the ongoing GenoVA Study randomized trial of clinical PRS (ClinicalTrials.gov identifier: NCT04331535), in which eligible participants are patients of the VA Boston Healthcare System, aged 50–70 years, without known diagnoses of the six target diseases. Enrollees provide a clinical blood or saliva sample for analysis at the Laboratory for Molecular Medicine.
Ethics declaration
Analyses of the genomic and MGBB samples and data have been reviewed and approved by the Mass General Brigham institutional review board (2019P001933). Analyses for the prospective pipeline, including the use of prior clinical samples, were conducted under the Mass General Brigham institutional review board (2004P001056); all individuals with clinical testing, including those with genome sequencing data, gave consent for clinical testing, and all individual data were de-identified. The GenoVA Study is approved by the VA Boston Healthcare System (no. 3241) and Harvard Medical School institutional review board (IRB19-0594), and all enrollees provided written informed consent.
Reporting Summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41591-022-01767-6.
Supplementary information
Acknowledgements
This work was supported by NIH National Human Genome Research Institute grant R35HG010706 (J.L.V.). S.A.L. is supported by NIH grants R01HL139731, R01HL157635 and American Heart Association 18SFRN34250007. P.N. is supported by grants from the National Heart, Lung and Blood Institute (R01HL142711, R01HL148050, R01HL151283, R01HL127564, R01HL148565, R01HL135242, R01HL151152), National Institute of Diabetes and Digestive and Kidney Diseases (R01DK125782), Fondation Leducq (TNE-18CVD04) and Massachusetts General Hospital (Fireman Chair). M.S.L. is supported by NIH grants U01HG008685, R01HG010372, R01HL143295, OT2OD02750, U41HG006834 and U01TR003201, all unrelated to the present work.
Extended data
Author contributions
L.H., P.K., C.A.B., M.K.G., R.C.G., A.C.F.L., S.A.L., P.N., J.L.V. and M.S.L. conceived the study and contributed to its design. L.H., G.F.B., E.D.H., C.K., P.K.V.K., S.S.P., M.S., W.Y. and C.A.B. analyzed the data from the MGBB and GenoVA samples. A.A.A., C.A.B., M.D., N.E.J. and J.L.V. collected GenoVA data. L.H., J.L.V. and M.S.L. drafted the manuscript, and all authors reviewed the scientific content of the manuscript prior to submission.
Peer review
Peer review information
Nature Medicine thanks Krista Fischer and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Anna Maria Ranzoni was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Data availability
The majority of the MGBB genotyped samples are deposited in dbGAP as part of the eMERGE consortium, phase 3 (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001584.v2.p2). Additional MGBB data were accessed under institutional review board protocol for this current study and are not publicly available due to restrictions on the data. Data from the GenoVA Study trial will be made publicly available after study completion. The 1000 Genomes Project phase 3 dataset used in this study was v5a and was downloaded from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/.
Code availability
The code used to adjust the PRS for population structure is available for download here: https://github.com/MGB-Personalized-Medicine/PRS-adjustment.
Competing interests
A.C.F.L. owns stock in Fabric Genomics. S.A.L. receives sponsored research support from Bristol Myers Squibb, Pfizer, Boehringer Ingelheim, Fitbit, Medtronic, Premier and IBM, and has consulted for Bristol Myers Squibb, Pfizer, Blackstone Life Sciences and Invitae. P.N. reports investigator-initiated grants from Amgen, Apple, Boston Scientific, AstraZeneca and Novartis, personal fees from Apple, Genentech, AstraZeneca, Blackstone Life Science, Foresite Labs and Novartis, spousal employment at Vertex, and being co-founder of TenSixteen Bio, all unrelated to the present work. R.C.G. has received compensation for advising the following companies: AIA, Allelica, Embryome, GenomeWeb, Genomic Life, Grail, Humanity, Meenta, OptumLabs, Plumcare, Verily, VinBigData; and is co-founder of Genome Medical, Inc. C.K. now works at Novartis Institutes for BioMedical Research. A.A.A., C.A.B., M.D. and J.L.V. are employees of the US Department of Veterans Affairs (DVA); the views expressed in this paper do not represent those of the DVA or US government. All other authors have no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Posthumous authorship: Wanfeng Yu
These authors contributed equally: Jason L. Vassy, Matthew S. Lebo.
Extended data
are available for this paper at 10.1038/s41591-022-01767-6.
Supplementary information
The online version contains supplementary material available at 10.1038/s41591-022-01767-6.
References
- 1.Shendure J, Findlay GM, Snyder MW. Genomic medicine: progress, pitfalls, and promise. Cell. 2019;177:45–57. doi: 10.1016/j.cell.2019.02.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.GWAS Catalog (National Human Genome Research Institute); https://www.ebi.ac.uk/gwas/
- 3.Meigs JB, et al. Genotype score in addition to common risk factors for prediction of type 2 diabetes. N. Engl. J. Med. 2008;359:2208–2219. doi: 10.1056/NEJMoa0804742. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Ripatti S, et al. A multilocus genetic risk score for coronary heart disease: case-control and prospective cohort analyses. Lancet. 2010;376:1393–1400. doi: 10.1016/S0140-6736(10)61267-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Zheng SL, et al. Cumulative association of five genetic variants with prostate cancer. N. Engl. J. Med. 2008;358:910–919. doi: 10.1056/NEJMoa075819. [DOI] [PubMed] [Google Scholar]
- 6.Boyle EA, Li YI, Pritchard JK. An expanded view of complex traits: from polygenic to omnigenic. Cell. 2017;169:1177–1186. doi: 10.1016/j.cell.2017.05.038. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Khera AV, et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 2018;50:1219–1224. doi: 10.1038/s41588-018-0183-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Vilhjálmsson BJ, et al. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet. 2015;97:576–592. doi: 10.1016/j.ajhg.2015.09.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Inouye M, et al. Genomic risk prediction of coronary artery disease in 480,000 adults: implications for primary prevention. J. Am. Coll. Cardiol. 2018;72:1883–1893. doi: 10.1016/j.jacc.2018.07.079. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Conti DV, et al. Trans-ancestry genome-wide association meta-analysis of prostate cancer identifies new susceptibility loci and informs genetic risk prediction. Nat. Genet. 2021;53:65–75. doi: 10.1038/s41588-020-00748-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Klarin D, et al. Genome-wide association analysis of venous thromboembolism identifies new risk loci and genetic overlap with arterial vascular disease. Nat. Genet. 2019;51:1574–1579. doi: 10.1038/s41588-019-0519-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Mosley JD, et al. Predictive accuracy of a polygenic risk score compared with a clinical risk score for incident coronary heart disease. JAMA. 2020;323:627–635. doi: 10.1001/jama.2019.21782. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Vassy JL, et al. Polygenic type 2 diabetes prediction at the limit of common variant detection. Diabetes. 2014;63:2172–2182. doi: 10.2337/db13-1663. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Seibert TM, et al. Polygenic hazard score to guide screening for aggressive prostate cancer: development and validation in large scale cohorts. BMJ. 2018;360:j5757. doi: 10.1136/bmj.j5757. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.National Human Genome Research Institute (NHGRI). Electronic Medical Records and Genomics (eMERGE) Networkhttps://www.genome.gov/Funded-Programs-Projects/Electronic-Medical-Records-and-Genomics-Network-eMERGE (2020).
- 16.Shieh, Y. et al. Breast cancer screening in the precision medicine era: risk-based screening in a population-based trial. J. Natl Cancer Inst. 109, 10.1093/jnci/djw290 (2017). [DOI] [PubMed]
- 17.Brockman DG, et al. Design and user experience testing of a polygenic score report: a qualitative study of prospective users. BMC Med. Genomics. 2021;14:238. doi: 10.1186/s12920-021-01056-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Choi SW, Mak TS-H, O’Reilly PF. Tutorial: a guide to performing polygenic risk score analyses. Nat. Protoc. 2020;15:2759–2772. doi: 10.1038/s41596-020-0353-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Martin AR, et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 2019;51:584–591. doi: 10.1038/s41588-019-0379-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Lewis CM, Vassos E. Polygenic risk scores: from research tools to clinical instruments. Genome Med. 2020;12:44. doi: 10.1186/s13073-020-00742-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Lambert SA, et al. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nat. Genet. 2021;53:420–425. doi: 10.1038/s41588-021-00783-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Fritsche LG, et al. Cancer PRSweb: an online repository with polygenic risk scores for major cancer traits and their evaluation in two independent biobanks. Am. J. Hum. Genet. 2020;107:815–836. doi: 10.1016/j.ajhg.2020.08.025. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Huyghe JR, et al. Discovery of common and rare genetic risk variants for colorectal cancer. Nat. Genet. 2019;51:76–87. doi: 10.1038/s41588-018-0286-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Kalia SS, et al. Recommendations for reporting of secondary findings in clinical exome and genome sequencing, 2016 update (ACMG SF v2.0): a policy statement of the American College of Medical Genetics and Genomics. Genet. Med. 2017;19:249–255. doi: 10.1038/gim.2016.190. [DOI] [PubMed] [Google Scholar]
- 25.Zook JM, et al. Extensive sequencing of seven human genomes to characterize benchmark reference materials. Sci. Data. 2016;3:160025. doi: 10.1038/sdata.2016.25. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Bowling KM, et al. Identifying rare, medically relevant variation via population-based genomic screening in Alabama: opportunities and pitfalls. Genet. Med. 2021;23:280–288. doi: 10.1038/s41436-020-00976-z. [DOI] [PubMed] [Google Scholar]
- 27.Weedon MN, Wright CF, et al. Use of SNP chips to detect rare pathogenic variants: retrospective, population based diagnostic evaluation. BMJ. 2021;372:n214. doi: 10.1136/bmj.n214. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Scheuner MT, Edelen MO, Hilborne LH, Lubin IM. Effective communication of molecular genetic test results to primary care providers. Genet. Med. 2013;15:444–449. doi: 10.1038/gim.2012.151. [DOI] [PubMed] [Google Scholar]
- 29.McLaughlin HM, et al. A systematic approach to the reporting of medically relevant findings from whole genome sequencing. BMC Med. Genet. 2014;15:134. doi: 10.1186/s12881-014-0134-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Farmer GD, Gray H, Chandratillake G, Raymond FL, Freeman ALJ. Recommendations for designing genetic test reports to be understood by patients and non-specialists. Eur. J. Hum. Genet. 2020;28:885–895. doi: 10.1038/s41431-020-0579-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Senol-Cosar O, et al. Considerations for clinical curation, classification, and reporting of low-penetrance and low effect size variants associated with disease risk. Genet. Med. 2019;21:2765–2773. doi: 10.1038/s41436-019-0560-8. [DOI] [PubMed] [Google Scholar]
- 32.Goff DC, et al. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association Task Force on practice guidelines. Circulation. 2014;129(Suppl. 2):S49–S73. doi: 10.1161/01.cir.0000437741.48606.98. [DOI] [PubMed] [Google Scholar]
- 33.American Diabetes Association. 2. Classification and diagnosis of diabetes: standards of medical care in diabetes – 2021. Diabetes Care. 2021;44(Suppl. 1):S15–S33. doi: 10.2337/dc21-S002. [DOI] [PubMed] [Google Scholar]
- 34.Grossman DC, et al. Screening for prostate cancer: US Preventive Services Task Force recommendation statement. JAMA. 2018;319:1901–1913. doi: 10.1001/jama.2018.0161. [DOI] [PubMed] [Google Scholar]
- 35.Siu AL, US Preventive Services Task Force Screening for breast cancer: US Preventive Services Task Force recommendation statement. Ann. Intern. Med. 2016;164:279–296. doi: 10.7326/M15-2886. [DOI] [PubMed] [Google Scholar]
- 36.Davidson KW, US Preventive Services Task Force Screening for colorectal cancer: US Preventive Services Task Force recommendation statement. JAMA. 2021;325:1965–1977. doi: 10.1001/jama.2021.6238. [DOI] [PubMed] [Google Scholar]
- 37.Wand H, et al. Improving reporting standards for polygenic scores in risk prediction studies. Nature. 2021;591:211–219. doi: 10.1038/s41586-021-03243-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Homburger JR, et al. Low coverage whole genome sequencing enables accurate assessment of common variants and calculation of genome-wide polygenic scores. Genome Med. 2019;11:74. doi: 10.1186/s13073-019-0682-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Denny JC, et al. The ‘All of Us’ Research Program. N. Engl. J. Med. 2019;381:668–676. doi: 10.1056/NEJMsr1809937. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Khera AV, et al. Whole-genome sequencing to characterize monogenic and polygenic contributions in patients hospitalized with early-onset myocardial infarction. Circulation. 2019;139:1593–1602. doi: 10.1161/CIRCULATIONAHA.118.035658. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Dikilitas O, et al. Predictive utility of polygenic risk scores for coronary heart disease in three major racial and ethnic groups. Am. J. Hum. Genet. 2020;106:707–716. doi: 10.1016/j.ajhg.2020.04.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Kowalski MH, et al. Use of >100,000 NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium whole genome sequences improves imputation quality and detection of rare variant associations in admixed African and Hispanic/Latino populations. PLoS Genet. 2019;15:e1008500. doi: 10.1371/journal.pgen.1008500. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Lewis ACF, Green RC, Vassy JL. Polygenic risk scores in the clinic: translating risk into action. HGG Adv. 2021;2:100047. doi: 10.1016/j.xhgg.2021.100047. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Lee A, et al. BOADICEA: a comprehensive breast cancer risk prediction model incorporating genetic and nongenetic risk factors. Genet. Med. 2019;21:1708–1718. doi: 10.1038/s41436-018-0406-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Stone NJ, et al. 2013 ACC/AHA guideline on the treatment of blood cholesterol to reduce atherosclerotic cardiovascular risk in adults: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. Circulation. 2014;129(Suppl. 2):S1–S45. doi: 10.1161/01.cir.0000437738.63853.7a. [DOI] [PubMed] [Google Scholar]
- 46.Hughes E, et al. Development and validation of a clinical polygenic risk score to predict breast cancer risk. JCO Precis. Oncol. 2020;4:585–592. doi: 10.1200/PO.19.00360. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Our Health + Ancestry DNA Service - 23andMe (23andMe); https://www.23andme.com/dna-health-ancestry/
- 48.Chen S-F, et al. Genotype imputation and variability in polygenic risk score estimation. Genome Med. 2020;12:100. doi: 10.1186/s13073-020-00801-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.National Human Genome Research Institute. Polygenic RIsk MEthods in Diverse populations (PRIMED) Consortiumhttps://www.genome.gov/Funded-Programs-Projects/PRIMED-Consortium
- 50.Manolio TA. Using the data we have: improving diversity in genomic research. Am. J. Hum. Genet. 2019;105:233–236. doi: 10.1016/j.ajhg.2019.07.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Lewis ACF, Green RC. Polygenic risk scores in the clinic: new perspectives needed on familiar ethical issues. Genome Med. 2021;13:14. doi: 10.1186/s13073-021-00829-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Ray, T. Myriad Genetics recalibrates breast cancer PRS for all ancestries in anticipation of broader launch. Genomewebhttps://www.genomeweb.com/molecular-diagnostics/myriad-genetics-recalibrates-breast-cancer-prs-all-ancestries-anticipation (2021).
- 53.Ambry Product Team. Important discontinuation notice: AmbryScore: polygenic risk scores (PRS) https://info.ambrygenetics.com/take-a-brief-survey-for-entry-into-amazon-gift-card-drawing
- 54.Ge, T. et al. Validation of a trans-ancestry polygenic risk score for type 2 diabetes in diverse populations. Preprint at medRxiv10.1101/2021.09.11.21263413 (2021). [DOI] [PMC free article] [PubMed]
- 55.Marnetto D, et al. Ancestry deconvolution and partial polygenic score can improve susceptibility predictions in recently admixed individuals. Nat. Commun. 2020;11:1628. doi: 10.1038/s41467-020-15464-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Ruan, Y. et al. Improving polygenic prediction in ancestrally diverse populations. Preprint at medRxiv10.1101/2020.12.27.20248738 (2021).
- 57.Armstrong KA, Metlay JP. Annals clinical decision making: translating population evidence to individual patients. Ann. Intern. Med. 2020;172:610–616. doi: 10.7326/M19-3496. [DOI] [PubMed] [Google Scholar]
- 58.Sniderman AD, LaChapelle KJ, Rachon NA, Furberg CD. The necessity for clinical reasoning in the era of evidence-based medicine. Mayo Clin. Proc. 2013;88:1108–1114. doi: 10.1016/j.mayocp.2013.07.012. [DOI] [PubMed] [Google Scholar]
- 59.Michailidou K, et al. Association analysis identifies 65 new breast cancer risk loci. Nature. 2017;551:92–94. doi: 10.1038/nature24284. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Mavaddat N, et al. Polygenic risk scores for prediction of breast cancer and breast cancer subtypes. Am. J. Hum. Genet. 2019;104:21–34. doi: 10.1016/j.ajhg.2018.11.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Schumacher FR, et al. Association analyses of more than 140,000 men identify 63 new prostate cancer susceptibility loci. Nat. Genet. 2018;50:928–936. doi: 10.1038/s41588-018-0142-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Karlson EW, Boutin NT, Hoffnagle AG, Allen NL. Building the Partners Healthcare Biobank at Partners Personalized Medicine: informed consent, return of research results, recruitment lessons and operational considerations. J. Pers. Med. 2016;6:2. doi: 10.3390/jpm6010002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Yu S, et al. Toward high-throughput phenotyping: unbiased automated feature extraction and selection from knowledge sources. J. Am. Med. Inform. Assoc. 2015;22:993–1000. doi: 10.1093/jamia/ocv034. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Yu S, et al. Enabling phenotypic big data with PheNorm. J. Am. Med. Inform. Assoc. 2018;25:54–60. doi: 10.1093/jamia/ocx111. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Gainer VS, et al. The Biobank portal for Partners Personalized Medicine: a query tool for working with consented Biobank samples, genotypes, and phenotypes using i2b2. J. Pers. Med. 2016;6:11. doi: 10.3390/jpm6010011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Blau A, Brown A, Mahanta L, Amr SS. The translational genomics core at Partners Personalized Medicine: facilitating the transition of research towards personalized medicine. J. Pers. Med. 2016;6:10. doi: 10.3390/jpm6010010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.Das S, et al. Next-generation genotype imputation service and methods. Nat. Genet. 2016;48:1284–1287. doi: 10.1038/ng.3656. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68.Delaneau O, Zagury J-F, Marchini J. Improved whole-chromosome phasing for disease and population genetic studies. Nat. Methods. 2013;10:5–6. doi: 10.1038/nmeth.2307. [DOI] [PubMed] [Google Scholar]
- 69.Libiger O, Schork NJ. A method for inferring an individual’s genetic ancestry and degree of admixture associated with six major continental populations. Front. Genet. 2013;3:322. doi: 10.3389/fgene.2012.00322. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 70.Miller DT, et al. ACMG SF v3.0 list for reporting of secondary findings in clinical exome and genome sequencing: a policy statement of the American College of Medical Genetics and Genomics (ACMG) Genet. Med. 2021;23:1381–1390. doi: 10.1038/s41436-021-01172-3. [DOI] [PubMed] [Google Scholar]
- 71.Loh P-R, et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 2016;48:1443–1448. doi: 10.1038/ng.3679. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 72.Chang CC, et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience. 2015;4:7. doi: 10.1186/s13742-015-0047-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73.Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74.Poplin, R. et al. Scaling accurate genetic variant discovery to tens of thousands of samples. Preprint at bioRxiv10.1101/201178 (2018).
- 75.Van der Auwera, G. A. & O’Connor, B. D. Genomics in the Cloud: Using Docker, GATK, and WDL in Terra 1st edn (O’Reilly Media, 2020).
- 76.Shi L, et al. The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models. Nat. Biotechnol. 2010;28:827–838. doi: 10.1038/nbt.1665. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77.Richards S, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 2015;17:405–424. doi: 10.1038/gim.2015.30. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78.Rivera‐Muñoz EA, et al. ClinGen Variant Curation Expert Panel experiences and standardized processes for disease and gene‐level specification of the ACMG/AMP guidelines for sequence variant interpretation. Hum. Mutat. 2018;39:1614–1622. doi: 10.1002/humu.23645. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
The majority of the MGBB genotyped samples are deposited in dbGAP as part of the eMERGE consortium, phase 3 (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001584.v2.p2). Additional MGBB data were accessed under institutional review board protocol for this current study and are not publicly available due to restrictions on the data. Data from the GenoVA Study trial will be made publicly available after study completion. The 1000 Genomes Project phase 3 dataset used in this study was v5a and was downloaded from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/.
The code used to adjust the PRS for population structure is available for download here: https://github.com/MGB-Personalized-Medicine/PRS-adjustment.