Author manuscript; available in PMC: 2015 Nov 1.
Published in final edited form as: Cancer Epidemiol Biomarkers Prev. 2014 Nov;23(11):2322–2327. doi: 10.1158/1055-9965.EPI-14-0559

The Role of Genome Sequencing in Personalized Breast Cancer Prevention

Weiva Sieh 1, Joseph H Rothstein 1, Valerie McGuire 1, Alice S Whittemore 1
PMCID: PMC4221442  NIHMSID: NIHMS621671  PMID: 25342391

Abstract

Background

There is uncertainty about the benefits of using genome-wide sequencing to implement personalized preventive strategies at the population level, with some projections suggesting little benefit. We used data for all currently known breast cancer susceptibility variants to assess the benefits and harms of targeting preventive efforts to a population subgroup at highest genomic risk of breast cancer.

Methods

We used the allele frequencies and effect sizes of 86 known breast cancer variants to estimate the population distribution of breast cancer risks and evaluate the strategy of targeting preventive efforts to those at highest risk. We compared the efficacy of this strategy to that of a “best-case” strategy based on a risk distribution estimated from breast cancer concordance in monozygous twins, and to strategies based on previously estimated risk distributions.

Results

Targeting those in the top 25% of the risk distribution would include approximately half of all future breast cancer cases, compared to 70% captured by the best-case strategy and 35% based on previously known variants. In addition, current evidence suggests that reducing exposure to modifiable nongenetic risk factors will have greatest benefit for those at highest genetic risk.

Conclusions

These estimates suggest that personalized breast cancer preventive strategies based on genome sequencing will bring greater gains in disease prevention than previously projected. Moreover these gains will increase with increased understanding of the genetic etiology of breast cancer.

Impact

These results support the feasibility of using genome-wide sequencing to target the women who would benefit from mammography screening.

Keywords: Breast cancer, screening, prevention, genetic risk score, risk stratification

Introduction

Describing an article in Science Translational Medicine entitled “The Predictive Capacity of Personal Genome Sequencing” by Roberts et al. (1), an editor wrote:

Imagine that everyone at birth could have their whole genome sequenced at negligible cost. Surely, this must be a worthwhile endeavor, given the list of luminaries who have already had this sequencing completed. But how well will such tests perform?

The authors address this question by using the incidence data of twins for a range of diseases to estimate the percent of the population and the percent of diseased cases whose genomes would classify them at high risk, and the risk among those whose genomes put them at low risk relative to the population. They conclude that the genomes of most people would classify them at low risk for most of the diseases, and that the predictive value of such a negative test result would generally be small, because the disease risks among those who test negative would be similar to those in the general population. These conclusions have been criticized as deriving from assumptions that involve no information about the genetic factors relevant to the diseases studied (2). Here we compare the breast cancer findings of Roberts et al. (1) to those obtained using information on the currently known breast cancer susceptibility loci.

To be specific, consider the strategy of sequencing the genomes of all young women, and using their genotypes at breast cancer loci to construct for each a genomic risk score, which specifies the rank of her inherited breast cancer risk relative to that of others in the population. These ranks can then be used to classify women into high- and low-risk categories, with high-risk women targeted for more intensive screening and preventive efforts. At present we do not fully know a person’s breast cancer risk score, but genome sequencing coupled with current knowledge would allow us to assign her a partial score by combining her genotypes at all known breast cancer loci with the effect sizes of the risk alleles at these loci. Here we evaluate and compare the efficacy of this strategy to that of: a) the best-case classification theoretically possible if we knew the full risk scores; and b) the estimates of Roberts et al. (1).

Materials and Methods

Population studied

Since the frequencies and effect sizes of breast cancer susceptibility loci may differ by race/ethnicity, we restrict analyses to women of European ancestry. This study did not require approval from an ethical review board, as it involved only the use of published summary data.

Statistical analysis

We modeled the lifetime probability of developing breast cancer (i.e., the absolute risk) for an individual with genomic risk score s as $R = 1 - \exp(-c\,e^{s})$, where c is a positive constant. This model specifies a monotonically increasing relationship between risk R and risk score s. Therefore the population percentile of an individual’s risk equals that of her risk score, and the efficacy of percentile-based stratification depends on the variances of the risk score distributions in the population and among future breast cancer cases. For the theoretically derived distribution of fully known risk scores, these variances can be estimated using the arguments of Pharoah et al. (3,4) and Begg and Pike (5), as described in the Supplementary Materials and Methods.
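The rank-preserving property of this model can be illustrated with a minimal sketch. The constant `c` and the scores below are hypothetical values chosen only for illustration; they are not taken from the paper.

```python
import math

def lifetime_risk(s: float, c: float) -> float:
    """Absolute lifetime risk R = 1 - exp(-c * e^s) for risk score s."""
    return 1.0 - math.exp(-c * math.exp(s))

# Hypothetical values: c and the scores are illustrative assumptions.
c = 0.1
scores = [-1.5, -0.2, 0.0, 0.7, 2.0]
risks = [lifetime_risk(s, c) for s in scores]

# R is strictly increasing in s, so ranking women by score
# gives the same order as ranking them by absolute risk.
rank_by_score = sorted(range(len(scores)), key=lambda i: scores[i])
rank_by_risk = sorted(range(len(risks)), key=lambda i: risks[i])
```

Because the two rankings coincide, any percentile cut-off applied to the scores selects the same women as the corresponding cut-off applied to the risks.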

To estimate the variance of the partially known risk scores, we modeled them as linear combinations of genotypes at a set of uncorrelated breast cancer loci from the literature, with coefficients given by their estimated effect sizes. Specifically, we chose 93 breast cancer susceptibility loci with validated association in women of European ancestry by reviewing the literature for established, replicated associations (6–15). We then selected a subset of 86 uncorrelated loci by computing all 4,278 pairwise correlation coefficients using the SNP Annotation and Proxy Search tool <http://www.broadinstitute.org/mpg/snap/ldsearch.php> for the CEU population from the 1000 Genomes Project. We found seven pairs of loci with squared correlation coefficient exceeding 0.25 (16), and for each pair we retained the locus with the larger value of βp(1 − p), where β is the log relative risk associated with the variant and p is its allele frequency. The seven SNPs we excluded were: rs10069690, rs3215401, rs2943559, rs10759243, rs11199914, rs494406, and rs75915166.
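The pruning rule described above can be sketched as follows. This is an illustrative reading of the text, not the authors' code: the locus names, effect sizes, and r² values are hypothetical, and the sketch assumes the correlated pairs do not share loci.

```python
import math

def pruning_priority(beta: float, p: float) -> float:
    """Selection criterion from the text: beta * p * (1 - p)."""
    return beta * p * (1.0 - p)

def prune_correlated(loci, r2_pairs, threshold=0.25):
    """Drop the lower-priority locus of every pair whose r^2 exceeds threshold.

    loci: dict name -> (beta, p); r2_pairs: dict (name_a, name_b) -> r^2.
    """
    dropped = set()
    for (a, b), r2 in r2_pairs.items():
        if r2 > threshold:
            loser = min((a, b), key=lambda name: pruning_priority(*loci[name]))
            dropped.add(loser)
    return {name: v for name, v in loci.items() if name not in dropped}

# Number of distinct pairs among 93 candidate loci.
n_pairs = math.comb(93, 2)

# Hypothetical loci: (log relative risk, risk-allele frequency).
loci = {
    "snp_a": (math.log(1.14), 0.39),
    "snp_b": (math.log(1.06), 0.40),
    "snp_c": (math.log(1.10), 0.20),
}
# Hypothetical pairwise r^2 values; only snp_a and snp_b are correlated.
r2_pairs = {("snp_a", "snp_b"): 0.60, ("snp_a", "snp_c"): 0.01, ("snp_b", "snp_c"): 0.02}
kept = prune_correlated(loci, r2_pairs)
```

The pair count confirms the 4,278 correlations quoted in the text, and in the toy data the locus with the smaller βp(1 − p) in the correlated pair is the one removed.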

The remaining 86 loci consist of: 1) genes containing rare variants of high and moderate penetrance; and 2) single nucleotide polymorphisms (SNPs) identified in genome-wide association studies of breast cancer. We used the cumulative allele frequency and took the relative risk estimates for rare variants in breast cancer susceptibility genes to be the midpoints of the ranges spanned by the published studies. We also used the averages of the risk allele frequencies and relative risk (odds-ratio) estimates for SNPs that were associated with breast cancer in multiple genome-wide association studies.

Table 1 shows the 86 loci, and the frequencies and effect sizes of their risk alleles (6–15). We modeled the combined effects of these 86 loci by assuming that they act multiplicatively on a woman’s cumulative hazard for breast cancer. As shown in the Supplementary Materials and Methods, this implies that her partially known risk score has the additive form $s = \beta_1 g_1 + \dots + \beta_{86} g_{86}$, where $g_k = 0, 1, 2$ denotes her count of risk alleles at locus k, and $\beta_k$ denotes the effect size of the risk allele at locus k as obtained from the literature, k = 1, …, 86. Determining the variance of the resulting partial scores by direct enumeration is infeasible, since it would require summing over all $3^{86} \approx 10^{41}$ possible multi-locus genotypes. However, it can be approximated by random genotype sampling as described in the Supplementary Materials and Methods.
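A minimal sketch of variance estimation by random genotype sampling is given below. It is not the procedure in the Supplementary Materials and Methods: it uses only five SNP rows of Table 1 as a demonstration, and it assumes Hardy-Weinberg equilibrium and independence between loci, neither of which is stated in the text. Under those assumptions the variance also has a simple closed form, which serves here as a check on the sampler.

```python
import math
import random

# Too many multi-locus genotypes to enumerate (about 1e41).
N_MULTILOCUS_GENOTYPES = 3 ** 86

# (risk-allele frequency, relative risk) for five SNP rows of Table 1 (demo subset).
LOCI = [(0.39, 1.14), (0.67, 1.06), (0.17, 1.07), (0.26, 1.14), (0.90, 1.10)]

def sample_genotype(p, rng):
    """Risk-allele count (0, 1, or 2), assuming Hardy-Weinberg equilibrium."""
    return (rng.random() < p) + (rng.random() < p)

def sample_score(loci, rng):
    """One draw of s = sum_k beta_k * g_k with beta_k = log(relative risk)."""
    return sum(math.log(rr) * sample_genotype(p, rng) for p, rr in loci)

def monte_carlo_variance(loci, n_draws, seed=0):
    """Sample variance of the risk score over randomly drawn genotypes."""
    rng = random.Random(seed)
    scores = [sample_score(loci, rng) for _ in range(n_draws)]
    mean = sum(scores) / n_draws
    return sum((x - mean) ** 2 for x in scores) / (n_draws - 1)

def closed_form_variance(loci):
    """Sum of beta_k^2 * 2 p_k (1 - p_k); assumes independent loci under HWE."""
    return sum(math.log(rr) ** 2 * 2 * p * (1 - p) for p, rr in loci)

mc = monte_carlo_variance(LOCI, 100_000)
exact = closed_form_variance(LOCI)
```

With enough draws the sampled variance should agree closely with the closed-form value for this subset; the same sampler extends to all 86 loci once an effect size is chosen for each rare-variant gene.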

Table 1.

Risk-allele frequencies and relative risks for breast cancer susceptibility loci among women of European-American ancestry

| No. | Locus | Chromosome | Gene | Risk allele frequency (%) | Relative risk |
|---|---|---|---|---|---|
| 1 | rs11249433 | 1p | NOTCH2/FCGR1B | 39 | 1.14 |
| 2 | Multiple variants | 1p | MUTYH | 0.5 | 1.4–2.2 |
| 3 | rs616488 | 1p | PEX14 | 67 | 1.06 |
| 4 | rs11552449 | 1p | TPN22/BCL2L15 | 17 | 1.07 |
| 5 | rs4245739 | 1q | MDM4 | 26 | 1.14 |
| 6 | rs4849887 | 2p | None | 90 | 1.10 |
| 7 | Multiple variants | 2p | MSH6 | 0.9 | 4.9–4.9 |
| 8 | Multiple variants | 2p | MSH2 | 0.01 | 2.4–2.4 |
| 9 | rs12710696 | 2p | None | 36 | 1.11 |
| 10 | rs2016394 | 2q | METAP1D | 52 | 1.05 |
| 11 | rs1550623 | 2q | CDCA7 | 84 | 1.06 |
| 12 | rs1045485 | 2q | CASP8 | 0.85 | 1.03 |
| 13 | rs13387042 | 2q | IGFBP2, IGBP5, TPN2 | 52 | 1.12 |
| 14 | rs16857609 | 2q | DIRC3 | 26 | 1.08 |
| 15 | rs4973768 | 3p | SLC4A7/NEK10 | 46 | 1.11 |
| 16 | rs12493607 | 3p | TGFBR2 | 35 | 1.06 |
| 17 | rs6762644 | 3p | ITPR1/EGOT | 40 | 1.07 |
| 18 | rs9790517 | 4q | TET2 | 23 | 1.05 |
| 19 | rs6828523 | 4q | ADAM29 | 87 | 1.11 |
| 20 | rs10941679 | 5p | MRPS30/HCN1 | 26 | 1.19 |
| 21 | rs7734992 | 5p | TERT | 43 | 1.05 |
| 22 | rs889312 | 5q | MAP3K1/MEIR3 | 28 | 1.13 |
| 23 | rs10472076 | 5q | RAB3C | 38 | 1.05 |
| 24 | rs1353747 | 5q | PDE4D | 90 | 1.09 |
| 25 | rs1432679 | 5q | EBF1 | 43 | 1.07 |
| 26 | rs204247 | 6p | RANBP9 | 43 | 1.05 |
| 27 | rs17530068 | 6q | None | 22 | 1.09 |
| 28 | rs2046210 | 6q | ESR1 | 34 | 1.13 |
| 29 | rs3757318 | 6q | ESR1 | 7 | 1.21 |
| 30 | rs11242675 | 6q | FOXQ1 | 61 | 1.06 |
| 31 | rs720475 | 7q | ARHGEF5/NOBOX | 75 | 1.06 |
| 32 | Multiple variants | 8q | NBN | 0.9 | 1.3–3.1 |
| 33 | rs9693444 | 8p | None | 32 | 1.07 |
| 34 | rs6472903 | 8q | None | 82 | 1.10 |
| 35 | rs2943559 | 8q | HNF4G | 7 | 1.13 |
| 36 | rs13281615 | 8q | MYC | 40 | 1.08 |
| 37 | rs11780156 | 8q | MIR1208 | 16 | 1.07 |
| 38 | rs1011970 | 9p | CDKN2A/B | 17 | 1.09 |
| 39 | rs865686 | 9q | KLF4/RAD23B | 61 | 1.12 |
| 40 | rs10759243 | 9q | None | 39 | 1.06 |
| 41 | rs2380205 | 10p | ANKRD16 | 56 | 1.02 |
| 42 | rs7072776 | 10p | MLLT10/DNAJC1 | 29 | 1.07 |
| 43 | rs11814448 | 10p | DNAJC1 | 2 | 1.26 |
| 44 | rs10995190 | 10q | ZNF365 | 85 | 1.16 |
| 45 | rs704010 | 10q | ZMIZ1 | 39 | 1.07 |
| 46 | Multiple variants | 10q | PTEN | 0.01 | 2.0–10.0 |
| 47 | rs7904519 | 10q | TCF7L2 | 46 | 1.06 |
| 48 | rs11199914 | 10q | None | 68 | 1.05 |
| 49 | rs2981579 | 10q | FGFR2 | 38 | 1.26 |
| 50 | rs3817198 | 11p | LSP1/H19 | 30 | 1.07 |
| 51 | rs3903072 | 11q | OVOL1 | 53 | 1.05 |
| 52 | rs614367 | 11q | CCND1/FGFs | 15 | 1.15 |
| 53 | rs494406 | 11q | CCND1 | 26 | 1.07 |
| 54 | Multiple variants | 11q | ATM | 0.3 | 2.0–3.0 |
| 55 | rs11820646 | 11q | None | 59 | 1.09 |
| 56 | rs10771399 | 12p | PTHLH | 88 | 1.19 |
| 57 | rs12422552 | 12p | None | 26 | 1.05 |
| 58 | rs17356907 | 12q | NTN4 | 70 | 1.10 |
| 59 | rs1292011 | 12q | TBX3/MAPKAP5 | 58 | 1.10 |
| 60 | Multiple variants | 13q | BRCA2 | 0.1 | 9.0–21.0 |
| 61 | rs2236007 | 14q | PAX9/SLC25A21 | 79 | 1.08 |
| 62 | rs999737 | 14q | RAD51B | 78 | 1.09 |
| 63 | rs2588809 | 14q | RAD51L1 | 16 | 1.08 |
| 64 | rs941764 | 14q | CCDC88C | 34 | 1.06 |
| 65 | rs3803662 | 16q | TOX3/LOC643714 | 25 | 1.20 |
| 66 | rs17817449 | 16q | MIR1972-2-FTO | 60 | 1.08 |
| 67 | rs11075995 | 16q | FTO | 24 | 1.10 |
| 68 | Multiple variants | 16q | CDH1 | 0.01 | 2.0–10.0 |
| 69 | Multiple variants | 16p | PALB2 | 0.01 | 2.0–4.0 |
| 70 | rs13329835 | 16q | CDYL2 | 22 | 1.08 |
| 71 | Multiple variants | 17q | BRCA1 | 0.06 | 5.0–45.0 |
| 72 | Multiple variants | 17q | BRIP2 | 0.1 | 2.0–3.0 |
| 73 | Multiple variants | 17p | TP53 | 0.01 | 2.0–10.0 |
| 74 | rs6504950 | 17q | STXBP4/COX11 | 73 | 1.05 |
| 75 | Multiple variants | 17q | RAD51C | 0.5 | 3.2–3.5 |
| 76 | rs527616 | 18q | None | 62 | 1.05 |
| 77 | rs1436904 | 18q | CHST9 | 60 | 1.04 |
| 78 | Multiple variants | 19p | STK11 | 0.01 | 2.0–10.0 |
| 79 | rs8170 | 19p | MERIT40 | 19 | 1.25 |
| 80 | rs4808801 | 19p | SSBP4/ISYNA1/ELL | 65 | 1.06 |
| 81 | rs3760982 | 19q | KCNN4/ZNF283 | 46 | 1.06 |
| 82 | rs2284378 | 20q | RALY | 33 | 1.16 |
| 83 | rs2823093 | 21q | NRIP1 | 73 | 1.09 |
| 84 | Multiple variants | 22q | CHEK2 | 0.4 | 2.0–3.0 |
| 85 | rs132390 | 22q | EMID1/RHBDD3 | 3.6 | 1.12 |
| 86 | rs6001930 | 22q | MKL1 | 11 | 1.12 |

Performance of risk-score-based classification

We estimated how well we can identify future breast cancer cases by classifying women into high-risk (targeted) and low-risk (untargeted) subgroups based on the percentiles of their fully and partially known risk scores (see Supplementary Materials and Methods for details). Specifically, we estimated the sensitivity (Sn), specificity (Sp), positive predictive value (PPV), negative predictive value (NPV), and risk in untargeted women relative to that of the population.
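The relationships among these measures follow from elementary probability once the targeted fraction α, the sensitivity, and the population lifetime risk are fixed. The sketch below derives the remaining quantities from those three inputs; the function and key names are illustrative, and the example inputs are read from the α = 10% row of Table 2, panel A.

```python
def classification_measures(alpha: float, sn: float, lifetime_risk: float) -> dict:
    """Derive the remaining measures from the targeted fraction (alpha),
    the sensitivity (sn), and the population lifetime risk."""
    cases_targeted = sn * lifetime_risk              # P(case and high-risk)
    ppv = cases_targeted / alpha                     # risk among targeted women
    risk_low = (lifetime_risk - cases_targeted) / (1.0 - alpha)  # risk among untargeted women
    return {
        "one_minus_sp": (alpha - cases_targeted) / (1.0 - lifetime_risk),
        "ppv": ppv,
        "npv": 1.0 - risk_low,
        "risk_low": risk_low,
        "rr_low_vs_population": risk_low / lifetime_risk,
        "rr_high_vs_low": ppv / risk_low,
    }

# Inputs taken from the alpha = 10% row of Table 2, panel A.
m = classification_measures(alpha=0.10, sn=0.3231, lifetime_risk=0.1268)
```

Applied to that row, the derived values agree with the tabulated PPV, NPV, 1 − Sp, and relative risks to within rounding.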

Results

We estimated the population variance of the risk scores based on the 86 currently known breast cancer susceptibility variants to be 0.35. This variance, while lower than the estimate of 1.44 for the variance of the fully known risk scores determined using the arguments of Pharoah et al. (3,4), is nevertheless considerably higher than the value 0.07 obtained for the risk scores based on the seven loci known in 2008 (4).

Figure 1 shows the percentage of breast cancer cases included among women whose risk scores exceed the 100(1 − α)th percentile, for 0 < α < 1. The curves correspond to the best-case classification with risk score variance equal to 1.44 (solid curve), the currently feasible classification based on partially known risk scores, with variance equal to 0.35 (dashed curve), and the classification based on the seven loci known in 2008 (4) with variance equal to 0.07 (dotted curve). Since the efficacy of risk stratification for prevention depends on the population variance of the risk scores, these results indicate that: a) current genetic knowledge far exceeds that in 2008 (4); and b) despite these gains, considerably better stratification should still be possible in the future, as we better understand the etiology of this disease.

Figure 1. Percent of all breast cancer cases explained by those at highest risk for the disease. Curves are shown for the best-case scenario when all breast cancer susceptibility alleles are known (solid curve), currently known susceptibility alleles (dashed curve), and seven alleles known in 2008 (4) (dotted curve).
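How the share of cases captured grows with the risk score variance can be approximated with a short sketch. It assumes the scores are Gaussian with mean zero in the population and, under a rare-disease approximation, Gaussian with the same variance but mean shifted by that variance among cases. This is a simplification, not the method of the Supplementary Materials and Methods, so it need not reproduce the authors' curves exactly.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def share_of_cases_targeted(alpha: float, variance: float) -> float:
    """Approximate fraction of cases among the top alpha of risk scores.

    Assumes scores are N(0, variance) in the population and, under a
    rare-disease approximation, N(variance, variance) among cases.
    """
    sigma = variance ** 0.5
    threshold_z = STD_NORMAL.inv_cdf(1.0 - alpha)   # cut-off in population SD units
    return 1.0 - STD_NORMAL.cdf(threshold_z - sigma)

# Variances quoted in the text: best case, current knowledge, seven loci known in 2008.
for variance in (1.44, 0.35, 0.07):
    curve = [(a, share_of_cases_targeted(a, variance)) for a in (0.10, 0.25, 0.50)]
    print(variance, [(a, round(100 * share, 1)) for a, share in curve])
```

For variance 1.44 this approximation gives roughly 47% of cases in the top 10% of scores, in line with the best-case figure quoted in the text; larger variance always captures a larger share of cases at a given α.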

Table 2 shows additional measures of discrimination obtained by classifying women as high- or low-risk on the basis of their risk scores, where the high-risk group is defined as those whose scores exceed the 100(1 − α)th percentile of the centered Gaussian risk score distribution. How do these results compare with the breast cancer predictions of Roberts et al. (1)? The latter were based on a high-risk group defined as those whose risks exceed the 90th–95th percentile of the population distribution. The authors estimated that this classification would target between 10% and 35% of all future breast cancer cases. In contrast, Table 2 shows that, with the top 10% of women targeted, the percentage of cases targeted would be approximately 47% using the best-case classification and 32% using the currently feasible classification. In addition, the authors estimated that the ratio of risk among women classified at low risk relative to the population would be as high as 0.72–0.90, indicating poor specificity. Yet Table 2 suggests that this relative risk is lower: 0.59 with the best-case classification and 0.75 with the currently feasible classification. Thus the present estimates provide more optimistic projections than those obtained using the theoretical model of Roberts et al. (1).

Table 2.

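Impact (a) of targeting a subset of European-American women for breast cancer preventive strategies

A. Current knowledge of genetic susceptibility alleles (risk score variance = 0.35)

| Percent of population at high risk (α) | Lifetime risk in low-risk (%) | Risk for high-risk relative to low-risk | Risk for low-risk relative to population | Percent of cases who are high-risk (Sn) | Percent of non-cases who are high-risk (1 − Sp) | Percent of high-risk who are cases (PPV) | Percent of low-risk who are non-cases (NPV) | NNT (b) |
|---|---|---|---|---|---|---|---|---|
| 100 | | | | 100 | 100 | 12.68 (a) | | 95 |
| 90 | 4.11 | 3.32 | 0.32 | 96.76 | 89.02 | 13.63 | 95.89 | 88 |
| 75 | 5.20 | 2.92 | 0.41 | 89.74 | 72.86 | 15.17 | 94.80 | 79 |
| 67 | 5.68 | 2.84 | 0.45 | 85.22 | 64.35 | 16.13 | 94.32 | 74 |
| 50 | 6.64 | 2.82 | 0.52 | 73.82 | 46.54 | 18.72 | 93.36 | 64 |
| 33 | 7.66 | 2.99 | 0.60 | 59.54 | 29.15 | 22.88 | 92.34 | 52 |
| 25 | 8.20 | 3.18 | 0.65 | 51.49 | 21.15 | 26.12 | 91.80 | 46 |
| 10 | 9.54 | 4.30 | 0.75 | 32.31 | 6.76 | 40.97 | 90.46 | 29 |
| 0 | 12.68 | | 1.00 | 0 | 0 | | 87.32 | |

B. Best-case scenario (risk score variance = 1.44)

| Percent of population at high risk (α) | Lifetime risk in low-risk (%) | Risk for high-risk relative to low-risk | Risk for low-risk relative to population | Percent of cases who are high-risk (Sn) | Percent of non-cases who are high-risk (1 − Sp) | Percent of high-risk who are cases (PPV) | Percent of low-risk who are non-cases (NPV) | NNT (b) |
|---|---|---|---|---|---|---|---|---|
| 100 | | | | 100 | 100 | 12.68 (a) | | 95 |
| 90 | 0.83 | 16.85 | 0.07 | 99.34 | 88.64 | 14.00 | 99.17 | 86 |
| 75 | 1.55 | 10.60 | 0.12 | 96.95 | 71.81 | 16.39 | 98.45 | 73 |
| 67 | 1.94 | 9.24 | 0.15 | 94.94 | 62.94 | 17.97 | 98.06 | 67 |
| 50 | 2.92 | 7.68 | 0.23 | 88.48 | 44.41 | 22.44 | 97.08 | 53 |
| 33 | 4.24 | 7.04 | 0.33 | 77.61 | 26.52 | 29.82 | 95.76 | 40 |
| 25 | 5.07 | 7.00 | 0.40 | 69.99 | 18.47 | 35.50 | 94.93 | 34 |
| 10 | 7.51 | 7.88 | 0.59 | 46.67 | 4.68 | 59.18 | 92.49 | 20 |
| 0 | 12.68 | | 1.00 | 0 | 0 | | 87.32 | |

(a) Assuming that the lifetime risk of developing breast cancer is 12.68%, based upon estimates for the U.S. white female population (19).

(b) NNT = number of women who must be targeted to avoid one breast cancer death, assuming one sixth of all breast cancers are fatal and targeting prevents death from half of them.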
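The NNT definition in footnote b reduces to a one-line calculation from the PPV. The sketch below applies it to four PPV values from Table 2; the function and parameter names are illustrative.

```python
def number_needed_to_target(ppv: float,
                            fatal_fraction: float = 1.0 / 6.0,
                            deaths_prevented_fraction: float = 0.5) -> float:
    """Women targeted per breast cancer death avoided.

    ppv is the lifetime risk among targeted women; the defaults follow
    footnote b (one sixth of cancers fatal, half of those deaths prevented).
    """
    deaths_avoided_per_woman = ppv * fatal_fraction * deaths_prevented_fraction
    return 1.0 / deaths_avoided_per_woman

# PPV values (percent) taken from Table 2.
for ppv_percent in (12.68, 26.12, 40.97, 59.18):
    print(ppv_percent, round(number_needed_to_target(ppv_percent / 100.0)))
```

With the default assumptions the NNT is simply 12 divided by the PPV expressed as a proportion, which is why it falls as targeting becomes more selective.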

Discussion

We have estimated the variance of breast cancer risk scores among women of European ancestry by using a multiplicative model for the joint effects of currently known breast cancer loci. We found that the distribution of these partially known risk scores has variance 0.35, which is similar in magnitude to the estimate of 0.28 obtained independently by Pashayan et al. (17). We have compared the performance of targeted preventive measures based on the currently feasible partially known risk scores to those obtained using the theoretical distribution of fully known risk scores derived by Pharoah et al. (3,4). The results suggest that the predictive power of genome sequencing to determine breast cancer risk is considerably greater than that described by Roberts et al. (1), and the estimates contradict the authors’ statement that “…our conclusions […] represent an absolute upper bound that cannot be improved by improvements in technology or genetic knowledge.”

To achieve the optimal predictive power represented by the “best-case” classification, we will need to identify the combined effects of all causal alleles for breast cancer. Moreover, better understanding of gene-environment interactions could further improve predictive power. For example, the performance measures described here underestimate the potential value of genomic-based risk classification. This is because a child’s lifetime breast cancer risk is determined not only by her genome, but also by her future levels of nongenetic (lifestyle, environmental and epigenetic) risk factors. Epidemiologic data support a multiplicative model for the joint effects of genetic and nongenetic factors on breast cancer risk (18). Under this model, a modifiable nongenetic factor associated with an overall 50% increase in risk would add considerably more to the absolute risk of a female whose lifetime genetic risk is 36% (increasing it to 54%) than to that of a female whose lifetime genomic risk is 4% (increasing it only to 6%). Thus high-risk women have considerably more to gain by appropriate choices of lifestyle factors than do low-risk women.
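The arithmetic behind this comparison can be written out directly. The sketch follows the text in applying the 50% relative increase straight to the lifetime risk, which is a simplification; the function name is illustrative.

```python
def risk_with_factor(genetic_risk: float, relative_increase: float) -> float:
    """Lifetime risk after a nongenetic factor acts multiplicatively on it."""
    return genetic_risk * (1.0 + relative_increase)

# The two lifetime genetic risks used in the text, with a 50% relative increase.
for genetic_risk in (0.36, 0.04):
    new_risk = risk_with_factor(genetic_risk, 0.50)
    gain = new_risk - genetic_risk
    print(f"{genetic_risk:.0%} -> {new_risk:.0%} (absolute gain {gain:.0%})")
```

The same relative increase therefore translates into an absolute gain nine times larger for the woman at 36% genetic risk than for the woman at 4%.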

In conclusion, the data-based estimates presented here suggest that personalized breast cancer preventive strategies informed by genome sequencing may bring greater gains in cost-efficient disease prevention than previously projected. Moreover, these gains will increase as we gain increased understanding of the etiology of breast cancer.

Supplementary Material

1

Acknowledgments

Grant Support: This research is supported by grants K07CA143047 (W. Sieh) and R01CA094069 (A.S. Whittemore, J.H. Rothstein) from the U.S. National Cancer Institute.

Footnotes

Disclosure of Potential Conflict of Interest: The authors declare no competing interests.

References

  • 1. Roberts NJ, Vogelstein JT, Parmigiani G, Kinzler KW, Vogelstein B, Velculescu VE. The predictive capacity of personal genome sequencing. Sci Transl Med. 2012;4:133ra58. doi: 10.1126/scitranslmed.3003380.
  • 2. Topol EJ. Comment on “the predictive capacity of personal genome sequencing”. Sci Transl Med. 2012;4:135le135; author reply 135lr133. doi: 10.1126/scitranslmed.3004126.
  • 3. Pharoah PD, Antoniou A, Bobrow M, Zimmern RL, Easton DF, Ponder BA. Polygenic susceptibility to breast cancer and implications for prevention. Nat Genet. 2002;31:33–6. doi: 10.1038/ng853.
  • 4. Pharoah PD, Antoniou AC, Easton DF, Ponder BA. Polygenes, risk prediction, and targeted prevention of breast cancer. N Engl J Med. 2008;358:2796–803. doi: 10.1056/NEJMsa0708739.
  • 5. Begg CB, Pike MC. Comment on “the predictive capacity of personal genome sequencing”. Sci Transl Med. 2012;4:135le133; author reply 135lr133. doi: 10.1126/scitranslmed.3004162.
  • 6. Bogdanova N, Feshchenko S, Schurmann P, Waltes R, Wieland B, Hillemanns P, et al. Nijmegen breakage syndrome mutations and risk of breast cancer. Int J Cancer. 2008;122:802–6. doi: 10.1002/ijc.23168.
  • 7. Garcia-Closas M, Couch FJ, Lindstrom S, Michailidou K, Schmidt MK, Brook MN, et al. Genome-wide association studies identify four ER negative-specific breast cancer risk loci. Nat Genet. 2013;45:392–8. doi: 10.1038/ng.2561.
  • 8. Ghoussaini M, Pharoah PDP, Easton DF. Inherited genetic susceptibility to breast cancer. The Beginning of the end or the end of the beginning? Am J Pathol. 2013;183:1038–51. doi: 10.1016/j.ajpath.2013.07.003.
  • 9. Mavaddat N, Antoniou AC, Easton DF, Garcia-Closas M. Genetic susceptibility to breast cancer. Mol Oncol. 2010;4:174–91. doi: 10.1016/j.molonc.2010.04.011.
  • 10. Michailidou K, Hall P, Gonzalez-Neira A, Ghoussaini M, Dennis J, Milne RL, et al. Large-scale genotyping identifies 41 new loci associated with breast cancer risk. Nat Genet. 2013;45:353–61. doi: 10.1038/ng.2563.
  • 11. Peng S, Lu B, Ruan W, Zhu Y, Sheng H, Lai M. Genetic polymorphisms and breast cancer risk: evidence from meta-analyses, pooled analyses, and genome-wide association studies. Breast Cancer Res Treat. 2011;127:309–24. doi: 10.1007/s10549-011-1459-5.
  • 12. Rennert G, Lejbkowicz F, Cohen I, Pinchev M, Rennert HS, Barnett-Griness O. MUTYH mutation carriers have increased breast cancer risk. Cancer. 2012;118:1989–93. doi: 10.1002/cncr.26506.
  • 13. Win AK, Lindor NM, Young JP, Macrae FA, Young GP, Williamson E, et al. Risks of primary extracolonic cancers following colorectal cancer in Lynch Syndrome. J Natl Cancer Inst. 2012;104:1363–72. doi: 10.1093/jnci/djs351.
  • 14. Meindl A, Hellebrand H, Wiek C, Erven V, Wappenschmidt B, Niederacher D, et al. Germline mutations in breast and ovarian cancer pedigrees establish RAD51C as a human cancer susceptibility gene. Nat Genet. 2010;42:410–414. doi: 10.1038/ng.569.
  • 15. Thompson ER, Boyle SE, Johnson J, Ryland GL, Sawyer S, Choong DYH, et al. Analysis of RAD51C germline mutations in high risk breast and ovarian cancer families and ovarian cancer patients. Hum Mutat. 2010;33:95–99. doi: 10.1002/humu.21625.
  • 16. International Schizophrenia Consortium, Purcell SM, Wray NR, Stone JL, Visscher PM, O’Donovan MC, et al. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature. 2009;460:748–52. doi: 10.1038/nature08185.
  • 17. Pashayan N, Hall A, Chowdhury S, Dent T, Pharoah PD, Burton H. Public health genomics and personalized prevention: lessons from the COGS project. J Intern Med. 2013;274:451–6. doi: 10.1111/joim.12094.
  • 18. Li H, Beeghly-Fadiel A, Wen W, Lu W, Gao YT, Xiang YB, et al. Gene-environment interactions for breast cancer risk among Chinese women: a report from the Shanghai Breast Cancer Genetics Study. Am J Epidemiol. 2013;177:161–70. doi: 10.1093/aje/kws238.
  • 19. Howlader N, Noone A, Krapcho M, Garshell J, Neyman N, Altekruse S, et al. SEER Cancer Statistics Review, 1975–2010. National Cancer Institute; Bethesda, MD: based on November 2012 SEER data submission, posted to the SEER web site, April 2013. Available from: http://seer.cancer.gov/csr/1975_2010/
