Abstract
Introduction:
Breast cancer (BC) is one of the most common cancers globally. Genetic testing can facilitate screening and risk-reducing recommendations, and inform use of targeted treatments. However, genes included in testing panels are from studies of European-ancestry participants. We sequenced Hispanic/Latina (H/L) women to identify BC susceptibility genes.
Methods:
We conducted a pooled BC case-control analysis in H/L women from the San Francisco Bay area, Los Angeles County, and Mexico (4,178 cases and 4,344 controls). Whole exome sequencing was conducted on 1,043 cases and 1,188 controls and a targeted 857-gene panel on the remaining samples. Using ancestry-adjusted SKAT-O analyses, we tested the association of loss of function (LoF) variants with overall, estrogen receptor (ER)-positive, and ER-negative BC risk. We calculated odds ratios (OR) for BC using ancestry-adjusted logistic regression models. We also tested the association of single variants with BC risk.
Results:
We saw a strong association of LoF variants in FANCM with ER-negative BC (p=4.1×10−7, OR [CI]: 6.7 [2.9–15.6]) and a nominal association with overall BC risk. Among known susceptibility genes, BRCA1 (p=2.3×10−10, OR [CI]: 24.9 [6.1–102.5]), BRCA2 (p=8.4×10−10, OR [CI]: 7.0 [3.5–14.0]), and PALB2 (p=1.8×10−8, OR [CI]: 6.5 [3.2–13.1]) were strongly associated with BC. There were nominally significant associations with CHEK2, RAD51D, and TP53.
Conclusion:
In H/L women, LoF variants in FANCM were strongly associated with ER-negative breast cancer risk. It previously was proposed as a possible susceptibility gene for ER-negative BC, but is not routinely tested in clinical practice. Our results demonstrate that FANCM should be added to BC gene panels.
Keywords: breast cancer susceptibility, FANCM, estrogen receptor negative breast cancer, whole exome sequencing
INTRODUCTION
Breast cancer is the most common cancer in women,1 and is influenced by hormonal, environmental, and genetic factors.2 Approximately 11% of screening age women have a first-degree relative diagnosed with breast cancer,3 and these women have ~2-fold higher risk of being diagnosed with breast cancer.4 Germline pathogenic variants found in high-penetrance genes such as BRCA15, BRCA2,6 PALB2,7 TP538,9 and others10 are associated with high risk of breast cancer and underlie familial cancer syndromes. Several intermediate-penetrance genes including CHEK211 and ATM12 also have been identified. In addition, genome-wide association studies (GWAS) have identified many common variants that contribute to breast cancer risk.13 However, all the pathogenic variants in susceptibility genes and the common breast cancer risk variants identified to date explain only half of the heritability of the disease.13
Most of the knowledge of genetic susceptibility to breast cancer is based on studies done in European-ancestry populations, leaving gaps in the understanding of genetic effects in other populations. In particular, among Hispanic/Latina (H/L) women in the United States (US), breast cancer is the most common cancer and leading cause of cancer-related death.14 Latin-American populations are genetically diverse, and many H/L individuals have admixed ancestry, comprised primarily of European, Indigenous American, and African components.15,16 GWAS of breast cancer in H/L women led to the discovery of a protective variant near ESR1, which most commonly occurs in women with higher Indigenous American ancestry.17 In addition, unique founder variants have been identified in BRCA1,18 PALB2,19,20 and CHEK2.19 Since the sample sizes in studies of H/L women are smaller than in studies of White women in the U.S. and European women, the genetic contribution to breast cancer in this population remains poorly understood.
Genetic testing for pathogenic variants in breast cancer susceptibility genes is currently used to identify women at high risk of developing breast cancer, who may benefit from increased screening and risk-reducing interventions, for cascade testing in families to identify other individuals at increased risk, and to inform the use of targeted treatments in those who develop cancer.21–24 However, most evidence supporting association with risk in these genes is from studies of European ancestry participants. Two large recent studies in predominantly European ancestry participants confirm the association of known breast cancer susceptibility genes with increased breast cancer risk in population-based cohorts, reiterating that many of these genes are important to include on clinical genetic testing panels.25,26
To better understand the impact of rare variants in coding sequence of genes on breast cancer risk among H/L women, we performed a whole exome sequencing (WES) and targeted replication approach in over 8,500 H/L women from California and Mexico. We report an exome-wide significant association between FANCM and estrogen-receptor (ER) negative breast cancer, as well as associations between several other known risk genes and breast cancer.
Methods
Study samples:
Our study was a discovery and replication pooled case-control analysis of invasive breast cancer among self-identified H/L women. All cases had been diagnosed with at least one invasive breast cancer and we used age at first breast cancer diagnosis for women diagnosed with more than one breast cancer at different ages. Participant selection in our discovery population has been previously described.19 Briefly, discovery cases were selected for having previously tested negative for BRCA1/2, and were diagnosed at <51 years of age and/or had bilateral (synchronous or metachronous) breast cancer, breast and ovarian cancers, or were diagnosed between 51 and 70 years with a family history of breast cancer in ≥2 first-degree or second-degree relatives diagnosed at age <70 years. Discovery cases were selected from three high-risk registry studies. We included self-identified H/L women with breast cancer from the Clinical Cancer Genomics Community Research Network (CCGCRN),27,28 a network of cancer centers and community-based clinics that provide genetic counseling to individuals with a personal or family history of cancer. We also included self-identified H/L women with breast cancer from the University of California at San Francisco (UCSF) Clinical Genetics and Prevention Program and the University of Southern California (USC) Norris Comprehensive Cancer Center clinical genetics program. Discovery controls were self-identified H/L women enrolled by City of Hope (COH) staff through health fairs and participants in the Multiethnic Cohort (MEC). The MEC is a large prospective cohort study conducted in California (mainly Los Angeles county) and Hawaii.29 Controls from the MEC did not have breast cancer and approximately half had diabetes.
Self-identified H/L participants in the replication set were from six studies (Table 1). The Cancer de Mama (CAMA) study is a population-based case–control study of breast cancer conducted in Mexico City, Monterrey and Veracruz. Cases aged 35–69 years at diagnosis and diagnosed between 2004 and 2007 were recruited from 12 hospitals (3 to 5 hospitals in each region). Controls were recruited based on membership in the same health plan as the cases and were frequency-matched on 5-year age groups.30,31 The California Pacific Medical Center - Breast Health Center (CPMC) cohort32 is composed of women who presented for mammography in San Francisco, California between 2004 and 2011 and we included incident and prevalent breast cancer cases. The PATHWAYS study is a cohort of breast cancer cases diagnosed at Kaiser Permanente Northern California.33 We used samples from participants who were enrolled in PATHWAYS and reported H/L ethnicity. From the nested case-control study within the MEC, we included cases with invasive breast cancer diagnosed at the age of >50 years and controls matched on age and self-identified ethnicity.29 The Northern California Breast Cancer Family Registry (NC-BCFR)34 recruited and followed about 4,000 breast cancer families and individuals with breast cancer, including cases with indicators of increased genetic susceptibility (including diagnosis before age 35, personal history of ovarian cancer, personal history of first breast cancer in contralateral breast before age 50, family history of breast cancer in first degree relative, family history of ovarian cancer in first degree relative, or family history of childhood cancer cancer in first degree relative) and cases without such indicators. We included both subsets of H/L cases from the NC-BCFR. Cases aged 18–64 years diagnosed from 1995 to 2009 were ascertained through the Greater Bay Area Cancer Registry. Population controls were identified through random-digit dialing and frequency matched on race/ethnicity and 5-year age groups to cases diagnosed from 1995 to 1998. The San Francisco Bay Area Breast Cancer Study (SFBCS)35 is a population-based multiethnic case–control study of breast cancer, where cases aged 35–79 years at diagnosis with invasive breast cancer from 1995 to 2002 were identified through the Greater Bay Area Cancer Registry and controls were identified by random-digit dialing and matched to cases on race/ethnicity and 5-year age groups. All participants were consented and enrolled into the study through center-specific institutional review board-approved protocols.
Table 1:
Cases n = 4,264 | Controls n = 4,350 | |
---|---|---|
Discovery samples, Whole-exome sequencing (Clinical Research Exome, Agilent) | n = 1,043 | n = 1,188 |
CCGCRN | 885 (84.9%) | N/A |
COH | N/A | 313 (26.3%) |
MEC | N/A | 875 (73.7%) |
UCSF | 52 (5%) | N/A |
USC | 106 (10.2%) | N/A |
Replication, targeted sequencing | n = 3,221 | n = 3,162 |
CAMA | 1,123 (34.9%) | 1,122 (35.5%) |
CPMCRI Cohort | 12 (0.4%) | 847 (26.8%) |
PATHWAYS | 412 (12.8%) | N/A |
MEC | 816 (25.3%) | 865 (27.4%) |
NC-BCFR | 696 (21.6%) | 54 (1.7%) |
SFBCS | 162 (5%) | 274 (8.7%) |
Unselected for Hereditary Risk* (%) | 2,993 (70.2%) | 3,162 (72.7%) |
Age (mean (SD)) | 52.1 (12.5) | 55.9 (11.5) |
African ancestry (mean (SD)) | 0.05 (0.07) | 0.05 (0.05) |
Indigenous American ancestry (mean (SD)) | 0.40 (0.23) | 0.43 (0.22) |
European ancestry (mean (SD)) | 0.54 (0.22) | 0.52 (0.22) |
Family history of breast cancer in a first degree relative (%) | ||
Yes | 891 (20.9%) | 324 (7.4%) |
No | 3,034 (71.2%) | 3,162 (72.7%) |
Missing | 339 (8%) | 864 (19.9%) |
Estrogen receptor status** (%) | ||
Positive | 2,094 (49.1%) | N/A |
Negative | 713 (16.7%) | N/A |
Missing | 1,457 (34.2%) | N/A |
Progesterone receptor status** (%) | ||
Positive | 1,594 (37.4%) | N/A |
Negative | 940 (22%) | N/A |
Missing | 1,730 (40.6%) | N/A |
Human epidermal growth factor receptor 2 status (%) | ||
Positive | 349 (8.2%) | N/A |
Negative | 1,285 (30.1%) | N/A |
Missing | 2,630 (61.7%) | N/A |
Selection was not based on hereditary risk in Kaiser, MEC, CPMC Cohort, the San Francisco Bay Area Cancer Study, CAMA, and for some participants in the Northern California Breast Cancer Family Registry.
Estrogen receptor and progesterone receptor status was missing in >80% of CAMA cases and in approximately 20% of other cases.
CAMA=Cancer de Mama; CCGCRN=Clinical Cancer Genomics Community Research Network; COH=City of Hope; CPMCRI-Cohort=California Pacific Medical Center - Breast Health Center; MEC=Multi-Ethnic Cohort; PR=Progesterone receptor; UCSF=University of California San Francisco; USC=University of Southern California.
Sequencing and genotyping
Whole exome sequencing (WES) from DNA of discovery participants has been described previously.19 The SureSelect Clinical Research Exome (Agilent, Santa Clara, CA) kit was used to capture exons of all known human transcripts. For participants in the replication set, targeted sequencing was conducted on 857 genes, which were selected based on the discovery data and biological plausibility. A custom SureSelect XT kit (Agilent, Santa Clara, CA) was used to capture the exons of 857 genes and also included 189 known breast cancer SNPs and 100 ancestry informative markers (Supplemental Table X). Briefly, for both the WES and the targeted sequencing, we used KAPA Hyper Preparation Kits (Kapa Biosystems, Inc., Wilmington, MA) to generate libraries from 500 ng DNA. One hundred base-pair paired end sequencing on the HiSEQ 2500 Genetic Analyzer (Illumina Inc., San Diego, CA) was performed in the COH Integrative Genomics Core (IGC) to an average fold coverage of ×65 for the WES samples and ~x75 for the targeted-sequencing samples. Paired-end reads from each sample were aligned to human reference genome (hg37) using the Burrows-Wheeler Alignment Tool (BWA, version 0.7.5a-r405) under default settings, and the aligned binary format sequence (BAM) files were sorted and indexed using SAMtools.36,37 The same FASTA reference file had been used for aligning the MEC control samples. The sorted and indexed BAMs were processed by Picard MarkDuplicates (version 1.67, http://broadinstitute.github.io/picard/) to remove duplicate sequencing reads.
Variant calling from the BAM files from the IGC and the Broad Sequencing Center were processed together at UCSF. After local realignment of reads around in-frame insertions and deletions (indels) and base quality score recalibration by The Genome Analysis Toolkit (GATK, v3.6–0-g89b7209), GATK HaplotypeCaller was used to call variants (https://software.broadinstitute.org/gatk). Variants with a call quality <20, a read depth <10, a less frequent allele depth of <4, or an allele fraction ratio <30% were filtered out for low quality. We also excluded participants with <20-fold average coverage. DNA from eight MEC participants were sequenced at both COH and the Broad Sequencing Center with >99.8% concordance for variant calling.
After sequencing, we excluded discovery cases with previously undetected BRCA1/2 pathogenic variants. We used PLINK 1.9 (http://www.cog-genomics.org/plink/1.9/)38 to exclude first-degree relatives within the discovery and replication separately and genetic duplicates across the discovery and replication. Due to differences in participant inclusion criteria among studies in our discovery sample, we conducted a control-control analysis and excluded variants that were different between our two control populations at an alpha threshold of 0.05. We also conducted a sensitivity analysis among participants of the CCGCRN/COH only, and excluded variants where the sensitivity analysis point estimate was outside of the 95% confidence interval of the main analysis.
Ancestry estimation
Ancestry estimation for our discovery population has been described previously.19 The Clinical Research Exome included a custom panel of 180 ancestry-informative single nucleotide polymorphisms. On the basis of a previous publication,39 these markers were selected to be informative for ancestry in mixed-European, Indigenous-American, and African populations. In addition, we selected 7,691 variants common to our WES data and a data set of Axiom arrays, including African (N = 90), European (N= 90), and Indigenous American (N = 51) populations. We selected unlinked markers by linkage disequilibrium pruning in PLINK, identifying a subset of 4,544 variants for ancestry estimation. We estimated genetic ancestry using ADMIXTURE 1.340 and performed analyses with both supervised (specifying the ancestral populations) and unsupervised (including the data from ancestral populations, but not specifying the identity of ancestral populations) runs. To determine genetic ancestry in the MEC control participants, we used the same ancestral reference samples and the same approach using ADMIXTURE. However, since the exome sequencing data did not include ancestry informative markers, we selected a subset of independent variants (n = 12,758) that overlapped between the Axiom arrays and the MEC dataset.
In our replication sample, we performed genetic ancestry estimation for each individual. We used 90 European (1000 Genomes), 90 African (1000 Genomes), 90 east Asian (1000 Genomes) and 71 Indigenous American ancestry41 reference samples. We combined our study data and the reference data using 1,195 SNPs that overlapped across all datasets. We then used ADMIXTURE 1.3 to estimate the ancestry for each individual using the supervised method.
Statistical analysis
Gene-based aggregate rare variant analyses were based on loss of function (LoF) variants, including frameshift, stopgain, and splice variants. Variants with minor allele frequency > 0.0025 and benign clinical significance in CLINVAR42 were excluded from the gene-based analyses. Statistical significance for each gene was determined using SKAT-O.43,44 Odds ratios (OR) and 95% confidence intervals (CI) for breast cancer associated with any LoF variant for each gene were calculated using logistic regression models in which women with at least one LoF variant in a gene were encoded as “1” and all other women were encoded as “0”. These models included ancestry (European and Indigenous American) as covariates. These models constituted our burden tests. For additional quality control, results from SKAT-O and burden tests were included if the gene had > 5 variants or at least one alternate allele in both cases and controls. We used the replication sample for all BRCA1/2 analyses, as BRCA1/2 status was a selection criterion for the discovery sample. Additionally, we looked for the known Mexican founder BRCA1 exon 9–12 deletion, and verified its presence using manual review with the Integrative Genomics Viewer (IVG), then included this mutation in the exome data. Single variant analyses and aggregate rare variant analyses were conducted in discovery and replication analyses separately, as well as in a joint discovery and replication analysis. Additional breast cancer subtype-specific analyses were conducted, where cases were separately restricted to ER-positive and ER-negative disease. Sensitivity analyses were run separately in participants from hereditary breast cancer studies and studies that did not select cases based on breast cancer risk factors. All analyses were adjusted for European and Indigenous American ancestry.
Results
The mean age at diagnosis of breast cancer cases was 42.6 years (standard deviation [SD]: 8.5) and the mean age at enrollment of controls was 60.3 years (SD: 10.9) in the discovery set (P=2.4×10−200) (Table 1). In the replication set, the mean age at diagnosis of breast cancer was 55.3 years (SD: 12.0) and the mean age of controls at enrollment was 54.9 years (SD: 11.4, P=0.27). European ancestry (EA) proportion was slightly higher among cases (P=6.0×10−6) and Indigenous American ancestry (IA) was slightly higher among controls (P=4.4×10−7). Ancestry proportions for cases and controls are shown in Supplementary Figure 1. History of a first-degree relative with breast cancer was higher among cases than controls (P=2.5×10−108). Most of the cases were ER-positive and progesterone receptor (PR)-positive and 21.3% of those who were tested were human epidermal growth factor receptor 2 (HER2)-positive.
We conducted joint analyses of the discovery and replication datasets. In the analysis that tested for overall breast cancer risk, we found significant associations with BRCA1, BRCA2 and PALB2, with ORs (95% CI) of 24.9 (6.1–102.5), 7.0 (3.5–14.0), and 6.5 (3.2–13.1), respectively. In the analyses for ER-negative and ER-positive breast cancers, we found significant associations with ER-negative breast cancer for LoF variants in FANCM, BRCA1, and BRCA2 with ORs of 6.7 (2.9–15.6), 40.7 (8.9–186.5), and 10.5 (4.5–24.7), respectively (Table 2 and Figure 1). No exome-wide significant associations were found with ER-positive breast cancer. Three other known breast cancer genes had suggestive associations with breast cancer in our study sample: CHEK2 with overall and ER-positive breast cancer, RAD51D with ER-negative breast cancer, and TP53 with ER-negative breast cancer (Supplementary Table 1 and Figure 1). We found no significant associations of ATM, BARD1, CDH1, RAD51C, or RECQL and breast cancer risk. We ran a sensitivity analysis for ATM using truncating variants only, due to the unexpectedly null results and large number of splice variants found in this gene, and also found no association with breast cancer (OR=1.2 [0.8–1.9]). All genes with suggestive (P<0.01) associations for overall, ER-positive, and ER-negative breast cancer are shown in Supplementary Tables 2 and 3.
Table 2:
Gene | Chr | Overall | ER-Positive | ER-Negative |
---|---|---|---|---|
BRCA1 * | 17 | 2.3×10−10 | 0.03 | 4.4×10−16 |
BRCA2 * | 13 | 8.4×10−10 | 3.3×10−4 | 8.0×10−15 |
FANCM | 14 | 9.8×10−3 | 0.11 | 4.1×10−7 |
PALB2 | 16 | 1.8×10−8 | 1.3×10−5 | 5.9×10−5 |
Chr=Chromosome; ER=estrogen receptor.
P-values are from gene-based SKAT-O analyses. Bold P-values indicate Bonferroni corrected statistical significance at an alpha threshold of 0.05/20,000=2.5×10−6.
Discovery participants were selected for being BRCA1/2 negative (see methods), replication results are presented for BRCA1/2.
The association between FANCM LoF variants and ER- breast cancer was largely driven by two stopgain variants, rs147021911 (chr14:45658326C>T) and rs144567652 (chr14:45667921C>A / C>T) (Figure 2). A total of 1.7% of participants with ER-negative disease had a LoF variant in FANCM, compared to only 0.6% of cases with ER-positive breast cancer and 0.2% of controls (Figure 2). More than half of participants had a missense variant in FANCM and proportions were similar among participant groups (data not shown).
Since ascertainment can affect effect sizes and our discovery dataset and some of our replication datasets were selected based on several criteria, we analyzed the top genes among participants in the replication set who were from studies of cases unselected for hereditary risk (Supplementary Figure 2). The effect sizes for CHEK2 and PALB2 were similar in all studies and in unselected studies. The OR for CHEK2 and ER-positive breast cancer in unselected studies was 10.7, which was similar to the OR of 10.8 found in all studies. We found that the association between FANCM and ER-negative breast cancer was similar in studies of unselected cases (OR=5.6) and in all studies (OR=6.7).
Since many of the known genes for breast cancer are involved in repair of double-stranded breaks, we reviewed all of the suggestive associations in the combined discovery and validation analyses and identified genes which are members of this pathway using a previously curated list.45 In analyses that included both LoF variants and missense variants with high likelihood of being deleterious, we found suggestive evidence for ATR and FANCG (Supplementary Tables 4 and 5). Of these, ATR was more strongly associated with ER-positive disease, while FANCG was more strongly associated with ER-negative breast cancer (Supplementary Table 4 and 5).
Discussion
In this case-control study of breast cancer in over 8,500 H/L women, we found strong evidence of association between breast cancer and LoF variants in FANCM largely driven by two FANCM stop gain variants. In addition we saw strong associations with the known breast cancer genes BRCA1, BRCA2, and PALB2.
The two most common FANCM variants we observed are more common in European populations than other world populations. These variants were both first identified in European ancestry BRCA1/2-negative familial breast cancer studies,46,47 and both have previously been associated with ER-negative or triple negative breast cancer.46,48,49 Additionally, rs147021911 (chr14:45658326C>T, c.5101 C >T, p.Gln1701*) was associated with poor breast cancer survival in a Finnish population,50 and other FANCM LoF mutations were found in small breast cancer case series,51–55 including a study that showed bi-allelic FANCM variants in early onset or bilateral breast cancer cases.56 To our knowledge, our study is the first to find an exome-wide significant association between FANCM and ER-negative breast cancer in any population. Since we used an exome-wide association approach for discovery, the effect size we detected may be inflated by the winner’s curse.57 However, even if the effect size in our study is inflated due to these factors, the higher impact on ER-negative disease is consistent with prior studies and suggests that testing for FANCM LoF variants can help identify women at increased risk for developing ER-negative breast cancer.
Similar to most other breast cancer susceptibility genes, FANCM is involved in double-stranded DNA repair. In particular, FANCM localizes to stalled replication forks and initiates the response to double-stranded breaks by homologous recombination repair.58 FANCM is a member of the Fanconi Anemia (FA) complex and is the most conserved gene within the FA pathway,59 although it has recently been excluded as a gene predisposing for FA.56 The FA pathway is essential for handling DNA interstrand cross linking damage in DNA repair through the homologous recombination repair pathway.60 A stronger association between FANCM and ER-negative disease than ER-positive disease is similar to results for RAD51C and RAD51D,25 which are also key components of the FA pathway.
FANCM previously has been proposed as a susceptibility gene for development of ER-negative breast cancer although without genome-wide significance,25 but is not routinely tested in clinical practice.46 Given our genome-wide significant result and the previous associations, the data now support inclusion of FANCM on clinical testing panels. Women with breast cancer and FANCM LoF variants may benefit from poly adenosine diphosphate-ribose polymerase (PARP) inhibitors;49,61,62 therefore, testing for FANCM LoF variants may identify additional therapeutic options for women with ER-negative breast cancer.
We found that carriers of LoF variants in CHEK2 had a particularly high risk of ER-positive disease, though this did not reach exome-wide significance. The association with ER-positive disease among LoF variant carriers of CHEK2 is consistent with prior studies;25,26,63 however, the effect sizes we saw in our study sample were substantially higher than those in previous reports.25 In analyses that included only cases that were not selected for familial breast cancer, the confidence intervals were wide but the lower bound of the confidence interval (2.1) included the OR found in the Breast Cancer Association Consortium study,25 indicating that our larger odds ratio may be due to chance. The higher OR we observed may also be due to the younger ages of the Latino population and/or to heterogeneity of genetic or hormonal and environmental risk factors across populations. Latinas tend to have more protective non-genetic risk factors such as younger age at first pregnancy, higher parity,64 lower postmenopausal hormone use, and lower alcohol consumption,65 when compared to US non-Hispanic White populations66 which may lead to stronger associations with genetic risk factors.67 In contrast to the strong association we observed with CHEK2, we found no evidence for an association with LoF variants in ATM in our study. The upper bound of the confidence interval for ER-positive disease excludes the reported odds ratio in the Breast Cancer Association Consortium study25 but not the odds ratio reported in the CARRIERS study.26 We have previously noted no significant association with ATM LoF variants in Latinas19 in a dataset that partially overlaps our current report. These results may be due to genetic or hormonal and environmental factors that attenuate the associations with ATM LoF variants in this population.
Our study, conducted among H/L women, identified strong associations with breast cancer risk genes, including the first exome-wide significant association between FANCM and ER-negative disease, with a notably smaller sample size than either of the recent rare exome variant association studies conducted in Europeans,25,26 thus demonstrating the importance of conducting genetic studies in admixed populations. Most large previous complex trait genetics studies, including those on breast cancer, have been conducted in European-ancestry participants.68,69 The lack of genetic studies in admixed populations exacerbates health disparities as genomics is increasingly used in clinical practice.
Our study has several strengths. To our knowledge, it is the largest exome sequencing analysis of breast cancer cases and controls in H/L women to date. We included participants from both hereditary-risk studies and unselected cases and controls, and compared results across these two study types, making our findings applicable to both groups. Our study also has several limitations. The WES for discovery was performed in only 1,043 cases and 1,188 controls and was likely underpowered to find intermediate-penetrance genes. To compensate for this, we selected 857 genes in the replication phase and sequenced these in 3,221 cases and 3,162 controls to enhance the likelihood of detecting associations. Larger studies in H/L populations are needed to confirm our results, to identify new candidate genes for breast cancer and to detect and understand factors contributing to heterogeneity of effect sizes compared to studies in US non-Hispanic White and European populations. Our discovery phase included sequencing data previously collected from a subset of controls in the Multi-Ethnic Cohort using a slightly different WES target capture. This difference potentially could lead to spurious associations due to technical or demographic differences between the different capture kits in the analysis of the discovery set. However, we accounted for potential demographic differences by adjusting for ancestry and excluding variants that were significantly different in an analysis of our two discovery control populations.
In conclusion, our study demonstrates an exome-wide significant association between LoF variants in FANCM and ER-negative breast cancer in H/L women from multiple studies. We also found exome-wide significant associations for BRCA1, BRCA2, and PALB2. Our findings suggest that FANCM should be added to genetic testing panels for breast cancer, which is especially important for H/L women. Additionally, our findings demonstrate the importance of conducting genetic studies in admixed populations.
Supplementary Material
Acknowledgements
We thank the participants of the SFBCS/NC-BCFR, Pathways, MEC, CAMA, and COH-CCGCRN studies.
This work was funded by the National Cancer Institute (R01CA184585, K24CA169004) and the State of California Initiative to Advance Precision Medicine (OPR18111). Research reported in this publication included work performed in the City of Hope Integrative Genomics Core supported by the National Cancer Institute of the National Institutes of Health under grant number P30CA033572. The content and views are solely the responsibility of the authors and should not be construed to represent the views of the National Institutes of Health. SLN and this research were partially funded by the Morris and Horowitz Families Professorship. The Multiethnic Cohort was supported by NIH grants R01CA164973, R01CA054281, and CA063464. Sequencing of the MEC controls was conducted as part of the Slim Initiative for Genomic Medicine, a project funded by the Carlos Slim Health Institute in Mexico. Seventy-six of the controls were from the California Teachers Study; the collection and data were funded through the National Institutes of Health (R01CA77398). The Northern California Breast Cancer Family Registry was supported by grant UM1 CA164920 from the National Cancer Institute. The SFBCS was funded by grants R01CA063446 and R01CA077305 from the National Cancer Institute, grant DAMD17–96–1–6071 from the US Department of Defense, and grant 7PB-0068 from the California Breast Cancer Research Program. The Kaiser Permanente Research Program on Genes, Environment and Health was supported by Kaiser Permanente national and regional Community Benefit programs, and grants from the Ellison Medical Foundation, the Wayne and Gladys Valley Foundation, and the Robert Wood Johnson Foundation. The PATHWAYS cohort was supported by grants from the National Cancer Institute (U01 CA195565 R01 CA105274) The CAMA Study was funded by Consejo Nacional de Ciencia y Tecnologıa, Mexico (SALUD-2002-C01–7462) and recruitment at the Puebla site was supported by R01CA120120.
LF is supported by R01CA204797. JNW was supported by NIH RC4 CA153828; Breast Cancer Research Foundation (#20–172), and American Society of Clinical Oncology Conquer Cancer® Research Professorship in Breast Cancer Disparities. YS was supported by the National Center for Advancing Translational Sciences under award KL2TR001870 and the National Cancer Institute under award K08CA237829.
Footnotes
Conflicts of Interest
None.
Sources
- 1.Harbeck N. & Gnant M. Breast cancer. The Lancet 389, 1134–1150 (2017). [DOI] [PubMed] [Google Scholar]
- 2.Garcia-Closas M., Gunsoy N. B. & Chatterjee N. Combined Associations of Genetic and Environmental Risk Factors: Implications for Prevention of Breast Cancer. JNCI J. Natl. Cancer Inst. 106, dju305–dju305 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Durham D. D. et al. Breast cancer incidence among women with a family history of breast cancer by relative’s age at diagnosis. Cancer (2022) doi: 10.1002/cncr.34365. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Pharoah P. D., Day N. E., Duffy S., Easton D. F. & Ponder B. A. Family history and the risk of breast cancer: a systematic review and meta-analysis. Int. J. Cancer 71, 800–809 (1997). [DOI] [PubMed] [Google Scholar]
- 5.Miki Y. et al. A strong candidate for the breast and ovarian cancer susceptibility gene BRCA1. Science 266, 66–71 (1994). [DOI] [PubMed] [Google Scholar]
- 6.Wooster R. et al. Identification of the breast cancer susceptibility gene BRCA2. Nature 378, 789–792 (1995). [DOI] [PubMed] [Google Scholar]
- 7.Rahman N. et al. PALB2, which encodes a BRCA2-interacting protein, is a breast cancer susceptibility gene. Nat. Genet. 39, 165–167 (2007). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Malkin D. et al. Germ line p53 mutations in a familial syndrome of breast cancer, sarcomas, and other neoplasms. Science 250, 1233–1238 (1990). [DOI] [PubMed] [Google Scholar]
- 9.Srivastava S., Zou Z. Q., Pirollo K., Blattner W. & Chang E. H. Germ-line transmission of a mutated p53 gene in a cancer-prone family with Li-Fraumeni syndrome. Nature 348, 747–749 (1990). [DOI] [PubMed] [Google Scholar]
- 10.Meijers-Heijboer H. et al. Low-penetrance susceptibility to breast cancer due to CHEK2(*)1100delC in noncarriers of BRCA1 or BRCA2 mutations. Nat. Genet. 31, 55–59 (2002). [DOI] [PubMed] [Google Scholar]
- 11.Thompson D. et al. Cancer risks and mortality in heterozygous ATM mutation carriers. J. Natl. Cancer Inst. 97, 813–822 (2005). [DOI] [PubMed] [Google Scholar]
- 12.Skol A. D., Sasaki M. M. & Onel K. The genetics of breast cancer risk in the post-genome era: Thoughts on study design to move past BRCA and towards clinical relevance. Breast Cancer Res. 18, 99 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Miller K. D. et al. Cancer Statistics for Hispanics/Latinos, 2018. CA. Cancer J. Clin. 68, 425–445 (2018). [DOI] [PubMed] [Google Scholar]
- 14.Bryc K. et al. Colloquium paper: genome-wide patterns of population structure and admixture among Hispanic/Latino populations. Proc. Natl. Acad. Sci. U. S. A. 107 Suppl 2, 8954–8961 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.González Burchard E. et al. Latino populations: a unique opportunity for the study of race, genetics, and social environment in epidemiological research. Am. J. Public Health 95, 2161–2168 (2005). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Fejerman L. et al. Genome-wide association study of breast cancer in Latinas identifies novel protective variants on 6q25. Nat. Commun. 5, 1–8 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Herzog J. S. et al. Genetic epidemiology of BRCA1- and BRCA2-associated cancer across Latin America. NPJ Breast Cancer 7, 107 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Weitzel J. N. et al. Pathogenic and likely pathogenic variants in PALB2, CHEK2, and other known breast cancer susceptibility genes among 1054 BRCA-negative Hispanics with breast cancer. Cancer 125, 2829–2836 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Daly M. B. et al. Genetic/Familial High-Risk Assessment: Breast, Ovarian, and Pancreatic, Version 2.2021, NCCN Clinical Practice Guidelines in Oncology. J. Natl. Compr. Cancer Netw. JNCCN 19, 77–102 (2021). [DOI] [PubMed] [Google Scholar]
- 20.US Preventive Services Task Force et al. Risk Assessment, Genetic Counseling, and Genetic Testing for BRCA-Related Cancer: US Preventive Services Task Force Recommendation Statement. JAMA 322, 652–665 (2019). [DOI] [PubMed] [Google Scholar]
- 21.Easton D. F. et al. Gene-panel sequencing and the prediction of breast-cancer risk. N. Engl. J. Med. 372, 2243–2257 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.King M.-C., Levy-Lahad E. & Lahad A. Population-based screening for BRCA1 and BRCA2: 2014 Lasker Award. JAMA 312, 1091–1092 (2014). [DOI] [PubMed] [Google Scholar]
- 23.Dorling L. et al. Breast Cancer Risk Genes — Association Analysis in More than 113,000 Women. N. Engl. J. Med. 384, 428–439 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Hu C. et al. A Population-Based Study of Genes Previously Implicated in Breast Cancer. N. Engl. J. Med. 384, 440–451 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Weitzel J. N. et al. Prevalence and Type of BRCA Mutations in Hispanics Undergoing Genetic Cancer Risk Assessment in the Southwestern United States: A Report From the Clinical Cancer Genetics Community Research Network. J. Clin. Oncol. 31, 210–216 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.MacDonald D. J., Blazer K. R. & Weitzel J. N. Extending comprehensive cancer center expertise in clinical cancer genetics and genomics to diverse communities: the power of partnership. J. Natl. Compr. Cancer Netw. JNCCN 8, 615–624 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Kolonel L. N. et al. A multiethnic cohort in Hawaii and Los Angeles: baseline characteristics. Am. J. Epidemiol. 151, 346–357 (2000). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Angeles-Llerenas A. et al. Moderate physical activity and breast cancer risk: the effect of menopausal status. Cancer Causes Control CCC 21, 577–586 (2010). [DOI] [PubMed] [Google Scholar]
- 29.Beasley J. M. et al. Alcohol and risk of breast cancer in Mexican women. Cancer Causes Control CCC 21, 863–870 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Shieh Y. et al. Breast cancer risk prediction using a clinical risk model and polygenic risk score. Breast Cancer Res. Treat. 159, 513–525 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.John E. M., Sangaramoorthy M., Koo J., Whittemore A. S. & West D. W. Enrollment and biospecimen collection in a multiethnic family cohort: the Northern California site of the Breast Cancer Family Registry. Cancer Causes Control CCC 30, 395–408 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.John E. M., Phipps A. I., Davis A. & Koo J. Migration history, acculturation, and breast cancer risk in Hispanic women. Cancer Epidemiol. Biomark. Prev. Publ. Am. Assoc. Cancer Res. Cosponsored Am. Soc. Prev. Oncol. 14, 2905–2913 (2005). [DOI] [PubMed] [Google Scholar]
- 33.Li H. & Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinforma. Oxf. Engl. 25, 1754–1760 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Li H. et al. The Sequence Alignment/Map format and SAMtools. Bioinforma. Oxf. Engl. 25, 2078–2079 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Purcell S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81, 559–575 (2007). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Galanter J. M. et al. Development of a panel of genome-wide ancestry informative markers to study admixture throughout the Americas. PLoS Genet. 8, e1002554 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Alexander D. H., Novembre J. & Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Spear M. L. et al. A genome-wide association and admixture mapping study of bronchodilator drug response in African Americans with asthma. Pharmacogenomics J. 19, 249–259 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Landrum M. J. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Lee S., Wu M. C. & Lin X. Optimal tests for rare variant effects in sequencing association studies. Biostat. Oxf. Engl. 13, 762–775 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Wu M. C. et al. Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. 89, 82–93 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Knijnenburg T. A. et al. Genomic and Molecular Landscape of DNA Damage Repair Deficiency across The Cancer Genome Atlas. Cell Rep. 23, 239–254.e6 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Kiiski J. I. et al. Exome sequencing identifies FANCM as a susceptibility gene for triple-negative breast cancer. Proc. Natl. Acad. Sci. U. S. A. 111, 15172–15177 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Peterlongo P. et al. FANCM c.5791C>T nonsense mutation (rs144567652) induces exon skipping, affects DNA repair activity and is a familial breast cancer risk factor. Hum. Mol. Genet. 24, 5345–5355 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Kiiski J. I. et al. FANCM mutation c.5791C>T is a risk factor for triple-negative breast cancer in the Finnish population. Breast Cancer Res. Treat. 166, 217–226 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Figlioli G. et al. The FANCM:p.Arg658* truncating variant is associated with risk of triple-negative breast cancer. NPJ Breast Cancer 5, 38 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Kiiski J. I. et al. FANCM c.5101C>T mutation associates with breast cancer survival and treatment outcome. Int. J. Cancer 139, 2760–2770 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Rashid M. U., Muhammad N., Khan F. A. & Hamann U. Absence of the FANCM c.5101C>T mutation in BRCA1/2-negative triple-negative breast cancer patients from Pakistan. Breast Cancer Res. Treat. 152, 229–230 (2015). [DOI] [PubMed] [Google Scholar]
- 49.Nguyen-Dumont T. et al. FANCM and RECQL genetic variants and breast cancer susceptibility: relevance to South Poland and West Ukraine. BMC Med. Genet. 19, 12 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Neidhardt G. et al. Association Between Loss-of-Function Mutations Within the FANCM Gene and Early-Onset Familial Breast Cancer. JAMA Oncol. 3, 1245–1248 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Schubert S. et al. The identification of pathogenic variants in BRCA1/2 negative, high risk, hereditary breast and/or ovarian cancer patients: High frequency of FANCM pathogenic variants. Int. J. Cancer 144, 2683–2694 (2019). [DOI] [PubMed] [Google Scholar]
- 52.Silvestri V. et al. A possible role of FANCM mutations in male breast cancer susceptibility: Results from a multicenter study in Italy. Breast Edinb. Scotl. 38, 92–97 (2018). [DOI] [PubMed] [Google Scholar]
- 53.Catucci I. et al. Individuals with FANCM biallelic mutations do not develop Fanconi anemia, but show risk for breast cancer, chemotherapy toxicity and may display chromosome fragility. Genet. Med. Off. J. Am. Coll. Med. Genet. 20, 452–457 (2018). [DOI] [PubMed] [Google Scholar]
- 54.Lohmueller K. E., Pearce C. L., Pike M., Lander E. S. & Hirschhorn J. N. Meta-analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease. Nat. Genet. 33, 177–182 (2003). [DOI] [PubMed] [Google Scholar]
- 55.Panday A. et al. FANCM regulates repair pathway choice at stalled replication forks. Mol. Cell 81, 2428–2444.e6 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Blackford A. N. et al. The DNA translocase activity of FANCM protects stalled replication forks. Hum. Mol. Genet. 21, 2005–2016 (2012). [DOI] [PubMed] [Google Scholar]
- 57.Michl J., Zimmer J. & Tarsounas M. Interplay between Fanconi anemia and homologous recombination pathways in genome integrity. EMBO J. 35, 909–923 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Stoepker C. et al. DNA helicases FANCM and DDX11 are determinants of PARP inhibitor sensitivity. DNA Repair 26, 54–64 (2015). [DOI] [PubMed] [Google Scholar]
- 59.Cybulski C. et al. Estrogen receptor status in CHEK2-positive breast cancers: implications for chemoprevention. Clin. Genet. 75, 72–78 (2009). [DOI] [PubMed] [Google Scholar]
- 60.Gilliland F. D. et al. Reproductive risk factors for breast cancer in Hispanic and non-Hispanic white women: the New Mexico Women’s Health Study. Am. J. Epidemiol. 148, 683–692 (1998). [DOI] [PubMed] [Google Scholar]
- 61.Pérez-Stable E. J., Marín G. & Marín B. V. Behavioral risk factors: a comparison of Latinos and non-Latino whites in San Francisco. Am. J. Public Health 84, 971–976 (1994). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Pike M. C. et al. Breast cancer in a multiethnic cohort in Hawaii and Los Angeles: risk factor-adjusted incidence in Japanese equals and in Hawaiians exceeds that in whites. Cancer Epidemiol. Biomark. Prev. Publ. Am. Assoc. Cancer Res. Cosponsored Am. Soc. Prev. Oncol. 11, 795–800 (2002). [PubMed] [Google Scholar]
- 63.Mavaddat N. et al. Prediction of breast cancer risk based on profiling with common genetic variants. J. Natl. Cancer Inst. 107, djv036 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Martin A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Martin A. R. et al. Human Demographic History Impacts Genetic Risk Prediction across Diverse Populations. Am. J. Hum. Genet. 100, 635–649 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.