Abstract
Many risk factors have been identified for breast cancer. The potential causality for some of them remains uncertain, and few studies have comprehensively investigated these associations by molecular subtypes. We performed a two-sample Mendelian randomization (MR) study to evaluate potential causal associations of 23 known and suspected risk factors and biomarkers with breast cancer risk overall and by molecular subtypes using data from the Breast Cancer Association Consortium. The inverse-variance weighted method was used to estimate odds ratios (ORs) and 95% confidence intervals (CIs) for the association of each trait with breast cancer risk. Significant associations with breast cancer risk were found for 15 traits, including age at menarche, age at menopause, body mass index, waist-to-hip ratio, height, physical activity, cigarette smoking, sleep duration, and morning-preference chronotype, and six blood biomarkers (estrogens, insulin-like growth factor-1, sex hormone-binding globulin [SHBG], telomere length, HDL-cholesterol, and fasting insulin). Notably, an increased circulating SHBG level was associated with a reduced risk of estrogen receptor (ER)-positive cancer (OR=0.83, 95% CI: 0.73–0.94), but an elevated risk of ER-negative (OR=1.12, 95% CI: 0.93–1.36) and triple-negative cancer (OR=1.19, 95% CI: 0.92–1.54) (P for heterogeneity=0.01). Fasting insulin was most strongly associated with an increased risk of HER2-negative cancer (OR=1.94, 95% CI: 1.18–3.20), but a reduced risk of HER2-enriched cancer (OR=0.46, 95% CI: 0.26–0.81) (P for heterogeneity=0.006). Results from sensitivity analyses using MR-Egger and MR-PRESSO were generally consistent. This study provides strong evidence supporting potential causal associations of several risk factors with breast cancer and suggests potential heterogeneous associations of SHBG and fasting insulin levels with subtypes of breast cancer.
Keywords: Breast cancer, risk factors, epidemiology, molecular subtypes, Mendelian randomization, causal inference
Introduction
Breast cancer is the most common cancer among women worldwide1. Previous epidemiological studies provided evidence for associations of breast cancer risk with reproductive factors (such as early age at menarche and late age at menopause)2, lifestyle-related factors (such as physical inactivity, cigarette smoking, and alcohol drinking)3, 4, and several biomarkers (such as estrogens and insulin-like growth factor 1 [IGF-1])5, 6. The causality for many of these associations has not yet been established, as most of the evidence was derived from observational studies which often suffer from selection, recall, confounding, and reverse causation biases, as well as measurement errors.
Mendelian randomization (MR) analyses have been increasingly used in epidemiologic studies for causal inference. MR analyses use genetic variants as instrumental variables (IVs) to estimate the exposure of interest for association analyses of disease outcomes. Since genetic alleles are randomly assorted during gametogenesis, MR analyses are analogous to randomized clinical trials, thereby minimizing or even eliminating the likelihood of confounding and reverse causality biases commonly encountered in conventional epidemiologic studies7. MR analyses can provide strong evidence for causal inference when three assumptions are met: 1) the genetic variants are associated with the exposure of interest; 2) the genetic variants are not associated with any of the confounders of the exposure-outcome relationship; 3) the genetic variants affect the outcome only through the exposure (no pleiotropic effect)8. In recent years, we and others have conducted several MR studies to evaluate the association of several risk factors with breast cancer risk9–17. However, many other risk factors remain to be evaluated in MR studies. To date, no study has been conducted to evaluate large numbers of known and suspected risk factors for breast cancer. Evaluating these risk factors in a single study using the same methodology may help to compare the strength of the association and provide a comprehensive understanding of breast cancer etiology. Moreover, virtually all previous studies only assessed the associations of these factors with overall breast cancer risk or by estrogen receptor (ER) status. Understanding associations of these risk factors by breast cancer intrinsic subtype could provide additional insights into the etiology and biology of this cancer. Therefore, in this large MR study, we comprehensively assessed associations of 23 known and suspected risk factors with breast cancer risk and further assessed these associations by ER status and intrinsic subtypes.
Methods
Instrumental variables for breast cancer risk factors
After reviewing the literature about the epidemiology of breast cancer and genetic studies of breast cancer risk factors, we selected 23 known and suspected risk factors of breast cancer for the current study, including reproductive factors (age at menarche, age at menopause), body weight and other anthropometric measures (body mass index [BMI], BMI-adjusted waist-to-hip ratio [WHR adj BMI], weight at birth, height, mammographic dense area, and breast size), lifestyle factors (physical activity, alcohol consumption, smoking behavior, sleep duration, and chronotype), and blood biomarkers (estrogens, progesterone, sex hormone binding globulin [SHBG], insulin-like growth factor 1 [IGF-1], leukocyte telomere length, high density lipoprotein-cholesterol [HDL-C], low density lipoprotein-cholesterol [LDL-C], triglycerides, fasting insulin, and 2-hour glucose). We used the lifetime smoking index to measure smoking behavior, as this index is a composite score that captures smoking duration, intensity, and cessation18. We searched the National Human Genome Research Institute-European Bioinformatics Institute Catalog of human GWAS and the literature available up to June 2021 to identify genetic variants associated with these traits9, 12, 16, 18–34. Effect sizes estimated for genetic variants that met the genome-wide significance level (P < 5×10^−8) were extracted from the largest GWAS (listed in Table S1) conducted to date in European-ancestry populations and used for MR analyses in this study. To select independent IVs in each locus, we excluded all genetic variants that were correlated with the most significant variant at a linkage disequilibrium level of R² ≥ 0.1 among European-ancestry populations based on data from the 1000 Genomes Project. The variance explained by the selected variants ranged from 0.2% for physical activity to 41.7% for height, and the F-statistics of the instruments ranged from 14.3 to 177.0 (Table 1).
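The selection of independent instruments described above can be illustrated with a minimal sketch of greedy LD pruning. This is not the authors' pipeline: the variant IDs, P values, and the `toy_r2` lookup table are hypothetical, and in practice the pairwise R² values would come from a reference panel such as the 1000 Genomes Project.

```python
# Greedy LD pruning sketch (illustrative only; all data below are hypothetical).

GENOME_WIDE_P = 5e-8   # genome-wide significance level stated in the text
R2_THRESHOLD = 0.1     # LD threshold stated in the text


def select_independent_ivs(variants, r2, p_cutoff=GENOME_WIDE_P, r2_cutoff=R2_THRESHOLD):
    """Keep genome-wide significant variants that are not in LD with a more
    significant variant already kept.

    variants: list of (variant_id, p_value) tuples for one trait.
    r2: function (id_a, id_b) -> pairwise LD R^2 between two variants.
    """
    kept = []
    significant = [v for v in variants if v[1] < p_cutoff]
    # Visit variants from most to least significant.
    for vid, p in sorted(significant, key=lambda v: v[1]):
        if all(r2(vid, kept_id) < r2_cutoff for kept_id, _ in kept):
            kept.append((vid, p))
    return kept


# Hypothetical pairwise LD table; unlisted pairs are treated as uncorrelated.
toy_r2 = {
    frozenset(("rsA", "rsB")): 0.80,
    frozenset(("rsA", "rsC")): 0.02,
    frozenset(("rsB", "rsC")): 0.05,
}


def lookup_r2(a, b):
    return toy_r2.get(frozenset((a, b)), 0.0)


toy_variants = [("rsA", 1e-12), ("rsB", 3e-9), ("rsC", 2e-10), ("rsD", 1e-5)]
selected = select_independent_ivs(toy_variants, lookup_r2)
print([vid for vid, _ in selected])
```

In this toy example rsB is dropped because it is correlated with the more significant rsA, and rsD is dropped because it does not reach genome-wide significance.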
The proportion of variance explained (PVE, %) and the F-statistic for each trait were calculated with the following formulae:

$$\mathrm{PVE} = \sum_{i=1}^{K} \frac{\beta_i^{2}}{\beta_i^{2} + N \times se_i^{2}}, \qquad F\text{-statistic} = \frac{\mathrm{PVE} \times (N - K - 1)}{(1 - \mathrm{PVE}) \times K},$$

where $\beta_i$ and $se_i$ are the beta coefficient and standard error of the $i$th SNP for the trait from the published GWAS, respectively; $N$ represents the sample size; and $K$ is the number of SNPs used in the instrument. Detailed information on the genetic instruments of these traits is presented in Table S1.
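As a worked illustration of these two quantities, the sketch below assumes the commonly used forms PVE = Σ β_i² / (β_i² + N × se_i²) and F = PVE × (N − K − 1) / ((1 − PVE) × K). The SNP effects, standard errors, and sample size are made-up numbers, not values from Table S1.

```python
# Instrument strength sketch (hypothetical inputs, assumed standard formulae).

def proportion_variance_explained(betas, ses, n):
    """Sum of per-SNP variance explained, as a fraction between 0 and 1."""
    return sum(b ** 2 / (b ** 2 + n * se ** 2) for b, se in zip(betas, ses))


def f_statistic(pve, n, k):
    """F-statistic of an instrument built from k SNPs in a GWAS of size n."""
    return pve * (n - k - 1) / ((1 - pve) * k)


# Two hypothetical SNPs from a hypothetical GWAS of 10,000 participants.
toy_betas = [0.05, 0.03]
toy_ses = [0.005, 0.005]
toy_n = 10_000

toy_pve = proportion_variance_explained(toy_betas, toy_ses, toy_n)
toy_f = f_statistic(toy_pve, toy_n, len(toy_betas))
print(f"PVE = {100 * toy_pve:.2f}%, F = {toy_f:.1f}")
```

An F-statistic above 10 is the conventional rule of thumb for an adequately strong instrument.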
Table 1.
Associations of 23 known and suspected risk factors and biomarkers with overall breast cancer risk a: Results from Mendelian randomization analyses
Risk factors b | Number of SNPs | Variance explained (%) | F-statistic | IVW OR (95% CI) | IVW P | MR-PRESSO OR (95% CI) | MR-PRESSO P | P pleiotropy |
---|---|---|---|---|---|---|---|---|
Reproductive factors | ||||||||
Age at menopause (years) | 53 | 5.5 | 75.5 | 1.09 (1.06–1.13) | 2.71×10^−7 | 1.11 (1.08–1.14) | 2.53×10^−10 | 0.232 |
Age at menarche (years) | 381 | 7.2 | 67.4 | 0.85 (0.75–0.96) | 0.010 | 0.94 (0.86–1.03) | 0.220 | 0.890 |
Body weight and other measurements | ||||||||
BMI (kg/m²) | 856 | 8.3 | 72.4 | 0.96 (0.92–0.99) | 0.023 | 0.97 (0.94–1.00) | 0.069 | 0.880 |
WHR adj BMI | 440 | 4.3 | 71.1 | 0.92 (0.87–0.97) | 0.002 | 0.91 (0.87–0.95) | 2.85×10^−5 | 0.052 |
Weight at Birth (g) | 57 | 1.7 | 47.3 | 0.90 (0.79–1.03) | 0.145 | 0.93 (0.84–1.02) | 0.124 | 0.322 |
Height (cm) | 2791 | 41.7 | 177.0 | 1.03 (1.01–1.05) | 0.004 | 1.02 (1.00–1.03) | 0.057 | 0.977 |
Mammographic dense area (cm²) | 7 | 1.2 | 32.4 | 1.29 (0.97–1.71) | 0.085 | 1.25 (1.09–1.42) | 0.047 | 0.875 |
Breast size (cup) | 7 | 1.8 | 42.7 | 1.26 (0.83–1.90) | 0.274 | NA | NA | 0.451 |
Lifestyle factors | ||||||||
Physical activity (accelerometer-measured (milligravities)) | 5 | 0.2 | 35.2 | 0.51 (0.26–0.99) | 0.048 | 0.84 (0.61–1.17) | 0.309 | 0.229 |
Alcohol consumption (drinks per week) | 169 | 1.0 | 33.6 | 1.01 (0.89–1.16) | 0.829 | 0.99 (0.88–1.11) | 0.881 | 0.590 |
Smoking behavior (lifetime smoking index) | 126 | 1.3 | 48.8 | 1.16 (1.03–1.30) | 0.012 | 1.17 (1.05–1.30) | 0.006 | 0.208 |
Sleep duration (hours) | 52 | 0.5 | 39.6 | 1.37 (1.11–1.70) | 0.003 | 1.26 (1.10–1.45) | 0.002 | 0.880 |
Chronotype | 340 | 2.5 | 34.3 | 0.89 (0.83–0.94) | 1.29×10^−4 | 0.90 (0.85–0.95) | 1.02×10^−4 | 0.324 |
Biomarkers | ||||||||
Estrogens (pmol/L) | 3 | 1.6 | 14.3 | 1.14 (1.01–1.28) | 0.030 | NA | NA | 0.970 |
Progesterone (nmol/l) | 4 | 10.0 | 35.0 | 1.01 (0.97–1.04) | 0.761 | 1.01 (0.98–1.03) | 0.701 | 0.332 |
SHBG (nmol/L) | 9 | 3.6 | 38.9 | 0.85 (0.76–0.94) | 0.002 | 0.85 (0.79–0.90) | 0.003 | 0.480 |
IGF-1 (nmol/L) | 265 | 9.1 | 72.8 | 1.05 (1.00–1.11) | 0.037 | 1.06 (1.02–1.11) | 0.005 | 0.295 |
Leukocyte telomere length | 19 | 1.5 | 63.6 | 1.19 (1.08–1.32) | 5.63×10^−4 | 1.20 (1.11–1.29) | 3.09×10^−4 | 0.008 |
HDL-C (mmol/L) | 528 | 15.9 | 144.5 | 1.10 (1.06–1.15) | 1.12×10^−5 | 1.10 (1.06–1.14) | 1.23×10^−6 | 0.103 |
LDL-C (mmol/L) | 218 | 7.9 | 174.0 | 1.05 (0.99–1.11) | 0.083 | 1.02 (0.97–1.07) | 0.407 | 0.020 |
Triglycerides (mmol/L) | 436 | 11.4 | 130.3 | 0.96 (0.92–1.01) | 0.097 | 0.95 (0.91–0.99) | 0.009 | 0.138 |
Fasting insulin adj BMI (pmol/L) | 14 | 0.7 | 49.5 | 1.27 (1.03–1.56) | 0.025 | 1.28 (1.10–1.50) | 0.010 | 0.222 |
2-h glucose adj BMI (mmol/L) | 9 | 0.9 | 43.4 | 1.28 (0.96–1.72) | 0.098 | 1.07 (0.88–1.31) | 0.534 | 0.005 |
Abbreviations: IVW, inverse-variance weighted; MR, Mendelian randomization; PRESSO, pleiotropy residual sum and outlier; BMI, body mass index; WHR, waist-to-hip ratio; HDL-C, high density lipoprotein-cholesterol; LDL-C, low density lipoprotein-cholesterol; IGF-1, insulin-like growth factor 1; SHBG, sex hormone binding globulin; NA, not available.
a Based on 133,384 breast cancer cases and 113,789 controls.
b The unit is in 2 years change per copy of the effect allele for the OR estimates of age at menarche and menopause. The unit is in standard deviation change per copy of the effect allele for the OR estimates of BMI, WHR adj BMI, weight at birth, height, mammographic dense area, physical activity, alcohol consumption, IGF-1, HDL-C, LDL-C, triglycerides, fasting insulin adj BMI, and 2-h glucose adj BMI. The unit is in per-unit change per copy of the effect allele for the OR estimates of breast size, smoking behavior, sleep duration, estrogens, progesterone, SHBG, and leukocyte telomere length. The unit is in per-category change per copy of the effect allele for the OR estimates of chronotype; the categories definite evening, more evening than morning, don’t know, more morning than evening, and definite morning were coded as −2, −1, 0, 1, and 2, respectively.
Data for genome-wide association studies of breast cancer risk
Summary-level statistic data from two recent GWAS of breast cancer risk, overall and by molecular subtypes, were obtained from the Breast Cancer Association Consortium (BCAC). Data for overall breast cancer risk were based on GWAS conducted among 247,173 women of European ancestry (133,384 patients with breast cancer and 113,789 controls)35. Data for breast cancer risk by ER status were derived from GWAS conducted among 196,943 women of European descent (69,501 ER-positive cases, 21,468 ER-negative cases, and 105,974 controls)36. Data for risk by intrinsic subtypes were obtained from GWAS conducted among 160,993 women of European descent (45,253 luminal A-like cases, 6,350 luminal B/HER2 negative-like cases, 6,427 luminal B-like cases, 2,884 HER2 enriched-like cases, 8,602 triple-negative cases, and 91,477 controls)35. Further details are presented in Table S2. The definition of five intrinsic subtypes was based on the hormone receptors, HER2 receptors, and tumor grade35: 1) luminal A-like (ER+ and/or PR+, HER2−, and grade 1 & 2); 2) luminal B/HER2-negative-like (ER+ and/or PR+, HER2−, and grade 3); 3) luminal B-like (ER+ and/or PR+, HER2+); 4) HER2-enriched-like (ER− and PR−, HER2+); 5) triple-negative (ER−, PR−, HER2−).
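The five-subtype definition above amounts to a small decision rule on receptor status and grade. The following sketch encodes that rule as stated in the text; the function name and the handling of missing grade are illustrative assumptions, not part of the BCAC data processing.

```python
# Intrinsic-like subtype assignment following the definition given in the text.
# The function name and the None return for an unknown grade are assumptions.

def intrinsic_subtype(er_positive, pr_positive, her2_positive, grade=None):
    """Map receptor status and tumor grade (1, 2, or 3) to an intrinsic-like subtype."""
    hormone_receptor_positive = er_positive or pr_positive  # "ER+ and/or PR+"
    if hormone_receptor_positive and not her2_positive:
        if grade in (1, 2):
            return "luminal A-like"
        if grade == 3:
            return "luminal B/HER2-negative-like"
        return None  # grade is required to split the HER2-negative luminal tumors
    if hormone_receptor_positive and her2_positive:
        return "luminal B-like"
    if her2_positive:
        return "HER2-enriched-like"  # ER-, PR-, HER2+
    return "triple-negative"         # ER-, PR-, HER2-


print(intrinsic_subtype(True, False, False, grade=2))
print(intrinsic_subtype(False, False, True))
```

Grade only matters for hormone receptor-positive, HER2-negative tumors, so it is optional for the other three subtypes.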
Statistical analyses
The two-sample MR analyses were performed primarily using the inverse-variance weighted (IVW) method to estimate odds ratios (ORs) and 95% confidence intervals (CIs) for the association of each trait with breast cancer risk overall and by ER status or five intrinsic subtypes37. Two additional methods were used to assess the robustness of the results. We used the MR-Egger regression method to assess possible directional horizontal pleiotropy, which was indicated when the intercept differed from zero38. The MR pleiotropy residual sum and outlier test (MR-PRESSO) was applied to detect outlying SNPs with evidence of horizontal pleiotropy; the effect estimates were reassessed after removing the outliers39. Because of the strong genetic correlation of BMI with age at menarche and with SHBG reported in previous studies12, 40, a multivariable MR approach was implemented in the analyses of age at menarche and SHBG by including the associations of the same SNPs with BMI as a covariate. To assess whether the association of a trait with breast cancer risk differed by cancer subtype, we tested for heterogeneity using Cochran’s Q statistic. All statistical analyses were performed using the R packages ‘MendelianRandomization’ and ‘MRPRESSO’.
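To make the two main estimators concrete, the sketch below computes a fixed-effect IVW estimate and the MR-Egger intercept and slope from summary statistics. It is written in Python for illustration only; the analyses in this study were run with the R packages named above, whose defaults (for example, random-effects standard errors) may differ. The SNP-level inputs are hypothetical, and standard errors for the MR-Egger terms are omitted for brevity.

```python
import math

# Two-sample MR sketch from summary statistics (hypothetical inputs).
# bx: SNP-exposure effects; by: SNP-outcome log odds ratios; se_y: SEs of by.


def ivw(bx, by, se_y):
    """Fixed-effect inverse-variance weighted estimate and its standard error."""
    denom = sum(x * x / (s * s) for x, s in zip(bx, se_y))
    beta = sum(x * y / (s * s) for x, y, s in zip(bx, by, se_y)) / denom
    return beta, math.sqrt(1.0 / denom)


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a log odds ratio and SE to (OR, lower 95% limit, upper 95% limit)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def mr_egger(bx, by, se_y):
    """Weighted regression of by on bx with an intercept (weights 1/se_y^2).

    SNPs are first oriented so that every exposure effect is positive.
    Returns (intercept, slope); a non-zero intercept suggests directional pleiotropy.
    """
    sign = [1.0 if x >= 0 else -1.0 for x in bx]
    x = [sg * v for sg, v in zip(sign, bx)]
    y = [sg * v for sg, v in zip(sign, by)]
    w = [1.0 / (s * s) for s in se_y]
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * sxx - sx * sx
    slope = (sw * sxy - sx * sy) / det
    intercept = (sxx * sy - sx * sxy) / det
    return intercept, slope


# Hypothetical instrument with no pleiotropy: by is exactly 0.5 * bx.
toy_bx = [0.1, -0.2, 0.3, 0.4]
toy_by = [0.05, -0.10, 0.15, 0.20]
toy_se = [0.01, 0.01, 0.01, 0.01]

beta_ivw, se_ivw = ivw(toy_bx, toy_by, toy_se)
egger_intercept, egger_slope = mr_egger(toy_bx, toy_by, toy_se)
print(odds_ratio_ci(beta_ivw, se_ivw))
print(egger_intercept, egger_slope)
```

Because the toy outcome effects are exactly proportional to the exposure effects, both methods recover the same slope and the MR-Egger intercept is zero up to rounding error.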
Results
Associations with overall breast cancer risk
The IVW analysis showed that older age at menopause and younger age at menarche were significantly associated with an elevated risk of breast cancer. Higher BMI, WHR adj BMI, and physical activity and a morning-preference chronotype were significantly associated with a decreased risk of breast cancer, while greater height, a higher lifetime smoking index, and longer sleep duration were significantly associated with an increased risk of breast cancer. In addition, several biomarkers, including leukocyte telomere length, circulating estrogens, IGF-1, SHBG, HDL-C, and fasting insulin, also showed significant associations with breast cancer risk (Table 1). Associations for most risk factors remained statistically significant after adjusting for multiple comparisons using the false discovery rate (FDR), except those for physical activity, estrogens, and IGF-1 (Figure S1). The estimates of associations derived from the MR-PRESSO and MR-Egger regression analyses were generally consistent with the results from the IVW method (Table 1 and Table S3), and the results are summarized in Figure S2. There was no strong evidence of directional horizontal pleiotropy according to the MR-Egger intercept test, with the exception of telomere length, LDL-C, and 2-h glucose. After the removal of outlier SNPs (2 for telomere length, 8 for LDL-C, and 4 for 2-h glucose), there was no evidence of directional horizontal pleiotropy.
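The multiple-comparison adjustment can be illustrated with the IVW P values from Table 1. The text does not name the FDR procedure, so the sketch below assumes the Benjamini-Hochberg method at a 0.05 threshold; it also uses the rounded P values as printed, so adjusted values are approximate.

```python
# FDR sketch using the rounded IVW P values from Table 1.
# Assumption: Benjamini-Hochberg adjustment, FDR threshold 0.05.


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


ivw_p = {
    "Age at menopause": 2.71e-7, "Age at menarche": 0.010, "BMI": 0.023,
    "WHR adj BMI": 0.002, "Weight at birth": 0.145, "Height": 0.004,
    "Mammographic dense area": 0.085, "Breast size": 0.274,
    "Physical activity": 0.048, "Alcohol consumption": 0.829,
    "Smoking behavior": 0.012, "Sleep duration": 0.003, "Chronotype": 1.29e-4,
    "Estrogens": 0.030, "Progesterone": 0.761, "SHBG": 0.002, "IGF-1": 0.037,
    "Leukocyte telomere length": 5.63e-4, "HDL-C": 1.12e-5, "LDL-C": 0.083,
    "Triglycerides": 0.097, "Fasting insulin adj BMI": 0.025,
    "2-h glucose adj BMI": 0.098,
}

traits = list(ivw_p)
adjusted_p = dict(zip(traits, benjamini_hochberg([ivw_p[t] for t in traits])))

# Traits that are nominally significant but do not survive the adjustment.
not_surviving = sorted(
    t for t in traits if ivw_p[t] < 0.05 and adjusted_p[t] >= 0.05
)
print(not_surviving)
```

Under these assumptions, the three nominally significant traits that fail the adjusted threshold are the same three named in the text: physical activity, estrogens, and IGF-1.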
Associations with breast cancer risk by ER status
As shown in Table 2, significant associations with the risk of both ER-positive and ER-negative breast cancer were observed for age at menopause, age at menarche, sleep duration, morning preference chronotype, and HDL-C (all P for heterogeneity >0.05). Although the associations of height, WHR, physical activity, leukocyte telomere length, and IGF-1 with ER-positive breast cancer were generally stronger than their associations with ER-negative breast cancer, none of the heterogeneity tests was statistically significant. Circulating SHBG was inversely associated with risk of ER-positive breast cancer (OR=0.83, 95% CI: 0.73–0.94) but positively associated with risk of ER-negative breast cancer (OR=1.12, 95% CI: 0.93–1.36) (P for heterogeneity=0.01). Results based on the MR-PRESSO and MR-Egger regression methods were generally consistent with those from the IVW method (Tables S3 and S4).
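The heterogeneity test for SHBG can be approximated from the reported ORs and confidence intervals alone. The sketch below recovers each log OR and its standard error from the 95% CI and applies Cochran's Q with one degree of freedom. It treats the ER-positive and ER-negative estimates as independent, which is a simplifying assumption because the two analyses share controls, and it works from rounded published values.

```python
import math

# Cochran's Q sketch for two subtype-specific estimates (SHBG, Table 2).
# Assumptions: estimates are independent; SEs are back-calculated from 95% CIs.


def log_or_and_se(odds_ratio, lower, upper, z=1.96):
    """Recover the log odds ratio and its SE from an OR and its 95% CI."""
    return math.log(odds_ratio), (math.log(upper) - math.log(lower)) / (2 * z)


def cochran_q(estimates, ses):
    """Cochran's Q statistic and its degrees of freedom."""
    weights = [1.0 / (s * s) for s in ses]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    return q, len(estimates) - 1


def chi2_p_value_df1(q):
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(q / 2.0))


b_pos, se_pos = log_or_and_se(0.83, 0.73, 0.94)   # ER-positive
b_neg, se_neg = log_or_and_se(1.12, 0.93, 1.36)   # ER-negative

q_stat, dof = cochran_q([b_pos, b_neg], [se_pos, se_neg])
p_het = chi2_p_value_df1(q_stat)
print(f"Q = {q_stat:.2f}, df = {dof}, P for heterogeneity = {p_het:.3f}")
```

Under these assumptions the P value comes out at roughly 0.01, in line with the value reported for SHBG.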
Table 2.
Associations of 23 known and suspected risk factors and biomarkers with breast cancer risk, by estrogen receptor (ER) status: Results from Mendelian randomization analyses
Risk factors a | ER-positive (69,501 cases) c: OR (95% CI) | ER-positive: P | ER-negative (21,468 cases): OR (95% CI) | ER-negative: P | P heterogeneity b |
---|---|---|---|---|---|
Reproductive factors | |||||
Age at menopause (years) | 1.11 (1.06–1.15) | 4.45×10^−7 | 1.07 (1.02–1.12) | 0.005 | 0.229 |
Age at menarche (years) | 0.85 (0.74–0.98) | 0.024 | 0.77 (0.65–0.92) | 0.004 | 0.421 |
Body weight and other measurements | |||||
BMI (kg/m²) | 0.96 (0.92–1.00) | 0.067 | 0.96 (0.91–1.02) | 0.178 | 0.941 |
WHR adj BMI | 0.91 (0.85–0.97) | 0.004 | 0.99 (0.92–1.05) | 0.677 | 0.092 |
Weight at Birth (g) | 0.91 (0.79–1.05) | 0.194 | 0.92 (0.78–1.09) | 0.352 | 0.914 |
Height (cm) | 1.03 (1.01–1.05) | 0.008 | 1.01 (0.98–1.03) | 0.540 | 0.226 |
Mammographic dense area (cm²) | 1.30 (0.99–1.71) | 0.060 | 1.26 (0.88–1.80) | 0.209 | 0.889 |
Breast size (cup) | 1.21 (0.85–1.72) | 0.292 | 1.39 (0.76–2.52) | 0.282 | 0.695 |
Lifestyle factors | |||||
Physical activity (accelerometer-measured (milligravities)) | 0.46 (0.21–1.00) | 0.049 | 0.96 (0.42–2.17) | 0.920 | 0.200 |
Alcohol consumption (drinks per week) | 1.01 (0.86–1.18) | 0.902 | 1.06 (0.90–1.26) | 0.478 | 0.661 |
Smoking behavior (lifetime smoking index) | 1.11 (0.97–1.26) | 0.121 | 1.15 (0.96–1.37) | 0.138 | 0.759 |
Sleep duration (hours) | 1.44 (1.14–1.80) | 0.002 | 1.44 (1.08–1.92) | 0.013 | 0.980 |
Chronotype | 0.87 (0.81–0.93) | 6.87×10^−5 | 0.89 (0.81–0.98) | 0.019 | 0.651 |
Biomarkers | |||||
Estrogens (pmol/l) | 1.18 (1.02–1.36) | 0.026 | 1.25 (1.00–1.56) | 0.052 | 0.675 |
Progesterone (nmol/l) | 1.02 (0.97–1.06) | 0.478 | 0.97 (0.87–1.09) | 0.619 | 0.474 |
SHBG (nmol/L) | 0.83 (0.73–0.94) | 0.004 | 1.12 (0.93–1.36) | 0.239 | 0.010 |
IGF-1 (nmol/L) | 1.07 (1.01–1.13) | 0.025 | 1.02 (0.95–1.09) | 0.576 | 0.291 |
Leukocyte telomere length | 1.18 (1.03–1.35) | 0.019 | 1.08 (0.84–1.39) | 0.560 | 0.541 |
HDL-C (mmol/L) | 1.12 (1.06–1.17) | 1.81×10^−5 | 1.12 (1.06–1.20) | 2.24×10^−4 | 0.867 |
LDL-C (mmol/L) | 1.04 (0.98–1.11) | 0.177 | 1.03 (0.96–1.12) | 0.421 | 0.817 |
Triglycerides (mmol/L) | 0.95 (0.90–1.00) | 0.058 | 0.97 (0.91–1.04) | 0.389 | 0.577 |
Fasting insulin adj BMI (pmol/L) | 1.32 (0.98–1.79) | 0.070 | 1.03 (0.79–1.36) | 0.807 | 0.237 |
2-h glucose adj BMI (mmol/L) | 1.33 (0.97–1.82) | 0.076 | 1.41 (1.06–1.87) | 0.018 | 0.795 |
Abbreviations: BMI, body mass index; WHR, waist-to-hip ratio; HDL-C, high density lipoprotein-cholesterol; LDL-C, low density lipoprotein-cholesterol; IGF-1, insulin-like growth factor 1; SHBG, sex hormone binding globulin.
a The unit is in 2 years change per copy of the effect allele for the OR estimates of age at menarche and menopause. The unit is in standard deviation change per copy of the effect allele for the OR estimates of BMI, WHR adj BMI, weight at birth, height, mammographic dense area, physical activity, alcohol consumption, IGF-1, HDL-C, LDL-C, triglycerides, fasting insulin adj BMI, and 2-h glucose adj BMI. The unit is in per-unit change per copy of the effect allele for the OR estimates of breast size, smoking behavior, sleep duration, estrogens, progesterone, SHBG, and leukocyte telomere length. The unit is in per-category change per copy of the effect allele for the OR estimates of chronotype; the categories definite evening, more evening than morning, don’t know, more morning than evening, and definite morning were coded as −2, −1, 0, 1, and 2, respectively.
b Heterogeneity of the association with a given trait between ER+ and ER− breast cancer was evaluated using Cochran’s Q statistic.
c Compared with 105,974 controls.
Associations with breast cancer risk by intrinsic subtypes
Five traits showed different associations with breast cancer risk across intrinsic subtypes (Figure 1). Fasting insulin was associated with a reduced risk of HER2 enriched cancer but an increased risk of HER2 negative cancer (P for heterogeneity=0.006). A higher HDL-C level was associated with an increased risk of the luminal A-like subtype (OR=1.13; 95% CI: 1.07–1.19), HER2 enriched-like subtype (OR=1.19; 95% CI: 1.06–1.34), and triple-negative subtype (OR=1.12; 95% CI: 1.04–1.21), but was not associated with risk of the luminal B-like or luminal B/HER2 negative-like subtype (P for heterogeneity=0.048). SHBG was associated in opposite directions with luminal A-like and triple-negative subtypes (OR: 0.79 vs. 1.19; P for heterogeneity=0.01). Age at menopause showed significant associations with all intrinsic subtypes except triple-negative, and the heterogeneity of the associations between luminal A-like and triple-negative subtypes was statistically significant (P=0.035). Alcohol consumption was associated only with HER2 enriched cancer (OR=1.46; 95% CI: 1.02–2.09), and the heterogeneity test was statistically significant when compared with luminal A-like cancer (P for heterogeneity=0.039). No significantly different associations were observed among the five intrinsic subtypes for any of the other risk factors (Table S5). The MR-PRESSO and MR-Egger regression analyses yielded results similar to those of the IVW method (Tables S6 and S7), although fewer statistically significant findings were observed with MR-Egger regression than with the other two methods.
Figure 1.
Summary of results for five risk factors and biomarkers showing different associations with any of the five intrinsic subtypes of breast cancer: Results from Mendelian randomization analyses. A. High density lipoprotein-cholesterol (HDL-C); B. Fasting insulin adj BMI; C. Sex hormone binding globulin (SHBG); D. Age at menopause; E. Alcohol consumption. a Luminal A-like (ER+ and/or PR+, HER2−, grade 1 & 2, n= 45,253); luminal B/HER2-negative-like (ER+ and/or PR+, HER2−, grade 3, n=6,350); luminal B-like (ER+ and/or PR+, HER2+, n=6,427); HER2-enriched-like (ER− and PR−, HER2+, n=2,884); triple-negative (ER−, PR−, HER2−, n=8,602). Compared with 91,477 controls. Possible heterogeneity for the association with a given trait by intrinsic subtypes was evaluated using Cochran’s Q statistic. For the pair-wise comparisons, the luminal A-like was used as the reference and other subtypes were compared separately with the reference to test the heterogeneity.
Discussion
In this large MR study, we found that age at menarche and age at menopause were associated with breast cancer risk, which is consistent with previous MR analyses 9, 12. We also observed significant associations of three body measurements (BMI, WHR adj BMI, and height) with risk of overall breast cancer, which is also consistent with our previous MR analyses using individual-level data10, 11, 13. Our results for sleep traits are consistent with results from a recent MR study41, which has suggested an inverse association of morning preference chronotype and a positive association of sleep duration with overall breast cancer risk. Few MR studies to date have evaluated the role of lifetime smoking index on breast cancer risk. Our study reported a positive association of lifetime smoking index with overall breast cancer risk, which is supported by a recent MR study42. Consistent with previous observational studies using measured blood estrogens, we found a positive association of blood estrogen levels with breast cancer risk. Our results for an inverse association of SHBG and a positive association of IGF-1 with breast cancer risk are supported by recent MR studies16, 40. Using individual-level data, we showed previously that circulating fasting insulin was associated with an increased risk of breast cancer13. In the present study, the associations between fasting insulin and breast cancer risk remain statistically significant. Additionally, this study based on a larger number of variants confirmed our previous finding for a positive association of HDL-C with breast cancer risk17. Of note, a previous MR study reported a borderline significant association of telomere length with overall breast cancer risk43. We have now observed a statistically significant association using a much larger sample size and more telomere length-associated genetic variants.
Genetically predicted age at menopause was found to be strongly associated with the risk of breast cancer except for triple negative breast cancer in our study, and this finding was supported by a previous longitudinal study44. Interestingly, we found that circulating SHBG was inversely associated with luminal A and B breast cancer but not triple negative breast cancer. Possible biological mechanisms for this finding are unclear. Measured SHBG has been consistently linked to a reduced risk of breast cancer. However, to our knowledge, no study has specifically evaluated the association of circulating SHBG with risk of triple-negative breast cancer, which differs from other common types of breast cancer in many risk factors 45, 46. In particular, several reproductive factors related to a high estrogen exposure (such as age at menarche, nulliparity, and age at first full-term pregnancy) were not associated with risk of triple-negative breast cancer47, 48, suggesting that triple-negative breast cancer may have a different etiologic profile from other breast cancer subtypes.
It has been well established that insulin and IGF signaling pathways play a significant role in breast carcinogenesis49. Circulating insulin and IGF-1 or IGF-binding protein 3 have been reported in association with breast cancer risk in several previous studies6, 50. We have confirmed these findings using MR analyses. However, we found that circulating insulin was inversely associated with HER2-enriched cancer. This finding was unexpected. Future studies are needed to validate this finding and evaluate possible biological mechanisms.
We did not find any significant association of mammographic dense area, breast size, or alcohol drinking with breast cancer risk in our MR analyses. One possible explanation is that there might be no causal association between these traits and breast cancer risk. Another possible explanation is that the genetic instruments for some of these traits may be weak. Identification of additional genetic variants associated with these traits may be helpful in future MR analyses to re-evaluate these traits. Some modifiable traits, such as alcohol consumption, are heavily influenced by non-genetic factors. Potential confounding from those non-genetic factors may partly explain the inconsistent results between our MR analysis and observational studies.
This is the first study that has comprehensively assessed large numbers of known and suspected risk factors and biomarkers for breast cancer risk. The sample size is large, which provides great statistical power to assess many traits with risk of breast cancer overall and by molecular subtypes. These findings may help to elucidate the underlying mechanism for specific molecular subtypes. However, there are several potential limitations of this study. First, even though we used three MR methods to assess the robustness of our results and found no apparent evidence of pleiotropic effects or violations of MR assumptions, we cannot completely rule out possible residual pleiotropic effects for some genetic instruments. Second, for several traits, such as physical activity, estrogens, and alcohol drinking, only a few genetic variants were identified to date, and these variants explain only a small fraction of the variation of these traits. Therefore, risk estimates for these traits could be unstable or biased. Additional studies are needed to evaluate these traits in the future. Third, some findings could be false positive due to multiple comparisons. However, virtually all associations observed in our studies are consistent with the study hypotheses suggested by previous conventional epidemiologic studies. Fourth, the sample size is relatively small for some subtypes, especially for the HER2-enriched tumors, which may have limited our statistical power to detect associations for these subtypes of cancer. Lastly, we cannot evaluate potential influences of menopausal status on our study results as summary statistics data by menopausal status were unavailable. This issue could be addressed in future studies when data by menopausal status become available.
In conclusion, this comprehensive two-sample MR analysis using the most up-to-date instrumental variables supports potential causal associations of several reproductive factors, anthropometric traits, and blood biomarkers with the risk of breast cancer. This study provides evidence that the association of several risk factors with breast cancer risk may differ by ER status and intrinsic subtypes. These findings highlight the heterogeneous nature of this disease, which may help to improve the understanding of breast cancer tumorigenesis.
Supplementary Material
What’s new?
Previous studies have identified many risk factors for breast cancer, but the potential causality for some of these associations has not yet been established, and few studies have comprehensively investigated these associations by molecular subtypes. This large two-sample Mendelian randomization study supports potential causal associations for 15 risk factors and blood biomarkers with the risk of breast cancer. We also showed that the association of breast cancer risk with certain risk factors and biomarkers may differ by ER status and intrinsic subtypes, suggesting that these cancer subtypes may have different etiologic profiles.
Acknowledgements
We thank Marshal Younger and Rachel Mullen for their assistance in preparing and submitting this manuscript. This work was supported, in part, by Anne Potter Wilson Chair endowment to Vanderbilt University.
Abbreviations
- BCAC: Breast Cancer Association Consortium
- BMI: body mass index
- CI: confidence interval
- ER: estrogen receptor
- FDR: false discovery rate
- GWAS: genome-wide association study
- HDL-C: high density lipoprotein-cholesterol
- HER2: human epidermal growth factor receptor 2
- IGF-1: insulin-like growth factor 1
- IVs: instrumental variables
- IVW: inverse-variance weighted
- LDL-C: low density lipoprotein-cholesterol
- MR: Mendelian randomization
- OR: odds ratio
- PR: progesterone receptor
- PRESSO: pleiotropy residual sum and outlier test
- PVE: proportion of variance explained
- SHBG: sex hormone-binding globulin
- WHR: waist-to-hip ratio
Footnotes
Conflict of Interest
The authors declare no potential conflicts of interest.
Ethics Statement
Summary data were derived from previously reported studies that followed the relevant institutional review board and patient consent procedures, and data access followed the procedures of the Data Access Coordination Committee of BCAC.
Data Availability Statement
GWAS summary data can be obtained from Breast Cancer Association Consortium (http://bcac.ccge.medschl.cam.ac.uk/). Further information is available from the corresponding author upon reasonable request.
References
- 1. International Agency for Research on Cancer. Cancer Tomorrow. http://gco.iarc.fr/tomorrow/home. Accessed September 22, 2021.
- 2. Collaborative Group on Hormonal Factors in Breast Cancer. Menarche, menopause, and breast cancer risk: individual participant meta-analysis, including 118 964 women with breast cancer from 117 epidemiological studies. Lancet Oncol. 2012;13(11): 1141–1151.
- 3. Lahmann PH, Friedenreich C, Schuit AJ, et al. Physical activity and breast cancer risk: the European Prospective Investigation into Cancer and Nutrition. Cancer Epidemiol Biomarkers Prev. 2007;16(1): 36–42.
- 4. Jung S, Wang M, Anderson K, et al. Alcohol consumption and breast cancer risk by estrogen receptor status: in a pooled analysis of 20 studies. Int J Epidemiol. 2016;45(3): 916–928.
- 5. Moore SC, Matthews CE, Shu XO, et al. Endogenous Estrogens, Estrogen Metabolites, and Breast Cancer Risk in Postmenopausal Chinese Women. J Natl Cancer Inst. 2016;108(10): djw103.
- 6. The Endogenous Hormones and Breast Cancer Collaborative Group. Insulin-like growth factor 1 (IGF1), IGF binding protein 3 (IGFBP3), and breast cancer risk: pooled individual data analysis of 17 prospective studies. Lancet Oncol. 2010;11(6): 530–542.
- 7. Smith GD, Ebrahim S. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol. 2003;32(1): 1–22.
- 8. Davey Smith G, Hemani G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Hum Mol Genet. 2014;23(R1): R89–98.
- 9. Day FR, Ruth KS, Thompson DJ, et al. Large-scale genomic analyses link reproductive aging to hypothalamic signaling, breast cancer susceptibility and BRCA1-mediated DNA repair. Nat Genet. 2015;47(11): 1294–1303.
- 10. Zhang B, Shu XO, Delahanty RJ, et al. Height and Breast Cancer Risk: Evidence From Prospective Studies and Mendelian Randomization. J Natl Cancer Inst. 2015;107(11): djv219.
- 11. Guo Y, Warren Andersen S, Shu XO, et al. Genetically Predicted Body Mass Index and Breast Cancer Risk: Mendelian Randomization Analyses of Data from 145,000 Women of European Descent. PLoS Med. 2016;13(8): e1002105.
- 12. Day FR, Thompson DJ, Helgason H, et al. Genomic analyses identify hundreds of variants associated with age at menarche and support a role for puberty timing in cancer risk. Nat Genet. 2017;49(6): 834–841.
- 13. Shu X, Wu L, Khankari NK, et al. Associations of obesity and circulating insulin and glucose with breast cancer risk: a Mendelian randomization analysis. Int J Epidemiol. 2019;48(3): 795–806.
- 14. Papadimitriou N, Dimou N, Tsilidis KK, et al. Physical activity and risks of breast and colorectal cancer: a Mendelian randomisation analysis. Nat Commun. 2020;11(1): 597.
- 15. Zhu J, Jiang X, Niu Z. Alcohol consumption and risk of breast and ovarian cancer: A Mendelian randomization study. Cancer Genet. 2020;245: 35–41.
- 16. Murphy N, Knuppel A, Papadimitriou N, et al. Insulin-like growth factor-1, insulin-like growth factor-binding protein-3, and breast cancer risk: observational and Mendelian randomization analyses with approximately 430 000 women. Ann Oncol. 2020;31(5): 641–649.
- 17. Beeghly-Fadiel A, Khankari NK, Delahanty RJ, et al. A Mendelian randomization analysis of circulating lipid traits and breast cancer risk. Int J Epidemiol. 2020;49(4): 1117–1131.
- 18. Wootton RE, Richmond RC, Stuijfzand BG, et al. Evidence for causal effects of lifetime smoking on risk for depression and schizophrenia: a Mendelian randomisation study. Psychol Med. 2020;50(14): 2435–2443.
- 19. Lindström S, Thompson DJ, Paterson AD, et al. Genome-wide association study identifies multiple loci associated with both mammographic density and breast cancer risk. Nat Commun. 2014;5: 5303.
- 20. Eriksson N, Benton GM, Do CB, et al. Genetic variants associated with breast size also influence breast cancer risk. BMC Med Genet. 2012;13: 53.
- 21. Yengo L, Sidorenko J, Kemper KE, et al. Meta-analysis of genome-wide association studies for height and body mass index in approximately 700000 individuals of European ancestry. Hum Mol Genet. 2018;27(20): 3641–3649.
- 22. Pulit SL, Stoneman C, Morris AP, et al. Meta-analysis of genome-wide association studies for body fat distribution in 694 649 individuals of European ancestry. Hum Mol Genet. 2019;28(1): 166–174.
- 23. Horikoshi M, Beaumont RN, Day FR, et al. Genome-wide associations for birth weight and correlations with adult disease. Nature. 2016;538(7624): 248–252.
- 24. Doherty A, Smith-Byrne K, Ferreira T, et al. GWAS identifies 14 loci for device-measured physical activity and sleep duration. Nat Commun. 2018;9(1): 5257.
- 25. Karlsson Linner R, Biroli P, Kong E, et al. Genome-wide association analyses of risk tolerance and risky behaviors in over 1 million individuals identify hundreds of loci and shared genetic influences. Nat Genet. 2019;51(2): 245–257.
- 26. Liu M, Jiang Y, Wedow R, et al. Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use. Nat Genet. 2019;51(2): 237–244.
- 27. Jansen PR, Watanabe K, Stringer S, et al. Genome-wide analysis of insomnia in 1,331,010 individuals identifies new risk loci and functional pathways. Nat Genet. 2019;51(3): 394–403.
- 28. Jones SE, Lane JM, Wood AR, et al. Genome-wide association analyses of chronotype in 697,828 individuals provides insights into circadian rhythms. Nat Commun. 2019;10(1): 343.
- 29. Scott RA, Lagou V, Welch RP, et al. Large-scale association analyses identify new loci influencing glycemic traits and provide insight into the underlying biological pathways. Nat Genet. 2012;44(9): 991–1005.
- 30. Xue A, Wu Y, Zhu Z, et al. Genome-wide association analyses identify 143 risk variants and putative regulatory mechanisms for type 2 diabetes. Nat Commun. 2018;9(1): 2941.
- 31. Richardson TG, Sanderson E, Palmer TM, et al. Evaluating the relationship between circulating lipoprotein lipids and apolipoproteins with risk of coronary heart disease: A multivariable Mendelian randomisation analysis. PLoS Med. 2020;17(3): e1003062.
- 32. Li C, Stoma S, Lotta LA, et al. Genome-wide Association Analysis in Humans Links Nucleotide Metabolism to Leukocyte Telomere Length. Am J Hum Genet. 2020;106(3): 389–404.
- 33. Pott J, Bae YJ, Horn K, et al. Genetic Association Study of Eight Steroid Hormones and Implications for Sexual Dimorphism of Coronary Artery Disease. J Clin Endocrinol Metab. 2019;104(11): 5008–5023.
- 34. Coviello AD, Haring R, Wellons M, et al. A genome-wide association meta-analysis of circulating sex hormone-binding globulin reveals multiple Loci implicated in sex steroid hormone regulation. PLoS Genet. 2012;8(7): e1002805.
- 35. Zhang H, Ahearn TU, Lecarpentier J, et al. Genome-wide association study identifies 32 novel breast cancer susceptibility loci from overall and subtype-specific analyses. Nat Genet. 2020;52(6): 572–581.
- 36. Michailidou K, Lindstrom S, Dennis J, et al. Association analysis identifies 65 new breast cancer risk loci. Nature. 2017;551(7678): 92–94.
- 37. Burgess S, Butterworth A, Thompson SG. Mendelian randomization analysis with multiple genetic variants using summarized data. Genet Epidemiol. 2013;37(7): 658–665.
- 38. Bowden J, Davey Smith G, Burgess S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int J Epidemiol. 2015;44(2): 512–525.
- 39. Verbanck M, Chen CY, Neale B, Do R. Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases. Nat Genet. 2018;50(5): 693–698.
- 40. Dimou NL, Papadimitriou N, Gill D, et al. Sex hormone binding globulin and risk of breast cancer: a Mendelian randomization study. Int J Epidemiol. 2019;48(3): 807–816.
- 41. Richmond RC, Anderson EL, Dashti HS, et al. Investigating causal relations between sleep traits and risk of breast cancer in women: mendelian randomisation study. BMJ. 2019;365: l2327.
- 42. Dimou N, Yarmolinsky J, Bouras E, et al. Causal Effects of Lifetime Smoking on Breast and Colorectal Cancer Risk: Mendelian Randomization Study. Cancer Epidemiol Biomarkers Prev. 2021;30(5): 953–964.
- 43. Haycock PC, Burgess S, Nounu A, et al. Association Between Telomere Length and Risk of Cancer and Non-Neoplastic Diseases: A Mendelian Randomization Study. JAMA Oncol. 2017;3(5): 636–651.
- 44. Phipps AI, Chlebowski RT, Prentice R, et al. Reproductive history and oral contraceptive use in relation to risk of triple-negative breast cancer. J Natl Cancer Inst. 2011;103(6): 470–477.
- 45. Gaudet MM, Gierach GL, Carter BD, et al. Pooled Analysis of Nine Cohorts Reveals Breast Cancer Risk Factors by Tumor Molecular Subtype. Cancer Res. 2018;78(20): 6011–6021.
- 46. McCarthy AM, Friebel-Klingner T, Ehsan S, et al. Relationship of established risk factors with breast cancer subtypes. Cancer Med. 2021;10(18): 6456–6467.
- 47. Ma H, Ursin G, Xu X, et al. Reproductive factors and the risk of triple-negative breast cancer in white women and African-American women: a pooled analysis. Breast Cancer Res. 2017;19(1): 6.
- 48. John EM, Hines LM, Phipps AI, et al. Reproductive history, breast-feeding and risk of triple negative breast cancer: The Breast Cancer Etiology in Minorities (BEM) study. Int J Cancer. 2018;142(11): 2273–2285.
- 49. Christopoulos PF, Msaouel P, Koutsilieris M. The role of the insulin-like growth factor-1 system in breast cancer. Mol Cancer. 2015;14: 43.
- 50. Gunter MJ, Hoover DR, Yu H, et al. Insulin, insulin-like growth factor-I, and risk of breast cancer in postmenopausal women. J Natl Cancer Inst. 2009;101(1): 48–60.