Abstract
Observational studies suggest that higher birth weight (BW) is associated with increased risk of breast cancer in adult life. We conducted a two-sample Mendelian randomisation (MR) study to assess whether this association is causal. Sixty independent single nucleotide polymorphisms (SNPs) known to be associated at P < 5 × 10−8 with BW were used to construct (1) a 41-SNP instrumental variable (IV) for univariable MR after removing SNPs with pleiotropic associations with other breast cancer risk factors and (2) a 49-SNP IV for multivariable MR after filtering SNPs for data availability. BW predicted by the 41-SNP IV was not associated with overall breast cancer risk in inverse-variance weighted (IVW) univariable MR analysis of genetic association data from 122,977 breast cancer cases and 105,974 controls (odds ratio = 0.86 per 500 g higher BW; 95% confidence interval 0.73–1.01). Sensitivity analyses using four alternative methods and three alternative IVs, including an IV with 59 of the 60 BW-associated SNPs, yielded similar results. Multivariable MR adjusting for the effects of the 49-SNP IV on birth length, adult height, adult body mass index, age at menarche, and age at menopause using IVW and MR-Egger methods provided estimates consistent with univariable analyses. Results were also similar when all analyses were repeated after restricting to estrogen receptor-positive or -negative breast cancer cases. Point estimates of the odds ratios from most analyses performed indicated an inverse relationship between genetically-predicted BW and breast cancer, but we are unable to rule out an association between the non-genetically-determined component of BW and breast cancer. Thus, genetically-predicted higher BW was not associated with an increased risk of breast cancer in adult life in our MR study.
Electronic supplementary material
The online version of this article (10.1007/s10654-019-00485-7) contains supplementary material, which is available to authorized users.
Keywords: Birth weight, Breast cancer, Mendelian randomisation
Background
The hypothesis that the risk of developing breast cancer in adulthood may be increased by factors that first act in utero—in particular by fetal exposure to higher levels of maternal estrogen—was first proposed in 1990 [1]. Since then, several observational studies have examined this hypothesis by using birth weight as an index of the effects of intrauterine hormones on fetal growth and of the extent of the fetal mammary stem cell pool from which breast tumours may eventually arise [2–6]. When considered overall, these studies have suggested that higher weight at birth may be associated with an increased susceptibility to breast cancer in later life [7–20], but a few studies have failed to demonstrate this association [21–28]. While most studies have adjusted for the effects of recognised breast cancer risk factors measured at a specific time point in their samples, this does not rule out the possibility of residual effects of these factors, which may act at different points over the course of life, driving the observed association between birth weight and breast cancer. It has not been possible to determine whether this association is causal and to dissect whether it is birth weight per se or an external factor influencing fetal growth and development that might underpin a possible association with future breast cancer risk.
Mendelian randomisation (MR) is a form of instrumental variable (IV) analysis that uses genetic instruments or single nucleotide polymorphisms (SNPs) associated with an exposure of interest to infer a causal relationship (or lack thereof) between this exposure and an outcome. Since SNPs are randomly allocated at conception, an MR study may be thought of as being analogous to a randomised controlled trial of the effects of the exposure on the outcome, making such a study less susceptible to classical confounding. However, MR is a valid tool for causal inference only if three assumptions hold: (1) SNPs used as part of the IV are associated with the exposure, (2) these SNPs are not associated with known or unknown confounders for the outcome, and (3) given these confounders, the SNPs affect the outcome only through the exposure of interest and not via other pathways. Here, we report the result of an MR analysis that aimed to investigate the relationship between birth weight as the exposure and susceptibility to breast cancer as the outcome. A genome-wide association meta-analysis of over 150,000 individuals has previously identified 60 independent SNP loci that are associated with birth weight (BW) and genetic association data for 59 of these loci are available from a separate meta-analysis of breast cancer susceptibility that included over 225,000 women [29, 30]. We used these data to perform an MR study with a two-sample design (Fig. 1) wherein the genetic associations with outcome and exposure are estimated in independent studies. It is challenging to evaluate empirically whether the assumptions required for causal inference with MR are met and therefore, we also report the results of several additional analyses performed to assess the robustness of our MR result.
Methods
Birth weight (exposure) data
Genome-wide significant (P < 5 × 10−8) associations for BW at 60 independent loci were previously reported by Horikoshi et al. [29]. The lead SNPs at 49 loci were identified at P < 5 × 10−8 in the European ancestry component (n = 143,677) of the study by Horikoshi et al. The lead SNPs at the remaining loci had P values ranging from 5.1 × 10−8 to 4.4 × 10−7 in the European ancestry meta-analysis and were identified at P < 5 × 10−8 in the trans-ancestry component of the same study, which combined the European results with results from 10,104 individuals of diverse ancestry [29]. Summary results from the European-only meta-analysis for all 60 SNPs were obtained from the Early Growth Genetics (EGG) Consortium (Supplementary Tables 1 and 2). BW in the EGG Consortium studies was collected from heterogeneous sources (from measurements made at birth by medical doctors, birth records and medical registers, maternal interviews, and adult self-report) [29].
Breast cancer (outcome) data
Summary results from genome-wide association meta-analyses for breast cancer susceptibility in women of European ancestry by Michailidou et al. [30] were downloaded from the Breast Cancer Association Consortium (BCAC) website (Supplementary Table 1). Results were available for 59 of the 60 BW-associated lead SNPs (not available for rs139975827) for three breast cancer susceptibility phenotypes (Supplementary Table 2): overall breast cancer risk (122,977 cases/105,974 controls), estrogen receptor (ER)-positive breast cancer risk (69,501 cases/105,974 controls), and ER-negative breast cancer risk (21,468 cases/105,974 controls). No proxy SNPs correlated with rs139975827 with r2 > 0.8 could be identified. Fifty-eight SNPs were either genotyped or imputed with quality score > 0.8 in the OncoArray project, which was the largest single component of the BCAC meta-analyses [30]. One SNP, rs138715366, was imputed with quality score = 0.62 and also retained for subsequent analyses.
Birth length and other breast cancer-risk factor data
Summary results from genome-wide association meta-analyses for birth length (n = 28,459) [31], adult height (n = 253,288) [32], adult body mass index (BMI) (in women; n = 171,977) [33], age at menarche (n = 132,989) [34], age at menopause (n = 69,360) [35], age at first birth (in women; n = 189,656) [36], and number of children ever born (in women; n = 225,230) [36] were obtained from four genetic consortia. The consortia and corresponding websites used for data download are listed in Supplementary Table 1. The age at menarche sample used here excluded women genotyped in the Collaborative Oncological Gene-environment Study (COGS) who were included in the breast cancer meta-analyses [30, 34]. All summary statistics obtained were from analyses limited to individuals of European ancestry. These statistics included standardised regression (beta) coefficients and P-values from genetic association analysis in each data set, with the exceptions of age at first birth and number of children ever born, where only P values were available.
Data harmonisation
The genetic association data for BW and breast cancer susceptibility were based on imputation using the 1000 Genomes reference panel, while all other data sets had been imputed using HapMap. Out of the 59 BW-associated lead SNPs that were also present in the breast cancer data set, summary statistics were available for 26 SNPs across all data sets (Supplementary Table 3). For an additional 26 SNPs, proxy SNPs in linkage disequilibrium with the BW-associated lead SNP (r2 > 0.6 in European samples) were previously identified by Horikoshi et al. [29]. The same proxies were used here because they were available across all data sets (Supplementary Table 3). For a further two SNPs (rs10402712 and rs7402982), proxies with r2 > 0.6 in European samples (rs11667352 and rs2017500, respectively) were identified in the current study using the SNP Annotation and Proxy (SNAP) Search tool version 2.2 (http://archive.broadinstitute.org/mpg/snap; [37]). No suitable proxy SNP was identified for four BW-associated lead SNPs (rs61830764, rs138715366, rs72851023, and rs144843919). A fifth SNP (rs11096402) is an X-chromosome variant mapped only in the BW and breast cancer data sets.
In summary, breast cancer data were available for 59 of the 60 BW-associated lead SNPs. When considering these two data sets together with the seven additional “risk factor” data sets, 54 BW loci had results listed across all data sets [26 represented by BW-associated lead SNPs and 28 by correlated (r2 > 0.6) proxy SNPs]. The signs of the beta coefficients for each SNP across all summary data sets were aligned to the BW-increasing allele (except for age at first birth and number of children ever born, where only P-values were available).
Univariable Mendelian randomisation
Of the 54 BW loci represented across all data sets, 13 demonstrated association with at least one of the seven breast cancer risk factors at P < 5 × 10−8 (Supplementary Table 3). The BW-associated lead SNP at the 41 loci that did not show genome-wide significant pleiotropic associations were used to construct the main 41-SNP IV for univariable MR analyses (these SNPs are highlighted in Supplementary Table 2). This was done to ensure that the univariable MR analyses were, as far as possible, evaluating the association between BW and adult breast cancer risk independent of the effects of SNPs that were also associated with other potential later-life breast cancer risk factors such as adult height. MR was first performed using the inverse-variance weighted (IVW) method wherein SNP associations with the outcome (breast cancer) were regressed on SNP associations with the exposure (BW) using a linear model weighted by the inverse of the variance of SNP associations with the outcome [38]. This approach to MR analysis as applied to summary genetic association statistics is equivalent to standard two-stage least squares regression MR using individual-level genetic data [38]. The standard error of the IVW effect was estimated by a multiplicative random-effects model.
The IVW method assumes that the IV as a whole satisfies the MR assumptions or that all SNPs in the IV constitute valid instruments. Because the validity of this assumption is difficult to test empirically, four additional MR methods were also used as sensitivity analyses to assess the robustness of the result from the primary analysis under alternative assumptions: weighted median function [39], weighted mode function [40], MR-Egger regression [41], and penalised robust IVW regression [42]. The weighted median function provides a valid result under the assumption that over 50% of the weight in the IV model comes from SNPs that satisfy the MR assumptions. The weighted mode function is valid if the largest number of SNPs with similar individual Wald ratios [beta(outcome)/beta(exposure)] are valid instruments even if other SNPs in the IV do not meet the requirements for causal inference using MR. The weighted effect in these functions is derived from the inverse of the standard error of the Wald ratio for each SNP. Penalised robust IVW uses robust regression in place of standard linear regression-based IVW, with a penalty on the contribution to the analysis of SNPs with outlying Wald ratios. MR-Egger regression allows for horizontal pleiotropic effects, i.e., the association between SNPs in the IV and other traits that may affect the outcome via pathways independent of the exposure. It assumes that such pleiotropy is not correlated with SNP-exposure associations and under this assumption, the MR-Egger result is valid even if all SNPs in the IV are invalid instruments for MR due to their pleiotropic associations. Further, the MR-Egger regression intercept provides an estimate of the average pleiotropic effect over all SNPs in the IV and if this differs from zero, it indicates the presence of horizontal pleiotropy [41]. The null hypothesis that this intercept did not differ significantly from zero was also tested. The R package MendelianRandomization (v0.2.2) was used for the penalised robust IVW MR analysis and TwoSampleMR for all other MR methods [42, 43].
Pleiotropy was also assessed by calculating Cochran’s Q as a measure of between-instrument heterogeneity using the Wald ratios for each SNP in the 41-SNP IV [44]. To identify specific SNPs in this IV that contributed to observed heterogeneity and were potentially pleiotropic, four steps were taken. First, SNPs were removed iteratively from the 41-SNP IV until Cochran’s Q test was no longer significant at the P < 0.05 level using the stepwise downward “model selection” procedure implemented in the R package gtx (v0.0.8). Second, Cook’s distance (with a threshold of 4/number of SNPs in the IV) was used to identify SNPs in the 41-SNP IV with a disproportionate influence on the primary IVW model [45, 46]. Third, leave-one-out permutation analysis was performed, i.e., each SNP was sequentially removed from the 41-SNP IV and the primary IVW MR analysis repeated. Fourth, studentised residuals (with a threshold of ± 2) were used to identify outlier SNPs in the IVW model [46]. The effect of pleiotropy was further investigated by applying the standard IVW approach to a 24-SNP IV created by removing all BW loci associated with at least one of the seven breast cancer risk factors at P < 9.2 × 10−4 (after Bonferroni correction for examining pleiotropic associations at 54 BW loci). As with the main 41-SNP IV, all 24 SNPs were BW-associated lead SNPs (the SNPs are highlighted in Supplementary Table 2). Finally, we note that we did also test using the IVW method the full 59-SNP IV (i.e., all SNPs listed in Supplementary Table 2), which constituted all known BW-associated lead SNPs with matched breast cancer data available regardless of their pleiotropic associations.
The proportions of variance of BW explained by the 41-SNP and 59-SNP IVs were estimated using the “steiger.R” function (https://rdrr.io/github/MRCIEU/TwoSampleMR/src/R/steiger.R) and a priori power to detect an association at a significance level of 0.05 was calculated using an online tool (https://sb452.shinyapps.io/power) [47]. The strength of the 41-SNP and 59-SNP IVs was also evaluated using the F-statistic [48], with F < 10 considered as an indicator of a ‘weak’ instrument [49]. All statistical tests were 2-sided and P < 0.05 was considered statistically significant unless otherwise specified. Each analysis was repeated for the three outcomes: overall breast cancer risk, ER-positive breast cancer risk, and ER-negative breast cancer risk.
Multivariable Mendelian randomisation
Multivariable MR (MVMR) represents an alternative strategy for conducting an MR analysis using an IV that contains SNPs which in addition to their established association with the exposure of interest are also associated with other known risk factors for the outcome [50, 51]. MVMR allows for the inclusion of such SNPs in the IV by adjusting for their associations with these risk factors. MVMR analysis was performed by regressing SNP associations with breast cancer on SNP associations with BW, birth length, adult height, adult BMI, age at menarche, and age at menopause in a single regression model. This model thus helped estimate the effect of BW on breast cancer independent of the effects of these other factors on breast cancer [50, 51]. Like its univariable counterpart, an inverse-variance weighted linear regression model with multiplicative random effects was used for MVMR. The IV used for MVMR contained 49 SNPs (Supplementary Table 4). As described under “Data harmonisation”, 26 SNPs were BW-associated lead SNPs and 23 were proxy SNPs strongly correlated (r2 > 0.8) with the BW-associated lead SNPs at their respective BW loci (five additional proxy SNPs with r2 > 0.6 but r2 < 0.8 were omitted from the MVMR IV; Supplementary Tables 3 and 4). Since beta coefficients were not available for SNP associations with age at first birth and number of children ever born, these variables were not included in the MVMR model. However, none of the 49 SNPs were associated at P < 5 × 10−8 with these two variables (Supplementary Tables 3 and 4). Multivariable extensions of MR-Egger regression and the MR-Egger intercept test were also applied to this 49-SNP IV [52]. Each analysis was repeated for the three outcomes: overall breast cancer risk, ER-positive breast cancer risk, and ER-negative breast cancer risk.
Statistical power
The 41-SNP IV explained 1.3% of the variance of BW (F-statistic = 46.1) and had over 80% power to detect 11% increase (or decrease) in overall breast cancer risk in the univariable MR analysis (odds ratios (ORs) of 0.89 or 1.11 per 1-standard deviation (SD; 500 g) higher BW). The 59-SNP IV explained 2% of the variance of BW (F-statistic = 49.7) and had over 80% power to detect 9% increase (or decrease) in overall breast cancer risk in the univariable MR analysis (ORs of 0.91 or 1.09 per 1-SD higher BW). For comparison, the largest observational investigation of the association between BW and breast cancer risk [7], a pooled analysis of individual participant data from 32 studies (22,058 breast cancer cases and 604,854 non-cases) identified 6% increase in breast cancer risk or a pooled relative risk per 1-SD (500 g) higher BW of 1.06 [95% confidence interval (CI) 1.02–1.09]. The 41-SNP and 59-SNP genetic instruments had 35% and 50% power, respectively, to detect an OR of 1.06. Power calculations for the corresponding univariable (IVW) MR analyses specific to ER-positive and ER-negative breast cancer risks are provided in Supplementary Table 5. We also confirmed the power of the 41-SNP IV to detect an association between lower BW and increased risk of type 2 diabetes (T2D) in later life using summary genome-wide association meta-analysis statistics from 26,676 T2D cases and 132,532 controls (Supplementary Table 1; IVW OR = 1.93; 95% CI 1.20–3.08; P = 0.006) [53]. This association has been previously identified in two MR studies [54, 55].
Results
Associations between individual SNPs and breast cancer
Of the 59 BW-associated lead SNPs, 18 were associated with overall breast cancer risk at P < 0.05, six after Bonferroni correction for testing 59 SNPs (P < 8 × 10−4), and two at P < 5 × 10−8 (Supplementary Table 2). Of the 18 SNPs associated with overall breast cancer risk at P < 0.05, the direction of association with BW was inverse for 13 SNPs (i.e., the allele that increased BW was protective for breast cancer). Notably, this direction of association is contrary to that identified by observational studies which suggest that higher BW is associated with increased breast cancer risk in later life. The 13 SNPs included both genome-wide significant overall breast cancer risk SNPs, rs2229742 (encoding missense mutation R448G in NRIP1) and rs1101081 (intronic SNP in ESR1). The 18 SNPs represented a six-fold enrichment over the number of associations with breast cancer at P < 0.05 that were expected by chance alone. Associations between individual SNPs and ER-positive and ER-negative breast cancer are also provided in Supplementary Table 2.
Univariable Mendelian randomisation
MR analysis of the 41-SNP IV using the IVW method yielded an OR of 0.86 (95% CI 0.73–1.01; P = 0.06) for overall breast cancer per 1-SD (500 g) higher BW (Fig. 2 a). These estimates were consistent in direction with the results of the weighted median, weighted mode, MR-Egger, and penalised robust IVW sensitivity analyses [Fig. 2a and Supplementary Figure 1(a)]. The MR-Egger intercept test (P = 0.26) suggested absence of strong directional horizontal pleiotropy (i.e., effects of the 41-SNP IV on breast cancer via a pathway that arises proximal to or upstream of BW).
Testing for heterogeneity between SNPs in the IV using Cochran’s Q indicated significant heterogeneity in Wald ratios for individual SNPs in the IV (P = 3.6 × 10−16 for overall breast cancer; P = 6.4 × 10−11 for ER-positive breast cancer; P = 6.5 × 10−5 for ER-negative breast cancer), suggesting that some of the 41 SNPs may exert disproportionately large effects on breast cancer risk some of which might act via pathways other than BW. Leave-one-out permutation [Supplementary Figure 1 (b)], Cook’s distance, and stepwise downward “model selection” consistently identified five SNPs with large effects on breast cancer that had an undue influence on the overall results (Supplementary Table 6). The stepwise downward approach suggested that an additional (sixth) SNP also contributed to the observed heterogeneity, while studentised residuals identified three of the five SNPs as outliers (Supplementary Table 6). Removing the five SNPs and repeating the standard IVW method using the remaining 36 SNPs that represented a more homogeneous genetic instrument for BW yielded an OR of 0.92 (95% CI 0.83–1.02; P = 0.11; Cochran’s Q = 49.57 and its associated P = 0.052). Further reducing the potential impact of pleiotropy at the cost of losing power by using the 24-SNP IV constructed after removal of all SNPs associated at P < 9.2 × 10−4 (instead of P < 5 × 10−8) with at least one of seven putative breast cancer risk factors did not significantly change the primary IVW MR result (Fig. 2a). Conversely, using the full 59-SNP IV to leverage maximum statistical power did not meaningfully alter the primary result either (Fig. 2a). Results for each corresponding analysis specific to ER-positive and ER-negative breast cancer are presented in Fig. 2b, c, Supplementary Figures 1 (c) to (f), and Supplementary Table 6. Across all analyses point estimates of the OR were either null or indicated an inverse relationship between BW and adult breast cancer, again contrary to the observational literature.
Multivariable Mendelian randomisation
MR analysis of the 49-SNP IV for BW simultaneously adjusting for the genetically predicted effects of these SNPs on birth length, adult height, adult BMI, age at menarche, and age at menopause using the weighted regression-based framework provided estimates for the association between BW and overall breast cancer risk that were consistent with the univariable results (OR = 0.82; 95% CI 0.60–1.11; P = 0.20; Fig. 2a). The estimates from the multivariable extension of MR-Egger were similar (Fig. 2a) and the corresponding MR-Egger intercept test was not significant (P = 0.89 for overall breast cancer; P = 0.75 for ER-positive breast cancer; P = 0.99 for ER-negative breast cancer). Results specific to ER-positive and ER-negative breast cancer are presented in Fig. 2b, c.
Discussion
We conducted a two-sample MR study using summary statistics from the largest genome-wide association meta-analyses data sets currently available for BW and for breast cancer susceptibility in adults. Our study provides no evidence to support the association between higher weight at birth and an increased risk of developing breast cancer in later life that was previously reported in observational studies. In fact, we found that higher genetically-predicted BW might, if anything, be associated with reduced breast cancer risk. Our results were robust to the application of different MR approaches, each with its own distinct set of assumptions.
A few observational studies suggest that the association between higher BW and increased future breast cancer risk is confined to premenopausal women but combined analyses of these studies show that this association does not differ by menopausal status [7, 8]. ER-negative breast cancers are more common in premenopausal women and our MR study did not reveal any differences in the (lack of) association between genetically-predicted BW and breast cancer risk by ER-status (Fig. 2b, c). Some of the epidemiological literature also suggests that longer birth length may be a stronger risk factor for adult breast cancer than, and independently of, higher BW [7]. Birth length is highly correlated with BW and harder to measure as a phenotype making its measurement prone to error. Only two SNP-associations (rs905938 and rs724577) with birth length have been identified at genome-wide significance (P < 5 × 10−8) to date [31]. Both SNPs were also associated at P < 5 × 10−8 with BW and at P < 6 × 10−8 with adult height in the data sets that we used (Supplementary Table 3; the birth length-increasing alleles also increased BW and adult height). We did not use MR to assess the independent effect of birth length on adult breast cancer risk due to the availability of just two SNPs both of which are strongly pleiotropic and the consequent potential for unreliable estimates with such a genetic instrument. We could not include age at first birth and number of children ever born in our multivariable MR analyses since the publicly-available summary genetic association statistics for these two phenotypes were restricted to P-values only. However, only two BW SNPs were associated with age at first birth (rs12823128 and rs7742369) and no SNP was associated with number of children ever born at P < 9.2 × 10−4 (after Bonferroni correction for testing associations at 54 SNPs; Supplementary Table 3). We removed these two SNPs in constructing our 24-SNP IV for univariable MR analysis, which yielded estimates consistent with the other univariable and multivariable MR analyses. Given this result and the lack of strong associations between the other birth weight SNPs and age at first birth and number of children ever born, not being able to adjust for these two putative breast cancer risk factors in our multivariable MR is highly unlikely to have substantially altered the overall conclusion from our MR study.
While our point estimates for the effect of BW on breast cancer risk never reached statistical significance at the P < 0.05 level, they did uniformly indicate that the effect (if any) of higher BW on adult breast cancer might be protective (Fig. 2). A possible protective effect of higher BW on adult breast cancer risk would be consistent with MR findings that genetically-predicted adult BMI is associated with reduction in breast cancer risk but, crucially, opposite in direction to estimates from combined analyses of observational studies investigating the BW-breast cancer link [7, 8, 56]. MR analysis has previously identified an association between higher BW and elevated BMI in adulthood and genome-wide genetic correlation analysis carried out as part of the Horikoshi et al. study also supported a positive correlation between BW and adult BMI [29, 55]. Our MR results, however, reflect the effects of the genetic IV for BW on breast cancer independent of the effects of the IV on adult BMI since we removed adult BMI-associated SNPs from the IV before univariable analyses and adjusted for SNP associations with adult BMI as a covariate in multivariable analyses.
There are three aspects of the genome-wide association meta-analysis by Horikoshi et al. [29] from which we obtained the BW lead SNPs to construct the IVs used in our MR study that are worth noting here. First, our genetic IV with maximum power (59-SNP IV) explained only 2% of the variance of BW and therefore we were relatively underpowered to detect an odds ratio of 1.06 (or an even smaller effect) at the 5% significance level. Second, all autosomal SNPs genotyped on the array used by the UK Biobank (which was the largest sub-study in the genome-wide association meta-analysis by Horkioshi et al.) together explained approximately 15% of the variance of BW. Thus, it is likely that factors other than genetics account for the majority of the variance of BW. Third, the Horikoshi et al. study excluded individuals with extremes of BW (for the UK Biobank, which contributed nearly half the European sample, this was defined as < 2500 g or > 4000 g). Thus, the genetic IV we used may not adequately capture common genetic variants that only specifically affect the extremes of BW (i.e., only if such variants exist). This is relevant because in combined analyses of observational studies, statistically significant increases in adult breast cancer risk are only identified when comparing BW ≥ 4000 g as an exposure to BW with 3000–3499 g or < 2500 g as the reference categories [7, 8]. However, it is highly unlikely that the genetic architecture of extreme BW completely differs from the genetic architecture of BW in general and therefore, our genetic IV likely predicts BW to some degree even at the extremes.
It has been suggested that insulin-like growth factor (IGF) signalling is one potential pathway linking increased BW to breast cancer risk in later life [3]. Haematopoietic stem or progenitor cells in neonatal cord blood have been used as a proxy for the size of the fetal mammary stem cell pool since the latter is impossible to measure. The level of haematopoietic stem cells in cord blood is positively correlated with IGF-1 levels in cord blood and with BW [5, 57]. Cord blood IGF-1 concentrations have been found to be higher among Caucasian neonates than Chinese neonates and it has been hypothesised that these differences may be responsible, in part, for the differences in breast cancer risks observed between these two populations [58, 59]. Lead SNPs near IGF1 (rs7964361) and IGF1R (rs7402982) that encode IGF-1 and its receptor, respectively, were associated with birth weight at P < 5 × 10−8 and with adult height in women at P < 6 × 10−4 (Supplementary Tables 2 and 3). However, neither SNP was associated with overall or ER-positive or ER-negative breast cancer risks at P < 0.05 although the alleles that increased BW also conferred breast cancer risk (Supplementary Table 2; IVW MR analysis P = 0.59 for overall breast cancer risk when using just these two SNPs as an IV for BW). Thus, while there is an association between SNPs in the core IGF-1 signalling genes and BW, our data do not provide any evidence that higher BW as predicted by these two IGF pathway SNPs is associated with adult breast cancer risk. While data on the association between BW and breast cancer risk in non-European populations is scarce, it is also worth noting here that a small study from China failed to show any association [23].
In conclusion, the association between higher BW and increased risk of developing breast cancer in adulthood has been identified in several observational studies published over the last three decades. Revisiting this association using a comprehensive set of recently-developed MR methods and the largest available genomic data sets provides no evidence to suggest that genetically-predicted higher BW is associated with increased breast cancer risk in adult life. A key unanswered question is whether the association between higher BW and increased breast cancer risk seen in observational studies is largely due to the BW-increasing effect of non-genetic factors such as maternal hormones and nutrition and we are unable to rule out an association between the non-genetically-determined component of BW and breast cancer.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Acknowledgements
This work builds on multiple, publicly available data sets. Data on birth weight and birth length were obtained from the Early Growth Genetics (EGG) Consortium. Data on adult height and adult body mass index were obtained from the Genetic Investigation of ANthropometric Traits (GIANT) Consortium. Data on age at menarche and age at menopause were obtained from the Reproductive Genetics (ReproGen) Consortium. Data on age at first birth and number of children ever born were obtained from the Social Science Genetic Association Consortium (SSGAC). Data on type 2 diabetes were obtained from the DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium. Data on breast cancer were obtained from the Breast Cancer Association Consortium (BCAC).
Funding
Funding was provided by Cancer Research UK (grant number C490/A16561 to Paul D. P. Pharoah).
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.Trichopoulos D. Hypothesis: does breast cancer originate in utero? Lancet. 1990;335:939–940. doi: 10.1016/0140-6736(90)91000-Z. [DOI] [PubMed] [Google Scholar]
- 2.Trichopoulos D. Intrauterine environment, mammary gland mass and breast cancer risk. Breast Cancer Res. 2003;5:42–44. doi: 10.1186/bcr555. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Ginestier C, Wicha MS. Mammary stem cell number as a determinate of breast cancer risk. Breast Cancer Res. 2007;9:109. doi: 10.1186/bcr1741. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Trichopoulos D, Lagiou P, Adami H-O. Towards an integrated model for breast cancer etiology: the crucial role of the number of mammary tissue-specific stem cells. Breast Cancer Res. 2005;7:13–17. doi: 10.1186/bcr966. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Strohsnitter WC, Savarese TM, Low HP, Chelmow DP, Lagiou P, Lambe M, et al. Correlation of umbilical cord blood haematopoietic stem and progenitor cell levels with birth weight: implications for a prenatal influence on cancer risk. Br J Cancer. 2008;98:660–663. doi: 10.1038/sj.bjc.6604183. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Qiu L, Low HP, Chang C-I, Strohsnitter WC, Anderson M, Edmiston K, et al. Novel measurements of mammary stem cells in human umbilical cord blood as prospective predictors of breast cancer susceptibility in later life. Ann Oncol. 2012;23:245–250. doi: 10.1093/annonc/mdr153. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Silva I dos S, De Stavola B, McCormack V. Collaborative Group on Pre-Natal Risk Factors and Subsequent Risk of Breast Cancer. Birth size and breast cancer risk: re-analysis of individual participant data from 32 studies. PLoS Med. 2008;5:e193. doi: 10.1371/journal.pmed.0050193. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Michels KB, Xue F. Role of birthweight in the etiology of breast cancer. Int J Cancer. 2006;119:2007–2025. doi: 10.1002/ijc.22004. [DOI] [PubMed] [Google Scholar]
- 9.Kaijser M, Akre O, Cnattingius S, Ekbom A. Preterm birth, birth weight, and subsequent risk of female breast cancer. Br J Cancer. 2003;89:1664–1666. doi: 10.1038/sj.bjc.6601357. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Ahlgren M, Melbye M, Wohlfahrt J, Sørensen TIA. Growth patterns and the risk of breast cancer in women. N Engl J Med. 2004;351:1619–1626. doi: 10.1056/NEJMoa040576. [DOI] [PubMed] [Google Scholar]
- 11.Vatten LJ, Nilsen TIL, Tretli S, Trichopoulos D, Romundstad PR. Size at birth and risk of breast cancer: prospective population-based study. Int J Cancer. 2005;114:461–464. doi: 10.1002/ijc.20726. [DOI] [PubMed] [Google Scholar]
- 12.Michels KB, Trichopoulos D, Robins JM, Rosner BA, Manson JE, Hunter DJ, et al. Birthweight as a risk factor for breast cancer. Lancet. 1996;348:1542–1546. doi: 10.1016/S0140-6736(96)03102-9. [DOI] [PubMed] [Google Scholar]
- 13.Sanderson M, Williams MA, Malone KE, Stanford JL, Emanuel I, White E, et al. Perinatal factors and risk of breast cancer. Epidemiology. 1996;7:34–37. doi: 10.1097/00001648-199601000-00007. [DOI] [PubMed] [Google Scholar]
- 14.Innes K, Byers T, Schymura M. Birth characteristics and subsequent risk for breast cancer in very young women. Am J Epidemiol. 2000;152:1121–1128. doi: 10.1093/aje/152.12.1121. [DOI] [PubMed] [Google Scholar]
- 15.Mellemkjaer L, Olsen ML, Sørensen HT, Thulstrup AM, Olsen J, Olsen JH. Birth weight and risk of early-onset breast cancer (Denmark) Cancer Causes Control CCC. 2003;14:61–64. doi: 10.1023/A:1022570305704. [DOI] [PubMed] [Google Scholar]
- 16.Kaijser M, Lichtenstein P, Granath F, Erlandsson G, Cnattingius S, Ekbom A. In utero exposures and breast cancer: a study of opposite-sexed twins. J Natl Cancer Inst. 2001;93:60–62. doi: 10.1093/jnci/93.1.60. [DOI] [PubMed] [Google Scholar]
- 17.Michels KB, Xue F, Terry KL, Willett WC. Longitudinal study of birthweight and the incidence of breast cancer in adulthood. Carcinogenesis. 2006;27:2464–2468. doi: 10.1093/carcin/bgl105. [DOI] [PubMed] [Google Scholar]
- 18.Vatten LJ, Maehle BO, Lund Nilsen TI, Tretli S, Hsieh CC, Trichopoulos D, et al. Birth weight as a predictor of breast cancer: a case–control study in Norway. Br J Cancer. 2002;86:89–91. doi: 10.1038/sj.bjc.6600011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.McCormack VA, dos Santos Silva I, Koupil I, Leon DA, Lithell HO. Birth characteristics and adult cancer incidence: Swedish cohort of over 11,000 men and women. Int J Cancer. 2005;115:611–617. doi: 10.1002/ijc.20915. [DOI] [PubMed] [Google Scholar]
- 20.dos Santos Silva I, De Stavola BL, Hardy RJ, Kuh DJ, McCormack VA, Wadsworth MEJ. Is the association of birth weight with premenopausal breast cancer risk mediated through childhood growth? Br J Cancer. 2004;91:519–524. doi: 10.1038/sj.bjc.6601972. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Ekbom A, Hsieh CC, Lipworth L, Adami HQ, Trichopoulos D. Intrauterine environment and breast cancer risk in women: a population-based study. J Natl Cancer Inst. 1997;89:71–76. doi: 10.1093/jnci/89.1.71. [DOI] [PubMed] [Google Scholar]
- 22.Sanderson M, Williams MA, Daling JR, Holt VL, Malone KE, Self SG, et al. Maternal factors and breast cancer risk among young women. Paediatr Perinat Epidemiol. 1998;12:397–407. doi: 10.1046/j.1365-3016.1998.00133.x. [DOI] [PubMed] [Google Scholar]
- 23.Sanderson M, Shu XO, Jin F, Dai Q, Ruan Z, Gao Y-T, et al. Weight at birth and adolescence and premenopausal breast cancer risk in a low-risk population. Br J Cancer. 2002;86:84–88. doi: 10.1038/sj.bjc.6600009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Troisi R, Hatch EE, Titus-Ernstoff L, Palmer JR, Hyer M, Strohsnitter WC, et al. Birth weight and breast cancer risk. Br J Cancer. 2006;94:1734–1737. doi: 10.1038/sj.bjc.6603122. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Andersen ZJ, Baker JL, Bihrmann K, Vejborg I, Sørensen TIA, Lynge E. Birth weight, childhood body mass index, and height in relation to mammographic density and breast cancer: a register-based cohort study. Breast Cancer Res. 2014;16:R4. doi: 10.1186/bcr3596. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Spracklen CN, Wallace RB, Sealy-Jefferson S, Robinson JG, Freudenheim JL, Wellons MF, et al. Birth weight and subsequent risk of cancer. Cancer Epidemiol. 2014;38:538–543. doi: 10.1016/j.canep.2014.07.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Titus-Ernstoff L, Egan KM, Newcomb PA, Ding J, Trentham-Dietz A, Greenberg ER, et al. Early life factors in relation to breast cancer risk in postmenopausal women. Cancer Epidemiol Biomarkers Prev. 2002;11:207–210. [PubMed] [Google Scholar]
- 28.Hodgson ME, Newman B, Millikan RC. Birthweight, parental age, birth order and breast cancer risk in African-American and white women: a population-based case-control study. Breast Cancer Res. 2004;6:R656–R667. doi: 10.1186/bcr931. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Horikoshi M, Beaumont RN, Day FR, Warrington NM, Kooijman MN, Fernandez-Tajes J, et al. Genome-wide associations for birth weight and correlations with adult disease. Nature. 2016;538:248–252. doi: 10.1038/nature19806. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Michailidou K, Lindström S, Dennis J, Beesley J, Hui S, Kar S, et al. Association analysis identifies 65 new breast cancer risk loci. Nature. 2017;551:92–94. doi: 10.1038/nature24284. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.van der Valk RJP, Kreiner-Møller E, Kooijman MN, Guxens M, Stergiakouli E, Sääf A, et al. A novel common variant in DCST2 is associated with length in early life and height in adulthood. Hum Mol Genet. 2015;24:1155–1168. doi: 10.1093/hmg/ddu510. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Wood AR, Esko T, Yang J, Vedantam S, Pers TH, Gustafsson S, et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat Genet. 2014;46:1173–1186. doi: 10.1038/ng.3097. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Locke AE, Kahali B, Berndt SI, Justice AE, Pers TH, Day FR, et al. Genetic studies of body mass index yield new insights for obesity biology. Nature. 2015;518:197–206. doi: 10.1038/nature14177. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Perry JR, Day F, Elks CE, Sulem P, Thompson DJ, Ferreira T, et al. Parent-of-origin-specific allelic associations among 106 genomic loci for age at menarche. Nature. 2014;514:92–97. doi: 10.1038/nature13545. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Day FR, Ruth KS, Thompson DJ, Lunetta KL, Pervjakova N, Chasman DI, et al. Large-scale genomic analyses link reproductive aging to hypothalamic signaling, breast cancer susceptibility and BRCA1-mediated DNA repair. Nat Genet. 2015;47:1294–1303. doi: 10.1038/ng.3412. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Barban N, Jansen R, de Vlaming R, Vaez A, Mandemakers JJ, Tropf FC, et al. Genome-wide analysis identifies 12 loci influencing human reproductive behavior. Nat Genet. 2016;48:1462–1472. doi: 10.1038/ng.3698. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Johnson AD, Handsaker RE, Pulit SL, Nizzari MM, O’Donnell CJ, de Bakker PIW. SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap. Bioinformatics. 2008;24:2938–2939. doi: 10.1093/bioinformatics/btn564. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Burgess S, Butterworth A, Thompson SG. Mendelian randomization analysis with multiple genetic variants using summarized data. Genet Epidemiol. 2013;37:658–665. doi: 10.1002/gepi.21758. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Bowden J, Davey Smith G, Haycock PC, Burgess S. Consistent estimation in mendelian randomization with some invalid instruments using a weighted median estimator. Genet Epidemiol. 2016;40:304–314. doi: 10.1002/gepi.21965. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Hartwig FP, Davey Smith G, Bowden J. Robust inference in summary data Mendelian randomization via the zero modal pleiotropy assumption. Int J Epidemiol. 2017;46:1985–1998. doi: 10.1093/ije/dyx102. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Bowden J, Davey Smith G, Burgess S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int J Epidemiol. 2015;44:512–525. doi: 10.1093/ije/dyv080. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Yavorska OO, Burgess S. MendelianRandomization: an R package for performing Mendelian randomization analyses using summarized data. Int J Epidemiol. 2017;46:1734–1739. doi: 10.1093/ije/dyx034. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Hemani G, Zheng J, Wade KH, Laurin C, Elsworth B, Burgess S, et al. MR-Base: a platform for systematic causal inference across the phenome using billions of genetic associations. 2016 [cited 2018 Jan 30]. http://biorxiv.org/lookup/doi/10.1101/078972.
- 44.Greco MFD, Minelli C, Sheehan NA, Thompson JR. Detecting pleiotropy in Mendelian randomisation studies with summary data and a continuous outcome. Stat Med. 2015;34:2926–2940. doi: 10.1002/sim.6522. [DOI] [PubMed] [Google Scholar]
- 45.Cook RD. Detection of influential observation in linear regression. Technometrics. 1977;19:15. [Google Scholar]
- 46.Corbin LJ, Richmond RC, Wade KH, Burgess S, Bowden J, Smith GD, et al. BMI as a modifiable risk factor for type 2 diabetes: refining and understanding causal estimates using Mendelian randomization. Diabetes. 2016;65:3002–3007. doi: 10.2337/db16-0418. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Burgess S. Sample size and power calculations in Mendelian randomization with a single instrumental variable and a binary outcome. Int J Epidemiol. 2014;43:922–929. doi: 10.1093/ije/dyu005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Palmer TM, Lawlor DA, Harbord RM, Sheehan NA, Tobias JH, Timpson NJ, et al. Using multiple genetic variants as instrumental variables for modifiable risk factors. Stat Methods Med Res. 2012;21:223–242. doi: 10.1177/0962280210394459. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Swerdlow DI, Kuchenbaecker KB, Shah S, Sofat R, Holmes MV, White J, et al. Selecting instruments for Mendelian randomization in the wake of genome-wide association studies. Int J Epidemiol. 2016;45:1600–1616. doi: 10.1093/ije/dyw088. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Burgess S, Thompson SG. Multivariable Mendelian randomization: the use of pleiotropic genetic variants to estimate causal effects. Am J Epidemiol. 2015;181:251–260. doi: 10.1093/aje/kwu283. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Burgess S, Dudbridge F, Thompson SG. Re: “Multivariable Mendelian randomization: the use of pleiotropic genetic variants to estimate causal effects”. Am J Epidemiol. 2015;181:290–291. doi: 10.1093/aje/kwv017. [DOI] [PubMed] [Google Scholar]
- 52.Rees JMB, Wood AM, Burgess S. Extending the MR-Egger method for multivariable Mendelian randomization to correct for both measured and unmeasured pleiotropy. Stat Med. 2017;36:4705–4718. doi: 10.1002/sim.7492. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Scott RA, Scott LJ, Mägi R, Marullo L, Gaulton KJ, Kaakinen M, et al. An expanded genome-wide association study of type 2 diabetes in Europeans. Diabetes. 2017;66:2888–2902. doi: 10.2337/db16-1253. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Wang T, Huang T, Li Y, Zheng Y, Manson JE, Hu FB, et al. Low birthweight and risk of type 2 diabetes: a Mendelian randomisation study. Diabetologia. 2016;59:1920–1927. doi: 10.1007/s00125-016-4019-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Zanetti D, Tikkanen E, Gustafsson S, Priest JR, Burgess S, Ingelsson E. Birthweight, type 2 diabetes mellitus, and cardiovascular disease: addressing the Barker hypothesis With Mendelian randomization. Circ Genomic Precis Med. 2018;11:e002054. doi: 10.1161/CIRCGEN.117.002054. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Guo Y, Warren Andersen S, Shu X-O, Michailidou K, Bolla MK, Wang Q, et al. Genetically predicted body mass index and breast cancer risk: Mendelian randomization analyses of data from 145,000 women of European descent. PLoS Med. 2016;13:e1002105. doi: 10.1371/journal.pmed.1002105. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.Savarese TM, Strohsnitter WC, Low HP, Liu Q, Baik I, Okulicz W, et al. Correlation of umbilical cord blood hormones and growth factors with stem cell potential: implications for the prenatal origin of breast cancer hypothesis. Breast Cancer Res. 2007;9:R29. doi: 10.1186/bcr1674. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Lagiou P, Hsieh CC, Lipworth L, Samoli E, Okulicz W, Troisi R, et al. Insulin-like growth factor levels in cord blood, birth weight and breast cancer risk. Br J Cancer. 2009;100:1794–1798. doi: 10.1038/sj.bjc.6605074. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Lagiou P, Samoli E, Hsieh C-C, Lagiou A, Xu B, Yu G-P, et al. Maternal and cord blood hormones in relation to birth size. Eur J Epidemiol. 2014;29:343–351. doi: 10.1007/s10654-014-9914-3. [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.