Abstract
Reproductive longevity is critical for fertility and impacts healthy ageing in women1,2, yet insights into the underlying biological mechanisms and treatments to preserve it are limited. Here, we identify 290 genetic determinants of ovarian ageing, assessed using normal variation in age at natural menopause (ANM) in ~200,000 women of European ancestry. These common alleles were associated with clinical extremes of ANM; women in the top 1% of genetic susceptibility have an equivalent risk of premature ovarian insufficiency to those carrying monogenic FMR1 premutations3. Identified loci implicate a broad range of DNA damage response (DDR) processes and include loss-of-function variants in key DDR genes. Integration with experimental models demonstrates that these DDR processes act across the life-course to shape the ovarian reserve and its rate of depletion. Furthermore, we demonstrate that experimental manipulation of DDR pathways highlighted by human genetics increases fertility and extends reproductive life in mice. Causal inference analyses using the identified genetic variants indicate that extending reproductive life in women improves bone health and reduces risk of type 2 diabetes, but increases risks of hormone-sensitive cancers. These findings provide insight into the mechanisms governing ovarian ageing, when they act across the life-course, and how they might be targeted by therapeutic approaches to extend fertility and prevent disease.
Introduction
Over the last 150 years life expectancy has increased from 45 to 85 years4, but the timing of reproductive senescence (age at natural menopause (ANM)) has remained relatively constant (50-52 years)5. The genetic integrity of oocytes decreases with advancing age6 and natural fertility ceases ~10 years before menopause1. More women are choosing to delay childbearing to older ages, resulting in increased use of assisted conception techniques7,8. Oocyte and ovarian tissue preservation can prolong fertility but is invasive and there is only a ~6.5% chance of achieving pregnancy with each mature oocyte thawed, which decreases with age9.
ANM is determined by the non-renewable ovarian reserve, which is established during fetal development and continuously depleted until reproductive senescence (Extended Data Fig. 1). DNA damage response (DDR) is the primary biological pathway that regulates reproductive senescence, highlighted by genome-wide association studies (GWAS)10, rare single gene disorders that cause Premature Ovarian Insufficiency (POI)11 and animal models12. Better understanding of how and when molecular processes influence the establishment and decline of ovarian reserve will inform future therapeutic strategies for infertility treatment and fertility preservation. To address this, our current study increases the number of ANM-associated genetic loci five-fold13 from 56 to 290. We integrate these data with experiments in mice to characterize the specific DDR processes that contribute to reproductive ageing, providing insights into when they act across the life-course, how they might be modified to preserve fertility and the potential consequences for broader health.
Results
Genome-wide array data, imputed to ~13.1 million genetic variants with minor allele frequency ≥0.1%, were available in 201,323 women of European ancestry (Extended Data Fig. 2, Supplementary Table 1). We identified 290 statistically independent signals associated with ANM (P<5x10-8), including six on the X chromosome, which was previously untested in large-scale studies (Figure 1, Supplementary Table 2). Effect estimates for the 290 signals were consistent between linear and Cox proportional hazard models and across strata of the meta-analysis (Extended Data Fig. 3). There was no evidence of test statistic inflation due to population structure (LD score intercept=1.02, s.e. 0.03). All previously reported signals13 retained genome-wide significance (Figure 1).
Additive, per-allele effect sizes for the 290 signals ranged from ~3.5 weeks to ~74 weeks (Figure 1, Extended Data Fig. 2 and Supplementary Table 2). Three of these variants exhibited non-additive effects (Extended Data Fig. 4 a-d, Supplementary Table 3 and Supplementary Results). We sought to replicate our 290 signals using independent samples from 23andMe, Inc (N=294,828 women). We observed high concordance in effect estimates between the datasets (Supplementary Table 2 and Extended Data Fig. 3 g), with nearly all variants at least nominally associated with ANM in 23andMe. Eight variants fell below genome-wide significance in a meta-analysis of our discovery with 23andMe (Pmax=2.6x10-5), half the number of expected false-positive associations (290*0.05=14.5). We next evaluated these loci in 78,317 women of East Asian ancestry. There was broad replication, consistent with previous observations14, but substantial heterogeneity of effect sizes and allele frequencies (Supplementary Table 2). This was exemplified at the ENTPD1 locus, where one signal had an effect size ~3 times larger in East Asians (rs1889921), whilst a second independent signal ~20kb away had an effect estimate half the size in East Asians (rs7087644).
Using additional independent samples from the deCODE study (N=16,556 women), we estimated our identified signals cumulatively explained 10.1% of the variance in ANM. This compared to an estimate of 12.3% in UK Biobank (UKBB) using weights for the 290 variants derived from our non-UKBB samples (Supplementary Table 2). The identified signals therefore account for 31-38% of the overall genotype-array estimated heritability in UKBB (h²g=32.4%, s.e. 0.8%), compared to 15.7-19.8% for the 56 previously reported signals (Extended Data Fig. 4 e).
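The 31-38% range follows directly from dividing each variance-explained estimate by the genotype-array heritability. A minimal sketch of that arithmetic, in which the function name is illustrative and not part of the original analysis:

```python
def share_of_heritability(variance_explained, snp_heritability):
    """Fraction of genotype-array heritability captured by the identified signals."""
    return variance_explained / snp_heritability

# Values quoted in the text: 10.1% (deCODE), 12.3% (UKBB), heritability 32.4%
decode_share = share_of_heritability(0.101, 0.324)
ukbb_share = share_of_heritability(0.123, 0.324)
print(f"{decode_share:.0%} to {ukbb_share:.0%}")  # prints "31% to 38%"
```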
Common variants act on extremes of ANM
It is unclear where in the population distribution of ANM the influence of common genetic variants begins and ends. Our GWAS was restricted to the 99% of women with ANM between 40-60 years. ANM before 40 years (POI) is considered a Mendelian disorder, but may have a polygenic component. To test which parts of the ANM distribution are influenced by common genetic variation, we calculated a polygenic score (PGS) in 108,840 women in UKBB with the full range of ANM using genetic weights derived from the independent non-UKBB component of the meta-analysis (Supplementary Table 2). This was coded such that a higher PGS indicates increased susceptibility to later ANM. ANM from 34 to 61 years had a significant polygenic influence (Figure 2 a). For example, women with ANM at 34 years had, on average, a PGS 0.5 SD lower (95% CI 0.26-0.69, P=1.5x10-5) than the population mean. We had limited sample size to test outside of these age ranges; however, there was some evidence for a depletion of a polygenic influence at ages younger than 34 years (Figure 2 a). These data suggest that common genetic variants act on clinically relevant extremes of ANM, although it remains unclear what fraction of POI cases may be polygenic vs monogenic.
Secondly, we evaluated the predictive ability of the PGS. Genetic risk alone proved to be a weak predictor (ROC-AUC 0.65 and 0.64 for early menopause (age <45 years) and POI respectively) (Figure 2 b and c), however the PGS performed significantly better than smoking status which is the most robust epidemiologically associated risk factor (ROC-AUC 0.58). Adding smoking status to the PGS did not appreciably improve prediction of early menopause (ROC-AUC 0.66). Despite low overall discriminative ability, the PGS was able to identify individuals at high risk of POI (Figure 2 c). Women at the top 1% of the PGS (rescaled such that high PGS indicates increased susceptibility to earlier menopause) had equivalent POI risk (PGS OR 4.71 [3.15-7.04] vs 50th centile, P=4.4x10-14) to that reported for women with FMR1 premutations, the leading tested monogenic cause of POI (OR~5)3. It is however notable that the top 1% of genetic risk is more prevalent than the FMR1 premutation carrier rate (1:250).
Functional genes and pathways implicated
We used a combination of in silico fine-mapping and expression quantitative trait (eQTL) data to identify putatively functional genes implicated by our genetic association signals (Supplementary Table 2). Firstly, 81 of the 290 independent ANM signals were highly correlated (minimum r2=0.8) with one or more variants predicted deleterious for gene function, implicating 91 genes (Supplementary Table 4). Twelve of these genes harboured predicted loss-of-function variants and seven genes (MCM8, EXO1, HELB, C1orf112, C19orf57, FANCM and FANCA) contained multiple statistically independent predicted-deleterious variants (Supplementary Table 4). We extended this analysis using exome sequence data from 45,351 women in UKBB. Loss-of-function variants near two highlighted genes were associated with ANM (Supplementary Table 5). In aggregate, women carrying loss-of-function variants in BRCA2 (N=143) and CHEK2 (N=68) reported ANM 1.54 years earlier (95%CI 0.73-2.34, P=6.8x10-5) and 3.49 years later (95%CI 2.36-4.63, P=1x10-13) respectively. BRCA1 loss-of-function was the next most significantly associated GWAS-highlighted gene in these analyses (N=32 LOF carriers, 2.63 years earlier ANM, 95%CI 1.00-4.26, p=1.1x10-4). Homozygous loss of function variants in BRCA2 were recently described as a rare cause of POI15, but we did not identify any such homozygotes for either BRCA2, CHEK2 or BRCA1. Notably, identified GWAS signals mapped within 300kb of 20/74 genes that when disrupted cause primary amenorrhea and/or POI (Supplementary Table 6), highlighting the common biological processes shared between normal variation in reproductive ageing and clinical extremes.
Next, we integrated publicly available gene expression data across 44 tissue types with our GWAS results (Supplementary Table 5). This highlighted expression-linked genes at 116 of the 290 loci (Supplementary Tables 2 and 5). Using three computational approaches we observed enrichment in hematopoietic stem cells and their progenitors (Supplementary Tables 7–12). Biological pathway enrichment analyses using a range of approaches, highlighted the importance of DDR processes as the key regulator of ANM (Supplementary Tables 13–16). We hypothesise that the shared expression profile in both haematopoietic stem cells and oocytes reflects the relative importance of DDR in both cell types16. In contrast to puberty timing17, which represents the beginning of reproductive life, we observed no enrichment of hypothalamic and pituitary expressed genes, but enrichment of genes expressed in the ovary and other reproductive tissues (Supplementary Table 9).
Finally, we attempted to leverage data from multi-tissue co-expression networks to identify genes which sit in the centre of these networks and interact with many other genes near ANM-associated variants. Such genes are analogous to the “core” genes proposed in the omnigenic model of genetic architecture18. This approach identified 250 genes, 47 of which were within 300kb of one of the identified 290 loci (Supplementary Tables 17 and 18). A notable example is MCM8, implicated directly by two missense variants and co-expressed with many genes highlighted by our GWAS (Extended Data Fig. 5).
ANM genes act across the life-course
Previous analyses highlighted the involvement of DNA repair in the regulation of ovarian ageing. This study supports a much broader DDR involvement as well as metabolic signaling networks such as PI3K19 with increased resolution of these pathways and when in the life-course they might act (Extended Data Fig. 1, Supplementary Results). We identify DDR pathways associated with replication stress, Fanconi Anemia pathway, DNA-protein crosslink repair, R loops (Extended Data Fig. 6), meiotic recombination and 58 genes implicated in regulation of apoptosis (Supplementary Table 19) providing evidence that variation in cell death following DDR is an important mechanism for ANM. This includes components and interactors of the central, conserved DDR checkpoint kinases ATR-CHEK1 (single stranded DNA) and ATM-CHEK2 (double strand breaks) that integrate and determine repair and cellular response from a broad variety of DNA repair pathways (Extended Data Fig. 6). The expression patterns across developmental stages in human follicles further supports distinct activities across fetal and follicular stages (Extended Data Fig. 7, Supplementary Table 20), including TP63, which was predominantly expressed during follicular stages, consistent with apoptotic inducing activity in response to DNA damage observed in growing oocytes in mouse20–23. These observations are consistent with the DDR regulating both the establishment of the ovarian reserve during fetal life and its depletion until ANM.
In utero effects and maternal diet
Previous work in mice demonstrated that a maternal obesogenic diet during pregnancy decreased ovarian reserve in offspring24. We extend this observation by demonstrating that two of our highlighted genes (Dmc1 and Brsk1) are differentially expressed in the offspring ovary due to maternal obesity (Supplementary Table 5, Extended Data Fig. 8). Dmc1 is a meiosis-specific DNA recombinase that assembles at the site of DSBs and is essential for meiotic recombination and gamete formation25. Expression levels of Brsk1 were decreased in ovarian tissue of the offspring of obese mice, an effect which appeared to be enhanced further when the offspring were additionally exposed to an obesogenic diet from weaning (Extended Data Fig. 8). Brsk1 acts as a DNA damage sensor and targets Wee1 and Mapt1 for phosphorylation, both of which were also up-regulated in our model. Wee1 is highly expressed in fetal germ cells, inhibits mitosis and is specifically down-regulated late in oogenesis26. The mechanisms linking maternal diet-induced altered expression of these genes to reduced ovarian reserve in the offspring remain unclear. However, our findings, in addition to observations that low birthweight is associated with menopause27, support the hypothesis that DDR mechanisms acting in utero to influence reproductive lifespan may be modifiable by maternal exposures.
Extending reproductive life in animals
Our GWAS highlighted loss of function alleles in CHEK2 associated with later ANM. Whilst previous work has shown genetic manipulation of DDR genes in animal models limits reproductive lifespan, it remains to be tested whether it can also extend it. CHEK2 plays a crucial role in culling oocytes in mouse mutants defective in meiotic recombination or after artificial induction of double-strand breaks22,28,29. In young females, Chek2 inactivation can partially rescue oocyte loss and in some mutants, fertility, with high levels of non-physiologically induced endogenous and exogenous DNA damage23,28,30,31. To better understand the function of the checkpoint kinase pathways in physiological reproductive ageing, we used genetically modified Chek1 and Chek2 mice (Figure 3, Extended Data Fig. 9-11). Follicular atresia was reduced in Chek2-/- females around reproductive senescence (13.5 months). This occurred without a concomitant increase in the ovarian reserve in young mice (1.5 months) (Figure 3 a, Extended Data Fig. 9 a-e)28. The aged Chek2-/- females showed elevated anti-Müllerian hormone levels (Extended Data Fig. 9 f) and an increased follicular response to gonadotrophin stimulation (Figure 3 c, Extended Data Fig. 9 g) consistent with a larger ovarian reserve at 13.5 months. Fertilization, blastocyst formation and litter sizes in naturally-mated aged Chek2-/- females were similar to littermate controls (Extended Data Fig. 9 h-j), suggesting that the endogenous damage that Chek2 responds to does not compromise the health of offspring or mothers in later reproductive life (Extended Data Fig. 9 j, k). Thus, depletion of the ovarian reserve is slowed in Chek2-/- females, resulting in improved ovarian function around the time of reproductive senescence and suggests a potential therapeutic target for enhancing IVF stimulation through short-term apoptotic inhibition.
In contrast to Chek2-/-, Chek1-/- is embryonic lethal in mice because of the essential function of Chek1 when DNA replication is perturbed as well as during mitosis32. We found that two different maternal, germline-specific conditional knockouts of Chek1 (Chek1 cko), one of which also leads to defects in prospermatogonia in males33, result in infertility in females due to failure during preimplantation embryo development (Extended Data Fig. 10). Chek1 is required for prophase I arrest and functions in G2/M checkpoint regulation in murine oocytes23,34 and its activator, ATR, is important for meiotic recombination as well as follicle formation35,36. An extra copy, i.e. three alleles of murine Chek1 (SuperChek1 or sChek1), is reported to partially rescue lifespan in ATRSeckel mice, suggesting that CHEK1 becomes rate-limiting when cells are under replication stress37. We found that sChek1 on its own increased the ovarian reserve from birth as well as later in life (Figure 3 b, Extended Data Fig. 11 b-f). Large antral follicle counts were also elevated in the aged sChek1 females, compared to litter-mate controls, indicating that follicular activity was also increased. Immediately prior to the typical age at reproductive senescence (11-13 months), sChek1 females ovulated an increased number of mature MII oocytes (Figure 3 c, Extended Data Fig. 11 g). These exhibited increased mRNA expression of Chek1 (Extended Data Fig. 11 a) and had similar capacity for forming blastocyst embryos as wild type (Extended Data Fig. 11 i, j). When transferred, these embryos gave rise to healthy, fertile pups over two generations (Extended Data Fig. 11 k-n). Thus, sChek1 causes a larger ovarian reserve to be established at birth and the oocytes appear to maintain their genomic integrity, as confirmed by aneuploidy analysis and efficiency of embryogenesis and fertility of pups (Extended Data Fig. 11 g-n), resulting in enhanced follicular activity and delayed reproductive senescence.
We speculate that this is due to upregulation of replication-associated DNA repair processes during mitosis and meiosis and that repair might be limiting for establishing and maintaining the ovarian reserve. Taken together, our data show that modulating key DDR genes can extend reproductive lifespan in vivo, generating healthy pups that are fertile over several generations. This can occur either by abolishing DDR checkpoints (Chek2 deletion) or by upregulating repair processes (sChek1).
Health consequences of later ANM
We used our identified genetic variants to infer causal relationships, using a Mendelian Randomization (MR) framework, between ANM and several health outcomes (Supplementary Tables 21–23). Consistent with previous studies2,13, each 1-year genetically-mediated later ANM increased the relative risks of several hormone-sensitive cancers by up to 5% (Supplementary Table 21). In contrast, we observed beneficial effects of genetically-mediated later ANM on bone mineral density, fracture risk and type 2 diabetes. Our findings are consistent with evidence from randomised controlled trials that oestrogen therapy maintains bone health and protects from type 2 diabetes38,39. Furthermore, recent MR studies demonstrate causal associations between sex hormone levels and type 2 diabetes40. Trial data in younger women taking HRT suggested no increased risk of cardiovascular disease, stroke or all-cause mortality39. In agreement with this we found no evidence to support causal associations for ANM with cardiovascular disease, lipid levels, Alzheimer’s disease, body mass or longevity (Supplementary Table 21), all of which have been reported in observational studies41–47. Finally, we evaluated putative modifiable determinants of ANM reported by observational studies27. We found that genetically instrumented increased alcohol consumption and tobacco smoking were associated with earlier ANM (Supplementary Tables 24 and 25). Each additional cigarette smoked per day decreased ANM by ~2.5 weeks, whilst women who drank alcohol at the maximum recommended limit experienced ~1 year earlier menopause compared to those who drank little. Furthermore, genetically instrumented age at menarche was associated with ~8 weeks earlier ANM per-year earlier menarche.
Collectively our analyses have provided novel insights into the biological processes underpinning reproductive ageing in women, how they can be manipulated to extend reproductive life, and what the consequence of this might be at a population level. We anticipate these findings will greatly inform experimental studies seeking to identify new therapies for enhancement of reproductive function and fertility preservation in women.
Online Methods
Information on ethical regulations and approvals for all animal experiments are detailed in the corresponding sections below. Within each of the human population studies included in the genome-wide analyses (all of which have been previously published), each participant provided informed consent and the study protocol was approved by the institutional review board at the parent institution.
Phenotype definition
We included women with age at natural menopause (ANM) from age 40 to 60 inclusive. ANM was derived from self-reported questionnaire data by each study (Supplementary Table 1) and was the age at last naturally occurring menstrual period followed by at least 12 consecutive months of amenorrhea. Exclusions were women with menopause caused by hysterectomy, bilateral ovariectomy, radiation or chemotherapy, and those using HRT before menopause. Within each of the studies, each participant provided written informed consent and the study protocol was approved by the institutional review board at the parent institution.
Genome-wide association study meta-analysis
A genome-wide meta-analysis of autosomal and chromosome X variants in women of European ancestry was carried out on summary statistics from analyses in three strata, allowing for the identification of heterogeneity due to different methodology. The three strata were (Extended Data Fig. 2): (i) meta-analysis of 1000 Genomes imputed studies; (ii) meta-analysis of samples from the Breast Cancer Association Consortium (BCAC: http://bcac.ccge.medschl.cam.ac.uk); (iii) UK Biobank GWAS. The overall meta-analysis included variants present in at least two of the three strata. All meta-analyses were inverse-variance weighted without GC correction and were carried out in METAL (https://genome.sph.umich.edu/wiki/METALDocumentation). Analysis was conducted independently by analysts at two geographically distinct sites and the resulting summary statistics were compared for consistency.
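The inverse-variance weighting used to combine strata can be sketched for a single variant as follows. This is an illustrative fixed-effect calculation, not the METAL implementation; the function name and the example estimates are hypothetical.

```python
import math

def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted estimate for one variant.

    betas, ses: per-stratum effect estimates and standard errors.
    Returns (pooled beta, pooled SE, z-score, two-sided normal p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]           # weight = 1 / variance
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))            # two-sided p from N(0, 1)
    return beta, se, z, p

# Hypothetical per-stratum estimates (years per allele) for one variant
pooled_beta, pooled_se, z, p = ivw_meta([0.20, 0.25, 0.22], [0.05, 0.08, 0.04])
```

Strata with smaller standard errors receive proportionally more weight, so the pooled estimate sits closest to the most precise stratum.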
The meta-analysis of 1000 Genomes imputed studies included 40 datasets imputed to 1000 Genomes Phase I version 3 for the autosomes and 29 for chromosome X (Supplementary Table 1, Supplementary Notes). Each individual study applied quality control to directly genotyped variants and samples prior to imputation (suggested exclusion thresholds for variants were Hardy-Weinberg equilibrium P<1×10-5, call rate <95% and minor allele frequency (MAF) <1%; suggested exclusions for samples were >5% missing genotypes, population outliers, high inbreeding coefficient, heterozygosity outliers, sex mismatches and related samples). Each individual study carried out GWAS using a two-tailed additive linear regression model adjusted for genetic principal components/relationship matrix depending on the software used (Supplementary Table 1), without GC correction. Since all samples included were female, chromosome X was analysed as for the autosomes. Once data were submitted, each study underwent quality control centrally according to standard protocols implemented independently by two analysts. Summary statistics for each study were stored centrally. Prior to meta-analysis, genetic variant IDs were converted to “chr:position” format (position in build 37) and alleles for insertion/deletion polymorphisms were coded as “I/D” to ensure consistency across studies. Meta-analysis was carried out including SNPs with imputation quality≥0.4 and MAF≥0.001. Variants in at least half of datasets for either the autosomes or for chromosome X (as appropriate) were taken forward to the overall meta-analysis, resulting in ~10.9 million variants.
GWAS summary statistics for the BCAC data were provided as four datasets, containing breast cancer cases and controls, with each genotyped on the iCOGs and OncoArray genotyping arrays (Supplementary Table 1). Quality control was applied to directly genotyped variants prior to imputation and data were imputed to the HRC r1.1 (2016) reference panel. Association analysis and quality control was carried out centrally as for the 1000 Genomes imputed studies. Summary statistics from the four BCAC datasets were meta-analysed, including variants with imputation quality≥0.4 and MAF≥0.001. Variants in two or more of the four datasets were taken forward to the overall meta-analysis, resulting in ~14.5 million variants.
UK Biobank genotyped 488,377 participants on two arrays, 49,950 on the UK BiLEVE Axiom array (807,411 markers) and 438,427 on the UK Biobank Axiom array (825,927 markers), which were then imputed using a combined 1000 Genomes Phase 3 and HRC reference panel. Details of central genotyping, quality control and imputation are described elsewhere48. We included 451,454 individuals identified as European in our analysis. Briefly, principal components analyses were used to cluster individuals of White European descent (described more fully elsewhere49). We further removed participants who had subsequently withdrawn from the study (n=7) and those where their self-reported sex did not match their genetic sex (n=348) resulting in 451,099 individuals. GWAS was carried out in 106,048 women with ANM by applying a linear mixed model in BOLT-LMM50 to adjust for population structure and relatedness, also adjusting for study centre and data release. Summary statistics taken forward to the overall meta-analyses were for ~16.6 million variants with imputation quality ≥0.5 and MAF≥0.001. UK Biobank data were analysed by two analysts independently and summary statistics results were compared for consistency.
Genome-wide significance was set at P<5x10-8. Statistical independence was determined using a combination of two approaches. Firstly, we used distance-based clumping to select the most significantly associated SNP within a 1Mb window. Secondly, we augmented this list with secondary signals within these 1Mb windows that were identified through approximate conditional analysis implemented in GCTA51. We only considered secondary signals that were uncorrelated with other selected signals (r2<0.05) and genome-wide significant in both univariate and joint models. 10,000 ancestry matched samples from UK Biobank were used in GCTA as an LD reference panel.
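The distance-based clumping step can be illustrated with a simple greedy procedure. This sketch assumes that a 1Mb window means 500kb either side of each lead SNP, which is an interpretation and not stated in the text; the function name and input layout are hypothetical, and the conditional analysis in GCTA is not reproduced here.

```python
def distance_clump(variants, window_bp=1_000_000, p_threshold=5e-8):
    """Greedy distance-based clumping.

    variants: iterable of (variant_id, chromosome, position, p_value).
    Returns lead variants ordered by p-value; a variant is kept only if it
    is genome-wide significant and no stronger lead lies within half the
    window on the same chromosome (assumed window definition).
    """
    significant = sorted(
        (v for v in variants if v[3] < p_threshold), key=lambda v: v[3]
    )
    half_window = window_bp // 2
    leads = []
    for vid, chrom, pos, p in significant:
        clear = all(
            chrom != lead_chrom or abs(pos - lead_pos) > half_window
            for _, lead_chrom, lead_pos, _ in leads
        )
        if clear:
            leads.append((vid, chrom, pos, p))
    return leads

# Hypothetical summary statistics
example = [
    ("a", "1", 1_000_000, 1e-20),
    ("b", "1", 1_200_000, 1e-10),   # within 500kb of "a", so dropped
    ("c", "1", 3_000_000, 1e-9),
    ("d", "2", 1_000_000, 2e-9),
    ("e", "1", 5_000_000, 1e-3),    # not genome-wide significant
]
lead_ids = [v[0] for v in distance_clump(example)]
```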
Assessing the impact of time to event models on the signals identified
We performed Cox proportional hazards regression for the 290 genome-wide significant ANM signals, allowing inclusion in our analyses of women excluded from the definition of natural menopause. We used UK Biobank imputed genotype data and performed analyses in 379,768 unrelated individuals of European descent (as described previously), of whom 185,293 were included in our Cox analyses (phenotype definition as described previously27). Briefly, Cox proportional hazards regression was run using stset and stcox (Breslow method for ties) in Stata v16.0 using age as the time variable, starting at birth (0 years) and ending at last age at risk of natural menopause. Natural menopause was set as the event, with individuals censored at bilateral oophorectomy and/or hysterectomy, or start of HRT use (if ongoing at time of menopause, hysterectomy or oophorectomy). We included the covariates genotyping chip and release of genotype data, recruitment centre and the first five genetic principal components, which were considered to be constant throughout the time at risk. We calculated -1 × natural log(hazard ratio) to allow comparison with effect estimates from linear regression from the full meta-analysis and meta-analysis excluding UK Biobank.
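The sign change applied to the Cox estimates can be sketched as below. A hazard ratio above 1 corresponds to reaching menopause sooner, so negating the log hazard ratio aligns its direction with the linear-regression effect on ANM. The function name is illustrative, and the transformed value is on the log-hazard scale, so it is not in years.

```python
import math

def cox_effect_for_comparison(hazard_ratio):
    """Return -ln(HR) so that positive values indicate later menopause."""
    return -math.log(hazard_ratio)

# A hypothetical allele that raises the hazard of menopause by 5%
effect = cox_effect_for_comparison(1.05)   # negative: earlier menopause
```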
Confirmation of identified signals and variance explained estimates
We sought to confirm our findings by testing the 290 identified loci in an independent sample of 294,828 women from 23andMe. Participants provided informed consent and participated in the research online, under a protocol approved by the external AAHRPP-accredited IRB, Ethical & Independent Review Services (E&I Review). Participants were included in the analysis on the basis of consent status as checked at the time data analyses were initiated. The variant-level data for the 23andMe replication dataset are fully disclosed in the manuscript. Individual-level data are not publicly available due to participant confidentiality, and in accordance with the IRB-approved protocol under which the study was conducted. Women’s age at menopause was ascertained across multiple surveys using two questions: "About how old were you when you had your last menstrual period? (under_30/30_34/45_49/40_44/55+/50_54/35_39/declined/not_sure)" and "How old were you when you had your last menstrual period?". As menopause age was ascertained in 4-year bins we rescaled the effect estimates appropriately to be on the same 1-year scale as our discovery analyses. Analyses were performed using a linear model (Gaussian family), controlling for age (in years), the top 5 genetic principal components and genotyping platform.
To assess the relevance of these loci in women of East Asian ancestry, we meta-analysed data (total N=78,317 women) from the China Kadoorie Biobank study and Biobank Japan (BBJ). A total of 47,140 female participants in BBJ whose age at menopause was available were included in the current study. If different ages at menopause were reported at multiple visits, we took the mean of the reported ages. We excluded individuals 1) with a maximum difference of more than five years between the ages at menopause reported at multiple visits; 2) whose age at recruitment was younger than the reported age at menopause; 3) whose age at menopause was younger than 40 or older than 60 years, or 4) with a medical history of hysterectomy, ovariectomy, radiation, chemotherapy or hormone replacement treatment before age at menopause. Subjects 1) whose DNA microarray data were not available, 2) with low call rate (<0.98), 3) whose genetic data suggested male sex, 4) who were genetically identical to other subjects or 5) who were outliers from the EAS cluster in the PCA plot were excluded from the analyses. We applied the same quality control for variants as in the previous literature52. After quality control, remaining variants were phased and subsequently imputed onto the reference panel containing the 1000 Genomes Project Phase 3 and around 3,000 Japanese whole-genome sequence data52. We restricted subsequent analyses to variants with rsq >0.3. For an association study of age at menopause, we applied a linear mixed model using BOLT-LMMv2.3.4 software correcting for age in years and the top ten genetically determined principal components as covariates.
The China Kadoorie Biobank baseline survey was conducted during 2004-2008 in 10 geographically diverse regions of China (5 rural, 5 urban), with resurveys of approximately 5% of the cohort at 5-yearly intervals. 302,632 women aged 35-74 years were enrolled with a mean age at baseline of 51.4 (SD 10.5), of whom 162,929 provided at least one reported age at menopause, in response to the questions “Have you had your menopause? If so, age of completion of menopause?”, with mean (SD) of 48.2 (4.4) years. Genotyping data was available for 31,177 women with values for age at menopause in the range 35-60 years and who had not had prior hysterectomy, oophorectomy, or cancer. Genotyping used custom Affymetrix Axiom® arrays with imputation into the 1000 Genomes Phase 3 reference using SHAPEIT3 and IMPUTE4 (IMPUTE2 for chrX). Age at menopause was adjusted for year-of-birth and year-of-birth-squared, and analyses were carried out separately for each of the 10 recruitment regions using BOLT-LMM v2.3.2 followed by inverse-variance-weighted fixed effect meta-analysis in METAL. Analyses used CKB data release 15.
The variance explained by our identified signals was estimated in a further independent sample of 16,556 women from the Icelandic deCODE study. Of those women, 14,771 were chip-typed and 1,785 were imputed 1st and 2nd degree relatives of chip-typed individuals. We assessed the aggregate significance of the identified loci by testing how many alleles had the same direction of effect using a binomial sign test (null expectation 50%). The proportion of variance explained was estimated using replication summary statistics provided by deCODE (n=16,556). We calculated the variance explained by each variant in deCODE (using the formula 2×β²×MAF×(1-MAF)) and divided the sum of the variance explained across the 290 variants by the SE² of menopause age in deCODE.
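As an illustration only, the per-variant variance formula and the sign test described above can be sketched in Python. The function names and toy values are hypothetical, the denominator is treated here as the trait variance (an assumption about how the scaling term is used), and the one-sided form of the sign test is likewise an assumption.

```python
from math import comb


def variance_explained(variants, trait_variance):
    """Proportion of trait variance explained by a set of variants.

    variants: iterable of (beta, maf) pairs, beta in trait units per allele.
    trait_variance: scaling term for the trait (assumed to be its variance).
    """
    total = sum(2 * beta**2 * maf * (1 - maf) for beta, maf in variants)
    return total / trait_variance


def sign_test_p(n_concordant, n_total):
    """One-sided binomial sign test: P(X >= n_concordant) when p = 0.5."""
    tail = sum(comb(n_total, k) for k in range(n_concordant, n_total + 1))
    return tail / 2**n_total


# Hypothetical toy inputs, not values from the study.
toy_variants = [(0.20, 0.30), (-0.10, 0.10), (0.05, 0.45)]
toy_share = variance_explained(toy_variants, trait_variance=16.0)
```

Because each variant contributes 2×β²×MAF×(1-MAF), common variants with modest effects can contribute as much as rarer variants with larger effects.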
We additionally estimated the proportion of variance in ANM explained by the 290 genome-wide significant signals in UK Biobank by calculating linear regression R² in 88,829 unrelated women of European descent (as described previously49) who had menopause age recorded. We generated estimates by combining the 290 variants as a genetic risk score with the allelic dosage weighted by the effect size from meta-analysis of the 1KG and BCAC strata only (Supplementary Table 2). Genotypes were extracted from imputed data and we included the covariates genotyping chip and release of genotype data, recruitment centre, age and the first five genetic principal components. Genotype-array heritability estimates were calculated using REML implemented in BOLT-LMM to provide a denominator for proportion of heritability explained.
Assessing deviation from an additive genetic model
A dominance deviation test53 was run for the 290 genome-wide significant ANM signals. Briefly, in this test a dominance deviation term representing the heterozygous group (coded 0, 1 and 0 for the three genotype groups) is fitted jointly with an additive genotype term in the regression model. This test determines whether the average trait value of the heterozygous group lies halfway between those of the two homozygote groups, as expected under an additive model. We used best guess genotypes converted from UK Biobank imputed genotype data and performed linear regression analysis in Stata v16.0 in 379,768 unrelated individuals of European descent (identified as described previously49). We regressed ANM on genotype including the covariates genotyping chip and release of genotype data, recruitment centre and the first five genetic principal components. We also tested a dominant model, comparing the effect allele heterozygote/homozygote group with other allele homozygotes, and a recessive model, comparing effect allele homozygotes with heterozygotes and other allele homozygotes. Genetic variants with a P-value for the dominance deviation term below the Bonferroni-corrected threshold (P=0.05/290=0.000172) were considered to show evidence of non-additive effects.
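The logic of the dominance deviation term can be illustrated with a minimal Python sketch. It omits the covariates used in the actual analysis (a simplification), and the function name is hypothetical. Without covariates, a model with an intercept, an additive term (0/1/2) and a heterozygote indicator (0/1/0) is saturated, so its least-squares coefficients follow directly from the three genotype group means.

```python
from statistics import mean


def additive_and_dominance(genotypes, trait):
    """Return (additive effect, dominance deviation) from genotype group means.

    genotypes: effect-allele counts coded 0, 1 or 2.
    trait: trait values in the same order.
    All three genotype groups must be non-empty.
    """
    groups = {0: [], 1: [], 2: []}
    for g, y in zip(genotypes, trait):
        groups[g].append(y)
    m0, m1, m2 = (mean(groups[g]) for g in (0, 1, 2))
    additive = (m2 - m0) / 2           # per-allele effect
    dominance = m1 - (m0 + m2) / 2     # heterozygote departure from the midpoint
    return additive, dominance


# Bonferroni-corrected threshold for 290 tests, as stated in the text.
BONFERRONI_ALPHA = 0.05 / 290
```

A dominance deviation of zero means the heterozygote mean sits exactly halfway between the homozygote means, which is the additive expectation.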
Gene burden analyses of UK Biobank exome sequencing data
We carried out gene burden association testing of rare variants in women identified from ~200K people with exome sequencing data available in the UK Biobank study. We included 45,351 women with ANM between 18–65 years in our analyses to maximise the sample size and ensure inclusion of women with POI who might be expected to be more likely to be carriers of rare variants.
Detailed sequencing methodology is provided by Szustakowski et al54. Briefly, exomes were captured with the IDT xGen Exome Research Panel v1.0 which targeted 39Mbp of the human genome with coverage exceeding on average 20x on 95.6% of sites. The OQFE protocol was used for mapping and variant calling to the GRCh38 reference. Variants included in our analyses had individual and variant missingness <10%, Hardy Weinberg Equilibrium p-value >10⁻¹⁵, minimum read depth of 7 for SNPs and 10 for indels, and at least one sample per site passed the allele balance threshold > 15% for SNPs and 20% for indels.
Variants in CCDS transcripts were annotated using Variant Effect Predictor55. We identified loss-of-function (LoF) variants (stop-gain, frameshift, or abolishing a canonical splice site (-2 or +2 bp from exon, excluding the ones in the last exon)) deemed to be high confidence by LOFTEE (https://github.com/konradjk/loftee). We conducted gene-burden analyses using a SKAT-O test implemented in SAIGE-GENE56 based on variants with MAF<0.001. SAIGE-GENE implements a generalized mixed-model region-based association test that can account for population stratification and sample relatedness in large-scale analyses. We applied an inverse normal rank transformation to ANM prior to analyses and included recruitment centre as a covariate. For each gene, we present results for the transcript with the smallest SKAT-O p-value. Since the magnitude of effect sizes from SAIGE-GENE are not easily interpretable, we calculated the sum of LoF alleles in BRCA1, BRCA2 and CHEK2 for each person. We tested each score’s association with ANM by performing linear regression in Stata v16.0 in unrelated samples of European descent (identified as described previously [PMID: 30423117]) including recruitment centre and the first five genetic principal components as covariates.
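Two steps in this paragraph lend themselves to a short illustration: the rank-based inverse normal transformation of ANM and the per-person count of LoF alleles. The sketch below is not the study code. The Blom offset (c = 3/8) and the average-rank handling of ties are assumptions, since the text does not specify them, and both function names are hypothetical.

```python
from statistics import NormalDist


def inverse_normal_rank(values, c=3 / 8):
    """Rank-based inverse normal transform (Blom offset assumed)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a block of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


def lof_burden(lof_counts, genes=("BRCA1", "BRCA2", "CHEK2")):
    """Sum of LoF alleles carried by one person across the listed genes."""
    return sum(lof_counts.get(g, 0) for g in genes)
```

For example, `lof_burden({"BRCA2": 1, "CHEK2": 1})` gives a score of 2, which could then be used as the predictor in a linear regression of ANM.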
Identifying putatively functional genes
We used two in silico approaches to prioritise putatively functional genes across our highlighted loci. Firstly, to identify variants with functional consequences, we looked up variants in r²>0.8 with the signals in Variant Effect Predictor (build 38). We identified missense, frameshift, insertion/deletions and stop-gained and splice site disrupting variants, which we then classified according to their VEP, PolyPhen and SIFT impact. We considered ‘high impact’ variants as those classified as high impact by VEP (stop-gained, frameshift and splice site disrupting). ‘Medium impact’ variants were missense variants classed as moderate impact by VEP, which were either deleterious in SIFT and were at least possibly damaging in PolyPhen. ‘Low impact’ variants were missense or inframe insertions/deletions classed as moderate impact by VEP and were tolerated and/or benign in PolyPhen. LD was calculated using PLINK v1.9 from best guess genotypes for 1000 Genomes Phase 3/HRC imputed variants in ~340,000 unrelated UK Biobank participants of white British ancestry. Genetic variant locations were converted from b37 to b38 using UCSC Liftover.
Secondly, we integrated our ANM genome-wide summary statistics with eQTL data using Summary Mendelian Randomization (SMR)57. Publicly available expression datasets for 48 tissues in GTEx v7 and 10 brain regions were downloaded from the SMR website (https://cnsgenomics.com/software/smr/#eQTLsummarydata). Whole-blood data in an eQTL meta-analysis of 31,684 samples were available from the eQTLGen consortium [https://www.biorxiv.org/content/10.1101/447367v1]. A Bonferroni-corrected p-value threshold was used in each expression dataset individually and only associations with HEIDI P > 0.01 were considered, to avoid coincidental overlap due to extended patterns of LD. This resulted in a total of 44 significant transcripts in the brain (SMR P<7×10⁻⁶), 96 in whole blood (P<3×10⁻⁶) and 732 across all GTEx tissues (SMR P<3.6×10⁻⁷). We excluded brain and whole blood tissues from the collection of 48 tissues in GTEx as they were better represented by the other expression datasets.
Identifying enriched cell and tissue types
We used three approaches to identify cell and tissue types enriched for ANM associated variants. DEPICT was run using default settings as described previously58 using GWAS summary statistics including all autosomal variants with P-value <1×10⁻⁵. The cell-type specific expression matrices used as input to DEPICT were generated from individual single-cell gene expression datasets (see below). Briefly, each dataset was processed by first normalizing each cell’s gene expression to a common transcript count (10,000 transcripts per cell) before calculating the average expression of each gene for each cell-type annotation. Averaged data were log-transformed (natural log). We computed cell-type specific gene expression using a two-step z-score approach: first we calculated gene-wise z-scores (each gene; mean=0, sd=1) to remove the effect of ubiquitously expressed genes, then we calculated cell-type-wise z-scores (each cell-type; mean=0, sd=1) on the gene-wise z-scores. For mouse expression datasets we mapped mouse genes to human orthologs using Ensembl (v. 91), keeping only genes with a 1-1 ortholog mapping.
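The two-step z-score procedure can be sketched as follows. This is an illustrative Python outline rather than the pipeline used in the study. It assumes `log1p` (natural log of 1 + x) so that zero averages are defined, and the population standard deviation; neither detail is stated in the text. The function names are hypothetical.

```python
from math import log1p
from statistics import mean, pstdev


def zscores(values):
    """Centre and scale a list to mean 0, sd 1 (zeros if there is no spread)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def cell_type_specificity(cells, labels, target=10_000):
    """cells: per-cell count lists (same gene order); labels: cell type per cell.

    Returns {cell_type: [specificity z-score per gene]}.
    """
    # 1) Normalise every cell to a common transcript count.
    scaled = [[c * target / sum(cell) for c in cell] for cell in cells]
    types = sorted(set(labels))
    n_genes = len(cells[0])
    # 2) Average per cell type, then natural-log transform.
    avg = {
        t: [
            log1p(mean(s[g] for s, lab in zip(scaled, labels) if lab == t))
            for g in range(n_genes)
        ]
        for t in types
    }
    # 3) Gene-wise z-scores (across cell types, one gene at a time).
    gene_z = {t: [0.0] * n_genes for t in types}
    for g in range(n_genes):
        for t, z in zip(types, zscores([avg[t][g] for t in types])):
            gene_z[t][g] = z
    # 4) Cell-type-wise z-scores (across genes, one cell type at a time).
    return {t: zscores(gene_z[t]) for t in types}
```

The gene-wise step down-weights genes expressed at similar levels everywhere, and the cell-type-wise step puts all cell types on a comparable scale.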
DEPICT analyses were run on two datasets: 1) Tabula Muris (https://tabula-muris.ds.czbiohub.org/)59, restricted to the fluorescence-activated cell sorting samples. To keep the tissue level information in the dataset, we defined cell-type annotations as ‘tissue cell-types’ by combining the cell-type label (‘cell_ontology_class’ column) with the origin tissue of the cell-type (‘tissue’ column). This allowed us to, for example, distinguish B-cells originating from fat, spleen and marrow tissue. In total we analyzed 115 cell-type annotations from 44,949 cells; 2) the Nestorowa et al. human hematopoietic stem and progenitor cell differentiation dataset60, which was not normalized to a common transcript count because the data were pre-normalized by the authors. We defined cell-type annotations as the 12 distinct hematopoietic stem and progenitor cell (HSPC) phenotypes reported by the authors (shown in their manuscript Figure 3A). The annotations covered 1,483 cells.
Secondly, we performed tissue enrichment analysis using linkage-disequilibrium (LD) score regression applied to specifically expressed genes (LDSC-SEG)61. We used three datasets available on the LDSC-SEG resource page (https://github.com/bulik/ldsc/wiki/Cell-type-specific-analyses), relating to cell and tissue-specific annotations from GTEx62, Epigenome Roadmap63 and the “Franke lab”58,64.
Finally, tissue enrichment analyses were performed using ‘Downstreamer’, which is described in a separate section below.
Pathway analysis
MAGENTA was used to explore pathway-based associations in the full GWAS data set. MAGENTA implements a gene set enrichment analysis (GSEA)-based approach65. We used upstream and downstream limits of 110Kb and 40Kb to assign variants to genes, excluded the HLA region from the analysis and set the number of permutations to 10,000 for GSEA testing, with analysis using 75% and 95% cut-offs. Significance was determined when an individual pathway reached FDR<0.05 in either analysis. In total, 3,222 pathways from Gene Ontology, PANTHER, KEGG and Ingenuity were tested for enrichment of multiple modest associations with ANM.
We additionally performed pathway analyses in ‘Downstreamer’ (described in section below) and MAGMA66 v1.08. MAGMA analyses were performed using the full genome-wide summary statistics, but restricted to variants that were predicted deleterious (i.e. non-synonymous and loss of function). Gene-sets included in the analyses were obtained from MSigDB v7.2, which included 12,358 curated gene sets from KEGG, Reactome, BioCarta and GO terms consisting of biological processes, cellular components and molecular functions.
Downstreamer methodology
In short, Downstreamer identifies genes connected to genes at GWAS loci (core genes) through expression and identifies enriched pathways. Downstreamer implements a strategy that accounts for LD structure and chromosomal organization, operating in two steps. In the first step, gene-level prioritization scores are calculated for the GWAS trait and a null distribution. In the second step, the gene-level prioritization scores are associated with the co-regulation matrix and pathway annotations. Further details are outlined below.
Downstreamer step 1
Calculation of gene-level prioritization scores (GWAS gene Z-scores)
The primary step is to convert GWAS summary statistics from p-values per variant to an aggregate p-value per gene (gene p-value) while accounting for local LD structure. This aggregate gene level p-value represents the GWAS signal potentially attributable to that gene.
First, we applied genomic control to correct for inflation in the GWAS signal. We then integrated the procedure from the PASCAL67 method into Downstreamer to aggregate variant p-values into a gene p-value while accounting for the LD structure. We aggregated all variants within a 25kb window around the start and end of a gene using the non-Finnish European samples of the 1000 Genomes (1000G) project, Phase 3 to calculate LD [PMID: 26432245]. We calculated these GWAS gene p-values for all 20,327 protein-coding genes (Ensembl release v75). The gene p-values were then converted to Z-scores for use in subsequent analysis. These are referred to as GWAS gene Z-scores.
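For orientation, genomic control and the conversion of p-values to Z-scores can be sketched in Python as below. This is a generic illustration, not the Downstreamer or PASCAL implementation. The one-sided p-to-Z mapping and the convention of never deflating (lambda floored at 1) are assumptions, and the LD-aware aggregation of variants into a gene p-value is not shown.

```python
from statistics import NormalDist, median

ND = NormalDist()
CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def p_to_chi2(p):
    """Two-sided p-value (0 < p <= 1) to a 1-df chi-square statistic."""
    return ND.inv_cdf(p / 2) ** 2


def genomic_control(p_values):
    """Return (corrected p-values, lambda); lambda is floored at 1 (assumed)."""
    chi2 = [p_to_chi2(p) for p in p_values]
    lam = max(1.0, median(chi2) / CHI2_1DF_MEDIAN)
    corrected = [2 * ND.cdf(-((c / lam) ** 0.5)) for c in chi2]
    return corrected, lam


def gene_p_to_z(p):
    """Gene p-value to Z-score; smaller p gives larger Z (one-sided, assumed)."""
    return -ND.inv_cdf(p)
```

With this mapping a gene p-value of 0.5 corresponds to Z = 0, and increasingly significant genes receive increasingly positive Z-scores.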
Calculation of gene Z-scores for null GWASs to account for chromosomal organization of genes and to calculate empirical p-values
To account for long range effects of haplotype structure which results in genes getting a similar GWAS gene Z-score, we use a generalized least squares (GLS) regression model for all regressions done in Downstreamer. The GLS model takes a correlation matrix that models this gene-gene correlation.
To calculate this correlation matrix we first simulated 10,000 random phenotypes by drawing phenotypes from a normal distribution and then associating them to the genotypes of the 1000G Phase 3 non-Finnish European samples. We used only overlapping variants between the real traits and the permuted GWASs to avoid biases introduced by genotyping platforms or imputation. We then calculated the GWAS gene Z-scores for each of the 10,000 simulated GWAS signals as described above. Next, we calculated the Pearson correlations between the GWAS gene Z-scores. As simulated GWAS signals are random and independent of each other, any remaining correlation between GWAS gene Z-scores reflects the underlying LD patterns and chromosomal organization of genes.
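The construction of the gene-gene correlation matrix from null GWASs can be sketched as below. The sketch assumes the gene Z-scores for each simulated GWAS are already available as rows of a table; the function names are hypothetical and the simulation step itself is not shown.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length lists with non-zero variance."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def gene_correlation_matrix(null_z):
    """null_z[s][g] is the gene Z-score of gene g in simulated GWAS s."""
    n_genes = len(null_z[0])
    columns = [[row[g] for row in null_z] for g in range(n_genes)]
    return [
        [pearson(columns[i], columns[j]) for j in range(n_genes)]
        for i in range(n_genes)
    ]
```

Because the simulated phenotypes are random, off-diagonal entries far from zero can only arise from shared LD or physical proximity between genes.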
We simulated an additional 10,000 GWASs as described above to empirically determine enrichment p-values and, finally, we used an additional 100 simulations to estimate the false discovery rate (FDR) of Downstreamer associations.
Downstreamer step 2
Calculation of Z-scores for co-regulation matrix
To calculate core scores, we used a previously generated co-regulation matrix that is based on a large multi-tissue gene network68. In short, publicly available RNA-seq samples were downloaded from the European Nucleotide Archive (https://www.ebi.ac.uk/ena). After QC, 56,435 genes and 31,499 samples covering a wide range of human cell-types and tissues remained. We performed a PCA on this dataset and selected 165 components representing 50% of the variation that offered the best prediction of gene function. We then selected the protein coding genes and centred and scaled the eigenvectors for these 165 components (mean = 0, standard deviation = 1) such that each component was given equal weight. The first components mostly describe tissue differences68, so this normalization ensures that tissue-specific-patterns do not disproportionately drive the co-regulation matrix. The co-regulation matrix is defined as the Pearson correlation between the genes from the scaled eigenvector matrix. The diagonal of the co-regulation matrix was set to zero to avoid the correlation with itself having a disproportionate effect on the association to the GWAS gene Z-scores. Finally, we converted the Pearson r to Z-scores.
Calculation of Z-scores for pathways and gene sets
To identify pathway and disease enrichments, we used the following databases: Human Phenotype Ontology (HPO), Kyoto Encyclopaedia of Genes and Genomes (KEGG), Reactome and Gene Ontology (GO) Biological Process, Cellular Component and Molecular Function. We have previously predicted how much each gene contributes to these gene sets, resulting in a Z-score per pathway or term per gene68. We collapsed genes into meta-genes in parallel with the GWAS step, to ensure compatibility with the GWAS gene Z-scores following the same procedure as in the GWAS pre-processing. Meta-gene Z-scores were calculated as the Z-score sum divided by the square root of the number of genes. Finally, all pathway Z-scores were scaled (mean = 0, standard deviation = 1).
Pre-processing of GWAS gene Z-scores and pruning of highly correlated genes
For each GWAS, both real and simulated, we carried out rank-based inverse normal transformation of GWAS Z-scores to ensure that outliers would not have disproportionate weights. Due to limitations in the PASCAL methodology that result in ties at a minimum significance level of 1×10⁻¹² for highly significant genes, we used the minimum SNP P-value from the GWAS to identify the most significant gene and resolve the tie. We then used a linear model to correct for gene length, as longer genes will typically harbour more SNPs.
Sometimes, two (or more) genes will be so close to one another that their GWAS gene Z-scores are highly correlated, violating the assumptions of the linear model. Thus, genes with a Pearson correlation r ≥ 0.8 in the 10,000 GWAS permutations were collapsed into ‘meta-genes’ and treated as one gene. Meta-gene Z-scores were averaged across the input Z-scores. Lastly, the GWAS Z-scores of the meta genes were scaled (mean = 0, standard deviation = 1).
GLS model to calculate pathway enrichment and core gene scores
We used a GLS regression to associate the GWAS gene Z-scores to the pathway Z-scores and co-regulation Z-scores (described above). These two analyses result in the pathway enrichments and core gene prioritisations, respectively. We used the gene-gene correlation matrix derived from the 10,000 permutations as a measure of conditional covariance of the error term (Ω) in the GLS to account for the relationships between genes due to LD and proximity. The pseudo-inverse of Ω is used as a substitute for Ω⁻¹.
The formula of the GLS is as follows:

$$\beta = (X^{T}\Omega^{-1}X)^{-1}X^{T}\Omega^{-1}y$$
Where β is the estimated effect size of the pathway, term or gene from the co-regulation matrix, Ω is the gene-gene correlation matrix, X is the design matrix of real GWAS Z-scores and y is the vector of gene Z-scores per pathway, term or gene from the co-regulation matrix. As we standardized the predictors, we did not include an intercept in the design matrix and X only contains one column with the real GWAS gene Z-scores. We estimated the β values for the 10,000 random GWASs in the same way and subsequently used them to estimate the empirical p-value for β.
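With a single predictor column, the GLS estimate reduces to a ratio of two quadratic forms, β = (xᵀΩ⁻¹y)/(xᵀΩ⁻¹x). The Python sketch below illustrates this under two stated simplifications: Ω is assumed symmetric and invertible, so a direct linear solve replaces the pseudo-inverse used in the text, and the empirical p-value uses a two-sided comparison with a +1 correction, neither of which is specified in the text. All function names are hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def gls_beta(x, y, omega):
    """Single-predictor GLS estimate without an intercept."""
    w = solve(omega, x)                      # Omega^-1 x (Omega symmetric)
    num = sum(wi * yi for wi, yi in zip(w, y))
    den = sum(wi * xi for wi, xi in zip(w, x))
    return num / den


def empirical_p(beta_real, betas_null):
    """Share of null betas at least as extreme as the real one (two-sided)."""
    hits = sum(abs(b) >= abs(beta_real) for b in betas_null)
    return (hits + 1) / (len(betas_null) + 1)
```

When Ω is the identity matrix the estimate equals ordinary least squares through the origin, which is a convenient sanity check.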
Definition of POI and DDR genes
We combined genes implicated in the DDR from a number of sources yielding a total of 778 genes (Supplementary Table 19)69–71. To identify genes associated with premature ovarian insufficiency/primary ovarian insufficiency (ICD-11 GA30.6), we carried out a search in PubMed for premature ovarian insufficiency, primary ovarian insufficiency, premature ovarian failure and ovarian dysfunction in humans and reviewed all primary studies published in English until 22nd of July, 2020. We included syndromic, non-syndromic, sporadic as well as familial single nucleotide variants, insertion/deletions and copy number variants (CNVs) and included 114 genetic variants from 139 studies. We did not attempt to review the clinical significance of the variants, which ranged from classical POI genes to newly identified CNVs in whole-exome sequencing studies. We expanded our search to review articles and ClinVar. We uncovered another four genes implicated in Perrault Syndrome for which our search terms were not included in the original articles. This gave a total of 118 genes. Our search detected all genetic variants entered in ClinVar as pathogenic, likely pathogenic or with conflicting interpretations of pathogenicity. We excluded genes with variants when no assertion criteria were provided and no published data were available for assessment in ClinVar. Two studies of large chromosomal rearrangements as well as quantitative trait loci consisting of more than a single genetic variant from GWAS in POI populations were excluded resulting in 74 genes (Supplementary Table 6). Gene lists were curated independently of the current meta-analysis and genes were only included if there was convincing evidence independent of any GWAS study.
Polygenic prediction of early menopause
To evaluate the impact of common variants on clinical extremes of ANM, we first performed a GWAS meta-analysis excluding the UK Biobank study (N=95,275). Effect estimates from this analysis (Supplementary Table 2) were then used for subsequent polygenic score (PGS) construction of ~6.97 million autosomal variants across the genome using LDPRED72. The PGS was calculated using PLINK73 v1.90b4.4 in an independent sample of 108,840 women with the full phenotypic range of ANM ages from the UK Biobank study, rescaled to have a mean of 0 and standard deviation of 1. We then estimated the centile distribution of the genetic risk score for all women with a valid ANM (with no lower or upper phenotype boundary). Two outcomes were defined: early menopause (EM) defined as ANM < 45 (N=11,268) vs all other women (N=97,572); and premature ovarian insufficiency (POI), defined as ANM < 40 years (N=2,407) vs all other women (N=106,433). Logistic regression analyses, adjusting for age, genotype array and 10 genetic principal components, were then performed with either EM or POI as the outcome. This was performed 99 times for each centile of genetic risk (coded 1) vs the 50th centile of genetic risk (coded 0). To assess the relevance of this score to each ANM age group, we estimated the average PGS value by year of ANM. For example, we grouped all women with ANM = 47 and estimated the mean and standard error of the PGS in this group of women. Our intuition was that any ANM range not influenced by common genetic variants would have the population mean PGS (i.e mean = 0 and SD = 1). Receiver operating characteristics (ROC) models were performed in Stata v14 using the roctab, rocgold and rocreg commands.
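The grouping of PGS values by year of ANM and the two outcome definitions can be illustrated with a short Python sketch. It covers only the descriptive step (mean and standard error per ANM year) and the case definitions; the adjusted logistic regressions are not reproduced, and the function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def standardise(scores):
    """Rescale a polygenic score to mean 0 and standard deviation 1."""
    mu, sd = mean(scores), stdev(scores)
    return [(s - mu) / sd for s in scores]


def pgs_by_anm_year(anm_years, pgs):
    """Return {ANM year: (mean PGS, standard error)} for each year observed."""
    groups = defaultdict(list)
    for year, score in zip(anm_years, pgs):
        groups[year].append(score)
    summary = {}
    for year, values in sorted(groups.items()):
        # The standard error is undefined for a single observation.
        se = stdev(values) / sqrt(len(values)) if len(values) > 1 else float("nan")
        summary[year] = (mean(values), se)
    return summary


def outcome_flags(anm):
    """Early menopause (ANM < 45) and POI (ANM < 40), as defined in the text."""
    return {"EM": anm < 45, "POI": anm < 40}
```

An ANM year whose mean PGS sits close to zero would be consistent with little contribution from common variants at that age.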
Mendelian Randomization analyses
In order to infer causal relationships between ANM and other health related outcomes, we performed Mendelian Randomization (MR). The 290 independent ANM signals were used as a genetic instrument for later ANM. Where a signal was not present in the outcome GWAS, we identified the best HapMap2 proxy with r²>0.5 within 250 kb either side of the signal and its relevant weight was included in our genetic instrument (Supplementary Table 23). The genetic variants were identified in publicly available GWAS datasets for a range of outcomes of interest (Supplementary Table 22). These were used in three methods of MR: inverse variance weighted74, MR-Egger75 and weighted median76. As a sensitivity analysis we additionally removed signals that appeared to be outliers. This was achieved using the Radial method considering the IVW model77. We also performed MR considering the effect of a range of putative modifiable risk factors on ANM as the outcome using the same MR models. Genetic instruments were created for the risk factors using independent genetic variants with effects estimated in published GWAS (Supplementary Table 25). For the risk factors of cigarette exposure and alcohol consumption, the MR was performed with a single genetic variant by calculating a Wald ratio for the effect of the variant on ANM divided by the effect on the risk factor using mrrobust in Stata v16.0. The effect of the genetic variant for alcohol consumption was measured in log(drinks per week) (note that a drink is a US measure of alcohol consumption equal to 14g pure alcohol, equivalent to 1.75 UK units). Hence a change from 1 drink to 7 drinks (US maximum recommended per week) would be the equivalent of a 1.95 increase in log(drinks per week), which when applied to the Wald estimate, gives the respective change in age at menopause.
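The Wald ratio, the fixed-effect inverse variance weighted (IVW) estimate and the log(drinks per week) arithmetic can be illustrated with a brief Python sketch. This is the textbook form of each estimator rather than the mrrobust implementation, the function names are hypothetical, and MR-Egger, weighted median and Radial outlier removal are not shown.

```python
from math import log, sqrt


def wald_ratio(beta_outcome, beta_exposure):
    """Single-variant MR estimate: outcome effect per unit of exposure."""
    return beta_outcome / beta_exposure


def ivw(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate and its standard error.

    Equivalent to a weighted regression through the origin of the outcome
    effects on the exposure effects, with weights 1 / se_outcome**2.
    """
    weights = [1 / se**2 for se in se_outcome]
    num = sum(w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    den = sum(w * bx**2 for w, bx in zip(weights, beta_exposure))
    return num / den, sqrt(1 / den)


# Change from 1 to 7 drinks per week on the natural-log scale.
delta_log_drinks = log(7) - log(1)
```

Multiplying a Wald estimate expressed per unit of log(drinks per week) by `delta_log_drinks` (about 1.95) gives the implied change in age at menopause for moving from 1 to 7 drinks per week.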
Expression of candidate genes identified by human GWAS in a mouse model of environmentally-induced low ovarian reserve
Generation of mouse model
All animal experiments underwent ethical review by the University of Cambridge Animal Welfare and Ethical Review Board and were carried out under the UK Home Office Animals (Scientific Procedures) Act (1986, United Kingdom). Female C57BL/6J mice were randomized to be fed ad libitum either a standard laboratory chow diet (7% simple sugars/3% fat; Special Diets Services, Witham, UK) or an obesogenic diet (10% simple sugars/20% animal lard; Special Diets Services, Witham, UK). The obesogenic diet was supplemented with a separate pot of sweetened condensed milk (55% simple sugars/8% fat; Nestle UK, Gatwick, UK) available to the animals within the cage. A detailed description of the dietary regimen has been published previously78. Female mice were placed on the allocated diet six weeks prior to first mating with wild-type males on standard chow diet. The first litter was discarded after weaning, and only proven-breeder females were used for the experimental protocols. Second matings occurred once females on the obesogenic diet had reached at least 10g absolute fat mass, as assessed by time domain nuclear resonance imaging (TDNMR) (Minispec Time Domain Nuclear Resonance, Bruker Optics). The female mice remained on their allocated diets throughout the breeding, pregnancy, and lactation phases. After delivery, each litter was culled to six pups at random to standardize their plane of nutrition from postnatal day 3 in all litters. There was no significant difference in the pre-culling litter size between obesogenic and control litters. Equal sex ratios within the litters were maintained as far as possible. After weaning at day 21, female offspring were randomly allocated to either the control or the obesogenic diets (identical to those used for the dams) and remained on these diets for the duration of the study. Bodyweight and food intake were measured weekly.
At 12 weeks of age, offspring total and fat mass were assessed by weighing and by TDNMR (Minispec Time Domain Nuclear Resonance, Bruker Optics) respectively. Following an overnight fast, the female offspring were weighed and then culled by CO2 asphyxiation and cervical dislocation. Ovaries were dissected and weighed immediately. One ovary from each animal was snap-frozen in liquid nitrogen or dry ice, and stored at -80°C, the other was fixed in formalin/paraldehyde. The fixed ovary was sectioned and subjected to haematoxylin and eosin (H&E) staining to ensure equal distribution of estrous stages in each experimental group (data not shown). Detailed reproductive and metabolic phenotyping of the female pups has previously been published24.
Gene expression analysis
A panel of 35 DNA damage response genes highlighted by our previous GWAS on ANM was selected for investigation13: Brca1, Bre, Brsk1, Chd7, Chek2, Dido1, Fbxo18, Helb, Helq, Mcm8, Mlf1ip, Msh5, Msh6, Mycbp, Polg, Prim1, Rad51, Rad54l, Rev3l, Uimc1, Apex, Aptx1, Cdk2ap1, Dmc1, Exo1, Fam175a, Fanci, Ino80, Kntc1, Papd7, Parl, Parp2, Polr2e, Polr2h and Tlk1. Expression levels were measured in whole snap-frozen ovaries. RNA was extracted using a miRNeasy mini Kit (Qiagen, Hilden, Germany). The kit was used according to the manufacturer’s instructions, with the addition of DNaseI digestion to ensure that the samples were free from genomic DNA contamination. The extracted RNA was quantified using a Nanodrop spectrophotometer (Nanodrop Technologies, Wilmington, DE, US). cDNA was synthesized from 1μg RNA using oligo-dT primers and M-MLV reverse transcriptase. Gene expression was quantified via RT-PCR (StepOne Plus machine; Applied Biosystems, Warrington, UK) using custom-designed primers (Sigma, Poole, UK) and SYBR green reagents (Applied Biosystems, Warrington, UK). Equal efficiency of reverse transcription between all groups was confirmed using the housekeeping gene ppia, and absence of gDNA contamination was confirmed by quantifying myh6, which was absent in all samples.
Statistical analysis
All data were initially analyzed using a 2-way ANOVA with maternal diet and offspring diet as the independent variables. In order to correct for multiple hypothesis testing of gene expression levels, p values were transformed to q values to take account of the false discovery rates using the p.adjust function in R stats package (R Foundation for Statistical Computing, Vienna, Austria). Data are represented as means ± SEM. Where p values are reported, an alpha level <0.05 was considered statistically significant. All data analysis was conducted using the R statistical software package version 2.14.1 (R Foundation for Statistical Computing, Vienna, Austria). In all cases, n refers to the number of litters, and n=8 for all groups. Study power was determined based on effect sizes for gene expression differences observed in our previous studies of this model24.
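The conversion of p values to false discovery rate adjusted q values can be illustrated in Python. The text does not state which `p.adjust` method was used, so the sketch assumes the Benjamini-Hochberg procedure (the `"BH"` or `"fdr"` method in R); the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    # Walk from the largest p value down, enforcing monotonicity.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for steps_from_top, i in enumerate(order):
        rank = m - steps_from_top  # 1-based rank in ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Each adjusted value is the smallest false discovery rate at which the corresponding test would be declared significant.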
Human oocytes mRNA screen
Research on RNA expression in human eggs was carried out according to the Helsinki II declaration and was conducted in accordance with national regulation on research on human subjects and material. The research was approved by the Scientific Ethical Committee of the Capital Region of Denmark (Videnskabsetisk Komite) in accordance with Danish National regulation (H-2-2011-044; extension license amm. Nr. 51307; license holder: Claus Yding Andersen and H-1604473; license holder: Eva R. Hoffmann; H-16027088 granted to Marie Louise Grøndahl). The full protocols contained permission to conduct mRNA sequencing on human eggs. GDPR approval was obtained from the national data agency (SUND-2016-60, Eva R Hoffmann and HGH-2016_086 to Marie Louise Grøndahl). All participants provided informed consent according to Danish ethical regulation after receiving written information and oral clarification about participation. Participants could withdraw from the study at any time. Participants did not receive monetary compensation and their participation was fully voluntary and did not affect their fertility treatment.
Single human MII oocytes were collected as described previously79, lysed in-tube and the cDNA was amplified according to the manufacturer’s instructions (Takara Bio; mRNA-Seq, SMART-Seq v4 ultra low input RNA kit, cat. no. 634894). The quality of individual cDNA libraries was verified on an Agilent 2100 Bioanalyzer instrument using a high sensitivity DNA kit (Agilent, 5067-4626). The libraries were prepared with 100 pg input using the Nextera XT DNA library preparation kit (Illumina, FC-131-1024) and the Nextera XT index kit v2 (FC-131-2002) and quantified on a Qubit 3.0 fluorimeter (Thermo Fisher Scientific, Q32854). The quality of the final library was verified on the Agilent 2100 Bioanalyzer high sensitivity DNA chip and pooled to 4 nM. The 4 nM library pools were denatured and loaded according to the recommended NextSeq500 guidelines (Illumina Inc.).
Expression analysis of GWAS genes in human oocytes and granulosa cells at various stages of development
We used processed RNA-seq data of fetal primordial germ cells from Li et al. (2017, accession code GSE86146)80, derived from 17 human female embryos ranging from 5-26 weeks post-fertilisation, and data from Zhang et al. (2018, accession code GSE107746)81, comprising follicles at 5 different stages of development from fresh ovarian tissue from 7 adult donors, separated into oocyte and granulosa cell fractions, in addition to our MII oocyte single-cell RNA-seq dataset (described below).
We transformed the per-cycle base call (BCL) file output from the sequencing run of 11 human MII oocytes into per-read FASTQ files using the bcl2fastq2 Conversion Software v2.19 from Illumina. The sample libraries were multiplexed across four sequencing lanes and the FASTQ files from each of the four lanes were concatenated to generate one set of paired FASTQ files per sample. We performed sample QC and filtering of reads to remove low quality reads, adaptor sequences and low quality bases with trimmomatic82 version 0.36 in two steps using ILLUMINACLIP:/ /Trimmomatic-0.36/adapters/NexteraPE-PE.fa:2:30:10 (SLIDINGWINDOW:4:20 CROP:72 HEADCROP:10 MINLEN:40), followed by an extra trim of head bases with HEADCROP:10. Subsequent to filtering, we used the remaining paired reads for alignment by hisat283 to the human genome GENCODE v27 release with the paired GENCODE v27 gtf file containing gene annotations using: ($HISAT2 -p 22 --dta -x.gencode.v27 -1 R1.fastq -2 R2.fastq -S sample.sam) (Pertea et al. 2016). The resulting sam files were sorted, indexed and transformed to bam files using samtools84. QC measures of aligned reads were generated using picard metrics (https://slowkow.github.io/picardmetrics) and the CollectRnaSeqMetrics tool from picard tools (http://broadinstitute.github.io/picard). We filtered the bam files for mitochondrial reads, and Stringtie was applied to merge and assemble reference-guided transcripts for gene-level quantification of raw counts and transcripts per million (TPM)85. Of the 283 consensus genes highlighted by the GWAS (Supplementary Table 5), 258 passed QC and were available in the expression dataset. Gene expression levels in TPM were used for further analyses as this unit allows efficient comparison of gene expression levels between samples from different studies. A pseudo-count of 1 was added to all TPM values, which were then converted to log2 scale before the heatmaps were plotted.
Hierarchical clustering by Euclidean distance, z-score calculation and heatmap plotting were done using the R package ‘pheatmap’ (Kolde R, 2019, v1.0.12). For each gene, z-scores are calculated by subtracting the mean of the TPM values across all samples and dividing by the standard deviation. Only samples with TPM>5 were considered for the heatmap showing the GWAS genes.
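The per-gene z-score and the Euclidean distance used for clustering can be sketched as follows. This is an illustration in Python with NumPy rather than the pheatmap code used in the study. The sample standard deviation (ddof=1) is assumed here to mirror R's `sd`, and the input matrix is hypothetical.

```python
import numpy as np

def row_zscores(values):
    """Per-gene z-score: subtract the gene's mean across samples, divide by its SD."""
    x = np.asarray(values, dtype=float)                 # shape: genes x samples
    mean = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, ddof=1, keepdims=True)           # sample SD (assumption)
    sd[sd == 0] = np.nan                                # constant genes have no z-score
    return (x - mean) / sd

def euclidean_distances(rows):
    """Pairwise Euclidean distance between rows, the input to hierarchical clustering."""
    x = np.asarray(rows, dtype=float)
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical log2(TPM + 1) values: 3 genes x 3 samples.
log_tpm = np.array([[1.0, 2.0, 3.0],
                    [4.0, 4.5, 8.0],
                    [0.5, 0.4, 0.1]])

z = row_zscores(log_tpm)
dist = euclidean_distances(z)
```

Scaling each gene to zero mean and unit variance means that the clustering reflects the expression pattern across samples rather than absolute expression level.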
sChek1, Chek1 cKO, and Chek2 mice
Mouse work at the University of Copenhagen (sChek1) was licensed under 2016-15-0202-00043 by the Danish Animal Experiments Inspectorate (Dyreforsøgstilsynet, Denmark). Mouse work at UAB (Chek2) was approved by the UAB and the Catalan Ethics Committee for Animal Experimentation (CEEAAH 1091; DAAM6395). Mouse work at CCHMC (Chek1 cKO, Ddx4-Cre) was performed according to the guidelines of the Institutional Animal Care and Use Committee (protocol no. IACUC2018-0040) approved by CCHMC. The Chek1 cKO, Zp3-Cre embryology was conducted at the Institute of Animal Physiology and Genetics CAS in Libechov (Czech Republic), abiding by the policies of the Expert Committee for the Approval of Projects of Experiments on Animals of the Academy of Sciences of the Czech Republic (# 43-2015).
Chek1 cKO (Ddx4-Cre), sChek1, and Chek2 mutant mice were generated previously33,37,86. The lines were maintained in C57BL/6-129Sv and inbred C57BL/6-129Sv (sChek1 and Chek2) backgrounds respectively. The Chek2 mouse is available under accession number BRC03481 at the RIKEN Bioresource Centre. The Chek1 cKO Zp3-Cre embryos were generated by crossing mice carrying the Zp3-Cre transgene87 with mice carrying a Chek1 allele containing LoxP sites88, resulting in mice expressing Cre recombinase under the control of the oocyte-specific zona pellucida 3 promoter (Zp3::Cre) to produce Chek1 cKO (Zp3-Cre). All experiments were carried out using littermate controls or with animals of closely related parents as controls. The four mutant strains were kept at the University of Copenhagen (sChek1), Autonomous University of Barcelona (Chek2), Cincinnati Children’s Hospital Medical Center (Chek1 cKO - Ddx4-Cre) and Institute of Animal Physiology and Genetics CAS in Libechov, Czech Republic (Chek1 cKO Zp3-Cre). Breeding cages were set up in a conventional way with a strict specific pathogen-free barrier, and mice used for experiments were kept in individually ventilated cages (IVC). 12 h of light exposure was provided. Temperature, relative humidity and air changes per hour were 22 °C (±2 °C), 55% (±10%) and 17, respectively. Food and water were provided ad libitum. Animals were genotyped twice, initially upon weaning and again before experimental procedures were carried out. Mouse genotyping was performed by PCR analysis using the following primers for the Chek1 cKO (Ddx4): F1 (5′-ACC TGC CCG CAA CTC CCT TTC-3′) and R2 (5′-TGC AAC AGC TTC AGT TAT TC-3′); for the Chek1 cKO (Zp3-Cre): Cre_low (5′-TAT TCG GAT CAT CAG CTA-3′), Cre_up (5′-GGT GGG AGA ATG TTA ATC-3′), CHK1F1 (5′-ACC TGC CCG CAA CTC CCT TTC-3′), CHK1R1 (5′-CCA TGA CTC CAA GCA CAG CGA-3′). The product sizes were 318 bp for wild type and 380 bp for the loxP/loxP transgene. The size of the Zp3-Cre transgene product was 139 bp.
For sChek1 the primers were: gsChek1_left “TGT CTT CCC TTC CCT GCT TA”, gsChek1_right1 “TCC CAA GGG TCA GAG ATC AT” and gsChek1_5’PCR2 “GTA AGC CAG TAT ACA CTC CGC TA”. The wild-type gene yields a 400 bp product whereas the transgene yields a 270 bp product. For Chek2, the primers WT1F (5′-GTGTGCGCCACCACTATCCTG-3′), WT2R (5′-CCCTTGGCCATGTTTCATCTG-3′) and NeoMutR (5′-TCCTCGTGCTTTACGGTATC-3′) were used to detect the wild-type (450 bp) and the mutant (625 bp) alleles in one PCR reaction. The Qiagen Taq polymerase PCR kit was used for genotyping (Cat No 201203 / 201205).
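The expected product sizes reported above can be collected into a small lookup for interpreting gel bands. The sketch below is illustrative only: the assay labels, the function name and the ±15 bp matching tolerance are assumptions, not part of the published protocol.

```python
# Expected PCR product sizes (bp) as reported in the text; keys are hypothetical labels.
ALLELE_SIZES_BP = {
    "Chek1_loxP": {"wild type": 318, "loxP": 380},
    "sChek1": {"wild type": 400, "transgene": 270},
    "Chek2": {"wild type": 450, "mutant": 625},
}

def call_alleles(assay, observed_bands_bp, tolerance_bp=15):
    """Return the alleles whose expected size matches an observed band.

    tolerance_bp is an assumed allowance for gel sizing error.
    """
    expected = ALLELE_SIZES_BP[assay]
    called = set()
    for band in observed_bands_bp:
        for allele, size in expected.items():
            if abs(band - size) <= tolerance_bp:
                called.add(allele)
    return sorted(called)

# Hypothetical gel readings.
heterozygous_chek2 = call_alleles("Chek2", [452, 620])
non_transgenic = call_alleles("sChek1", [398])
```

A Chek2 sample showing both bands would be read as heterozygous, while an sChek1 sample showing only the 400 bp band would be read as lacking the transgene.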
Mouse ovarian histology and follicle count
Ovaries were dissected and placed in Bouin’s fixative solution (70% saturated picric acid solution (Applichem, A2520, 1000), 25% formaldehyde, 5% glacial acetic acid (Merck, 1.00063.2500)), or in 4% formaldehyde for Chek1 cKO (Ddx4-Cre), overnight at 4 °C. The ovaries were washed twice with cold PBS for 30 minutes, followed by dehydration with increasing concentrations of ethanol. Subsequently, the samples were submerged in Histo-Clear II (Cat. # HS-202, National Diagnostics) for 30 min at room temperature. This was repeated another two times (three times in total) with fresh Histo-Clear II. Ovaries were embedded in paraffin blocks, cut to a thickness of 7 μm (sChek1 and Chek2) or 6 μm (Chek1 cKO (Ddx4-Cre)) and mounted on poly-L-lysine coated slides. After de-paraffinization and rehydration, the slides were stained with PAS-hematoxylin. The tissue was imaged using a Zeiss Axio scanner Z.1, and follicles with a visible nucleus were counted using the Zen Blue lite software from Zeiss. Primordial follicles contain one layer of flat granulosa cells surrounding the oocyte; primary follicles have one layer of cuboidal granulosa cells. Secondary follicles contain two or more layers of granulosa cells, and antral follicles are those with one or several cavities (the antrum).
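The staging criteria above amount to a simple decision rule, sketched below. The function and argument names are hypothetical, and the "unclassified" fallback for follicles matching none of the stated criteria is an assumption. In the study, counting was done manually in the Zen Blue lite software.

```python
from collections import Counter

def classify_follicle(granulosa_layers, granulosa_shape, antral_cavities=0):
    """Assign a follicle stage from the morphological criteria given in the text."""
    if antral_cavities >= 1:
        return "antral"          # one or several cavities (the antrum)
    if granulosa_layers >= 2:
        return "secondary"       # two or more granulosa layers
    if granulosa_layers == 1 and granulosa_shape == "flat":
        return "primordial"      # single layer of flat granulosa cells
    if granulosa_layers == 1 and granulosa_shape == "cuboidal":
        return "primary"         # single layer of cuboidal granulosa cells
    return "unclassified"        # assumed fallback, not defined in the text

# Hypothetical observations: (layers, shape, cavities) per follicle.
observations = [
    (1, "flat", 0),
    (1, "flat", 0),
    (1, "cuboidal", 0),
    (3, "cuboidal", 0),
    (5, "cuboidal", 1),
]
counts = Counter(classify_follicle(*obs) for obs in observations)
```

Checking for an antrum first ensures that multi-layered follicles with a cavity are counted as antral rather than secondary.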
Mouse ovulation induction and oocyte collection
Ovulation was induced by injection of 5 IU of PMSG (Prospec; ref HOR-272) followed by 5 IU of hCG (Chorulon Vet; ref 422741) after 47 hours. For 11-13-, 16- and 24-month-old mice, 7.5 IU of each hormone was used. At 12 hours post-hCG injection, the mice were sacrificed and the oviducts were dissected under a stereomicroscope. Oocytes were recovered by gently tearing the swollen ampulla of each oviduct to release the cumulus masses into a 90 μl drop of fertilization medium covered with mineral oil (NordilCell; ref 90142). The recipe for the fertilization medium has been published previously89.
RT-qPCR on mouse oocytes
Total RNA from oocytes was isolated with the Arcturus PicoPure RNA Isolation Kit from Applied Biosystems following the manufacturer’s instructions. Reverse transcription reactions were done with 28 ng of RNA using the Maxima First Strand cDNA Synthesis Kit for RT-qPCR with dsDNase (Thermo Fisher Scientific). cDNA was quantified by qPCR with the Applied Biosystems 7500 FAST Real-Time PCR System using Power SYBR green PCR Master Mix from Thermo Fisher Scientific. The primer sequences were as follows: Chek1-For: 5′-AAGCCACGAGAATGTAGTGAAA-3′, Chek1-Rev: 5′-AGCATCTTGTTCAGGCATCC-3′, Actb-For: 5′-CCAACCGTGAAAAGATGACC-3′, Actb-Rev: 5′-ACCAGAGGCATACAGGGACA-3′. Values were normalized to the expression of the Actb housekeeping gene.
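Normalisation to a housekeeping gene is commonly done with the 2^-ΔCt approach, sketched below. The text states only that values were normalized to Actb, so the exact formula, the assumption of roughly 100% amplification efficiency and all Ct values shown are illustrative assumptions.

```python
def relative_expression(ct_target, ct_reference):
    """Expression of the target relative to the reference gene, as 2^-(Ct_target - Ct_reference).

    Assumes both amplicons double every cycle (about 100% efficiency).
    """
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)

def fold_change(sample, control):
    """Ratio of Actb-normalised target expression in a sample versus a control."""
    return relative_expression(*sample) / relative_expression(*control)

# Hypothetical (Ct_Chek1, Ct_Actb) pairs.
wild_type = (25.0, 20.0)
transgenic = (24.0, 20.0)

wt_level = relative_expression(*wild_type)
fc = fold_change(transgenic, wild_type)
```

With these made-up values the transgenic sample crosses threshold one cycle earlier for Chek1 at equal Actb, which corresponds to a two-fold higher normalised level.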
Mouse embryo development in vitro
Freshly thawed frozen sperm from a proven fertile C57BL/6 wild-type male was used for in vitro fertilization and added to a dish containing mature MII eggs in fertilization medium. Disappearance of the germinal vesicle (GV) and polar body extrusion confirmed fertilization. Zygotes were incubated at 5% CO2 and 37 °C. After incubating zygotes in fertilisation medium overnight, we transferred them to a 60 mm petri dish containing 50 μl KSOM (Chemicon, cat MR-106-D) covered by mineral oil (NordilCell; ref 90142). Two separate dishes were prepared for embryos from each genotype. The embryos were again incubated at 5% CO2 and 37 °C. The developmental stage of embryos was assessed using a stereomicroscope at the equivalent of 0.5, 1.5, 2.5, 3.5, 4.5 and 5.5 days post-coitum (dpc). For Chek2, where the wild-type frequency of fertilization was lower than in the Chek1 cKO and sChek1 strains, we used young C57BL/6J.Ola.Hsd females to control for the efficiency of IVF (85%).
Mouse ovulation and embryo development (Chek1 cKO, Zp3-Cre)
Chek1 ctrl and cKO females were stimulated with 5 IU of PMSG (HOR-272, Prospec) followed by 5 IU of hCG (Ovitrelle, Merck) after 44 hours. After 18 hours, the females were sacrificed by cervical dislocation according to the protocols authorized by the ethics committee, and ovulated MII oocytes and zygotes were collected in M2 medium (M7167-50ML, Sigma-Aldrich) by tearing the ampulla of the oviduct. The oocytes and zygotes in cumulus masses were placed into a drop of M2 medium supplemented with 300 μg/ml hyaluronidase (H4272, Sigma-Aldrich) to release the cumulus cells. The MII oocytes and zygotes were cultured at 5% CO2 and 37°C in EmbryoMax® KSOM medium (MR-106-D, Sigma-Aldrich) and were scored after 10 hours using a Leica DMI 6000 microscope. Only zygotes with visible pronuclei were kept for subsequent culture.
Immunofluorescence analysis of mouse preimplantation embryos (Chek1 cKO, Zp3-Cre)
The embryos were briefly washed three times in PBS supplemented with 1 mg/ml poly(vinyl alcohol) and fixed in 3.7% formaldehyde for 45 min. They were then permeabilized with 0.5% Triton X-100 in PBS for 45 min. To block non-specific antibody binding, the embryos were incubated in 2% normal donkey serum (NDS) for 2 hours. The embryos were incubated overnight at 4°C in primary antibody against γH2AX (9718, Cell Signaling Technology) at a dilution of 1:200. The next day, they were incubated for 90 min in Rhodamine (TRITC)-AffiniPure Donkey Anti-Rabbit IgG (711-025-152, Jackson Immuno Research) at a dilution of 1:100. They were then mounted in ProLong™ Gold Antifade Mountant with DAPI (P36941, Invitrogen) with a spacer to preserve the embryonic 3D structure. Between each step, the embryos were washed five times for 8 min in PBS supplemented with 1 mg/ml bovine serum albumin or 0.2% NDS. The embryos were scanned using a confocal microscope (Leica TCS SP5) and Fiji software90 was employed for image analysis.
Mouse embryo transfer
Wild-type female recipient (surrogate) mice were prepared to receive embryos by mating them with an infertile male one night before the embryo transfer. Successful preparation of recipient mice for embryo transfer was confirmed by checking for the presence of a plug. Two-cell stage (1.5 dpc) embryos were transferred into a single horn of recipient mice, and anaesthesia was maintained during this procedure. Pups were born 19 days after embryo transfer.
Natural breeding, assessment of health of offspring and fertility in mouse
To test natural breeding efficiency, we set up cages with one or two adult (2-month-old or 12-month-old) control or mutant females and a male of proven fertility. We recorded litter sizes and dates of delivery for all litters obtained over a period of up to one year.
Mouse serum AMH analysis
Mice of various ages were anesthetized. Blood was collected in a plain tube, allowed to clot for one hour at room temperature and then centrifuged at 3000 rpm (1500g) for 15 minutes at 4 °C. After centrifugation, the supernatant (serum) was collected in a 1.5 ml tube and stored at -80 °C. Serum AMH levels were determined using an AMH ELISA kit (cat. # AL-113) from Ansh Labs, Webster, TX.
Assessment of the health of the offspring from control and mutant breeding was performed on a weekly basis by the personnel of the respective animal facilities following the standard health monitoring protocols approved by the Copenhagen or Catalan Ethics Committee for Animal Experimentation.
Acknowledgements
This research has been conducted using the UK Biobank resource under application numbers 871 (Exeter) and 9797 (Cambridge). Full individual study acknowledgements can be found in the Supplementary Information. The authors wish to dedicate this work to the memory of Professor Petr Solc.
Author contributions
All authors reviewed the original and revised manuscripts. Leads on manuscript writing: K.S.R, F.R.D, E.R.H, A.Murray, I.Roig, J.RB.P. Central statistical genetics analysis team: K.S.R, F.R.D, A.Murray, J.RB.P. Animal model working group: J.H, A.M-M, C.E.A, L.K, H.A, J.L.T, J.MG, S.T, E.PTH, M.F, Y.H, A.S, A.Puj, A.J.L, J.A.D, S.E.O, S.H.N, P.Solc, E.R.H, I.Roig. Human oocyte expression working group: A.Azad, V.S, R.B, K.W.O, M.K.H, M.LG, C.Y, E.R.H. Sample collection, genotyping, phenotyping and individual study analysis: K.S.R, F.R.D, D.J.T, P.F, A.Clar, O.B.B, P.Sul, R.G.W, C.T, M.H, K.L, N.O, P.N.T, P.A, S.Stan, P.RHJ.T, T.U.A, B.Z.A, E.N, I.L.A, A.M.A, K.J.A, A.Aug, S.Band, C.M.B, R.N.B, H.B, M.W.B, S.Beno, S.Berg, M.B, E.B, S.E.B, M.K.B, D.I.B, N.B, J.A.B, L.B, J.E.B, A.Camp, H.C, J.E.C, E.C, S.J.C, G.C, M.C, T.C, F.J.C, A.Cox, L.C, S.S.C, F.C, K.C, G.D, E.JCN.d, R.d, I.D, E.W.D, J.D, A.M.D, M.D, M.E, T.E, P.A.F, J.D.F, L.Fer, N.F, T.M.F, M.G-D, M.Mezz, M.G-C, C.G, G.G.G, H.G, D.F.G, V.G, P.G, C.A.H, N.H, P.H, C.Ha, C.He, W.H, G.H, J.L.H, J.J.H, F.H, D.H, M.A.I, R.D.J, M.DR.J, E.M.J, P.K.J, D.K, S.LR.K, C.Kart, R.K, C.M.K, I.K, C.Koop, P.K, A.W.K, Z.K, M.LaBi, G.L, C.L, L.J.L, J.SE.L, D.A.L, L.LM, J.Li, A.L, S.Lind, T.L, M.Lin, Y.L, S.Liu, J.Lu, R.M, P.KE.M, M.Mang, A.Mann, B.Mar, J.Mar, N.G.M, H.M, B.McK, S.E.M, C.Meis, T.M, C.Men, A.Mets, L.M, R.L.M, G.W.M, D.O.M, A.Mulas, A.M.M, Alison.M, M.A.N, A.N, R.N, T.N, D.R.N, A.F.O, H.O, J.N.P, A.V.P, N.L.P, N.P, A.Pet, U.P, P.DP.P, O.P, E.Por, B.M.P, I.Rah, G.R, H.S.R, P.M.R, S.M.R, A.R, L.M.R, F.R.R, J.R, I.Rud, R.R, D.R, C.F.S, E.S, D.P.S, S.San, E.J.S, C.Sar, D.Schl, M.K.S, M.J.S, K.E.S, C.Sco, S.Shek, A.V.S, B.H.S, J.A.S, R.S, M.C.S, T.D.S, J.J.S, M.S, D.Sto, J.BJ.v, K.Str, U.S, A.J.S, T.Tan, L.R.T, A.T, U.Þ, N.J.T, D.T, M.T, M.A.T, T.Tru, J.T, A.G.U, S.U, C.M.V, V.V, U.V, P.V, H.V, Q.W, N.J.W, C.R.W, D.R.W, A.N.W, K.W, G.W, J.F.W, B.HR.W, A.W, A.R.W, W.Z, M.Z, Z.C, L.Li, L.Fra, S.Burg, P.D, T.H.P, K.Stef, 
J.C, Y.T.v, K.L.L, D.I.C, D.F.E, J.A.V, J.M.M, K.K.O, A.Murray, J.RB.P
Competing interests
Full individual study and author disclosures can be found in the Supplementary Information.
Data availability
Full genome-wide association summary statistics for the discovery meta-analysis are available from the ReproGen website (www.reprogen.org).
MII Oocyte dataset EGAS00001004947. Access to EGAS00001004947 is granted in accordance with the ethics permission under which the data were collected from participants and under appropriate GDPR compliant data processor agreements.
SMR https://cnsgenomics.com/software/smr/#eQTLsummarydata
Tabula Muris https://tabula-muris.ds.czbiohub.org/
LDSC-SEG https://github.com/bulik/ldsc/wiki/Cell-type-specific-analyses
RNA-seq samples https://www.ebi.ac.uk/ena
References
- 1. Lambalk CB, van Disseldorp J, de Koning CH, Broekmans FJ. Testing ovarian reserve to predict age at menopause. Maturitas. 2009;63:280–91. doi: 10.1016/j.maturitas.2009.06.007.
- 2. Collaborative Group on Hormonal Factors in Breast Cancer. Type and timing of menopausal hormone therapy and breast cancer risk: individual participant meta-analysis of the worldwide epidemiological evidence. Lancet (London, England) 2019;394:1159–1168. doi: 10.1016/S0140-6736(19)31709-X.
- 3. Murray A, et al. Population-based estimates of the prevalence of FMR1 expansion mutations in women with early menopause and primary ovarian insufficiency. Genet Med. 2014;16:19–24. doi: 10.1038/gim.2013.64.
- 4. Christensen K, Doblhammer G, Rau R, Vaupel JW. Ageing populations: the challenges ahead. Lancet (London, England) 2009;374:1196–208. doi: 10.1016/S0140-6736(09)61460-4.
- 5. InterLACE Study Team. Variations in reproductive events across life: a pooled analysis of data from 505 147 women across 10 countries. Hum Reprod. 2019;34:881–893. doi: 10.1093/humrep/dez015.
- 6. Gruhn JR, et al. Chromosome errors in human eggs shape natural fertility over reproductive life span. Science. 2019;365:1466–1469. doi: 10.1126/science.aav7321.
- 7. Donnez J, Dolmans M-M. Fertility Preservation in Women. N Engl J Med. 2017;377:1657–1665. doi: 10.1056/NEJMra1614676.
- 8. Yding Andersen C, Mamsen LS, Kristensen SG. FERTILITY PRESERVATION: Freezing of ovarian tissue and clinical opportunities. Reproduction. 2019;158:F27–F34. doi: 10.1530/REP-18-0635.
- 9. Argyle CE, Harper JC, Davies MC. Oocyte cryopreservation: where are we now? Hum Reprod Update. 2016;22:440–9. doi: 10.1093/humupd/dmw007.
- 10. Stolk L, et al. Meta-analyses identify 13 loci associated with age at menopause and highlight DNA repair and immune pathways. Nat Genet. 2012;44:260–8. doi: 10.1038/ng.1051.
- 11. Venturella R, et al. The Genetics of Non-Syndromic Primary Ovarian Insufficiency: A Systematic Review. Int J Fertil Steril. 2019;13:161–168. doi: 10.22074/ijfs.2019.5599.
- 12. Titus S, et al. Impairment of BRCA1-related DNA double-strand break repair leads to ovarian aging in mice and humans. Sci Transl Med. 2013;5:172ra21. doi: 10.1126/scitranslmed.3004925.
- 13. Day FR, et al. Large-scale genomic analyses link reproductive aging to hypothalamic signaling, breast cancer susceptibility and BRCA1-mediated DNA repair. Nat Genet. 2015;47:1294–303. doi: 10.1038/ng.3412.
- 14. Horikoshi M, et al. Elucidating the genetic architecture of reproductive ageing in the Japanese population. Nat Commun. 2018;9:1977. doi: 10.1038/s41467-018-04398-z.
- 15. Caburet S, et al. Homozygous hypomorphic BRCA2 variant in primary ovarian insufficiency without cancer or Fanconi anaemia trait. J Med Genet. 2020 doi: 10.1136/jmedgenet-2019-106672.
- 16. Thompson DJ, et al. Genetic predisposition to mosaic Y chromosome loss in blood. Nature. 2019;575:652–657. doi: 10.1038/s41586-019-1765-3.
- 17. Day FR, et al. Genomic analyses identify hundreds of variants associated with age at menarche and support a role for puberty timing in cancer risk. Nat Genet. 2017;10:1–19. doi: 10.1038/ng.3841.
- 18. Boyle EA, Li YI, Pritchard JK. An Expanded View of Complex Traits: From Polygenic to Omnigenic. Cell. 2017;169:1177–1186. doi: 10.1016/j.cell.2017.05.038.
- 19. Reddy P, et al. Oocyte-specific deletion of Pten causes premature activation of the primordial follicle pool. Science. 2008;319:611–3. doi: 10.1126/science.1152257.
- 20. Suh E-K, et al. p63 protects the female germ line during meiotic arrest. Nature. 2006;444:624–8. doi: 10.1038/nature05337.
- 21. Deutsch GB, et al. DNA damage in oocytes induces a switch of the quality control factor TAp63α from dimer to tetramer. Cell. 2011;144:566–76. doi: 10.1016/j.cell.2011.01.013.
- 22. Tuppi M, et al. Oocyte DNA damage quality control requires consecutive interplay of CHK2 and CK1 to activate p63. Nat Struct Mol Biol. 2018;25:261–269. doi: 10.1038/s41594-018-0035-7.
- 23. Rinaldi VD, Bloom JC, Schimenti JC. Oocyte Elimination Through DNA Damage Signaling from CHK1/CHK2 to p53 and p63. Genetics. 2020;215:373–378. doi: 10.1534/genetics.120.303182.
- 24. Aiken CE, Tarry-Adkins JL, Penfold NC, Dearden L, Ozanne SE. Decreased ovarian reserve, dysregulation of mitochondrial biogenesis, and increased lipid peroxidation in female mouse offspring exposed to an obesogenic maternal diet. FASEB J. 2016;30:1548–56. doi: 10.1096/fj.15-280800.
- 25. Pittman DL, et al. Meiotic prophase arrest with failure of chromosome synapsis in mice deficient for Dmc1, a germline-specific RecA homolog. Mol Cell. 1998;1:697–705. doi: 10.1016/s1097-2765(00)80069-6.
- 26. Nakajo N, et al. Absence of Wee1 ensures the meiotic cell cycle in Xenopus oocytes. Genes Dev. 2000;14:328–38.
- 27. Ruth KS, et al. Events in Early Life are Associated with Female Reproductive Ageing: A UK Biobank Study. Sci Rep. 2016;6:24710. doi: 10.1038/srep24710.
- 28. Bolcun-Filas E, Rinaldi VD, White ME, Schimenti JC. Reversal of female infertility by Chk2 ablation reveals the oocyte DNA damage checkpoint pathway. Science. 2014;343:533–536. doi: 10.1126/science.1247671.
- 29. Adhikari D, et al. Inhibitory phosphorylation of Cdk1 mediates prolonged prophase I arrest in female germ cells and is essential for female reproductive lifespan. Cell Res. 2016;26:1212–1225. doi: 10.1038/cr.2016.119.
- 30. Rinaldi VD, Bolcun-Filas E, Kogo H, Kurahashi H, Schimenti JC. The DNA Damage Checkpoint Eliminates Mouse Oocytes with Chromosome Synapsis Failure. Mol Cell. 2017;67:1026–1036.e2. doi: 10.1016/j.molcel.2017.07.027.
- 31. Tharp ME, Malki S, Bortvin A. Maximizing the ovarian reserve in mice by evading LINE-1 genotoxicity. Nat Commun. 2020;11:330. doi: 10.1038/s41467-019-14055-8.
- 32. Liu Q, et al. Chk1 is an essential kinase that is regulated by Atr and required for the G(2)/M DNA damage checkpoint. Genes Dev. 2000;14:1448–59.
- 33. Abe H, et al. CHEK1 coordinates DNA damage signaling and meiotic progression in the male germline of mice. Hum Mol Genet. 2018;27:1136–1149. doi: 10.1093/hmg/ddy022.
- 34. Chen L, et al. Checkpoint kinase 1 is essential for meiotic cell cycle regulation in mouse oocytes. Cell Cycle. 2012;11:1948–55. doi: 10.4161/cc.20279.
- 35. Pacheco S, et al. ATR is required to complete meiotic recombination in mice. Nat Commun. 2018;9:2622. doi: 10.1038/s41467-018-04851-z.
- 36. Pacheco S, Maldonado-Linares A, Garcia-Caldés M, Roig I. ATR function is indispensable to allow proper mammalian follicle development. Chromosoma. 2019;128:489–500. doi: 10.1007/s00412-019-00723-7.
- 37. López-Contreras AJ, Gutierrez-Martinez P, Specks J, Rodrigo-Perez S, Fernandez-Capetillo O. An extra allele of Chk1 limits oncogene-induced replicative stress and promotes transformation. J Exp Med. 2012;209:455–61. doi: 10.1084/jem.20112147.
- 38. Salpeter SR, et al. Meta-analysis: effect of hormone-replacement therapy on components of the metabolic syndrome in postmenopausal women. Diabetes Obes Metab. 2006;8:538–54. doi: 10.1111/j.1463-1326.2005.00545.x.
- 39. Manson JE, et al. Menopausal hormone therapy and health outcomes during the intervention and extended poststopping phases of the Women’s Health Initiative randomized trials. JAMA. 2013;310:1353–68. doi: 10.1001/jama.2013.278040.
- 40. Ruth KS, et al. Using human genetics to understand the disease impacts of testosterone in men and women. Nat Med. 2020;26:252–258. doi: 10.1038/s41591-020-0751-5.
- 41. Dam V, et al. Association of menopausal characteristics and risk of coronary heart disease: a pan-European case-cohort analysis. Int J Epidemiol. 2019;48:1275–1285. doi: 10.1093/ije/dyz016.
- 42. de Kat AC, et al. Unraveling the associations of age and menopause with cardiovascular risk factors in a large population-based study. BMC Med. 2017;15:2. doi: 10.1186/s12916-016-0762-8.
- 43. Atsma F, Bartelink M-LEL, Grobbee DE, van der Schouw YT. Postmenopausal status and early menopause as independent risk factors for cardiovascular disease: a meta-analysis. Menopause. 13:265–79. doi: 10.1097/01.gme.0000218683.97338.ea.
- 44. Ambikairajah A, Walsh E, Cherbuin N. Lipid profile differences during menopause: a review with meta-analysis. Menopause. 2019;26:1327–1333. doi: 10.1097/GME.0000000000001403.
- 45. Pike CJ. Sex and the development of Alzheimer’s disease. J Neurosci Res. 2017;95:671–680. doi: 10.1002/jnr.23827.
- 46. Zhu D, et al. Body mass index and age at natural menopause: an international pooled analysis of 11 prospective studies. Eur J Epidemiol. 2018;33:699–710. doi: 10.1007/s10654-018-0367-y.
- 47. Shadyab AH, et al. Ages at menarche and menopause and reproductive lifespan as predictors of exceptional longevity in women: the Women’s Health Initiative. Menopause. 2017;24:35–44. doi: 10.1097/GME.0000000000000710.
- 48. Bycroft C, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–209. doi: 10.1038/s41586-018-0579-z.
- 49. Tyrrell J, et al. Using genetics to understand the causal influence of higher BMI on depression. Int J Epidemiol. 2019;48:834–848. doi: 10.1093/ije/dyy223.
- 50. Loh P-R, et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat Genet. 2015;47:284–290. doi: 10.1038/ng.3190.
- 51. Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88:76–82. doi: 10.1016/j.ajhg.2010.11.011.
- 52. Terao C, et al. Chromosomal alterations among age-related haematopoietic clones in Japan. Nature. 2020;584:130–135. doi: 10.1038/s41586-020-2426-2.
- 53. Wood AR, et al. Variants in the FTO and CDKAL1 loci have recessive effects on risk of obesity and type 2 diabetes, respectively. Diabetologia. 2016;59:1214–21. doi: 10.1007/s00125-016-3908-5.
- 54. Szustakowski JD, et al. Advancing Human Genetics Research and Drug Discovery through Exome Sequencing of the UK Biobank. medRxiv. 2020:2020.11.02.20222232. doi: 10.1101/2020.11.02.20222232.
- 55. McLaren W, et al. The Ensembl Variant Effect Predictor. Genome Biol. 2016;17:122. doi: 10.1186/s13059-016-0974-4.
- 56. Zhou W, et al. Scalable generalized linear mixed model for region-based association tests in large biobanks and cohorts. Nat Genet. 2020;52:634–639. doi: 10.1038/s41588-020-0621-6.
- 57. Zhu Z, et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat Genet. 2016;48:481–487. doi: 10.1038/ng.3538.
- 58. Pers TH, et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat Commun. 2015;6:5890. doi: 10.1038/ncomms6890.
- 59. Tabula Muris Consortium et al. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature. 2018;562:367–372. doi: 10.1038/s41586-018-0590-4.
- 60. Nestorowa S, et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood. 2016;128:e20–31. doi: 10.1182/blood-2016-05-716480.
- 61. Finucane HK, et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat Genet. 2018;50:621–629. doi: 10.1038/s41588-018-0081-4.
- 62. GTEx Consortium. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science. 2015;348:648–60. doi: 10.1126/science.1262110.
- 63. Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518:317–30. doi: 10.1038/nature14248.
- 64. Fehrmann RSN, et al. Gene expression analysis identifies global gene dosage sensitivity in cancer. Nat Genet. 2015;47:115–25. doi: 10.1038/ng.3173.
- 65. Segrè AV, et al. Common inherited variation in mitochondrial genes is not enriched for associations with type 2 diabetes or related glycemic traits. PLoS Genet. 2010;6 doi: 10.1371/journal.pgen.1001058.
- 66. de Leeuw CA, Mooij JM, Heskes T, Posthuma D. MAGMA: Generalized Gene-Set Analysis of GWAS Data. PLoS Comput Biol. 2015;11 doi: 10.1371/journal.pcbi.1004219.
- 67. Lamparter D, Marbach D, Rueedi R, Kutalik Z, Bergmann S. Fast and Rigorous Computation of Gene and Pathway Scores from SNP-Based Summary Statistics. PLoS Comput Biol. 2016;12:e1004714. doi: 10.1371/journal.pcbi.1004714.
- 68. Deelen P, et al. Improving the diagnostic yield of exome-sequencing by predicting gene-phenotype associations using large-scale gene expression analysis. Nat Commun. 2019;10:2837. doi: 10.1038/s41467-019-10649-4.
- 69. Knijnenburg TA, et al. Genomic and Molecular Landscape of DNA Damage Repair Deficiency across The Cancer Genome Atlas. Cell Rep. 2018;23:239–254.e6. doi: 10.1016/j.celrep.2018.03.076.
- 70. Pearl LH, Schierz AC, Ward SE, Al-Lazikani B, Pearl FMG. Therapeutic opportunities within the DNA damage response. Nat Rev Cancer. 2015;15:166–80. doi: 10.1038/nrc3891.
- 71. Álvarez-Quilón A, et al. Endogenous DNA 3’ Blocks Are Vulnerabilities for BRCA1 and BRCA2 Deficiency and Are Reversed by the APE2 Nuclease. Mol Cell. 2020;78:1152–1165.e8. doi: 10.1016/j.molcel.2020.05.021.
- 72. Vilhjálmsson BJ, et al. Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores. Am J Hum Genet. 2015;97:576–92. doi: 10.1016/j.ajhg.2015.09.001.
- 73. Purcell S, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75. doi: 10.1086/519795.
- 74. Burgess S, Butterworth A, Thompson SG. Mendelian randomization analysis with multiple genetic variants using summarized data. Genet Epidemiol. 2013;37:658–65. doi: 10.1002/gepi.21758.
- 75.Burgess S, Thompson SG. Interpreting findings from Mendelian randomization using the MR-Egger method. Eur J Epidemiol. 2017;32:377–389. doi: 10.1007/s10654-017-0255-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 76.Bowden J, Davey Smith G, Haycock PC, Burgess S. Consistent Estimation in Mendelian Randomization with Some Invalid Instruments Using a Weighted Median Estimator. Genet Epidemiol. 2016;40:304–14. doi: 10.1002/gepi.21965. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77.Bowden J, et al. Improving the visualization, interpretation and analysis of two-sample summary data Mendelian randomization via the Radial plot and Radial regression. Int J Epidemiol. 2018;47:2100. doi: 10.1093/ije/dyy265. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78.Samuelsson A-M, et al. Diet-induced obesity in female mice leads to offspring hyperphagia, adiposity, hypertension, and insulin resistance: a novel murine model of developmental programming. Hypertens (Dallas,Tex 1979) 2008;51:383–92. doi: 10.1161/HYPERTENSIONAHA.107.101477. [DOI] [PubMed] [Google Scholar]
- 79.Sankar A, et al. KDM4A regulates the maternal-to-zygotic transition by protecting broad H3K4me3 domains from H3K9me3 invasion in oocytes. Nat Cell Biol. 2020;22:380–388. doi: 10.1038/s41556-020-0494-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 80.Li L, et al. Single-Cell RNA-Seq Analysis Maps Development of Human Germline Cells and Gonadal Niche Interactions. Cell Stem Cell. 2017;20:858–873.:e4. doi: 10.1016/j.stem.2017.03.007. [DOI] [PubMed] [Google Scholar]
- 81. Zhang Y, et al. Transcriptome Landscape of Human Folliculogenesis Reveals Oocyte and Granulosa Cell Interactions. Mol Cell. 2018;72:1021–1034.e4. doi: 10.1016/j.molcel.2018.10.029.
- 82. Bolger AM, Lohse M, Usadel B. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.
- 83. Chen S, et al. AfterQC: automatic filtering, trimming, error removing and quality control for fastq data. BMC Bioinformatics. 2017;18:80. doi: 10.1186/s12859-017-1469-3.
- 84. Li H, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
- 85. Pertea M, Kim D, Pertea GM, Leek JT, Salzberg SL. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat Protoc. 2016;11:1650–67. doi: 10.1038/nprot.2016.095.
- 86. Takai H, et al. Chk2-deficient mice exhibit radioresistance and defective p53-mediated transcription. EMBO J. 2002;21:5195–205. doi: 10.1093/emboj/cdf506.
- 87. Lewandoski M, Wassarman KM, Martin GR. Zp3-cre, a transgenic mouse line for the activation or inactivation of loxP-flanked target genes specifically in the female germ line. Curr Biol. 1997;7:148–51. doi: 10.1016/s0960-9822(06)00059-5.
- 88. Lam MH, Liu Q, Elledge SJ, Rosen JM. Chk1 is haploinsufficient for multiple functions critical to tumor suppression. Cancer Cell. 2004;6:45–59. doi: 10.1016/j.ccr.2004.06.015.
- 89. Takeo T, Nakagata N. Superovulation using the combined administration of inhibin antiserum and equine chorionic gonadotropin increases the number of ovulated oocytes in C57BL/6 female mice. PLoS One. 2015;10:e0128330. doi: 10.1371/journal.pone.0128330.
- 90. Schindelin J, et al. Fiji: an open-source platform for biological-image analysis. Nat Methods. 2012;9:676–82. doi: 10.1038/nmeth.2019.
Data Availability Statement
Full genome-wide association summary statistics for the discovery meta-analysis are available from the ReproGen website (www.reprogen.org).
MII oocyte dataset: EGAS00001004947. Access to EGAS00001004947 is granted in accordance with the ethics permission under which the data were collected from participants and under appropriate GDPR-compliant data processor agreements.
SMR: https://cnsgenomics.com/software/smr/#eQTLsummarydata
Tabula Muris: https://tabula-muris.ds.czbiohub.org/
LDSC-SEG: https://github.com/bulik/ldsc/wiki/Cell-type-specific-analyses
RNA-seq samples: https://www.ebi.ac.uk/ena