Human Molecular Genetics. 2015 Jul 21;24(19):5628–5636. doi: 10.1093/hmg/ddv257

Low-frequency germline variants across 6p22.2–6p21.33 are associated with non-obstructive azoospermia in Han Chinese men

Bixian Ni 1,4, Yuan Lin 1,4, Liangdan Sun 6,7, Meng Zhu 4, Zheng Li 8, Hui Wang 4, Jun Yu 1,2, Xuejiang Guo 1,2, Xianbo Zuo 6,7, Jing Dong 4, Yankai Xia 1,5, Yang Wen 1,4, Hao Wu 1,2, Honggang Li 9, Yong Zhu 8, Ping Ping 8, Xiangfeng Chen 8, Juncheng Dai 4, Yue Jiang 1,4, Peng Xu 10, Qiang Du 11, Bing Yao 12, Ning Weng 10, Hui Lu 8, Zhuqing Wang 8, Xiaobin Zhu 8, Xiaoyu Yang 1,3, Chenliang Xiong 9, Hongxia Ma 4, Guangfu Jin 4, Jianfeng Xu 13, Xinru Wang 1,5, Zuomin Zhou 1,2, Jiayin Liu 1,3, Xuejun Zhang 6,7, Donald F Conrad 15,16, Zhibin Hu 1,4,14,*, Jiahao Sha 1,2,*
PMCID: PMC4902876  PMID: 26199320

Abstract

Genome-wide association studies (GWAS) have identified several common loci contributing to non-obstructive azoospermia (NOA). However, a substantial fraction of NOA heritability remains unexplained, especially the contribution of low-frequency [defined here as having a minor allele frequency (MAF) between 0.5 and 5%] and rare (MAF below 0.5%) variants. Here, we performed a 3-stage exome-wide association study in Han Chinese men to evaluate the role of low-frequency or rare germline variants in NOA development. The discovery stage included 962 NOA cases and 1348 healthy male controls genotyped by exome chips and was followed by a 2-stage replication with an additional 2168 cases and 5248 controls. We identified three low-frequency variants located at 6p22.2 (rs2298090 in HIST1H1E encoding p.Lys152Arg: OR = 0.30, P = 2.40 × 10^−16) and 6p21.33 (rs200847762 in FKBPL encoding p.Pro137Leu: OR = 0.11, P = 3.77 × 10^−16; rs11754464 in MSH5: OR = 1.78, P = 3.71 × 10^−7) associated with NOA risk after Bonferroni correction. In summary, we report newly identified signals for NOA risk in genes previously undetected through GWAS on 6p22.2–6p21.33 in a Chinese population and highlight the role of low-frequency variants with a large effect in the process of spermatogenesis.

Introduction

Infertility is a growing and serious reproductive health problem and is estimated to affect 10–15% of couples worldwide (1). In China, the incidence of infertility is 10%, as reported by the Chinese Women and Children Development Center and the China Population Association (2). Approximately 50% of these cases are due to male factor abnormalities (3,4). A substantial proportion of male infertility is accompanied by abnormal semen quality, including azoospermia. Non-obstructive azoospermia (NOA) (OMIM:#415000), a severe form of azoospermia characterized by no or little sperm in semen as a result of a congenital dysfunction in spermatogenesis, occurs in ∼1% of all adult men (5,6). To date, several genetic causes of NOA have been reported, involving whole chromosome aneuploidy, Y chromosome microdeletions and autosomal chromosome mutations (7–10).

Genome-wide association studies (GWAS) have been successfully applied in determining the genetic basis of more than 17 trait categories, and over 15 396 genetic loci have been identified for cancer, diabetes, hypertension and other diseases (11–13). Our previous GWAS focused on common genetic variants (minor allele frequency, MAF > 0.05) and identified three loci at 1p13.3, 1p36.32 and 12p12.1 for NOA in Han Chinese men (14). Recently, we evaluated promising associations in an extended 3-stage validation and found four additional susceptibility loci at 6p21.32, 10q25.3, 6p12.2 and 1q42.13, among which CDC42BPA at 1q42.13 was found to have a phenotypic effect on male fertility using the Drosophila model (15). In addition, another GWAS conducted by Zhao et al. has revealed variants within the human leukocyte antigen regions (6p22) to be associated with NOA in Han Chinese subjects (16).

However, most common genetic variants identified by GWAS confer relatively small increments in risk (1.1–1.5-fold) and explain only a small proportion of heritability (17,18). Moreover, the vast majority of the identified variants have no direct biological relevance to disease or clinical usage for prognosis or treatment. Recently, low-frequency (MAF defined here as from 0.5 to 5%) and rare (MAF < 0.5%) variants have been hypothesized to convey larger relative risks than common variants and may contribute to the missing heritability of complex diseases (19,20). Using a new platform, the ‘exome chip’ or ‘exome array,’ to capture low-frequency or rare variants in coding regions based on genetic variants discovered from whole-exome sequencing of >12 000 individuals, a series of uncommon variants have been found to contribute to human biological processes and disorders, such as coronary heart disease, non-alcoholic fatty liver disease, insulin processing and secretion and hematological traits with established biological relevance (21–24).

To assess the role of low-frequency or rare variants in NOA development, we genotyped 247 870 variants in 962 NOA cases and 1348 controls with a healthy birth history using the exome chip and subsequently validated promising associations in an additional 2168 cases and 5248 controls.

Results

Single-variant association analysis

As shown in Supplementary Material, Figure S1, the quantile–quantile plot and the genomic-control inflation factor (λ) of 1.037 indicated a low possibility of false-positive associations resulting from population stratification. However, in the discovery stage, we did not find any low-frequency or rare variants associated with NOA risk at our predefined exome-wide significance level (P < 1.25 × 10^−6) (Fig. 1). We next conducted a 2-stage replication study for seven promising variants in line with selection criteria detailed in the Materials and Methods section. In Replication I with 714 NOA cases and 1398 healthy male controls, four variants (rs2298090, rs3130785, rs11754464 and rs200847762) with P < 0.05 showed consistent associations with those in the discovery stage (Supplementary Material, Table S1). For Replication II, an additional 1454 NOA cases and 3850 healthy male controls were further genotyped to verify the significant associations of these four loci. Finally, three of them (rs2298090, rs200847762 and rs11754464) showed significant associations in the same direction as those observed in the discovery stage and Replication I (Supplementary Material, Table S1).

Figure 1.

Manhattan plot of exome-wide association analyses for NOA in Han Chinese men. The association results (−log10(P) values, y-axis) are plotted against genomic position (x-axis by chromosome and chromosomal position of NCBI build 37). The green horizontal line represents P = 1.0 × 10^−4.

When we combined the discovery and replication stages, all three low-frequency variants reached our predefined exome-wide significance threshold (P < 1.25 × 10^−6) for NOA susceptibility in 3107 cases and 6583 controls without heterogeneity (Table 1). Among them, one low-frequency variant was located at 6p22.2 (rs2298090 in HIST1H1E encoding p.Lys152Arg: OR = 0.30, P = 2.40 × 10^−16) and two at 6p21.33 (rs200847762 in FKBPL encoding p.Pro137Leu: OR = 0.11, P = 3.77 × 10^−16; rs11754464 in MSH5: OR = 1.78, P = 3.71 × 10^−7).

Table 1.

Newly identified low-frequency variants associated with NOA risk

| Chr. | Gene | Variant ID | Major/minor allele | Location | Stage | Cases^a (N = 3107) | Controls^a (N = 6583) | MAF^b, cases | MAF^b, controls | OR_add^c (95% CI) | P_add^c | P-value^d |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6p22.2 | HIST1H1E | rs2298090 | A/G | p.Lys152Arg, exon 1 | Discovery | 0/12/927 | 0/65/1270 | 0.006 | 0.024 | 0.26 (0.14–0.49) | 2.87E-05 | |
| | | | | | Replication I | 0/19/691 | 0/125/1261 | 0.013 | 0.045 | 0.28 (0.17–0.45) | 3.15E-07 | |
| | | | | | Replication II | 0/11/1438 | 0/75/3241 | 0.004 | 0.011 | 0.33 (0.18–0.62) | 1.06E-04 | |
| | | | | | Combined All | 0/42/3056 | 0/265/5772 | 0.007 | 0.022 | 0.30 (0.22–0.42) | 2.40E-16 | |
| | | | | | Meta-analysis | | | | | 0.29 (0.21–0.40) | 6.93E-14 | 0.860 |
| 6p21.33 | FKBPL | rs200847762 | G/A | p.Pro137Leu, exon 2 | Discovery | 0/3/936 | 1/53/1281 | 0.002 | 0.021 | 0.08 (0.02–0.25) | 1.62E-05 | |
| | | | | | Replication I | 0/2/711 | 1/42/1340 | 0.001 | 0.016 | 0.09 (0.02–0.37) | 8.43E-04 | |
| | | | | | Replication II | 0/2/1444 | 0/30/3286 | 0.001 | 0.005 | 0.15 (0.04–0.64) | 6.67E-04 | |
| | | | | | Combined All | 0/7/3091 | 2/125/5907 | 0.001 | 0.011 | 0.11 (0.05–0.23) | 3.77E-16 | |
| | | | | | Meta-analysis | | | | | 0.10 (0.05–0.22) | 1.16E-08 | 0.792 |
| 6p21.33 | MSH5 | rs11754464 | G/A | intron 13 | Discovery | 2/49/888 | 1/23/1311 | 0.028 | 0.009 | 2.67 (1.64–4.36) | 8.49E-05 | |
| | | | | | Replication I | 1/34/669 | 0/39/1352 | 0.026 | 0.014 | 1.84 (1.16–2.90) | 9.07E-03 | |
| | | | | | Replication II | 1/62/1381 | 0/105/3211 | 0.022 | 0.016 | 1.41 (1.03–1.94) | 3.45E-02 | |
| | | | | | Combined All | 4/145/2938 | 1/167/5874 | 0.025 | 0.014 | 1.78 (1.43–2.22) | 3.71E-07 | |
| | | | | | Meta-analysis | | | | | 1.74 (1.38–2.19) | 2.54E-06 | 0.095 |

^a Minor homozygote/heterozygote/major homozygote.

^b Minor allele frequency.

^c Derived from a logistic regression model assuming an additive genetic model and adjusting for the top three principal components (for the discovery stage only).

^d P for heterogeneity test.
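The relation between the genotype counts and the MAF columns of Table 1 can be checked with a short calculation. The sketch below is illustrative only: the function name is ours, the counts are copied from the Combined All row for rs2298090, and the crude allelic odds ratio it prints is an unadjusted approximation, not the logistic-regression estimate reported in the table.

```python
def minor_allele_frequency(minor_hom, het, major_hom):
    """MAF from genotype counts given as minor hom / het / major hom."""
    n_individuals = minor_hom + het + major_hom
    minor_alleles = 2 * minor_hom + het
    return minor_alleles / (2 * n_individuals)


def crude_allelic_or(case_counts, control_counts):
    """Unadjusted allelic odds ratio (minor vs. major allele)."""
    def allele_counts(minor_hom, het, major_hom):
        minor = 2 * minor_hom + het
        major = 2 * major_hom + het
        return minor, major

    case_minor, case_major = allele_counts(*case_counts)
    ctrl_minor, ctrl_major = allele_counts(*control_counts)
    return (case_minor * ctrl_major) / (case_major * ctrl_minor)


# Combined-stage counts for rs2298090, copied from Table 1.
cases = (0, 42, 3056)
controls = (0, 265, 5772)

cases_maf = minor_allele_frequency(*cases)
controls_maf = minor_allele_frequency(*controls)
crude_or = crude_allelic_or(cases, controls)

print(f"MAF cases: {cases_maf:.3f}, MAF controls: {controls_maf:.3f}")
# prints: MAF cases: 0.007, MAF controls: 0.022
print(f"crude allelic OR: {crude_or:.2f}")
```

The MAF values agree with the table after rounding; the crude odds ratio lands close to the reported additive-model estimate because no minor homozygotes were observed for this variant.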

Conditional association analysis

To further analyze the two identified regions of 6p22.2 and 6p21.33 containing HIST1H1E, FKBPL and MSH5, we carefully re-evaluated the exome chip genotype calls in the flanking regions (1 Mb upstream or downstream) of rs2298090, rs200847762 and rs11754464, respectively. As shown in Figure 2 and Supplementary Material, Table S2, among 338 uncommon variants near FKBPL and 89 uncommon variants near HIST1H1E, we did not find any more prominent association than that of rs200847762 or rs2298090, and there were no other variants in strong LD (r^2 > 0.5) with either of these two variants. In contrast, among 290 uncommon variants in the MSH5 region remaining after quality control, four variants showed lower association P-values than rs11754464, including newly identified rs200847762 and two variants (rs3130785 and rs11751198) that failed to be validated in the replication stages. The other variant is rs3129987, which is an intergenic variant in strong LD with rs3130785 (r^2/D′ = 0.91/1.00), and the association between rs3129987 and NOA risk was found to be abolished after conditioning on rs11754464 (P = 0.056). Furthermore, after controlling for each other, the two lead variants (rs11754464 and rs200847762) still showed substantially significant associations with NOA risk (Supplementary Material, Table S2). Additionally, rs200847762 was independent of the GWAS reported variants rs3129878 and rs7194 (312 and 315 kb apart from rs200847762, respectively) with conditional P-values of 3.33 × 10^−5 and 2.89 × 10^−5, respectively. After conditioning on rs3129878 or rs7194 (685 and 689 kb apart from rs11754464, respectively), the significance of rs11754464 decreased slightly (P_cond = 1.15 × 10^−3 and 1.23 × 10^−3, respectively) (Supplementary Material, Table S3). The findings above indicated that these two low-frequency variants were independent signals of NOA susceptibility on 6p21.33.

Figure 2.

Regional association plots with lead variants indicated by a purple diamond at 6p21.33 (A, B) and 6p22.2 (C). The association of an individual variant is plotted as –log10 P against chromosomal position. y-Axis shows the recombination rate estimated from 1000 Genomes Project CHB and JPT data.

Moreover, we also looked for promising uncommon variants in the regions identified by our previous GWAS, and we screened the region 400 kb upstream or downstream of the seven tagSNPs (14,15). However, it appears that the exome chip did not provide promising uncommon variants for these regions, except for 6p21.33 (Supplementary Material, Table S4).

Gene-based analysis

We further conducted gene-based analysis through both the burden test with MAF < 0.01 and the SKAT with MAF < 0.05. Here, we found a significant association between NOA and the gene CFTR (P = 5.77 × 10^−7) (Table 2). After conditioning on the most significant rare variant rs113857788 (MAF = 0.001 in the controls) for the gene, the significance of CFTR decreased by multiple orders of magnitude (P = 9.65 × 10^−3), indicating that the gene-based signal was driven mainly by the rare missense variant rs113857788 (p.Gln1352His) (Table 2). However, the lead variant rs113857788 was not validated in Replication I. We further excluded rs113857788 and found that the association between NOA and CFTR was abolished (P = 5.05 × 10^−3) (Table 2).

Table 2.

The significant and top non-significant association results by gene-based analysis

| Test | Chr. | Gene | Number of variants | Variants (minor allele counts^a) | P-value^b | Lead variant (P-value of single-variant test) | Condition analysis P^c | Exclusion analysis P^d |
|---|---|---|---|---|---|---|---|---|
| Burden T1 | 7q31.2 | CFTR | 14 | rs1800073(10/14), rs115545701(4/1), rs35516286(1/0), rs1800079(1/0), rs77646904(1/0), rs1800095(1/0), rs140455771(1/1), rs1800103(1/0), rs201864483(4/1), rs200321110(2/2), rs201591901(2/2), rs139729994(1/1), rs113857788(22/4), rs4148725(2/1) | 5.77 × 10^−7 | rs113857788 (4.56 × 10^−5) | 9.65 × 10^−3 | 5.05 × 10^−3 |
| | 21q22.11 | XRRA1 | 5 | rs200857751(1/0), rs148106244(2/0), rs77341304(1/0), rs78981281(0/1), rs180810179(17/3) | 1.83 × 10^−5 | | | |
| | 6p21.33 | PSORS1C2 | 4 | rs79153019(23/10), rs139472873(1/0), rs150591468(3/3), rs148095595(4/1) | 1.93 × 10^−5 | | | |
| SKAT | 7q34 | WEE2 | 2 | rs202005800(1/0), rs35683659(63/49) | 1.04 × 10^−3 | | | |
| | 6p21.33 | LY6G6F | 1 | rs17200983(48/28) | 1.11 × 10^−3 | | | |
| | 12q24.31 | SETD1B | 4 | rs61736010(58/41), rs61734124(6/4), rs58801491(36/30), rs200570124(0/1) | 1.27 × 10^−3 | | | |

^a Minor allele counts for cases/controls in the discovery stage.

^b After adjusting for the top three principal components.

^c After additionally adjusting for the lead variant in the gene.

^d After excluding the lead variant in the gene.

Discussion

The previous two GWAS of NOA have identified several independent common variants (rs7194, rs498422 and rs3129878) at 6p22.2–21.33 associated with the risk of NOA in the Chinese population, indicating that variants in this region may mediate the response to testicular microenvironmental antigens and cause testicular azoospermia through autoimmune inflammatory responses. However, none of these common variants has been proven to be causal (15,16). Of note, in the current 3-stage case–control study, we showed strong evidence of three low-frequency NOA susceptibility loci in this chromosome region. Among them, variant alleles of rs2298090 in HIST1H1E at 6p22.2 and rs200847762 in FKBPL at 6p21.33 conferred a decreased risk of NOA, while the variant allele of rs11754464 in MSH5 at 6p21.33 was associated with an increased risk of NOA.

The low-frequency variant rs2298090 (dbSNP: NC_000006.11:g.26157073A>G) at 6p22.2 is located in the sole exon of HIST1H1E (histone cluster 1, H1e) (Ensembl: ENSG00000168298). It is an A to G variant at the 515 codon, resulting in an amino acid alteration from lysine (Lys) to arginine (Arg). The protein encoded by HIST1H1E belongs to the H1 family of linker histones, which is involved in the compaction of chromatin into higher-order structures (25). As shown in Figure 2A, HIST1H1E is located within large histone gene clusters, including HIST1H1T and HIST1H1A (26). HIST1H1A, which lies 138 kb upstream of H1e, was expressed in testes and restricted to early round spermatids that belonged to the fraction of postmeiotic sperm cells (27). Moreover, testis-specific HIST1H1T, 48 kb upstream of H1e, could only be detected in the cytoplasm of mid- and late-pachytene spermatocytes (28). However, to date, no data are available regarding the function of H1e in spermatogenesis. Of note, on the basis of the online PolyPhen-2 tools (http://genetics.bwh.harvard.edu/pph2/), we found rs2298090 to be probably damaging, with a score of 0.978 (sensitivity: 0.76; specificity: 0.96), indicating that HIST1H1E p.Lys152Arg might have an impact on the structure or function of the protein H1e. Additionally, we downloaded the RNA-Seq data of 67 normal testis tissues from the GTEx (Genotype-Tissue Expression) Project and found that two of them were heterozygous for rs2298090. We tested the hypothesis that rs2298090 influences the expression of HIST1H1E by comparing the abundance of transcripts carrying the A and G alleles and found a higher expression level for the A allele (binomial test, P = 8.55 × 10^−4).
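The allele-specific expression comparison above can be illustrated with an exact two-sided binomial test against an expected 50:50 allelic ratio. This is a minimal sketch of that kind of test, not the authors' pipeline: the function names are ours, the read counts are hypothetical (the paper does not report them), and pooling reads into a single pair of counts is an assumption.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial P-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k out of n trials.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance guards against floating-point ties.
    total = sum(x for x in pmf if x <= observed * (1 + 1e-7))
    return min(1.0, total)


def allelic_imbalance_p(ref_reads, alt_reads):
    """Test whether reads at a heterozygous site deviate from a 1:1 ratio."""
    return binomial_two_sided_p(ref_reads, ref_reads + alt_reads)


# Hypothetical read counts for the A and G alleles at a heterozygous site.
a_allele_reads = 70
g_allele_reads = 40
p_value = allelic_imbalance_p(a_allele_reads, g_allele_reads)
print(f"allelic imbalance P = {p_value:.3g}")
```

A small P-value here would indicate that one allele contributes more transcripts than expected under balanced expression, which is the pattern the text reports for the A allele.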

The locus rs200847762 (dbSNP: NC_000006.11:g.32097148G>A), located in the second exon of FKBPL (FK506 binding protein like) (Ensembl: ENSG00000204315), is a missense variant encoding p.Pro137Leu. FKBPL, a member of the FK506 binding protein family, is similar to the immunophilin protein family, which plays a vital role in immunoregulation and basic cellular processes. A previous study showed that FKBPL was widely expressed in mouse testes (including in the Sertoli and Leydig cells), with expression up-regulation during sexual maturation at puberty. Moreover, evidence has shown that FKBPL can increase the transcription of androgen receptor (AR) targets in response to androgen, suggesting that protein FKBPL might be a co-chaperone for AR in the testis (29). Additionally, other genes that encode FK506 binding protein family members, such as FKBP6, FKBP52 and FKBP12, were also found to be involved in spermatogenesis and fertilization in humans and mice (30–32). Therefore, it is biologically plausible that the missense variant rs200847762 of FKBPL may be implicated in NOA etiology through the imbalances of the sex hormone metabolic pathway. However, no heterozygote of rs200847762 was found among the 67 testes available from the GTEx Project.

The other low-frequency locus rs11754464 (dbSNP: NC_000006.11:g.31723735G>A) at 6p21.33 is located in the 13th intron of MSH5 (mutS homolog 5) (Ensembl: ENSG00000204410). MSH5 is a member of the mismatch repair gene family and appears to play a role in meiosis, with expression induced during spermatogenesis between the late primary spermatocytes and the elongated spermatid phase (33). Moreover, it interacts specifically with another member of this family, MSH4 (mutS homolog 4). In mice, disruption of either one of these two genes resulted in male and female sterility due to meiotic arrest, characterized by an extended zygotene stage and greatly diminished chromosome synapsis (34,35). Likewise, a reduction in the MSH4 and MSH5 transcript concentration per spermatocyte was also observed in patients with spermatogenic failure (36). Furthermore, several previous association studies have revealed that MSH5 polymorphisms were associated with an increased risk of sperm DNA damage and male infertility (37,38). In view of the above information, we inferred that the low-frequency variant rs11754464 could affect meiotic recombination, which is a key step of spermatogenesis.

Furthermore, we report a newly identified signal for NOA risk in a gene previously undetected through GWAS, in which the association comprises multiple rare and low-frequency variants in CFTR (cystic fibrosis transmembrane conductance regulator) (Ensembl: ENSG00000001626) when using an MAF upper bound of 1% (Table 2). The glycosylated transmembrane protein encoded by CFTR is a cyclic AMP-regulated chloride channel that also regulates other transport pathways. Recently, in vivo and in vitro data have indicated that CFTR is involved in the activation of the cAMP/PKA/CREB pathway pertinent to spermatogenesis (39). However, the most significant variant rs113857788 (dbSNP: NC_000007.13:g.117304834G>C) for the gene was not replicated in our first replication phase, and the association between NOA and CFTR was abolished after excluding the lead variant.

There are several limitations to the current study. Firstly, the exome chip mainly focuses on coding regions, while non-coding regions have also been demonstrated to be functional. Sequencing studies are warranted for these non-coding regions in the future. Secondly, although this chip was developed based on whole-exome sequencing of over 12 000 individuals, most of them were Europeans and Americans, so the results might not fully reflect the situation of rare and low-frequency variants among the Chinese population. Thirdly, functional data on the identified variants would greatly enhance this study, which warrants future studies.

In conclusion, we report an instance of newly identified signals for NOA risk in genes previously undetected through GWAS on chromosome 6p22.2–6p21.33 in the Chinese population and highlight the role of low-frequency variants with a large effect during the process of spermatogenesis. This study addresses part of the missing heritability in NOA risk and provides novel insight into the etiology of NOA. Our study is also the first to utilize the exome array for NOA and serves as an extension of our previous analysis of common variants in NOA risk. Meanwhile, along with previous GWAS-identified low-penetrance loci, our newly identified high-penetrance susceptibility markers might be applied as genetic information references in the screening of at-risk populations and help to evaluate the quality of sperm, further guide targeted therapy, and, in some cases, indicate the necessity for artificial insemination or adoption.

Materials and Methods

Ethics statement

This study was approved by the institutional review board of Nanjing Medical University, China (FWA00001501) and conducted according to the Declaration of Helsinki. All participants volunteered for the study and provided informed consent before taking part in this research.

Study participants

We designed a 3-stage case–control study. The discovery stage included 962 NOA cases recruited from the Nanjing Center of Reproductive Medicine between April 2005 and January 2012 and 1348 healthy male controls from Nanjing. The first replication stage (replication I) included 714 NOA cases and 1398 healthy male controls from Jiangsu Province and Shanghai. The second replication stage (replication II) included 1454 NOA cases from Wuhan and Shenyang and 3850 healthy male controls from the matched regions, which was partly described previously (15). All infertile male subjects were genetically unrelated Han Chinese men determined to have idiopathic NOA and selected on the basis of comprehensive andrological testing, including an examination of medical history, a physical examination, semen analysis, scrotal ultrasound, hormone analysis, karyotyping and Y chromosome microdeletion screening. Patients were excluded from the study if they had a history of cryptorchidism, vascular trauma, orchitis, obstruction of the vas deferens, vasectomy, abnormalities in chromosome number or microdeletions of the azoospermia factor region on the Y chromosome. Semen analysis for sperm concentration, motility and morphology was performed following the World Health Organization (WHO) criteria (1999). Subjects with NOA had no detectable sperm in the ejaculate after evaluation of the centrifuged pellet. To differentiate from obstructive azoospermia, only idiopathic azoospermic patients with small and soft testes, normal fructose and neutral alpha glucosidase in seminal plasma were included in the study. Each individual was examined twice to ensure the reliability of the diagnosis, and the absence of spermatozoa from both replicate samples was taken to indicate azoospermia. 
All the control subjects were also unrelated ethnic Han Chinese who had fathered one or more healthy children without assisted reproductive technologies and were frequency-matched to the cases on the basis of age and area of residence in each stage. For each participant, 5 ml of whole blood was obtained to extract genomic DNA for further genotyping analysis.

Genotyping and quality control in the discovery stage

In the discovery stage, 962 NOA cases and 1348 male controls were genotyped using the Illumina HumanExome Beadchip. Genotypes were called using the Illumina GenomeStudio software, and cluster plots were manually edited as necessary. A systematic quality control was applied on the raw genotyping data to filter unqualified genetic variants and samples before further association analysis (Supplementary Material, Fig. S2). Of the 2310 samples, 23 cases and 13 controls were removed because they (i) had an overall genotyping rate of <95%; (ii) were duplicates or showed familial relationships (PI_HAT > 0.25); or (iii) had an extreme heterozygosity rate, more than six standard deviations from the mean. We detected population outliers and stratification using a principal component analysis-based method. No population outlier was identified, and the cases and controls were almost genetically matched, with the genomic control inflation factor (λ) equal to 1.037 (Supplementary Material, Figs S1 and S3).
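The three sample-level exclusion criteria can be sketched as a simple filter. This is an illustration under stated assumptions, not the authors' code: the record fields (`call_rate`, `max_pi_hat`, `het_rate`) and the toy cohort are hypothetical, and relatedness is simplified to a per-sample maximum PI_HAT rather than a pairwise decision about which member of a related pair to drop.

```python
from statistics import mean, stdev


def failing_samples(samples, min_call_rate=0.95, max_pi_hat=0.25, n_sd=6.0):
    """Map each failing sample id to the list of QC criteria it violates."""
    het_rates = [s["het_rate"] for s in samples]
    mu, sd = mean(het_rates), stdev(het_rates)
    failed = {}
    for s in samples:
        reasons = []
        if s["call_rate"] < min_call_rate:           # criterion (i)
            reasons.append("call_rate")
        if s["max_pi_hat"] > max_pi_hat:             # criterion (ii)
            reasons.append("relatedness")
        if abs(s["het_rate"] - mu) > n_sd * sd:      # criterion (iii)
            reasons.append("heterozygosity")
        if reasons:
            failed[s["id"]] = reasons
    return failed


# Hypothetical toy cohort: 60 ordinary samples plus one heterozygosity outlier.
samples = [
    {"id": f"S{i}", "call_rate": 0.99, "max_pi_hat": 0.02,
     "het_rate": 0.30 if i % 2 == 0 else 0.31}
    for i in range(60)
]
samples[0]["call_rate"] = 0.90    # fails the genotyping-rate criterion
samples[1]["max_pi_hat"] = 0.50   # fails the relatedness criterion
samples.append({"id": "S60", "call_rate": 0.99, "max_pi_hat": 0.02,
                "het_rate": 0.60})

qc_failures = failing_samples(samples)
print(qc_failures)
```

Note that a six-standard-deviation rule computed on the full cohort can only flag a sample when the cohort is reasonably large, which is why the toy data use several dozen samples.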

Among the 247 870 variants captured by the exome chip, 207 745 variants were excluded for the following reasons: (i) duplicate variants (n = 831); (ii) did not map to autosomal chromosomes (n = 5574); (iii) monomorphic in our study subjects (n = 176 185); (iv) had a call rate of <95% (n = 40); (v) had a genotype distribution in the controls that deviated from that expected with Hardy–Weinberg equilibrium (P < 1 × 10^−3) (n = 18); or (vi) MAF among controls ≥0.05 (n = 25 097). After quality control processing, a total of 40 125 uncommon variants, 939 NOA cases and 1335 healthy male controls were included for further analysis.
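The variant accounting above, and the Bonferroni thresholds that follow from it, can be reproduced with a few lines of arithmetic. In this sketch the dictionary keys are our own labels; the gene counts (6919 for the burden test and 7979 for SKAT-O) are taken from the Statistical analysis section, and dividing 0.05 by their sum is our reading of how the gene-based threshold was obtained.

```python
total_variants = 247_870

# Exclusion counts (i)-(vi) as listed in the text.
excluded = {
    "duplicate": 831,
    "non_autosomal": 5_574,
    "monomorphic": 176_185,
    "call_rate_below_95pct": 40,
    "hwe_p_below_1e-3": 18,
    "control_maf_at_least_0.05": 25_097,
}

n_excluded = sum(excluded.values())
n_retained = total_variants - n_excluded

# Bonferroni-corrected significance levels at a family-wise alpha of 0.05.
alpha = 0.05
single_variant_threshold = alpha / n_retained
gene_based_threshold = alpha / (6_919 + 7_979)  # burden + SKAT-O gene tests

print(n_excluded, n_retained)
# prints: 207745 40125
print(f"{single_variant_threshold:.2e} {gene_based_threshold:.2e}")
# prints: 1.25e-06 3.36e-06
```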

Genotyping in replication stages

Associations were assessed in an additive model using logistic regression analyses (Fig. 1). We next selected promising variants for further genotyping in replication stages according to the following criteria: (i) single-variant association P ≤ 1.0 × 10^−4; (ii) clear genotyping clusters upon visual inspection of cluster plot; and (iii) only one variant was selected when multiple variants showed strong linkage disequilibrium (LD) (r^2 ≥ 0.8). Using this screening process, we selected seven variants for the first replication stage (Supplementary Material, Table S1). Significantly associated variants (P < 0.05) in the first replication stage (Replication I) were further genotyped in the second replication stage (Replication II). The genotyping analyses for the two replication sample sets were performed using the TaqMan allelic discrimination assay on the ABI 7900 system (Applied Biosystems, Foster City, CA, USA). Detailed information regarding the primers and probes is shown in Supplementary Material, Table S5. The genotypes were called using the SDS 2.3 Allelic Discrimination Software (Applied Biosystems). A series of methods were used to control the quality of genotyping: (i) case and control samples were mixed on each plate; (ii) genotyping was carried out without knowing the case or control status; (iii) two water controls were used in each 384-well plate as blank controls, and four samples with known genotypes confirmed by Sanger Sequencing were used in each plate as positive internal standards; and (iv) 5% of the samples were randomly selected for repeat genotyping.
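The three selection criteria for replication can be sketched as a filter followed by LD pruning. This is only an illustration: the variant records and pairwise r^2 values are hypothetical, and keeping the variant with the smallest P-value from each LD group is an assumption, since the text states only that one variant per group was retained.

```python
def select_for_replication(variants, ld_r2, p_max=1.0e-4, r2_min=0.8):
    """Return ids of variants passing criteria (i)-(iii).

    variants: list of dicts with keys "id", "p" and "clear_clusters".
    ld_r2: dict mapping frozenset({id_a, id_b}) to pairwise r^2.
    """
    # Criteria (i) and (ii): P-value cutoff and clean cluster plots.
    candidates = sorted(
        (v for v in variants if v["p"] <= p_max and v["clear_clusters"]),
        key=lambda v: v["p"],
    )
    # Criterion (iii): greedy pruning, most significant variant first.
    selected = []
    for v in candidates:
        in_strong_ld = any(
            ld_r2.get(frozenset((v["id"], s["id"])), 0.0) >= r2_min
            for s in selected
        )
        if not in_strong_ld:
            selected.append(v)
    return [v["id"] for v in selected]


# Hypothetical discovery-stage results.
variants = [
    {"id": "varA", "p": 2e-5, "clear_clusters": True},
    {"id": "varB", "p": 5e-5, "clear_clusters": True},   # in LD with varA
    {"id": "varC", "p": 8e-5, "clear_clusters": False},  # poor clusters
    {"id": "varD", "p": 3e-4, "clear_clusters": True},   # P too large
    {"id": "varE", "p": 9e-5, "clear_clusters": True},
]
ld_r2 = {frozenset(("varA", "varB")): 0.9}

chosen = select_for_replication(variants, ld_r2)
print(chosen)
# prints: ['varA', 'varE']
```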

Statistical analysis

Single-variant association analysis was conducted in an additive genetic model using a logistic regression analysis implemented in PLINK (40). In the discovery stage, ancestry and population structure were evaluated by principal component analysis using EIGENSOFT based on the 4604 autosomal ancestry informative markers included in the exome chip (41). The top three principal components of ancestry were included as covariates in the logistic regression analysis when estimating odds ratios (ORs) and 95% confidence intervals (95% CIs). The results were validated using the mixed linear model corrected for kinship matrices (42). Meta-analysis was performed using a random-effects model when heterogeneity was detected among studies, i.e. when the P-value for the heterogeneity test was <0.05; otherwise, a fixed-effects model was used. We also performed two gene-based tests based on rare and low-frequency variants: a simple burden test with an MAF cutoff <1% (burden T1) (43) and a sequence kernel association test (SKAT) with an MAF cutoff <5% (44), which were implemented in the sequence kernel association optimal (SKAT-O) test (45). We defined statistical significance using Bonferroni correction and set the significance levels at 1.25 × 10^−6 and 3.36 × 10^−6 for single-variant analysis (0.05/40 125 variants tested) and gene-based analysis (6919 genes in the burden test and 7979 genes in the SKAT-O test), respectively. The Manhattan plot and quantile–quantile plot were generated using R 2.3.1, and regional plots were created using LocusZoom (46). We annotated variants according to GENCODE version 7 coding transcripts (47), dbNSFP v2.0 (48) or documentation files obtained from the Illumina Product Support Files at ftp://ussd-ftp.illumina.com/downloads/ProductFiles/HumanExome-12/HumanExome-12v1-2_A.annotated.txt. The full set of summary association statistics (P-value and direction of effect) for the discovery sample is available in the Supplementary Material, Data.
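The fixed-effects meta-analysis and heterogeneity test described above can be approximated from the published per-stage odds ratios. The sketch below uses inverse-variance weighting on the log-OR scale and Cochran's Q; these are standard choices that we assume here, as the text does not name the exact estimator. Standard errors are back-calculated from the rounded 95% CIs in Table 1, so results are approximate, and the closed-form P-value for Q holds only for three studies (two degrees of freedom). The random-effects branch is not implemented.

```python
from math import exp, log, sqrt

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Recover the log-OR and its standard error from an OR and 95% CI."""
    return log(odds_ratio), (log(ci_high) - log(ci_low)) / (2 * Z_95)


def fixed_effect_meta(estimates):
    """Inverse-variance fixed-effects pooling of (OR, CI low, CI high) tuples.

    Returns the pooled OR, its 95% CI bounds and Cochran's Q.
    """
    betas, weights = [], []
    for odds_ratio, ci_low, ci_high in estimates:
        beta, se = log_or_and_se(odds_ratio, ci_low, ci_high)
        betas.append(beta)
        weights.append(1.0 / se**2)
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    return (exp(pooled), exp(pooled - Z_95 * pooled_se),
            exp(pooled + Z_95 * pooled_se), q)


# Per-stage estimates for rs2298090, copied from Table 1.
stages = [(0.26, 0.14, 0.49), (0.28, 0.17, 0.45), (0.33, 0.18, 0.62)]
pooled_or, ci_low, ci_high, q_stat = fixed_effect_meta(stages)

# With three studies Q has 2 degrees of freedom, where the chi-square
# survival function reduces to exp(-Q / 2).
p_heterogeneity = exp(-q_stat / 2)
model = "random-effects" if p_heterogeneity < 0.05 else "fixed-effects"

print(f"pooled OR = {pooled_or:.2f} ({ci_low:.2f}-{ci_high:.2f})")
print(f"P for heterogeneity = {p_heterogeneity:.2f}, use {model} model")
```

Under these assumptions the pooled estimate and heterogeneity P-value come out close to the Meta-analysis row reported for rs2298090 in Table 1, and the decision rule selects the fixed-effects model.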

Supplementary Material

Supplementary Material is available at HMG online.

Funding

This work was supported by the National Key Basic Research Program Grant (2013CB911400, 2011CB944300), the National Science Foundation for Distinguished Young Scholars of China (81225020), the Foundation of Jiangsu Province for Distinguished Young Scholars (BK2012042), the Jiangsu Natural Science Foundation (BK20130060), the National Program for Support of Top-notch Young Professionals from the Organization Department of the CPC Central Committee, the Jiangsu Specially Appointed Professor Project, Jiangsu Province Clinical Science and Technology Projects (BL2012008), the Qing Lan Project of Jiangsu Province, the Priority Academic Program for the Development of Jiangsu Higher Education Institutions (Public Health and Preventive Medicine), the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (12KJB330003) and Recovery Medical Science Foundation.

Acknowledgements

We gratefully acknowledge all participants who contributed to this work.

Conflict of Interest statement. None declared.

References

  • 1.de Kretser D.M. (1997) Male infertility. Lancet, 349, 787–790. [DOI] [PubMed] [Google Scholar]
  • 2.Hong K., Xu Q.Q., Zhao Y.P., Gu Y.Q., Jiang H., Wang X.F., Zhu J.C. (2011) Andrology in China: current status and 10 years’ progress. Asian. J. Androl., 13, 512–518. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Hirsh A. (2003) Male subfertility. BMJ, 327, 669–672.
  • 4.Ferlin A., Arredi B., Foresta C. (2006) Genetic causes of male infertility. Reprod. Toxicol., 22, 133–141.
  • 5.Huynh T., Mollard R., Trounson A. (2002) Selected genetic factors associated with male infertility. Hum. Reprod. Update, 8, 183–198.
  • 6.Maduro M.R., Lamb D.J. (2002) Understanding new genetics of male infertility. J. Urol., 168, 2197–2205.
  • 7.Ferlin A., Raicu F., Gatta V., Zuccarello D., Palka G., Foresta C. (2007) Male infertility: role of genetic background. Reprod. Biomed. Online, 14, 734–745.
  • 8.Dohle G.R., Halley D.J., Van Hemel J.O., van den Ouwel A.M., Pieters M.H., Weber R.F., Govaerts L.C. (2002) Genetic risk factors in infertile men with severe oligozoospermia and azoospermia. Hum. Reprod., 17, 13–16.
  • 9.O'Flynn O'Brien K.L., Varghese A.C., Agarwal A. (2010) The genetic causes of male factor infertility: a review. Fertil. Steril., 93, 1–12.
  • 10.Bashamboo A., Ferraz-de-Souza B., Lourenco D., Lin L., Sebire N.J., Montjean D., Bignon-Topalovic J., Mandelbaum J., Siffroi J.P., Christin-Maitre S. et al. (2010) Human male infertility associated with mutations in NR5A1 encoding steroidogenic factor 1. Am. J. Hum. Genet., 87, 505–512.
  • 11.Hu Z., Wu C., Shi Y., Guo H., Zhao X., Yin Z., Yang L., Dai J., Hu L., Tan W. et al. (2011) A genome-wide association study identifies two new lung cancer susceptibility loci at 13q12.12 and 22q12.2 in Han Chinese. Nat. Genet., 43, 792–796.
  • 12.Sladek R., Rocheleau G., Rung J., Dina C., Shen L., Serre D., Boutin P., Vincent D., Belisle A., Hadjadj S. et al. (2007) A genome-wide association study identifies novel risk loci for type 2 diabetes. Nature, 445, 881–885.
  • 13.Adeyemo A., Gerry N., Chen G., Herbert A., Doumatey A., Huang H., Zhou J., Lashley K., Chen Y., Christman M. et al. (2009) A genome-wide association study of hypertension and blood pressure in African Americans. PLoS Genet., 5, e1000564.
  • 14.Hu Z., Xia Y., Guo X., Dai J., Li H., Hu H., Jiang Y., Lu F., Wu Y., Yang X. et al. (2012) A genome-wide association study in Chinese men identifies three risk loci for non-obstructive azoospermia. Nat. Genet., 44, 183–186.
  • 15.Hu Z., Li Z., Yu J., Tong C., Lin Y., Guo X., Lu F., Dong J., Xia Y., Wen Y. et al. (2014) Association analysis identifies new risk loci for non-obstructive azoospermia in Chinese men. Nat. Commun., 5, 3857.
  • 16.Zhao H., Xu J., Zhang H., Sun J., Sun Y., Wang Z., Liu J., Ding Q., Lu S., Shi R. et al. (2012) A genome-wide association study reveals that variants within the HLA region are associated with risk for nonobstructive azoospermia. Am. J. Hum. Genet., 90, 900–906.
  • 17.Manolio T.A., Collins F.S., Cox N.J., Goldstein D.B., Hindorff L.A., Hunter D.J., McCarthy M.I., Ramos E.M., Cardon L.R., Chakravarti A. et al. (2009) Finding the missing heritability of complex diseases. Nature, 461, 747–753.
  • 18.Gibson G. (2011) Rare and common variants: twenty arguments. Nat. Rev. Genet., 13, 135–145.
  • 19.Tennessen J.A., Bigham A.W., O'Connor T.D., Fu W., Kenny E.E., Gravel S., McGee S., Do R., Liu X., Jun G. et al. (2012) Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science, 337, 64–69.
  • 20.Eichler E.E., Flint J., Gibson G., Kong A., Leal S.M., Moore J.H., Nadeau J.H. (2010) Missing heritability and strategies for finding the underlying causes of complex disease. Nat. Rev. Genet., 11, 446–450.
  • 21.Kozlitina J., Smagris E., Stender S., Nordestgaard B.G., Zhou H.H., Tybjaerg-Hansen A., Vogt T.F., Hobbs H.H., Cohen J.C. (2014) Exome-wide association study identifies a TM6SF2 variant that confers susceptibility to nonalcoholic fatty liver disease. Nat. Genet., 46, 352–356.
  • 22.Huyghe J.R., Jackson A.U., Fogarty M.P., Buchkovich M.L., Stancakova A., Stringham H.M., Sim X., Yang L., Fuchsberger C., Cederberg H. et al. (2013) Exome array analysis identifies new loci and low-frequency variants influencing insulin processing and secretion. Nat. Genet., 45, 197–201.
  • 23.Auer P.L., Teumer A., Schick U., O'Shaughnessy A., Lo K.S., Chami N., Carlson C., de Denus S., Dube M.P., Haessler J. et al. (2014) Rare and low-frequency coding variants in CXCR2 and other genes are associated with hematological traits. Nat. Genet., 46, 629–634.
  • 24.Peloso G.M., Auer P.L., Bis J.C., Voorman A., Morrison A.C., Stitziel N.O., Brody J.A., Khetarpal S.A., Crosby J.R., Fornage M. et al. (2014) Association of low-frequency and rare coding-sequence variants with blood lipids and coronary heart disease in 56,000 whites and blacks. Am. J. Hum. Genet., 94, 223–232.
  • 25.Paranjape S.M., Kamakaka R.T., Kadonaga J.T. (1994) Role of chromatin structure in the regulation of transcription by RNA polymerase II. Annu. Rev. Biochem., 63, 265–297.
  • 26.Happel N., Doenecke D. (2009) Histone H1 and its isoforms: contribution to chromatin structure and function. Gene, 431, 1–12.
  • 27.Burfeind P., Hoyer-Fender S., Doenecke D., Hochhuth C., Engel W. (1994) Expression and chromosomal mapping of the gene encoding the human histone H1.1. Hum. Genet., 94, 633–639.
  • 28.Steger K., Klonisch T., Gavenis K., Drabent B., Doenecke D., Bergmann M. (1998) Expression of mRNA and protein of nucleoproteins during human spermiogenesis. Mol. Hum. Reprod., 4, 939–945.
  • 29.Sunnotel O., Hiripi L., Lagan K., McDaid J.R., De Leon J.M., Miyagawa Y., Crowe H., Kaluskar S., Ward M., Scullion C. et al. (2010) Alterations in the steroid hormone receptor co-chaperone FKBPL are associated with male infertility: a case-control study. Reprod. Biol. Endocrinol., 8, 22.
  • 30.Hong J., Kim S.T., Tranguch S., Smith D.F., Dey S.K. (2007) Deficiency of co-chaperone immunophilin FKBP52 compromises sperm fertilizing capacity. Reproduction, 133, 395–403.
  • 31.Walensky L.D., Dawson T.M., Steiner J.P., Sabatini D.M., Suarez J.D., Klinefelter G.R., Snyder S.H. (1998) The 12 kD FK 506 binding protein FKBP12 is released in the male reproductive tract and stimulates sperm motility. Mol. Med., 4, 502–514.
  • 32.Raudsepp T., McCue M.E., Das P.J., Dobson L., Vishnoi M., Fritz K.L., Schaefer R., Rendahl A.K., Derr J.N., Love C.C. et al. (2012) Genome-wide association study implicates testis-sperm specific FKBP6 as a susceptibility locus for impaired acrosome reaction in stallions. PLoS Genet., 8, e1003139.
  • 33.Bocker T., Barusevicius A., Snowden T., Rasio D., Guerrette S., Robbins D., Schmidt C., Burczak J., Croce C.M., Copeland T. et al. (1999) hMSH5: a human MutS homologue that forms a novel heterodimer with hMSH4 and is expressed during spermatogenesis. Cancer Res., 59, 816–822.
  • 34.Kneitz B., Cohen P.E., Avdievich E., Zhu L., Kane M.F., Hou H. Jr., Kolodner R.D., Kucherlapati R., Pollard J.W., Edelmann W. (2000) MutS homolog 4 localization to meiotic chromosomes is required for chromosome pairing during meiosis in male and female mice. Genes Dev., 14, 1085–1097.
  • 35.de Vries S.S., Baart E.B., Dekker M., Siezen A., de Rooij D.G., de Boer P., te Riele H. (1999) Mouse MutS-like protein Msh5 is required for proper chromosome synapsis in male and female meiosis. Genes Dev., 13, 523–531.
  • 36.Terribas E., Bonache S., Garcia-Arevalo M., Sanchez J., Franco E., Bassas L., Larriba S. (2010) Changes in the expression profile of the meiosis-involved mismatch repair genes in impaired human spermatogenesis. J. Androl., 31, 346–357.
  • 37.Ji G., Long Y., Zhou Y., Huang C., Gu A., Wang X. (2012) Common variants in mismatch repair genes associated with increased risk of sperm DNA damage and male infertility. BMC Med., 10, 49.
  • 38.Xu K., Lu T., Zhou H., Bai L., Xiang Y. (2010) The role of MSH5 C85T and MLH3 C2531T polymorphisms in the risk of male infertility with azoospermia or severe oligozoospermia. Clin. Chim. Acta, 411, 49–52.
  • 39.Xu W.M., Chen J., Chen H., Diao R.Y., Fok K.L., Dong J.D., Sun T.T., Chen W.Y., Yu M.K., Zhang X.H. et al. (2011) Defective CFTR-dependent CREB activation results in impaired spermatogenesis and azoospermia. PLoS One, 6, e19120.
  • 40.Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M.A., Bender D., Maller J., Sklar P., de Bakker P.I., Daly M.J. et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet., 81, 559–575.
  • 41.Price A.L., Patterson N.J., Plenge R.M., Weinblatt M.E., Shadick N.A., Reich D. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet., 38, 904–909.
  • 42.Kang H.M., Sul J.H., Service S.K., Zaitlen N.A., Kong S.Y., Freimer N.B., Sabatti C., Eskin E. (2010) Variance component model to account for sample structure in genome-wide association studies. Nat. Genet., 42, 348–354.
  • 43.Li B., Leal S.M. (2008) Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am. J. Hum. Genet., 83, 311–321.
  • 44.Wu M.C., Lee S., Cai T., Li Y., Boehnke M., Lin X. (2011) Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet., 89, 82–93.
  • 45.Lee S., Wu M.C., Lin X. (2012) Optimal tests for rare variant effects in sequencing association studies. Biostatistics, 13, 762–775.
  • 46.Pruim R.J., Welch R.P., Sanna S., Teslovich T.M., Chines P.S., Gliedt T.P., Boehnke M., Abecasis G.R., Willer C.J. (2010) LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics, 26, 2336–2337.
  • 47.Harrow J., Frankish A., Gonzalez J.M., Tapanari E., Diekhans M., Kokocinski F., Aken B.L., Barrell D., Zadissa A., Searle S. et al. (2012) GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res., 22, 1760–1774.
  • 48.Liu X., Jian X., Boerwinkle E. (2013) dbNSFP v2.0: a database of human non-synonymous SNVs and their functional predictions and annotations. Hum. Mutat., 34, E2393–E2402.

Articles from Human Molecular Genetics are provided here courtesy of Oxford University Press
