Abstract
Coevolution of genes that encode interacting proteins expressed on the surfaces of sperm and eggs can lead to variation in reproductive compatibility between mates and reproductive isolation between members of different species. Previous studies in mice and other mammals have focused in particular on evidence for positive or diversifying selection that shapes the evolution of genes that encode sperm-binding proteins expressed in the egg coat or zona pellucida (ZP). By fitting phylogenetic models of codon evolution to data from the 1000 Genomes Project, we identified candidate sites evolving under diversifying selection in the human genes ZP3 and ZP2. We also identified one candidate site under positive selection in C4BPA, which encodes a repetitive protein similar to the mouse protein ZP3R that is expressed in the sperm head and binds to the ZP at fertilization. Results from several additional analyses that applied population genetic models to the same data were consistent with the hypothesis of selection on those candidate sites leading to coevolution of sperm- and egg-expressed genes. By contrast, we found no candidate sites under selection in a fourth gene (ZP1) that encodes an egg coat structural protein not directly involved in sperm binding. Finally, we found that two of the candidate sites (in C4BPA and ZP2) were correlated with variation in family size and birth rate among Hutterite couples, and those two candidate sites were also in linkage disequilibrium in the same Hutterite study population. All of these lines of evidence are consistent with predictions from a previously proposed hypothesis of balancing selection on epistatic interactions between C4BPA and ZP3 at fertilization that lead to the evolution of co-adapted allele pairs. Such patterns also suggest specific molecular traits that may be associated with both natural reproductive variation and clinical infertility.
Keywords: Fertilization, Zona pellucida, Linkage disequilibrium, Epistasis, Balancing selection
Introduction
Proteins expressed on the surfaces of gametes mediate several types of cellular interactions at the time of fertilization, including chemoattraction of swimming sperm towards the egg, sperm binding to the egg extracellular coat, lysis or dissolution of the egg coat, and fusion of the gamete plasma membranes. Because those interactions may directly affect reproductive success, the genes encoding such proteins are sensitive to selection and often show evidence of diversifying or positive selection leading to high rates of amino acid substitution (Swanson et al., 2001; Swanson & Vacquier, 2002; Hamm et al., 2007; Clark et al., 2009; Vicens & Roldan, 2014). Different adaptive solutions to selection acting on variation in gamete attraction, binding, or fusion can lead to significant population differentiation within species (e.g., Hart et al., 2014), and are implicated in the origin or maintenance of reproductive isolation between species (Coyne & Orr, 2004; Palumbi, 2009; Lessios, 2011; Gavrilets, 2014). Understanding the nature of selection on mammalian genes that encode gamete-binding proteins may help to characterize the causes of natural variation in reproductive success, the potential for development of contraceptive vaccines (Lou et al., 1995; Delves, Lund & Roitt, 2002; Meeusen et al., 2007), or the targets of clinical treatment for infertility (Männikkö et al., 2005; Huang et al., 2014).
The human genome includes three functional members of the ZP domain gene family (ZP1, ZP2, and ZP3) that are shared with the mouse genome and encode the peptide portion of three glycoproteins that form the thick, fibrous extracellular coat or zona pellucida surrounding the mammalian egg (Hirohashi et al., 2008; Burnett et al., 2011; Avella, Xiong & Dean, 2013). The mouse genes that encode proteins known to interact with the egg coat (see Turner & Hoekstra, 2008a) include Zp3r (also called sp56; Buffone et al., 2008; Wassarman, 2009), which is expressed in sperm, localized to the acrosomal vesicle in the sperm head, and binds to the mouse egg coat (Buffone et al., 2008; Wassarman, 2009). The mouse Zp3r gene is homologous and syntenic with the human gene encoding the complement component four binding protein alpha (C4BPA; OMIM 120830, 1q32.2). In experimental studies, mouse proteins in the sperm head (including ZP3R) interact with a sperm-binding domain in the C-terminal region of ZP3 (Chen, Litscher & Wassarman, 1998; Williams et al., 2006) and another in the N-terminal region of ZP2 (Baibakov et al., 2012; Avella, Xiong & Dean, 2013), but the role and relative significance of ZP3 versus ZP2 in the specificity of mouse sperm binding is controversial (Litscher, Williams & Wassarman, 2009; Baibakov et al., 2012; Muro et al., 2012; Avella, Xiong & Dean, 2013).
Comparative evolutionary studies revealed that Zp3 has evolved under diversifying selection among species in some mammalian clades, and that the signal of positive selection in mouse Zp3 is strongest in the exon that encodes the C-terminal sperm-binding domain (Swanson et al., 2001; Turner & Hoekstra, 2006; Turner & Hoekstra, 2008b). Unfortunately, there is little complementary information about the evolution of Zp3r/sp56. In the genomes of rodents, Zp3r occurs with C4bpa in a cluster of genes called the regulator of complement activator (RCA) that encode many of the proteins involved in innate immunity (Mayilyan, 2012). Evolutionary relationships among genes related to Zp3r and C4bpa are complicated by gene duplications, functional divergence, or loss of function leading to pseudogenes (e.g., Rodríguez de Córdoba et al., 1994; De Villena & Rodríguez de Córdoba, 1995; Hillarp et al., 1997; Krushkal, Bat & Gigli, 2000). Comparative analyses show strong evidence of diversifying selection acting on many sites in C4BPA among humans and other great apes (Cagliani et al., 2016), but the evolution of Zp3r/sp56 among mice and other rodents has not been studied (see Morgan et al., 2010; Morgan et al., 2017). In general, it has been difficult to determine which sperm-expressed genes mediate the specificity or compatibility of sperm binding to the mammalian egg coat (Turner & Hoekstra, 2008a). The role of particular sperm-expressed genes or their epistatic interactions with ZP genes is controversial in mouse studies, which indicate that ZP3R binds to the egg coat but is not required for male fertility (Wassarman, 2009; Muro et al., 2012; Claw, George & Swanson, 2014; Wilburn & Swanson, 2016; Wang et al., 2017). The role of other genes in the RCA cluster related to Zp3r (including C4BPA) in fertilization has not been studied directly in humans or other primates.
Important evidence for selection acting on epistatic interactions between genes has come from measures of the strength of gametic disequilibrium, or associations between polymorphisms at physically unlinked loci that encode networks of interacting gene products (Sinervo & Svensson, 2002; Single et al., 2007; Qian, Zhou & Tang, 2015). Because such associations are expected to break down at meiosis, their persistence within populations can be ascribed to ongoing selection that favours some functionally advantageous combinations of allele pairs. For genes involved in gamete recognition, such allele pairs may be maintained by a fitness advantage at fertilization (Palumbi, 1999; Clark et al., 2009; Buzbas, Joyce & Rosenberg, 2011). Rohlfs, Swanson & Weir (2010) analyzed nonrandom associations among single nucleotide polymorphisms (SNPs) in 1,504 individuals from the 1958 British Birth Cohort study (Wellcome Trust Case Control Consortium, 2007). That analysis used two measures of gametic disequilibrium (or linkage disequilibrium, LD) between genotypes at SNPs in the genomic region near human C4BPA and near ZP3 (OMIM 182889, 7q11.23). Permutation tests and simulations of selection acting on tag SNPs suggested that some of the pairwise associations between two-locus genotypes were significantly stronger than expected, and might be accounted for by selection favouring particular combinations of C4BPA and ZP3 genotypes at fertilization. Rohlfs, Swanson & Weir (2010) proposed an evolutionary scenario in which sexual conflict between mates over the optimal rate of sperm-egg binding (fast for males, slow for females) leads to cyclical episodes of coevolution between male and female molecular adaptations. In this scenario, selection on male adaptations for enhancing fertilization rates (at C4BPA) is followed by selection on female countermeasures for reducing rates of sperm-egg contact and mitigating the lethal effects of polyspermy on offspring (at ZP3); both parts of this cycle are driven by negative frequency-dependent selection (a form of balancing selection); and the process results in the accumulation of coadapted pairs of C4BPA and ZP3 alleles in LD.
The scenario and mechanism for human C4BPA-ZP3 coevolution proposed by Rohlfs, Swanson & Weir (2010) can account for polymorphism in both genes and LD between them. That scenario is consistent with other examples of temporal or cyclical variation in selection on genes encoding gamete-binding proteins (Levitan, 2012; Wilburn & Swanson, 2016), the evidence for widespread balancing selection across the human genome (Andrés et al., 2009), the proposed role of such balancing selection in the maintenance of Zp3 polymorphisms in mouse populations in nature (Turner & Hoekstra, 2008b), and the evidence for ZP3-ZP3R functional interactions at the protein and cellular level during experimental fertilizations of mouse eggs in the lab (Litscher, Williams & Wassarman, 2009). This scenario is also consistent with the general predictions from analogous models of balancing selection acting on cooperating or cheating genotypes in systems of social conflict (similar to systems of sexual conflict), in which negative frequency dependence leads to an evolutionary arms race between adaptations for cooperation or for cheating, maintains polymorphism for both traits, and creates LD between them (Brockhurst et al., 2014; Ostrowski et al., 2015).
The specific targets of selection on functional features of sperm-egg binding that could lead to coevolution of C4BPA and ZP3 are expected to be nonsynonymous nucleotide polymorphisms that affect amino acid sequence, protein structure, and the functional interaction between proteins at fertilization. However, few of the tag SNPs analyzed by Rohlfs, Swanson & Weir (2010) were in exons: none of the seven tag SNPs in ZP3 were in coding sequences; two of nine tag SNPs in C4BPA were in coding sequences, but only one of these (rs4844573) was a nonsynonymous SNP encoding a C4BPA amino acid difference. As a result, most of these pairwise associations between SNPs may be indirectly implicated in C4BPA-ZP3 coevolution, but are probably not the direct targets of selection acting on functional variation between coding sequences expressed on the surfaces of gametes.
Here we use data and methods that complement and expand on those of Rohlfs, Swanson & Weir (2010) to revisit the evidence for diversifying selection (and in particular evidence for the effects of balancing selection) on and coevolution among human genes involved in sperm binding to the ZP. Using previously published genome sequences, we focused on coding sequence variation in C4BPA and ZP3, and in the ZP2 gene (OMIM 182888, 16p12.2) that is also implicated in sperm-egg binding (Baibakov et al., 2012; Avella, Xiong & Dean, 2013). As a negative control, we also analyzed ZP1 (OMIM 615774, 11q12.2), which encodes a structural protein that interacts with ZP3 and ZP2 to form the fibrous matrix of the zona pellucida but is not known to interact directly with sperm. We found one candidate site each under episodic diversifying selection in C4BPA, ZP3, and ZP2 (but not ZP1) in human populations. Other patterns of variation at those three candidate sites were consistent with the hypothesis of balancing selection favouring polymorphisms in those genes. We also found evidence of significant covariation between two of the candidate sites and reproductive success in a previously well-studied human founder population. Altogether the results were consistent with the hypothesis of some form of balancing selection leading to coevolution among these gamete-recognition genes (as proposed by Rohlfs, Swanson & Weir, 2010, and indicate targets of ongoing selection that affect human fertility.
Methods and Results
Overview
Our motivation for this study came from previous analyses of sperm- and egg-expressed genes with epistatic interactions at fertilization in sea stars, in which we found evidence of selection for amino acid differences between alleles from diverging conspecific populations (Sunday & Hart, 2013) and from closely-related species (Popovic et al., 2014; Patiño et al., 2016), and strong effects of those amino acid differences on sperm-egg compatibility and fertilization rates in laboratory experiments (Hart et al., 2014). Our goals in this study were similar: to identify specific sites under selection that could account for previously discovered LD between human C4BPA and ZP3 (Rohlfs, Swanson & Weir, 2010), extend those analyses to include human ZP2, and identify possible phenotypic effects of those polymorphisms on reproductive success. For those reasons, we used a work flow similar to our previous studies of sea star genes that mediate sperm-egg binding and reproductive success.
We first used a phylogenetic model of codon evolution designed to identify individual codons that have experienced episodic or diversifying selection by the estimation of relative rates of nonsynonymous (dN) and synonymous nucleotide changes (dS) mapped onto a phylogeny of relationships among sampled haplotypes. The effects of selection are estimated as the ratio of those relative rates (ω = dN∕dS): selection favouring the accumulation of amino acid differences at rates higher than expected from neutral processes (mutation and genetic drift) alone is inferred for codons with ω > 1. We carried out this analysis first because it provides a powerful method to identify specific candidate sites under selection, although such codon models were originally developed for comparative analyses of coding sequence divergence between reproductively isolated species (Kryazhimskiy & Plotkin, 2008; Mugal, Wolf & Kaj, 2014). These sensitive models of codon evolution can identify small numbers of candidate sites under diversifying selection, and distinguish them from a background of overall purifying selection (ω < 1) or neutral evolution (ω∼1) at other sites (Murrell et al., 2012). In these models, parallel nonsynonymous changes at several times in the evolutionary history of the gene or in several lineages of the gene tree make up much of the signal of high ω values associated with specific codons under episodic or diversifying positive selection. When applied to conspecific populations, high ω values might indicate ongoing diversifying (or balancing) selection. Important caveats about that application and interpretation are summarized below.
Second, we used population genetic methods to explore variation at candidate sites that might be consistent with the scenario of Rohlfs, Swanson & Weir (2010) that these genes evolve under selection for sperm-egg compatibility mediated by sexual conflicts of interest. We used several population genetic tests for selection, and we examined patterns of haplotype frequency variation for each gene. These population genetic analyses can be used to corroborate our results from models of codon evolution, and may also help to account for possible false positives that are known to be associated with some models of codon evolution when those models are fitted to data from conspecific populations (Kryazhimskiy & Plotkin, 2008; Mugal, Wolf & Kaj, 2014).
Third, we searched for covariation between candidate sites under selection and their effect on gamete compatibility. Because sperm-egg binding and fertilization rates are not as readily observed in humans as in sea stars or other organisms with external fertilization, we used proxies for gamete binding and fertilization success based on family size and birth rate data from a previously well-studied human founder population (Kosova et al., 2012).
At several steps in the study our choice of the next analysis to carry out depended on the observations or analytical results from a previous step in the study. This type of data-dependent analysis has been called a “garden of forking paths” (Gelman & Loken, 2014) in which the choices of analyses by the researcher (sometimes called “researcher degrees of freedom”; Simmons, Nelson & Simonsohn, 2011) are not specified in advance of the study. Researcher degrees of freedom can be used to preferentially report only positive results with significant P-values, and that practice may inflate the apparent strength of results and lead to a high expected rate of false positive discoveries (Forstmeier, Wagenmakers & Parker, 2017). For those reasons, in the sections below we present both the analytical method used and the results of that analysis together for each step in order to show and acknowledge the data-dependent nature of the work flow. Our discoveries should be interpreted as the basis for follow-up studies and analysis, rather than as conclusive tests of the balancing selection and coevolution scenario proposed by Rohlfs, Swanson & Weir (2010).
Phylogenetic models of codon evolution indicate episodic diversifying selection at three candidate sites
We obtained 2,184 full-length coding sequences for C4BPA, ZP3, ZP2, and ZP1 from 1,092 individual humans in the Phase 1 release (20110521) of the 1000 Genomes Project database (1000 Genomes Project Consortium, 2012), which includes samples of 14 populations from four continental groups including Asia (Japan, northern and southern China), Europe (Finland, Britain, Spain, Italy, and European Americans from Utah), the Americas (Puerto Rico, Mexico, Colombia), and Africa (Kenya, Nigeria, and African Americans in the southwestern USA). Population sample sizes in Phase 1 were 14–100 individuals (28–200 gene copies). We used the Phase 1 release because these data consist of phased haplotypes suitable for analysis using phylogenetic or coalescent models of codon evolution that employ gene trees. We analyzed the longer of two transcriptional variants for ZP3. Each alignment was visualized in the Se-Al aligner (Rambaut, 2002), trimmed to coding sequence only, and exported for downstream analysis.
We used blastn searches of the GenBank nonredundant nucleotide database, and the SMART tool (Letunic, Doerks & Bork, 2015) for characterizing protein sequence domains, to confirm the identity of each gene downloaded from the 1000 Genomes database; we used the UCSC genome browser (Kent et al., 2002) to confirm the synteny of each gene with the corresponding mouse reference genes. The C4BPA protein consists of eight complement control peptide domains (CCP, also known as sushi domains; reviewed by Ermert & Blom, 2016) each approximately 55–60 amino acids in length. The ZP genes each encode several distinctive protein domain types, including a short signal peptide, a ZP domain about 260 amino acids in length, a furin-type peptide cleavage sequence, and a propeptide following the cleavage motif that includes a transmembrane domain involved in cytoplasmic trafficking of the protein into the extracellular egg coat (reviewed by Monné & Jovine, 2011 and Wilburn & Swanson, 2016).
We used the branch-site model of codon evolution implemented in MEME (mixed-effects model of evolution; Murrell et al., 2012) to identify sites in each gene with high relative rates of nonsynonymous change that might evolve under diversifying selection. We used tools on the Datamonkey web interface (Delport et al., 2010) to screen each sequence alignment for recombination and to choose a model of nucleotide evolution (Kosakovsky Pond et al., 2006). We then estimated a phylogeny using the maximum-likelihood method (and the best-fit nucleotide model) in MEGA 5 (Tamura et al., 2011), and used that gene tree in a MEME analysis. The MEME method is similar to the branch-sites models in the PAML software package (Yang, 2007), but has several advantages including the ability to model variation among codons in both the synonymous and nonsynonymous rates of change, and greater sensitivity to identify targets of selection (ω > 1) at a small number of functionally important codons among only some lineages in a gene tree.
The MEME analyses identified 1–3 codons in each of the three genes known to be involved in sperm-egg binding (C4BPA, ZP3, ZP2) that showed evidence of episodic diversifying selection (positive selection on those individual codons among some lineages in the gene tree; Table 1). For each of those codons, a likelihood ratio test indicated a significantly better model fit (P < 0.05) when the nonsynonymous rate β at those sites was allowed to exceed the synonymous rate α in comparison to a null model with the constraint β = α.
Table 1. Site-specific parameter values estimated in a model of codon evolution fitted to alignments of human gamete-recognition genes.
Codon | MEME model parameter value | ||||
---|---|---|---|---|---|
α | β− | β+ | P | q | |
C4PBA 300, rs4844573 | 0 | 0 | 184 | 0.0079 | 1 |
ZP3 31, rs2286428 | 0 | 0 | 73.1 | 0.048 | 1 |
ZP3 315, rs2906999 | 5.2 × 10−6 | 5.2 × 10−6 | 186 | 0.012 | 1 |
ZP3 371, rs200481427 | 0 | 0 | 197 | 0.021 | 1 |
ZP2 36, rs2075520 | 4.1 × 10−9 | 4.1 × 10−9 | 148 | 0.027 | 1 |
ZP1 | na | na | na | all P > 0.10 |
In C4BPA there was one candidate site under selection (C4BPA 300; rs4844573) in the N-terminal region of the fifth sushi domain (Table 1) that showed a nonsynonymous polymorphism (a second-position transition encoding a threonine-isoleucine polymorphism; Table 2). Notably, this was the only nonsynonymous site included among the tag SNPs analyzed by Rohlfs, Swanson & Weir (2010), and was also one of the sites found by Rohlfs, Swanson & Weir (2010) to be in LD with a polymorphism near ZP3.
Table 2. Genomic organization of human gamete-recognition genes including candidate sites under selection.
Gene | GRCh38/hg38 Coordinates | Number of exons | Number of codons | Candidate site under selection | Ancestral Allele | Derived Allele |
---|---|---|---|---|---|---|
C4BPA | chr1:207,113,026- 207,144,717 | 11 | 597 | rs4844573 chr1:207,131,555 C4BPA 300 | ACT Thr | ATT Ile |
ZP3 | chr7:76,424,965- 76,442,056 | 8 | 424 | rs2906999 chr7:76,440,494 ZP3 315 | CCG Pro | TCG Ser |
ZP2 | chr16:21,197,480- 21,211,547 | 19 | 745 | rs2075520 chr16:21,211,351 ZP2 36 | GTG Val | GGG Gly |
ZP1 | Chr11:60,867,562- 60,875,656 | 12 | 638 | None |
There were three sites under selection in ZP3 (Table 1), including one near the signal sequence (ZP3 31; rs2286428) in a protein domain of unknown function that is not part of the modeled protein structure based on X-ray crystallography of chicken ZP3 (Han et al., 2010; Monné & Jovine, 2011), and a second site in the C-terminal propeptide region (ZP3 371; rs200481427) downstream of the peptide cleavage site and thus not part of the mature functional protein in the egg coat. Because these two sites are in protein domains not expected to be involved in sperm binding, we focused our subsequent analyses of ZP3 on a third candidate site (ZP3 315; rs2906999) that showed a nonsynonymous polymorphism (a first-position transition encoding a proline-serine polymorphism) in the known sperm-binding domain (Table 2). In the structural model of chicken ZP3 by Han et al. (2010), this candidate site is in a C-terminal region of secondary structure that is strongly constrained by three disulphide bridges among six highly conserved cysteine residues upstream (C6, C8) and downstream (C9 − C12) of human ZP3 315 (see Monné & Jovine, 2011).
There was one candidate site under selection in ZP2 in the N-terminal part of the predicted protein (ZP2 36; rs2075520) that showed a nonsynonymous polymorphism (a second-position transversion encoding a valine-glycine polymorphism; Table 2). Like the candidate site in ZP3 noted above, this single codon under selection in ZP2 was also in the known sperm-binding domain.
In contrast to those results for three genes that are expected to be involved in sperm-egg binding, we found no candidate sites under selection in MEME analyses of ZP1 variation. All likelihood ratio tests suggested that the nonsynonymous rate β was not significantly different from the synonymous rate at any individual codon (all P > 0.10). That result suggests that diversifying selection acting on ZP genes may be limited to those genes (ZP3, ZP2) known to be involved in sperm binding. For those reasons we focused the remainder of our analyses on the three candidate sites under selection in C4BPA, ZP3, and ZP2.
Population genetic analyses also indicate selection at the same candidate sites
Models of codon evolution like MEME were originally developed to analyze sequences from highly divergent lineages such as reproductively isolated species that have diverged on a long time scale and for which phylogenetic relationships can be characterized accurately (Goldman & Yang, 1994; Muse & Gaut, 1994). An important model assumption is that differences between those sampled lineages are fixed substitutions (possibly caused by selection) and not segregating polymorphisms (influenced by genetic drift). False-positive results can arise when the model parameters are estimated from alignments of closely-related conspecific haplotypes that have diverged on short time scales: if the data include segregating nonsynonymous polymorphisms that arose as recent mutations under weak purifying selection with insufficient time for selection to eliminate those mutations, then a signal of high ω may be falsely ascribed to selection favouring the accumulation of nonsynonymous differences at some codons or in some lineages. Similarly, models of codon evolution (like MEME) based on phylogenies may be misled by uncertainty about relationships among closely-related haplotypes sampled within a single species, and an important source of such uncertainty is recombination. These potential problems of false discovery in codon-model analyses of population data are well known (Stajich & Hahn, 2005; Hofer et al., 2009; Kryazhimskiy & Plotkin, 2008; Anisimova & Liberles, 2012; Mugal, Wolf & Kaj, 2014). The high q values in Table 2 (q = 1 for all five sites with high ω values identified by the MEME method) are also consistent with the expected high false-discovery rate, and suggest that our MEME results could include some false positives.
We used three approaches to test the reliability of our MEME results. First, we used an additional model of codon evolution called omegaMap v.0.5 (Wilson & McVean, 2006) that bridges the gap between interspecific analyses of selection based on phylogenies and intraspecific analyses of selection based on population samples. The omegaMap model is fitted to data by estimating mutation rate and recombination rate parameters as well as ω for different regions of a gene alignment using a sliding window method. Rather than mapping those rates onto a single phylogenetic tree, omegaMap uses a population genetic approximation of the coalescent within a Bayesian framework that accounts for uncertainty about genealogical relationships among haplotypes.
The omegaMap algorithm is computationally intensive and would not converge on a model fit using the full alignment for any of the three genes we analyzed. Instead we analyzed a reduced dataset for each gene consisting of the two haplotypes for one individual selected at random from each population in the 1000 Genomes Phase 1 release. All omegaMap results reported were based on this reduced dataset of 28 haplotypes total. We used inverse or improper inverse priors for model parameters, and allowed both the selection and recombination parameters to vary along the alignment. We carried out a series of preliminary analyses to find suitable prior ranges, parameter starting values, burn-in lengths, and block widths for the sliding-window analyses. We then ran the final MCMC model fit for 500,000 or 700,000 iterations, and sampled the model parameter values at intervals of 500 or 700 steps.
In comparison to MEME model parameter values (Table 2), omegaMap identified the same candidate sites under diversifying selection, but gave much lower (and more realistic) estimates of ω (2–4; Fig. 1). The posterior probabilities of positive selection (ω > 1) at the three candidate sites were high (P > 0.8; Fig. 1). Those results suggest that our MEME analyses correctly identified candidate sites under selection, but may have overestimated the magnitude of the response to selection.
Second, we used the 1000 Genomes Selection Browser v.1.0 (http://hsb.upf.edu; Pybus et al., 2014) to carry out population genetic tests for selection. This tool implements a series of selection analyses applied to population samples from Caucasian Europeans in Utah (CEU), Han Chinese from Beijing (CHB), and Yorubans from Ibidan, Nigeria (YRI), that are representative of continental population groups in the 1000 Genomes database. Among the methods available using this tool, we chose two based on population allele frequency differences including pairwise FST between populations and global FST among all three populations (Weir & Cockerham, 1984); two based on linkage disequilibrium among sites within each gene including the integrated haplotype score within each population sample (iHS; Voight et al., 2006), and extended haplotype homozygosity in comparisons between pairs of populations (XP-EHH; Sabeti et al., 2007; see also Zhong et al., 2011); and three based on the allele frequency spectrum within each population including Tajima’s D (Tajima, 1989), the composite likelihood ratio test (CLR; Nielsen et al., 2005), and Fay & Wu’s H (Fay & Wu, 2000). Although FST values for a single locus may be interpreted as measures of population differentiation when compared to a neutral expectation (based on permutation), such values can also be interpreted as indices of selection when compared to other loci across the genome (sometimes called outlier tests; Foll & Gaggiotti, 2008; Holsinger & Weir, 2009). All seven tests are designed to detect significant departures from expectations based on the effects of neutral processes alone (mutation, genetic drift, recombination; reviewed by Vitti, Grossman & Sabeti, 2013). They are typically interpreted as tests of directional or purifying selection leading to differences among populations, but the tests may also be sensitive to the effects of diversifying or balancing selection when the results of such selection also resemble an incomplete selective sweep. We analyzed a 2 Mbp interval around each candidate site under selection in order to characterize the evidence for selection associated with all variable sites. The 1000 Genomes Selection Browser estimates the P-value associated with each test statistic for each candidate site under selection as its rank score relative to other sites in the 2 Mbp interval (the proportion of all other sites with a test statistic value as high as or higher than the candidate site). Because that database uses the GRCh37/hg19 reference genome, we centered each 2 Mbp interval around the coordinates for each of the candidate sites under selection in that reference, including chr1:207,304,900 (rs4844573, C4BPA), chr7:75069811-77069811 (rs2906999, ZP3), and chr16:21,222,672 (rs2075520, ZP2), instead of the coordinates shown in Table 2 for each of those sites (in the GRCh38/hg38 reference genome).
We found evidence for selection acting on those three candidate sites from two of the three types of tests. Four of the tests based on population differentiation (three of six pairwise FST values, and one of two global FST values) suggested significant population allele frequency differences at both the candidate site in C4BPA and the candidate site in ZP2 (Table 3). All three of the significant results for pairwise FST tests involved comparisons between the African sample (YRI) and one of the non-African samples (for ZP2) or both of the non-African samples (for C4BPA), and for C4BPA we found significant global differentiation as well. Overall evidence for selection was stronger for the candidate site in C4BPA (three of four significant test results) than for the candidate site in ZP2 (one of four).
Table 3. Population genetic tests of selection on candidate sites.
We also found evidence of selection in some tests based on measures of LD across sites within the C4BPA and ZP2 genes. Like the tests based on population differentiation, evidence for selection was limited to variation within the African sample (iHS) and comparisons between the African and non-African samples (XP-EHH) (Table 3), and was stronger for the candidate site in C4BPA (three of six significant test results) than for the candidate site in ZP2 (no significant test results). This evidence for selection acting on C4BPA variation seems convincing because the two tests differ in their sensitivity to detect homozygosity and LD caused by selection acting in the recent (iHS) or the more distant past (XP-EHH) (Vitti, Grossman & Sabeti, 2013), but both tests indicated significantly greater extent of homozygosity around rs4844573 in C4BPA compared to expected extent of homozygosity under neutral processes alone.
By contrast, none of the individual tests based on properties of the allele frequency spectrum (Table 3) suggested significant departures from neutral evolution for the candidate sites in C4BPA or ZP2. Values for all three test statistics were small within each of the three population samples, and none were unusual (all P > 0.09) relative to test statistic values for other variable sites in the 2 Mbp interval surrounding both of the candidate sites.
Unfortunately, the candidate site under selection in ZP3 (rs2906999) was not coded as a variable site in the 1000 Genomes Selection Browser database, so we used other methods to implement some of the same tests for that missing candidate site. We focused on the two classes of tests (based on population differentiation, and based on LD) that did indicate some evidence of selection on the other two candidate sites in tests using the 1000 Genomes Selection Browser. We calculated pairwise FST between the CEU, CHB, and YRI populations (all FST < 0.040) and global FST among all three populations (FST = 0.021) for the candidate site in ZP3 using the method of Raymond & Rousset (1995). When compared to the ranked scores for FST values at other sites in the 2 Mbp interval spanning that candidate site (rs2906999), the global (P = 0.53) and all pairwise FST comparisons (all P > 0.38) were not significant. Those comparisons do not indicate selection on the candidate site in ZP3 leading to population allele frequency differences.
We also used the software package selscan (Szpiech & Hernandez, 2014) to calculate iHS and XP-EHH values for the candidate site in ZP3. Unlike the 1000 Genomes Selection Browser, selscan calculates an unstandardized test statistic value (for iHS, or XP-EHH) and then characterizes the significance of that value by binning together SNPs with similar allele frequencies (we used 20 bins), and calculating a normalized test statistic value (a z-score) relative to normally-distributed test statistic values for all other SNPs in the same bin with a mean of zero and standard deviation of one. Among those six tests (three iHS values for individual populations, three pairwise XP-EHH values), only one was significant: the iHS score for the CHB population sample (iHS = 2.423, z = 2.353, P = 0.018). Other iHS scores for the candidate site in ZP3 were lower (iHS < 2.05, P > 0.05), and all XP-EHH scores were low (<0.6) and not significant (P > 0.6). Like the FST tests, the analyses based on LD provide only limited evidence of selection on the candidate site in ZP3 leading to extensive homozygosity around that site.
Third, we used McDonald–Kreitman tests (McDonald & Kreitman, 1991) to assess the relative frequency of non-synonymous polymorphisms within human populations, and compare them to the relative frequency of non-synonymous fixed substitution differences between humans and chimpanzees. These tests are typically interpreted as evidence of positive selection favouring functional (nonsynymous) divergence between species when the tests reveal a high relative rate of interspecific nonsynonymous substitutions. However, the same test can also be used to identify evidence of balancing selection favouring the maintenance of functional (nonsynonymous) diversity within species in the form of a high relative rate of intraspecific nonsynonymous polymorphisms (e.g., Schoville, Flowers & Burton, 2012). The test is conservative because it compiles evidence across all sites into a single test statistic for each gene, so a significant result would be strong evidence for balancing selection, but nonsignificant results might be ascribed to weak sensitivity to detect the effects of selection on one candidate site in each gene. We used a single chimpanzee (Pan troglodytes) haplotype from GenBank for each gene (XM_514156.4, XM_016929619.1, XM_009453354.2); the chimpanzee and human coding sequences were the same length for each gene, so no additional alignment was required. We used the method of Egea, Casillas & Barbadilla (2008) to calculate the test statistic and associated P values. For all three genes, a large proportion (63–73%) of human polymorphisms was nonsynonymous (Table S1). However, in all three tests the proportion of nonsynonymous polymorphisms was not significantly different from the proportion of nonsynonymous fixed differences in the 2 × 2 contingency table (all P > 0.18), thus none of the tests gave evidence of balancing selection.
Unlike Rohlfs, Swanson & Weir (2010), we did not use measures of LD based on allelic or genotypic associations between pairs of candidate sites themselves in the 1000 Genomes Project data to assess the strength of selection acting on those three genes because such tests were expected to be weak in this case. Brown (1975) showed that, for simple allelic associations, the power to reject the null hypothesis of linkage equilibrium was low for population sample sizes like those in Phase 1 of the 1000 Genomes Project (all samples ≤100), unless the true underlying LD was very strong. Similarly, Rohlfs, Swanson & Weir (2010) showed that the power to detect genotypic associations was also low for small samples (<200) unless the value of the selection coefficient for genotypic associations was high (s > 0.4). We avoided the creation of larger sample sizes by pooling population samples from the 1000 Genomes Project data because such pooled samples may show LD among unlinked sites as an artifact of population differentiation rather than as a consequence of selection for allele or genotype pairs. For those reasons, we were not able to compare directly the patterns of LD among three candidate sites under selection in the (small) 1000 Genomes Project samples to the previously documented patterns of LD between two of those candidate sites in the (larger) 1958 British Birth Cohort sample (Rohlfs, Swanson & Weir, 2010).
All three candidate sites under selection are balanced polymorphisms
We used the parsimony network method in TCS (Clement, Posada & Crandall, 2000) to illustrate both the frequencies and phylogenetic relationships among all 2,184 gene copies for each gene. Each of those three networks included many closed circuits that indicated uncertainty about phylogenetic relationships among closely related gene copies. To illustrate the patterns more clearly, we pruned each of the closed circuits (and reduced the network for each gene down to one example genealogy) by favouring connections to more common haplotypes (and pruning edges leading to less common haplotypes).
We found similar patterns of frequency variation and phylogenetic relationships among haplotypes across all three genes (Fig. 2). In both C4BPA and ZP3 (Figs. 2A and 2B), there were two most common haplotypes. The two C4BPA common haplotypes differed only by the nonsynonymous difference at the candidate site in the fifth sushi domain (C4BPA 300); they accounted for 36.7% and 35.4% of 2,184 gene copies, compared to the much lower frequencies of 45 other haplotypes. Similarly, the two ZP3 common haplotypes differed from each other only by the nonsynonymous difference at the candidate site in the ZP3 sperm-binding domain (ZP3 315), and accounted for 35.9% and 26.6% of all chromosomes, compared to lower frequencies of 63 other haplotypes.
We found a slightly more complex pattern of variation among ZP2 haplotypes (Fig. 2C), including four common haplotypes plus 40 less common haplotypes. A single nonsynonymous difference at the candidate site in the ZP2 sperm-binding domain (ZP2 36) distinguished the two pairs of common haplotypes from each other. The two common haplotypes distinguished by the candidate site under selection accounted for 14.0% and 26.1% of all chromosomes; each of those haplotypes differed from another common haplotype by a single synonymous change (at a different codon). The four common haplotypes together accounted for 92.3% of chromosomes. These patterns of diversity for each gene, characterized by a pair of common haplotypes that differ only by a single nonsynonymous difference at a candidate site under selection, are consistent with the predictions of the balancing selection hypothesis of Rohlfs, Swanson & Weir (2010).
In order to infer the direction of nucleotide changes at each candidate site in those phylogenetic networks, we used comparisons to other great ape reference genomes (chimpanzee, gorilla, orangutan) in the UCSC genome browser (Kent et al., 2002). We found that for each candidate site the chimpanzee, gorilla, and orangutan reference genomes as well as the mRNA sequences from bonobos all had the same single human allele (C4BPA 300 Thr; ZP3 315 Pro; ZP2 36 Val), which suggested that the alternate allele found in humans (C4BPA 300 Ile; ZP3 315 Ser; ZP2 36 Gly) is derived (Table 2). Mapping those ancestral and derived states onto the gene genealogies allowed us to infer the directions of changes (Fig. 2). In two cases (C4BPA, ZP2), the minor allele summed over all populations was also the ancestral allele; in the third case (ZP3), the ancestral and derived allele frequencies were nearly identical and the minor allele (by a small difference) was the derived allele (Table 4). All three genes showed multiple changes at the candidate site. On the gene trees shown in Fig. 2 these included two cases of parallel changes from the ancestral allele (C4BPA 300 Thr; ZP3 315 Pro) to the derived allele (C4BPA 300 Ile; ZP3 315 Ser), and one case of a secondary reversal from the derived allele (ZP2 36 Gly) back to the ancestral allele (ZP2 36 Val). Such parallelisms may account for the signal of selection found in our analyses using models of codon evolution (MEME, omegaMap) that are sensitive to the effects of multiple episodes of selection on the same codon (e.g., Turner & Hoekstra, 2006). Vahdati & Wagner (2016) argued that such parallelisms are a reliable signal of positive selection (but possibly not balancing selection) in genome scans of protein-coding sequences. The discovery of similar patterns of parallelism among C4BPA, ZP3, and ZP2 is also consistent with the hypothesis of selection acting on epistatic interactions among these three genes.
Table 4. Allele frequencies at candidate sites under selection in human gamete-recognition genes.
Candidate site | Population group | MAF | SNPs (N)≥MAF | P | ||||
---|---|---|---|---|---|---|---|---|
Africa | Americas | Asia | Europe | |||||
C4BPA 300, rs4844573 | ACT Thr | 0.815 | 0.431 | 0.327 | 0.367 | 0.467 | 19 (1,408) | 0.013 |
ATT Ile | 0.185 | 0.569 | 0.673 | 0.633 | ||||
ZP3 315, rs2906999 | CCG Pro | 0.589 | 0.522 | 0.477 | 0.461 | |||
TCG Ser | 0.411 | 0.478 | 0.523 | 0.538 | 0.495 | 1 (740) | 0.0014 | |
ZP2 36, rs2075520 | GTG Val | 0.648 | 0.356 | 0.460 | 0.273 | 0.420 | 61 (1,445) | 0.042 |
GGG Gly | 0.352 | 0.644 | 0.540 | 0.727 |
Balanced polymorphisms at candidate sites have unusually high minor allele frequencies
High frequencies for both ancestral and derived alleles, and minor allele frequency (MAF) differences between African and non-African populations, may arise by neutral demographic processes instead of (or in addition to) the effects of selection (Hofer et al., 2009; Coop et al., 2009). To explore that possibility, we carried out two tests.
First, we compared the observed MAFs at the three candidate sites under selection to the expected occurrence of sites with such high MAFs under a purely neutral model. This is a relatively stringent test given the expectation that nonsynonymous mutations in coding sequences are less likely than other mutations to evolve under neutrality. For each candidate site under selection we compared the observed range of MAFs in African versus non-African population groups (Table 4) to a neutral expectation based on a Poisson random field framework (Kaj & Mugal, 2016). We used the minimum MAFs for each gene from the four continental population groups (Table 4), and those MAFs (0.185, 0.411, 0.273) serve as a lower bound and thus make for a more conservative test.
We computed for each gene the expected number of sites at which the derived allele exists in both populations with an intermediate frequency in the range [x, 1 − x]. We let fx be defined by fx(y) = 1, if x < y < 1 − x, and fx(y) = 0 otherwise. Then fx ∈ F0 (see Kaj & Mugal, 2016), and the expected number of such sites has the form θHx, where
(1) |
The function is known (in terms of hypergeometric sums) so we can find Hx numerically. We computed Hx for each gene with demographic conditions characteristic of the scenario for the late-Pleistocene migration of early modern humans out of Africa (population size N = 5, 000; population divergence time t = 0.5 coalescent units). Values of Hx were 0.8761, 0.0699, and 0.2149 for C4BPA, ZP3, and ZP2, respectively. We then used the numerical solution for Hx to calculate the probability p(x) = 1 − exp(−θHx) of observing at least one site that has intermediate allele frequencies in [x, 1 − x] under selectively neutral conditions for each of the three candidate sites (Table 4). For those calculations we used values of θ, the population mutation rate per gene, based on the 1,092 individuals of the 1000 Genomes phase 1 data (1.6333, 1.5727, and 1.9962 for C4BPA, ZP3, and ZP2, respectively). This approach leads to values for p(x) that are all relatively high (0.7609, 0.1042, 0.3488, respectively). Those high probabilities suggest that the observed MAFs for those three candidate sites in the African population group might have arisen by neutral processes alone under those demographic model conditions (the out-of-Africa migration of early modern humans).
Second, we compared MAFs at the three candidate sites under selection (summed over all populations) to MAFs at other nonsynonymous SNPs on the same three human chromosomes. In contrast to the analyses described above, this may be a less stringent (and more realistic) test given the expectation that most nonsynonymous mutations in coding sequences are likely to evolve under purifying selection. In order to standardize that comparison across candidate sites on three different chromosomes, we examined a window of 90.3 Mbp (the length of chromosome 16, the shortest of the three chromosomes studied here) spanning each of the candidate sites, and downloaded MAFs for all biallelic nonsynonymous SNPs in the 1000 Genomes Phase 1 release (3,593 SNPs total, including the three candidate sites). We characterized the significance of those balanced polymorphisms by estimating the proportion of other nonsynonymous SNPs on each chromosome that had MAFs greater than the MAF observed for the candidate site on that chromosome.
The high MAFs at each candidate site were unusual compared to all other nonsynonymous SNPs in a 90.3 Mbp window spanning each of those candidate sites. In all three cases, the frequency of the minor allele at the candidate site was higher than the MAFs at more than 95% of all other nonsynonymous SNPs on the same chromosome (Table 4). In the case of the ZP3 candidate site, no other nonsynonymous SNP in the same region of chromosome 7 showed such a high MAF. For each candidate site, the ancestral allele frequency tended to be higher in African samples compared to all non-African samples (Table 4), but both alleles at each candidate site were found in all populations. These unusual balanced polymorphisms are consistent with the predicted effects of episodic or diversifying selection that favours functionally divergent alleles at each candidate site, and maintains each allele at high frequency within populations. However, the differences between African and non-African population samples at all three candidate sites are also consistent with other hypotheses based on selection (such as an incomplete selective sweep in some or all populations).
The two tests together suggest that we cannot entirely reject neutrality as the cause of high observed MAFs at the three candidate sites (based on a stringent comparison to neutrality using calculations in the Poisson random field framework), but we find evidence for either relaxed selective constraints or diversifying selection (based on a more realistic comparison to expected variation under purifying selection at other nonsynonymous sites). Those results indicate that we should be cautious in interpreting the causes of allele frequency variation at those three candidate sites, but they do not contradict other evidence for diversifying selection acting on those three candidate sites.
Variation in fertility is associated with candidate sites under selection
To test for covariation between polymorphisms at candidate sites and measures of fertility we assessed genotype-phenotype associations in a founder population, the Hutterites (see Kosova, Abney & Ober, 2010; Kosova et al., 2010; Kosova et al., 2012), in which both reproductive data and genome-wide genotype data were available. As in previous analyses (e.g., Kosova et al., 2010), we focused on two measures of fertility: family size (number of births) in 361 families; and birth rate (calculated as [number of births − 1]/[sum of all interbirth intervals]) in 341 families with at least two children. We analyzed associations between these fertility measures and two of the three candidate sites (in C4BPA and ZP2); genotypes for the third candidate site (in ZP3) were not available for most individuals in the sample because that specific SNP (rs2906999) was not included in the SNP chip used to screen genetic polymorphisms in the original study (Kosova et al., 2012). We estimated genotype-phenotype associations using a linear mixed model as implemented in GEMMA (Zhou & Stephens, 2014). This model allowed us to account for the known pedigree relatedness among individuals in the sample. Following Kosova et al. (2012), we included wife’s birth year, her age at marriage, and the time from marriage to the most recent birth as covariates in the linear models because these covariates make substantial non-genetic contributions to fertility variation among Hutterite men and women. For each family we characterized the wife’s genotype at the ZP2 candidate site and the husband’s genotype at the C4BPA candidate site as having zero, one, or two derived alleles. We carried out a series of four parallel analyses in which we first asked whether variation in the wife’s ZP2 genotype or the husband’s C4BPA genotype was significantly associated with residual variation in family size or birth rate. We then repeated those analyses by including the spouse’s genotype at the other candidate site as an additional covariate, and by adding an interaction term for covariation between the two individual genotypes.
Although these post hoc analyses of functional variation were carried out after the identification of candidate sites under selection in C4BPA and ZP2, and the original genetic screens were not carried out with these genetic variants in mind, the analyses provide an important check on our inferences based on phylogenetic and population genetic methods. The models of codon evolution that we used to identify candidate sites under selection are only able to infer the effects of selection that acted in the past, and may have a high risk of false discoveries. One powerful source of evidence for ongoing (rather than past) balancing selection on such sites is their association with phenotypic variation related to fitness in present-day populations (Key et al., 2014). Such tests also can directly address the possibility that candidate sites under selection are false positives. Association studies in groups such as the Hutterites can be especially useful because fertility is generally high among Hutterite couples, and the Hutterites’ communal lifestyle and shared religious and cultural practices minimize variation in other non-genetic contributions to fertility (e.g., Kosova, Abney & Ober, 2010).
We found a significant association between the wife’s ZP2 36 genotype and variation in family size (P = 0.040) (Fig. 3A). Adding the husband’s C4BPA 300 genotype as a covariate, with or without an interaction term between the two genetic variables, did not improve the model fit and reduced the P value associated with the wife’s ZP2 36 genotype. There was no effect of the wife’s ZP2 36 genotype on birth rate, with or without an interaction term between the partners’ genotypes.
By contrast, there was a significant effect of the husband’s C4BPA 300 genotype on variation in birth rate (P = 0.020; Fig. 3B). Similar to the analyses described above, adding the wife’s ZP2 36 genotype as a covariate did not improve the model fit. The husband’s C4BPA 300 genotype had no significant association with variation in family size.
Genotypes at these two candidate sites influenced fertility in opposite directions and were associated with different components of fertility. For the C4BPA candidate site, a larger number of derived male C4BPA 300 Ile alleles was associated with increased fertility (higher birth rate, or shorter interbirth intervals); but for the ZP2 candidate site, a smaller number of derived female ZP2 36 Gly alleles was associated with increased fertility (larger family size, or more births). Although these two candidate sites did not appear to have interactive effects on either family size or birth rate, linkage disequilibrium between them would be consistent with the expectation that selection acts on the functional interaction between C4BPA and ZP2 at fertilization. Using the same measure of genotype associations (the contingency table method) used by Rohlfs, Swanson & Weir (2010), we found significant LD between those two candidate sites in this large sample of Hutterite families (χ2 = 13.72, P = 0.0082). This finding reflected both the low observed frequency of double homozygotes for the two ancestral alleles and the high observed frequency of double homozygotes for the derived C4BPA 300 allele (Ile/Ile) with the ancestral ZP2 36 allele (Val/Val) (Table 5). Overall, the evidence for LD between these two candidate sites and their individual effects on reproductive success are consistent with both our phylogenetic and population genetic analyses, and with the hypothesis that these genes evolve under some form of balancing selection that favours divergent alleles at specific functionally important sites under selection at fertilization.
Table 5. Genotype associations between two candidate sites under selection in a founder population.
ZP2 36 | C4BPA 300 | |||
---|---|---|---|---|
Thr/Thr | Thr/Ile | Ile/Ile | Total | |
Val/Val | 9 (20.5) | 41 (35.5) | 20 (13.9) | 70 |
Val/Gly | 151 (138.7) | 242 (240.0) | 80 (94.2) | 473 |
Gly/Gly | 255 (255.7) | 435 (442.4) | 182 (173.8) | 872 |
Total | 415 | 718 | 282 | 1,415 |
Discussion
Selection on human gamete traits
Genes underlying morphological, behavioural, physiological, and molecular traits that affect human mate choice and reproductive success are expected to evolve in response to sexual selection or sexual conflict over the specificity or choosiness of mating (Andersson & Simmons, 2006; Dixson, 2009), and those traits could include molecules expressed on the surfaces of sperm and eggs (Swanson & Vacquier, 2002; Palumbi, 2009). Behavioural and morphological traits involving premating interactions are widely viewed (e.g., Puts, 2010) as the more important targets of sexual selection on human mate choice compared to molecular traits involving interactions between gametes after mating. However, molecular evidence for selection on genes encoding human sperm-egg binding molecules and other reproductive proteins (Swanson et al., 2001; Swanson & Vacquier, 2002; Hamm et al., 2007; Meslin et al., 2012; Good et al., 2013) indicates that human populations also respond to selection on reproductive success at the level of biochemical and cellular interactions within the female reproductive tract at the time of fertilization (Vacquier, Swanson & Lee, 1997; Swanson & Vacquier, 2002).
Genetic and phenotypic evidence for coevolution under selection
An important area of focus for studies of the human molecular mating system has been zona pellucida glycoproteins on the egg and ZP-binding proteins in sperm (Litscher, Williams & Wassarman, 2009; Wassarman, 2009; Avella, Xiong & Dean, 2013). Using data and methods that complement the population genetic analyses of this system by Rohlfs, Swanson & Weir (2010), we found patterns that are consistent with (1) their hypothesis that LD between two physically unlinked gamete-recognition genes is caused by balancing selection favouring alternative pairs of alleles, and (2) the functional interpretation that such selection may arise from allelic differences at each of those interacting genes through their influence on gamete binding and fertilization success.
The significant functional correlations between maternal or paternal genotypes at candidate sites and variation in both family size and birth rate (Fig. 3) suggest that these candidate sites are indeed targets of selection and are not false positives. In the available data, allelic effects at these two candidate sites in C4BPA and ZP2 appear to be additive, similar to the additive effects of SNP markers on fertility identified through genetic association analyses of Hutterite families (Kosova et al., 2012). Moreover, one of those candidate sites (C4BPA 300) was also discovered as a tag SNP associated with ZP3 variation using a different method (LD calculations) in a different study population (Rohlfs, Swanson & Weir, 2010). The overall consistency in the identification of sites under selection, their associations with reproductive variation from three different methodological approaches (genotypic association, models of codon evolution, LD), the discovery of the same candidate sites using two different codon-model methods (MEME, omegaMap), the specific location of those candidate sites in each gene (especially the single candidate site in the known sperm-binding domain of each ZP gene; Fig. 1), and the distinctive patterns of population genetic variation including strongly balanced polymorphisms (Table 4) that were similar across all three genes (Fig. 2), all point toward selection acting on functional variation involving these three genes and their epistatic interactions.
These interpretations have several corollaries that could form the basis for further tests of the hypothesis of coevolution among candidate sites in these three gamete-recognition genes under selection. The similar patterns of population genetic variation across the three genes suggest that both ZP3 and ZP2 coevolve with C4BPA, but the available data do not allow us to test the functional association between fertility and variation at the ZP3 candidate site. Genotype and fertility data for all three candidate sites including ZP3 315 from large population samples are needed to test that expected association, and compare it to the effects of ZP2 36 variation on fertility. Similarly, LD is expected to be strong between C4BPA 300 and both of the candidate sites in ZP genes, but the available data do not include all three candidate sites genotyped in the same large population samples.
The evidence for functional and evolutionary covariation between ZP genes and C4BPA is also consistent with evidence that those ZP gene products interact as a complex in the zona pellucida. Purified ZP3 in solution forms a homodimer complex in which a pair of proteins binds in antiparallel orientation (Han et al., 2010), and this head-to-tail dimerization brings the N-terminal region of one protein into contact with the C-terminal region of the other (including the C-terminal sperm-binding domain of ZP3 that shows evidence of selection). Moreover, ZP3 expressed in the zona pellucida is hypothesized to form a heterodimer with ZP2 via the same antiparallel orientation of their N- and C-terminal regions (Darie et al., 2008; Monné & Jovine, 2011; Clark, 2013; Wassarman & Litscher, 2013). Such ZP3-ZP2 heterodimers could form a single sperm-binding structure that includes both ZP3 315 and ZP2 36, interacts with ZP-binding proteins on the sperm head (including C4BPA), and mediates coevolution among all three genes via compensatory changes (e.g., Hughes, 2012) between ZP3 and ZP2. That functional interpretation is supported by recent observations that inherited mutations in human ZP3 and ZP2 cause infertility in double heterozygotes, and that mouse models of those double mutants are infertile in part due to susceptibility to polyspermy in laboratory fertilizations (Liu et al., 2017). Additional structural information for ZP proteins and their polymeric organization (e.g., Han et al., 2010) could form the basis for testing the predicted functional significance of the three candidate sites under selection that were identified here.
Our functional interpretation of the genetic evidence for selection on these three human genes is based mainly on mouse studies. An important assumption of that interpretation is that the mouse and human genes are orthologous. There is little doubt that mouse Zp3 and human ZP3 (and mouse Zp2 and human ZP2) are orthologs, but orthology between mouse Zp3r and human C4BPA is unlikely. The two genes have similar coding sequence organization (including a series of 7 or 8 sushi domains), and occur in similar locations on chromosome 1 relative to other genes in the RCA gene cluster; in the mouse genome, Zp3r is adjacent to a C4 binding protein gene (called C4bp in mouse, and C4BPA in humans), and to a C4bp-like pseudogene (C4bp-ps1). However, in the rat genome this part of the RCA gene cluster includes three functional genes: Zp3r, C4bpa, and C4bpb. These comparisons to the rat genome suggest that mouse C4bp is orthologous with rat C4bpa and human C4BPA; C4bpb has become a pseudogene in the mouse (C4bp-ps1); and that the rodent gene expressed in the sperm acrosomal vesicle (Zp3r) is not found in humans and may be descended from a gene-duplication event in a common ancestor shared by mice, rats, and other rodents but not shared with humans, other primates, or other mammalian lineages.
The possibility that mouse Zp3r is not orthologous with human C4BPA adds a complication to our interpretation of the genetic evidence for selection on both sperm- and egg-expressed genes in humans. A single positively-selected codon in C4BPA (C4BPA 300, rs2075520) is in linkage disequilibrium with SNPs in both ZP3 (Rohlfs, Swanson & Weir, 2010) and ZP2 (Table 5); shows population genetic evidence of selection (Table 3); and shows genealogical patterns of haplotype variation similar to both ZP3 and ZP2 (Fig. 2). We propose that those patterns are the consequence of selection acting on functional interactions between human C4BPA and ZP proteins at fertilization, similar to previously documented interactions between ZP3R in the mouse sperm acrosome and ZP proteins in the mouse zona pellucida. That proposal, and our analyses of human C4BPA evolution (and coevolution with ZP3 and ZP2), is not dependent on orthology between C4BPA and Zp3r. However, functional studies of expression of C4BPA in human sperm, and of interactions between human C4BPA and ZP proteins in the human egg coat at fertilization, are needed in order to more fully understand the roles of those genes in both primate and rodent fertilization.
Significance of coevolution among human fertilization genes
Molecular evidence for selection on balanced polymorphisms in human gamete-binding genes is potentially important because it implicates sexual selection in the evolution of human fertility and identifies possible causes of clinical infertility (e.g., Männikkö et al., 2005; Pökkylä et al., 2011). However, other possible sources of selection should also be considered. Expression of C4BPA is well known to be involved in regulation of the classical complement pathway in human immunity (Müller-Eberhard & Miescher, 1985; Krushkal, Kemper & Gigli, 1998), and is implicated in immune responses to viral pathogens including HIV (Chang et al., 2012). Numerous codons were identified as targets of positive selection in divergence of C4BPA among human and other primate species (Cagliani et al., 2016). Balanced polymorphisms may be common among human genes involved in host-pathogen interactions (Leffler et al., 2013; DeGiorgio, Lohmueller & Nielsen, 2014; Azevedo et al., 2015; Cagan et al., 2016). In this context, environmental variation in selection by pathogens could favour multiple C4BPA alleles (e.g., in African and non-African populations) when C4BPA alleles are expressed in the complement pathway, while the same C4BPA polymorphisms could lead to balanced polymorphisms in ZP3 and ZP2 caused by selection favouring egg-sperm allele or genotype matching when C4BPA alleles are expressed in sperm. That hypothesis is consistent with recent evidence for genome-wide positive selection on primate genes (including C4BPA) that encode proteins involved in innate immunity (Van der Lee et al., 2017). In that study, positive selection for nonsynonymous substitutions among primate species was correlated with the accumulation of nonsynonymous polymorphisms within the human lineage in the same genes (including polymorphisms in C4BPA), and was interpreted as evidence of ongoing selection among humans driven by viruses and other pathogens. Moreover, among the three candidate sites analyzed in our study, the largest allele-frequency difference among population groups in the 1000 Genomes Phase 1 data was found at C4BPA 300, which has a four-fold difference in the minor allele frequency (Table 4). That large difference is consistent with the possibility that the evolution of this three-gene system may be driven in part by pleiotropic selection on other effects of C4BPA polymorphisms such as in innate immunity. How this pleiotropy might interact with selection on the effects of C4BPA in fertility and its coevolution with ZP genes is not known.
Candidate sites under selection may be ancient balanced polymorphisms
The hypothesis of coevolution based on balancing selection and allelic matching between male- and female-expressed gamete-recognition genes (Rohlfs, Swanson & Weir, 2010) is also consistent with limited data from the genomes of archaic humans in Eurasia, in which the occurrence of these same polymorphisms at candidate sites under selection suggests that the polymorphisms may be older than the origins of early modern humans in Africa. Among the relatively low-coverage genomes for several Neanderthal individuals (Green et al., 2010) in the UCSC genome browser, no sequence reads are mapped to the candidate sites in C4BPA (rs4844573) or ZP3 (rs2906999); one of those genomes (Vi33.16) includes several sequence reads that map to the candidate site in ZP2 (rs2075520), and all of those reads include the derived allele (ZP2 36 Gly). However, in the more recent high-coverage genome for a Neanderthal individual (Prüfer et al., 2014), there is a mix of ancestral and derived alleles at these sites: two ancestral C4BPA 300 alleles (a Thr-Thr homozygote); one ancestral and one derived ZP3 315 allele (a Pro-Ser heterozygote); and two derived ZP2 36 alleles (a Gly-Gly homozygote). Similarly, in the high-coverage genome for a Denisovan individual (Meyer et al., 2012), multiple reads map to all three of those candidate sites, and all reads encode the derived allele for each site. The occurrence of both ancestral and derived alleles in archaic humans outside of Africa is similar to variation at the same candidate sites in modern human genomes, and is consistent with the hypothesis that these balanced polymorphisms have been maintained for an evolutionarily long time, perhaps as long as the early-Pleistocene divergence between archaic and modern humans (>500,000 years; see Kelso & Prüfer, 2014). That pattern of shared alleles between archaic and modern humans is similar to the pattern of shared Zp3 and Zp2 haplotypes between closely related mouse species (Turner & Hoekstra, 2006) that were interpreted as the products of balancing selection acting on long time scales and leading to trans-specific polymorphisms (Turner & Hoekstra, 2008a; Turner & Hoekstra, 2008b). That pattern is also similar to (but not as old as) trans-specific polymorphisms shared between extant humans and other extant great apes that have been attributed to the effects of balancing selection (Azevedo et al., 2015). The parallel patterns of balanced polymorphisms at all three genes studied here indicate possibly important effects of selection on the expression and function of those gene products during human fertilization.
Supplemental Information
Acknowledgments
Thanks to M Elliot (University of Cambridge) for scripts and help in formatting data from the 1000 Genomes Project; and to F Breden and B Crespi (Simon Fraser University) for advice and comments.
Funding Statement
This work was supported by Discovery Grants from the Natural Sciences and Engineering Research Council of Canada to M Hart (No. 05404) and B Crespi (No. 06505). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Additional Information and Declarations
Competing Interests
The authors declare there are no competing interests.
Author Contributions
Michael W. Hart conceived and designed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.
Daryn A. Stover, Vanessa Guerra, Sahar V. Mozaffari and Carina F. Mugal analyzed the data, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.
Carole Ober and Ingemar Kaj wrote the paper, reviewed drafts of the paper.
Data Availability
The following information was supplied regarding data availability:
A text file that summarizes the workflow and code that was used to download and format haplotype data from the 1000 Genomes Project Phase 1 database, and a text file that contains summary statistics on fertility in Hutterite families, have been provided as Supplemental Files. These are population-level summaries only.
References
- 1000 Genomes Project Consortium (2012).1000 Genomes Project Consortium An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65. doi: 10.1038/nature11632. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Andersson & Simmons (2006).Andersson M, Simmons LW. Sexual selection and mate choice. Trends in Ecology and Evolution. 2006;21:296–302. doi: 10.1016/j.tree.2006.03.015. [DOI] [PubMed] [Google Scholar]
- Andrés et al. (2009).Andrés AM, Hubisz MJ, Indap A, Torgerson DG, Degenhardt JD, Boyko AR, Gutenkunst RN, White TJ, Green ED, Bustamante CD, Clark AG, Nielsen R. Targets of balancing selection in the human genome. Molecular Biology and Evolution. 2009;26:2755–2764. doi: 10.1093/molbev/msp190. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Anisimova & Liberles (2012).Anisimova M, Liberles DA. Detecting and understanding natural selection. In: Cannarozzi GM, Schneider A, editors. Codon evolution. Oxford University Press; Oxford: 2012. pp. 73–96. [Google Scholar]
- Avella, Xiong & Dean (2013).Avella MA, Xiong B, Dean J. The molecular basis of gamete recognition in mice and humans. Molecular Human Reproduction. 2013;19:279–289. doi: 10.1093/molehr/gat004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Azevedo et al. (2015).Azevedo L, Serrano C, Amorim A, Cooper DN. Trans-species polymorphism in humans and the great apes is generally maintained by balancing selection that modulates the host immune response. Human Genomics. 2015;9 doi: 10.1186/s40246-015-0043-1. Article 21. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Baibakov et al. (2012).Baibakov B, Boggs NA, Yauger B, Baibakov G, Dean J. Human sperm bind to the N-terminal domain of ZP2 in humanized zonae pellucidae in transgenic mice. Journal of Cell Biology. 2012;197:897–905. doi: 10.1083/jcb.201203062. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Brockhurst et al. (2014).Brockhurst MA, Chapman T, King KC, Mank JE, Paterson S, Hurst GDD. Running with the Red Queen: the role of biotic conflicts in evolution. Proceedings of the Royal Society B: Biological Sciences. 2014;281 doi: 10.1098/rspb.2014.1382. Article 20141382. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Brown (1975).Brown AHD. Sample sizes required to detect linkage disequilibrium between two or three loci. Theoretical Population Biology. 1975;8:184–201. doi: 10.1016/0040-5809(75)90031-3. [DOI] [PubMed] [Google Scholar]
- Buffone et al. (2008).Buffone MG, Zhuang T, Ord TS, Hui L, Moss SB, Gerton GL. Recombinant mouse sperm ZP3-binding protein (ZP3R/sp56) forms a high order oligomer that binds eggs and inhibits mouse fertilization in vitro. Journal of Biological Chemistry. 2008;283:12438–12445. doi: 10.1074/jbc.M706421200. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Burnett et al. (2011).Burnett LA, Anderson DM, Rawls A, Bieber AL, Chandler DE. Mouse sperm exhibit chemotaxis to allurin, a truncated member of the cysteine-rich secretory protein family. Developmental Biology. 2011;360:318–328. doi: 10.1016/j.ydbio.2011.09.028. [DOI] [PubMed] [Google Scholar]
- Buzbas, Joyce & Rosenberg (2011).Buzbas EO, Joyce P, Rosenberg NA. Inference on the strength of balancing selection for epistatically interacting loci. Theoretical Population Biology. 2011;79:102–113. doi: 10.1016/j.tpb.2011.01.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cagan et al. (2016).Cagan A, Theunert C, Laayouni H, Santpere G, Pybus M, Casals F, Kay K, Navarro A, Marques-Bonet T, Bertranpetit J, Andrés AM. Natural selection in the great apes. Molecular Biology and Evolution. 2016;33:3268–3283. doi: 10.1093/molbev/msw215. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cagliani et al. (2016).Cagliani R, Forni D, Filippi G, Mozzi A, De Gioia L, Pontremoli C, Pozzoli U, Bresolin N, Clerici M, Sironi M. The mammalian complement system as an epitome of host-pathogen genetic conflicts. Molecular Ecology. 2016;25:1324–1339. doi: 10.1111/mec.13558. [DOI] [PubMed] [Google Scholar]
- Chang et al. (2012).Chang H-H, Soderberg K, Skinner JA, Banchereau J, Chaussabel D, Haynes BF, Ramoni M, Letvin NL. Transcriptional network predicts viral set point during acute HIV-1 infection. Journal of the American Medical Informatics Association. 2012;19:1103–1109. doi: 10.1136/amiajnl-2012-000867. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chen, Litscher & Wassarman (1998).Chen J, Litscher ES, Wassarman PM. Inactivation of the mouse sperm receptor, mZP3, by site-directed mutagenesis of individual serine residues located at the combining site for sperm. Proceedings of the National Academy of Sciences of the United States of America. 1998;95:6193–6197. doi: 10.1073/pnas.95.11.6193. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Clark (2013).Clark GF. The role of carbohydrate recognition during human sperm-egg binding. Human Reproduction. 2013;28:566–577. doi: 10.1093/humrep/des447. [DOI] [PubMed] [Google Scholar]
- Clark et al. (2009).Clark NL, Gasper J, Sekino M, Springer SA, Aquadro CF, Swanson WJ. Coevolution of interacting fertilization proteins. PLOS Genetics. 2009;5:e1000570. doi: 10.1371/journal.pgen.1000570. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Claw, George & Swanson (2014).Claw KG, George RD, Swanson WJ. Detecting coevolution in mammalian sperm-egg fusion proteins. Molecular Reproduction and Development. 2014;81:531–538. doi: 10.1002/mrd.22321. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Clement, Posada & Crandall (2000).Clement M, Posada D, Crandall K. TCS: a computer program to estimate gene genealogies. Molecular Ecology. 2000;9:1657–1660. doi: 10.1046/j.1365-294x.2000.01020.x. [DOI] [PubMed] [Google Scholar]
- Coop et al. (2009).Coop G, Pickrell JK, Novembre J, Kudaravalli S, Li J, Absher D, Myers RM, Cavalli-Sforza LL, Feldman MW, Pritchard JK. The role of geography in human adaptation. PLOS Genetics. 2009;5:e1000500. doi: 10.1371/journal.pgen.1000500. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Coyne & Orr (2004).Coyne JA, Orr HA. Speciation. Sinauer Associates; Sunderland: 2004. [Google Scholar]
- Darie et al. (2008).Darie CC, Janssen WG, Litscher ES, Wassarman PM. Purified trout egg vitelline envelope proteins VEβ and VEγ polymerize into homomeric fibrils from dimers in vitro. Biochimica Et Biophysica Acta. 2008;1784:385–392. doi: 10.1016/j.bbapap.2007.10.011. [DOI] [PubMed] [Google Scholar]
- De Villena & Rodríguez de Córdoba (1995).De Villena FP-M, Rodríguez de Córdoba S. C4BPAL2: a second duplication of the C4BPA gene in the human RCA gene cluster. Immunogenetics. 1995;41:139–143. doi: 10.1007/BF00182326. [DOI] [PubMed] [Google Scholar]
- DeGiorgio, Lohmueller & Nielsen (2014).DeGiorgio M, Lohmueller KE, Nielsen R. A model-based approach for identifying signatures of ancient balancing selection in genetic data. PLOS Genetics. 2014;10:e1004561. doi: 10.1371/journal.pgen.1004561. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Delport et al. (2010).Delport W, Poon AFY, Frost SDW, Pond SLK. Datamonkey 2010: a suite of phylogenetic analysis tools for evolutionary biology. Bioinformatics. 2010;26:2455–2457. doi: 10.1093/bioinformatics/btq429. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Delves, Lund & Roitt (2002).Delves PJ, Lund T, Roitt IV. Antifertility vaccines. Trends in Immunology. 2002;23:213–219. doi: 10.1016/S1471-4906(02)02182-8. [DOI] [PubMed] [Google Scholar]
- Dixson (2009).Dixson AF. Sexual selection and the origins of human mating systems. Oxford University Press; New York: 2009. [Google Scholar]
- Egea, Casillas & Barbadilla (2008).Egea R, Casillas S, Barbadilla A. Standard and generalized McDonald–Kreitman test: a website to detect selection by comparing different classes of DNA sites. Nucleic Acids Research. 2008;36:W157–W162. doi: 10.1093/nar/gkn337. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ermert & Blom (2016).Ermert D, Blom AM. C4b-binding protein: the good, the bad and the deadly. Novel functions of an old friend. Immunology Letters. 2016;169:82–92. doi: 10.1016/j.imlet.2015.11.014. [DOI] [PubMed] [Google Scholar]
- Fay & Wu (2000).Fay JC, Wu CI. Hitchhiking under positive Darwinian selection. Genetics. 2000;155:1405–1413. doi: 10.1093/genetics/155.3.1405. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Foll & Gaggiotti (2008).Foll M, Gaggiotti OE. A genome scan method to identify seleted loci appropriate for both dominant and codominant markers: a Bayesian perspective. Genetics. 2008;180:977–993. doi: 10.1534/genetics.108.092221. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Forstmeier, Wagenmakers & Parker (2017).Forstmeier W, Wagenmakers E-J, Parker TH. Detecting and avoiding likely false-positive findings—a practical guide. Biological Reviews. 2017;92:1941–1968. doi: 10.1111/brv.12315. [DOI] [PubMed] [Google Scholar]
- Gavrilets (2014).Gavrilets S. Is sexual conflict an “engine of speciation”? Cold Spring Harbor Perspectives in Biology. 2014;6 doi: 10.1101/cshperspect.a017723. Article a017723. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gelman & Loken (2014).Gelman A, Loken E. The statistical crisis in science. American Scientist. 2014;102:460–465. doi: 10.1511/2014.111.460. [DOI] [Google Scholar]
- Goldman & Yang (1994).Goldman N, Yang Z. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Molecular Biology and Evolution. 1994;11:725–736. doi: 10.1093/oxfordjournals.molbev.a040153. [DOI] [PubMed] [Google Scholar]
- Good et al. (2013).Good JM, Wiebe V, Albert FW, Burbano HA, Kircher M, Green RE, Halbwax M, Andre C, Atencia R, Fischer A, Pääbo S. Comparative population genomics of the ejaculate in humans and the great apes. Molecular Biology and Evolution. 2013;30:964–976. doi: 10.1093/molbev/mst005. [DOI] [PubMed] [Google Scholar]
- Green et al. (2010).Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH-Y, Hansen NF, Durand EY, Malaspinas A-S, Jensen JD, Marques-Bonet T, Alkan C, Prüfer K, Meyer M, Burbano HA, Good JM, Schultz R, Aximu-Petri A, Butthof A, Höber B, Höffner B, Siegemund M, Weihmann A, Nusbaum C, Lander ES, Russ C, Novod N, Affourtit J, Egholm M, Verna C, Rudan P, Brajkovic D, Kucan Z, Gusic I, Doronichev VB, Golovanova LV, Lalueza-Fox C, De la Rasilla M, Fortea J, Fosas A, Schmitz RW, Johnson PLF, Eichler EE, Falush D, Birney E, Mullikin JC, Slatkin M, Nielsen R, Kelso J, Lachmann M, Reich D, Pääbo S. A draft sequence of the Neandertal genome. Science. 2010;328:710–722. doi: 10.1126/science.1188021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hamm et al. (2007).Hamm D, Mautz BS, Wolfner MF, Aquadro CF, Swanson WJ. Evidence of amino acid diversity-enhancing selection within humans and among primates at the candidate sperm-receptor gene PKDREJ. American Journal of Human Genetics. 2007;81:44–52. doi: 10.1086/518695. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Han et al. (2010).Han L, Monné M, Okumura H, Schwend T, Cherry AL, Flot D, Matsuda T, Jovine L. Insights into egg coat assembly and egg-sperm interaction from the X-ray structure of full-length ZP3. Cell. 2010;143:404–415. doi: 10.1016/j.cell.2010.09.041. [DOI] [PubMed] [Google Scholar]
- Hart et al. (2014).Hart MW, Sunday JM, Popovic I, Learning KJ, Konrad CM. Incipient speciation of sea star populations by adaptive gamete recognition coevolution. Evolution. 2014;68:1294–1305. doi: 10.1111/evo.12352. [DOI] [PubMed] [Google Scholar]
- Hillarp et al. (1997).Hillarp A, Wiklund H, Thern A, Dahlbäck Molecular cloning of rat C4b binding protein alpha- and beta-chains: structural and functional relationships among human, bovine, rabbit, mouse, and rat proteins. The Journal of Immunology. 1997;158:1315–1323. [PubMed] [Google Scholar]
- Hirohashi et al. (2008).Hirohashi J, Kamei N, Kubo H, Sawada H, Matsumoto M, Hoshi M. Egg and sperm recognition systems during fertilization. Development Growth and Differentiation. 2008;50:S221–S238. doi: 10.1111/j.1440-169X.2008.01017.x. [DOI] [PubMed] [Google Scholar]
- Hofer et al. (2009).Hofer T, Ray N, Wegmann D, Excoffier L. Large allele frequency differences between human continental groups are more likely to have occurred by drift during range expansions than by selection. Annals of Human Genetics. 2009;73:95–108. doi: 10.1111/j.1469-1809.2008.00489.x. [DOI] [PubMed] [Google Scholar]
- Holsinger & Weir (2009).Holsinger KE, Weir BS. Genetics in geographically structured populations: defining, estimating and interpreting FST. Nature Reviews Genetics. 2009;10:639–650. doi: 10.1038/nrg2611. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Huang et al. (2014).Huang H-L, Lv C, Zhao Y-C, Li W, He X-M, Li P, Sha AG, Tian X, Papasian CJ, Deng HW, Lu GX, Xiao HM. Mutant ZP1 in familial infertility. New England Journal of Medicine. 2014;370:1220–1226. doi: 10.1056/NEJMoa1308851. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hughes (2012).Hughes AL. Amino acid sequence coevolution in the insect bursicon ligand-receptor system. Molecular Phylogenetics and Evolution. 2012;63:617–624. doi: 10.1016/j.ympev.2012.02.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kaj & Mugal (2016).Kaj I, Mugal CF. The non-equilibrium allele frequency spectrum in a Poisson random field framework. Theoretical Population Biology. 2016;111:51–64. doi: 10.1016/j.tpb.2016.06.003. [DOI] [PubMed] [Google Scholar]
- Kelso & Prüfer (2014).Kelso J, Prüfer K. Ancient humans and the origin of modern humans. Current Opinion in Genetics & Development. 2014;29:133–138. doi: 10.1016/j.gde.2014.09.004. [DOI] [PubMed] [Google Scholar]
- Kent et al. (2002).Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D. The human genome browser at UCSC. Genome Research. 2002;12:996–1006. doi: 10.1101/gr.229102. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Key et al. (2014).Key FM, Teixeira JC, De Filippo C, Andrés AM. Advantageous diversity maintained by balancing selection in humans. Current Opinion in Genetics & Development. 2014;29:45–51. doi: 10.1016/j.gde.2014.08.001. [DOI] [PubMed] [Google Scholar]
- Kosakovsky Pond et al. (2006).Kosakovsky Pond SL, Posada D, Gravenor MB, Woelk CH, Frost SDW. Automated phylogenetic detection of recombination using a genetic algorithm. Molecular Biology and Evolution. 2006;23:1891–1901. doi: 10.1093/molbev/msl051. [DOI] [PubMed] [Google Scholar]
- Kosova, Abney & Ober (2010).Kosova G, Abney M, Ober C. Heritability of reproductive fitness traits in a human population. Proceedings of the National Academy of Sciences of the United States of America. 2010;107(1):S1772–S1778. doi: 10.1073/pnas.0906196106. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kosova et al. (2010).Kosova G, Pickrell JK, Kelley JL, McArdle PD, Shuldiner AR, Abney M, Ober C. The CFTR Met 470 allele is associated with lower birth rates in fertile men from a population isolate. PLOS Genetics. 2010;6:e1000974. doi: 10.1371/journal.pgen.1000974. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kosova et al. (2012).Kosova G, Scott NM, Niederberger C, Prins GS, Ober C. Genome-wide association study identifies candidate genes for male fertility traits in humans. American Journal of Human Genetics. 2012;90:950–961. doi: 10.1016/j.ajhg.2012.04.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Krushkal, Bat & Gigli (2000).Krushkal J, Bat O, Gigli I. Evolutionary relationships among proteins encoded by the regulator of complement activation gene cluster. Molecular Biology and Evolution. 2000;17:1718–1730. doi: 10.1093/oxfordjournals.molbev.a026270. [DOI] [PubMed] [Google Scholar]
- Krushkal, Kemper & Gigli (1998).Krushkal J, Kemper C, Gigli I. Ancient origin of human complement factor H. Journal of Molecular Evolution. 1998;47:625–630. doi: 10.1007/PL00013152. [DOI] [PubMed] [Google Scholar]
- Kryazhimskiy & Plotkin (2008).Kryazhimskiy S, Plotkin JB. The population genetics of dN/dS. PLOS Genetics. 2008;4:e1000304. doi: 10.1371/journal.pgen.1000304. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Leffler et al. (2013).Leffler EM, Gao Z, Pfeifer S, Ségurel L, Auton A, Venn O, Bowden R, Bontrop R, Wall JD, Sella G, Donnelly P, McVean G, Przeworski M. Multiple instances of ancient balancing selection shared between humans and chimpanzees. Science. 2013;339:1578–1582. doi: 10.1126/science.1234070. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lessios (2011).Lessios HA. Speciation genes in free-spawning marine invertebrates. Integrative and Comparative Biology. 2011;51:456–465. doi: 10.1093/icb/icr039. [DOI] [PubMed] [Google Scholar]
- Letunic, Doerks & Bork (2015).Letunic I, Doerks T, Bork P. SMART: recent updates, new developments and status in 2015. Nucleic Acids Research. 2015;43:D257–D260. doi: 10.1093/nar/gku949. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Levitan (2012).Levitan DR. Contemporary evolution of sea urchin gamete-recognition proteins: experimental evidence of density-dependent gamete performance predicts shifts in allele frequencies over time. Evolution. 2012;66:1722–1736. doi: 10.1111/j.1558-5646.2012.01608.x. [DOI] [PubMed] [Google Scholar]
- Litscher, Williams & Wassarman (2009).Litscher ES, Williams Z, Wassarman PM. Zona pellucida glycoprotein ZP3 and fertilization in mammals. Molecular Reproduction and Development. 2009;76:933–941. doi: 10.1002/mrd.21046. [DOI] [PubMed] [Google Scholar]
- Liu et al. (2017).Liu W, Kunming L, Bai D, Yin J, Tang Y, Chi F, Zhang L, Wang Y, Pan J, Liang S, Guo Y, Ruan J, Kou X, Zhao Y, Wang H, Chen J, Teng X, Gao S. Dosage effects of ZP2 and ZP3 heterozygous mutations cause human infertility. Human Genetics. 2017;136:975–985. doi: 10.1007/s00439-017-1822-7. [DOI] [PubMed] [Google Scholar]
- Lou et al. (1995).Lou YH, Ang J, Thai H, McElveen F, Tung KSK. A zona pellucida 3 peptide vaccine induces antibodies and reversible infertility without ovarian pathology. Journal of Immunology. 1995;155:2715–2720. [PubMed] [Google Scholar]
- Männikkö et al. (2005).Männikkö M, Törmälä R-M, Tuuri T, Haltia A, Martikainen H, Ala-Kokko L, Tapanainen JS, Lakkakorpi JT. Association between sequence variation in genes encoding human zona pellucida glycoproteins and fertilization failure in IVF. Human Reproduction. 2005;20:1578–1585. doi: 10.1093/humrep/deh837. [DOI] [PubMed] [Google Scholar]
- Mayilyan (2012).Mayilyan KR. Complement genetics, deficiencies, and disease associations. Protein & Cell. 2012;3:487–496. doi: 10.1007/s13238-012-2924-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- McDonald & Kreitman (1991).McDonald JH, Kreitman M. Adaptive protein evolution at the Adh locus in Drosophila. Nature. 1991;351:652–654. doi: 10.1038/351652a0. [DOI] [PubMed] [Google Scholar]
- Meeusen et al. (2007).Meeusen ENT, Walker J, Peters A, Pastoret PP, Jungersen G. Current status of veterinary vaccines. Clinical Microbiology Reviews. 2007;20:489–510. doi: 10.1128/CMR.00005-07. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Meslin et al. (2012).Meslin C, Mugnier S, Callebaut I, Laurin M, Pascal G, Poupon A, Goudet G, Monget P. Evolution of genes involved in gamete interaction: evidence for positive selection, duplications, and losses in vertebrates. PLOS ONE. 2012;7:e44548. doi: 10.1371/journal.pone.0044548. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Meyer et al. (2012).Meyer M, Kircher M, Gansauge M-T, Li H, Racimo F, Mallick S, Schraiber JG, Jay F, Prüfer K, De Filippo C, Sudmant PH, Alkan C, Fu QM, Do R, Rohland N, Tandon A, Siebauer M, Green RE, Bryc K, Briggs AW, Stenzel U, Dabney J, Shendure J, Kitzman J, Hammer MF, Shunkov MV, Derevianko AP, Patterson N, Andres AM, Eichler EE, Slatkin M, Reich D, Kelso J, Pääbo S. A high-coverage genome sequence from an archaic Denisovan individual. Science. 2012;338:222–226. doi: 10.1126/science.1224344. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Monné & Jovine (2011).Monné M, Jovine L. A structural view of egg coat architecture and function in fertilization. Biology of Reproduction. 2011;85:661–669. doi: 10.1095/biolreprod.111.092098. [DOI] [PubMed] [Google Scholar]
- Morgan et al. (2010).Morgan CC, Loughran NB, Walsh TA, Harrison AJ, O’Connell MJ. Positive selection neighboring functionally essential sites and disease-implicated regions of mammalian reproductive proteins. BMC Evolutionary Biology. 2010;10:39. doi: 10.1186/1471-2148-10-39. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Morgan et al. (2017).Morgan CC, Loughran NB, Walsh TA, Harrison AJ, O’Connell MJ. Erratum to: positive selection neighboring functionally essential sites and disease-implicated regions of mammalian reproductive proteins. BMC Evolutionary Biology. 2017;17:170. doi: 10.1186/s12862-017-1015-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mugal, Wolf & Kaj (2014).Mugal CF, Wolf JBW, Kaj I. Why time matters: codon evolution and the temporal dynamics of dN/dS. Molecular Biology and Evolution. 2014;31:212–231. doi: 10.1093/molbev/mst192. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Müller-Eberhard & Miescher (1985).Müller-Eberhard HJ, Miescher PA, editors. Complement. Springer-Verlag; New York: 1985. [Google Scholar]
- Muro et al. (2012).Muro Y, Buffone MG, Okabe M, Gerton GL. Function of the acrosomal matrix: zona pellucida 3 receptor (ZP3R/sp56) is not essential for mouse fertilization. Biology of Reproduction. 2012;86:1–6. doi: 10.1095/biolreprod.111.095877. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Murrell et al. (2012).Murrell B, Wertheim JO, Moola S, Weighill T, Scheffler K, Pond SLK. Detecting individual sites subject to episodic diversifying selection. PLOS Genetics. 2012;8:e1002764. doi: 10.1371/journal.pgen.1002764. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Muse & Gaut (1994).Muse SV, Gaut BS. A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Molecular Biology and Evolution. 1994;11:715–724. doi: 10.1093/oxfordjournals.molbev.a040152. [DOI] [PubMed] [Google Scholar]
- Nielsen et al. (2005).Nielsen R, Williamson S, Kim Y, Hubisz MJ, Clark AG, Bustamante C. Genomic scans for selective sweeps using SNP data. Genome Research. 2005;15:1566–1575. doi: 10.1101/gr.4252305. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ostrowski et al. (2015).Ostrowski EA, Shen Y, Tian X, Sucgang R, Jiang H, Qu J, Katoh-Kurasawa M, Brock DA, Dinh C, Lara-Garduno F, Lee SL, Kovar CL, Dinh HH, Korchina V, Jackson L, Patil S, Han Y, Chaboub L, Shaulsky G, Muzny DM, Worley KC, Gibbs RA, Richards S, Kuspa A, Strassmann JE, Queller DC. Genomic signatures of cooperation and conflict in the social amoeba. Current Biology. 2015;25:1661–1665. doi: 10.1016/j.cub.2015.04.059. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Palumbi (1999).Palumbi SR. All males are not created equal: fertility differences depend on gamete recognition polymorphisms in sea urchins. Proceedings of the National Academy of Sciences of the United States of America. 1999;96:12632–12637. doi: 10.1073/pnas.96.22.12632. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Palumbi (2009).Palumbi SR. Speciation and the evolution of gamete recognition genes: pattern and process. Heredity. 2009;102:66–76. doi: 10.1038/hdy.2008.104. [DOI] [PubMed] [Google Scholar]
- Patiño et al. (2016).Patiño S, Keever CC, Sunday JM, Popovic I, Byrne M, Hart MW. Sperm bindin divergence under sexual selection and concerted evolution in sea stars. Molecular Biology and Evolution. 2016;33:1988–2001. doi: 10.1093/molbev/msw081. [DOI] [PubMed] [Google Scholar]
- Pökkylä et al. (2011).Pökkylä R-M, Lakkakorpi JT, Nuojua-Huttunen SH, Tapanainen JS. Sequence variations in human ZP genes as potential modifiers of zona pellucida architecture. Fertility and Sterility. 2011;95:2669–2672. doi: 10.1016/j.fertnstert.2011.01.168. [DOI] [PubMed] [Google Scholar]
- Popovic et al. (2014).Popovic I, Marko PB, Wares JP, Hart MW. Selection and demographic history shape the molecular evolution of the gamete compatibility protein bindin in Pisaster sea stars. Ecology and Evolution. 2014;4:1567–1588. doi: 10.1002/ece3.1042. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Prüfer et al. (2014).Prüfer K, Racimo F, Patterson N, Jay F, Sankararaman S, Sawyer S, Heinze A, Renaud G, Sudmant PH, De Filippo C, Li H, Mallick S, Dannemann M, Fu Q, Kircher M, Kuhlwilm M, Lachmann M, Meyer M, Ongyerth M, Siebauer M, Theunert C, Tandon A, Moorjani P, Pickrell J, Mullikin JC, Vohr SH, Green RE, Hellman I, Johnson PLF, Blanche H, Cann H, Kitzman JO, Shendure J, Eichler EE, Lein ES, Bakken TE, Golovanova LV, Doronichev VB, Shunkov MV, Derevianko AP, Viola B, Slatkin M, Reich D, Kelso J, Pääbo S. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature. 2014;505:43–49. doi: 10.1038/nature12886. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Puts (2010).Puts DA. Beauty and the beast: mechanisms of sexual selection in humans. Evolution and Human Behavior. 2010;31:157–175. doi: 10.1016/j.evolhumbehav.2010.02.005. [DOI] [Google Scholar]
- Pybus et al. (2014).Pybus M, Dall’Olio GM, Luisi P, Uzkudun M, Carreno-Torres A, Pavlidis P, Laayouni H, Bertranpetit J, Engelken J. 1000 Genomes Selection Browser 1.0: a genome browser dedicated to signatures of natural selection in modern humans. Nucleic Acids Research. 2014;42:D903–D909. doi: 10.1093/nar/gkt1188. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Qian, Zhou & Tang (2015).Qian W, Zhou H, Tang K. Recent coselection in human populations revealed by protein-protein interaction network. Genome Biology and Evolution. 2015;7:136–153. doi: 10.1093/gbe/evu270. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rambaut (2002).Rambaut A. Department of Zoology, University of Oxford; Oxford: 2002. [Google Scholar]
- Raymond & Rousset (1995).Raymond M, Rousset F. GENEPOP: population genetics software for exact tests and ecumenicism. Journal of Heredity. 1995;86:248–249. doi: 10.1093/oxfordjournals.jhered.a111573. [DOI] [Google Scholar]
- Rodríguez de Córdoba et al. (1994).Rodríguez de Córdoba S, Perez-Blas M, Ramos-Ruiz R, Sánchez-Corral P, De Villena FP-M, Rey-Campos J. The gene coding for the β-chain of C4b-binding protein (C4BPB) has become a pseudogene in the mouse. Genomics. 1994;21:501–509. doi: 10.1006/geno.1994.1308. [DOI] [PubMed] [Google Scholar]
- Rohlfs, Swanson & Weir (2010).Rohlfs RV, Swanson WJ, Weir BS. Detecting coevolution through allelic association between physically unlinked loci. American Journal of Human Genetics. 2010;86:674–685. doi: 10.1016/j.ajhg.2010.03.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sabeti et al. (2007).Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, Xie X, Byrne EH, McCarroll SA, Gaudet R, Schaffner SF, Lander ES, The International HapMap Consortium Genome-wide detection and characterization of positive selection in human populations. Nature. 2007;449:913–918. doi: 10.1038/nature06250. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schoville, Flowers & Burton (2012).Schoville SD, Flowers JM, Burton RS. Diversifying selection underlies the origin of allozyme polymorphism at the phosphoglucose isomerase locus in Tigriopus californicus. PLOS ONE. 2012;7:e40035. doi: 10.1371/journal.pone.0040035. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Simmons, Nelson & Simonsohn (2011).Simmons JP, Nelson LD, Simonsohn U. False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science. 2011;22:1359–1366. doi: 10.1177/0956797611417632. [DOI] [PubMed] [Google Scholar]
- Sinervo & Svensson (2002).Sinervo B, Svensson E. Correlational selection and the evolution of genomic architecture. Heredity. 2002;89:329–338. doi: 10.1038/sj.hdy.6800148. [DOI] [PubMed] [Google Scholar]
- Single et al. (2007).Single RM, Martin MP, Gao X, Meyer D, Yeager M, Kidd JR, Kidd KK, Carrington M. Global diversity and evidence for coevolution of KIR and HLA. Nature Genetics. 2007;39:1114–1119. doi: 10.1038/ng2077. [DOI] [PubMed] [Google Scholar]
- Stajich & Hahn (2005).Stajich JE, Hahn MW. Disentangling the effects of demography and selection in human history. Molecular Biology and Evolution. 2005;22:63–73. doi: 10.1093/molbev/msh252. [DOI] [PubMed] [Google Scholar]
- Sunday & Hart (2013).Sunday JM, Hart MW. Sea star populations diverge by positive selection at a sperm-egg compatibility locus. Ecology and Evolution. 2013;3:640–654. doi: 10.1002/ece3.487. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Swanson & Vacquier (2002).Swanson WJ, Vacquier VD. Reproductive protein evolution. Annual Review of Ecology and Systematics. 2002;33:161–179. doi: 10.1146/annurev.ecolsys.33.010802.150439. [DOI] [Google Scholar]
- Swanson et al. (2001).Swanson WJ, Yang Z, Wolfner MF, Aquadro CF. Positive Darwinian selection drives the evolution of several female reproductive proteins in mammals. Proceedings of the National Academy of Sciences of the United States of America. 2001;98:2509–2514. doi: 10.1073/pnas.051605998. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Szpiech & Hernandez (2014).Szpiech ZA, Hernandez RD. selscan: an efficient multithreaded program to perform EHH-based scans for positive selection. Molecular Biology and Evolution. 2014;31:2824–2827. doi: 10.1093/molbev/msu211. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tajima (1989).Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989;123:585–595. doi: 10.1093/genetics/123.3.585. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tamura et al. (2011).Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Molecular Biology and Evolution. 2011;28:2731–2739. doi: 10.1093/molbev/msr121. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Turner & Hoekstra (2006).Turner LM, Hoekstra HE. Adaptive evolution of fertilization proteins within a genus: variation in ZP2 and ZP3 in deer mice (Peromyscus) Molecular Biology and Evolution. 2006;32:1656–1669. doi: 10.1093/molbev/msl035. [DOI] [PubMed] [Google Scholar]
- Turner & Hoekstra (2008a).Turner LM, Hoekstra HE. Causes and consequences of the evolution of reproductive proteins. International Journal of Developmental Biology. 2008a;52:769–780. doi: 10.1387/ijdb.082577lt. [DOI] [PubMed] [Google Scholar]
- Turner & Hoekstra (2008b).Turner LM, Hoekstra HE. Reproductive protein evolution within and between species: maintenance of divergent ZP3 alleles in Peromyscus. Molecular Ecology. 2008b;17:12616–12628. doi: 10.1111/j.1365-294X.2008.03780.x. [DOI] [PubMed] [Google Scholar]
- Vacquier, Swanson & Lee (1997).Vacquier VD, Swanson WJ, Lee YH. Positive Darwinian selection on two homologous fertilization proteins: what is the selective pressure driving their divergence? Journal of Molecular Evolution. 1997;44:S15–S22. doi: 10.1007/PL00000049. [DOI] [PubMed] [Google Scholar]
- Vahdati & Wagner (2016).Vahdati AR, Wagner A. Parallel or convergent evolution in human population genomic data revealed by genotype networks. BMC Evolutionary Biology. 2016;16:154. doi: 10.1186/s12862-016-0722-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Van der Lee et al. (2017).Van der Lee R, Wiel L, Van Dam TJP, Huynen MA. Genome-scale detection of positive selection in nine primates predicts human-virus evolutionary conflicts. Nucleic Acids Research. 2017;45:10634–10648. doi: 10.1093/nar/gkx704. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vicens & Roldan (2014).Vicens A, Roldan ERS. Coevolution of positively selected IZUMO1 and CD9 in rodents: evidence of interaction between gamete fusion proteins? Biology of Reproduction. 2014;90 doi: 10.1095/biolreprod.113.116871. Article 113. [DOI] [PubMed] [Google Scholar]
- Vitti, Grossman & Sabeti (2013).Vitti JJ, Grossman SR, Sabeti PC. Detecting natural selection in genomic data. Annual Review of Genetics. 2013;47:97–120. doi: 10.1146/annurev-genet-111212-133526. [DOI] [PubMed] [Google Scholar]
- Voight et al. (2006).Voight BF, Kudaravalli S, Wen X, Pritchard JK. A map of recent positive selection in the human genome. PLOS Biology. 2006;5:e147. doi: 10.1371/journal.pbio.0050147. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang et al. (2017).Wang H, Choi Y, Tayo B, Wang X, Morris N, Zhang X, Broeckel U, Hanis C, Kardia S, Redline S, Cooper RS, Tang H, Zhu X. Genome-wide survey in African Americans demonstrates potential epistasis of fitness in the human genome. Genetic Epidemiology. 2017;41:122–135. doi: 10.1002/gepi.22026. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wassarman (2009).Wassarman PM. Mammalian fertilization: the strange case of sperm protein 56. BioEssays. 2009;31:153–158. doi: 10.1002/bies.200800152. [DOI] [PubMed] [Google Scholar]
- Wassarman & Litscher (2013).Wassarman PM, Litscher ES. Biogenesis of the mouse egg’s extracellular coat, the zona pellucida. Current Topics in Developmental Biology. 2013;102:243–266. doi: 10.1016/B978-0-12-416024-8.00009-X. [DOI] [PubMed] [Google Scholar]
- Wellcome Trust Case Control Consortium (2007).Wellcome Trust Case Control Consortium Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661–678. doi: 10.1038/nature05911. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Weir & Cockerham (1984).Weir BS, Cockerham CC. Estimating F-statistics for the analysis of population structure. Evolution. 1984;38:1358–1370. doi: 10.2307/2408641. [DOI] [PubMed] [Google Scholar]
- Wilburn & Swanson (2016).Wilburn DB, Swanson WJ. From molecules to mating: rapid evolution and biochemical studies of reproductive proteins. Journal of Proteomics. 2016;135:12–25. doi: 10.1016/j.jprot.2015.06.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Williams et al. (2006).Williams Z, Litscher ES, Jovine L, Wassarman PM. Polypeptide encoded by mouse ZP3 exon-7 is necessary and sufficient for binding of mouse sperm in vitro. Journal of Cellular Physiology. 2006;207:30–39. doi: 10.1002/jcp.20532. [DOI] [PubMed] [Google Scholar]
- Wilson & McVean (2006).Wilson DJ, McVean G. Estimating diversifying selection and functional constraint in the presence of recombination. Genetics. 2006;172:1411–1425. doi: 10.1534/genetics.105.044917. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yang (2007).Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution. 2007;24:1586–1591. doi: 10.1093/molbev/msm088. [DOI] [PubMed] [Google Scholar]
- Zhong et al. (2011).Zhong M, Zhang YW, Lange K, Fan RZ. A cross-population extended haplotype-based homozygosity score test to detect positive selection in genome-wide scans. Statistics and its Interface. 2011;4:51–63. doi: 10.4310/SII.2011.v4.n1.a6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhou & Stephens (2014).Zhou X, Stephens M. Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nature Methods. 2014;11:407–409. doi: 10.1038/nmeth.2848. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
The following information was supplied regarding data availability:
A text file that summarizes the workflow and code that was used to download and format haplotype data from the 1000 Genomes Project Phase 1 database, and a text file that contains summary statistics on fertility in Hutterite families, have been provided as Supplemental Files. These are population-level summaries only.