Abstract
Sexually antagonistic (SA) selection, a form of selection that can occur when the two sexes have different fitness optima for a trait, is a major force shaping the evolution of organisms. A seminal model developed by Rice (Rice WR. 1984. Sex chromosomes and the evolution of sexual dimorphism. Evolution 38:735–742.) predicts that the X chromosome should be a hotspot for the accumulation of loci under SA selection as compared with the autosomes. Here, we propose a methodological framework designed to detect a specific signature of SA selection on viability, namely differences in allelic frequencies between the sexes. Applying this method to genome-wide single nucleotide polymorphism (SNP) data from human populations in which no sex-specific population stratification could be detected, we show that, overall, significantly more SNPs exhibit differences in allelic frequencies between the sexes on the X chromosome than on the autosomes, supporting the predictions of Rice’s model. This pattern is consistent across populations and is robust to correction for potential biases such as differences in linkage disequilibrium, sample size, and genotyping errors between chromosomes. Although SA selection is not the only factor that can produce allelic frequency differences between the sexes, we further show that at least some of the identified X-linked loci reflect such sex-specific processes.
Keywords: sexually antagonistic selection, intralocus sexual conflict, sexual dimorphism, genome scan, X chromosome
Introduction
In species with separate sexes, males and females often undergo different selective pressures arising from divergent ecological niches and reproductive behaviors. These divergences in both natural and sexual selective pressures can lead to sexually antagonistic (SA) selection, in which males and females have different fitness optima for a trait. Moreover, if a trait undergoing SA selection is determined by the same set of genes in the two sexes and if each genotype induces the same phenotype in males and females (i.e., a positive intersexual genetic correlation exists for this trait), a single genome cannot encode both sex-specific optima, leading to an intralocus sexual conflict (IASC) (van Doorn 2009). To resolve such conflicts, the sex-specific optima must be reached through the establishment of sex-specific phenotypes, notably by decoupling gene expression between the sexes. Hence, the resolution of IASCs leads to the evolution of stable sexual dimorphisms (Badyaev 2002). This resolution can take a long time (Stewart et al. 2010), because pleiotropic effects of the genes controlling such traits and fluctuations of selective pressures across time and environments are expected to impede it (van Doorn 2009). Unresolved IASCs are therefore expected to be common in nature.
A common method for detecting IASCs consists of measuring, in each sex, the fitness associated with different phenotypes of a trait and assessing the intersexual genetic correlation for this trait. Using this approach, evidence for ongoing IASCs has been found for locomotory behavior in Drosophila melanogaster (Long and Rice 2007), bill color in the zebra finch Taeniopygia guttata (Price and Burley 1993, 1994), body size in the collared flycatcher Ficedula albicollis (Merila et al. 1998), and adult height in humans (Stulp et al. 2012). However, in these studies, the genetic targets of SA selection are unknown and, to our knowledge, only one study has attempted to detect transcripts under SA selection, in a laboratory population of D. melanogaster, using measurements of fitness and gene expression (Innocenti and Morrow 2010). Importantly, no study has attempted to map loci under SA selection at the genome-wide level in natural populations.
In this study, we aimed to detect a specific genomic signature of IASC, namely differences in allelic frequencies between the sexes, as previously highlighted by Balaresque et al. (2004), in order to identify the genetic targets of SA selection in human populations. During the initial phase of the evolution of sexual dimorphism, genetic variability is expressed in both sexes (Rice 1984), and the difference in fitness between males and females may induce differences in allelic frequencies between the sexes. Such differences are expected within the whole population if SA selection acts on viability, or within the subgroup of individuals who reproduced if SA selection acts on fertility or fecundity. Theoretical studies (Kidwell et al. 1977; Rice 1984) have shown that, under some assumptions, a polymorphism at a locus undergoing unresolved IASC can persist and reach an equilibrium in the genome. Such loci, termed SA polymorphisms, are therefore under balancing selection. A seminal model proposed by Rice (1984) predicts that, because of male hemizygosity, the X chromosome is especially prone to the accumulation of loci under IASC as compared with the autosomes. Because the X chromosome is transmitted and expressed asymmetrically between the sexes, partially recessive male-advantageous alleles and partially dominant female-advantageous alleles can increase in frequency under more lenient conditions if X-linked than if autosomal. Although this theoretical prediction has been extensively discussed over the last 30 years, empirical tests have yielded conflicting results (Fry 2010): some support an X-linked location of loci under IASC, in accordance with the predictions of Rice’s model (Gibson et al. 2002; Foerster et al. 2007; Innocenti and Morrow 2010), whereas others provide evidence for an autosomal location of such loci (Calsbeek and Sinervo 2004; Fedorka and Mousseau 2004; Delcourt et al. 2009).
Here, we developed a framework to detect differences in allelic frequencies between males and females using HapMap III, a data set of approximately 1.5 million single nucleotide polymorphisms (SNPs) genotyped in 11 worldwide human populations (Altshuler et al. 2010), and assessed whether the X chromosome is enriched for loci showing differences in allelic frequencies between males and females. Because such differences can also result from other processes such as sex-specific demographic events, we performed a number of analyses to verify that the observed differences were more likely due to SA selection.
Materials and Methods
Polymorphism Data Set in Humans
The HapMap III.3 data set consists of approximately 1.5 million genome-wide SNPs typed in 1,397 individuals sampled in 11 worldwide human populations (Altshuler et al. 2010). We only included unrelated individuals from the HAP1161 set proposed by Pemberton et al. (2010), in which one member of every first-degree relative pair was excluded (supplementary table S1, Supplementary Material online). After quality control, the numbers of males and females were overall similar in each population. SNPs with a minor allele frequency lower than 5% were removed to further reduce the number of possible genotyping errors. X-linked SNPs located in the pseudoautosomal regions (PARs) were not considered in this study. Because we focused our analysis on differences between the sexes, any SNP that also maps to the Y chromosome would be problematic, as it would skew allelic frequencies in a sex-specific manner. Therefore, X-linked SNPs outside of the PARs that were reported as heterozygous in males were excluded. To check whether other SNPs, including autosomal SNPs, could map to the Y chromosome and bias our results, we performed a systematic sequence similarity search using BLAST (Altschul et al. 1990) of their flanking sequences (±30 bp around the SNP) against the Y chromosome sequence (hg18 human genome assembly). Any SNP whose flanking sequences had a perfect match with Y-linked sequences was removed (36 SNPs; supplementary table S2, Supplementary Material online). After quality control, the mean number of SNPs per population was 1,122,087.
We focused most of our analyses on genic SNPs. The positions of all genes known to date were downloaded from the UCSC database (build NCBI36/hg18; Karolchik et al. 2008), and genic SNPs were defined as being within ±5 kb of a gene’s coordinates. The mean number of genic SNPs per population was 556,558 SNPs.
Investigating a Sex-Specific Population Structure
We first evaluated the level of sex-specific stratification in each of the 11 HapMap populations. We calculated the allele sharing distance (ASD) (Bowcock et al. 1994) between individuals using the software asd (http://szpiech.com/software.html, last accessed May 2, 2016) developed by Zachary Szpiech, and performed a multidimensional scaling (MDS) analysis on these genetic distances. Because missing data result in skewed ASD values, we only included individuals with less than 0.5% missing data. Additionally, we pruned the data for linkage disequilibrium (LD) to keep only independent SNPs, using the “--indep-pairwise” option in PLINK (Purcell et al. 2007) with an r2 threshold of 0.25 within a window of 50 SNPs and a step of 10 SNPs. For the X chromosome, to circumvent the problem of male hemizygosity when calculating ASD, we considered only one X chromosome per female by randomly selecting one allele for each SNP, and calculated a haploid ASD. We repeated this sampling 30 times. To test whether the observed mean genetic distance between males and females was significant, we randomly permuted the “male” and “female” labels in the distance matrix 10,000 times and calculated a new mean distance between males and females for each permutation. We then tested whether our observed value was significantly different from the distribution obtained after permutation (P < 0.05).
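The ASD and label-permutation steps described above can be sketched as follows. This is a minimal illustration, not the asd software: genotypes are assumed to be coded as counts of one allele per SNP (0 to 2, or 0 to 1 for the haploid X-chromosome case), None marks missing data, and the function names are hypothetical. The permutation P-value is computed here as a one-sided upper-tail test (males and females more distant than expected by chance), which is an assumption about how significance was assessed.

```python
import random
from itertools import combinations


def asd(g1, g2, ploidy=2):
    """Allele sharing distance between two individuals.

    Per SNP, the distance is |g1 - g2| / ploidy (0 = all alleles shared,
    1 = no allele shared); it is averaged over SNPs typed in both individuals.
    """
    d = [abs(a - b) / ploidy for a, b in zip(g1, g2) if a is not None and b is not None]
    return sum(d) / len(d)


def mean_cross_distance(dist, males, females):
    """Mean distance over all male-female pairs."""
    return sum(dist[m][f] for m in males for f in females) / (len(males) * len(females))


def sex_label_permutation_test(genotypes, is_male, n_perm=10_000, ploidy=2, seed=1):
    """Observed mean male-female ASD and a one-sided permutation P-value."""
    n = len(genotypes)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = asd(genotypes[i], genotypes[j], ploidy)
    males = [i for i in range(n) if is_male[i]]
    females = [i for i in range(n) if not is_male[i]]
    observed = mean_cross_distance(dist, males, females)

    rng = random.Random(seed)
    idx = list(range(n))
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(idx)  # reassign "male"/"female" labels, keeping group sizes
        perm = mean_cross_distance(dist, idx[:len(males)], idx[len(males):])
        if perm >= observed - 1e-12:
            n_extreme += 1
    return observed, (n_extreme + 1) / (n_perm + 1)
```

For the X chromosome, one would first sample one allele per female at each SNP and call the same function with ploidy=1, repeating the sampling as in the text.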
Detection of Sexually Differentiated SNPs
Within each population, we considered males and females as two distinct samples. We performed Fisher’s exact tests per SNP to assess if the difference in allelic frequencies between males and females was significant. SNPs with a significant Fisher’s exact test P-value were named sexually differentiated (SD) SNPs.
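A per-SNP test of this kind can be sketched with a 2x2 table of allele counts (rows: males, females; columns: allele A, allele B). The sketch below is a plain-Python two-sided Fisher's exact test, assuming genotypes coded as counts of allele A; on the X chromosome each male contributes a single allele. The helper names are hypothetical and not taken from the original analysis.

```python
from math import comb


def allele_counts(genotypes, ploidy):
    """Return (copies of A, copies of B) in a sample.

    genotypes: number of A copies per individual (0..ploidy); ploidy is 2 for
    autosomes and for female X chromosomes, 1 for male X chromosomes.
    """
    n_a = sum(genotypes)
    return n_a, ploidy * len(genotypes) - n_a


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):  # hypergeometric probability of the table with x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum all tables that are no more probable than the observed one
    # (a small relative tolerance absorbs floating-point rounding).
    return min(1.0, sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7)))


def sex_differentiation_pvalue(male_genotypes, female_genotypes, x_linked=False):
    """P-value for an allele-frequency difference between males and females at one SNP."""
    males = allele_counts(male_genotypes, 1 if x_linked else 2)
    females = allele_counts(female_genotypes, 2)
    return fisher_exact_two_sided(*males, *females)
```

For example, sex_differentiation_pvalue([1, 0, 1], [2, 1, 0], x_linked=True) tests an X-linked SNP with three hemizygous males and three females.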
To correct the Fisher’s test P-values for multiple testing, we independently performed a false discovery rate (FDR) correction (Benjamini and Hochberg 1995) in each population at two different levels. First, focusing on genic SNPs, we corrected their P-values for the number of SNPs per gene to account for gene-length bias (supplementary fig. S1, Supplementary Material online). SD SNPs with a significant q-value after per-gene FDR correction were named genic SD SNPs. Second, we corrected for the number of SNPs at the genome-wide level. SD SNPs with a significant q-value after genome-wide FDR correction were named genome-wide SD SNPs.
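The two levels of correction differ only in the set of P-values passed to the Benjamini–Hochberg procedure: the SNPs of one gene for the per-gene correction, or all SNPs of a population for the genome-wide correction. A minimal sketch, assuming P-values are stored per gene in a dictionary and a q-value threshold of 0.05 (the threshold is an assumption; the text only states "significant q-value"):

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted P-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest P-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def genic_sd_snps(pvalues_by_gene, alpha=0.05):
    """Per-gene FDR: BH-correct the SNP P-values of each gene separately.

    pvalues_by_gene: {gene: {snp_id: p_value}} for genic SNPs (within +/-5 kb of the gene).
    Returns (gene, snp_id, q_value) for SNPs with q < alpha.
    """
    hits = []
    for gene, snp_p in pvalues_by_gene.items():
        snps = list(snp_p)
        for snp, q in zip(snps, bh_qvalues([snp_p[s] for s in snps])):
            if q < alpha:
                hits.append((gene, snp, q))
    return hits
```

The genome-wide correction corresponds to calling bh_qvalues once on the P-values of all SNPs in a population.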
Potential Factors Biasing the X Chromosome Versus Autosomes Comparison
To investigate the effect of potential confounding factors on the difference in significance between the X chromosome and the autosomes, we applied different filters to our data set and checked that we obtained similar results.
Simulated Haploidization of Autosomes in Males
To assess how the smaller male sample size at X-linked loci (one chromosome per male, instead of two at autosomal loci) affects Fisher’s exact tests, we simulated a haploidization of male autosomes by randomly removing one autosomal copy in males.
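Because genotypes are unphased, removing one autosomal copy amounts to keeping the single allele of a homozygote and a randomly chosen allele of a heterozygote. A minimal sketch, assuming genotypes coded as counts of allele A and drawing independently for each SNP (the function name is hypothetical):

```python
import random


def haploidize_male_genotypes(male_genotypes, rng):
    """Keep one randomly chosen allele per male at one autosomal SNP.

    Genotypes are counts of allele A (0, 1, 2). Homozygotes keep their only
    allele; heterozygotes keep A or B with probability 1/2 each.
    Returns allele counts in {0, 1}, to be analyzed with a male ploidy of 1.
    """
    return [g // 2 if g != 1 else rng.randint(0, 1) for g in male_genotypes]
```

After this step, male allele counts at autosomal SNPs are computed exactly as for hemizygous X-linked SNPs, so both chromosome types are tested with the same number of male chromosomes.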
Correction for LD
To remove any bias due to differences in LD between chromosomes, we LD-pruned the genic SNPs in each population using PLINK (Purcell et al. 2007) with a stringent r2 threshold of 0.1 within a window of 50 SNPs and a step of 10 SNPs (resulting in 37,159 genic SNPs on average per population after filtering).
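The window-based pruning can be illustrated with a simplified greedy sketch. It is not PLINK's implementation: r2 is computed here as the squared Pearson correlation of genotype dosages, and one SNP of each pair above the threshold is removed at random (as described in the Results), whereas PLINK applies its own rule for choosing which SNP to drop. Function names and the dosage coding (0/1/2) are assumptions.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between two genotype-dosage vectors (0/1/2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated as unlinked
    return sxy * sxy / (sxx * syy)


def ld_prune(dosages, r2_max=0.1, window=50, step=10, seed=0):
    """Greedy sliding-window pruning; returns indices of retained SNPs in genomic order.

    dosages: one dosage vector per SNP (same individuals in the same order),
    with SNPs sorted by position.
    """
    rng = random.Random(seed)
    removed = set()
    for start in range(0, len(dosages), step):
        snps = range(start, min(start + window, len(dosages)))
        for i in snps:
            for j in snps:
                if i < j and i not in removed and j not in removed:
                    if r_squared(dosages[i], dosages[j]) > r2_max:
                        removed.add(rng.choice((i, j)))  # drop one SNP of the pair at random
    return [i for i in range(len(dosages)) if i not in removed]
```

With r2_max=0.1, window=50, and step=10, this mirrors the parameters given in the text; the r2 threshold of 0.25 used for the ASD analysis corresponds to r2_max=0.25.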
Concordance between the 1000 Genomes Project and the HapMap Data Set
We compared the allelic frequencies of the SD SNPs (genic and genome-wide) between data sets obtained with different molecular technologies: HapMap III.3 (Altshuler et al. 2010), a genotyping data set, versus the 1000 Genomes Project (1000 Genomes Project Consortium 2010), a resequencing data set. For this comparison, we used the 1000 Genomes Phase 1 SNP call set, which contains the genotypes of 1,092 individuals from 14 populations, including 8 populations shared with the HapMap data set. This allowed us to detect SNPs with discordant allelic frequencies between the two molecular technologies, which are therefore likely to present technical issues. For each SNP, two Fisher’s exact tests were performed, comparing the allelic frequencies of HapMap and 1000 Genomes males and, separately, of HapMap and 1000 Genomes females. SNPs for which either of the Fisher’s exact tests was significant (P < 0.05) were considered discordant between the data sets. When a HapMap population was absent from the 1000 Genomes Project, we compared allelic frequencies with a population from the same geographical area whenever possible.
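The concordance filter described above reduces to two allele-count comparisons per SNP, one per sex. The sketch below assumes allele counts are already available for each sex in each data set and includes a compact two-sided Fisher's exact test so that it is self-contained; the dictionary layout and function names are hypothetical.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    total = comb(r1 + r2, c1)
    probs = [comb(r1, x) * comb(r2, c1 - x) / total
             for x in range(max(0, c1 - r2), min(r1, c1) + 1)]
    p_obs = comb(r1, a) * comb(r2, c1 - a) / total
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def is_discordant(hapmap, kgp, alpha=0.05):
    """Flag a SNP whose allele frequencies differ between the two data sets.

    hapmap, kgp: dicts mapping "male" and "female" to (count_A, count_B) at one SNP,
    for HapMap and 1000 Genomes respectively. The SNP is discordant if either sex
    shows a significant difference between data sets.
    """
    return any(fisher_two_sided(*hapmap[sex], *kgp[sex]) < alpha for sex in ("male", "female"))
```

Only SNPs for which is_discordant returns False would be retained in the concordant subset.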
Signal Distribution Close to Candidate SNPs
To test if the signal we measured on genic SD SNPs spreads in their vicinity, we studied the region surrounding each genic SD SNP in a window of ±300 kb. To quantify the signal of differentiation in allelic frequencies between the sexes, we used the fixation index FST, a measure of divergence between groups of chromosomes. In order to take into account the differences in sample sizes between the sexes for the X chromosome, we used the haploid formula in Weir (1996). We used the per-population genome-wide list of genic SD SNPs after LD pruning. We performed the same analysis with random genic SNPs as focal SNPs, chosen to match the allelic diversity of the genic SD SNPs. We considered intervals of 1 kb for the first 50 kb, and then of 5 kb between 50 and 300 kb. Intervals were nonoverlapping, and each SNP was considered at most once. Any SNP located near several focal SNPs was associated with the closest. We first computed the FST per SNP. For each focal SNP, we then computed a mean FST for every interval. Finally, the distribution of FST per interval was built over all focal SNPs. The mean and the 95% quantiles were calculated for each interval.
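Two building blocks of this analysis can be sketched: a per-SNP FST estimate that treats every chromosome as one haploid observation (so that hemizygous males and diploid females can be pooled on the X chromosome), and the assignment of neighboring SNPs to distance intervals. The FST function below is an ANOVA-style estimator for haploid data in the spirit of Weir (1996); the exact formula used in the original analysis may differ. Distances are assumed to be counted on both sides of the focal SNP, and the function names are hypothetical.

```python
def haploid_fst(counts):
    """ANOVA-style FST for haploid samples (one chromosome per observation).

    counts: list of (n_i, k_i) per group (e.g., males, females), where n_i is the
    number of chromosomes sampled and k_i the number carrying allele A.
    Returns None when the SNP is monomorphic across groups.
    """
    r = len(counts)
    n_tot = sum(n for n, _ in counts)
    p = [k / n for n, k in counts]
    p_bar = sum(k for _, k in counts) / n_tot
    msp = sum(n * (pi - p_bar) ** 2 for (n, _), pi in zip(counts, p)) / (r - 1)
    msg = sum(n * pi * (1 - pi) for (n, _), pi in zip(counts, p)) / sum(n - 1 for n, _ in counts)
    n_c = (n_tot - sum(n * n for n, _ in counts) / n_tot) / (r - 1)
    denom = msp + (n_c - 1) * msg
    return None if denom == 0 else (msp - msg) / denom


def interval_index(distance_bp):
    """Interval of a SNP at a given distance from its focal SNP.

    Bins 0-49 are 1-kb intervals up to 50 kb; bins 50-99 are 5-kb intervals
    between 50 and 300 kb. Returns None beyond 300 kb.
    """
    kb = abs(distance_bp) / 1000
    if kb < 50:
        return int(kb)
    if kb < 300:
        return 50 + int((kb - 50) // 5)
    return None
```

Averaging haploid_fst over the SNPs falling in each interval of a focal SNP, and then summarizing these means across focal SNPs, gives the per-interval distributions described in the text.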
Pattern of Genic SD SNP Proportions for Random Groups of Individuals
We assessed whether the observation that the X chromosome shows higher proportions of genic SD SNPs than the autosomes was mostly due to the comparison of allelic frequencies between males and females by performing the same analysis on random groups of individuals. For this analysis, we removed the admixed ASW (African ancestry in Southwest USA) and MEX (Mexican ancestry in Los Angeles, CA) populations because they do not follow the same pattern as the other populations (see discussion in supplementary text S1, Supplementary Material online).
For each population, we generated independent bipartitions by randomly dividing the population into two groups of individuals (supplementary fig. S2, Supplementary Material online). We then calculated Δp, the difference in sex ratio between the two groups, that is, the absolute value of the difference between the proportion of females in Group A and the proportion of females in Group B. The bipartitions were built to obtain all possible values of Δp, which range from 0, when both groups have the same proportion of males and females, to 1, when one group contains only females and the other only males. We used different frameworks for the autosomes and the X chromosome to maintain the same number of chromosomes in each group across bipartitions, and consequently the same statistical power. We repeated the sampling of males and females ten times for each value of Δp.
For each bipartition, we detected SNPs showing differences in allelic frequencies between the two groups. We performed a Fisher’s exact test per SNP and applied an FDR correction per gene. The set of genic SNPs was LD-pruned as described previously. The proportion of SNPs showing significant differences in allelic frequencies after FDR correction (called genic SD SNPs) was computed for each chromosome. We then combined the results obtained from all populations (and over all chromosomes for the autosomes) by grouping together bipartitions with similar Δp.
Additionally, we calculated the percentage of similarity between each SNP list and the list of genic SD SNPs detected when comparing groups composed exclusively of males or females (Δp = 1). The percentage of similarity corresponds to the number of SNPs found in both lists divided by the number of SNPs in the list considered. We also calculated the mean percentage of similarity between replicates for each Δp.
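The bipartition, Δp, and similarity computations can be sketched as follows. This minimal version fixes the size of Group A and the number of females it receives; it does not reproduce the separate frameworks used to keep chromosome numbers constant for the X chromosome and the autosomes. All function names are hypothetical.

```python
import random


def delta_p(group_a, group_b, is_female):
    """Absolute difference in the proportion of females between two groups."""
    def female_fraction(group):
        return sum(is_female[i] for i in group) / len(group)
    return abs(female_fraction(group_a) - female_fraction(group_b))


def random_bipartition(females, males, n_females_a, size_a, rng):
    """Group A gets n_females_a random females and size_a - n_females_a random males;
    Group B gets everyone else."""
    f = rng.sample(females, len(females))
    m = rng.sample(males, len(males))
    n_males_a = size_a - n_females_a
    return f[:n_females_a] + m[:n_males_a], f[n_females_a:] + m[n_males_a:]


def percent_similarity(snp_list, reference_list):
    """Percentage of SNPs in snp_list that also appear in reference_list."""
    snps = set(snp_list)
    if not snps:
        return 0.0
    return 100 * len(snps & set(reference_list)) / len(snps)
```

Note that the similarity is asymmetric: it is normalized by the size of the list considered, as in the definition above, not by the reference list.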
Simulation of Loci under SA Selection
We simulated loci with two alleles, A and B, under SA selection acting on viability. First, we generated a set of 10,000 loci under neutrality using ms (Hudson 2002). From this pool, we then created 3,000 individuals by randomly sampling 2 alleles for the autosomes, and 1 or 2 alleles for the X chromosome depending on whether the individual was male or female. We then simulated random mating and selection for 21 generations while maintaining a constant population size of 3,000 individuals. The model of SA selection was parameterized as in Fry (2010) (supplementary table S3, Supplementary Material online). We assumed that dominance was the same in both sexes (i.e., hm = 1−hf = h), as in Rice (1984), and that selection coefficients were equal in both sexes (i.e., sf = sm = s) but acted on different genotypes. We varied h between 0 and 1 and s between 0 and 0.5. Each simulation was independently replicated ten times. After the last generation, we performed 10 subsamplings of 100 individuals.
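The within-generation signature targeted by the method can be illustrated with a deterministic (infinite-population) recursion rather than the individual-based simulation described above. The fitness scheme below is an assumption chosen to match the stated constraints (hm = 1 − hf = h, sf = sm = s, with A favored in females and B in males); the actual parameterization in supplementary table S3 may differ, and the function name is hypothetical.

```python
def sa_viability_generation(pf, pm, s, h, x_linked=False):
    """One generation of random mating followed by sex-specific viability selection.

    pf: frequency of allele A in eggs; pm: frequency of A in sperm (X-bearing
    sperm if x_linked). Hypothetical fitness scheme, with h_m = h, h_f = 1 - h:
      females AA 1,     AB 1 - h_f*s, BB 1 - s
      males   AA 1 - s, AB 1 - h_m*s, BB 1      (hemizygous X males: A 1 - s, B 1)
    Returns the frequency of A among surviving females and surviving males.
    """
    hf, hm = 1 - h, h
    # Genotype frequencies (AA, AB, BB) in zygotes receiving one allele from each parent.
    aa, ab, bb = pf * pm, pf * (1 - pm) + pm * (1 - pf), (1 - pf) * (1 - pm)

    wf = (1.0, 1 - hf * s, 1 - s)
    wbar_f = aa * wf[0] + ab * wf[1] + bb * wf[2]
    pf_surv = (aa * wf[0] + 0.5 * ab * wf[1]) / wbar_f

    if x_linked:
        # Sons inherit their single X chromosome from their mother.
        a_m, b_m = pf * (1 - s), (1 - pf) * 1.0
        pm_surv = a_m / (a_m + b_m)
    else:
        wm = (1 - s, 1 - hm * s, 1.0)
        wbar_m = aa * wm[0] + ab * wm[1] + bb * wm[2]
        pm_surv = (aa * wm[0] + 0.5 * ab * wm[1]) / wbar_m
    return pf_surv, pm_surv
```

Iterating pf, pm = sa_viability_generation(pf, pm, s, h) over generations tracks allele frequencies under these assumptions; the gap between the two returned values is the male-female difference in allelic frequencies that the Fisher's exact tests aim to detect among adults.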
Functional Enrichment Analysis
A gene was defined as SD if it contained at least one genic SD SNP in at least one population within ±5 kb of its coordinates. Each SD gene was associated with the lowest P-value among its genic SD SNPs. For this analysis, we did not use the list of LD-pruned genic SD SNPs, because pruning randomly removes one of the linked SNPs and could therefore exclude SNPs of potential functional significance. By pooling all populations and all chromosomes, we detected 5,705 SD genes. We performed a genome-wide functional enrichment analysis on the SD gene list using the DAVID functional annotation chart tool (Huang et al. 2009). A subset of the Gene Ontology database, the GO FAT set, was used to calculate the functional enrichment, as it improves the readability of the results by keeping only specific terms. We used the “Functional Annotation Clustering” tool to rank the biological significance of groups of genes and to assign them an enrichment score (a cluster is usually considered significantly enriched if its score is 1.3 or more; Huang et al. 2009). Because this analysis is only possible for input gene lists of no more than 3,000 genes, we performed the clustering analysis on the 3,000 SD genes with the lowest P-values (SDG3000), the 1,000 SD genes with the lowest P-values (SDG1000), and the SD genes shared by at least 2 populations (SDGsh, 672 genes; supplementary table S4, Supplementary Material online). We also performed the analysis on the list of X-linked SD genes only (192 genes).
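The construction of the input gene lists (lowest P-value per gene, top-ranked lists, genes shared across populations) can be sketched as below. It assumes that genic SD SNPs have already been assigned to genes (within ±5 kb) and are provided as (population, gene, P-value) records; the DAVID analysis itself is not reproduced, and the function name is hypothetical.

```python
from collections import defaultdict


def sd_gene_lists(genic_sd_snps, top_n=(3000, 1000), min_pops=2):
    """Build ranked and shared SD gene lists.

    genic_sd_snps: iterable of (population, gene, p_value) for genic SD SNPs.
    Returns (genes ranked by their lowest P-value,
             {n: top-n genes} for each n in top_n,
             genes with SD SNPs in at least min_pops populations).
    """
    best_p = {}
    pops = defaultdict(set)
    for pop, gene, p in genic_sd_snps:
        best_p[gene] = min(p, best_p.get(gene, float("inf")))  # lowest P-value per gene
        pops[gene].add(pop)
    ranked = sorted(best_p, key=best_p.get)
    top = {n: ranked[:n] for n in top_n}
    shared = sorted(g for g, ps in pops.items() if len(ps) >= min_pops)
    return ranked, top, shared
```

With the default arguments, the two top-ranked lists correspond to SDG3000 and SDG1000, and the shared list to SDGsh.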
Results
Our study aims to detect differences in allelic frequencies between males and females, a signature of SA selection on viability. For each locus in each population, we assessed the significance of such differences using Fisher’s exact tests and obtained a set of SD SNPs.
Sex-Specific Population Structure
Sex-specific population structures could potentially result in differences in allelic frequencies between males and females at the genome-wide level. Such structures might occur for various reasons, including sex-biased sampling in a population with fine-scale genetic structure or sex-specific admixture processes (i.e., if mating occurs preferentially between males and females originating from genetically differentiated populations). Importantly, after a single event of sex-specific admixture, differences in allelic frequencies between the sexes would disappear after one generation on the autosomes, whereas they would take longer to vanish on the X chromosome, leading to a higher differentiation between males and females on the X chromosome than on the autosomes (Balaresque et al. 2004). However, recurrent sex-specific admixture events, with constant migration biased toward one sex, would affect allelic frequencies on both the autosomes and the X chromosome. To assess whether such structures exist in our data set, we performed an MDS analysis on each of the 11 HapMap populations and tested whether the genetic distance between males and females was significant. The MDS analyses were based on pairwise ASDs among individuals, computed either for the autosomes or for the X chromosome (supplementary figs. S3 and S4, Supplementary Material online). On the autosomes (supplementary table S5, Supplementary Material online), the LWK population (Luhya from Webuye, Kenya) showed a significant mean genetic distance between males and females (P = 0.03) and was therefore removed from the study. On the X chromosome, the CHB population (Han Chinese from Beijing) exhibited a significant genetic distance between males and females for all repeats (supplementary table S6, Supplementary Material online). This structure was found to be driven by two outliers on the MDS.
When these two individuals were removed, the genetic distances were no longer significant, indicating that the permutation method we used to assess significance is strongly influenced by outliers. As it is unlikely that 2 males out of 53 would affect the overall differences in allelic frequencies between males and females, we did not remove this population from the study. We found no X-linked or autosomal sex-specific structure in the other populations; therefore, sex-biased demographic processes or biased sampling are unlikely to create differences in allelic frequencies between the sexes in these populations.
SA Polymorphisms in Genic Regions
To evaluate the functional significance of SD SNPs, we first focused our analysis on genic SNPs. Because long genes are more likely than short genes to contain an excess of significant SNPs by chance, we corrected the Fisher’s exact test P-values of genic SNPs for the number of SNPs per gene with an FDR correction. SNPs with a significant q-value were defined as genic SD SNPs. We found a mean of 2,441 nonindependent genic SD SNPs per population.
The distributions of the proportions of genic SD SNPs per chromosome over the ten HapMap populations included in the study are shown in figure 1A. The X chromosome presents a significantly higher proportion of genic SD SNPs than most autosomes (one-sided Wilcoxon–Mann–Whitney tests, P < 0.05), except for chromosomes 17, 19, and 22.
Fig. 1.—
Proportions of genic SD SNPs per chromosome (A) before LD pruning and (B) after LD pruning (filtering of genic SNPs with an r2 > 0.1). For each chromosome, the distribution of the proportions of genic SD SNPs after FDR correction at the gene level over the ten HapMap populations is represented by a boxplot. Levels of significance of one-sided Wilcoxon–Mann–Whitney tests between the distributions of proportions on the X chromosome (in red) and each of the autosomes (in blue) are reported. *P < 0.05, **P < 10−2, ***P < 10−3.
Effect of Potentially Confounding Factors
Next, we assessed the effect of various potentially confounding factors on the proportions of genic SD SNPs.
Because Fisher’s exact tests, used here to assess the significance of differentiation between the sexes, are sensitive to sample size (Crans and Shuster 2008), and because in males the autosomes have a sample size twice as large as that of the X chromosome, our test on the X chromosome might lack statistical power. To evaluate this effect, we simulated a haploidization of autosomes in males by randomly removing one of the two autosomal copies for each SNP. As expected, the distributions of proportions of genic SD SNPs on autosomes were shifted toward lower values after haploidization (supplementary fig. S5, Supplementary Material online), and the enrichment of the X chromosome in genic SD SNPs became highly significant relative to all autosomes (one-sided Wilcoxon–Mann–Whitney tests, P < 10−4).
LD patterns can vary greatly across chromosomal regions, and the X chromosome is known to carry SNPs in higher LD than other chromosomes (Pritchard and Przeworski 2001). If not corrected, this bias could lead to a larger number of SNPs hitchhiking on the X chromosome. To correct for such LD effects, we LD-pruned the genic SNPs (i.e., randomly removed one SNP from each pair of SNPs showing an r2 > 0.1) and calculated the new proportions of genic SD SNPs over the ten populations. After LD pruning, the number of identified loci was strongly reduced, with a mean of 118 independent genic SD SNPs per population. For all autosomes but one, chromosome 19, the mean proportions of genic SD SNPs were significantly lower than that of the X chromosome (one-sided Wilcoxon–Mann–Whitney test; fig. 1B). Therefore, differences in LD between chromosomes do not explain the excess of significant SNPs on the X chromosome. Notably, chromosome 19, which tends to show higher proportions of genic SD SNPs than the other autosomes, is known to have a high gene density compared with the other chromosomes (Grimwood et al. 2004).
To evaluate whether genotyping errors in the HapMap data set could lead to an excess of genic SD SNPs on the X chromosome, notably because of genotyping quality issues owing to male hemizygosity, we also performed our analysis exclusively on SNPs with concordant allelic frequencies between the genotyping data set used here (HapMap III) and a resequencing data set, the 1000 Genomes Project (1000 Genomes Project Consortium 2010). Although four autosomes that were previously significantly different from the X chromosome did not reach significance in this analysis (supplementary fig. S6, Supplementary Material online), the proportion of genic SD SNPs on the X chromosome was still significantly higher than that of all autosomes pooled together (one-sided Wilcoxon–Mann–Whitney test, P = 4.5 × 10−3). The pattern observed in this reduced data set was therefore consistent with our initial analysis, and the loss of significance for these four autosomes might be due to a lack of statistical power.
Heterogeneity among Populations
To test whether patterns of enrichment on the X chromosome were consistent among populations, we then compared the proportions of genic SD SNPs between the X chromosome and a subset of 2 autosomes for each of the 10 populations (fig. 2). We used the set of LD-pruned genic SD SNPs because LD patterns vary among populations. We chose chromosomes 10 and 22 for this comparison because the former contains approximately the same number of genes as the X chromosome and the latter approximately the same number of SNPs after LD pruning. The proportions of genic SD SNPs were significantly greater for the X chromosome than for chromosomes 10 and 22 in six and four populations, respectively (one-sided Fisher’s exact test at a significance level of 5%). Although the differences were not significant in the CEU (Utah residents with Northern and Western European ancestry) and JPT (Japanese from Tokyo) populations, the same trend was observed. In two populations, namely ASW and MEX, the proportions of genic SD SNPs on the X chromosome were not significantly different from those of either autosome, and all proportions appeared to be lower overall. These two populations are both known to be recently admixed (Pemberton et al. 2010) and have approximately half as many individuals as the other HapMap populations (supplementary table S1, Supplementary Material online), two characteristics that could influence their proportions of genic SD SNPs (supplementary text S1 and fig. S7A and B, Supplementary Material online). In the case of ASW, though, these characteristics do not seem to fully explain the observed pattern (supplementary text S1, Supplementary Material online).
Fig. 2.—
Proportions of genic SD SNPs per population for chromosomes X, 10, and 22. For each population, the proportions of genic SD SNPs after FDR correction at the gene level are shown for chromosomes X, 10, and 22 after LD pruning (r2 > 0.1). A Fisher’s exact test was performed to compare the proportions between the X chromosome and each autosome. *P < 0.05, **P < 10−2, ***P < 10−3.
We then compared the mean proportions of genic SD SNPs between the X chromosome and the autosomes when pooling all populations but the two admixed ones. The proportion on the X chromosome became significantly higher than that of all autosomes, including chromosome 19 (supplementary fig. S8, Supplementary Material online).
Additionally, we examined the concordance of the lists of genic SD SNPs across populations. Before LD pruning, the concordance between populations is low (supplementary table S4, Supplementary Material online). However, although only 3.6% of genic SD SNPs are shared between at least two populations, 12.6% of SD genes are identified in several populations, suggesting that genes are shared more often than SNPs, possibly owing to differences in LD between populations or to functional convergence.
Signal of Differentiation in Neighboring Regions of Genic SD SNPs
To test whether there is a signal of differentiation in the vicinity of genic SD SNPs, we computed the fixation index (FST), a standardized measure of frequency differences between males and females, in a window of ±300 kb around each LD-pruned genic SD SNP (supplementary figs. S9 and S10, Supplementary Material online). We observed that the FST signal indeed spreads into regions neighboring genic SD SNPs and is significantly higher than for random SNPs at least up to 100 kb (one-sided Wilcoxon–Mann–Whitney test at a significance level of 5%; supplementary fig. S10, Supplementary Material online), which indicates that the observed differences in allelic frequencies between the sexes result from a biological process rather than a technical artifact.
Pattern of Genic SD SNP Proportions for Random Groups of Individuals
To further ascertain that the higher proportion of genic SD SNPs observed on the X chromosome is a consequence of SA selection rather than, for example, inherent demographic or genomic differences between the X chromosome and the autosomes, we performed our analyses between random groups of individuals instead of comparing males and females. If the observed enrichment of signal on the X chromosome is due to SA selection, we expect equal proportions of genic SD SNPs on the X chromosome and the autosomes when the two groups have identical sex ratios (Δp = 0). Furthermore, if the signal is due to sex-specific processes, we expect the proportions of genic SD SNPs to increase with the sex-ratio bias.
Surprisingly, using the set of LD-pruned SNPs and the eight nonadmixed populations, we found that the proportions on the X chromosome are significantly higher than those on the autosomes, even when the sex ratio does not differ between the two groups (fig. 3).
Fig. 3.—
Effect of Δp, the absolute difference between the proportion of females in one group and the proportion of females in the other group, on the proportions of genic SD SNPs on the X chromosome and on the autosomes. The ten repetitions are pooled together.
Interestingly, though, the overall pattern differs greatly between the X chromosome and the autosomes. Indeed, the proportions of genic SD SNPs on the X chromosome increase with the sex-ratio bias (Spearman’s rank correlation, rho = 0.48, P = 3 × 10−14), and are significantly higher for Δp = 0.94 than for Δp = 0 (one-sided Wilcoxon–Mann–Whitney test, P = 2 × 10−5) (fig. 3). The mean number of genic SD SNPs on the X chromosome across populations increases with the sex-ratio bias from 49.5 to 57 SNPs, corresponding to an excess of about 7.5 X-linked SNPs. In contrast, the proportions on the autosomes do not increase with the sex-ratio bias, and even show a slight decrease (Spearman’s rank correlation, rho = −0.06, P = 8.70 × 10−6). Accordingly, the proportions of genic SD SNPs for Δp = 0.94 are not significantly greater than those for Δp = 0 (one-sided Wilcoxon–Mann–Whitney test, P = 0.98).
For each Δp, we calculated the percentage of similarity between the genic SD SNPs detected when comparing two groups of randomly selected individuals and those detected when contrasting males and females. This percentage is expected to increase with the sex-ratio bias, because the number of possible ways of composing the groups decreases as Δp increases, so that the random groups become increasingly similar to the male and female groups themselves. Beyond this expected trend, however, we found that the percentage of similarity increases more sharply on the X chromosome than on the autosomes (supplementary fig. S11, Supplementary Material online). The same pattern is observed for the percentage of similarity among independent replicates for each Δp (supplementary fig. S11, Supplementary Material online). This indicates that, as the sex-ratio bias increases, the lists of genic SD SNPs become more similar on the X chromosome than on the autosomes.
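The section does not specify how the percentage of similarity between two lists of SNPs is computed. As an assumption for illustration only, the sketch below uses the size of the intersection relative to the union (a Jaccard index expressed as a percentage), and averages it over all pairs of replicates for the among-replicate comparison. Function names and SNP identifiers are hypothetical.

```python
from itertools import combinations


def percent_similarity(list_a, list_b):
    """Percentage of SNPs shared by two lists, relative to their union.

    Assumption: a Jaccard index x 100; the study's exact definition may differ.
    """
    a, b = set(list_a), set(list_b)
    union = a | b
    if not union:
        return 0.0  # neither list contains any SNP
    return 100.0 * len(a & b) / len(union)


def mean_pairwise_similarity(replicate_lists):
    """Average percent similarity over all pairs of replicate SNP lists."""
    pairs = list(combinations(replicate_lists, 2))
    if not pairs:
        raise ValueError("at least two replicate lists are needed")
    return sum(percent_similarity(x, y) for x, y in pairs) / len(pairs)


# Usage with hypothetical SNP identifiers.
print(percent_similarity(["rs1", "rs2", "rs3"], ["rs2", "rs3", "rs4"]))  # 50.0
```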
These results therefore suggest that sex-specific selective processes influence the differences in allelic frequencies between males and females more strongly on the X chromosome than on the autosomes.
Strong Signals of SA Polymorphisms at the Genome-Wide Level
We were also interested in characterizing the strongest signals in each population, including all SNPs, whether genic or nongenic. To do so, we applied a false discovery rate (FDR) correction at the genome-wide level, correcting for the total number of SNPs in the data set. A total of ten SNPs remained significant after this genome-wide correction; we refer to them as genome-wide SD SNPs (supplementary table S7, Supplementary Material online). To guard against biases due to genotyping errors, we compared the allelic frequencies of these SNPs between the HapMap and 1000 Genomes data sets and retained only concordant SNPs (supplementary table S7, Supplementary Material online). Eight of the ten SNPs were excluded by this criterion, pointing to technical errors in one data set or the other.
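The genome-wide correction can be illustrated with the Benjamini–Hochberg step-up procedure (Benjamini and Hochberg 1995), assuming this is the FDR method intended here; the section itself only states that an FDR correction was applied. The key point is that m is the total number of SNPs tested genome-wide, not the number of genic SNPs, which makes the per-rank threshold far more stringent. The level α = 0.05 and the P-values below are illustrative.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans flagging tests significant at FDR level alpha.

    m is the number of tests supplied, i.e. every SNP in the genome-wide set.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    significant = set(order[:k_max])
    return [i in significant for i in range(m)]


p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p))  # only the first two tests are significant
```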
The two remaining genome-wide SD SNPs, which show strong allelic frequency differentiation between the sexes, were both X-linked SNPs detected in the GIH population (Gujarati Indians in Houston, TX). This signal is potentially functionally relevant, as both SNPs map to a gene whose disruption causes different neuronal pathologies in males and females (supplementary text S2, Supplementary Material online). However, because the GIH population is not present in the 1000 Genomes data set, we could not control for genotyping errors, and we cannot rule out that the extreme difference in allelic frequencies observed for these SNPs is due to technical issues.
Simulation of Loci under SA Selection
For the X chromosome and the autosomes, we computed the distributions of FST from the whole simulated data (3,000 individuals) (supplementary figs. S12 and S13, Supplementary Material online). To estimate the effect of sampling a small number of individuals (relative to the effective population size) on FST values, we also drew, for each independent repetition, 10 samples of 100 individuals, which is approximately the sample size of the HapMap populations (supplementary table S1, Supplementary Material online). For a given selection and dominance coefficient, sampling individuals results in much higher variances and more extreme FST values (supplementary figs. S12 and S13, Supplementary Material online).
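To make the sampling effect concrete, the sketch below computes a simplified two-group FST, (H_T − H_S)/H_T with equal group weights, and recomputes it in repeated subsamples of males and females. The estimator is a Nei-style approximation chosen for brevity, not necessarily the estimator used in the study, and genotypes are coded as 0, 1 or 2 copies of an arbitrary allele. Because H_T − H_S equals twice the variance of the two group allele frequencies, this estimator can never be negative, so random sampling noise between small groups can only push it upward. The population below is hypothetical.

```python
import random


def fst_two_groups(p1, p2):
    """Simplified FST between two equally weighted groups: (H_T - H_S) / H_T.

    Illustrative Nei-style estimator without sample-size correction; it equals
    twice the variance of the two group frequencies divided by H_T, so >= 0.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def allele_freq(genotypes):
    """Allele frequency from diploid genotypes coded as 0, 1 or 2 copies."""
    return sum(genotypes) / (2 * len(genotypes))


def subsampled_fst(males, females, n_per_sex, n_draws, seed=0):
    """FST between the sexes in repeated subsamples drawn without replacement."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_draws):
        sample_m = rng.sample(males, n_per_sex)
        sample_f = rng.sample(females, n_per_sex)
        values.append(fst_two_groups(allele_freq(sample_m), allele_freq(sample_f)))
    return values


# Hypothetical population: 1,500 males and 1,500 females with identical
# allele frequency (0.5), so the full-population FST is exactly zero.
males = [0, 1, 2] * 500
females = [0, 1, 2] * 500
print(fst_two_groups(allele_freq(males), allele_freq(females)))  # 0.0
print(subsampled_fst(males, females, n_per_sex=50, n_draws=10, seed=1))
```

Even with no difference between the sexes in the full population, subsamples of 100 individuals typically yield positive FST values.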
This sampling effect, combined with a winner’s curse effect (the tendency to detect only the most extreme values as significant and therefore to overestimate them), artificially inflates the FST values of genic SD SNPs. With small samples, the statistical power to detect a significant FST is lower, so we likely detect only extremely high FST values.
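The winner's curse can be reproduced with a toy model, an assumption-laden illustration rather than a model of the study's data: each test returns a noisy estimate of the same small true effect, and only estimates exceeding a significance threshold are retained. The mean of the retained estimates then exceeds the true effect, even though the estimates as a whole are unbiased. The Gaussian noise model and all parameter values are hypothetical.

```python
import random
import statistics


def winners_curse(true_effect, noise_sd, threshold, n_tests, seed=0):
    """Mean of all estimates versus mean of those passing a threshold.

    Toy model: every test estimates the same true effect with Gaussian noise.
    Returns (mean_all, mean_retained); mean_retained is None if none pass.
    """
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, noise_sd) for _ in range(n_tests)]
    retained = [e for e in estimates if e > threshold]
    mean_retained = statistics.mean(retained) if retained else None
    return statistics.mean(estimates), mean_retained


mean_all, mean_retained = winners_curse(0.01, 0.02, 0.05, 10_000, seed=1)
print(round(mean_all, 3), round(mean_retained, 3))
```

The retained mean necessarily lies above the threshold, here more than five times the true effect of 0.01.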
Furthermore, after sampling, the FST distributions appear to have higher variances, and therefore more extreme values, on the X chromosome than on the autosomes. This pattern is not observed before sampling, suggesting that the sampling effect might have a greater impact on X-linked FST values. This is expected because, for the same number of individuals, fewer copies of the X chromosome than of an autosome are sampled, as males carry a single X. This could partly explain why we observe a higher proportion of genic SD SNPs on the X chromosome than on the autosomes when comparing groups with equal sex ratios.
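The sample-size asymmetry behind this argument is simple arithmetic. Assuming a sample split evenly between 50 males and 50 females (roughly the HapMap sample size of 100 individuals, with a hypothetical 1:1 sex ratio), the autosomes provide 100 copies per sex, whereas the X chromosome provides 100 copies in females but only 50 in hemizygous males. Treating copies as independent binomial draws, which is itself a simplifying assumption, the sampling variance of the male allele frequency is twice as large on the X chromosome as on an autosome. PARs are ignored.

```python
def sampled_copies(n_males, n_females, chromosome):
    """Allele copies sampled per sex at one locus: (male copies, female copies)."""
    if chromosome == "autosome":
        return 2 * n_males, 2 * n_females
    if chromosome == "X":
        # Males are hemizygous for the X (pseudoautosomal regions ignored).
        return n_males, 2 * n_females
    raise ValueError("chromosome must be 'autosome' or 'X'")


def binomial_variance(p, n_copies):
    """Sampling variance of an allele frequency estimated from n_copies copies."""
    return p * (1 - p) / n_copies


auto_m, auto_f = sampled_copies(50, 50, "autosome")  # (100, 100): 200 copies
x_m, x_f = sampled_copies(50, 50, "X")               # (50, 100): 150 copies
print(binomial_variance(0.5, x_m) / binomial_variance(0.5, auto_m))  # 2.0
```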
In our data, the mean FST of genic SD SNPs after LD pruning in the nonadmixed HapMap populations is 0.092 on the X chromosome and 0.067 on the autosomes. According to our simulation results before subsampling, these FST values would require unrealistically strong selection, with selection coefficients higher than 0.5. However, once the sampling effect is taken into account, such FST values can be observed for selection coefficients as low as s = 0.1. Given the high variance, it is difficult to assess the selection coefficients needed to produce the observed FST values, and a data set with larger sample sizes will be necessary to estimate the strength of SA selection on viability in human populations.
Are Loci under IASC Involved in Specific Functions?
To assess whether the SD genes were enriched for specific functions, we performed a functional enrichment analysis using the DAVID annotation chart tool (Huang et al. 2009). This analysis was performed on four lists: the 3,000 genome-wide SD genes with the lowest P-values (SDG3000), the 1,000 genome-wide SD genes with the lowest P-values (SDG1000), the genome-wide SD genes shared by at least two populations (SDGsh), and the X-linked SD genes (192 genes), using the systematic clustering method implemented in DAVID.
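The construction of the genome-wide gene lists can be sketched as below. This is illustrative only: how a gene-level P-value is derived from its SNPs is not described in this section, so the sketch simply takes a precomputed mapping from gene to P-value as input, and ties are broken alphabetically, which is an arbitrary choice. Function names and gene names are hypothetical.

```python
from collections import Counter


def top_n_genes(gene_pvalues, n):
    """The n genes with the lowest P-values; ties broken alphabetically."""
    ranked = sorted(gene_pvalues.items(), key=lambda item: (item[1], item[0]))
    return [gene for gene, _ in ranked[:n]]


def shared_genes(per_population_genes, min_populations=2):
    """Genes appearing in the SD gene lists of at least min_populations populations."""
    counts = Counter(gene for genes in per_population_genes for gene in set(genes))
    return sorted(gene for gene, count in counts.items() if count >= min_populations)


# Usage with hypothetical gene names and P-values.
pvals = {"G1": 0.01, "G2": 0.5, "G3": 0.001, "G4": 0.01}
print(top_n_genes(pvals, 2))  # ['G3', 'G1']
print(shared_genes([{"A", "B"}, {"B", "C"}, {"C", "D"}]))  # ['B', 'C']
```

Under these assumptions, SDG3000 and SDG1000 would correspond to `top_n_genes(pvalues, 3000)` and `top_n_genes(pvalues, 1000)`, and SDGsh to `shared_genes(lists, 2)`.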
The results for the genome-wide lists are shown in supplementary table S8, Supplementary Material online. The clusters “response to stimulus and immune system” and “epidermis development” are significantly enriched in the SDG3000 and SDG1000 lists. However, the genes in the epidermis development cluster are located in two genomic clusters of genes involved in the same functions; because of LD, these genes might share the same signal of SA selection as an artifact and thereby bias the functional enrichment. A cluster of functions related to “glycolysis” is enriched in the SDG3000 list, and a cluster related to “reproduction” in the SDG1000 list. One cluster highlighted in SDGsh refers to functions associated with morphogenesis.
For the X-linked SD genes, we found one significantly enriched cluster of functions involved in “nucleotide and nucleoside binding” (enrichment score of 1.33).
Discussion
In this study, we identified SD SNPs in a genome-wide genotyping data set of ten human populations, with a mean of 118 independent genic SD SNPs per population. This signal may reflect different mechanisms, including SA selection on viability acting between gametogenesis and adulthood, the stage at which individuals were sampled and genotyped. Severe selection is indeed suspected to occur in humans during early development: the probability that a fetus survives from fertilization to term could be lower than 50% (Benagiano et al. 2010). SA selection occurring during early development would result in sex-specific transmission distortion. This hypothesis is consistent with a study that reported significant differences in allelic frequencies between the sexes in human newborns (Ucisik-Akkaya et al. 2010).
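How SA viability selection between conception and sampling creates a difference in allelic frequencies between the sexes can be shown with a one-generation, deterministic sketch for an autosomal biallelic locus. The fitness scheme is an assumption for illustration, not the model simulated in the study: allele A is favored in males (fitnesses 1, 1 − h·s_m, 1 − s_m for AA, Aa, aa) and allele a in females (1 − s_f, 1 − h·s_f, 1), genotypes start in Hardy–Weinberg proportions in both sexes, and hemizygosity of the X chromosome is not modeled. Setting s_f = 0 turns the scheme into selection acting in males only, the sex-specific but nonantagonistic case.

```python
def post_selection_freqs(p, s_m, s_f, h):
    """Frequency of allele A in males and females after one round of viability selection.

    Assumed fitnesses (AA, Aa, aa): males (1, 1 - h*s_m, 1 - s_m),
    females (1 - s_f, 1 - h*s_f, 1). Genotypes start in Hardy-Weinberg
    proportions with frequency p of A in both sexes (autosomal locus).
    """
    q = 1 - p
    g_AA, g_Aa, g_aa = p * p, 2 * p * q, q * q

    def freq_after(w_AA, w_Aa, w_aa):
        mean_w = g_AA * w_AA + g_Aa * w_Aa + g_aa * w_aa
        # Each surviving Aa carries one A out of two copies.
        return (g_AA * w_AA + 0.5 * g_Aa * w_Aa) / mean_w

    p_males = freq_after(1.0, 1 - h * s_m, 1 - s_m)
    p_females = freq_after(1 - s_f, 1 - h * s_f, 1.0)
    return p_males, p_females


pm, pf = post_selection_freqs(0.5, s_m=0.1, s_f=0.1, h=0.5)    # antagonistic
pm0, pf0 = post_selection_freqs(0.5, s_m=0.1, s_f=0.0, h=0.5)  # males only
print(round(pm - pf, 4), round(pm0 - pf0, 4))
```

With p = 0.5, h = 0.5 and s_m = s_f = 0.1, the frequencies of A become 0.4875/0.95 in males and 0.4625/0.95 in females, a difference of about 0.026; with s_f = 0 the difference is halved to about 0.013.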
A fraction of the signal could also be due to SA selection acting on reproduction, because five of the ten populations in the study include couples who had at least one child. These couples represent 40–98% of the sample size of these populations (supplementary table S9, Supplementary Material online). When considering subgroups of individuals who successfully reproduced, we can also capture signals of SA selection on fecundity or fertility. If, at a given locus, one allele is beneficial to males and another to females in terms of fecundity or fertility, then among individuals who successfully reproduced we expect each sex to be enriched for the allele beneficial to it, leading to differences in allelic frequencies between males and females.
Alternative Mechanisms Leading to Differences in Allelic Frequencies between the Sexes
Differences in allelic frequencies between the sexes may also arise from sex-specific (but not antagonistic) selection on viability, where a given locus is under selection in one sex but neutral in the other, or where selective pressures act in the same direction in both sexes but with different intensities. The list of SD loci we identified may therefore contain SNPs undergoing this kind of selection. However, such selective processes are likely to produce smaller (and less detectable) differences in allelic frequencies between the sexes than SA selection. Furthermore, polymorphisms induced by such processes are expected to be transient and should therefore be rarely observed, whereas SA selection is expected to maintain more stable polymorphisms.
Such differences may additionally arise from sex-specific demographic processes. However, we tested for sex-specific population stratification and found that demographic processes or sampling bias were unlikely to explain the differences in allelic frequencies between males and females in the studied populations.
A difference in age between sampled males and females could also lead to differences in allelic frequencies between the sexes, reflecting selective pressures that act at specific ages in both sexes. Although it would be interesting to explore this hypothesis with an adequate data set, we cannot determine whether such a bias exists in our data, because the ages of the participants are unknown. However, five of the ten populations in the study include couples, and the age difference within couples is known to be about 3.5 years on average (Fenner 2005), which should have only a minor effect on our results.
Chromosomal Location of Genic SD SNPs
We also studied the chromosomal location of the genic SD SNPs and found that the X chromosome exhibits a significantly higher mean proportion of genic SD SNPs than most autosomes, consistent with Rice’s model predicting that the X chromosome offers a more favorable environment for the accumulation of loci under SA selection. The signal of differentiation between the sexes is not restricted to the identified genic SD SNPs but extends into their neighboring regions, suggesting that it is due to biological processes rather than technical errors. Moreover, we showed that this pattern is not driven by LD or by outlier populations. Finally, because more chromosome copies are sampled per individual on the autosomes, statistical power is markedly higher there than on the X chromosome; that the X chromosome nonetheless shows the excess of genic SD SNPs further supports our conclusions.
Patterns of Enrichment for Random Groups of Individuals
Additionally, to assess whether the enrichment of genic SD SNPs on the X chromosome was likely to be due to SA selection, we compared randomly composed groups spanning a range of sex-ratio biases. We found a significant difference in the proportions of genic SD SNPs between the X chromosome and the autosomes even for groups with unbiased sex ratios. This difference could be explained by a more severe effect of sampling on the X chromosome than on the autosomes, but other factors, which remain to be characterized in further studies, may also contribute.
However, the proportions of genic SD SNPs on the X chromosome are significantly higher for groups composed almost exclusively of males or of females (Δp = 0.94) than for groups with unbiased sex ratios, whereas this trend is not found on the autosomes. This suggests that part of the signal identified on the autosomes consists of false positives, possibly caused by the sampling of individuals, as shown in our simulations. More importantly, it also indicates that a portion of the signal detected on the X chromosome is clearly due to sex-specific selective processes. Taking all populations into account, we further found an excess of at least seven X-linked SNPs for the groups composed almost exclusively of males or of females. The similarity between lists of genic SD SNPs also increased more steeply with the sex-ratio bias on the X chromosome than on the autosomes, suggesting that the X-linked list of genic SD SNPs is enriched for SA loci. The X chromosome therefore does seem to be more prone to the accumulation of SA loci than the autosomes.
The Extreme Signals Detected
The strongest signals identified after the genome-wide FDR correction are two SNPs on the X chromosome. These SNPs show an extremely large difference in allelic frequency between the sexes (0.61; supplementary table S7, Supplementary Material online). If this difference were due to SA selection, it would imply very strong selection coefficients. However, such large differences could also result from a winner’s curse effect, which leads to the detection of artificially high differences in allelic frequencies (Ioannidis 2008).
Estimation of the Strength of Selection
Although we have shown that the selection coefficients needed to obtain the observed mean FST values are high, our simulations indicate that selection coefficients estimated from small samples are unreliable: they are overestimated because of the combined effects of sampling and the winner’s curse (Ioannidis 2008). Moreover, we argue that SA selection is likely to act on haplotypes rather than on separate loci, and that more than one combination of alleles is likely to be advantageous in a given sex, leading to less stringent selection. Further studies using haplotype-based statistics and modeling will be needed to disentangle the selection coefficients underlying these patterns.
Functional Enrichment Analysis
The last aim of our analysis was to perform a functional enrichment analysis of the genome-wide and X-linked SD gene lists. Although it is difficult to draw any conclusion from the functional enrichment of the X-linked SD genes, several functional categories enriched among the genome-wide SD genes encompass traits known to show at least minor sexual dimorphism in humans: reproductive process, glycolysis (Shi and Clegg 2009; Nookaew et al. 2013), and immune system (Marriott and Huet-Hudson 2006; Klein 2012). The enrichment for reproductive process might reflect that a fraction of the detected genic SD SNPs is actually under SA selection on reproduction rather than on viability. These findings are consistent with the expectation that traits under IASC should often be sexually dimorphic (Bonduriansky and Chenoweth 2009).
Pseudoautosomal Regions
Other regions of particular interest for studying SA polymorphisms are the pseudoautosomal regions (PARs), which have recently been the subject of several theoretical and empirical studies (Qiu et al. 2013; Charlesworth et al. 2014; Kirkpatrick and Guerrero 2014). They are the only regions of the sex chromosomes that exhibit autosomal features of recombination and inheritance, yet their evolutionary dynamics are also influenced by their sex linkage. Under some conditions, theory shows that the maintenance of SA polymorphisms is facilitated on PARs relative to both the sex-specific region of the X chromosome and the autosomes (Otto et al. 2011). The genotyping of the PARs in the HapMap data set is currently not of high enough quality for further study. It would nevertheless be interesting to investigate signatures of SA selection in these regions in the future, for example, with resequencing data sets.
Conclusion
In this study, we used, for the first time, a genome-wide scan to identify differences in allelic frequencies between males and females in ten human populations. Our results support a preferential location of SA loci on the X chromosome as compared with the autosomes. Although our analyses could not establish that the signal observed on the autosomes was due to sex-specific processes, we found that SA selection on viability is likely to act on several SNPs on the X chromosome. Trio data sets, including genome-wide sequencing of parents and children, would make it possible to study sex-specific transmission distortion and thereby assess the strength of SA selection during embryonic development and childhood.
Supplementary Material
Supplementary tables S1–S9, figures S1–S14, and text S1 and S2 are available at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/).
Acknowledgments
We thank Mark Kirkpatrick, Paul Verdu, Frédéric Austerlitz, Patricia Balaresque and two anonymous reviewers for useful comments and suggestions. E.A.L. was financed by a PhD grant from the French Ministry of Higher Education and Research. Part of this work used the high performance computing resources “Calcul Intensif et Algorithmique” (PCIA) from the Museum National d'Histoire Naturelle (MNHN).
Literature Cited
- 1000 Genomes Project Consortium. 2010. A map of human genome variation from population-scale sequencing. Nature 467:1061–1073.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol. 215:403–410.
- Altshuler DM, et al. 2010. Integrating common and rare genetic variation in diverse human populations. Nature 467:52–58.
- Badyaev AV. 2002. Growing apart: an ontogenetic perspective on the evolution of sexual size dimorphism. Trends Ecol Evol. 17:369–378.
- Balaresque P, Toupance B, Quintana-Murci L, Crouau-Roy B, Heyer E. 2004. Sex-specific selection on the human X chromosome? Genet Res. 83:169–176.
- Benagiano G, Farris M, Grudzinskas G. 2010. Fate of fertilized human oocytes. Reprod Biomed Online. 21:732–741.
- Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B. 57:289–300.
- Bonduriansky R, Chenoweth SF. 2009. Intralocus sexual conflict. Trends Ecol Evol. 24:280–288.
- Bowcock AM, et al. 1994. High resolution of human evolutionary trees with polymorphic microsatellites. Nature 368:455–457.
- Calsbeek R, Sinervo B. 2004. Within-clutch variation in offspring sex determined by differences in sire body size: cryptic mate choice in the wild. J Evol Biol. 17:464–470.
- Charlesworth B, Jordan CY, Charlesworth D. 2014. The evolutionary dynamics of sexually antagonistic mutations in pseudoautosomal regions of sex chromosomes. Evolution 68:1339–1350.
- Crans G, Shuster J. 2008. How conservative is Fisher’s exact test? A quantitative evaluation of the two-sample comparative binomial trial. Stat Med. 27:3598–3611.
- Delcourt M, Blows MW, Rundle HD. 2009. Sexually antagonistic genetic variance for fitness in an ancestral and a novel environment. Proc R Soc B. 276:2009–2014.
- Fedorka KM, Mousseau TA. 2004. Female mating bias results in conflicting sex-specific offspring fitness. Nature 429:65–67.
- Fenner JN. 2005. Cross-cultural estimation of the human generation interval for use in genetics-based population divergence studies. Am J Phys Anthropol. 128:415–423.
- Foerster K, et al. 2007. Sexually antagonistic genetic variation for fitness in red deer. Nature 447:1107–1110.
- Fry JD. 2010. The genomic location of sexually antagonistic variation: some cautionary comments. Evolution 64:1510–1516.
- Gibson JR, Chippindale AK, Rice WR. 2002. The X chromosome is a hot spot for sexually antagonistic fitness variation. Proc R Soc B. 269:499–505.
- Grimwood J, et al. 2004. The DNA sequence and biology of human chromosome 19. Nature 428:529–535.
- Huang DW, Sherman BT, Lempicki RA. 2009. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 4:44–57.
- Innocenti P, Morrow EH. 2010. The sexually antagonistic genes of Drosophila melanogaster. PLoS Biol. 8:e1000335.
- Ioannidis JPA. 2008. Why most discovered true associations are inflated. Epidemiology 19:640–648.
- Karolchik D, et al. 2008. The UCSC genome browser database: 2008 update. Nucleic Acids Res. 36:D773–D779.
- Kidwell JF, Clegg MT, Stewart FM, Prout T. 1977. Regions of stable equilibria for models of differential selection in the two sexes under random mating. Genetics 85:171–183.
- Kirkpatrick M, Guerrero RF. 2014. Signatures of sex-antagonistic selection on recombining sex chromosomes. Genetics 197:531–541.
- Klein SL. 2012. Immune cells have sex and so should journal articles. Endocrinology 153:2544–2550.
- Long TAF, Rice WR. 2007. Adult locomotory activity mediates intralocus sexual conflict in a laboratory-adapted population of Drosophila melanogaster. Proc R Soc B. 274:3105–3112.
- Marriott I, Huet-Hudson YM. 2006. Sexual dimorphism in innate immune responses to infectious organisms. Immunol Res. 34:177–192.
- Merilä J, Sheldon B, Ellegren H. 1998. Quantitative genetics of sexual size dimorphism in the collared flycatcher, Ficedula albicollis. Evolution 52:870–876.
- Nookaew I, et al. 2013. Adipose tissue resting energy expenditure and expression of genes involved in mitochondrial function are higher in women than in men. J Clin Endocrinol Metab. 98:E370–E378.
- Otto SP, et al. 2011. About PAR: the distinct evolutionary dynamics of the pseudoautosomal region. Trends Genet. 27:358–367.
- Pemberton TJ, Wang C, Li JZ, Rosenberg NA. 2010. Inference of unexpected genetic relatedness among individuals in HapMap Phase III. Am J Hum Genet. 87:457–464.
- Price DK, Burley NT. 1993. Constraints on the evolution of attractive traits: genetic (co)variance of zebra finch bill colour. Heredity (Edinb) 71:405–412.
- Price DK, Burley NT. 1994. Constraints on the evolution of attractive traits: selection in male and female zebra finches. Am Nat. 144:908–934.
- Pritchard JK, Przeworski M. 2001. Linkage disequilibrium in humans: models and data. Am J Hum Genet. 69:1–14.
- Purcell S, et al. 2007. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 81:559–575.
- Qiu S, Bergero R, Charlesworth D. 2013. Testing for the footprint of sexually antagonistic polymorphisms in the pseudoautosomal region of a plant sex chromosome pair. Genetics 194:663–672.
- Rice WR. 1984. Sex chromosomes and the evolution of sexual dimorphism. Evolution 38:735–742.
- Shi H, Clegg DJ. 2009. Sex differences in the regulation of body weight. Physiol Behav. 97:199–204.
- Stewart AD, Pischedda A, Rice WR. 2010. Resolving intralocus sexual conflict: genetic mechanisms and time frame. J Hered. 101:S94–S99.
- Stulp G, Kuijper B, Buunk AP, Pollet TV, Verhulst S. 2012. Intralocus sexual conflict over human height. Biol Lett. 8:976–978.
- Ucisik-Akkaya E, et al. 2010. Examination of genetic polymorphisms in newborns for signatures of sex-specific prenatal selection. Mol Hum Reprod. 16:770–777.
- van Doorn GS. 2009. Intralocus sexual conflict. Ann N Y Acad Sci. 1168:52–71.
- Weir BS. 1996. Genetic data analysis II. Sunderland (MA): Sinauer Associates.