Genome Biology and Evolution. 2018 Apr 28;10(5):1333–1350. doi: 10.1093/gbe/evy086

High Levels of Copy Number Variation of Ampliconic Genes across Major Human Y Haplogroups

Danling Ye 1,#, Arslan A Zaidi 1,#, Marta Tomaszkiewicz 1,#, Kate Anthony 1, Corey Liebowitz 2, Michael DeGiorgio 1, Mark D Shriver 2, Kateryna D Makova 1
Editor: Esther Betran
PMCID: PMC6007357  PMID: 29718380

Abstract

Because of its highly repetitive nature, the human male-specific Y chromosome remains understudied. It is important to investigate variation on the Y chromosome to understand its evolution and contribution to phenotypic variation, including infertility. Approximately 20% of the human Y chromosome consists of ampliconic regions which include nine multi-copy gene families. These gene families are expressed exclusively in testes and usually implicated in spermatogenesis. Here, to gain a better understanding of the role of the Y chromosome in human evolution and in determining sexually dimorphic traits, we studied ampliconic gene copy number variation in 100 males representing ten major Y haplogroups world-wide. Copy number was estimated with droplet digital PCR. In contrast to low nucleotide diversity observed on the Y in previous studies, here we show that ampliconic gene copy number diversity is very high. A total of 98 copy-number-based haplotypes were observed among 100 individuals, and haplotypes were sometimes shared by males from very different haplogroups, suggesting homoplasies. The resulting haplotypes did not cluster according to major Y haplogroups. Overall, only two gene families (RBMY and TSPY) showed significant differences in copy number among major Y haplogroups, and the haplogroup of a male could not be predicted based on his ampliconic gene copy numbers. Finally, we did not find significant correlations either between copy number variation and individual’s height, or between the former and facial masculinity/femininity. Our results suggest rapid evolution of ampliconic gene copy numbers on the human Y, and we discuss its causes.

Keywords: ampliconic genes, Y chromosome, haplotypes

Introduction

Studying the Y chromosome provides insights into sex determination, sex-specific disease risks, and evolutionary history that cannot be determined by studying the female genome alone (Skaletsky et al. 2003; van Oven et al. 2013). However, for the vast majority of mammalian species, only female genomes have been sequenced and assembled. Mammalian females have diploid sex chromosomes (XX), which allows easier sequencing and assembly of the X chromosome compared with the highly repetitive haploid Y chromosome (Tomaszkiewicz et al. 2017).

The eutherian sex chromosomes evolved from a pair of autosomes, with the X chromosome keeping the original autosomal size and the Y chromosome shrinking over time. The male-specific region (MSY) constitutes ∼95% of the length of the Y chromosome. The MSY encompasses a mosaic of euchromatic—X-degenerate, X-transposed, and ampliconic—and heterochromatic sequences. The human MSY is flanked on both sides by pseudoautosomal regions (PARs), the only parts of the Y that recombine with the X (Skaletsky et al. 2003).

The Y chromosome acquired the sex-determining gene, SRY, and subsequently underwent a series of inversions that suppressed its ability to recombine with the X chromosome over most of its length (Lahn et al. 2001). As a result, the Y chromosome has become prone to accumulation of deleterious mutations via Muller’s ratchet, genetic hitchhiking along with beneficial alleles, and background selection against deleterious alleles (Charlesworth and Charlesworth 2000; Filatov et al. 2000; Bachtrog 2008, 2013). The Y chromosome is present only in males and is haploid. Therefore, its effective population size is a fraction of that for autosomes, making it more susceptible to genetic drift (Charlesworth and Charlesworth 2000; Filatov et al. 2000). Because the Y is nonrecombining over most of its length and inherited exclusively along the paternal lineage, it provides information about patterns of male-specific dispersal and gene flow (Hammer et al. 2008).

Previous studies have noted reduced nucleotide diversity on human MSY relative to autosomes (e.g., Dorit et al. 1995; Wilson et al. 2014) and attempted to explain this observation by its small effective population size (Charlesworth and Charlesworth 2000; Filatov et al. 2000), high variance in reproductive success among males (Hammer et al. 2008; Wilder et al. 2004), high levels of gene conversion among palindrome arms (Rozen et al. 2003; Marais et al. 2010; Helgason et al. 2015), and purifying selection (Wilson Sayres et al. 2014). In contrast, structural diversity on the Y is known to be high (Repping et al. 2006), which is consistent with frequent intrachromosomal rearrangements facilitated by the repetitive nature of the Y (Skaletsky et al. 2003).

In humans, as in most other mammals studied, the MSY plays an important biological role. It harbors the SRY gene that produces the transcription factor initiating male development, while suppressing signals leading to the development of female reproductive organs (Harley et al. 1992). A number of genes located in the MSY are critical to male reproduction, as their deletion can cause spermatogenic failure (Dhanoa et al. 2016). Additionally, the MSY has been implicated in skeletal growth (Tanner et al. 1959), germ-line and somatic tumorigenesis (Kido and Lau 2015), and graft rejection (Kido and Lau 2015; Scott et al. 1997). As the MSY accumulated genes important for male function to resolve sexually antagonistic selection, it is conceivable that some of them are important for the development of sexually dimorphic traits (Dean and Mank 2014; Case and Teuscher 2015).

The human MSY harbors nine multi-copy ampliconic gene families—BPY, CDY, DAZ, HSFY, PRY, RBMY, TSPY, VCY, and XKRY (Skaletsky et al. 2003; Bhowmick et al. 2007). All but one (TSPY) of these gene families are located within either palindromes (P1, P2, P3, P4, P5, and P8) or an inverted repeat (IR2; Skaletsky et al. 2003). The TSPY gene family is arrayed in tandem outside palindromes and more widely spaced inverted repeats (Skaletsky et al. 2003). Seven of the nine families are implicated in spermatogenesis or sperm production, and all nine gene families are expressed predominantly or exclusively in testes (Skaletsky et al. 2003; Bhowmick et al. 2007). Ampliconic gene copies within each family have high sequence identity (>99.9%) that is maintained by gene conversion, which prevents degeneration of these gene families critical for male function (Rozen et al. 2003). It has been proposed that multiple copies of ampliconic genes accumulated on the Y because they increase male reproductive fitness via enhanced sperm production (Rozen et al. 2003; Betrán et al. 2012; Bellott et al. 2014).

Several studies have focused on exploring associations between ampliconic gene copy number and reproductive diseases, and/or fertility. The regions that have been reported to be deleted on the Y chromosome in infertile males are azoospermia factor (AZF) regions a, b, and c (AZFa, AZFb, and AZFc), the latter two containing ampliconic gene families (Krausz and Degl’Innocenti 2006; Vogt et al. 1996; Yu et al. 2015). AZFb contains CDY2, XKRY, HSFY, and PRY families, deletions in which have been shown to lead to spermatogenic arrest (Krausz et al. 2014; Foresta et al. 2001). AZFc contains DAZ, BPY2, CDY1A, and CDY1B families, deletions in which can result in different levels of spermatogenic failure (Pryor et al. 1998; Krausz et al. 1999) and can be heritable (Page et al. 1999; Rozen et al. 2012). The AZFc region is highly repetitive, harbors palindromes (Kuroda-Kawaguchi et al. 2001) and thus is more prone to deletions than the other AZF regions (Navarro-Costa et al. 2010; Knebel et al. 2011). Indeed, AZFc deletions constitute 80% of all AZF deletions (Bansal et al. 2016). Ampliconic gene families outside of AZF regions are also implicated in reproductive diseases. For example, copy number reductions in DAZ, BPY, and CDY gene families have been associated with low total motile sperm counts in men (Bansal et al. 2016; Noordam et al. 2011). Contradictory results have been reported on the association between TSPY and fertility (Krausz et al. 2010). Nickkholgh et al. (2010) did not find a statistically significant difference in TSPY copy number between men with low versus high sperm counts, whereas Giachini et al. (2009) reported that low TSPY copy number is associated with low sperm production. No studies have been conducted to explore potential associations of Y chromosome ampliconic gene copy numbers and traits besides fertility, for example, sexually dimorphic traits.

We presently have only limited knowledge about Y chromosome ampliconic gene copy number variation in healthy males within and among human populations. In fact, the only available information comes from the analysis of small samples of persons of European ancestry. Earlier studies have determined copy number for a total of only three males (Tomaszkiewicz et al. 2016; Skaletsky et al. 2003). Recently, Skov et al. (2017) investigated Y chromosome ampliconic gene copy number variation in 62 males of Danish descent.

In the present study, we experimentally determined the copy number of all nine ampliconic genes in 100 men representing ten major Y haplogroups (Y Chromosome Consortium 2002) using droplet digital PCR (ddPCR; Hindson et al. 2011; McDermott et al. 2013). We used these data to obtain a view of ampliconic gene copy number variation within and across human populations around the world by addressing the following questions: 1) Are ampliconic genes more variable between major Y haplogroups than within haplogroups? 2) Can ampliconic gene copy number variation be used to classify major Y haplogroups accurately? 3) How variable are haplotypes reconstructed based on ampliconic gene copy number? 4) Does ampliconic gene copy number variation underlie variation in sexually dimorphic traits such as height and facial masculinity/femininity (FMF)? Thus, by answering these questions, we characterized evolution of ampliconic gene copy number variation in a large number of individuals representing major Y haplogroups.

Materials and Methods

Sample Collection, Consent, SNP Typing, and DNA Extraction

A total of 100 men were recruited with written informed consent as part of the ADAPT and ADAPT2 studies (IRB #44929 and #45727) conducted at the Pennsylvania State University. According to the approved protocol, saliva samples were obtained and two phenotypes, height and facial masculinity/femininity (FMF; see below), were measured for all participants. The saliva samples were sent to 23andMe for genotyping on their v3 and v4 arrays (23andMe, Mountain View, CA). DNA was extracted from the saliva samples using a salting-out method followed by an ammonium acetate cleanup (Quinque et al. 2006) and quantified using the Qubit dsDNA BR Assay Kit (Invitrogen, Carlsbad, CA).

ddPCR

For each of the 100 DNA samples, we performed ddPCR for the nine ampliconic gene families of interest (BPY, CDY, DAZ, HSFY, PRY, RBMY, TSPY, VCY, and XKRY) and for SRY, a single-copy gene on the Y chromosome that was used as a reference. Each sample was run in at least three replicates (supplementary table S1, Supplementary Material online). The ddPCR copy number assays were performed using the QX200 system and EvaGreen dsDNA dye (Bio-Rad, Hercules, CA) with the protocol and primers described in our previous publication (Tomaszkiewicz et al. 2016). Briefly, to complete one assay replicate for each DNA sample included in the study, BPY, CDY, HSFY, TSPY, and XKRY were amplified at an annealing temperature of 59 °C on one plate, and DAZ, PRY, RBMY, and VCY were amplified at an annealing temperature of 63 °C on another plate. SRY was amplified on each plate to allow inference of ampliconic gene copy number. On the basis of the human reference genome sequence, the primers were designed to specifically capture functional copies of each ampliconic gene family (one primer pair per gene family), except for TSPY, for which primers were designed to anneal to the smallest possible number of pseudogenes (Tomaszkiewicz et al. 2016).

The fluorescence in each droplet was measured and an automatic threshold was drawn using QuantaSoft software (Bio-Rad, Hercules, CA). Droplets above the threshold were counted as positive, and those below it were counted as negative. The concentration of the ampliconic gene family of interest was divided by the concentration of the reference, SRY, a single-copy gene in a human male genome (Tomaszkiewicz et al. 2016). For each gene family in every individual, we had at least three measurements of ampliconic gene copy number because each sample was run in at least three replicates. The measurement most distant from the median was removed to reduce the effect of outliers (supplementary table S1, Supplementary Material online). After this, ampliconic gene copy number was determined by calculating the mean across the replicates (supplementary table S2, Supplementary Material online). The coefficient of variation calculated across technical replicates is shown in supplementary figure S1, Supplementary Material online.
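The replicate handling described above can be illustrated with a minimal Python sketch. It is not the study's own script: the function name `copy_number`, the concentration values, and the tie-breaking rule (the first of equally distant replicates is dropped) are assumptions made for illustration.

```python
from statistics import mean, median

def copy_number(target_conc, sry_conc):
    """Estimate copy number from replicate ddPCR concentrations."""
    # Ratio of target concentration to SRY concentration for each replicate
    ratios = [t / s for t, s in zip(target_conc, sry_conc)]
    if len(ratios) < 3:
        raise ValueError("at least three replicates are expected")
    # Drop the single replicate farthest from the median
    med = median(ratios)
    farthest = max(range(len(ratios)), key=lambda i: abs(ratios[i] - med))
    kept = [r for i, r in enumerate(ratios) if i != farthest]
    # Average the remaining replicates
    return mean(kept)

# Hypothetical concentrations (copies per microliter), not study data
tspy = [3050.0, 2980.0, 3600.0]
sry = [100.0, 100.0, 100.0]
estimate = copy_number(tspy, sry)
print(round(estimate, 2))
```

Here the third replicate (ratio 36.0) lies farthest from the median ratio of 30.5 and is removed before averaging.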

Construction of Phylogeny Based on SNP Data

A maximum likelihood phylogenetic tree was constructed from an alignment of 187 Y chromosome SNPs for the 100 male individuals using the Tamura–Nei model in MEGA7 (Kumar et al. 2016). These SNPs are a subset of the Y-specific SNPs on the 23andMe array, and were selected because they were polymorphic in our sample. The initial trees for the heuristic search were obtained automatically by applying the BioNJ algorithm (Gascuel 1997) to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach, and then selecting the topology with the highest log likelihood value. The most recent common ancestor (MRCA) of the E haplogroup, which is the oldest haplogroup in our phylogeny according to Karmin et al. (2015), was set as the root of the tree for visualization and downstream analyses.

Evaluating Differences in Ampliconic Gene Copy Numbers among Haplogroups

We tested whether ampliconic gene copy number is different among different haplogroups for each gene family separately. This was done using two different approaches. First, we applied the conventional one-way analysis of variance (ANOVA), which does not take into account the phylogenetic relationships among Y-haplogroup lineages. The simple ANOVA was performed for each ampliconic gene family using major haplogroup (C, E, G, I, J, L, O, Q, R, and T) as factor.

Second, we applied the Expression Variance and Evolution (EVE) model (Rohlfs and Nielsen 2015), which accounts for the phylogenetic structure among haplogroups. Although the EVE model was developed with the intention of testing for nonneutral evolution of gene expression in a given phylogeny, it can be applied to any quantitative trait as long as it is measured on multiple individuals from every species in the phylogeny (Rohlfs and Nielsen 2015). Our goal was to measure the ratio of variation in copy number within haplogroups to the variation between haplogroups, denoted by βi for every gene family, i = 1, 2, …, 9. We expect this ratio to be similar across gene families evolving neutrally in the phylogeny (i.e., βi = βshared, i = 1, 2, …, 9). Deviations from this expectation can be suggestive of selection. As such, we test whether βi for any one gene family i deviates from this expectation (i.e., βi ≠ βshared). If βi < βshared, then there is more variation across haplogroups than within haplogroups, which could be suggestive of directional selection in some haplogroups. Conversely, if βi > βshared, then there is more variation within haplogroups than across haplogroups, which could be indicative of high conservation of copy number across haplogroups.

To apply the EVE model to the copy number data, we first constructed an ultrametric tree connecting the major haplogroups from the phylogenetic tree based on Y-chromosomal SNPs. This was done by first collapsing all individual branches from the same haplogroup such that each major haplogroup is represented by one terminal branch in the phylogeny. Then, we scaled the tree by setting the time of the MRCA of all lineages to 71,600 years ago based on the MRCA of the major haplogroup lineages represented in our data set and the Y phylogeny presented by Karmin et al. (2015). The calibration was carried out using the chronos function in the APE package in R using a “relaxed” substitution rate model (Popescu et al. 2012; Paradis et al. 2004). We estimated the parameter βi for each gene from the copy number data using EVE, as well as the βshared across all genes, and calculated the likelihood ratio between the null hypothesis (H0: βi = βshared) and alternative hypothesis (H1: βi ≠ βshared). A P value for each test was calculated assuming that the likelihood ratio asymptotically follows a chi-square distribution with one degree of freedom.
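The final step, converting a likelihood ratio statistic into a P value under a chi-square distribution with one degree of freedom, can be sketched in a few lines of Python. This is an illustration rather than the EVE implementation. The function name `lrt_p_value` is hypothetical, and the two input values are the LR statistics reported for RBMY and TSPY in table 3.

```python
import math

def lrt_p_value(lr_statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # For one degree of freedom, P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(lr_statistic / 2.0))

# LR statistics reported for RBMY and TSPY in table 3
p_rbmy = lrt_p_value(5.720)
p_tspy = lrt_p_value(5.041)
print(round(p_rbmy, 3), round(p_tspy, 3))
```

The identity used here holds only for one degree of freedom, because a chi-square variable with one degree of freedom is the square of a standard normal variable.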

Clustering of Major Haplogroups by Copy Number

Principal Component Analysis (PCA) was performed on the centered and scaled ampliconic gene copy numbers to visualize the clustering of major haplogroups based on ampliconic gene copy number (R Core Team 2016). Each value was transformed as $(x_{ij} - \bar{x}_i)/\sqrt{\sigma_i^2}$, where $x_{ij}$ is the copy number of the ith gene family in the jth individual, and $\bar{x}_i$ and $\sigma_i^2$ are the mean and variance of copy number for the ith gene family. For comparison, we also carried out PCA on the genotypes of SNPs on the Y chromosome using Plink 1.9 (Chang et al. 2015).
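As a small illustration of the centering and scaling step that precedes the PCA, the Python sketch below standardizes the copy numbers of one gene family. The use of the sample standard deviation (n − 1 denominator) is an assumption, and the input values are hypothetical.

```python
from statistics import mean, stdev

def standardize(values):
    """Center values to mean 0 and scale them to unit sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample SD (n - 1 denominator); an assumption
    return [(v - m) / s for v in values]

# Hypothetical copy numbers of one gene family in five men
copies = [28.0, 30.0, 32.0, 34.0, 36.0]
z = standardize(copies)
print([round(v, 3) for v in z])
```

Scaling each gene family in this way keeps large families such as TSPY from dominating the principal components merely because their copy numbers are larger.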

In addition to the unsupervised PCA, we also carried out Linear Discriminant Analysis (LDA) to determine whether ampliconic gene copy number of an individual can be used to correctly predict their major haplogroup. This was carried out using the lda function in the MASS package in R (Venables and Ripley 2002). With leave-one-out cross validation, we calculated the posterior probability that each individual can be assigned to their correct haplogroup.

Haplotype Variability and Network Analysis

Rounding the fractional copy numbers generated by ddPCR could artificially introduce variation in the data and thereby inflate the estimated number of haplotypes. To evaluate whether this was the case, we calculated the range of the number of haplotypes observed when randomly rounding the original data (the values produced by averaging the replicates for each gene family and individual) up or down (i.e., floor or ceiling; supplementary tables S3 and S4, Supplementary Material online). This was done by generating 100 sets of haplotypes, each of which was obtained by rounding a value y either up or down, at random, if [floor(y) + 0.25] < y < [ceiling(y) − 0.25], where floor(y) refers to the greatest integer not exceeding y and ceiling(y) refers to the smallest integer not less than y. Values outside this range were rounded to the nearest integer. For example, a mean copy number of 2.35 was rounded either down to 2 or up to 3, but a copy number of 2.15 was always rounded down to 2. We performed the same experiment on unrounded ampliconic gene copy numbers from the data in Skov et al. (2017; supplementary table S5, Supplementary Material online). A total of 100 data sets, each consisting of randomly rounded values for each of the 100 (our data set) and 62 (Skov et al.’s data set) individuals, were produced (supplementary tables S4 and S6, Supplementary Material online) and the range of the number of haplotypes observed was calculated (supplementary tables S7 and S8, fig. S3, Supplementary Material online). We found the number of haplotypes in our data set to vary from 95 to 100 (median = 99, supplementary table S7, fig. S2A, Supplementary Material online) and in the Skov data set to vary from 40 to 52 (median = 45; supplementary table S8, fig. S2B, Supplementary Material online).
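The random rounding experiment can be sketched in Python as follows. This is an illustrative reading of the procedure, not the study's script: the function names, the random seed, and the three example profiles are hypothetical, and the choice between floor and ceiling is assumed to be made with equal probability.

```python
import math
import random

def round_randomly(y, rng):
    """Round y, choosing floor or ceiling at random inside the ambiguous zone."""
    lo, hi = math.floor(y), math.ceil(y)
    if lo + 0.25 < y < hi - 0.25:
        return rng.choice([lo, hi])  # equal probability is an assumption
    return round(y)  # outside the zone: nearest integer

def count_haplotypes(samples, rng):
    """Number of distinct rounded copy number profiles."""
    return len({tuple(round_randomly(y, rng) for y in s) for s in samples})

rng = random.Random(1)  # fixed seed so the run is reproducible
# Hypothetical mean copy numbers for three men and two gene families
samples = [(2.35, 30.10), (2.15, 30.10), (2.60, 29.90)]
counts = [count_haplotypes(samples, rng) for _ in range(100)]
print(min(counts), max(counts))
```

In this toy input only the values 2.35 and 2.60 fall inside the ambiguous zone, so the number of distinct profiles varies between one and two across the 100 rounded data sets.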

Haplotype networks based on Y-chromosomal SNP genotypes and based on ampliconic gene copy numbers were constructed separately. The alignment of SNP genotypes from 100 males was used as input for reconstructing haplotypes with the “pegas” package in R (Paradis 2010; R Core Team 2016). To construct haplotype networks, we rounded the copy numbers to the nearest integers for both our data set and that of Skov et al. (2017). Next, for each individual, we generated a string of nine groups of characters corresponding to nine ampliconic gene families; each character reflected a copy number (e.g., for one gene family, “A” was used for one copy, “AA” for two copies, etc.; for the next gene family we used “C” for one copy, “CC” for two copies, etc.). Because we studied 100 individuals, we generated 100 such strings and aligned them to the consensus sequence representing the maximal copy number for each gene family. The haplotype network was then built based on a pairwise distance matrix constructed from that alignment given as an input to the R “haplotypes” package, specifically accounting for indel mutations (Aktas 2015; R Core Team 2016). The same approach was used to construct the haplotype network for 62 males from the Danish population (Skov et al. 2017). The alignments for our and Skov et al.’s data are provided on GitHub: https://github.com/makovalab-psu/Ampliconic_CNV/blob/master/Haplotype_network_Amp_gene_CNVs/CNV_alignments/. Haplotype distance matrices used for the haplotype network reconstructions are provided in supplementary tables S9 and S10, Supplementary Material online. Haplotypes were separated by deletions or insertions of ampliconic gene copies, and each link reflected a difference of one copy. For instance, two haplotypes differing only by two copies of TSPY (and having the same copy numbers for the other gene families), 18 and 20, were separated by two links. Similarly, two haplotypes differing in the copy number of two gene families, for example, TSPY and RBMY, by one copy each (in the first haplotype TSPY = 18 and RBMY = 10, whereas in the second haplotype TSPY = 19 and RBMY = 9) were also separated by two links.
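If each link corresponds to one copy gained or lost, the number of links between two haplotypes equals the sum of absolute copy number differences across gene families. The Python sketch below reproduces the two examples from the text under that reading. It is not the distance routine of the R “haplotypes” package, and the function name `link_distance` is hypothetical.

```python
def link_distance(hap_a, hap_b):
    """Total number of one-copy steps separating two copy number haplotypes."""
    return sum(abs(hap_a[gene] - hap_b[gene]) for gene in hap_a)

# The two examples from the text; gene families with identical copy numbers
# contribute zero and are omitted
d_one_family = link_distance({"TSPY": 18, "RBMY": 10}, {"TSPY": 20, "RBMY": 10})
d_two_families = link_distance({"TSPY": 18, "RBMY": 10}, {"TSPY": 19, "RBMY": 9})
print(d_one_family, d_two_families)  # 2 2
```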

To determine which ampliconic gene families contributed most to the variability observed among haplotypes, we sampled pairs of haplotypes, uniformly at random, separately from within and between major Y haplogroups, and counted the copy number differences per ampliconic gene family between each pair. A total of 1,000 such pairs were generated for each comparison, within and between major haplogroups.
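A minimal Python sketch of this pair sampling is given below. Whether pairs were drawn with or without replacement is not stated in the text, so sampling with replacement is an assumption here. The function name, the four example haplotypes, and their haplogroup labels are hypothetical.

```python
import random
from itertools import combinations

def sample_pair_differences(haplotypes, groups, n_pairs, within, rng):
    """Sample pairs of men and record per-family copy number differences.

    within=True keeps pairs from the same major haplogroup;
    within=False keeps pairs from different major haplogroups.
    """
    eligible = [(a, b) for a, b in combinations(range(len(haplotypes)), 2)
                if (groups[a] == groups[b]) == within]
    differences = []
    for _ in range(n_pairs):
        a, b = rng.choice(eligible)  # uniform draw, with replacement (assumption)
        differences.append({gene: abs(haplotypes[a][gene] - haplotypes[b][gene])
                            for gene in haplotypes[a]})
    return differences

# Hypothetical rounded copy numbers for four men in two haplogroups
haplotypes = [{"TSPY": 30, "RBMY": 10}, {"TSPY": 32, "RBMY": 10},
              {"TSPY": 25, "RBMY": 8}, {"TSPY": 26, "RBMY": 8}]
groups = ["R", "R", "E", "E"]
rng = random.Random(0)
within_diffs = sample_pair_differences(haplotypes, groups, 1000, True, rng)
between_diffs = sample_pair_differences(haplotypes, groups, 1000, False, rng)
mean_within_tspy = sum(d["TSPY"] for d in within_diffs) / len(within_diffs)
mean_between_tspy = sum(d["TSPY"] for d in between_diffs) / len(between_diffs)
print(mean_within_tspy, mean_between_tspy)
```

Comparing the per-family differences for the two kinds of pairs indicates which gene families vary mostly within haplogroups and which vary mostly between them.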

Measurement of Height and FMF

For the participants in the ADAPT study (a total of 64 men), height was measured using a standard stadiometer. Self-reported height was used for 36 participants from the ADAPT2 study due to remote sampling and lack of a portable stadiometer. Facial masculinity was calculated from 3D images of the participants using a method developed by Claes et al. (2014), as described briefly below. FMF scores were estimated by orthogonally projecting the participants' faces onto the regression line that represents facial sexual dimorphism. A spatially dense mesh of 7,150 quasi-landmarks (QL) was superimposed on each participant’s 3D facial scan, and differences in translation, rotation, and scale were removed by applying a Generalized Procrustes Superimposition (GPS) on the set of facial coordinates (Claes et al. 2014). The first 60 principal components, which explained 98% of the variance, were retained. To calculate FMF, we used a leave-one-out cross-validation approach, that is, the participant whose FMF was to be estimated was left out of the regression model, whereas the remaining participants were used to estimate regression coefficients with a multivariate linear regression of facial principal components on sex and height. Height was also used as a covariate to remove the influence of size differences on facial shape from the estimation of FMF. The average female face was set as the origin of the facial PCA, allowing higher values to reflect more masculine faces. The FMF score was then obtained by orthogonally projecting the left-out participant’s face onto the regression line for sex. Both height and FMF data are provided in supplementary table S11, Supplementary Material online.

Evaluating Correlations between Haplogroups and Phenotypic Traits

We evaluated correlations between copy numbers of ampliconic gene families and phenotypic traits using the phylogenetic generalized least squares method (PGLS) implemented using the nlme package in R (R Core Team 2016; Pinheiro et al. 2017). As some individuals are more closely related to each other than to other individuals, the phenotypic data among individuals cannot be treated as independent data points. We took this relatedness into account by letting the correlation structure of the residuals be specified by the Y-chromosomal phylogeny among the 100 males in our study. To accomplish this, we first calibrated the Y-chromosomal phylogeny to an ultrametric tree generated by setting the MRCA at 71,600 years ago (Karmin et al. 2015). The slope between copy number and phenotype, as well as Pagel’s λ, which is a measure of the degree of phylogenetic signal (Pagel 1999), were simultaneously estimated using maximum likelihood (Revell 2010). Results of the PGLS, including estimates of λ, are provided in supplementary table S12.

Code Availability

All scripts for this study are provided at GitHub: https://github.com/makovalab-psu/Ampliconic_CNV.

Results

Ampliconic Gene Copy Number Variation

To study copy number variation of Y chromosome ampliconic genes, we applied ddPCR. This method allows absolute quantification of the target DNA copies without the need to run a standard curve. This is in contrast to other methods such as quantitative real-time PCR (qRT-PCR), in which suboptimal amplification efficiency influences cycle threshold values and can ultimately result in an inaccurate quantification of the target (Hindson et al. 2011; McDermott et al. 2013; Pinheiro et al. 2012). ddPCR was recently used to evaluate the copy number of ampliconic Y chromosome genes in humans and gorillas (Tomaszkiewicz et al. 2016) and to verify computationally derived ampliconic gene copy number estimates for chimpanzees and bonobos (Oetjens et al. 2016).

In this study, the ddPCR assays, with the primers previously developed by us (Tomaszkiewicz et al. 2016), were used to estimate the copy number for Y chromosome ampliconic genes in 100 male participants from the ongoing Anthropometrics, DNA and the Appearances and Perceptions of Traits (ADAPT) study. The goal of the ADAPT study (http://ched.la.psu.edu/projects/adapt), based at the Pennsylvania State University, is to study the evolutionary, genetic, and socio-cultural factors shaping complex phenotypic variation within and across human populations. Among ADAPT participants, we selected 100 males harboring Y chromosomes from ten major haplogroups (Y Chromosome Consortium 2002): C, E, G, I, J, L, O, Q, R, and T (table 1). Individuals with subhaplogroups that are evolutionarily close to each other were grouped into a “major haplogroup” category to increase the statistical power in subsequent analyses. For example, individuals from the O1, O2, and O3 subhaplogroups were grouped into the “O” major haplogroup category. These haplogroups were selected because they find their origins in different regions of the world (table 1).

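Table 1. Male Samples Utilized in the Study

| Major Y Haplogroup | Number of Males | Y Subhaplogroup | Number of Males | Major Geographic Location (Karmin et al. 2015) |
|---|---|---|---|---|
| C | 5 | C3 | 5 | Asia |
| E | 22 | E1b1a | 5 | Africa |
| | | E1b1a1a1g1a | 7 | |
| | | E1b1b1 | 5 | |
| | | E1b1b1a | 5 | |
| G | 5 | G2 | 5 | Africa |
| I | 15 | I1 | 5 | Europe |
| | | I2a2a | 5 | |
| | | I2Aa1b | 5 | |
| J | 5 | J2 | 5 | Western Asia |
| L | 4 | L1 | 4 | Western Asia |
| O | 14 | O1 | 3 | Eastern and Southeastern Asia |
| | | O2 | 6 | |
| | | O3 | 5 | |
| Q | 5 | Q1 | 5 | Central Asia |
| R | 20 | R1b1a2a1a2c | 5 | Europe |
| | | R1b1a2a1a2b | 5 | |
| | | R1b1a2a1a1 | 5 | |
| | | R1a1a1 | 5 | |
| T | 5 | T | 5 | Western Asia |
| Total | 100 | | 100 | |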

The copy number for each gene family for every individual was estimated using at least three technical replicates (supplementary table S1, Supplementary Material online). In total, we processed 100 males × 9 gene families = 900 sample-by-gene-family combinations, all of which were analyzed in three or more replicates. To assess the consistency of measurements among replicates, we calculated the coefficient of variation (CV; i.e., SD divided by mean) across replicates. The median CV was low (3.5% of the mean across all samples; red dashed line in supplementary fig. S1A, Supplementary Material online). After removing the most distant value among the replicates (see Materials and Methods), the median CV was even lower (1.10% of the mean; red dashed line in supplementary fig. S1B, Supplementary Material online). We averaged the values of the remaining replicates (supplementary tables S1 and S2, Supplementary Material online) and used these unrounded averages in all subsequent analyses, except for counting the number of haplotypes and building haplotype networks, where we rounded the averaged values to the nearest integer.
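The effect of removing the most distant replicate on the CV can be illustrated with a short Python sketch. The replicate values are hypothetical, the function names are invented for this example, and the use of the sample SD is an assumption.

```python
from statistics import mean, median, stdev

def cv_percent(values):
    """Coefficient of variation (sample SD divided by mean), in percent."""
    return 100.0 * stdev(values) / mean(values)

def drop_most_distant(values):
    """Remove the single value farthest from the median."""
    med = median(values)
    farthest = max(range(len(values)), key=lambda i: abs(values[i] - med))
    return [v for i, v in enumerate(values) if i != farthest]

# Hypothetical replicate copy number estimates for one sample
replicates = [10.0, 10.2, 11.5]
cv_before = cv_percent(replicates)
cv_after = cv_percent(drop_most_distant(replicates))
print(round(cv_before, 1), round(cv_after, 1))
```

In this toy case, removing the replicate at 11.5 lowers the CV from roughly 7.7% to roughly 1.4%.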

Variation in Copy Number among Gene Families

We first tested whether larger gene families were also more variable in their copy number among individuals. Such a relationship is expected because the probability of copy insertions and deletions increases with copy number (Ghenu et al. 2016). Indeed, the median copy number for ampliconic gene families across individuals is positively correlated with the variance in copy number (Spearman’s r = 0.99; fig. 1). Thus, larger gene families are more variable, on average (fig. 1; table 2).

Fig. 1. Larger gene families tend to be more variable. The median and variance of copy number were calculated across all individuals in the sample (N = 100). The grey line shows the line of best fit (from ordinary least squares regression).

Table 2. Median, Standard Deviation (SD), and Range of Unrounded Copy Number Values per Ampliconic Gene Family (Based on the Data from supplementary table S1, Supplementary Material online)

| Gene | Median | SD | Range |
|---|---|---|---|
| BPY | 3.23 | 1.03 | 0.96–8.51 |
| CDY | 4.17 | 0.74 | 2.74–5.88 |
| DAZ | 4.21 | 1.32 | 1.89–10.27 |
| HSFY | 2.15 | 0.33 | 1.37–3.12 |
| PRY | 2.13 | 0.31 | 1.18–2.92 |
| RBMY | 10.73 | 2.42 | 5.13–19.42 |
| TSPY | 30.33 | 5.27 | 15.92–40.86 |
| VCY | 2.33 | 0.52 | 1.50–4.81 |
| XKRY | 2.90 | 0.30 | 1.03–2.99 |

Lack of a Phylogenetic Pattern in Ampliconic Gene Copy Number Variation

To examine whether there is a phylogenetic pattern underlying ampliconic gene copy number variation in the humans studied, we constructed a phylogenetic tree based on Y chromosome single nucleotide polymorphisms (SNPs) and superimposed copy numbers for each of the ampliconic gene families per individual next to this phylogeny (fig. 2), following Skov et al. (2017). As expected, individuals from the same haplogroup clustered together based on Y chromosome SNPs. However, ampliconic gene copy number variation did not show discernible patterns with respect to the Y-specific phylogeny.

Fig. 2. The phylogenetic tree based on Y-chromosomal SNPs. The evolutionary tree was inferred from 187 Y-chromosomal SNPs using maximum likelihood (log-likelihood = −993.63). The branches are colored according to Y haplogroup. Ampliconic gene copy number averaged across replicates after removing one outlier is presented on the right. For comparison, we included the copy numbers for an individual sequenced by Skaletsky et al. (2003; indicated in black font in parentheses).

Differences in Ampliconic Gene Copy Numbers among Y Haplogroups

We sought to understand the degree of differentiation in ampliconic gene copy number among the ten major Y haplogroups. To do so, we first tested whether ampliconic gene copy numbers are significantly different among the ten major Y haplogroups analyzed. The distribution of ampliconic gene copy numbers per family across all Y haplogroups is shown in figure 3. Using a one-way ANOVA test (table 3), we found that copy numbers of BPY, CDY, DAZ, HSFY, PRY, VCY, and XKRY gene families were not significantly different among major Y haplogroups (P-value cutoff of 0.05/9 ≈ 0.006). However, copy numbers for RBMY (P = 6.825 × 10⁻⁶) and TSPY (P = 1.830 × 10⁻⁴) differed significantly among major haplogroups even after applying Bonferroni correction for multiple testing (table 3).

Fig. 3.


—The distribution of ampliconic gene copy numbers across major Y haplogroups. Between four and 22 individuals per major Y-haplogroup were analyzed (see table 1 for sample sizes for each haplogroup).

Table 3.

Analysis of Variance of the Ampliconic Gene Copy Number Data

Gene    Conventional ANOVA          Phylogenetic ANOVA (EVE)
        F        P                  Log(β)   LR       P
BPY     1.590    0.131              7.325    0.590    0.442
CDY     1.168    0.326              7.704    1.850    0.174
DAZ     2.548    0.012              0.856    0.388    0.533
HSFY    0.342    0.959              7.955    3.214    0.073
PRY     0.519    0.858              8.019    2.919    0.088
RBMY    5.393    6.825 × 10^−6      0.523    5.720    0.017
TSPY    4.120    1.830 × 10^−4      0.558    5.041    0.025
VCY     0.697    0.710              8.234    1.468    0.226
XKRY    0.426    0.918              6.160    2.546    0.111

Note.—Both conventional one-way ANOVA and phylogenetic ANOVA (EVE) were performed to determine which ampliconic gene families vary significantly in their copy numbers among major haplogroups. F is the F-statistic for the one-way ANOVA. The P-values that pass a Bonferroni-corrected cutoff for nine tests (0.05/9 ∼ 0.006) are the conventional ANOVA P-values for RBMY and TSPY. β is the ratio of the within-haplogroup variance to the between-haplogroup variance in copy number, and LR is the likelihood ratio between the null model and the alternative model, both from the phylogenetic ANOVA (see Materials and Methods).
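As a worked check of the multiple-testing threshold, the sketch below applies the Bonferroni cutoff (0.05 divided by nine tests) to the conventional ANOVA P values listed in table 3. The variable names are illustrative; the P values are copied from the table.

```python
# P values from the conventional one-way ANOVA column of table 3.
anova_p = {
    "BPY": 0.131, "CDY": 0.326, "DAZ": 0.012, "HSFY": 0.959, "PRY": 0.858,
    "RBMY": 6.825e-06, "TSPY": 1.830e-04, "VCY": 0.710, "XKRY": 0.918,
}

alpha = 0.05
cutoff = alpha / len(anova_p)  # Bonferroni-corrected cutoff for nine tests

# Families significant after correction, and families significant only
# at the uncorrected (nominal) 0.05 level.
significant = sorted(g for g, p in anova_p.items() if p < cutoff)
nominal_only = sorted(g for g, p in anova_p.items() if cutoff <= p < alpha)

print(f"cutoff = {cutoff:.4f}")
print("significant after Bonferroni:", significant)
print("nominally significant only:", nominal_only)
```

With these inputs, RBMY and TSPY fall below the corrected cutoff, whereas DAZ (P = 0.012) is below 0.05 but above the corrected cutoff.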

In addition to the conventional, one-way ANOVA, we carried out a phylogenetic ANOVA with the EVE software (Rohlfs and Nielsen 2015). The test estimates a parameter for each gene i, βi, which is the ratio of the variance in ampliconic gene copy number within haplogroups to the variance between haplogroups. It assumes that genes sharing their variability level will share a common β parameter, βshared. On the basis of a likelihood ratio test, we used EVE to identify genes with either βi < βshared (higher variation between haplogroups than within haplogroups), or βi > βshared (higher variation within haplogroups than between haplogroups). We did not find any gene families that show values of βi that were significantly different from βshared, given a Bonferroni corrected P-value cutoff for nine genes (0.05/9 ∼ 0.006; table 3). Thus, whereas some ampliconic genes exhibit significant copy number variation across haplogroups, this divergence appears to be due to neutral processes.
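The P values for the EVE test can be related to the likelihood ratio statistics in table 3 under the assumption, stated in the Discussion, that the statistic follows a chi-square distribution with one degree of freedom. The sketch below uses the identity that the upper tail of a chi-square variable with one degree of freedom equals erfc(sqrt(x/2)); the helper name `chi2_sf_1df` is an illustrative assumption, and this is not the EVE implementation itself.

```python
import math


def chi2_sf_1df(lr):
    """Upper-tail probability of a chi-square(1 df) variable at value lr."""
    return math.erfc(math.sqrt(lr / 2.0))


# LR statistics and reported P values from the EVE columns of table 3.
eve_lr = {
    "BPY": 0.590, "CDY": 1.850, "DAZ": 0.388, "HSFY": 3.214, "PRY": 2.919,
    "RBMY": 5.720, "TSPY": 5.041, "VCY": 1.468, "XKRY": 2.546,
}
reported_p = {
    "BPY": 0.442, "CDY": 0.174, "DAZ": 0.533, "HSFY": 0.073, "PRY": 0.088,
    "RBMY": 0.017, "TSPY": 0.025, "VCY": 0.226, "XKRY": 0.111,
}

bonferroni_cutoff = 0.05 / len(eve_lr)
for gene, lr in eve_lr.items():
    p = chi2_sf_1df(lr)
    passes = p < bonferroni_cutoff
    print(f"{gene}: LR = {lr:.3f}, P = {p:.3f}, passes cutoff: {passes}")
```

None of the nine P values computed this way falls below the corrected cutoff of about 0.006, in line with the statement above.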

Because copy numbers for some ampliconic gene families are significantly different among major haplogroups (table 3), we next tested whether individuals cluster based on ampliconic gene copy number. To answer this question, we carried out PCA on ampliconic gene copy numbers. The first three PCs explain ∼52% of the total variation (supplementary fig. S3A, Supplementary Material online). The resulting clustering of individuals indicated that, whereas there is some separation of major haplogroups based on ampliconic gene copy number (fig. 4A and B), it is not nearly as pronounced as clustering based on Y chromosome SNPs (fig. 4C and D; supplementary fig. S3B, Supplementary Material online).

Fig. 4.


—(A) and (B) Results of PCA on ampliconic gene copy number data (A. PC1 vs. PC2; B. PC1 vs. PC3). (C) and (D) Results of PCA on SNP genotype data (C. PC1 vs. PC2, D. PC1 vs. PC3). Individuals are colored based on the haplogroup determined from SNP genotype data. Individuals cluster by haplogroup based on SNP genotype data but not clearly based on ampliconic gene copy number.

Finally, in order to test whether we can correctly classify the haplogroup of an individual based on his ampliconic gene copy numbers, we carried out LDA with major haplogroup as the response variable and all nine ampliconic gene copy numbers as predictors. Using a leave-one-out approach, we determined the posterior probability that an individual belongs to a major haplogroup based on his copy number profile. The results are displayed as barplots in figure 5, where individuals are represented by a pair of vertical bars and the probability of being classified correctly (blue), or incorrectly (orange), in the known haplogroup (determined by SNPs) is represented by the height of the bars. We can conclude that the major haplogroups are often ambiguously or incorrectly predicted from copy number variation data alone, which confirms the patterns seen in the PCA plots (fig. 4), that is, that most of the variation in ampliconic gene copy number is shared among haplogroups. Consequently, it is difficult to predict the haplogroup of a person based on his ampliconic gene copy number profile.

Fig. 5.


—Barplots showing the posterior probability of classifying each individual to his known haplogroup correctly (blue) versus incorrectly (orange). The known haplogroup of the individual, determined by SNP genotypes, is written on top of each bar plot in the strip.

Haplotype Variability and Network Analysis

We next compared the variability of haplotypes based on SNP data versus that based on ampliconic gene copy numbers. On the basis of 187 SNPs on the Y chromosome (from a total of 450 Y-chromosomal SNPs analyzed), there are 39 distinct haplotypes among 100 individuals that cluster, as expected, by either subhaplogroup or major haplogroup (fig. 6). In fact, many haplogroups are monophyletic, and usually a unique substitution path leads to each haplotype.

Fig. 6.


—Haplotype network constructed based on Y SNP genotypes from 100 males (39 haplotypes). The disc size is proportional to the number of individuals with a particular haplotype. Black lines connect each haplotype to its closest haplotype, whereas perpendicular bars correspond to mutational steps between connected haplotypes.

For the same 100 individuals, haplotypes obtained from ampliconic gene copy numbers were more numerous than those obtained from SNP data. To construct haplotypes using ampliconic gene copy numbers, we rounded the values we obtained with ddPCR (after averaging across all replicates without the outlier) to the nearest integer (supplementary table S3, Supplementary Material online). This resulted in 98 haplotypes among 100 individuals studied (supplementary table S13, Supplementary Material online), more than twice the number of haplotypes obtained from SNP data (fig. 6). The large number of haplotypes observed with copy number data was not because of variation introduced by rounding to the nearest integer (see Materials and Methods). The 98 distinct haplotypes usually differed from each other by several copies of genes either from the same or different families (supplementary table S9, Supplementary Material online). From a total of 4,753 pairwise comparisons among haplotypes, only 64 pairs (∼1%) showed a one-copy difference in one gene family (supplementary table S9, Supplementary Material online). Among the two shared haplotype pairs observed in our sample of 100 males, one pair included a male with an African (E) and a male with an Asian (O2) haplogroup, whereas in the other pair, one male had a European (I) and another one an Asian (Q) haplogroup (supplementary table S13, Supplementary Material online). Thus, shared haplotypes in these instances provide examples of homoplasy. In summary, nine ampliconic gene families still produced a greater number of haplotypes than 187 SNPs.
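The haplotype construction and pairwise comparison described above can be sketched as follows. The copy numbers below are hypothetical toy values, not the ddPCR measurements, and the function names are illustrative. Ties at exactly .5 are rounded up here, which is an assumption because the text only specifies rounding to the nearest integer.

```python
import math
from itertools import combinations


def round_half_up(x):
    """Round to the nearest integer, sending exact .5 ties upward (assumed)."""
    return math.floor(x + 0.5)


def to_haplotype(copy_numbers):
    """Turn averaged copy numbers for the nine families into an integer tuple."""
    return tuple(round_half_up(x) for x in copy_numbers)


def one_copy_pairs(haplotypes):
    """Count pairs of distinct haplotypes that differ by exactly one copy
    in exactly one gene family."""
    count = 0
    for a, b in combinations(sorted(set(haplotypes)), 2):
        if sum(abs(x - y) for x, y in zip(a, b)) == 1:
            count += 1
    return count


# Hypothetical averaged copy numbers for four males (nine families each).
toy_males = [
    [2.1, 3.9, 4.0, 2.0, 2.1, 8.2, 31.6, 2.0, 2.9],
    [1.9, 4.2, 4.1, 1.8, 2.0, 8.4, 32.4, 2.2, 3.1],
    [2.0, 4.0, 4.0, 2.0, 2.0, 9.1, 32.0, 2.0, 3.0],
    [3.0, 4.0, 2.0, 2.0, 2.0, 10.0, 40.3, 2.0, 3.0],
]
haplotypes = [to_haplotype(m) for m in toy_males]
n_distinct = len(set(haplotypes))
n_pairs = math.comb(n_distinct, 2)

print("distinct haplotypes:", n_distinct)
print("pairwise comparisons:", n_pairs)
print("one-copy pairs:", one_copy_pairs(haplotypes))
```

The same counting rule gives the totals quoted in the text: 98 distinct haplotypes yield 98 × 97 / 2 = 4,753 pairwise comparisons.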

We also studied the variability of ampliconic gene copy number-based haplotypes using rounded ampliconic gene copy numbers from the data set generated by Skov et al. (2017; supplementary table S5, Supplementary Material online). Even though their data set includes 62 Danish males representing only three major European haplogroups (I, R, and Q; fig. 7B), we observed a total of 35 copy number-based haplotypes (supplementary table S14, Supplementary Material online), including 22 haplotypes carried by one individual each and 13 haplotypes shared by two or more individuals. The resulting haplotype network (fig. 7B) displayed more reticulations than the one based on our data (fig. 7A). One-copy differences within the same ampliconic gene family constituted a small proportion of pairwise comparisons among haplotypes (16%, or 97 of 595 pairwise comparisons; supplementary table S10, Supplementary Material online). This proportion was higher than in our data (16% vs. 1%), likely because Skov et al. (2017) analyzed only individuals of Danish ancestry, whereas we analyzed a worldwide sample. Again, several cases of homoplasy were observed (supplementary table S14, Supplementary Material online), including the same haplotypes carried by individuals belonging to different major Y haplogroups. Therefore, independently of the divergence time of the studied individuals (worldwide human populations versus a single Danish population), the number of haplotypes based on ampliconic gene copy number was high. Furthermore, in contrast to the SNP-based haplotype network, the haplotype networks constructed using ampliconic gene copy numbers did not display clustering by major Y haplogroups in either our data set or that of Skov et al. (2017; fig. 7A and B).

Fig. 7.


—(A) Haplotype network constructed based on a pairwise distance matrix constructed from the alignment of 100 strings of nine groups of characters corresponding to copy numbers of nine ampliconic gene families for 100 males (98 haplotypes; rounded copy number values were used; supplementary table S3, Supplementary Material online). Each big colored disc represents a different haplotype. Small colored discs represent intermediate haplotypes. Black lines connect each haplotype to its closest relative. A link between two haplotypes corresponds to a one-copy difference in one gene family. If extant or ancestral haplotypes are joined by several consecutive links, this indicates several copy number differences (either within the same or different gene families) between them, and the number of such links corresponds to the number of copy number differences. Pink rings indicate haplotypes that were observed in more than one individual. (B) Same as A, but for the data from 62 Danish males in (Skov et al. 2017; rounded copy number values were used; supplementary table S5, Supplementary Material online).

The ampliconic gene copy number-based haplotype variability observed in our data and in the data generated by Skov et al. (2017) was mostly due to the variability of the two most diverse gene families, TSPY and RBMY (fig. 8). In our data, after removing TSPY, the most variable gene family (fig. 1), the total number of haplotypes decreased from 98 to 81. Additionally removing the RBMY family reduced it to 58 haplotypes. The effect was even more dramatic for the Skov et al. (2017) data set. After removing TSPY from the haplotype analysis, only 19 of the 35 haplotypes remained, and additionally removing RBMY left only nine haplotypes.
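The effect of excluding a gene family on the haplotype count can be illustrated with a short sketch: haplotypes that differ only in the excluded family collapse into one. The haplotypes below are hypothetical toy values, and the names `FAMILIES` and `count_haplotypes` are illustrative assumptions.

```python
# Gene family order used for each haplotype tuple (same order as table 3).
FAMILIES = ("BPY", "CDY", "DAZ", "HSFY", "PRY", "RBMY", "TSPY", "VCY", "XKRY")


def count_haplotypes(haplotypes, drop=()):
    """Count distinct haplotypes after ignoring the families named in drop."""
    keep = [i for i, name in enumerate(FAMILIES) if name not in drop]
    return len({tuple(h[i] for i in keep) for h in haplotypes})


# Hypothetical integer copy-number haplotypes (not study data).
toy_haplotypes = [
    (2, 4, 4, 2, 2, 8, 32, 2, 3),
    (2, 4, 4, 2, 2, 8, 35, 2, 3),  # differs from the first in TSPY only
    (2, 4, 4, 2, 2, 9, 32, 2, 3),  # differs from the first in RBMY only
    (2, 4, 4, 2, 2, 9, 35, 2, 3),  # differs in both RBMY and TSPY
    (2, 3, 4, 2, 2, 8, 32, 2, 3),  # differs from the first in CDY only
]

n_all = count_haplotypes(toy_haplotypes)
n_no_tspy = count_haplotypes(toy_haplotypes, drop=("TSPY",))
n_no_tspy_rbmy = count_haplotypes(toy_haplotypes, drop=("TSPY", "RBMY"))
print(n_all, n_no_tspy, n_no_tspy_rbmy)
```

In this toy example the count falls from five to three to two, mirroring in miniature the decreases reported above for the real data.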

Fig. 8.


—Copy number differences per ampliconic gene family between two haplotypes picked uniformly at random from within and between major Y haplogroups (1,000 samplings within and between haplogroups each; see Materials and Methods).

Phenotypic Traits

We further tested whether ampliconic gene copy number is associated with two sexually dimorphic traits, namely height and FMF (see Materials and Methods). The premise here is that ampliconic genes on the Y chromosome could be involved in the development of sexually dimorphic traits. If ampliconic genes are associated with fertility, they might also have pleiotropic effects on sexually dimorphic traits. Only weak phylogenetic signal was found in our data, as evident from low Pagel’s λ values (supplementary table S12, Supplementary Material online), and no statistically significant correlations between these traits and ampliconic gene copy number were discovered (table 4). Thus, ampliconic gene copy number does not appear to be associated with height or FMF.

Table 4.

Results of Phylogenetic Generalized Least Squares (PGLS) Regression Showing the Association between Phenotypic Traits (Height and FMF Scores) and Ampliconic Gene Copy Number (See Additional Details in supplementary table S12, Supplementary Material online)

Gene    Height              FMF
        T        P          T        P
BPY     0.903    0.369      0.434    0.665
CDY     −0.100   0.921      0.867    0.389
DAZ     0.909    0.365      −0.214   0.831
HSFY    1.064    0.290      1.455    0.149
PRY     −0.868   0.388      1.311    0.193
RBMY    0.406    0.686      0.563    0.575
TSPY    1.530    0.129      1.163    0.248
VCY     0.460    0.647      −1.455   0.149
XKRY    1.735    0.086      0.747    0.457

Note.—T is the t-statistic of the slope, and P is the corresponding P value for the significance of each predictor.

Discussion

Very little is known about the variability in copy number of the Y chromosome ampliconic genes in humans and about how such variability impacts phenotypes. These genes, organized in nine multi-gene families, constitute 80% of only 78 protein-coding genes present on the Y chromosome (as annotated in the reference human genome; Skaletsky et al. 2003) and are important for spermatogenesis. Here we experimentally determined the copy number of ampliconic genes in 100 individuals across the world and analyzed this variation in light of Y chromosome haplogroups based on SNPs. Additionally, we assessed whether ampliconic gene copy number is associated with two sexually dimorphic traits.

Variability in Ampliconic Gene Copy Number

Substantial variability in ampliconic gene copy number was observed among gene families (table 2). As a rule, gene families with high copy numbers (RBMY and TSPY) had higher variance in copy number among individuals than gene families with low copy numbers (HSFY, PRY, VCY, and XKRY). This is not surprising, as the probability of gene duplication and deletion should be proportional to gene copy number, allowing for greater variation in large gene families (Ghenu et al. 2016). TSPY had the highest copy number and the highest level of variability among all ampliconic gene families analyzed.

In contrast to the generally low levels of nucleotide diversity on the human Y chromosome (e.g., Wilson Sayres et al. 2014), we observed high levels of variability among individuals in ampliconic gene copy numbers on the Y chromosome. A total of 98 different haplotypes were observed among 100 individuals. Thus, almost every male analyzed had his own, unique haplotype. Previously, high levels of variation in ampliconic gene copy number were reported in chimpanzee and bonobo (Oetjens et al. 2016). Thus, our results are consistent with high levels of intrachromosomal rearrangements seen on the Y chromosome (Repping et al. 2006) and with rapid evolution of Y-chromosomal multi-copy (i.e., ampliconic) genes in primates (Ghenu et al. 2016). The substantial copy number variation of the Y-chromosomal ampliconic genes echoes the copy number variation patterns observed previously for loci outside of the Y chromosome (Nozawa et al. 2007; Nozawa and Nei 2008; Redon et al. 2006; Perry et al. 2008; Hsu et al. 2002). Additionally, the diversity of haplotypes based on ampliconic gene copy numbers observed in our study reflects the tendency previously observed for autosomal gene families, for which the number of different haplotypes outnumbers the size of the gene family (Hsu et al. 2002).

Potential Evolutionary Mechanisms and Other Factors

Mutation and Drift

Most gene families are not significantly different in their copy number among major Y chromosome haplogroups (i.e., haplogroups determined by SNPs). Only two of the larger families, RBMY and TSPY, showed significant differences after Bonferroni correction, whereas DAZ was significant only at the nominal 0.05 level (P = 0.012; table 3). In other words, most of the variation in copy number is shared among populations. While a more formal investigation is required, this pattern is largely consistent with random genetic drift driving copy number variation among ampliconic genes. A similar conclusion was reached from the analysis of olfactory receptor gene families (Nozawa et al. 2007).

A multitude of back-and-forth duplication/deletion mutations could lead to the observed diversity of haplotypes among worldwide human populations and could explain the homoplastic haplotypes shared by individuals belonging to different major Y haplogroups. This pattern of variation contrasts with that for SNPs, which are virtually free of homoplasies and thus allow us to follow the evolution of Y chromosomes unambiguously. Interestingly, this pattern is reminiscent of that observed for microsatellite haplotype variability (Cooper 1996). Such variation patterns highlight the different nature of SNP versus ampliconic gene copy number mutation mechanisms, as well as the similarities between microsatellite and ampliconic gene copy number mutation mechanisms. While our purpose was not to study ampliconic gene mutational mechanisms, we can indirectly infer that mutations changing ampliconic gene copy numbers occurred very rapidly among different haplotypes. More directed studies including pedigrees will have to be conducted to study the rates and relative prevalence of one- versus multi-copy mutations in ampliconic genes from generation to generation.

Gene Conversion

Gene conversion, prevalent at Y chromosome genes located in palindromes, likely contributes to homogenization of ampliconic gene sequences, rescuing them from accumulation of deleterious mutations (Rozen et al. 2003; Betrán et al. 2012; Bellott et al. 2014). In theory, gene conversion is unlikely to influence the evolution of ampliconic gene copy number itself, because gene conversion operates at a scale smaller than individual gene copies, that is, at the scale of a few hundred bases (Chen et al. 2007). Simulation studies have indicated that gene conversion acting alone does not facilitate gene duplication on the Y chromosome (Connallon and Clark 2010; Marais et al. 2010). Interestingly, it has been suggested that gene conversion can slow down the loss of redundant duplicates, thereby nevertheless contributing to copy number evolution (Connallon and Clark 2010). Recently, gene conversion on the human Y was found to be biased towards ancestral alleles and towards GC (Skov et al. 2017). Future studies should combine sequence information on ampliconic genes with copy number data to investigate Y chromosomes from humans around the globe.

Selection

Selection could have contributed to the observed patterns of ampliconic gene copy number variation. In particular, we observe that most of the variation in gene copy number is shared across different haplogroups. If we assume that this is not due to back mutations, uniform selection—selection that is uniform in its pressure across different human populations—could potentially explain this result (Lynch 1986; Whitlock 2008). For instance, if copy number is associated with a specific trait, and the same trait is maintained across populations by uniform selection, it might also facilitate maintenance of an optimal copy number (Hammer et al. 2008). Copy number could then be allowed to “drift” around this optimum within populations by mutation.

Another potential explanation for the lack of copy number divergence across populations is balancing selection within populations via negative frequency-dependent selection (van Hooft et al. 2010). However, this contradicts the generally low nucleotide diversity on the human Y (e.g., Dorit et al. 1995; Wilson Sayres et al. 2014) and thus is unlikely.

Our results for the comparison of between-haplogroup variation versus within-haplogroup variation based on the EVE model (Rohlfs and Nielsen 2015) suggest that the copy numbers of two of the nine ampliconic gene families, TSPY and RBMY, have diverged more across haplogroups than the overall level of divergence observed in all gene families together, although neither result passed the Bonferroni-corrected cutoff (table 3). This could be due to directional selection in one or more haplogroup lineages. However, we state this result with caution for a number of reasons. First, we only studied nine ampliconic gene families, and the combined pattern of divergence across these genes may not represent patterns of neutral evolution and could be skewed by one or two genes evolving nonneutrally. Second, we calculated the P values for the likelihood ratios obtained from the EVE model assuming that the likelihood ratio follows a chi-square distribution with one degree of freedom. For the small number of genes studied here, this is a rough approximation (Rohlfs and Nielsen 2015). More sophisticated modeling is required to elucidate the role of selection on copy number in ampliconic genes. In the future, it will also be interesting to compare the patterns of copy number variation between functional Y chromosome ampliconic genes and their pseudogenes. The latter are expected to evolve neutrally, and thus deviant patterns between these two groups would be suggestive of selection operating on the functional copies (Nozawa and Nei 2008).

Selection on expression levels might have also played a role in determining the observed variation in ampliconic gene copy number. Increased expression levels of some genes can lead to an increase in fitness. In this case, chromosomes carrying higher copy numbers of such genes might rise in frequency simply because a higher copy number is correlated with higher gene expression, especially for genes that are associated with fitness-related traits such as fertility (Marais et al. 2010). However, there is likely to be an upper limit for ampliconic gene copy number, as the probability of ectopic crossover events with deleterious consequences increases with the number of copies (Connallon and Clark 2010). Similarly, there might be a lower limit for each gene family, below which gene expression levels would be inadequate for spermatogenesis. These dosage-dependent factors might act as selective limits keeping copy number for ampliconic genes within a certain range (Rozen et al. 2003; Betrán et al. 2012; Bellott et al. 2014). Within this range, which might be different for each gene family, the copy number would be allowed to drift neutrally. The role of dosage-dependent selection on ampliconic gene copy number needs to be explored further by studying the relationship between ampliconic gene copy number and expression levels.

Technical Artifacts

One potential technical factor contributing to the high haplotype variability observed for copy number variation data is amplification of pseudogenes together with functional genes. While highly accurate given the primers used, ddPCR might amplify nonfunctional copies if the primers anneal to them. We made a substantial effort to construct our primers in such a manner that they capture functional copies only, based on the information in the reference human chromosome Y (Tomaszkiewicz et al. 2016). However, high sequence identity among gene copies might not have allowed us to completely achieve this goal. This is particularly true for the TSPY gene family, which is the largest tandem protein-coding array present in the human genome (Skaletsky et al. 2003). Because of its size, it is challenging to design primers that capture only functional copies of the TSPY family (Tomaszkiewicz et al. 2016). Other groups have reported similar difficulties with TSPY. For example, a recent study (Oetjens et al. 2016) used a k-mer based approach to detect ampliconic gene copy number variation in chimpanzees from whole-genome sequences. However, they found that the utility of their method for the repetitive TSPY array was limited, and their estimates of TSPY copy number included truncated gene copies (Oetjens et al. 2016). Ghenu et al. (2016) were unable to develop a robust qPCR assay to analyze TSPY copy number in macaques. Therefore, different methods will have to be developed to determine functional TSPY copy number more accurately. Nevertheless, this limitation is unlikely to be the reason behind the large number of haplotypes observed in our data. Even with the TSPY gene family excluded, the number of haplotypes based on ampliconic gene copy number is higher than that based on SNPs (81 vs. 39).

Ampliconic Gene Copy Number and Male-Specific Sexually Dimorphic Traits

In this study, we tested for a potential association between ampliconic gene copy number and two sexually dimorphic traits, height and FMF. We found no significant correlations between facial masculinity or height and copy number of any gene family. However, these results should be interpreted with caution for a number of reasons. First, the sample size we analyzed here was relatively small (N = 100) given that the samples were taken from multiple populations worldwide. Second, while we corrected for phylogenetic dependence among the Y chromosomes, we did not correct for variation in the rest of the nuclear genome. Sexually dimorphic traits, like many other complex traits, are likely influenced by genes located on several chromosomes. For instance, height is a polygenic trait, and GWAS analyses of height have identified hundreds of common variants, each with a small effect, distributed throughout the genome (Yang et al. 2010; Wood et al. 2014). Traits specific to males and related to their reproduction are also influenced by variants located on multiple chromosomes outside of the Y. For instance, nonobstructive azoospermia, a reproductive disease characterized by the absence of sperm in semen, displays synergistic and antagonistic interactions between Y-chromosomal haplogroups and certain autosomal SNPs (Lu et al. 2016). It would be interesting to study the effect of Y ampliconic gene copy number variation on sexually dimorphic traits in light of variation in the rest of the nuclear genome.

Furthermore, future studies would benefit from focusing on males from both extremes of the trait distribution (for example, the shortest and the tallest individuals within the data set) and from the same population/haplogroup. Additionally, we only used two phenotypic traits for analysis; a more comprehensive understanding of the role of ampliconic genes in sexually dimorphic characteristics will be gained by including other traits in the analysis.

Supplementary Material

Supplementary data are available at Genome Biology and Evolution online.


Acknowledgments

The authors are grateful to Tomas Benjamin Gonzalez Zarzar for providing facial masculinity scores and to David Puts, Paul Medvedev, and Rahul Vegesna for helpful discussions. We also thank the ADAPT study participants, without whom this research would not have been possible. Funding for the project was provided by the Penn State Center for Human Evolution and Disease (CHED) seed grant, the Huck Institutes for the Life Sciences, the Eberly College of Sciences, the Institute of Cyberscience at Penn State, and by a grant from the Pennsylvania Department of Health using Tobacco Settlement Funds. The Department specifically disclaims responsibility for any analyses, interpretations, or conclusions.

Literature Cited

  1. Bachtrog D. 2008. “The temporal dynamics of processes underlying Y chromosome degeneration.” Genetics 179(3):1513–1525. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Bachtrog D. 2013. “Y-chromosome evolution: emerging insights into processes of Y-chromosome degeneration.” Nat Rev Genet. 14(2):113–124. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Bansal SK, et al. . 2016. “Gr/gr deletions on Y-chromosome correlate with male infertility: an original study, meta-analyses, and trial sequential analyses.” Sci Rep. 6(1):19798.. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Bellott DW, et al. . 2014. “Mammalian Y chromosomes retain widely expressed dosage-sensitive regulators.” Nature 508(7497):494–499. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Betrán E, Demuth JP, Williford A.. 2012. “Why chromosome palindromes?” Int J Evol Biol. 2012(July):1.. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Bhowmick BK, Satta Y, Takahata N.. 2007. “The origin and evolution of human ampliconic gene families and ampliconic structure.” Genome Res. 17(4):441–450. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Case LK, Teuscher C. 2015. “Y genetic variation and phenotypic diversity in health and disease.” Biol Sex Diff. 6(1):6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Chang CC, et al. . 2015. “Second-generation PLINK: rising to the challenge of larger and richer datasets.” GigaScience 4(1):7.. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Charlesworth B, Charlesworth D.. 2000. “The degeneration of Y chromosomes.” Philos Trans R Soc Lond Ser B, Biol Sci. 355(1403):1563–1572. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Chen J-M, Cooper DN, Chuzhanova N, Férec C, Patrinos GP.. 2007. “Gene conversion: mechanisms, evolution and human disease.” Nat Rev Genet. 8(10):762–775. [DOI] [PubMed] [Google Scholar]
  11. Claes P, et al. . 2014. “Modeling 3D facial shape from DNA.” PLoS Genet. 10(3):e1004224.. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Connallon T, Clark AG.. 2010. “Gene duplication, gene conversion and the evolution of the Y chromosome.” Genetics 186(1):277–286. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Cooper G. 1996. “Network analysis of human Y microsatellite haplotypes.” Hum Molec Genet. 5(11):1759–1766. [DOI] [PubMed] [Google Scholar]
  14. Dean R, Mank JE.. 2014. “The role of sex chromosomes in sexual dimorphism: discordance between molecular and phenotypic data.” J Evol Biol. 27(7):1443–1453. [DOI] [PubMed] [Google Scholar]
  15. Dhanoa JK, Mukhopadhyay CS, Arora JS.. 2016. “Y-chromosomal genes affecting male fertility: a review.” Vet World. 9(7):783–791. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Dorit RL, Akashi H, Gilbert W.. 1995. “Absence of polymorphism at the ZFY locus on the human y chromosome.” Science 268(5214):1183–1185. [DOI] [PubMed] [Google Scholar]
  17. Filatov DA, Monéger F, Negrutiu I, Charlesworth D.. 2000. “Low variability in a Y-linked plant gene and its implications for Y-chromosome evolution.” Nature 404(6776):388–390. [DOI] [PubMed] [Google Scholar]
  18. Foresta C, Moro E, Ferlin A.. 2001. “Y chromosome microdeletions and alterations of spermatogenesis.” Endocrine Rev. 22(2):226–239. [DOI] [PubMed] [Google Scholar]
  19. Gascuel O. 1997. “BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data.” Molec Biol Evol. 14(7):685–695. [DOI] [PubMed] [Google Scholar]
  20. Ghenu A-H, Bolker BM, Melnick DJ, Evans BJ.. 2016. “Multicopy gene family evolution on primate Y chromosomes.” BMC Genomics. 17(1):157.. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Giachini C, et al. . 2009. “TSPY1Copy number variation influences spermatogenesis and shows differences among Y lineages.” J Clin Endocrinol Metab. 94(10):4016–4022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Hammer MF, Mendez FL, Cox MP, Woerner AE, Wall JD. 2008. “Sex-biased evolutionary forces shape genomic patterns of human diversity.” PLoS Genet. 4(9):e1000202.
  23. Harley VR, et al. 1992. “DNA binding activity of recombinant SRY from normal males and XY females.” Science 255(5043):453–456.
  24. Helgason A, et al. 2015. “The Y-chromosome point mutation rate in humans.” Nat Genet. 47(5):453–457.
  25. Hindson BJ, et al. 2011. “High-throughput droplet digital PCR system for absolute quantitation of DNA copy number.” Anal Chem. 83(22):8604–8610.
  26. Hsu KC, Chida S, Geraghty DE, Dupont B. 2002. “The killer cell immunoglobulin-like receptor (KIR) genomic region: gene-order, haplotypes and allelic polymorphism.” Immunol Rev. 190(1):40–52.
  27. Karmin M, et al. 2015. “A recent bottleneck of Y chromosome diversity coincides with a global change in culture.” Genome Res. 25(4):459–466.
  28. Kido T, Lau Y-FC. 2015. “Roles of the Y chromosome genes in human cancers.” Asian J Androl. 17(3):373–380.
  29. Knebel S, Pasantes JJ, Thi DAD, Schaller F, Schempp W. 2011. “Heterogeneity of pericentric inversions of the human Y chromosome.” Cytogenet Genome Res. 132(4):219–226.
  30. Krausz C, Hoefsloot L, Simoni M, Tüttelmann F, European Academy of Andrology, European Molecular Genetics Quality Network. 2014. “EAA/EMQN best practice guidelines for molecular diagnosis of Y-chromosomal microdeletions: state-of-the-art 2013.” Andrology 2(1):5–19.
  31. Krausz C, et al. 1999. “A high frequency of Y chromosome deletions in males with nonidiopathic infertility.” J Clin Endocrinol Metab. 84(10):3606–3612.
  32. Krausz C, Degl’Innocenti S. 2006. “Y chromosome and male infertility: update, 2006.” Front Biosci: J Virt Libr. 11(1):3049–3061.
  33. Krausz C, Giachini C, Forti G. 2010. “TSPY and male fertility.” Genes 1(2):308–316.
  34. Kumar S, Stecher G, Tamura K. 2016. “MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets.” Molec Biol Evol. 33(7):1870–1874.
  35. Kuroda-Kawaguchi T, et al. 2001. “The AZFc region of the Y chromosome features massive palindromes and uniform recurrent deletions in infertile men.” Nat Genet. 29(3):279–286.
  36. Lahn BT, Pearson NM, Jegalian K. 2001. “The human Y chromosome, in the light of evolution.” Nat Rev Genet. 2(3):207–216.
  37. Lu C, et al. 2016. “Y chromosome haplogroups based genome-wide association study pinpoints revelation for interactions on non-obstructive azoospermia.” Sci Rep. 6(1):33363.
  38. Lynch M. 1986. “Random drift, uniform selection, and the degree of population differentiation.” Evol Int J Organ Evol. 40(3):640–643.
  39. Marais GAB, Campos PRA, Gordo I. 2010. “Can intra-Y gene conversion oppose the degeneration of the human Y chromosome? A simulation study.” Genome Biol Evol. 2:347–357.
  40. McDermott GP, et al. 2013. “Multiplexed target detection using DNA-binding dye chemistry in droplet digital PCR.” Anal Chem. 85(23):11619–11627.
  41. Navarro-Costa P, Goncalves J, Plancha CE. 2010. “The AZFc region of the Y chromosome: at the crossroads between genetic diversity and male infertility.” Hum Reprod Update. 16(5):525–542.
  42. Nickkholgh B, et al. 2010. “Y chromosome TSPY copy numbers and semen quality.” Fertil Steril. 94(5):1744–1747.
  43. Noordam MJ, et al. 2011. “Gene copy number reduction in the azoospermia factor c (AZFc) region and its effect on total motile sperm count.” Hum Molec Genet. 20(12):2457–2463.
  44. Nozawa M, Kawahara Y, Nei M. 2007. “Genomic drift and copy number variation of sensory receptor genes in humans.” Proc Natl Acad Sci USA. 104(51):20421–20426.
  45. Nozawa M, Nei M. 2008. “Genomic drift and copy number variation of chemosensory receptor genes in humans and mice.” Cytogenet Genome Res. 123(1–4):263–269.
  46. Oetjens MT, Shen F, Emery SB, Zou Z, Kidd JM. 2016. “Y-Chromosome structural diversity in the Bonobo and Chimpanzee lineages.” Genome Biol Evol. 8(7):2231–2240.
  47. Page DC, Silber S, Brown LG. 1999. “Men with infertility caused by AZFc deletion can produce sons by intracytoplasmic sperm injection, but are likely to transmit the deletion and infertility.” Hum Reprod. 14(7):1722–1726.
  48. Pagel M. 1999. “Inferring the historical patterns of biological evolution.” Nature 401(6756):877–884.
  49. Paradis E. 2010. “Pegas: an R package for population genetics with an integrated-modular approach.” Bioinformatics 26(3):419–420.
  50. Paradis E, Claude J, Strimmer K. 2004. “APE: analyses of phylogenetics and evolution in R language.” Bioinformatics 20(2):289–290.
  51. Perry GH, et al. 2008. “The fine-scale and complex architecture of human copy-number variation.” Am J Hum Genet. 82(3):685–695.
  52. Pinheiro J, Bates D, DebRoy S, Sarkar D, R Core Team. 2017. nlme: Linear and Nonlinear Mixed Effects Models (R package version 3.1-131). Available from: https://CRAN.R-project.org/package=nlme.
  53. Pinheiro LB, et al. 2012. “Evaluation of a droplet digital polymerase chain reaction format for DNA copy number quantification.” Anal Chem. 84(2):1003–1011.
  54. Popescu A-A, Huber KT, Paradis E. 2012. “Ape 3.0: new tools for distance-based phylogenetics and evolutionary analysis in R.” Bioinformatics 28(11):1536–1537.
  55. Pryor JL, et al. 1998. “Microdeletions in the Y chromosome of infertile men.” J Urol. 159(2):608–609.
  56. Quinque D, Kittler R, Kayser M, Stoneking M, Nasidze I. 2006. “Evaluation of saliva as a source of human DNA for population and association studies.” Anal Biochem. 353(2):272–277.
  57. Redon R, et al. 2006. “Global variation in copy number in the human genome.” Nature 444(7118):444–454.
  58. Repping S, et al. 2006. “High mutation rates have driven extensive structural polymorphism among human Y chromosomes.” Nat Genet. 38(4):463–467.
  59. Revell LJ. 2010. “Phylogenetic signal and linear regression on species data.” Methods Ecol Evol/Br Ecol Soc. 1(4):319–329.
  60. Rohlfs RV, Nielsen R. 2015. “Phylogenetic ANOVA: the expression variance and evolution model for quantitative trait evolution.” Syst Biol. 64(5):695–708.
  61. Rozen SG, et al. 2012. “AZFc deletions and spermatogenic failure: a population-based survey of 20,000 Y chromosomes.” Am J Hum Genet. 91(5):890–896.
  62. Rozen S, et al. 2003. “Abundant gene conversion between arms of palindromes in human and ape Y chromosomes.” Nature 423(6942):873–876.
  63. RStudio Team. 2016. RStudio: Integrated Development for R. Boston (MA): RStudio, Inc. Available from: http://www.rstudio.com/.
  64. Scott DM, Ehrmann IE, Ellis PS, Chandler PR, Simpson E. 1997. “Why do some females reject males? The molecular basis for male-specific graft rejection.” J Molec Med. 75(2):103–114.
  65. Skaletsky H, et al. 2003. “The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes.” Nature 423(6942):825–837.
  66. Skov L, Danish Pan Genome Consortium, Schierup MH. 2017. “Analysis of 62 hybrid assembled human Y chromosomes exposes rapid structural changes and high rates of gene conversion.” PLoS Genet. 13(8):e1006834.
  67. Tanner JM, Prader A, Habich H, Ferguson-Smith MA. 1959. “Genes on the Y chromosome influencing rate of maturation in man.” Lancet 274(7095):141–144.
  68. Tomaszkiewicz M, Medvedev P, Makova KD. 2017. “Y and W chromosome assemblies: approaches and discoveries.” Trends Genet: TIG. 33(4):266–282.
  69. Tomaszkiewicz M, et al. 2016. “A time- and cost-effective strategy to sequence mammalian Y chromosomes: an application to the de novo assembly of Gorilla Y.” Genome Res. 26(4):530–540.
  70. van Hooft P, et al. 2010. “Rainfall-driven sex-ratio genes in African buffalo suggested by correlations between Y-chromosomal haplotype frequencies and foetal sex ratio.” BMC Evol Biol. 10(1):106.
  71. van Oven M, Van Geystelen A, Kayser M, Decorte R, Larmuseau MHD. 2014. “Seeing the wood for the trees: a minimal reference phylogeny for the human Y chromosome.” Hum Mutat. 35(2):187–191.
  72. Venables WN, Ripley BD. 2002. Modern Applied Statistics with S. New York: Springer.
  73. Vogt PH, et al. 1996. “Human Y chromosome azoospermia factors (AZF) mapped to different subregions in Yq11.” Hum Molec Genet. 5(7):933–943.
  74. Whitlock MC. 2008. “Evolutionary inference from QST.” Molec Ecol. 17(8):1885–1896.
  75. Wilder JA, Mobasher Z, Hammer MF. 2004. “Genetic evidence for unequal effective population sizes of human females and males.” Molec Biol Evol. 21(11):2047–2057.
  76. Wilson Sayres MA, Lohmueller KE, Nielsen R. 2014. “Natural selection reduced diversity on human Y chromosomes.” PLoS Genet. 10(1):e1004064.
  77. Wood AR, et al. 2014. “Defining the role of common variation in the genomic and biological architecture of adult human height.” Nat Genet. 46(11):1173–1186.
  78. Yang J, et al. 2010. “Common SNPs explain a large proportion of the heritability for human height.” Nat Genet. 42(7):565–569.
  79. Y Chromosome Consortium. 2002. “A nomenclature system for the tree of human Y-chromosomal binary haplogroups.” Genome Res. 12(2):339–348.
  80. Yu X-W, Wei Z-T, Jiang Y-T, Zhang S-L. 2015. “Y chromosome azoospermia factor region microdeletions and transmission characteristics in azoospermic and severe oligozoospermic patients.” Int J Clin Exp Med. 8(9):14634–14646.

Articles from Genome Biology and Evolution are provided here courtesy of Oxford University Press