Proceedings of the National Academy of Sciences of the United States of America. 2014 May 13;111(21):7713–7718. doi: 10.1073/pnas.1319227111

Genetic degeneration of old and young Y chromosomes in the flowering plant Rumex hastatulus

Josh Hough 1,1, Jesse D Hollister 1, Wei Wang 1, Spencer C H Barrett 1, Stephen I Wright 1
PMCID: PMC4040613  PMID: 24825885

Significance

Evolutionary theory predicts that in dioecious organisms with sex chromosomes, suppressed X-Y recombination should lead to a loss of Y-chromosome gene content and function. However, the extent to which this process occurs in plants, where sex chromosomes evolved relatively recently, is poorly understood. We tested for Y degeneration in Rumex hastatulus, an annual plant that has both XY and XY1Y2 sex chromosome systems. We found that Y-linked genes are undergoing degeneration despite their recent origin; they show a faster accumulation of amino acid substitutions, contain more unpreferred changes in codon usage, and are reduced in expression relative to X-linked alleles. Significantly, the magnitude of these effects depended on sex chromosome age, being greater for genes that have been nonrecombining for longer.

Keywords: molecular evolution, sex linkage, dioecy

Abstract

Heteromorphic sex chromosomes have originated independently in many species, and a common feature of their evolution is the degeneration of the Y chromosome, characterized by a loss of gene content and function. Despite being of broad significance to our understanding of sex chromosome evolution, the genetic changes that occur during the early stages of Y-chromosome degeneration are poorly understood, especially in plants. Here, we investigate sex chromosome evolution in the dioecious plant Rumex hastatulus, in which X and Y chromosomes have evolved relatively recently and occur in two distinct systems: an ancestral XX/XY system and a derived XX/XY1Y2 system. This polymorphism provides a unique opportunity to investigate the effect of sex chromosome age on patterns of divergence and gene degeneration within a species. Despite recent suppression of recombination and low X-Y divergence in both systems, we find evidence that Y-linked genes have started to undergo gene loss, causing ∼28% and ∼8% hemizygosity of the ancestral and derived X chromosomes, respectively. Furthermore, genes remaining on Y chromosomes have accumulated more amino acid replacements, contain more unpreferred changes in codon use, and exhibit significantly reduced gene expression compared with their X-linked alleles, with the magnitude of these effects being greatest for older sex-linked genes. Our results provide evidence for reduced selection efficiency and ongoing Y-chromosome degeneration in a flowering plant, and indicate that Y degeneration can occur soon after recombination suppression between sex chromosomes.


Systems of sex determination involving X and Y chromosomes have evolved multiple times in both plants and animals, with Y chromosomes having lost much of their genetic function in many species (1–3). Evidence of DNA sequence homology between X- and Y-linked gene pairs in flowering plants (4–7) and fish (8) supports the idea that sex chromosomes have evolved from autosomes and subsequently diverged following the suppression of recombination between genes involved in sex determination. Evolutionary models predict that when regions of suppressed recombination evolve on Y chromosomes, the associated reduction in the effectiveness of selection should lead to a pattern of Y-chromosome degeneration in which genes carried on the Y become impaired in function and are eventually lost (1–3). The well-studied Y chromosomes in humans and Drosophila melanogaster, for example, show clear signs of degeneration: They almost completely lack homology to the X chromosome, exhibit a highly heterochromatic chromatin structure consisting largely of repetitive and ampliconic DNA, and carry few remaining protein-coding genes (9–13).

Recent genomic studies of sex chromosomes in humans, rhesus macaques, and chimpanzees (12, 13) have provided detailed information regarding the genetic structure and gene content of Y chromosomes, shedding light on the processes contributing to their deterioration. However, we still know little about the changes characterizing the early stages of Y-chromosome degeneration or the time scales over which they occur. This situation arises because sex chromosomes in these well-studied mammalian species evolved >200 Mya (14, 15), and therefore provide few clues about their early evolutionary history. Genomic studies of younger plant Y chromosomes (16–19) and Drosophila neo-Y chromosomes (20–23), where degeneration is in progress, thus provide excellent opportunities to gain insight into the early processes involved in sex chromosome divergence.

Here, we investigate X- and Y-chromosome evolution in the annual, dioecious plant Rumex hastatulus (Polygonaceae). Sex chromosomes in R. hastatulus represent an interesting case of the recent evolution of sex chromosome heteromorphism, with age estimates based on nuclear and chloroplast phylogenies suggesting that sex chromosomes evolved within the past 15–16 million years (24). The presence of a neo-Y sex chromosome system (XX/XY1Y2), recently derived from an XX/XY system following a fusion of the X chromosome and a former autosome (25), provides a unique opportunity to contrast patterns of sex chromosome evolution between different sex chromosome systems and to investigate the effect of sex chromosome age on patterns of divergence and degeneration within a species. We used high-throughput transcriptome sequencing of multiple parent–offspring families and an analysis of SNP segregation patterns to identify and compare the expression and molecular evolution of sex-linked genes, with the aim of determining whether Y-linked genes are accumulating deleterious mutations, exhibit reduced expression, or have undergone gene loss.

Results and Discussion

We identified genes linked to sex chromosomes by tracing the inheritance of SNPs from parents to first generation (F1) progeny in two crosses, one from each sex chromosome system (XX/XY and XX/XY1Y2). We identified genes in which SNPs segregated in a manner characteristic of sex linkage, with Y alleles transmitted from fathers to sons and X alleles transmitted from fathers to daughters, a method validated in previous studies (16, 17). This approach allowed us to identify 698 genes with four or more sex-linked SNPs in XX/XY populations and 1,298 such genes in XX/XY1Y2 populations (Table 1 and SI Appendix, Table S1). Approximately 70% of sex-linked genes from the XY system were identified in the XY1Y2 system, and ∼40% of genes in the XY1Y2 system were shared with the XY system. This suggests that the XY1Y2 system has acquired many new sex-linked genes since the fusion event, and our analysis allowed us to identify a set of 488 “old” sex-linked genes shared between the systems, as well as 607 “young” genes unique to the XY1Y2 system.

Table 1.

Numbers of identified sex-linked genes in R. hastatulus

Gene set | Sex-linked genes with Y-linked copies* | Hemizygous genes | Hemizygous genes, %†
XY system | 698 (565) | 119 | 24
XY1Y2 shared‡ | 510 (460) | 100 | 28
XY1Y2 unique§ | 788 (223) | 44 | 8

*Numbers indicate genes with at least four supporting SNPs showing sex-linked segregation and having no SNPs with autosomal segregation. Values in parentheses identify the numbers of genes with at least one fixed X-Y difference in the population sample.

†Estimates of percentage of hemizygous genes were calculated by comparing the number of hemizygous genes and the number of X/Y genes that had at least four segregating X polymorphisms.

‡Shared genes represent genes in the XX/XY1Y2 system that were also identified in the XX/XY system.

§Unique genes represent genes identified as unique to the XX/XY1Y2 system.

Cytological measurements of X-chromosome size in R. hastatulus suggest that the X is ∼20% of the diploid female genome for the XY system and ∼30% of the genome in the XY1Y2 system (25). Using the estimated number of genes reported in other dicotyledonous plants [28,000 in Arabidopsis thaliana (26)], we obtained a rough estimate of the expected number of sex-linked genes of 5,600 and 8,400 for the XY and XY1Y2 systems, respectively. Our screen for sex-linked genes using segregating polymorphisms in expressed genes therefore captures ∼13% and ∼15% of the total number of sex-linked genes for the XY and XY1Y2 systems, respectively.
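The arithmetic behind this rough estimate can be laid out in a few lines. The sketch below simply multiplies the cytological X-chromosome fraction by the Arabidopsis gene count quoted above and divides the number of identified genes by the result; the function and variable names are ours and are illustrative only.

```python
# Gene count borrowed from Arabidopsis thaliana, as in the text.
GENOME_GENES = 28_000


def expected_and_captured(x_fraction, identified):
    """Return (expected number of sex-linked genes, fraction captured by the screen)."""
    expected = x_fraction * GENOME_GENES
    return expected, identified / expected


# X-chromosome fractions and identified gene counts as reported in the text.
xy_expected, xy_captured = expected_and_captured(0.20, 698)
neo_expected, neo_captured = expected_and_captured(0.30, 1298)

print(f"XY:    {xy_expected:.0f} expected, {xy_captured:.1%} captured")
print(f"XY1Y2: {neo_expected:.0f} expected, {neo_captured:.1%} captured")
```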

Because some of our candidate sex-linked genes may be in a pseudoautosomal region, and therefore partially recombining with the sex-determining region, we independently sequenced transcriptomes from a single male and female from each of six populations per sex chromosome system and checked for the presence of fixed differences between males and females (Table 1 and SI Appendix, Tables S2 and S3). This approach led to validation of ∼80% of the sex-linked genes from the XY system, 90% from the XY1Y2 system shared with the XY system, but only 28% of the young XY1Y2 genes. This suggests that fewer variants have fixed between the neo-sex chromosomes, potentially due to ongoing recombination in a pseudoautosomal region or very recent recombination suppression, with residual shared polymorphism between the chromosomes. For subsequent analyses, we excluded genes without fixed differences between X and Y, as well as a small number of genes with one or more SNPs displaying autosomal segregation (SI Appendix, Table S4).
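The logic of this validation step can be sketched as a simple genotype check. The example below is a minimal illustration, not the pipeline used in the study: it assumes genotypes are supplied as pairs of allele characters and treats a site as a fixed X-Y difference when every sampled female is homozygous for one allele and every sampled male carries that allele plus one shared alternative allele. The function name and data layout are hypothetical.

```python
def is_fixed_xy_difference(male_genotypes, female_genotypes):
    """Return True if a site behaves like a fixed X-Y difference.

    Genotypes are (allele, allele) tuples. All females must be homozygous
    for the same (X) allele; all males must be heterozygous, carrying the
    X allele plus one and the same alternative (Y) allele.
    """
    female_alleles = {allele for gt in female_genotypes for allele in gt}
    if len(female_alleles) != 1:
        return False
    x_allele = next(iter(female_alleles))

    y_alleles = set()
    for a, b in male_genotypes:
        if a == b or x_allele not in (a, b):
            return False
        y_alleles.add(b if a == x_allele else a)
    return len(y_alleles) == 1


# Toy data: one male and one female from each of six populations.
males = [("A", "G")] * 6
females = [("A", "A")] * 6
print(is_fixed_xy_difference(males, females))
```

A site shared as a polymorphism between X and Y (for example, one heterozygous female) fails the check, which is the behavior expected for pseudoautosomal or very recently nonrecombining genes.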

Phylogenetic Relationships and Evolutionary Divergence of Sex-Linked Genes.

To investigate relatedness and levels of divergence of sex-linked genes, we obtained additional transcriptome data and identified orthologous sequences from the closest known nondioecious outgroup that lacks sex chromosomes, Rumex bucephalophorus (24). We developed a maximum likelihood method to infer the phased X and Y sequences from both sex chromosome systems for each gene. We confirmed the reliability of our method using simulations (SI Appendix, Figs. S1 and S2) and constructed phylogenetic trees of these sequences, including the outgroup (Methods and SI Appendix). Of 354 old sex-linked genes, 150 (42%) X alleles were monophyletic from the two sex chromosome systems, whereas 179 (51%) Y alleles were monophyletic (Fisher’s exact test, P < 0.04), consistent with the origins of these Y-linked genes predating the divergence of the two sex chromosome systems. Overall, only 78 (22%) exhibited complete reciprocal monophyly for both X and Y between the systems, highlighting their very recent divergence and indicating that a significant proportion of even the old genes may have experienced recent suppression of recombination and some may be pseudoautosomal. Consistent with this, maximum likelihood estimates of synonymous substitution rates (Ks) for both young and old X- and Y-linked genes (Fig. 1) suggest that the majority have low levels of nucleotide divergence, implying that many genes are in an early stage of divergence or experience ongoing recombination.
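For readers who wish to see how a comparison of the two monophyly counts can be set up, the sketch below implements a two-sided Fisher's exact test using only the Python standard library. It treats the X and Y counts as two independent groups of 354 genes, which is our assumption about the layout of the 2 × 2 table rather than a detail stated in the text.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return exp(log_comb(row1, x) + log_comb(row2, col1 - x) - log_comb(total, col1))

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum all tables that are no more probable than the observed one.
    p_value = sum(
        p for p in map(prob, range(lowest, highest + 1)) if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Monophyletic vs. non-monophyletic counts among 354 old sex-linked genes.
p = fisher_exact_two_sided(150, 354 - 150, 179, 354 - 179)
print(f"two-sided P = {p:.3f}")
```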

Fig. 1.

Synonymous site divergence in sex-linked genes of the XY1Y2 system of R. hastatulus. Maximum likelihood estimates of lineage-specific rates of per-site synonymous substitution are shown for the X chromosome (A) and Y chromosome (B). Old sex-linked genes refer to genes that are shared between the ancestral XY system and the derived XY1Y2 system. Young sex-linked genes refer to those that are unique to the derived XY1Y2 system.

It is of interest to infer the extent to which sex-linked genes fall into distinct evolutionary strata, which has been found in animal and plant sex chromosomes (e.g., refs. 14, 27, 28) and is characterized by a stratified increase in divergence of X/Y genes with increasing distance from the pseudoautosomal region. We found a range of Ks values for sex-linked genes within each system, which may reflect that recombination suppression occurred at different times for different genes (which is thought to be the underlying cause of strata). In addition, we found significant differences in average branch-specific Ks values when comparing old vs. young X-linked genes (0.00870 and 0.00276, respectively; P << 10^−10) and old vs. young Y-linked genes (0.0120 and 0.00297, respectively; P << 10^−10), with the younger sets showing more left-shifted Ks distributions and much lower average Ks values (Fig. 1). Overall, these results highlight that there has been little sequence divergence for young sex-linked genes (the youngest evolutionary stratum), whereas older genes likely include genes that have experienced recent restricted recombination either before or following the divergence between sex chromosome systems, some genes that may still be pseudoautosomal, and genes that have been nonrecombining for a much longer period (i.e., belong to an older evolutionary stratum).

Y Chromosome Gene Loss and Loss of Expression.

The relatively recent evolution of recombination suppression and low sequence divergence between many genes on R. hastatulus sex chromosomes raises the question of whether Y-linked genes have been lost, or have lost expression relative to X-linked genes. Gene loss has occurred extensively on human and Drosophila Y chromosomes (reviewed in ref. 3), and it might be driven by adaptive silencing of Y-linked genes to mask their deleterious effects (22, 29) or, more passively, as a consequence of harmful mutations occurring in regions essential for gene function (30, 31). We inferred the amount of gene loss in R. hastatulus by quantifying the percentage of X-linked genes in which SNP segregation patterns indicated hemizygosity in males (Table 1 and SI Appendix). Estimates of hemizygosity based only on mRNA sequence data will include genes that have been lost, genes with nonfunctional (nonexpressed) Y-linked copies, and genes that have moved from autosomes to the X chromosome but do not have homologous copies on the Y chromosome. We note that hemizygosity could conceivably be incorrectly inferred using our RNA sequencing (RNAseq)-based approach in cases where X-linked genes have Y-linked copies that are expressed at levels too low to be detected. Such genes would indicate partial Y degeneration rather than genuine gene loss.

By comparing the number of hemizygous genes with the number of X/Y genes with equivalent segregating X-linked polymorphisms (SI Appendix), we estimate that the percentage of genes lost from the R. hastatulus Y chromosome is as high as 28% (Table 1 and SI Appendix, Table S5). We also found that estimates of hemizygosity in XY1Y2 males were much lower (8%) than in XY males (Table 1), which is expected because the XX/XY1Y2 sex chromosome system has acquired additional X/Y gene pairs, with little time for gene degeneration and loss. Our estimates of the percentage of hemizygosity, although low in comparison to mammalian sex chromosomes [where ∼97% of the X chromosome is hemizygous in males (15, 32)], are somewhat higher than other estimates from plants [∼20% in Silene latifolia (16, 17)] and suggest that Y chromosomes in R. hastatulus have already undergone gene loss, despite their relatively recent origin.

We tested for a reduction in expression of young and old Y-linked genes by comparing the ratio of Y/X gene expression in males. Expression was estimated by counting the number of mRNA transcript reads mapping to X/Y SNPs in contigs with four or more such SNPs segregating in F1 offspring. Because Y-linked alleles in our segregation analysis are identified as alternate alleles at heterozygous sites (with the X allele as the reference), it is important to evaluate the extent of the reduction in the Y/X expression ratio by comparing it with the expression ratio of alternate-to-reference alleles at heterozygous sites throughout the genome. This is necessary because there is an inherent bias toward mapping more reference than alternate alleles (33), and not controlling for this would generate a false signal of lower Y expression or exaggerate signals of truly reduced expression. We therefore tested for reductions in Y/X expression ratios by using the alternate/reference expression ratio in autosomes as the null expectation.
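The comparison described above can be illustrated with a short sketch. It assumes that per-gene allelic expression is summarized as the ratio of summed read counts across a gene's SNPs within each male and then averaged across males; whether reads were pooled across SNPs in exactly this way is our assumption. The final helper, which divides the median Y/X ratio by the median autosomal alternate/reference ratio, is only a descriptive summary of mapping bias and is not the Wilcoxon test reported in the text. All names and toy numbers are hypothetical.

```python
from statistics import mean, median


def allele_ratio(alt_reads, ref_reads):
    """Ratio of summed alternate-allele to summed reference-allele reads for one gene."""
    total_ref = sum(ref_reads)
    if total_ref == 0:
        raise ValueError("no reference-allele reads for this gene")
    return sum(alt_reads) / total_ref


def gene_ratio_across_males(per_male_counts):
    """Average the within-individual ratio across males.

    per_male_counts is a list of (alt_reads, ref_reads) pairs, one per male,
    where each element is a list of per-SNP read counts. For sex-linked
    genes the alternate allele is Y and the reference allele is X.
    """
    return mean(allele_ratio(alt, ref) for alt, ref in per_male_counts)


def bias_adjusted_median(sex_linked_ratios, autosomal_ratios):
    """Median Y/X ratio expressed relative to the autosomal alt/ref median."""
    return median(sex_linked_ratios) / median(autosomal_ratios)


# Toy example: one gene, two males, two SNPs each.
toy_gene = [([8, 12], [10, 10]), ([6, 6], [10, 10])]
print(gene_ratio_across_males(toy_gene))
```

Because the ratio is formed within each individual, library size cancels out, which is why no across-sample normalization is needed for this allele-specific comparison.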

Our analyses indicated an overall trend of reduced Y expression relative to X-linked alleles for both old and young categories (and similar results were obtained in a comparison with the full set of genes from the XY system; SI Appendix, Fig. S3), with the effect being markedly stronger for older Y-linked genes (median = 0.79; Wilcoxon test, W = 1093796, P << 10^−10; Fig. 2) than for the younger category (median = 0.90; W = 495511.5, P = 0.0267; Fig. 2 and SI Appendix, Fig. S3). The overall pattern suggests that Y-linked genes that spend more time in the nonrecombining regions are more likely to show functional deterioration. However, it is also possible that X-linked alleles have been up-regulated to some extent in males [partial dosage compensation (34, 35)], thus contributing to the observed lower Y expression relative to X (see below). Our results also suggest that some genes have elevated Y-linked expression relative to X-linked alleles (Fig. 2), although this is less common. The fact that younger sex-linked genes also show a significant reduction in their Y/X ratio indicates that reduced Y expression is probably one of the initial changes that occurs following the evolution of X-Y recombination suppression.

Fig. 2.

Y/X gene expression of old and young sex-linked genes in R. hastatulus. The Y/X expression ratio distribution in males for 230 young sex-linked genes from the XY1Y2 system (not shared with the XY system) and 459 old sex-linked genes (shared with the XY system), compared with the expression ratio for alternate-to-reference (alt/ref) alleles at heterozygous sites in autosomes, is shown. Expression of Y alleles relative to X alleles was estimated per gene in males (i.e., within individual samples) by counting the numbers of mRNA reads covering sex-linked SNPs in sex-linked genes, and these relative estimates were averaged across all males. Expression estimates for reference and alternative alleles at heterozygous sites in autosomes were obtained similarly, using the numbers of mRNA reads covering SNPs in contigs where at least four such SNPs segregated as autosomal. The dotted line shows the expectation when X and Y alleles (or ref and alt alleles in autosomes) are equally expressed. Error bars show 1.5× the interquartile range, approximately corresponding to 2 SDs, and notches correspond approximately to 95% confidence intervals for the medians.

Disruption of normal expression levels and gene loss could negatively affect the fitness of males, potentially leading to selective pressure to up-regulate X-linked genes, a process known as dosage compensation (2, 34). To investigate this, we analyzed the expression of X-linked genes that were ascertained to be hemizygous in males (but present in two active copies in females) to determine whether such genes were hyperexpressed in males. Our analysis of 119 hemizygous genes revealed that relatively few hemizygous genes in males show evidence for a compensatory increase in gene expression compared with X-linked genes in females. The majority of X-linked genes with missing Y copies in males were expressed approximately twofold lower compared with females (Fig. 3A). In particular, a high proportion of these genes [94 (79%) of 119 genes] showed significantly lower expression in males than in females (SI Appendix, Table S6), whereas only 7 (6%) of 119 had significantly higher expression in males compared with one-half of total (X + X) expression in females. This suggests that dosage compensation is incomplete in R. hastatulus, and is evidently not mediated by a chromosome-wide mechanism that affects all X-linked genes similarly.
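The expectation underlying this test can be made concrete with a small sketch. A hemizygous gene with no compensation should be expressed in males at about one-half of the female (X + X) level, that is, a log2 male/female ratio near −1, whereas full compensation gives a ratio near 0. The tolerance used below to bin genes is an arbitrary illustrative threshold of ours and does not stand in for the DESeq significance tests used in the study.

```python
from math import log2


def compensation_index(male_expr, female_expr):
    """log2(male / female) for normalized expression of a hemizygous X-linked gene.

    About -1 is expected without dosage compensation, about 0 with full compensation.
    """
    return log2(male_expr / female_expr)


def classify_hemizygous_gene(male_expr, female_expr, tolerance=0.25):
    """Bin a gene by how close its index lies to each expectation (illustrative only)."""
    index = compensation_index(male_expr, female_expr)
    if abs(index + 1.0) <= tolerance:
        return "uncompensated"
    if abs(index) <= tolerance:
        return "compensated"
    return "intermediate_or_other"


# Toy normalized expression values (male, female).
for male, female in [(50.0, 100.0), (100.0, 100.0), (70.0, 100.0)]:
    print(male, female, classify_hemizygous_gene(male, female))
```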

Fig. 3.

Average normalized gene expression in male vs. female progeny (six of each sex) from the XY1Y2 system. Hemizygous genes (A), old sex-linked genes with Y homologs shared with the XY system (B), young sex-linked genes with Y homologs not shared with the XY system (C), and autosomal genes (D) are shown. The solid line shows the expectation under equal male and female expression, and the dashed line shows the expectation for male expression being equal to one-half of female expression. Median differential expression normalization was conducted using DESeq (details are provided in Methods).

In contrast, we did not find a consistent reduction in male-specific expression for either old (Fig. 3B) or young (Fig. 3C) X/Y genes (SI Appendix, Table S6) compared with total X expression in females. This implies that the observed loss of expression of Y-linked alleles does not cause total levels of sex-linked gene expression in males to be reduced, potentially reflecting up-regulation of the male X allele to compensate for the loss in expression of the Y allele. However, it is unclear whether this compensatory increase in expression of the X allele in males is adaptive and was selected because of a degenerating Y allele. Instead, it may have arisen as a consequence of existing mechanisms of gene expression regulation that are activated in the presence of small perturbations in expression or gene dosage (e.g., refs. 36–38).

One potential complication of this analysis might be that changes in gene dosage on the sex chromosomes have led to sex-specific changes in autosomal expression, causing normalized estimates of male X-linked expression to be artificially deflated. To test whether there were global differences in autosomal expression between the sexes, we plotted the distribution of average expression in males divided by average expression in females for autosomal genes (39, 40) (SI Appendix, Fig. S7). This distribution is centered at 1 (n = 1,167, median = 1.01, SD = 0.349), suggesting a lack of widespread expression differences in males compared with females. A slight secondary peak around 1.9 is evident, suggesting that some genes may be differentially expressed, but the effect on the central tendency of the distribution is minimal. Although the slight right skew might mean that X up-regulation in males has been underestimated, we did not find evidence for large quantitative differences in autosomal expression, suggesting that our RNAseq-based estimates of X expression are reliable. Indeed, autosomal genes have the lowest level of differential gene expression between males and females (Fig. 3D and SI Appendix, Table S6), suggesting that most of the differential gene expression between the sexes is driven by sex chromosome evolution. Results consistent with this were obtained when examining expression differences in the XY system, as well as from independent population samples (SI Appendix, Table S6). Overall, we conclude that the majority of hemizygous genes are not dosage-compensated, whereas genes with retained Y copies have lower Y expression but no overall differential expression between the sexes.

Molecular Evolutionary Tests for Deleterious Mutations and Codon Use Bias.

We also tested whether the efficacy of purifying selection was reduced for Y-linked genes, and whether they have accumulated more deleterious mutations or changes in codon use compared with X-linked genes. This is expected because of the lower rate of recombination for Y-linked genes, which is predicted to reduce the efficacy of purifying selection (30, 31). However, given that recombination suppression was recent for many sex-linked genes, extensive deterioration of Y-linked genes may not be expected. Using our phased X and Y sequences, we used two approaches to test whether Y-linked sequences have accumulated deleterious changes. First, we used parsimony to estimate the total number of changes across sex-linked genes on the X and Y lineages, using orthologous sequences from R. bucephalophorus. The number of synonymous changes on the X vs. the Y for the old gene set is nearly equal, providing no evidence for elevated mutation rates on the Y chromosome (Fig. 4A and SI Appendix, Fig. S4). In contrast, nearly twice as many nonsynonymous changes have occurred on the Y lineage (1,646 vs. 835), implying reduced selection efficacy since the suppression of recombination. This difference is highly significant (Fisher’s exact test, P < 0.001). For the young gene set, a weaker trend was apparent (339 vs. 215 nonsynonymous changes on the Y vs. the X; Fisher’s exact test, P < 0.001; Fig. 4A).
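The parsimony logic used to assign changes to the X or Y lineage can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used in the study: sequences are assumed to be aligned, in frame, and of equal length; changes are counted per codon rather than per site; and a change is assigned to a lineage only when the other sex chromosome matches the outgroup. The function name and toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def lineage_changes(outgroup, x_seq, y_seq):
    """Count synonymous and nonsynonymous codon changes on the X and Y lineages."""
    counts = {"X": {"syn": 0, "nonsyn": 0}, "Y": {"syn": 0, "nonsyn": 0}}
    for i in range(0, len(outgroup) - 2, 3):
        o, x, y = outgroup[i:i + 3], x_seq[i:i + 3], y_seq[i:i + 3]
        if any(codon not in CODON_TABLE for codon in (o, x, y)):
            continue  # skip gaps, ambiguous bases, or truncated codons
        if x == o and y != o:
            lineage, derived = "Y", y
        elif y == o and x != o:
            lineage, derived = "X", x
        else:
            continue  # no change, or the change cannot be polarized
        kind = "syn" if CODON_TABLE[derived] == CODON_TABLE[o] else "nonsyn"
        counts[lineage][kind] += 1
    return counts


# Toy alignment: the Y copy has one synonymous and one nonsynonymous change.
print(lineage_changes("ATGGCTAAA", "ATGGCTAAA", "ATGGCCAGA"))
```

Summing such counts over all genes gives the lineage totals that are then compared between X and Y.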

Fig. 4.

Synonymous and nonsynonymous substitutions in X and Y genes. The number of parsimony-estimated lineage-specific substitutions (A) and changes in codon use (B) on the X and Y sequences from the XY1Y2 system are shown, using orthologous sequences from R. bucephalophorus to polarize changes along the X and Y lineages separately. Old genes represent those shared with the XY system, whereas young genes represent those that are not shared.

We also generated maximum-likelihood estimates of ω, the nonsynonymous (dN) to synonymous (dS) substitution ratio for each lineage, including the X and Y sequences of both systems and the outgroup. Consistent with the parsimony approach, we found that old Y-linked genes in the XY1Y2 system had a higher number of nonsynonymous relative to synonymous substitutions per site compared with X-linked genes (average ωY_old = 0.401 and average ωX_old = 0.156; Wilcoxon test, P << 10^−10), but the difference was much less and not significant for younger Y-linked genes (average ωY_young = 0.209 and ωX_young = 0.145; P = 0.114). No significant difference in synonymous substitution rate was observed between X and Y chromosomes (Fig. 4A and SI Appendix, Fig. S4), suggesting that differences in ω are not due to differences in underlying mutation rates. Further, we found that old and young X sequences did not have significantly different ω values (P < 0.399), but the comparison of old vs. young Y genes revealed a significant difference (P < 4 × 10^−8). As expected, analysis of substitution rates in the XY chromosome system gave comparable results to the old gene set in the XY1Y2 system (SI Appendix, Table S7). Together, these results indicate that elevated ωY_old is not due to changes on the X but is caused by a significantly higher substitution rate on the Y.

Finally, we also tested whether Y-linked genes have undergone more changes toward unpreferred codons than X-linked genes. Here, we used a parsimony approach to examine changes in codon use along the X vs. Y lineages, using the outgroup sequence to polarize changes on X and Y branches. To count the number of changes from preferred to unpreferred codons, and vice versa, we assumed shared codon preferences from A. thaliana (41). Old Y-linked genes had significantly more preferred-to-unpreferred changes in codon use relative to unpreferred-to-preferred changes compared with X-linked genes (Fig. 4B and SI Appendix, Fig. S5; Fisher’s exact test, P < 0.01). However, no significant difference was observed in the ratio of codon changes for the young Y-linked genes (Fisher’s exact test, P > 0.05). The larger number of codon substitutions in the old Y-linked genes may reflect a greater reduction in the efficacy of selection on codon use; additionally, differences in biased gene conversion due to recombination suppression may play a role. Collectively, these molecular evolutionary comparisons of X- and Y-linked sequences support the hypothesis that deleterious changes are accumulating in Y lineages as a result of a reduction in the efficacy of selection, with the magnitude of the effects depending on the time since recombination suppression.
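The counting scheme for codon-use changes can be illustrated with a short sketch. The preference table below is a toy placeholder covering four codons and is not the A. thaliana preference set used in the study; the function name and data layout are likewise hypothetical. Changes are polarized with the outgroup as in the parsimony analysis, and only synonymous changes between codons in the table are scored.

```python
# Hypothetical toy table: codon -> (amino acid, is_preferred).
# These flags are placeholders, not real codon preferences.
TOY_PREFERENCE = {
    "GCT": ("A", True),
    "GCA": ("A", False),
    "AAG": ("K", True),
    "AAA": ("K", False),
}


def codon_preference_changes(outgroup_codons, x_codons, y_codons, table=TOY_PREFERENCE):
    """Count preferred-to-unpreferred and unpreferred-to-preferred changes per lineage."""
    counts = {
        "X": {"to_unpreferred": 0, "to_preferred": 0},
        "Y": {"to_unpreferred": 0, "to_preferred": 0},
    }
    for o, x, y in zip(outgroup_codons, x_codons, y_codons):
        if x == o and y != o:
            lineage, derived = "Y", y
        elif y == o and x != o:
            lineage, derived = "X", x
        else:
            continue  # unchanged or not polarizable
        if o not in table or derived not in table:
            continue
        (aa_old, pref_old), (aa_new, pref_new) = table[o], table[derived]
        if aa_old != aa_new:
            continue  # score synonymous changes only
        if pref_old and not pref_new:
            counts[lineage]["to_unpreferred"] += 1
        elif pref_new and not pref_old:
            counts[lineage]["to_preferred"] += 1
    return counts


print(codon_preference_changes(["GCT", "AAA", "GCT"], ["GCT", "AAA", "GCA"], ["GCA", "AAG", "GCT"]))
```

The resulting four counts per gene set form the 2 × 2 table (lineage by direction of change) to which a Fisher's exact test can be applied.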

Conclusions

Our segregation-based analysis using RNAseq has led to the identification of hundreds of sex-linked genes in a nonmodel dioecious plant species with a neo-Y sex chromosome system. This has allowed us to compare the changes in expression and sequence evolution that have occurred following recombination suppression between X and Y chromosomes. The majority of X/Y genes in R. hastatulus have become nonrecombining recently and exhibit low X-Y sequence divergence; however, the older Y-linked genes that are shared between the XX/XY and XX/XY1Y2 systems show clear signs of degeneration, and many of the oldest sex-linked genes are likely in our hemizygous set. The older Y-linked genes have undergone gene loss, are accumulating nonsynonymous substitutions likely to impair gene function, contain more unpreferred changes in codon use, and show a loss of expression compared with X-linked genes. In contrast, we find that these features of Y degeneration are either significantly reduced or absent in the younger X/Y genes unique to the XX/XY1Y2 system. Our contrast between young and old sex-linked genes, made possible because of the unusual occurrence in R. hastatulus of intraspecific polymorphism in the sex chromosome system, provides a unique glimpse into the early stages and chronology of Y-chromosome degeneration in a flowering plant.

Methods

RNA Sequencing.

To identify sex-linked genes in R. hastatulus, we sequenced transcriptomes from parents and F1 progeny from two within-population crosses, one from a population with XY males (Many, LA; LA-MAN) and one from a population with XY1Y2 males (Branchville, SC; SC-BRA). We extracted RNA from leaf tissue using Spectrum Plant Total RNA kits (Sigma-Aldrich), and the isolation of mRNA and cDNA synthesis was conducted according to standard Illumina RNAseq procedures. Sequencing was conducted on the Illumina GAII platform for XX/XY parental samples with 80-bp end reads at the Center for the Analysis of Genome Evolution and Function (University of Toronto) and on the Illumina HiSeq platform by the Genome Quebec Innovation Center (GQIC) with 150-bp end reads for XX/XY1Y2 parental samples. F1 samples were sequenced by multiplexing and barcoding six male and six female samples from each cross on a separate Illumina HiSeq lane with 150-bp end reads at the GQIC. Samples used for validation (see below; SNP segregation analysis and ascertaining sex linkage) were sequenced by barcoding and multiplexing on an Illumina HiSeq lane with 150-bp end reads at the GQIC. We also obtained RNAseq data for the transcriptome of one R. bucephalophorus individual from Spain, which was sequenced at the GQIC with 150-bp end reads. This species has no sex chromosomes and was used as an outgroup.

Assembly of R. hastatulus Transcriptomes.

We assembled a reference transcriptome de novo using Velvet [version 1.2.07 (42)] and Oases [version 0.2.08 (43)] and pooled paired end reads from six F1 females of the XY1Y2 system. Using this as the reference transcriptome facilitated identification of sex-linked genes shared between the XY and XY1Y2 systems (as discussed in the next section). Before assembly, we trimmed the data to remove reads <50 bp, and VelvetOptimizer (version 2.2.4) was used to choose the best k-mer size for each individual transcript. To avoid missing low-coverage transcripts, the final total number of bases in each assembly was used to evaluate the best k-mer size, which was 43. Oases (version 0.2.08) was then run under default parameters. For each set of transcript isoforms, the longest was chosen as the final transcript. This reference assembly yielded 38,828 contigs (N50 = 2,089, total length = 44,585,937 bp). For the outgroup R. bucephalophorus, the assembly was run with the same pipeline, yielding a best k-mer length of 43 and 35,525 contigs (N50 = 1,923, total length = 38,120,382 bp).
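Two of the summary steps mentioned here, retaining the longest isoform per locus and reporting N50, are simple to express in code. The sketch below is a generic illustration and does not reproduce the Velvet or Oases output formats; the function names and the dictionary layout keyed by locus are our assumptions.

```python
def longest_isoform(isoforms_by_locus):
    """Keep the longest sequence from each set of transcript isoforms."""
    return {locus: max(seqs, key=len) for locus, seqs in isoforms_by_locus.items()}


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("cannot compute N50 of an empty assembly")


# Toy assembly with two loci.
toy = {"locus1": ["ATGC", "ATGCATGC"], "locus2": ["ATGCAT"]}
kept = longest_isoform(toy)
print(kept)
print(n50([len(seq) for seq in kept.values()]))
```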

SNP Segregation Analysis and Ascertaining Sex Linkage.

To assign sex linkage to assembled contigs in which nucleotide variants were identified, we mapped reads from both XX/XY and XX/XY1Y2 samples to the reference transcriptome, which was assembled using reads from females of the XY1Y2 system. We conducted mapping using the Burrows–Wheeler Aligner [release 0.6.2-r126 (44)], followed by Stampy [release 1.0.20 (45)] for mapping more divergent reads. We used Picard tools (release 1.78, http://picard.sourceforge.net) to convert the mapping output into the format required by the Genome Analysis Toolkit [GATK, version 2.1-11 (46)] variant calling software. We then conducted segregation analysis on each system separately (SI Appendix) to obtain the set of sex-linked genes shared between the XY and XY1Y2 systems (referred to as the old sex-linked genes) and those unique to the XY1Y2 system (referred to as the young sex-linked genes). The number of sex-linked genes identified as a function of the number of diagnostic polymorphisms is shown in SI Appendix (Table S1) for each system, along with the number shared between them. We required contigs to have four or more high-quality (Phred-scaled SNP quality score >60) SNPs, with genotype calls made for all parents and progeny from both sex chromosome systems and segregation patterns indicating sex linkage. Such sex-linked SNPs were identified based on either (i) the presence of a segregating Y-linked variant, where fathers and sons were heterozygous but mothers and daughters were homozygous, or (ii) the presence of a segregating X-linked variant, where fathers and daughters were heterozygous but mothers and sons were homozygous. To ensure that such X/Y contig assignments were reliable, we further filtered putative sex-linked contigs to include only those in which a segregating Y-linked variant was ascertained and showed the expected sex-specific genotypes in 12 population samples (SI Appendix). Such sites represent fixed differences between the X and Y. Similar approaches were used to identify hemizygous and autosomal genes (SI Appendix). All data parsing was done using Bash, R, or Perl. Scripts are available on request.
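The two segregation patterns described above can be illustrated with a minimal Python sketch. This is not the authors' script (their parsing was done in Bash, R, or Perl); the function names, the "het"/"hom" genotype encoding, and the input layout are assumptions made for illustration. The additional population-sample filter is not modeled here.

```python
from typing import Dict, List, Tuple

HET, HOM = "het", "hom"  # hypothetical encoding of genotype calls


def classify_snp(genotypes: Dict[str, List[str]]) -> str:
    """Classify one SNP from family genotype calls.

    `genotypes` maps 'fathers', 'mothers', 'sons', 'daughters' to lists of
    'het'/'hom' calls (an assumed layout, not the GATK output format).
    """
    def all_are(group: str, state: str) -> bool:
        calls = genotypes.get(group, [])
        return len(calls) > 0 and all(c == state for c in calls)

    # (i) Y-linked variant: fathers and sons heterozygous,
    #     mothers and daughters homozygous.
    if (all_are("fathers", HET) and all_are("sons", HET)
            and all_are("mothers", HOM) and all_are("daughters", HOM)):
        return "Y-linked"
    # (ii) X-linked variant: fathers and daughters heterozygous,
    #      mothers and sons homozygous.
    if (all_are("fathers", HET) and all_are("daughters", HET)
            and all_are("mothers", HOM) and all_are("sons", HOM)):
        return "X-linked"
    return "other"


def is_sex_linked_contig(snps: List[Tuple[float, Dict[str, List[str]]]],
                         min_snps: int = 4, min_qual: float = 60.0) -> bool:
    """Apply the thresholds stated in the text: four or more SNPs with
    Phred-scaled quality > 60 that show a sex-linked segregation pattern."""
    n = sum(1 for qual, g in snps
            if qual > min_qual and classify_snp(g) in ("Y-linked", "X-linked"))
    return n >= min_snps
```

A contig would pass this sketch's test only if at least four of its high-quality SNPs fall into pattern (i) or (ii); any missing genotype group causes the SNP to be classified as "other".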

Comparisons of Sex-Linked Gene Expression.

The number of mRNA reads covering sex-linked SNPs in sex-linked contigs was counted from the SNP output from GATK to obtain estimates of the relative expression of X- and Y-linked alleles in males. This enabled us to compare young and old sex-linked genes by determining their respective Y/X expression ratio distributions (Fig. 2). Because the relative expression of X and Y alleles was estimated per gene in males (i.e., within individual samples), it was unnecessary to normalize the counts across samples, and these relative estimates were averaged across all males. Expression estimates for reference and alternative alleles at heterozygous sites in autosomes were obtained similarly using the numbers of mRNA reads covering SNPs across all samples in contigs where at least four such SNPs segregated as autosomal. For gene-level (rather than allele-specific) expression comparisons of sex-linked and autosomal genes across the sexes, we estimated expression in coding sequences using HTSeq (47) with the “intersection-nonempty” option. We focused on coding sequences and excluded putative untranslated regions because of the high variance in read counts observed in these regions. Following HTSeq, we used DESeq (48) to conduct median differential coverage normalization and to test for differential expression using the negative binomial distribution. Genes with a maximum total read count across samples <20 were removed to eliminate loci with little power to test for differential expression. The possibility of widespread chromosome-wide differences in gene expression may complicate normalized expression tests in this system; however, we found that normalization using just autosomes gave nearly identical results, with no consistent bias by sex (SI Appendix, Fig. S6). Significant expression differences between the sexes were assessed using both a 5% cutoff and a 10% false discovery rate correction (SI Appendix, Fig. S6).
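The within-male Y/X expression ratio can be sketched as follows. This is an illustrative Python example rather than the authors' code: the function names and input layout are hypothetical, and summing reads across the SNPs of a gene before taking the ratio is an assumption, since the text does not state how per-SNP counts were combined.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple


def yx_ratio_in_male(snp_counts: List[Tuple[int, int]]) -> Optional[float]:
    """Y/X read-count ratio for one gene in one male.

    `snp_counts` holds (x_reads, y_reads) at each sex-linked SNP.
    Reads are summed across SNPs (an assumption of this sketch).
    """
    x_total = sum(x for x, _ in snp_counts)
    y_total = sum(y for _, y in snp_counts)
    if x_total == 0:
        return None  # ratio undefined without X-allele reads
    return y_total / x_total


def gene_yx_ratio(counts_by_male: Dict[str, List[Tuple[int, int]]]) -> Optional[float]:
    """Average the within-male ratios across all males. Each ratio is
    computed inside a single sample, so no cross-sample normalization
    is applied."""
    ratios = [r for r in (yx_ratio_in_male(c) for c in counts_by_male.values())
              if r is not None]
    return mean(ratios) if ratios else None
```

Because both alleles are counted in the same library, sequencing depth cancels in each ratio, which is the reason the text gives for skipping normalization.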

Consensus Contigs for Molecular Evolutionary Analysis.

To analyze the molecular evolution of sex-linked genes, we generated X and Y consensus sequences based on parent and progeny genotypes using a phasing algorithm implemented in an R script (available upon request). For each nucleotide position within candidate sex-linked loci, we used sequencing coverage/quality information from parental samples to call sites that were identical on both X and Y copies. Sites were accepted as identical if both parental strains were called as homozygous and both had eightfold or greater sequencing coverage and genotype quality scores ≥60. Otherwise, sites were annotated as missing data. Candidate X/Y variants were initially identified as sites homozygous in the female parent and heterozygous in the male parent. Our method used a likelihood ratio approach to evaluate the relative support for the heterozygous site representing a true X/Y variant (male: XAYa, female: XAXA) vs. a segregating X variant in the male (male: XaYA, female: XAXA). To test the performance of this method, we implemented a simulation that calculated likelihood ratio tests for simulated parent/progeny genotype arrays in which variants were either heterozygous X variants in the male parent or true X/Y variants (SI Appendix, Figs. S1 and S2).
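The logic of comparing the two hypotheses can be sketched with a simplified log-likelihood ratio in Python. The authors' R implementation is not shown in the text and used coverage and quality information; the version below instead assumes a single hypothetical per-call genotyping error rate (`err`) and "het"/"hom" progeny calls, so it should be read as an illustration of the idea only.

```python
import math
from typing import List


def log_likelihood(sons: List[str], daughters: List[str],
                   expect_son: str, expect_daughter: str,
                   err: float = 0.01) -> float:
    """Log-likelihood of progeny calls given the expected genotype class
    for each sex, with an assumed per-call error rate `err`."""
    ll = 0.0
    for calls, expected in ((sons, expect_son), (daughters, expect_daughter)):
        for call in calls:
            ll += math.log(1.0 - err if call == expected else err)
    return ll


def xy_vs_x_llr(sons: List[str], daughters: List[str], err: float = 0.01) -> float:
    """Log-likelihood ratio: true X/Y variant vs. segregating X variant.

    True X/Y variant (male XAYa, female XAXA): sons het, daughters hom.
    Segregating X variant (male XaYA, female XAXA): daughters het, sons hom.
    Positive values favor the X/Y variant.
    """
    ll_xy = log_likelihood(sons, daughters, "het", "hom", err)
    ll_x = log_likelihood(sons, daughters, "hom", "het", err)
    return ll_xy - ll_x
```

Under this toy error model, every progeny call that matches the X/Y expectation adds log((1 - err)/err) to the ratio and every call that matches the alternative subtracts the same amount, so support grows with the number of genotyped progeny.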

ORF Identification, Sequence Alignment, and Phylogeny Reconstruction.

We identified ORFs in all X and Y consensus sequences and in orthologous R. bucephalophorus sequences (identified using a three-way reciprocal BLAST of contigs from each sex chromosome system plus the outgroup) using the “getorf” program from the EMBOSS suite (version 6.3.1) (49). For each locus, the X and Y ORFs from the XX/XY and XX/XY1Y2 systems, as well as the orthologs from the outgroup sequence, were aligned using MUSCLE (version 3.8.31) (50). We used ORF alignments to guide nucleotide alignments with in-frame gaps using a custom Perl script (available upon request). Maximum likelihood phylogenetic trees were produced from each nucleotide alignment using RAxML (version 7.0.4) (51).
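Guiding a nucleotide alignment with a protein (ORF) alignment amounts to threading each coding sequence onto its gapped protein row, so that every gap spans a whole codon. The Python sketch below illustrates that step; it is not the authors' Perl script, and the function name and inputs are hypothetical.

```python
def thread_codons(aligned_protein: str, cds: str) -> str:
    """Return the codon alignment row for one sequence.

    `aligned_protein` is a row of the protein alignment with '-' for gaps;
    `cds` is the matching ungapped coding sequence, starting in frame.
    Each residue consumes one codon and each gap becomes '---', which
    keeps all gaps in frame.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is shorter than the aligned protein")
    row = []
    pos = 0
    for aa in aligned_protein:
        if aa == "-":
            row.append("---")
        else:
            row.append(cds[pos:pos + 3])
            pos += 3
    return "".join(row)
```

For example, `thread_codons("M-K", "ATGAAA")` yields `"ATG---AAA"`. Applying the function to every row of the protein alignment gives nucleotide rows of equal length that can be passed to codon-based analyses.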

Analysis of Evolutionary Rates.

We used phylogenies as starting trees for the analysis of evolutionary rate at synonymous and nonsynonymous sites using PAML (version 4.6) (52). For each locus, we fit a “free-ratio” model (model = 1), allowing dN/dS to vary across branches. Branch-specific silent site divergence, dN/dS ratios, and tree topologies were then extracted and analyzed in R using the “phytools” package (53). For loci in which X and Y sequences were monophyletic across the sex chromosome systems, we estimated dN/dS as the average of the population-specific and ancestral branches, weighted by the corresponding dS values. For all other comparisons, only values at terminal branches were considered. For each alignment, we also used a modified version of Polymorphorama (54) to count the number of parsimony-estimated lineage-specific changes (synonymous, nonsynonymous, preferred→unpreferred, unpreferred→preferred) on the X and Y sequences, using the outgroup sequence to polarize changes. We analyzed the two sex chromosome systems separately for this analysis in a three-way alignment of X, Y, and outgroup.
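The dS-weighted average of branch-specific dN/dS values can be written out as a short Python sketch. The function name and input layout are assumptions for illustration; the authors performed this step in R.

```python
from typing import List, Optional, Tuple


def weighted_dnds(branches: List[Tuple[float, float]]) -> Optional[float]:
    """dS-weighted mean of dN/dS across branches.

    `branches` holds (dn_ds, ds) pairs, e.g. one population-specific
    branch and one ancestral branch of the same X or Y lineage.
    Weighted mean = sum(dN/dS_i * dS_i) / sum(dS_i).
    """
    total_ds = sum(ds for _, ds in branches)
    if total_ds == 0:
        return None  # no synonymous divergence to weight by
    return sum(dn_ds * ds for dn_ds, ds in branches) / total_ds
```

With two branches of dN/dS 0.2 (dS = 0.01) and 0.4 (dS = 0.03), the weighted estimate is (0.002 + 0.012)/0.04 = 0.35. Since dN/dS × dS equals dN on each branch, this weighting is equivalent to dividing the summed dN by the summed dS.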

Supplementary Material

Supporting Information

Acknowledgments

We thank Deborah Charlesworth for helpful advice, discussion, and comments; María Talavera Solís for seeds of R. bucephalophorus; and two anonymous reviewers for their comments on an earlier version of the manuscript. This research was funded by Natural Sciences and Engineering Research Council of Canada Discovery grants (to S.C.H.B. and S.I.W.).

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The sequences reported in this paper have been deposited in GenBank's Short Read Archive (SRA) [accession nos. SRP041588 (Rumex hastatulus) and SRP041613 (Rumex bucephalophorus)].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1319227111/-/DCSupplemental.

References

  • 1. Charlesworth B, Charlesworth D. The degeneration of Y chromosomes. Philos Trans R Soc Lond B Biol Sci. 2000;355(1403):1563–1572. doi: 10.1098/rstb.2000.0717.
  • 2. Charlesworth B. The evolution of chromosomal sex determination and dosage compensation. Curr Biol. 1996;6(2):149–162. doi: 10.1016/s0960-9822(02)00448-7.
  • 3. Bachtrog D. Y-chromosome evolution: Emerging insights into processes of Y-chromosome degeneration. Nat Rev Genet. 2013;14(2):113–124. doi: 10.1038/nrg3366.
  • 4. Liu Z, et al. A primitive Y chromosome in papaya marks incipient sex chromosome evolution. Nature. 2004;427(6972):348–352. doi: 10.1038/nature02228.
  • 5. Filatov DA. Evolutionary history of Silene latifolia sex chromosomes revealed by genetic mapping of four genes. Genetics. 2005;170(2):975–979. doi: 10.1534/genetics.104.037069.
  • 6. Yin T, et al. Genome structure and emerging evidence of an incipient sex chromosome in Populus. Genome Res. 2008;18(3):422–430. doi: 10.1101/gr.7076308.
  • 7. Spigler RB, Lewers KS, Main DS, Ashman TL. Genetic mapping of sex determination in a wild strawberry, Fragaria virginiana, reveals earliest form of sex chromosome. Heredity (Edinb). 2008;101(6):507–517. doi: 10.1038/hdy.2008.100.
  • 8. Peichel CL, et al. The master sex-determination locus in threespine sticklebacks is on a nascent Y chromosome. Curr Biol. 2004;14(16):1416–1424. doi: 10.1016/j.cub.2004.08.030.
  • 9. Kaminker JS, et al. The transposable elements of the Drosophila melanogaster euchromatin: A genomics perspective. Genome Biol. 2002;3(12):research0084.1–0084.20. doi: 10.1186/gb-2002-3-12-research0084.
  • 10. Carvalho AB. Origin and evolution of the Drosophila Y chromosome. Curr Opin Genet Dev. 2002;12(6):664–668. doi: 10.1016/s0959-437x(02)00356-8.
  • 11. Carvalho AB, et al. Y chromosome and other heterochromatic sequences of the Drosophila melanogaster genome: How far can we go? Genetica. 2003;117(2-3):227–237. doi: 10.1023/a:1022900313650.
  • 12. Hughes JF, et al. Chimpanzee and human Y chromosomes are remarkably divergent in structure and gene content. Nature. 2010;463(7280):536–539. doi: 10.1038/nature08700.
  • 13. Hughes JF, et al. Strict evolutionary conservation followed rapid gene loss on human and rhesus Y chromosomes. Nature. 2012;483(7387):82–86. doi: 10.1038/nature10843.
  • 14. Lahn BT, Page DC. Four evolutionary strata on the human X chromosome. Science. 1999;286(5441):964–967. doi: 10.1126/science.286.5441.964.
  • 15. Ross MT, et al. The DNA sequence of the human X chromosome. Nature. 2005;434(7031):325–337. doi: 10.1038/nature03440.
  • 16. Bergero R, Charlesworth D. Preservation of the Y transcriptome in a 10-million-year-old plant sex chromosome system. Curr Biol. 2011;21(17):1470–1474. doi: 10.1016/j.cub.2011.07.032.
  • 17. Chibalina MV, Filatov DA. Plant Y chromosome degeneration is retarded by haploid purifying selection. Curr Biol. 2011;21(17):1475–1479. doi: 10.1016/j.cub.2011.07.045.
  • 18. Gschwend AR, et al. Rapid divergence and expansion of the X chromosome in papaya. Proc Natl Acad Sci USA. 2012;109(34):13716–13721. doi: 10.1073/pnas.1121096109.
  • 19. Wang J, et al. Sequencing papaya X and Yh chromosomes reveals molecular basis of incipient sex chromosome evolution. Proc Natl Acad Sci USA. 2012;109(34):13710–13715. doi: 10.1073/pnas.1207833109.
  • 20. Bachtrog D. Expression profile of a degenerating neo-Y chromosome in Drosophila. Curr Biol. 2006;16(17):1694–1699. doi: 10.1016/j.cub.2006.07.053.
  • 21. Kaiser VB, Zhou Q, Bachtrog D. Nonrandom gene loss from the Drosophila miranda neo-Y chromosome. Genome Biol Evol. 2011;3:1329–1337. doi: 10.1093/gbe/evr103.
  • 22. Zhou Q, Bachtrog D. Chromosome-wide gene silencing initiates Y degeneration in Drosophila. Curr Biol. 2012;22(6):522–525. doi: 10.1016/j.cub.2012.01.057.
  • 23. Zhou Q, Bachtrog D. Sex-specific adaptation drives early sex chromosome evolution in Drosophila. Science. 2012;337(6092):341–345. doi: 10.1126/science.1225385.
  • 24. Navajas-Pérez R, et al. The evolution of reproductive systems and sex-determining mechanisms within Rumex (Polygonaceae) inferred from nuclear and chloroplastidial sequence data. Mol Biol Evol. 2005;22(9):1929–1939. doi: 10.1093/molbev/msi186.
  • 25. Smith BW. The evolving karyotype of Rumex hastatulus. Evolution. 1964;18(1):93–104.
  • 26. Yamada K, et al. Empirical analysis of transcriptional activity in the Arabidopsis genome. Science. 2003;302(5646):842–846. doi: 10.1126/science.1088305.
  • 27. Bergero R, Forrest A, Kamau E, Charlesworth D. Evolutionary strata on the X chromosomes of the dioecious plant Silene latifolia: Evidence from new sex-linked genes. Genetics. 2007;175(4):1945–1954. doi: 10.1534/genetics.106.070110.
  • 28. Nam K, Ellegren H. The chicken (Gallus gallus) Z chromosome contains at least three nonlinear evolutionary strata. Genetics. 2008;180(2):1131–1136. doi: 10.1534/genetics.108.090324.
  • 29. Orr HA, Kim Y. An adaptive hypothesis for the evolution of the Y chromosome. Genetics. 1998;150(4):1693–1698. doi: 10.1093/genetics/150.4.1693.
  • 30. Hill WG, Robertson A. The effect of linkage on limits to artificial selection. Genet Res. 1966;8(3):269–294.
  • 31. Charlesworth B. Model for evolution of Y chromosomes and dosage compensation. Proc Natl Acad Sci USA. 1978;75(11):5618–5622. doi: 10.1073/pnas.75.11.5618.
  • 32. Skaletsky H, et al. The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature. 2003;423(6942):825–837. doi: 10.1038/nature01722.
  • 33. Degner JF, et al. Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics. 2009;25(24):3207–3212. doi: 10.1093/bioinformatics/btp579.
  • 34. Mank JE. Sex chromosome dosage compensation: Definitely not for everyone. Trends Genet. 2013;29(12):677–683. doi: 10.1016/j.tig.2013.07.005.
  • 35. Muyle A, et al. Rapid de novo evolution of X chromosome dosage compensation in Silene latifolia, a plant with young sex chromosomes. PLoS Biol. 2012;10(4):e1001308. doi: 10.1371/journal.pbio.1001308.
  • 36. Malone JH, et al. Mediation of Drosophila autosomal dosage effects and compensation by network interactions. Genome Biol. 2012;13(4):r28. doi: 10.1186/gb-2012-13-4-r28.
  • 37. Birchler JA. A study of enzyme activities in a dosage series of the long arm of chromosome one in maize. Genetics. 1979;92(4):1211–1229. doi: 10.1093/genetics/92.4.1211.
  • 38. Devlin RH, Holm DG, Grigliatti TA. Autosomal dosage compensation in Drosophila melanogaster strains trisomic for the left arm of chromosome 2. Proc Natl Acad Sci USA. 1982;79(4):1200–1204. doi: 10.1073/pnas.79.4.1200.
  • 39. Sun L, et al. Dosage compensation and inverse effects in triple X metafemales of Drosophila. Proc Natl Acad Sci USA. 2013;110(18):7383–7388. doi: 10.1073/pnas.1305638110.
  • 40. Sun L, et al. Differential effect of aneuploidy on the X chromosome and genes with sex-biased expression in Drosophila. Proc Natl Acad Sci USA. 2013;110(41):16514–16519. doi: 10.1073/pnas.1316041110.
  • 41. Wright SI, Yau CBK, Looseley M, Meyers BC. Effects of gene expression on molecular evolution in Arabidopsis thaliana and Arabidopsis lyrata. Mol Biol Evol. 2004;21(9):1719–1726. doi: 10.1093/molbev/msh191.
  • 42. Zerbino DR, Birney E. Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008;18(5):821–829. doi: 10.1101/gr.074492.107.
  • 43. Schulz MH, Zerbino DR, Vingron M, Birney E. Oases: Robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics. 2012;28(8):1086–1092. doi: 10.1093/bioinformatics/bts094.
  • 44. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–1760. doi: 10.1093/bioinformatics/btp324.
  • 45. Lunter G, Goodson M. Stampy: A statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. 2011;21(6):936–939. doi: 10.1101/gr.111120.110.
  • 46. McKenna A, et al. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20(9):1297–1303. doi: 10.1101/gr.107524.110.
  • 47. Anders S, Pyl PT, Huber W. HTSeq—A Python framework to work with high-throughput sequencing data. bioRxiv. 2014. doi: 10.1101/002824.
  • 48. Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11(10):R106. doi: 10.1186/gb-2010-11-10-r106.
  • 49. Rice P, Longden I, Bleasby A. EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet. 2000;16(6):276–277. doi: 10.1016/s0168-9525(00)02024-2.
  • 50. Edgar RC. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32(5):1792–1797. doi: 10.1093/nar/gkh340.
  • 51. Stamatakis A. RAxML-VI-HPC: Maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006;22(21):2688–2690. doi: 10.1093/bioinformatics/btl446.
  • 52. Yang Z. PAML 4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24(8):1586–1591. doi: 10.1093/molbev/msm088.
  • 53. Revell LJ. Phytools: An R package for phylogenetic comparative biology (and other things). Methods Ecol Evol. 2012;3:217–223.
  • 54. Andolfatto P. Hitchhiking effects of recurrent beneficial amino acid substitutions in the Drosophila melanogaster genome. Genome Res. 2007;17(12):1755–1762. doi: 10.1101/gr.6691007.


Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
