. 2024 Nov 25;10(12):1944–1954. doi: 10.1038/s41477-024-01858-x

ZW sex chromosome structure in Amborella trichopoda

Sarah B Carey 1, Laramie Aközbek 1,2, John T Lovell 1,3, Jerry Jenkins 1, Adam L Healey 1, Shengqiang Shu 3, Paul Grabowski 1,3, Alan Yocca 1, Ada Stewart 1, Teresa Jones 1, Kerrie Barry 3, Shanmugam Rajasekar 4, Jayson Talag 4, Charlie Scutt 5, Porter P Lowry II 6,7, Jérôme Munzinger 8, Eric B Knox 9, Douglas E Soltis 10, Pamela S Soltis 10, Jane Grimwood 1,3, Jeremy Schmutz 1,3, James Leebens-Mack 11, Alex Harkess 1
PMCID: PMC11649558  PMID: 39587314

Abstract

Sex chromosomes have evolved hundreds of times across the flowering plant tree of life; their recent origins in some members of this clade can shed light on the early consequences of suppressed recombination, a crucial step in sex chromosome evolution. Amborella trichopoda, the sole species of a lineage that is sister to all other extant flowering plants, is dioecious with a young ZW sex determination system. Here we present a haplotype-resolved genome assembly, including highly contiguous assemblies of the Z and W chromosomes. We identify a ~3-megabase sex-determination region (SDR) captured in two strata that includes a ~300-kilobase inversion that is enriched with repetitive sequences and contains a homologue of the Arabidopsis METHYLTHIOADENOSINE NUCLEOSIDASE (MTN1-2) genes, which are known to be involved in fertility. However, the remainder of the SDR does not show patterns typically found in non-recombining SDRs, such as repeat accumulation and gene loss. These findings are consistent with the hypothesis that dioecy is derived in Amborella and the sex chromosome pair has not significantly degenerated.

Subject terms: Evolution, Genetics


The haplotype-resolved genome in Amborella trichopoda addresses outstanding questions on the structure and gene content of the recently evolved ZW sex chromosomes.

Main

The evolution of separate sexes, or dioecy, is a rare trait in angiosperms, having been identified in just 5–10% of species1. At the same time, dioecy has evolved hundreds of times independently across the flowering plant tree of life2, making flowering plants ideal for examining the evolution of sex chromosomes over both deep and shallow time scales. Comparative investigations of sex chromosomes rely on high-quality genome assemblies2, and while the availability of genomes for dioecious species has increased, there are only a few where the structure of the sex chromosome pair has been well characterized. While divergence between X and Y sex chromosomes has been described in a growing number of angiosperm species2,3, investigations of what some consider to be less common ZW systems can shed new light on the dynamics and consequences of sex chromosome evolution.

Since its discovery as the likely sister lineage to all other living angiosperms, Amborella trichopoda (Amborellaceae; hereafter, Amborella)4–7 has served as a pivotal taxon for investigating the origin and early diversification of flowering plants8,9. Amborella is an understory shrub or small tree endemic to New Caledonia and the sole extant species in the Amborellales. The flowers of Amborella are actinomorphic and have a perianth of undifferentiated tepals, which are characteristics shared with the reconstructed ancestral flower (Fig. 1)9. Importantly, however, Amborella is dioecious10 with ZW sex chromosomes that evolved after the lineage diverged from other flowering plants11. This implies that dioecy in Amborella is derived from a hermaphroditic mating system and that the ancestral angiosperm had perfect flowers, in agreement with ancestral state reconstructions9. Substantial progress has been made in several angiosperm species to identify the genes involved in the evolution of dioecy12–17, but the molecular basis in Amborella remains unknown. Here we present a haplotype-resolved assembly of the Amborella genome and compare highly contiguous Z and W sex chromosome assemblies to address outstanding questions about their structure and gene content, including putative sex-determining genes.

Fig. 1. Amborella and its genome structure.

a,b, Female (a) and male (b) Amborella flowers. c,d, The Amborella genome (c) and chromosome 9 (Chr09, d) are typical of flowering plants: gene-rich chromosome arms and repeat-dense, large pericentromeric regions. Gene positions were extracted from the protein-coding gene annotations, repeats from EDTA and exact matches of 536,985 female-specific k-mers (W-mers). Syntenic mapping was calculated using AnchorWave and processed using SyRI, only plotting inversions, insertions and deletions >10 kb. Visualization of synteny was accomplished with GENESPACE and sliding windows with gscTools. The sex-determination region of Chr09 with W-mers is highlighted in d. All chromosomes in haplotype 1 and all but four in haplotype 2 have both left and right telomeres in the assembly (flagged with red *), defined as a region of ≥150 bp made up of ≥90% plant telomere k-mers (CCCGAAA, CCCTAAA, RC) separated by no more than 100 bp. CDS, coding sequence.

Results

Improved genome assembly and annotation of Amborella

The Amborella reference genome has been a central anchor for comparative investigations of gene family and gene structure evolution across angiosperms. Despite its demonstrated utility, the 2013 Amborella genome used primarily short sequencing reads, which cannot fully resolve repetitive regions18. The repeat-derived gaps were filled in a 2022 long-read assembly11, but both biological haplotypes were collapsed into a single sequence representation. Despite its higher contiguity, the 2022 genome offers limited information regarding sex-determination regions (SDRs) because in this assembly, the Z and W chromosomes are a chimaeric mix represented as a single chromosome11.

To build a haplotype-resolved genome assembly for Amborella cv. Santa Cruz 75, we used a combination of PacBio HiFi (mean coverage = 58.81× per haplotype; mean read length = 22,900 bp) and Phase Genomics Hi-C (coverage = 42.31×; Supplementary Table 1) sequencing technologies. The final haplotype 1 (HAP1) and 2 (HAP2) assemblies include 708.1 Mb in 59 contigs (contig N50 = 36.3 Mb; L50 = 7) and 700.5 Mb in 45 contigs (contig N50 = 44.5 Mb; L50 = 7), respectively; 99.69% and 99.87% of the assembled sequence is contained in the 13 largest scaffolds for HAP1 and HAP2, respectively, corresponding to the expected chromosome number19 (Supplementary Fig. 1). We found the Merqury k-mer completeness20 of HAP1 to be 95.4% (QV 63) and of HAP2 to be 95.3% (QV 55), and the combined assemblies exhibit 98.8% completeness (QV 57). Consistent with earlier assemblies, we annotated repeats and found that they represent ~56% of the sequence for both haplotypes (Fig. 1 and Supplementary Table 2)18. To annotate gene models, we used a combination of RNA-seq and Iso-seq (~757 million 2 × 150 read pairs, ~825 K full-length transcripts). We annotated 21,800 gene models in HAP1 and 21,721 in HAP2, with embryophyte BUSCOs of 98.6% and 98.8%, respectively—an increase from 85.5% in the 2013 release18. Overall, the new assemblies represent a great improvement in the Amborella genome reference, resolving most of the previous gaps (Supplementary Fig. 2 and Table 2).
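The contiguity statistics quoted above (contig N50 and L50) follow a standard definition that can be sketched in a few lines. This is a minimal illustration only, not the authors' pipeline; the function name `n50_l50` and the toy contig lengths are assumptions made for the example.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths.

    N50 is the length of the contig at which the running total of
    length-sorted contigs first reaches half of the assembly size;
    L50 is the number of contigs needed to get there.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count


# Toy contig lengths (hypothetical, not the Amborella contigs).
toy_n50, toy_l50 = n50_l50([50, 40, 30, 20, 10])
print(toy_n50, toy_l50)
```

Read this way, an L50 of 7 means that the seven longest contigs of a haplotype already hold at least half of its assembled sequence.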

Amborella’s ancient divergence ~140 million years ago (Ma)21 from all other living angiosperms provides an opportunity to examine conserved features that were probably present in the ancestral genome of all flowering plants. For example, the repeat-dense pericentromeric region and gene-dense chromosome arms of Amborella (Fig. 1) mirror those of most angiosperm genomes, in stark contrast to the more uniform gene and repeat density of most conifers, ferns and mosses22–24. The pericentromeric regions are enriched in long terminal repeats (LTRs), specifically Ty3 and Ty1 elements, as is often seen in other monocentric angiosperms25,26. Interestingly, unlike many previously examined sex chromosomes, the Amborella Z/W do not stand out as notable exceptions in terms of gene or repeat density (Fig. 1).

Identification of the phased Amborella sex chromosomes

Sex chromosomes have unique inheritance patterns relative to autosomes. In a ZW system, the non-recombining SDR of the W chromosome is only inherited by females, while the remaining pseudoautosomal region (PAR) recombines freely and is expected to show a similar lack of divergence between the sexes as the autosomes. Identification of the boundary between the SDR and PAR of sex chromosomes is non-trivial, and PAR/SDR boundaries have been shown to vary among populations in some species27,28. Standard approaches for boundary identification employ combinations of methodologies such as sex-biased read coverage and population genomic analyses29.

To delimit the PAR/SDR boundary, we performed a k-mer analysis12,30 to identify sequences that are unique to the Amborella SDR (henceforth, W-mers), using four different sampling strategies (Supplementary Methods). We found that the W-mers densely mapped to Chr09 at ~44.32–47.26 Mb of HAP1 (Figs. 1 and 2, and Supplementary Figs. 3–6), supporting its identity as the W chromosome. This location is consistent with previous analyses11, although we find that mapping W-mers to a haplotype-resolved assembly narrows the estimated size of the SDR from ~4 Mb to 2.94 Mb (Fig. 2 and Supplementary Fig. 7). Importantly, the W-mers show consistent coverage on Chr09 in HAP1, with low and sporadic coverage along any other chromosome or unincorporated scaffold in the assembly (for example, when using the Island-wide sampling, 97.73% of the mapped W-mers are within the SDR; Supplementary Figs. 3–6 and Table 3). In contrast to the chimaeric Z/W in the previous assembly, the resulting sex chromosome assemblies are nearly complete, with only four unresolved gaps in the SDR (zero gaps in the homologous region on the Z (HZR) chromosome), and are fully phased (Supplementary Fig. 7).
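The idea behind a sex-specific k-mer screen can be illustrated with a small sketch. It assumes that a W-mer is a k-mer present in every sampled female and absent from every sampled male; the exact criteria, k-mer size and tooling are given in the Supplementary Methods and are not reproduced here. The function names and toy sequences are hypothetical, and the sketch ignores reverse complements and sequencing error.

```python
def kmers(seq, k):
    """Return the set of all k-mers in a sequence (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def female_specific_kmers(female_seqs, male_seqs, k=21):
    """k-mers shared by all females and found in no male.

    Assumption for illustration: each sample is a single string.
    Real data would be read sets with abundance thresholds.
    """
    if not female_seqs:
        raise ValueError("need at least one female sample")
    shared = set.intersection(*(kmers(s, k) for s in female_seqs))
    in_males = set().union(*(kmers(s, k) for s in male_seqs))
    return shared - in_males


# Toy example with k = 4 (hypothetical sequences).
toy_wmers = female_specific_kmers(
    ["ACGTTGCA", "TTACGTTG"], ["ACGATGCA"], k=4
)
print(sorted(toy_wmers))
```

Mapping such k-mers back to an assembly and looking for a dense cluster is what localizes the candidate region; sporadic hits elsewhere are expected from repeats and sampling noise.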

Fig. 2. Sex chromosome location in Amborella.

a,b, W-mer coverage in the SDR (a) and HZR (b) using four different sampling strategies for isolates. c,d, SDR (c) and HZR (d) location and their proximity to the Chr09 centromere. Ty3 elements (dark blue) are often enriched in the pericentromeric regions of plants and correspond to the low-complexity block of tandem repeat arrays (grey shading) that also contain the high-complexity centromeric block, indicated by the satellite monomer density (light blue). Gene density (orange) also predictably decreases near the pericentromeric region. The SDR (red) is notably outside of the putative pericentromeric region and distant from the centromere. DTH, PIF Harbinger terminal inverted repeat transposon; DTA, hAT terminal inverted repeat transposon; DTM, Mutator terminal inverted repeat transposon.

A key characteristic of sex chromosomes is the suppressed recombination of the SDR, and in many species, structural variants have been identified as the causal mechanism. To examine this in Amborella, we first used genome alignments to identify the HZR. The HZR is located on Chr09 of HAP2 at 44.52–47.12 Mb (~2.60 Mb; Supplementary Fig. 8), suggesting that the SDR is only 340 kb larger than the HZR, which is consistent with the observed cytological homomorphy of the ZW pair19. In the SDR, we found evidence for a ~292-kb inversion located ~20 kb within the beginning of the boundary and containing the majority of the W-specific sequence (Fig. 1b and Supplementary Fig. 9). We could not, however, find evidence for inversions or other large structural variants surrounding the remaining portion of the SDR. Instead, the Z and W chromosomes are highly syntenic with one another, similar to the autosomes (Fig. 1 and Supplementary Fig. 8). We investigated other potential mechanisms for suppressed recombination, such as proximity to centromeres, where the existing low recombination has been shown to facilitate SDR evolution in some species31. In Amborella, the SDR is not located near the centromere; rather, it is ~1.82 Mb away from the Ty3-retrotransposon-rich pericentromeric region (Fig. 2). The absence of obvious structural variants encompassing the SDR suggests that Amborella has a non-canonical mechanism to enforce non-recombination between the Z and the W chromosomes.

The Amborella sex chromosomes are evolutionarily young

Amborella’s sex chromosomes have previously been shown to have evolved after the lineage split from other living flowering plants11. With our phased Z/W pairs, we can better determine Z- and W-linked genes, providing a more confident estimate of the age of the SDR, and examine gene gain events. A classic signature of multiple recombination suppression events is a stepwise pattern of synonymous substitutions (Ks) of neighbouring genes on the sex chromosomes32. Genes captured into the SDR in the same event are expected to have similar levels of Ks (that is, evolutionary strata), whereas the older strata will have higher divergence between the Z and W compared with younger strata32. Understanding this timing of gene gain is essential to understanding the genetic mechanism for sex determination, because the candidate sex-determining genes are likely to have ceased recombining first (barring turnovers29).

To examine gene gain in the Amborella SDR, we calculated Ks of one-to-one orthologues on the W and Z chromosomes (that is, gametologues). We compared the Ks values of 45 identifiable gametologues to 1,397 one-to-one orthologues in the PARs. We found that Ks varies across the SDR–HZR portion of the sex chromosomes (0.002–0.20; mean Ks = 0.0298, s.d. = 0.032) and is significantly higher than Ks in the PARs (mean Ks = 0.004, s.d. = 0.019; Kruskal–Wallis P < 0.00001) (Supplementary Fig. 10), consistent with the expectation that the SDR is diverging from the HZR on the Z chromosome. Interestingly, the gametologue pair with the highest Ks within the SDR is a homologue of Arabidopsis METHYLTHIOADENOSINE NUCLEOSIDASE MTN1-2, a gene involved in fertility, suggesting that it resides in the oldest portion of the SDR; notably, the location of the W-linked MTN1-2 homologue is within the SDR inversion.

We found that the Ks values have two distinct steps, with the higher Ks values in the region corresponding to the inversion, suggesting two strata of gene capture into the SDR (Fig. 3). Defining the precise boundary between strata without obvious structural variants can be a challenge. To delineate stratum one (S1) from two (S2), we used a change-point analysis on Ks and the average nucleotide differences between sampled females and males (Nei’s dXY), which suggested that S1 ends at ~46.08 Mb (Supplementary Fig. 11). We found Ks to be significantly different between the strata (S1 mean Ks = 0.037, s.d. = 0.037, n = 25; S2 mean Ks = 0.021, s.d. = 0.023, n = 20; Mann–Whitney U, P = 0.0014) as was the extent of non-synonymous changes in proteins (Ka; Mann–Whitney U, P = 0.008; Fig. 3), supporting the inference of two strata. We also found dXY of genes to be significantly different (Mann–Whitney U, P < 2.6 × 10⁻⁶), higher in S1 (mean = 0.0169, s.d. = 0.007, n = 57) than in S2 (mean = 0.0089, s.d. = 0.006, n = 40). Using Ks, we also estimated the age of the SDR in Amborella. Following the previously applied approach11, we found S1 to have evolved ~4.97 Ma while S2 is nearly half as old at ~2.41 Ma. These analyses indicate that the Amborella sex chromosomes are evolutionarily young, similar to several well-characterized XY systems3, and further suggest that the sex chromosomes evolved well after the lineage split from all other living angiosperms.
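To make the notion of a change point concrete, the sketch below finds the single split in an ordered series that minimizes the within-segment sum of squared deviations from each segment mean. This least-squares criterion is an assumption chosen for illustration; the change-point method actually used is described with Supplementary Fig. 11 and may differ. The function name and the toy Ks values are hypothetical.

```python
def best_single_changepoint(values):
    """Index i that best splits values into values[:i] and values[i:].

    'Best' means the smallest total squared deviation of each segment
    from its own mean (a simple mean-shift model with one change point).
    """
    if len(values) < 2:
        raise ValueError("need at least two values to place a change point")

    def sse(segment):
        mean = sum(segment) / len(segment)
        return sum((v - mean) ** 2 for v in segment)

    best_index, best_cost = None, float("inf")
    for i in range(1, len(values)):
        cost = sse(values[:i]) + sse(values[i:])
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index


# Toy Ks values ordered along the chromosome (hypothetical numbers).
toy_ks = [0.04, 0.05, 0.03, 0.04, 0.01, 0.02, 0.01]
toy_split = best_single_changepoint(toy_ks)
print(toy_split)
```

In practice the genes would be ordered by their position in the SDR, and the coordinate between the two genes flanking the split would serve as the estimated stratum boundary.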

Fig. 3. Molecular evolution of the Amborella sex chromosomes.

a, Evidence for two strata. For Ks, points above 0.06 were excluded. b,c, The repeat landscapes of the Amborella haplotypes 1 (b) and 2 (c) indicate similar patterns of expansion and minimal evidence of recent TE proliferation. Relative time was determined using the Kimura substitution level, with lower values closer to 0 representing more recent events and higher values approaching 40 representing older events. DTT, Tc1 Mariner terminal inverted repeat transposon; DTC, CACTA terminal inverted repeat transposon; LINE, long interspersed nuclear element; MITE, miniature inverted-repeat transposable element.

The Amborella W shows little degeneration

The recent origin of the Amborella sex chromosomes provides an opportunity to examine the early stages of their evolution. The lack of recombination in an SDR reduces the efficacy of natural selection and drives the accumulation of slightly deleterious mutations33,34. Two parallel signatures of deleterious mutations seen across independent evolutions of sex chromosomes are the accumulation of repeats and the loss of genes35–38. However, the tempo of this process of degeneration is not well understood.

In the SDR of Amborella, we curiously do not find the expected patterns of repeat expansions found in other SDRs. At 51.66% repeat elements, the SDR percentage is lower than the genome average (56%) and 0.05% lower than the HZR, even when considering S1 and S2 separately (S1 = 52.13%; S2 = 50.98%; Supplementary Table 4). The only observed enrichment in repeats is within the inversion in S1, where we find more Ty3 LTRs (4.32% increase relative to the HZR; Fig. 2). Otherwise, only a slight distinction between the SDR and its HZR is evident: the SDR exhibits a marginal increase ranging between 0.01 and 0.13% in the density of some superfamily elements (Fig. 2 and Supplementary Table 4). We examined the distribution of the divergence values for intact LTRs as a proxy for their age39 but found no patterns of distinctly younger or older LTRs within the W or Z chromosome (Supplementary Fig. 12). Moreover, to assess genome-wide repeat expansion across the major transposable element (TE) superfamilies40, we used repeat landscapes, which showed a comparable pattern within the Z/W (Fig. 3 and Supplementary Fig. 13). These observations support previous characterization of TE insertions in the Amborella genome as being quite old with little proliferation over the last 5 million years18. It has been proposed that a loss of active transposases or silencing may be playing a role in reducing TE activity across the Amborella genome18, including the SDR.
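Divergence-based dating of repeats rests on a substitution distance between aligned copies. The sketch below computes the Kimura two-parameter distance, which corrects the observed proportions of transitions (P) and transversions (Q) as K = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). It is a generic illustration that assumes two pre-aligned sequences of equal length; it is not the repeat-annotation tooling used for the landscapes here, and the function name is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned sequences.

    Columns containing anything other than A, C, G or T are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguous base
        sites += 1
        if a == b:
            continue
        same_class = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
        if same_class:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# One transition in ten sites (toy sequences).
toy_distance = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
print(round(toy_distance, 4))
```

Repeat landscapes typically express this distance as a percentage, so a copy at a Kimura level near 0 is nearly identical to its reference and is read as a recent insertion.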

Gene loss in an SDR has been hypothesized to contribute to the evolution of heteromorphy seen in many sex chromosome pairs41,42. In Amborella, of the 97 annotated models in the SDR and 84 in the HZR, 37 were W-specific and 24 Z-specific. To examine whether these models were missing from the other haplotype for technical or biological reasons, we also used dXY and presence–absence variation (PAV; Supplementary Tables 5 and 6) between the sexes to evaluate gene content. For most of the W-specific models, males showed presence, and dXY within females was comparable to that of identifiable gametologues (mean dXY = 0.0136; Supplementary Table 7). Only seven models showed absence in coverage in males (dXY = 0 in females), suggesting conservatively that these represent W-specific genes, four of which are in the SDR inversion. Similarly, we identified only six Z-specific gene models. These analyses suggest that the Z and W have similar numbers of haplotype-specific genes and that the SDR has experienced similar levels of gene loss as the HZR.

Together, these results provide little evidence that degenerative processes, associated with cessation of recombination, have occurred in the Amborella SDR. This region is younger than that of Rumex (5–10 Ma43) and Silene (10 Ma44), which both show signatures of degeneration38,45. However, in Spinacia oleracea, a younger SDR (2–3 Ma) does show signs of degeneration46,47. The tempo of degeneration is apparently slower in Amborella, and there has not been sufficient time for gene loss or an accumulation of repeats as a consequence of the loss of recombination. One possible reason for the slower relative tempo is that most analyses of degeneration have focused on Y chromosomes, which are expected to degenerate faster than Ws due to male-biased mutation rates and stronger sexual selection48. Comparisons to other W chromosomes across independent origins are necessary to see whether this holds true.

Candidate sex-determining genes in Amborella

ZW sex chromosomes have been less well characterized in plants than in animals; thus, Amborella can provide unique insights regarding the genetic mechanisms associated with their evolution. The two-gene model for sex chromosome evolution associated with a transition from hermaphroditism to dioecy posits that distinct genes with antagonistic impacts on female and male function experience strong selection for tight linkage (that is, loss of recombination)49. Under this model, evolution of a ZW sex chromosome pair requires a dominant mutation causing male sterility arising on a proto-W chromosome, followed by a recessive loss-of-female-function mutation on the proto-Z (assuming a gynodioecious intermediate)49. As more sex chromosome pairs have been assembled, new models50,51 have emerged that could be congruent with the data presented here, including the possibility that recombination suppression around a sterility locus could expand due to the sheltering of deleterious mutations.

Identification of these sex-determining genes requires an understanding of when sterility arises in the carpel and stamen developmental pathways. In Amborella, ontogenetic differences between female and male flowers are seen early in development52. Whereas male flowers produce an average of 12 stamens spiralling into the centre of the flower, female flowers typically initiate a few staminodes just inside the tepals, but carpel initiation replaces staminode initiation as organ development proceeds towards the centre of the flower52 (Fig. 1). To identify candidate sex-determining genes, we examined differential expression between female and male flower buds during stage 5/6 of flower development, when carpels, stamens and microsporangia develop11,52,53. We found 1,777 significantly differentially expressed genes at an adjusted P value less than 0.05. Of these, 34 are in the SDR, several of which are well-known flower development genes, including homologues of MTN1-2, WUSCHEL (WUS), LONELY GUY (LOG), MONOPTEROS/Auxin Response Factor 5 (MP/ARF5) and small auxin upregulated RNA (SAUR) gene families (Supplementary Fig. 14, and Tables 8 and 9). We found that ambMTN and ambLOG had higher transcript abundance in females, while ambWUS, ambMP and ambSAUR had greater expression in males. To further examine the sex-specific expression of SDR genes, we used the EvoRepro database (https://evorepro.sbs.ntu.edu.sg/), which has transcriptome data for 16 different tissue types for Amborella54. We contrasted female and male buds and flowers and found three genes with male-biased transcript abundance: ambWUS and a DUF827 gene in buds and ambLOG in flowers, the latter differing in which sex has higher abundance from the analyses using stage 5/6 flowers. Given the known functions of these genes in Arabidopsis flower development, they are strong candidates for investigation of sex determination in Amborella.
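An adjusted P value corrects for testing thousands of genes at once. The text does not state which correction was applied, so the sketch below uses the Benjamini–Hochberg false discovery rate procedure purely as an assumed, commonly used example; the function name and the toy P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Each raw P value is scaled by n / rank (rank 1 = smallest), then a
    running minimum from the largest rank downwards keeps the adjusted
    values monotone.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


# Toy raw P values for four genes (hypothetical).
toy_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [i for i, p in enumerate(toy_adjusted) if p < 0.05]
print(toy_adjusted, significant)
```

A gene is then called differentially expressed when its adjusted value falls below the chosen threshold, here 0.05.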

While functional analyses are not currently possible in Amborella, comparisons to other species implicate the function of candidate genes that may be playing roles in Amborella sex determination. WUS encodes a homoeobox transcription factor that is required for the maintenance of the floral meristem and has been shown to influence gynoecium and anther development55,56. In Arabidopsis, WUS knockouts have sepals, petals, a single stamen and no carpel57. WUS has also been implicated in sex determination or shown sex-specific expression in several species that have unisexual flowers. In monoecious castor bean (Ricinus communis), WUS expression was only found in the shoot apical meristem of male flowers58, and in cucumbers (Cucumis sativus), WUS expression is three times greater in the carpel primordia of male flowers than female flowers59. In Silene, gynoecium suppression is controlled by the WUSCHEL-CLAVATA feedback loop16. However, we do not see male-biased expression of the CLV3 orthologue in Amborella, but we do see female-biased transcript abundance of the Amborella CLE40 orthologue. In Arabidopsis, WUS promotes CLV3 expression in the central zone of the inflorescence meristem while suppressing CLE40 expression in the peripheral zone60. It is possible that the smaller floral meristem seen in female development relative to male floral meristems is due to reduced ambWUS expression driving increased ambCLE40 expression and encroachment of peripheral zone cells into the central zone of the floral meristem. The role of WUS in maintaining meristematic zonation, coupled with its position in S1 in the SDR, makes ambWUS a strong candidate for playing some role in gynoecium suppression. Another strong candidate is ambLOG. LOG mutants were originally characterized in rice as producing floral phenotypes with a single stamen and no carpels61; in date palms (Phoenix dactylifera), a LOG-like gene was identified as a candidate Y-chromosome-linked female suppression gene13. 
In Amborella, ambLOG showed greater expression in females in the stage 5/6 data but was male-biased when considering all 16 tissues in the EvoRepro dataset. This switch in sex bias, and the fact ambLOG is located in the younger stratum of SDR (S2), suggest that differential ambWUS (and ambCLE40) expression may have been a first step in the divergence of male and female flower development. Similar to ambLOG, the ambMP and ambSAUR genes were captured in S2, and their functions in Arabidopsis suggest other roles in sex-specific development. MP has been shown to be involved with apical patterning of the embryo axis62,63. SAURs are a large gene family and in general play a role in cell elongation64, including in pollen tube growth65, stamen filament elongation66 and pistil growth67. Without functional validation in Amborella, we cannot rule out the possibility of any of these genes, although based on the data available, ambWUS may be the strongest candidate for spurring divergence in male and female flower development.

The significant difference in gene expression of ambMTN is especially interesting, given that it is the gene model with the highest Ks value that is located in the SDR inversion. MTN1-2 genes encode 5′-methylthioadenosine (MTA) nucleosidase68, and double mutant mtn1-1mtn2-1 flowers in Arabidopsis have indehiscent anthers and malformed pollen grains69. Double mutants also affected carpels and ovules, although the structures were aberrant but not necessarily non-functional, and 10% looked like wild type69. The observed anther phenotype in Arabidopsis is consistent with the staminode development in female flowers in Amborella, and together these lines of evidence suggest that ambMTN may be the male-sterility gene. On the basis of our analyses, we hypothesize that the W-linked ambMTN was the initial male-sterility mutation creating the proto-W, followed by a loss-of-function mutation on the W-ambWUS and a Z-copy shift to dosage-dependent gynoecium suppression. The genes we have identified here make ideal candidates for further functional genomic investigation and validation.

Discussion

Advances in sequencing technologies and assembly algorithms have enabled the construction of telomere-to-telomere genome assemblies for humans, including the X and Y sex chromosomes70,71. The sex chromosomes in humans and other mammals are often highly heteromorphic and can be the most challenging chromosomes to sequence and assemble72. Moreover, given their antiquity, it is not possible to reconstruct events dating back to the origin and early evolution of mammalian sex chromosomes. In some plants and animals, however, sex chromosomes have repeatedly evolved from different ancestral autosomes, with different sex-determining mutations2,3,73 and with various mechanisms to impede recombination between the sex chromosome pair. Here we show that we can fully phase structurally similar sex chromosomes within a heterogametic individual. Our analyses highlight the utility of phased sex chromosomes and diversity sequencing in developing models of sex chromosome evolution when experimental investigation of gene function is currently intractable. This research lays the foundation for examining sex chromosome evolution in all angiosperms.

Methods

DNA/RNA extraction, library prep and sequencing

We sequenced A. trichopoda (var. Santa Cruz 75) using a whole-genome shotgun sequencing strategy and standard sequencing protocols. High-molecular-weight DNA was extracted from young tissue using the protocol in ref. 74 with minor modifications. Flash-frozen young leaves were ground to a fine powder in a frozen mortar with liquid nitrogen, followed by very gentle extraction in a 2% CTAB buffer (that included proteinase K, PVP-40 and beta-mercaptoethanol) for 30 min to 1 h at 50 °C. After centrifugation, the supernatant was gently extracted twice with 24:1 chloroform:isoamyl alcohol. The upper phase was transferred to a new tube and 1/10th volume of 3 M sodium acetate was added, the solution gently mixed and DNA precipitated with isopropanol. The DNA precipitate was collected by centrifugation, washed with 70% ethanol, air dried for 5–10 min and dissolved thoroughly in an elution buffer at room temperature followed by RNAse treatment. DNA purity was measured with a Nanodrop, DNA concentration was measured with Qubit HS kit (Invitrogen), and DNA size was validated using the CHEF-DR II system (Bio-Rad). A PacBio HiFi library was constructed using DNA that was sheared using a Diagenode Megaruptor 3 instrument. Libraries were constructed using an SMRTbell Template Prep Kit 2.0 and tightly sized on a SAGE ELF instrument (1–18 kb) to a final library average insert size of 24 kb. PacBio sequencing was completed using the SEQUEL II platform at the HudsonAlpha Institute for Biotechnology (Huntsville, Alabama), yielding 83.3 Gb of raw sequence with a total coverage of 58.81× per haplotype (Supplementary Table 12).
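The per-haplotype coverage figure can be approximately reproduced from the reported yield. The sketch below assumes that coverage is total sequenced bases divided by the combined size of the two haplotypes, and it uses the 708.1 Mb HAP1 assembly size as a stand-in for the haploid genome size; the denominator actually used for the reported value is not stated, so small differences are expected. The function name is hypothetical.

```python
def per_haplotype_coverage(total_bases, haploid_size, n_haplotypes=2):
    """Mean read depth on each haplotype of a diploid sample.

    total_bases   -- sequenced bases across all reads
    haploid_size  -- assumed size of one haplotype, in bases
    """
    return total_bases / (haploid_size * n_haplotypes)


# 83.3 Gb of HiFi sequence over two ~708.1 Mb haplotypes.
hifi_coverage = per_haplotype_coverage(83.3e9, 708.1e6)
print(f"{hifi_coverage:.2f}x per haplotype")
```

The result lands within a few hundredths of the reported 58.81×, which supports reading that figure as depth per haplotype rather than depth over a collapsed genome.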

Illumina Hi-C sequencing for Santa Cruz 75 was conducted at Phase Genomics with a single 2 × 80 Dovetail Hi-C library (42.31×; Supplementary Table 1). DNA for the Illumina PCR-free library was extracted using a Qiagen DNeasy kit (Qiagen) and was sequenced at the HudsonAlpha Institute for Biotechnology. Illumina reads were sequenced on the Illumina NovaSeq 6000 platform using a 400-bp-insert TruSeq PCR-free fragment library (49.62×). Before assembly, Illumina fragment reads were screened for PhiX contamination. Reads composed of >95% simple sequence and those <50 bp after trimming for adapter and quality (q < 20) were removed. The final read set consists of 158,007,088 reads for a total of 49.62× of high-quality Illumina bases.

To annotate gene models, we generated RNA-seq and Iso-seq data for several stages of leaf, flower and fruit for Santa Cruz 75 and two male isolates, ABG 2006-2975 and ABG 2008-1967 (Supplementary Table 11). Total RNAs were extracted using a Qiagen RNeasy kit. The PacBio Iso-seq libraries were constructed using a PacBio Iso-Seq Express 2.0 kit. Libraries were either sized (0.66× bead ratio) or unsized (1.2× bead ratio) to give final libraries with average transcript sizes of 2 kb or 3 kb, respectively. Libraries were sequenced using polymerase V2.1 on a PacBio Sequel II platform. The RNA-seq libraries were constructed using an Illumina TruSeq Stranded mRNA Library Prep kit using standard protocols and sequenced using a NovaSeq 6000 Instrument PE150 to 40 million reads per library.

To identify the sex chromosomes, we additionally sequenced the whole genomes of 52 Amborella individuals sampled from natural populations (Supplementary Table 11). DNA extractions were performed using a standard CTAB protocol. Illumina sequencing was performed on NovaSeq and HiSeq platforms at RAPiD Genomics using a 2 × 150 paired-end library. The voucher specimens are deposited at the New Caledonia Herbarium in Nouméa (herbarium code: NOU) and Indiana University (IND). Existing data used to support this manuscript are found in Supplementary Table 11.

Genome assembly

The version 2.0 HAP1 and HAP2 assemblies were generated by assembling the 3,605,703 PacBio circular consensus sequencing (CCS) reads (58.81× per haplotype) using the HiFiAsm+HIC assembler75 and subsequently polished using RACON76. This approach produced initial assemblies of both haplotypes. The HAP1 assembly consisted of 1,522 scaffolds (1,522 contigs), with a contig N50 of 25.5 Mb and a total genome size of 800.6 Mb (Supplementary Table 13). The HAP2 assembly consisted of 1,043 scaffolds (1,043 contigs), with a contig N50 of 43.0 Mb and a total genome size of 773.5 Mb (Supplementary Table 13).
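The contig N50 values reported above can be illustrated with a minimal sketch. The function name and the toy lengths are illustrative assumptions and are not taken from the assembly; N50 is computed in the usual way, as the length of the contig at which the running total of length-sorted contigs first reaches half of the assembly size.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to at least half
        # of the assembly size defines the N50.
        if running >= half:
            return length
    raise ValueError("no contig lengths supplied")


# Toy lengths (hypothetical, not from the Amborella assemblies).
toy_contigs = [50, 40, 30, 20, 10]
print(n50(toy_contigs))
```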

Hi-C Illumina reads from A. trichopoda isolate Santa Cruz 75 were separately aligned to the HAP1 and HAP2 contig sets with Juicer77, and chromosome-scale scaffolding was performed using 3D-DNA78. No misjoins were identified in either the HAP1 or HAP2 assemblies. The contigs were then oriented, ordered and joined together into 13 chromosomes per haplotype using the Hi-C data. A total of 31 joins was applied to the HAP1 assembly and 20 joins for the HAP2 assembly. Each chromosome join is padded with 10,000 Ns. Contigs terminating in telomeric sequence were identified using the (TTTAGGG)n repeat, and care was taken to make sure that the repeats were properly oriented in the production assembly. The remaining scaffolds were screened against bacterial proteins, organelle sequences and GenBank non-redundant database, and any scaffold found to be a contaminant was removed. After the chromosomes were formed, it was observed that some small (<20 kb) redundant sequences were present on adjacent contig ends within chromosomes. To resolve this issue, adjacent contig ends were aligned to one another using BLAT79, and duplicate sequences were collapsed to close the gap between them. A total of 5 adjacent contig pairs were collapsed in the HAP1 assembly and 4 in the HAP2 assembly.
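As a rough illustration of two steps described above, the sketch below joins ordered contigs with 10,000 Ns and checks contig ends for the (TTTAGGG)n telomeric repeat in the expected orientation. It is not the production pipeline: the function names, the minimum copy number and the search window are assumptions chosen for the example. It assumes the G-rich unit TTTAGGG is expected at the right-hand end of a sequence and its reverse complement, CCCTAAA, at the left-hand end.

```python
import re

TELOMERE_UNIT = "TTTAGGG"
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def join_contigs(contigs, pad=10_000):
    """Concatenate ordered, oriented contigs with a run of Ns at each join."""
    return ("N" * pad).join(contigs)


def telomere_ends(contig, min_copies=3, window=200):
    """Report whether each contig end carries tandem telomeric units.

    min_copies and window are illustrative thresholds, not values from the paper.
    """
    contig = contig.upper()
    fwd = re.compile("(?:%s){%d,}" % (TELOMERE_UNIT, min_copies))
    rev = re.compile("(?:%s){%d,}" % (reverse_complement(TELOMERE_UNIT), min_copies))
    head, tail = contig[:window], contig[-window:]
    return {"start": bool(rev.search(head)), "end": bool(fwd.search(tail))}
```

A contig that reports telomeric sequence at the wrong end (for example, TTTAGGG repeats at its start) would be a candidate for reorientation under these assumptions.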

Finally, homozygous single nucleotide polymorphisms (SNPs) and insertions/deletions (INDELs) were corrected in the HAP1 and HAP2 releases using ~49× of Illumina reads (2 × 150, 400-bp insert) by aligning the reads using BWA-MEM80 and identifying homozygous SNPs and INDELs with GATK’s UnifiedGenotyper tool81. A total of 465 homozygous SNPs and 15,763 homozygous INDELs were corrected in the HAP1 release, while a total of 473 homozygous SNPs and 17,208 homozygous INDELs were corrected in the HAP2 release. The final version 2.0 HAP1 release contained 707.9 Mb of sequence, consisting of 59 contigs with a contig N50 of 36.3 Mb and a total of 99.69% of assembled bases in chromosomes. The final version 2.0 HAP2 release contained 700.3 Mb of sequence, consisting of 45 contigs with a contig N50 of 44.5 Mb and a total of 99.87% of assembled bases in chromosomes.

Genome annotation

Transcript assemblies were made from ~757 M pairs of stranded 2 × 150 paired-end Illumina RNA-seq reads using PERTRAN, which conducts genome-guided transcriptome short-read assembly via GSNAP82 and builds splice alignment graphs after alignment validation, realignment and correction. To obtain 825 K putative full-length transcripts, ~20 M PacBio Iso-seq CCSs were corrected and collapsed by a genome-guided correction pipeline, which aligns CCS reads to the genome with GMAP82 with intron correction for small indels in splice junctions, if any, and clusters alignments when all introns are the same or have 95% overlap for a single exon. Subsequently, 563,694 transcript assemblies were constructed using PASA83 from expressed sequence tags (ESTs) and the RNA-seq transcript assemblies described above. Loci were determined by transcript assembly alignments and/or EXONERATE alignments of proteins from Arabidopsis thaliana, Glycine max, Sorghum bicolor, Oryza sativa, Lactuca sativa, Helianthus annuus, Cynara cardunculus, Selaginella moellendorffii, Physcomitrella patens, Nymphaea colorata, Solanum lycopersicum and Vitis vinifera, and Swiss-Prot eukaryote proteomes to the A. trichopoda HAP1 genome, which was repeat-soft-masked using RepeatMasker84, with up to 2 kb extension on both ends unless extending into another locus on the same strand. Gene models were predicted by homology-based predictors, FGENESH+85, FGENESH_EST (similar to FGENESH+, but using EST to compute splice site and intron input instead of protein/translated open reading frame (ORF)), EXONERATE86, PASA assembly ORFs (in-house homology-constrained ORF finder) and AUGUSTUS87 trained by the high-confidence PASA assembly ORFs and with intron hints from short-read alignments. The best-scored predictions for each locus were selected using multiple positive factors, including EST and protein support, and one negative factor: overlap with repeats. The selected gene predictions were improved using PASA, and the optimal set was selected using several curated gene quality metrics88. We assessed the gene annotations using compleasm (v.0.2.6)89 using the Embryophyta database.

We further annotated repeats with EDTA (v.2.0.0)90 using the sensitive mode that runs RepeatModeler91. To identify tandem repeats, we used Tandem Repeats Finder (v.4.09.1)92 (parameters: 2 7 7 80 10 50 500 -f -d -m -h). We ran StainedGlass (v.0.5)93 to visualize the massive tandem repeat arrays for chromosomes in both haplotypes. To build the repeat landscapes for assessing recent expansion events, we followed the methods outlined in EDTA GitHub Issue #92: Draw Repeat Landscapes, utilizing a library generated from an independent annotation on the combined haplotypes with EDTA v.2.0.1.
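The positional Tandem Repeats Finder parameters listed above can be read more easily when named. The sketch below only builds the argument list; the executable name `trf` and the input file name are assumptions (installed binaries are often named differently), and the parameter meanings follow the standard Tandem Repeats Finder usage order (match, mismatch, delta, PM, PI, minscore, maxperiod).

```python
# Positional parameters in the order Tandem Repeats Finder expects them.
TRF_PARAMS = [
    ("match", 2),        # matching weight
    ("mismatch", 7),     # mismatch penalty
    ("delta", 7),        # indel penalty
    ("pm", 80),          # match probability (per cent)
    ("pi", 10),          # indel probability (per cent)
    ("minscore", 50),    # minimum alignment score to report
    ("maxperiod", 500),  # maximum period size to report
]

# -f: record flanking sequence, -d: write a data file,
# -m: write a masked sequence file, -h: suppress HTML output.
TRF_FLAGS = ["-f", "-d", "-m", "-h"]


def trf_command(fasta, executable="trf"):
    """Build the argument list; 'trf' is an assumed executable name."""
    return [executable, fasta] + [str(value) for _, value in TRF_PARAMS] + TRF_FLAGS


# Hypothetical input file name, for illustration only.
print(" ".join(trf_command("hap1.fasta")))
```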

Comparisons between assembly haplotypes

To plot comparisons between the two haplotypes, including genes and repeats, we used GENESPACE (v.1.3.1)94. To generate synteny between the two haplotypes, we first performed genome alignments. HAP1 and HAP2 were aligned with AnchorWave (v.1.0.1)95 using the ‘genoAli’ method and ‘-IV’ parameter to allow for inversions. Alignment was performed using only the ‘chromosome’ sequence for each haplotype. The alignment was converted to SAM format using the ‘maf-convert’ tool provided in ‘last’ (v.460)96 and used for calling variants with SyRI (v.1.6.3)97. The output from SyRI was used to make chromosome-level synteny and SV plots using plotsr (v.0.5.4)98.

Identification of the sex chromosome non-recombining region

We used whole-genome sequencing data to identify the sex-determining region of the W. All paired-end Illumina data had adapters removed and were quality filtered using TRIMMOMATIC (v.0.39)99, with leading and trailing values of 3, sliding window of 30, jump of 10 and a minimum remaining read length of 40. We next found all canonical 21-mers in each isolate using Jellyfish (v.2.3.0)100 and used the bash ‘comm’ command to find all k-mers shared in all female isolates and not found in any male isolate (W-mers). We mapped the W-mers to both haplotype assemblies using BWA-MEM (v.0.7.17)80, with parameters ‘-k 21’ ‘-T 21’ ‘-a’ ‘-c 10’. W-mer mapping was visualized by first calculating coverage in 100,000-bp sliding windows (10,000 bp jump) using BEDTools (v.2.28.0)101 and plotted using karyoploteR (v.1.26.0)102.
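The W-mer logic described above (k-mers present in every female and absent from every male) reduces to set operations on canonical k-mers. The sketch below is an in-memory illustration under that reading, not the Jellyfish and `comm` workflow itself; the function names are hypothetical, and a canonical k-mer is taken to be the lexicographically smaller of a k-mer and its reverse complement.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def canonical_kmers(seq, k=21):
    """Collect canonical k-mers from a sequence, skipping any with non-ACGT bases."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            kmers.add(canonical(kmer))
    return kmers


def w_mers(female_sets, male_sets):
    """k-mers shared by all females and found in no male."""
    shared_by_females = set.intersection(*female_sets)
    seen_in_males = set().union(*male_sets)
    return shared_by_females - seen_in_males
```

With real resequencing data the per-isolate k-mer sets are far too large for this approach to be practical, which is presumably why sorted k-mer lists and `comm` were used instead.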

Structural variation

To identify structural variants between the haplotypes, we mapped PacBio reads using minimap2 (v.2.24)103 in HiFi mode, added the MD tag using samtools (v.1.10) ‘calmd’ and called structural variants using Sniffles (v.2.0.7)104. We also performed whole-genome alignments using minimap2 (v.2.24)103 and visualized the dotplot using pafR (v.0.0.2)105.

Gene homology and protein evolution

To identify one-to-one orthologues on the ZW to examine protein evolution, we ran OrthoFinder (v.2.5.2)106,107 using only the Amborella haplotypes. We calculated synonymous (Ks) and non-synonymous (Ka) changes in codons using the Ka/Ks Calculator (v.2.0)108.
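To make the distinction between synonymous and non-synonymous changes concrete, the sketch below classifies differing codons in a pair of aligned, in-frame coding sequences using the standard genetic code. This is a deliberately simplified count and is not the Ka/Ks Calculator method, which normalizes by the number of synonymous and non-synonymous sites and applies substitution models; the function name and the toy sequences are assumptions.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_changes(cds_a, cds_b):
    """Count differing codons that keep (synonymous) or change (non-synonymous) the amino acid."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("sequences must be aligned, in frame and of equal length")
    syn = nonsyn = 0
    for i in range(0, len(cds_a), 3):
        a, b = cds_a[i:i + 3].upper(), cds_b[i:i + 3].upper()
        # Skip identical codons and codons with gaps or ambiguous bases.
        if a == b or a not in CODON_TABLE or b not in CODON_TABLE:
            continue
        if CODON_TABLE[a] == CODON_TABLE[b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# Toy pair: GCT -> GCC keeps alanine; AAA -> AGA changes lysine to arginine.
print(classify_codon_changes("ATGGCTAAA", "ATGGCCAGA"))
```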

To identify the boundaries of evolutionary strata, we used the mcp (v.0.3.4)109 R package on dXY and Ks. For Ks, we first ran a test for outliers using PMCMRplus (v.1.9.10)110 to run Rosner’s generalized extreme studentized deviate many-outlier test111. For mcp, we used the model, ‘y~1, ~1’ to identify the change point between two plateaus, and we used 100,000 iterations, 3 chains and a burn-in of 100,000 (that is, ‘adapt’).
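The two-plateau model ‘y~1, ~1’ describes a series with one mean before the change point and another after it. As an intuition aid only, the sketch below finds such a change point by exhaustive least squares; this is a swapped-in, non-Bayesian stand-in and not what mcp does, which samples the posterior of the change point by MCMC. The function name and toy values are assumptions.

```python
def two_plateau_change_point(values):
    """Return (index, left_mean, right_mean) minimizing squared error for one step.

    index is the position of the first value belonging to the right-hand plateau.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    best = None
    for split in range(1, n):
        left, right = values[:split], values[split:]
        mean_left = sum(left) / len(left)
        mean_right = sum(right) / len(right)
        sse = (sum((v - mean_left) ** 2 for v in left)
               + sum((v - mean_right) ** 2 for v in right))
        if best is None or sse < best[0]:
            best = (sse, split, mean_left, mean_right)
    return best[1:]


# Toy series ordered along a chromosome (hypothetical values).
toy = [0.010, 0.012, 0.011, 0.050, 0.052, 0.049]
print(two_plateau_change_point(toy)[0])
```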

Nucleotide differences between the sexes

BWA (v.0.7.17)80 was used to map reads, and bcftools (v.1.9) ‘mpileup’ and ‘call’112 functions were used to call variants using the Island-wide sampling (9 male and 6 female plants; Supplementary Table 11). We filtered the vcf file using ‘QUAL > 20 & DP > 5 & MQ > 30’, minor allele frequency of 0.05 and dropped sites with >25% missing data. To calculate Nei’s nucleotide diversity between the sexes (dXY), we used pixy (v.1.2.7.beta1)113. dXY was calculated using 100,000-bp windows with a 10,000-bp jump, and separately on the gene models only.
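A minimal sketch of the dXY calculation between the sexes follows, assuming biallelic sites summarized as allele counts per group. It follows the general approach of summing pairwise differences and pairwise comparisons across all sites in a window before dividing, which is how pixy is documented to handle missing data; the function name, the input layout and the toy counts are assumptions.

```python
def dxy(sites):
    """Average pairwise difference between two groups across a window.

    sites: iterable of (female_ref, female_alt, male_ref, male_alt) allele counts,
    one tuple per genotyped site (invariant sites included).
    """
    differences = comparisons = 0
    for f_ref, f_alt, m_ref, m_alt in sites:
        # Each female allele is compared with each male allele at the site.
        differences += f_ref * m_alt + f_alt * m_ref
        comparisons += (f_ref + f_alt) * (m_ref + m_alt)
    return differences / comparisons if comparisons else float("nan")


# Toy window for 6 diploid females (12 alleles) and 9 diploid males (18 alleles):
# one fixed difference, one invariant site, one site at 50% in both groups.
toy_window = [(12, 0, 0, 18), (12, 0, 18, 0), (6, 6, 9, 9)]
print(dxy(toy_window))
```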

Presence–absence variation

PAV was identified following the methods in ref. 114, mapping reads from the Island-wide sampling (8 male and 6 female plants; the Atlanta Botanical Gardens isolate was removed due to low resequencing depth; Supplementary Table 11) to our new reference genome and annotation. Briefly, reads for the samples were aligned to each haplotype using BWA (v.0.7.17)80. Sorted BAM files were converted to bedgraph format using bedtools (v.2.30.0)101. Genes were called absent if the horizontal coverage of exons was <5% and the average depth was <2×. A test for equality in the proportion of PAV rate across chromosomes was performed in R using the ‘prop.test()’ function.
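The absence rule stated above can be sketched as a simple predicate on per-base exon depths. The function name and input layout are hypothetical, and the sketch assumes that "horizontal coverage" means the fraction of exonic bases covered by at least one read.

```python
def gene_is_absent(exon_depths, max_breadth=0.05, max_mean_depth=2.0):
    """Call a gene absent when exon breadth is <5% and mean exon depth is <2x.

    exon_depths: per-base read depths over all exonic bases of one gene.
    """
    if not exon_depths:
        raise ValueError("no exonic positions supplied")
    covered = sum(1 for depth in exon_depths if depth > 0)
    breadth = covered / len(exon_depths)
    mean_depth = sum(exon_depths) / len(exon_depths)
    # Both conditions must hold for an absence call.
    return breadth < max_breadth and mean_depth < max_mean_depth
```

Requiring both conditions keeps a gene from being called absent when a few positions carry a deep pile-up of reads, or when shallow coverage is spread widely across the exons.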

Gene expression analyses

To examine gene expression and identify candidate sex-determining genes, we used existing RNA-seq data from 10 females and 10 males11. We first filtered reads using TRIMMOMATIC (same parameters as above). Filtered reads were mapped to the HAP1 genome assembly using STAR (v.2.7.9a)115 and expression estimated for the annotated gene models using StringTie (v.2.1.7) (-e, -G)116. We performed differential gene expression analyses using DESeq2 (v.1.32.0)117, with the contrast being between the sexes.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Supplementary information

Supplementary Information: Supplementary Discussion, Methods, Figs. 1–15 and references.

Reporting Summary

Supplementary Tables: Supplementary Tables 1–14.

Acknowledgements

The work (proposal no. 10.46936/10.25585/60001405) conducted by the US Department of Energy (DOE) Joint Genome Institute (https://ror.org/04xm1d337), a DOE Office of Science User Facility, was supported under contract no. DE-AC02-05CH11231. Additional support for analysis was provided by the US Department of Agriculture National Institute of Food and Agriculture Postdoctoral Fellowship no. 2022-67012-38987 (S.B.C.), National Science Foundation (NSF) IOS-PGRP CAREER no. 2239530 (A.H.) and National Science Foundation GRFP (L.A.). We thank the Atlanta Botanical Garden for providing Amborella material used in this study and A. Bewick for the images of Amborella flowers.

Author contributions

S.B.C., J.S., J.L.-M. and A.H. conceptualized the project and designed the research. A.S., T.J., K.B., S.R., J.T., P.P.L., J.M., E.B.K., D.E.S., P.S.S., J.G. and J.L.-M. performed sample collection, data collection and sequencing. S.B.C., J.J. and S.S. conducted genome assembly and annotation. S.B.C., L.A., J.T.L., A.L.H., P.G. and A.Y. performed computational and statistical analyses. S.B.C., L.A., J.T.L., J.J., A.L.H., C.S., D.E.S., P.S.S., J.L.-M. and A.H. wrote the paper, with contributions from all authors.

Peer review

Peer review information

Nature Plants thanks Takashi Akagi, Susanne Renner and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Data availability

The genome assemblies and annotations (v.2.1) are available on Phytozome v.13 (https://phytozome-next.jgi.doe.gov/) and have been deposited on NCBI under BioProjects PRJNA1100625 and PRJNA1167780. Sequencing libraries for the genome assembly and annotation are publicly available on NCBI under BioProject PRJNA1100625, and the whole-genome sequencing of additional isolates under PRJNA1161132. Individual accession numbers are provided in Supplementary Tables 10 and 11.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

James Leebens-Mack, Email: jleebensmack@uga.edu.

Alex Harkess, Email: aharkess@hudsonalpha.org.

Supplementary information

The online version contains supplementary material available at 10.1038/s41477-024-01858-x.

References

  • 1. Renner, S. S. The relative and absolute frequencies of angiosperm sexual systems: dioecy, monoecy, gynodioecy, and an updated online database. Am. J. Bot. 101, 1588–1596 (2014).
  • 2. Carey, S., Yu, Q. & Harkess, A. The diversity of plant sex chromosomes highlighted through advances in genome sequencing. Genes 12, 381 (2021).
  • 3. Renner, S. S. & Müller, N. A. Plant sex chromosomes defy evolutionary models of expanding recombination suppression and genetic degeneration. Nat. Plants 7, 392–402 (2021).
  • 4. Soltis, P. S., Soltis, D. E. & Chase, M. W. Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology. Nature 402, 402–404 (1999).
  • 5. Moore, M. J., Bell, C. D., Soltis, P. S. & Soltis, D. E. Using plastid genome-scale data to resolve enigmatic relationships among basal angiosperms. Proc. Natl Acad. Sci. USA 104, 19363–19368 (2007).
  • 6. Burleigh, J. G. et al. Genome-scale phylogenetics: inferring the plant tree of life from 18,896 gene trees. Syst. Biol. 60, 117–125 (2011).
  • 7. Soltis, D. E. et al. Angiosperm phylogeny: 17 genes, 640 taxa. Am. J. Bot. 98, 704–730 (2011).
  • 8. One Thousand Plant Transcriptomes Initiative. One thousand plant transcriptomes and the phylogenomics of green plants. Nature 574, 679–685 (2019).
  • 9. Sauquet, H. et al. The ancestral flower of angiosperms and its early diversification. Nat. Commun. 8, 16047 (2017).
  • 10. Anger, N., Fogliani, B., Scutt, C. P. & Gâteblé, G. Dioecy in Amborella trichopoda: evidence for genetically based sex determination and its consequences for inferences of the breeding system in early angiosperms. Ann. Bot. 119, 591–597 (2017).
  • 11. Käfer, J. et al. A derived ZW chromosome system in Amborella trichopoda, representing the sister lineage to all other extant flowering plants. New Phytol. 233, 1636–1642 (2022).
  • 12. Akagi, T., Henry, I. M., Tao, R. & Comai, L. A Y-chromosome-encoded small RNA acts as a sex determinant in persimmons. Science 346, 646–650 (2014).
  • 13. Torres, M. F. et al. Genus-wide sequencing supports a two-locus model for sex-determination in Phoenix. Nat. Commun. 9, 3969 (2018).
  • 14. Akagi, T. et al. Two Y-chromosome-encoded genes determine sex in kiwifruit. Nat. Plants 5, 801–809 (2019).
  • 15. Harkess, A. et al. Sex determination by two Y-linked genes in garden asparagus. Plant Cell 32, 1790–1796 (2020).
  • 16. Kazama, Y. et al. A CLAVATA3-like gene acts as a gynoecium suppression function in white campion. Mol. Biol. Evol. 39, msac195 (2022).
  • 17. Müller, N. A. et al. A single gene underlies the dynamic evolution of poplar sex determination. Nat. Plants 6, 630–637 (2020).
  • 18. Amborella Genome Project. The Amborella genome and the evolution of flowering plants. Science 342, 1241089 (2013).
  • 19. Oginuma, K., Jaffré, T. & Tobe, H. The karyotype analysis of somatic chromosomes in Amborella trichopoda (Amborellaceae). J. Plant Res. 113, 281–283 (2000).
  • 20. Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).
  • 21. Magallón, S., Gómez-Acevedo, S., Sánchez-Reyes, L. L. & Hernández-Hernández, T. A metacalibrated time-tree documents the early rise of flowering plant phylogenetic diversity. New Phytol. 207, 437–453 (2015).
  • 22. Marchant, D. B. et al. Dynamic genome evolution in a model fern. Nat. Plants 8, 1038–1051 (2022).
  • 23. Niu, S. et al. The Chinese pine genome and methylome unveil key features of conifer evolution. Cell 185, 204–217.e14 (2022).
  • 24. Healey, A. L. et al. Newly identified sex chromosomes in the Sphagnum (peat moss) genome alter carbon sequestration and ecosystem dynamics. Nat. Plants 9, 238–254 (2023).
  • 25. Neumann, P. et al. Plant centromeric retrotransposons: a structural and cytogenetic perspective. Mob. DNA 2, 4 (2011).
  • 26. Sigman, M. J. & Slotkin, R. K. The first rule of plant transposable element silencing: location, location, location. Plant Cell 28, 304–313 (2016).
  • 27. Lappin, F. M. et al. A polymorphic pseudoautosomal boundary in the Carica papaya sex chromosomes. Mol. Genet. Genomics 290, 1511–1522 (2015).
  • 28. Cotter, D. J., Brotman, S. M. & Wilson Sayres, M. A. Genetic diversity on the human X chromosome does not support a strict pseudoautosomal boundary. Genetics 203, 485–492 (2016).
  • 29. Palmer, D. H., Rogers, T. F., Dean, R. & Wright, A. E. How to identify sex chromosomes and their turnover. Mol. Ecol. 28, 4709–4724 (2019).
  • 30. Tennessen, J. A. et al. Repeated translocation of a gene cassette drives sex-chromosome turnover in strawberries. PLoS Biol. 16, e2006062 (2018).
  • 31. Yu, Q. et al. A physical map of the papaya genome with integrated genetic map and genome sequence. BMC Genomics 10, 371 (2009).
  • 32. Lahn, B. T. & Page, D. C. Four evolutionary strata on the human X chromosome. Science 286, 964–967 (1999).
  • 33. Rice, W. R. The accumulation of sexually antagonistic genes as a selective agent promoting the evolution of reduced recombination between primitive sex chromosomes. Evolution 41, 911–914 (1987).
  • 34. Charlesworth, D., Charlesworth, B. & Marais, G. Steps in the evolution of heteromorphic sex chromosomes. Heredity 95, 118–128 (2005).
  • 35. Papadopulos, A. S. T., Chester, M., Ridout, K. & Filatov, D. A. Rapid Y degeneration and dosage compensation in plant sex chromosomes. Proc. Natl Acad. Sci. USA 112, 13021–13026 (2015).
  • 36. Wu, M. & Moore, R. C. The evolutionary tempo of sex chromosome degradation in Carica papaya. J. Mol. Evol. 80, 265–277 (2015).
  • 37. Hobza, R. et al. Impact of repetitive elements on the Y chromosome formation in plants. Genes 8, 302 (2017).
  • 38. Sacchi, B. et al. Phased assembly of neo-sex chromosomes reveals extensive Y degeneration and rapid genome evolution in Rumex hastatulus. Mol. Biol. Evol. 41, msae074 (2024).
  • 39. Jedlicka, P., Lexa, M. & Kejnovsky, E. What can long terminal repeats tell us about the age of LTR retrotransposons, gene conversion and ectopic recombination? Front. Plant Sci. 11, 644 (2020).
  • 40. Cornet, C. et al. Holocentric repeat landscapes: from micro-evolutionary patterns to macro-evolutionary associations with karyotype evolution. Mol. Ecol. 10.1111/mec.17100 (2023).
  • 41. Bachtrog, D. Y-chromosome evolution: emerging insights into processes of Y-chromosome degeneration. Nat. Rev. Genet. 14, 113–124 (2013).
  • 42. Charlesworth, D. The timing of genetic degeneration of sex chromosomes. Phil. Trans. R. Soc. Lond. B 376, 20200093 (2021).
  • 43. Hibbins, M. S. et al. Phylogenomics resolves key relationships in Rumex and uncovers a dynamic history of independently evolving sex chromosomes. Preprint at bioRxiv 10.1101/2023.12.13.571571 (2023).
  • 44. Krasovec, M., Chester, M., Ridout, K. & Filatov, D. A. The mutation rate and the age of the sex chromosomes in Silene latifolia. Curr. Biol. 28, 1832–1838.e4 (2018).
  • 45. Akagi, T. et al. Rapid and dynamic evolution of a giant Y chromosome in Silene latifolia. Preprint at bioRxiv 10.1101/2023.09.21.558759 (2023).
  • 46. Ma, X. et al. The spinach YY genome reveals sex chromosome evolution, domestication, and introgression history of the species. Genome Biol. 23, 75 (2022).
  • 47. She, H. et al. Evolution of the spinach sex-linked region within a rarely recombining pericentromeric region. Plant Physiol. 193, 1263–1280 (2023).
  • 48. Bachtrog, D. et al. Are all sex chromosomes created equal? Trends Genet. 27, 350–357 (2011).
  • 49. Charlesworth, B. & Charlesworth, D. A model for the evolution of dioecy and gynodioecy. Am. Nat. 112, 975–997 (1978).
  • 50. Jay, P., Tezenas, E., Véber, A. & Giraud, T. Sheltering of deleterious mutations explains the stepwise extension of recombination suppression on sex chromosomes and other supergenes. PLoS Biol. 20, e3001698 (2022).
  • 51. Lenormand, T. & Roze, D. Y recombination arrest and degeneration in the absence of sexual dimorphism. Science 375, 663–666 (2022).
  • 52. Buzgo, M., Soltis, P. S. & Soltis, D. E. Floral developmental morphology of Amborella trichopoda (Amborellaceae). Int. J. Plant Sci. 165, 925–947 (2004).
  • 53. Flores-Tornero, M. et al. Transcriptomic and proteomic insights into Amborella trichopoda male gametophyte functions. Plant Physiol. 184, 1640–1657 (2020).
  • 54. Julca, I. et al. Comparative transcriptomic analysis reveals conserved programmes underpinning organogenesis and reproduction in land plants. Nat. Plants 7, 1143–1159 (2021).
  • 55. Deyhle, F., Sarkar, A. K., Tucker, E. J. & Laux, T. WUSCHEL regulates cell differentiation during anther development. Dev. Biol. 302, 154–159 (2007).
  • 56. Zúñiga-Mayo, V. M., Gómez-Felipe, A., Herrera-Ubaldo, H. & de Folter, S. Gynoecium development: networks in Arabidopsis and beyond. J. Exp. Bot. 70, 1447–1460 (2019).
  • 57. Schoof, H. et al. The stem cell population of Arabidopsis shoot meristems is maintained by a regulatory loop between the CLAVATA and WUSCHEL genes. Cell 100, 635–644 (2000).
  • 58. Parvathy, S. T., Prabakaran, A. J. & Jayakrishna, T. Author Correction: Probing the floral developmental stages, bisexuality and sex reversions in castor (Ricinus communis L.). Sci. Rep. 11, 10504 (2021).
  • 59. Zhang, S. et al. The control of carpel determinacy pathway leads to sex determination in cucurbits. Science 378, 543–549 (2022).
  • 60. Schlegel, J. et al. Control of Arabidopsis shoot stem cell homeostasis by two antagonistic CLE peptide signalling pathways. eLife 10.7554/eLife.70934 (2021).
  • 61. Kurakawa, T. et al. Direct control of shoot meristem activity by a cytokinin-activating enzyme. Nature 445, 652–655 (2007).
  • 62. Hardtke, C. S. & Berleth, T. The Arabidopsis gene MONOPTEROS encodes a transcription factor mediating embryo axis formation and vascular development. EMBO J. 17, 1405–1411 (1998).
  • 63. Aida, M., Vernoux, T., Furutani, M., Traas, J. & Tasaka, M. Roles of PIN-FORMED1 and MONOPTEROS in pattern formation of the apical region of the Arabidopsis embryo. Development 129, 3965–3974 (2002).
  • 64. Stortenbeker, N. & Bemer, M. The SAUR gene family: the plant’s toolbox for adaptation of growth and development. J. Exp. Bot. 70, 17–27 (2019).
  • 65. He, S.-L., Hsieh, H.-L. & Jauh, G.-Y. SMALL AUXIN UP RNA62/75 are required for the translation of transcripts essential for pollen tube growth. Plant Physiol. 178, 626–640 (2018).
  • 66. Chae, K. et al. Arabidopsis SMALL AUXIN UP RNA63 promotes hypocotyl and stamen filament elongation. Plant J. 71, 684–697 (2012).
  • 67. van Mourik, H., van Dijk, A. D. J., Stortenbeker, N., Angenent, G. C. & Bemer, M. Divergent regulation of Arabidopsis SAUR genes: a focus on the SAUR10-clade. BMC Plant Biol. 17, 245 (2017).
  • 68. Bürstenbinder, K. et al. Inhibition of 5′-methylthioadenosine metabolism in the Yang cycle alters polyamine levels, and impairs seedling growth and reproduction in Arabidopsis. Plant J. 62, 977–988 (2010).
  • 69. Waduwara-Jayabahu, I. et al. Recycling of methylthioadenosine is essential for normal vascular development and reproduction in Arabidopsis. Plant Physiol. 158, 1728–1744 (2012).
  • 70. Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).
  • 71. Rhie, A. et al. The complete sequence of a human Y chromosome. Nature 621, 344–354 (2023).
  • 72. Rhie, A. et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature 592, 737–746 (2021).
  • 73. Zhu, Z., Younas, L. & Zhou, Q. Evolution and regulation of animal sex chromosomes. Nat. Rev. Genet. 10.1038/s41576-024-00757-3 (2024).
  • 74. Doyle, J. J. & Doyle, J. L. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull. 19, 11–15 (1987).
  • 75. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
  • 76. Vaser, R., Sović, I., Nagarajan, N. & Šikić, M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 27, 737–746 (2017).
  • 77. Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
  • 78. Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
  • 79. Kent, W. J. BLAT—The BLAST-Like Alignment Tool. Genome Res. 12, 656–664 (2002).
  • 80. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at 10.48550/arXiv.1303.3997 (2013).
  • 81. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
  • 82. Wu, T. D. & Nacu, S. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics 26, 873–881 (2010).
  • 83. Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666 (2003).
  • 84. Smit, A. F. A., Hubley, R. & Green, P. RepeatModeler Open-1.0. 2008–2015 (Institute for Systems Biology, accessed 1 May 2018); https://www.repeatmasker.org
  • 85. Salamov, A. A. & Solovyev, V. V. Ab initio gene finding in Drosophila genomic DNA. Genome Res. 10, 516–522 (2000).
  • 86. Slater, G. S. C. & Birney, E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 31 (2005).
  • 87. Stanke, M., Schöffmann, O., Morgenstern, B. & Waack, S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics 7, 62 (2006).
  • 88. Lovell, J. T. et al. The genomic landscape of molecular responses to natural drought stress in Panicum hallii. Nat. Commun. 9, 5213 (2018).
  • 89. Huang, N. & Li, H. compleasm: a faster and more accurate reimplementation of BUSCO. Bioinformatics 39, btad595 (2023).
  • 90. Ou, S. et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 20, 275 (2019).
  • 91. Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl Acad. Sci. USA 117, 9451–9457 (2020).
  • 92. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
  • 93. Vollger, M. R., Kerpedjiev, P., Phillippy, A. M. & Eichler, E. E. StainedGlass: interactive visualization of massive tandem repeat structures with identity heatmaps. Bioinformatics 38, 2049–2051 (2022).
  • 94. Lovell, J. T. et al. GENESPACE tracks regions of interest and gene copy number variation across multiple genomes. eLife 11, e78526 (2022).
  • 95. Song, B. et al. AnchorWave: sensitive alignment of genomes with high sequence diversity, extensive structural polymorphism, and whole-genome duplication. Proc. Natl Acad. Sci. USA 119, e2113075119 (2022).
  • 96. Kiełbasa, S. M., Wan, R., Sato, K., Horton, P. & Frith, M. C. Adaptive seeds tame genomic sequence comparison. Genome Res. 21, 487–493 (2011).
  • 97. Goel, M., Sun, H., Jiao, W.-B. & Schneeberger, K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 20, 277 (2019).
  • 98. Goel, M. & Schneeberger, K. plotsr: visualizing structural similarities and rearrangements between multiple genomes. Bioinformatics 38, 2922–2926 (2022).
  • 99. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
  • 100. Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
  • 101. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
  • 102. Gel, B. & Serra, E. karyoploteR: an R/Bioconductor package to plot customizable genomes displaying arbitrary data. Bioinformatics 33, 3088–3090 (2017).
  • 103. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
  • 104. Sedlazeck, F. J. et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat. Methods 15, 461–468 (2018).
  • 105. Winter, D., Lee, K. & Cox, M. pafr: read, manipulate and visualize ‘Pairwise mApping Format’ data in R (CRAN, 2020).
  • 106. Emms, D. M. & Kelly, S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 16, 157 (2015).
  • 107. Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
  • 108. Zhang, Z. et al. KaKs_Calculator: calculating Ka and Ks through model selection and model averaging. Genomics Proteom. Bioinform. 4, 259–263 (2006).
  • 109. Lindeløv, J. K. mcp: an R package for regression with multiple change points. Preprint at 10.31219/osf.io/fzqxv (2020).
  • 110.Pohlert, T. & Pohlert, M. T. PMCMR: calculate pairwise multiple comparisons of mean rank sums. R package version 1 https://cran.r-project.org/package=PMCMR (2018).
  • 111.Rosner, B. Percentage points for a generalized ESD many-outlier procedure. Technometrics25, 165–172 (1983). [Google Scholar]
  • 112.Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics27, 2987–2993 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 113.Korunes, K. L. & Samuk, K. pixy: unbiased estimation of nucleotide diversity and divergence in the presence of missing data. Mol. Ecol. Resour.21, 1359–1368 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 114.Hu, H. et al. Amborella gene presence/absence variation is associated with abiotic stress responses that may contribute to environmental adaptation. New Phytol.233, 1548–1555 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 115.Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics29, 15–21 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 116.Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol.33, 290–295 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 117.Love, M., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.Genome Biol.15, 550 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Supplementary Information: Supplementary Discussion, Methods, Figs. 1–15 and references.

Reporting Summary

Supplementary Tables: Supplementary Tables 1–14.

Data Availability Statement

The genome assemblies and annotations (v.2.1) are available on Phytozome v.13 (https://phytozome-next.jgi.doe.gov/) and have been deposited in NCBI under BioProjects PRJNA1100625 and PRJNA1167780. Sequencing libraries for the genome assembly and annotation are publicly available on NCBI under BioProject PRJNA1100625, and the whole-genome sequencing data for additional isolates are available under PRJNA1161132. Individual accession numbers are provided in Supplementary Tables 10 and 11.


Articles from Nature Plants are provided here courtesy of Nature Publishing Group
