Abstract
Egg-laying mammals (monotremes) are the only extant mammalian outgroup to therians (marsupial and eutherian animals) and provide key insights into mammalian evolution1,2. Here we generate and analyse reference genomes of the platypus (Ornithorhynchus anatinus) and echidna (Tachyglossus aculeatus), which represent the only two extant monotreme lineages. The nearly complete platypus genome assembly has anchored almost the entire genome onto chromosomes, markedly improving the genome continuity and gene annotation. Together with our echidna sequence, the genomes of the two species allow us to detect the ancestral and lineage-specific genomic changes that shape both monotreme and mammalian evolution. We provide evidence that the monotreme sex chromosome complex originated from an ancestral chromosome ring configuration. The formation of such a unique chromosome complex may have been facilitated by the unusually extensive interactions between the multi-X and multi-Y chromosomes that are shared by the autosomal homologues in humans. Further comparative genomic analyses unravel marked differences between monotremes and therians in haptoglobin genes, lactation genes and chemosensory receptor genes for smell and taste that underlie the ecological adaptation of monotremes.
Subject terms: Evolutionary genetics, Comparative genomics
New reference genomes of the two extant monotreme lineages (platypus and echidna) reveal the ancestral and lineage-specific genomic changes that shape both monotreme and mammalian evolution.
Main
The iconic egg-laying monotremes of Australasia represent one of the three major mammalian lineages. The monotreme lineage comprises two extant families, the semi-aquatic Ornithorhynchidae (platypus) and the terrestrial Tachyglossidae (echidna). At present, the single species of platypus has a restricted distribution in Eastern Australia, whereas four echidna species (T. aculeatus and three Zaglossus spp.) are present in Australia and New Guinea (Supplementary Information). Platypuses and echidnas feature radical differences in diet (carnivorous compared with insectivorous), neurophysiology (electroreception-oriented compared with olfaction-oriented), as well as specific intraspecific conflict and defence adaptations1. Owing to their distinct ecological, anatomical and physiological features, monotremes are interesting mammals well-suited for the study of the evolution of ecological adaptation. Of particular interest are their sex chromosomes, which originated independently from those of therian mammals through additions of autosomes onto an ancestral XY pair, resulting in a multiple sex chromosome system that assembles as a chain during meiosis3.
The previous female platypus genome assembly (OANA5) provided many important insights into monotreme biology and mammalian evolution. However, only about 25% of its sequence was assigned to chromosomes2. The incomplete platypus assembly without Y chromosome sequences and lack of an echidna genome have limited the interpretation of the evolution of mammals and monotremes. Here we combined PacBio long-read, 10× linked-read, chromatin conformation (Hi-C) and physical map data to produce a highly accurate chromosome-scale assembly of the platypus genome. We also produced a less-continuous assembly for the short-beaked echidna, which enables us to infer the genomic changes that occurred in the ancestral monotremes and other mammals.
Chromosome-scale monotreme genomes
Our new male platypus genome assembly (mOrnAna1) shows a 1,390-fold improvement for the contig N50 and a 49-fold improvement for the scaffold N50 compared with the previous Sanger-based assembly (OANA5) (Fig. 1a). We performed extensive error correction and manual curation to polish and anchor the assembly at the chromosome scale (Extended Data Fig. 1a, b). Ambiguous chromosome assignments were resolved with fluorescence in situ hybridization (FISH) experiments (Extended Data Fig. 1c, d). We also produced a male echidna genome (mTacAcu1) from a variety of short- and long-insert-size libraries, and further scaffolded it using the same methods as in platypus. The resulting mTacAc1 sequence shows better sequence continuity than OANA5, with a scaffold N50 size of 32.51 Mb (Supplementary Table 2).
To study the origin and evolution of monotreme sex chromosomes, we greatly improved the assembly of the platypus sex chromosomes. We anchored 172 Mb (92% compared to 22% in OANA5) X-borne sequences to chromosomes (Supplementary Tables 4, 6). This includes one 1.6-Mb segment that was previously misassigned to chromosome 14 (Extended Data Fig. 1e). We determined all of the pseudoautosomal regions (PARs) except for X4, on the basis of the different read coverage between sexes and representation of FISH markers (Supplementary Table 3). We also mapped 92% of the platypus Y-borne sequences to the five Y chromosomes using PacBio reads produced using Y-borne bacterial artificial chromosome (BAC) clones4 (Supplementary Tables 5, 6). Owing to a lack of echidna linkage markers, we used the platypus X chromosomes as a reference to anchor a similar length (177 Mb, 96%) of X chromosomes and identified 8.6 Mb Y-borne sequences in echidna.
In the final curated platypus genome (mOrnAna1) 98% of the sequence was assigned to the 21 autosomes, 5 X and 5 Y chromosomes (Supplementary Table 7), with putative telomeres and centromeres annotated for half of the chromosomes (Supplementary Table 8). mOrnAna1 fills around 90% of the gaps in OANA5 (Supplementary Table 9), recovering 161 Mb of previously missed genomic sequences, most of which are long interspersed nuclear elements (LINE)/L2 and short interspersed nuclear elements (SINE)/MIR (Supplementary Tables 10, 11). We also removed 68 Mb of redundant sequences in OANA5 (Extended Data Fig. 1f–h). The repeat elements comprising about half of the monotreme genomes are dominated by LINE/L2 elements that are more similar to reptile genomes than therian mammals (which comprise mostly LINE/L1)5 (Supplementary Table 12). The highly continuous assembly also substantially improves gene annotation. We identified 20,742 and 22,029 protein-coding genes in mOrnAna1 and mTacAcu1, respectively (Supplementary Table 13). Specifically, 19,576 coding exons from 8,303 platypus genes were recovered from the gapped regions of OANA5. Among them, 454 genes were completely missed in OANA5, and 3,961 fragmented genes in OANA5 now have complete open-reading frames. We corrected 2,395 genes that were previously split or misannotated in OANA5 (Extended Data Fig. 1i, j).
Insights into mammalian genome evolution
Our phylogenomic reconstruction shows that monotremes diverged from therians around 187 million years ago, and the two monotremes diverged around 55 million years ago (Extended Data Fig. 2a). This estimate provides a date for the monotreme–therian split that is earlier than previous estimates (about 21 million years ago)2, but agrees with recent analyses of few genes and fossil evidence6. We also inferred that monotremes had similar genome substitution rates (approximately 2.6 × 10−3 substitutions per site per million years) compared with other mammals (Supplementary Table 15). About 14 Mb of mammalian specific highly conserved elements were identified by comparison among vertebrates (Methods): around 90% of elements were located in non-coding regions (Extended Data Fig. 2c), and are associated with genes that are enriched in processes such as brain development (Extended Data Fig. 2d, e, Supplementary Results and Supplementary Tables 18–20).
Next we used chromosome information from human, opossum, Tasmanian devil, platypus, chicken and common wall lizard genomes to reconstruct the mammalian ancestral karyotype (Methods). This analysis reveals 30 mammalian ancestral chromosomes (MACs) (2n = 60) at a resolution of 500 kb, covering around 66% of the human genome and approximately 67% of the platypus genome (Fig. 1b and Supplementary Tables 24–26). Of these, 25 MACs were maintained without breaks in a single chromosome of the therian ancestor, and 17 of them have fused with other MACs in therians. Sixteen MACs were still maintained in a single human chromosome, but only MAC28 had not undergone any intrachromosomal rearrangements during therian evolution (Extended Data Fig. 2f, g). We detected at least 918 chromosome breakage events, and confirmed that the X chromosome in humans was derived from the fusion of an original therian X chromosome with an autosomal region after the divergence from marsupials7 (Fig. 1b and Extended Data Fig. 2f, g). The five X chromosomes in platypus were derived from different MACs by multiple fusion and translocation events.
We found that gene families associated with the immune response and hair growth were expanded considerably in the mammalian ancestor, perhaps contributing to the evolution of immune adaptation and fur, respectively, in mammals (Supplementary Table 30). We further manually annotated major histocompatibility complex (MHC) genes and other immune genes (Supplementary Results). As in nonmammalian vertebrates, the monotreme MHC class Ia genes colocalize with antigen-processing genes and MHC class II genes (Extended Data Fig. 3a and Supplementary Table 31). The defensin genes gave rise to unique defensin-like peptides (OavDLP genes) in platypus venom8. By contrast, echidna has only one single OavDLP pseudogene (Extended Data Fig. 3f–h), suggesting the loss of the key venom gene family in this species.
Monotreme sex chromosome evolution
To elucidate the detailed genomic composition of the monotreme sex chromosomes, we compared regions that share sequences between the sex chromosomes—that is, the PARs—with regions that have become sexually differentiated (SDRs). PAR boundaries show a sharp shift in the female-to-male sequencing coverage ratio as expected (Fig. 2a and Extended Data Fig. 4a). Both monotremes showed generally nonbiased gene expression levels between sexes within PARs, but pronounced female-biased expression within SDRs, indicating the absence of complete chromosome-wide dosage compensation in monotremes as previously suggested9 (Extended Data Fig. 4b).
The short PARs of platypus chromosomes X2–X5 have a significantly higher GC content (one-sided Wilcoxon rank-sum test, P < 0.01) than the SDRs or the longer PARs (Extended Data Fig. 4c), which probably reflects strong GC-biased gene conversion that is caused by a high recombination rate10. This is similar to the pattern of the short GC-rich human PAR, the recombination rate of which is 17-fold higher than the genome-wide average11. Notably, chicken orthologous sequences of these monotreme PARs are all located on the microchromosomes, which also have a high GC content12 (one-sided Wilcoxon rank-sum test, P < 0.01) (Extended Data Fig. 4c, d). This highly conserved recombination landscape might be partially selected in monotremes for maintaining the sequence polymorphism and balanced dosage of MHC genes, which reside in the PARs of the chromosome X3–Y3 and Y4–X5 pairs in platypus13 (Extended Data Fig. 3a). The regional selection for high recombination may also counteract further expansion of SDRs on these sex chromosomes.
Sex chromosomes of both eutherians and birds formed through stepwise suppression of recombination, resulting in a pattern of pairwise sequence divergence between SDRs termed ‘evolutionary strata’14,15. We identified at least seven strata in monotremes, named S0 to S6 from the oldest to the youngest strata (Fig. 2a and Extended Data Fig. 4a), by ranking their levels of pairwise synonymous sequence divergence between the X–Y gametologues and the phylogeny (Extended Data Fig. 5a, b). All but the most recent strata (S5 and S6) are shared by platypus and echidna. However, the PARs that border S5 and S6, as well as the shorter PARs of chromosomes X2 and X5 (Extended Data Fig. 5c, d), formed independently after their divergence. Overall, the distribution of evolutionary strata suggested a time order of incorporating different ancestral autosomes into the sex chromosome chain: it started from the S0 region of X1 containing a sex-determining gene (see below), followed by X2, X3 and X5. X4 and individual regions of X3 and X1 underwent suppression of recombination after the monotreme divergence.
Despite episodes of independent evolution, most sex chromosome regions of the platypus and echidna are homologous (Extended Data Fig. 6a), suggesting that the complex formed in the monotreme ancestor16. To reconstruct its origin, we projected the platypus sex chromosomes onto their chicken homologues (Supplementary Table 39). This refined homology map (Extended Data Fig. 4d) suggests that both fusions and reciprocal translocations among the ancestral micro- and macrochromosomal fragments gave rise to the monotreme sex chromosome complex. The platypus X chromosomes contain homologous sequences of the entire or partial chicken microchromosomes 11, 16, 17, 25 and 28. These microchromosomes also have orthologues in the spotted gar17, suggesting that they were ancestral vertebrate microchromosomes, and fused in the ancestral monotreme or mammalian chromosomes. Evidence of reciprocal translocations came from the observation that parts of every two neighbouring sex chromosomes are homologous to two adjacent regions of the same chicken chromosome (Extended Data Fig. 6c, d). For example, platypus chromosomes X1 and X2 are both homologous to parts of chicken microchromosome 12 and chromosome 13, whereas X2 and X3 are both homologous to chicken chromosome 2.
Notably, X1 at one end of the meiotic chain and Y5 at the other share this alternately overlapping relationship, and both are homologous to chicken microchromosome 28. Indeed, most of the genes on Y5 are not found on its pairing partner X5, but on X1 (Fig. 2b and Supplementary Table 40). Chromosomes X1 and Y5 do not pair at meiosis, but this homology suggests that the origin of the extant monotreme sex chromosome complex involved the opening of the ancestral chromosomal ‘ring’ as degeneration proceeded18. A conserved vertebrate sex-determining gene, the anti-Mullerian hormone, is located on chromosome Y5 (AMHY) and S0 of chromosome X1 (AMHX)14 (Fig. 2b). The ancestral X1–Y5 pairing region that encompasses AMH could, therefore, be the site at which homologous recombination was first suppressed. The degeneration of chromosome Y5 then caused the loss of homology with X1 and led to the break of the chromosome ring. Indeed, synonymous substitution rates (dS) between the retained X1–Y5 gametologue pairs are significantly higher (one-sided Wilcoxon ranked-sum test, P < 0.01) than those of any other sex chromosome pairs (Extended Data Fig. 6e). A chromosome ring configuration has been reported in plants19, but not in any animal species. Alternatively, the ancestral ring structure might have evolved after the emergence of the proto-X1–Y5 pair by translocations that involve other autosomes, so that sexually antagonistic alleles could be linked to the sex-determining genes20.
Interactions between sex chromosomes
The platypus sex chromosomes exhibit an unusual association with each other compared to autosomes during and after meiosis21. As little is known about their spatial organization in platypus somatic cells, we investigated this using Hi-C data (male liver) and chromosomal FISH with sex-chromosome-specific and autosomal BAC probes (male fibroblasts). Notably, Hi-C data showed that chromosomes Y2 and Y3 undergo frequent interchromosomal interactions, whereas autosomes confine their interactions mostly within chromosomes (Fig. 3a and Extended Data Fig. 7a–d). FISH showed that chromosomes Y2 and Y3 signals overlapped more frequently (5.2- and 7.6-fold) than signals between chromosomes Y2 and X1 or Y2 and an autosome (chromosome 17) (P = 8.67 × 10−4 and 8.57 × 10−5, respectively) (Fig. 3b, c and Supplementary Table 41). These interactions allow us to predict a zigzag three-dimensional conformation of the sex chromosomes at interphase (Extended Data Fig. 7e). A similar pattern was also present in echidna (Extended Data Fig. 7f). Notably, the high interaction frequency is conserved in human orthologous autosomal regions (Fig. 3a), suggesting functional importance unrelated to the evolution or function of sex chromosomes.
We further examined the distribution of putative binding sites of the CTCF protein, which is usually enriched at the boundaries of topologically associated domains (TADs) and mediates both intra- and interchromosomal interactions22. This revealed considerable enrichment of putative CTCF-binding sites at the TAD boundaries of the platypus genome (Extended Data Fig. 7g), which are more enriched along the interacting sex chromosomes X2 and X4, as well as along their orthologous regions in human and chicken (Fig. 3d and Extended Data Fig. 7h). These results suggest that an ancestral interaction landscape facilitated by local enrichment of CTCF-binding sites could have promoted the reciprocal translocations between spatially adjacent autosomal fragments that gave rise to the sex chromosome complex in the monotremes.
Eco-evolutionary adaptation of diet
Platypuses consume aquatic invertebrates whereas echidnas feed predominantly on social insects. Although the recent ancestor of monotremes had adult teeth, both extant monotremes lack teeth23. Of eight genes involved in tooth development24, four genes were lost in both monotreme genomes, suggesting that the loss occurred in their recent common ancestors (Extended Data Fig. 8a and Supplementary Table 42), consistent with other toothless or enamel-less eutherians25. Echidnas (but not platypuses) further lost two enamel genes. Analysis of genes involved in stomach function revealed that the considerable loss of digestive genes (reported in platypus26) is shared with echidna and probably occurred in the monotreme ancestor, although NGN3—which is essential for stomach and pancreas development—has been maintained in both species (Extended Data Fig. 8b–g and Supplementary Table 43).
Chemosensory systems mediate animal behaviour that is essential for survival and reproduction through the direct interaction with environmental chemical cues27. For example, eutherian mammals have more than 25 copies of bitter taste receptor genes (TAS2R genes)27,28, whereas this gene family is considerably smaller in monotremes (Extended Data Fig. 9a) with only 7 in platypus (Supplementary Tables 44, 45). The number is reduced to three in echidna (Fig. 4a and Supplementary Results). This reduction is also observed in pangolins, which suggests convergent evolution that results from the insectivore diet of both echidnas and pangolins29.
The nasal cavity of the platypus is closed off during diving and the size of the main olfactory bulb of the platypus is much smaller than that of the echidna1. Consistent with this, the number of olfactory receptors (OR genes) in platypus (299) is much smaller than in echidna (693) (Fig. 4a and Supplementary Table 46). The difference in the large olfactory bulb and OR repertoire in echidna may contribute to the ability to search for odours of underground prey, whereas the platypus relies on electroreception to detect prey in the water. However, the size of the accessory olfactory bulb is larger in the platypus than in the echidna1. The accessory olfactory bulb receives projections from the vomeronasal organ, and there is a marked expansion of the number of vomeronasal type-1 receptors (V1R genes) in the platypus (262) compared with the echidna (28) (Fig. 4a and Supplementary Table 47). Vomeronasal receptors probably have important roles in courtship, parental care, induction of lactation and milk ejection in monotremes23. Therefore, the diversification of the olfactory bulb and accessory olfactory bulb systems in monotremes provide an interesting example of the eco-evolutionary trade-off. V1R amplification has been associated with the size of the vomeronasal organ and nocturnal activity30. This is also consistent with the fact that the platypus closes its eyes when diving and therefore relies entirely on other senses underwater and in the burrow.
Haemoglobin degradation in monotremes
The semi-aquatic lifestyle of the platypus is supported by particularly high haemoglobin levels and large numbers of small red blood cells31. The haemoglobin–haem detoxification system in mammals provides efficient clearance to minimize oxidative damage32 in which haptoglobin is the haemoglobin chaperone32 and free haem is bound by haemopexin and alpha-1 microglobulin33.
Both the haemopexin and alpha-1 microglobulin genes are found in the monotreme genomes, whereas the haptoglobin gene is absent (Fig. 4b, Extended Data Fig. 10a, b and Supplementary Table 48), which suggests that monotremes evolved a haemoglobin clearance system that is different from that of other mammals. Haptoglobin evolved in the common ancestor of vertebrates from an immune gene of the MASP family33 but has neofunctionalized in mammals to bind to haemoglobin with a higher affinity and to bind to the CD163A receptor, which is also absent in monotremes, for clearance in macrophages34. The absence of the haptoglobin gene and CD163A in monotremes suggests that the neofunctionalization of haptoglobin happened after the divergence of monotremes from therians, not before it as previously thought34, and long after the evolution of enucleated red blood cells in the common ancestor of mammals35. Several nonmammalian vertebrates have lost haptoglobin, including chicken34 (Fig. 4b), in which an alternative, secreted CD163 family member, PIT54, is the haemoglobin-binding chaperone33. Phylogenetic analysis shows that monotremes lack genes that cluster with haptoglobin in the MASP family or a PIT54 orthologue (Extended Data Fig. 10c–e and Supplementary Table 50). We confirmed the expansion of the CD163 family in platypus2 (ten members) and found five in echidna, compared with two and three in humans and mice, respectively (Extended Data Fig. 10e, f). As mammalian CD163A can bind to haemoglobin in the absence of haptoglobin36 and one CD163 family member has become the haemoglobin chaperone in chicken, the CD163 family protein(s) may have evolved this role in monotremes.
Transition from oviparity to viviparity
Monotremes provide the key to understanding how viviparity evolved in mammals. They are not as dependent on egg proteins as egg-laying avian and reptilian species owing to their nutrient acquisition from uterine secretions23,37, and the subsequent reliance of the young on lactation. Whereas reptiles have three functional copies of the major egg protein vitellogenin (VTG)38, in monotremes we found only one functional copy (VTG2) (Extended Data Fig. 10g and Supplementary Table 52) and a partial sequence for VTG1.
Similar to marsupials, monotremes have an extended lactation period and the composition of the milk changes dynamically as the development progresses to match the changing needs of the young37. SPINT3, a major milk-specific protein that is present in early lactation of therians with a probable role in the protection of immunoincompetent young in marsupials39, is absent in monotremes. Syntenic analysis confirmed that this region is conserved in platypus but contains two copies of a new protein that contains a Kunitz domain (Extended Data Fig. 10h and Supplementary Table 53). The Kunitz family is a rapidly evolving family, and one of the new members could have a immunoprotective function similar to SPINT3 in monotremes.
The monotreme genomes contain most of the milk genes that have been identified in therian mammals38,40. Most mammals have three casein genes41, which encode the most abundant milk proteins secreted throughout lactation (Fig. 4c). In addition to these genes, monotremes have extra caseins that are not found in therian mammals, with unknown functions, an extra copy of CSN2 (CSN2B) (previously reported40) and CSN3 (CSN3B) in platypus (described here), which has the classic structure of CSN342 (Extended Data Fig. 10i and Supplementary Table 54).
All caseins are members of the secretory calcium-binding phosphoprotein (SCPP) gene family and are thought to have evolved from other SCPP genes, namely the teeth-related gene ODAM through its derivatives FDCSP and SCPPPQ142. As reported above (see ‘Eco-evolutionary adaptation of diet’), extant monotremes appear to have lost both ODAM and FDCSP. Syntenic analysis showed that the additional monotreme casein genes (CSN2B and CSN3B) are found in the same therian chromosomal region as ODAM and FDCSP and within the casein locus (Fig. 4c), providing further evidence that caseins evolved from odontogenic genes.
Summary
Complete and accurate reference genomes and annotations are critical for evolutionary and functional analyses. It remains a challenge to produce a highly accurate chromosome-level assembly, particularly for differentiated sex chromosomes. We have produced a high-quality platypus genome using a combination of single-molecule sequencing technology and multiple sources of physical mapping methods to assign most of the sequences to a chromosome-scale assembly. This permits better-resolved analyses of the origin and diversification of the complex sex chromosome system that evolved specifically in monotremes. We delineate ancient and lineage-specific changes in the sensory system, haemoglobin degradation and reproduction that represent some of the most fascinating biology of platypus and echidna. The new genomes of both species will enable further insights into therian innovations and the biology and evolution of these extraordinary egg-laying mammals.
Methods
Data reporting
No statistical methods were used to predetermine sample size. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment.
Ethics and sample collection
Pmale08, Pmale09 and Emale12 were collected under AEC permits S-49-2006, S-032-2008 and S-2011-146 at Upper Barnard River (New South Wales, Australia) during the breeding season. Emale01 was collected under San Diego Zoo Global IACUC approval 18-024 and vouched at San Diego Natural History Museum.
Sequencing and assembling
Skeletal muscle of Pmale09 was used for PacBio, 10X and BioNano genome sequencing and the liver of Pmale09 was used for Hi-C (Phase Genomics); the liver of Pmale08 was used for Chicago Hi-C (Dovetail Genomics). Heart muscle of Emale01 was used for a variety of library construction and Illumina sequencing analyses. Muscle of Emale12 was used for 10X and BioNano genome sequencing and liver of Emale12 was used for Hi-C (Phase Genomics). Echidna RNA was extracted from brain, cerebellum, kidney, liver, testis and ovary and sequenced using a previously published procedure43. Platypus Y chromosome BAC isolation via hybridization was performed using a previous published procedure4 and sequenced with PacBio. The platypus genome was assembled following VGP assembly pipeline v.1.0. The echidna genome was assembled using Platanus44 (v.1.2.1) and followed by three steps of scaffolding in the order of 10X, BioNano and Hi-C. Manual curation was performed for both assemblies. Details are available in the Supplementary Methods.
Sex-borne sequence identification
Female and male reads were mapped to the genome using BWA ALN45 (v.0.7.12). The read depth of each sex was calculated in 5-kb non-overlapping windows to identify X-borne sequences and 2-kb non-overlapping windows to identify Y-borne sequences, normalized against the median depth. To identify X-borne sequences, we calculated the female-to-male (F/M) depth ratio of regions that were covered by both sexes in each scaffold, requiring a minimum coverage of 80%, and assigned sequences to X-borne if the depth ratio ranged between 1.5 and 2.5. To identify Y-borne sequences, we calculated the F/M depth ratio as well as the F/M coverage ratio and assigned scaffolds to Y-borne if either ratio was within the range of 0.0–0.3. Parameter evaluation details are available in the Supplementary Methods.
Chromosome anchoring
We collected 75 BAC and 179 marker genes (Supplementary Table 3) and ordered them according to their relative order from those papers. Protein sequences of the gene markers were compared to mOrnAna1 using TBLASTN and the best hit was kept, after which the markers were analysed using GeneWise46 (v.2.4.1) to obtain the location within a scaffold. BAC-end reads were mapped to the assembly using BWA MEM47 (v.0.7.12) and the best hits were kept. We also used the anchored sequences of OANA5 except for the sequence of chromosome 14 to anchor the scaffolds into chromosomes. Scaffolds were orientated and ordered first based on the order of FISH or gene markers then on the order in OANA5. All identified PARs were included in chromosome X. We collected assembled Y contigs from a previously published study4 and generated some Y-BAC PacBio sequencing data. Assembled Y contigs were mapped to the platypus assembly using BWA MEM and Y-BAC PacBio reads were mapped using minimap2 (v.2.13)48. As evidence of both Y2 and Y3 were found on scaffold_229_arrow_ctg1 and scaffold_269_arrow_ctg1 and the covered regions overlapped, these two scaffolds were excluded from the chromosome Y classification. Classified Y-borne scaffolds failed to anchor and orient due to the lack of information. We also curated and anchored some echidna X-borne scaffolds to chromosome X based on Mashmap49 (v.2.0) one-to-one results with platypus50.
Annotation
We identified repetitive elements in both assemblies using the same pipeline, which included homologue-based and de novo prediction. For the homology-based method, we used default repeat library from Repbase (v.21.11)51 for RepeatMasker (v.4.0.6)52, trf (v.4.07)53 and Proteinmasker (v.4.0.6)52 to annotate. For the de novo method, we first ran RepeatModeler (v.1.0.8) to construct the consensus sequence library for each monotreme using their genome as input, then aligned the genome against each consensus library to identify repeats using RepeatMasker. Gene annotation was performed by merging the homology, de novo prediction and transcriptome analyses to build a consensus gene set of each species. Protein sequences from human, mouse, opossum, platypus, chicken and green lizard (Anolis carolinensis) from Ensembl54 (release 87) were aligned to the genome using TBLASTN55 (v.2.2.26) (e < 1 × 10−5). Candidate gene regions were refined using GeneWise for more accurate gene models. We randomly selected 1,000 high score homology-based genes to train Augustus56 (v.3.0.3) for de novo prediction on a repeat N-masked genome. We also mapped RNA-sequencing reads of the platypus from a previously published study57 and echidna to their respective assemblies using HISAT258 (v.2.0.4), and constructed transcripts using stringTie59 (v.1.2.3). Results from these three methods were merged into a nonredundant gene set. Possible retrogenes were filtered according to their hit to SwissProt database60 (release 2015_12) or Iprscan61 (v.5.16-55.0). We used the SwissProt database (e < 1 × 10−5) to annotate the function of the genes. Iprscan was used to annotate the GO of genes. Detailed descriptions of the manual annotation, curation and phylogenetic analysis of genes related to imprinting, immune system, reproduction and haemoglobin degradation can be found in the Supplementary Methods.
Gap analysis
We identified gap-filling regions using an alignment-based strategy similar to a previously published study62. We considered gaps for which both flanking regions mapped to mOrnAna1 as closed gaps. Only properly closed gaps defined by (1) both flanking regions were aligned but did not overlap and (2) closed gap size were within 100 times the estimated gap size in OANA5 were considered for repeat and gene improvement analysis.
Redundant sequences analysis
We performed two rounds of Mashmap with parameters ‘-f one-to-one -s 2000’ using mOrnAna1 as reference and OANA5 as query. A one-to-one relationship was obtained in the first round of Mashmap. In the second round of Mashmap, those OANA5 sequences that were unmapped in the first round of mapping were used as query. Candidate redundant sequences were obtained from the second round Mashmap result, but excluded regions that were gaps in OANA5. Female and male reads were then mapped to OANA5 and mOrnAna1 using BWA ALN and normalized by the mode depth.
Gene set comparison
We performed LASTZ63 (v.1.04.00) alignment using OANA5 as reference with parameter set ‘--hspthresh=4500 --gap=600,150 --ydrop=15000 --notransition --format=axt’ and a score matrix for the comparison of closely related species to generate a chain file for gene location liftover from OANA5 to mOrnAna1. Gene coordinates in OANA5 were first converted to mOrnAna1 using in-house-generated scripts with the chain file. We searched for overlap between the converted OANA5 gene set and mOrnAna1 gene set. Fragmented genes were defined as multiple converted OANA5 genes that overlapped with a single mOrnAna1 gene. A one-to-one gene pair between the two gene sets was defined as the liftover of the OANA5 gene when it overlapped with only one mOrnAna1 gene. Only one-to-one pairs were used for the comparison of open-reading frame completeness. We defined a gene as having a complete open-reading frame if its first codon is a start codon and the last codon is a stop codon.
Identification of one-to-one orthologues and synteny blocks between the human sequence and sequences of other species
We defined one-to-one orthologues between the human sequence and the sequences of other species by considering both reciprocal best BLASTP hits (RBH) and synteny, taking the human sequence as reference, as previously described64. First, we conducted BLASTP for all protein sequences from human and other species including mouse, opossum, platypus, echidna, chicken and green lizard with an e-value cut-off of 1 × 10−7, and combined local alignments with the SOLAR (http://treesoft.svn.sourceforge.net/viewrc/treesoft/). Next, we identified RBH orthologues between human and every other species on the basis of the following parameters: alignment score, alignment rate and identity. From these RBH orthologues, we retained those pairs with conserved synteny across species. Synteny was determined based on their flanking genes. If RBH orthologous gene pairs shared the same flanking genes, we retained the genes for downstream analyses. Finally, we merged pairwise orthologue lists according to the human coordinates. In this way, we produced the final one-to-one orthologue set across species.
We used the human genome as the reference and aligned it with other species using LASTZ with parameter set ‘--hspthresh=4500 --gap=600,150 --ydrop=15000 –notransition --format=axt’ and a score matrix for the comparison of closely related species. Alignments were converted into ‘chain’ and ‘net’ results with different levels of alignment scores using utilities of the UCSC Genome Browser (http://genomewiki.ucsc.edu/index.php/), and the pairwise synteny blocks between genomes of each species and the human genome were extracted according to the net result. Only alignments larger than 10 kb were kept. The synteny blocks were further cleaned of overlapping genes. N50 and the total length of the synteny block inferred from each human–species pair were calculated based on the human coordinates.
Phylogenetic analysis
The phylogenetic tree was constructed using concatenated four-degenerated sites from the 7,946 one-to-one orthologues using RaxML65 (v.8.2.4) with parameter set ‘-m GTRCAT -# 100 -p 12345 -x 12345 -f a’ and chicken and green lizard were specified as the outgroup. MCMCtree in PAML66 (v.4.7) was used to estimate divergence time of each species with calibration points obtained from a previously published study67 using the same data. Points and time range included the most recent common ancestor of human–mouse, 85–94 million years ago; human–opossum 150–167 million years ago; human–platypus, 163–191 million years ago, human–chicken, 297–326 million years ago, anole–chicken, 276–286 million years ago. The seed used for MCMC was 1192664277.
Substitution rate analysis
We first performed pairwise whole-genome LASTZ alignment using 12 mammals (Macaca mulatta, Tupaia belangeri, Mus musculus, Canis lupus familiaris, Myotis lucifugus, Bos taurus, Sorex araneus, Loxodonta africana, Dasypus novemcinctus, Monodelphis domestica, O. anatinus and T. aculeatus) with the human genome as the reference genome, with the parameter set ‘--step=19 --hspthresh=2200 -inner=2000 --ydrop=3400 --gappedthresh=10000 --format=axt’ and a score matrix for the comparison of distantly related species. Pairwise alignments were merged using MULTIZ68 (v.11.2). The four-degenerated site alignment was extracted based on the human gene set (Ensembl release 87), concatenated and fed to phyloFit in the PHAST package69 (v.1.5) for the calculation of branch lengths (substitution per site). The substitution rate was calculated by dividing the branch length to the mammalian common ancestor to the mammal–reptile divergence time.
Gene family analysis
Gene families across the seven species were generated using orthoMCL70 (v.2.0.9) with BLASTP results (e < 1 × 10−7) and was fed to CAFÉ71 (v.4.2) along with the phylogenetic tree. We first estimated the assembly error by excluding families with more than 100 members. Then the estimated rate was used to infer the family size at every node for each family. The ancestral node gene number of families with more than 100 members among extant species were inferred separately. We extracted genes based on the human gene set for GO enrichment (χ2 test) of the significantly expanded family (Viterbi P < 0.05) for the mammalian ancestor. A false-discovery rate (FDR) adjustment was used for multiple-test corrections in GO enrichment analyses.
Mammalian-specific highly conserved element analysis
We used the same MULTIZ alignment of the substitution rate analysis and identified mammalian-specific highly conserved elements (MSHCEs) using a similar strategy as has previously been described72. At least 80% of species and at least one species in eutherians, marsupials and monotremes were required to be present in alignments. Type-I MSHCEs were defined as HCEs to which no outgroup could be aligned; type-II MSHCEs were HCEs that were significantly conversed (P < 0.01) in mammals compared to mammals + outgroup calculated using phyloP (Benjamini–Hochberg adjusted). We considered four sets of outgroup combinations: (1) green lizard only; (2) chicken only; (3) two reptiles and one frog; and (4) two reptiles, one frog and one fish, and only kept those that were significantly conserved in all four sets of statistical tests (Benjamini–Hochberg adjusted P < 0.01). Only elements ≥20 bp were kept for further analysis.
To annotate MSHCEs to possible functional elements, we used the human annotation (Ensembl release 87) as a reference and classified the elements into the coding sequence, 5′ and 3′ untranslated regions, non-coding RNA, pseudogene, intron, upstream 10-kb region (from start codon), downstream 10-kb region (from the stop codon) and intergenic regions, with the same hierarchical order if the regions overlapped. Genes located within the upstream or downstream 10-kb range of MSHCEs were considered to be MSHCE-associated genes, and ordered by the length of the element. The top-300 MSHCE-associated genes were used in the GO enrichment analysis (χ2 test, FDR-adjusted) and visualized using REVIGO73.
Mammalian karyotype reconstruction
We used pairwise LASTZ alignments of the opossum, Tasmanian devil, platypus, chicken and common wall lizard (Podarcis muralis) genomes to the human genome as input. Echidna was not used here as most of the sequences were not anchored to chromosomes, which would lead to a more fragmented reconstruction. With the net and chain results, conserved segments that were uniquely and universally presented in all six species were obtained using inferCARs74 (release 2006-Jun-16). Marsupial and therian ancestral karyotypes were inferred using ANGES75 (v.1.01) using the branch-and-bound algorithm, and the resulting continuous ancestral regions (CARs) were further reorganized based on the previously predicted configuration76 (Supplementary Tables 22, 23). We replaced the conserved segments of the human, opossum and Tasmanian devil genomes with those of the reconstructed therian ancestral karyotype and reconstructed marsupial ancestral karyotype using ANGES with the same parameters except setting the target reconstruction node to mammalian ancestor. We reorganized CARs on the basis of gene synteny among ingroups and outgroups inferred using MCScanX77 (release 08-05-2012), requiring that there is synteny across CARs in at least one ingroup–outgroup pair (Supplementary Tables 22, 23). Pairwise MCScanX was run among the six species with BLASTP (e < 1 × 10−7).
Rearrangement events in each lineage were inferred using GRIMM78 (v.2.1) by taking the karyotypes of the most recent ancestor and the child as input. The breakpoint number in each lineage was calculated on the basis of the output of GRIMM using an in-house-generated script, in which one breakpoint was counted in fission, two breakpoints were counted in translocation, and one or two breakpoints were counted in inversion, depending on whether the inversion happened at the end of the chromosome. Calculations were done using resolutions of 500 kb and 300 kb, and using the raw ANGES output and reorganized output, respectively (Supplementary Table 28). Differences in breakpoint rates compared to the average of all branches were tested as previously described79.
Gametologue identification
We used BLASTP to compare all Y-borne genes to all X-borne genes (e < 1 × 10−5) and kept the best hit for each Y-borne gene. Candidate gametologue pairs were further confirmed if both of the genes were mapped to the same gene in NCBI or the SwissProt database. Four gametologues (platypus AMHX and FEM1CX from OANA5, and SDHAY and HNRNPKY from ref. 14) were added as they were missing in mOrnAna1. Translated genes were aligned using PRANK80, filtered using Gblock81, and converted back into the alignment of the coding sequence. dS was calculated using codeml in PAML with ‘runmode=-2’.
Demarcate evolutionary strata
We aligned all platypus Y-borne scaffolds (N-masked) to all platypus X-borne sequences (N-masked), and aligned all echidna Y-borne scaffolds (N-masked) to all echidna X-borne sequences (N-masked), using LASTZ with the parameter set ‘--step=19 --hspthresh=2200 --inner=2000 --ydrop=3400 --gappedthresh=10000 --format=axt’ and a score matrix set for the comparison of distantly related species. On the basis of the net and ‘maf’ results, the identity of each alignment block was calculated in a 2-kb non-overlapped window and the aligned Y-borne sequences were oriented along the X chromosomes. Identity along X chromosomes was colour-coded for visualization.
Expression calculation
RNA-sequencing reads of platypus (SRP102989) and echidna were mapped to the genome using HISTA2. Uniquely mapped reads were used in the calculation and normalization of the reads per kilobase per million reads (RPKM) using DESeq82 (v.1.28.0) to generate an expression matrix for each species. For tissues that were available in both sexes, we computed the median RPKM of each X-borne gene, and computed its F/M RPKM ratio (requiring RPKM in both sexes to be ≥1) to determine dosage-compensation status. We used the median expression value in each tissue to calculate the tissue specificity index TAU83 for each gene. We defined tissue-specific expression as a gene that shows at least twofold higher expression in tissue with the highest expression than in any other tissue, the highest RPKM > 1 and TAU > 0.8.
Building genome-wide Hi-C interaction maps
Genome-wide interaction maps at a 100-kb resolution were generated for platypus, echidna and human (SRX641267) with HiC-Pro84 (v.2.10.0). For echidna, we only retained scaffolds >10 kb as the large number of short scaffolds would cause ICE normalization failure. The normalized sex chromosomes submatrix was extracted for quantification and plotting with ggplot2 (v.3.2.1). For human, we used the scaled homologous sequences of platypus for quantification and plotting.
Identification of TADs and CTCF-binding sites
HiC-Pro interaction maps were transformed to h5 format using hicConvertFormat and fed to hicFindTADs with the parameters ‘--outPrefix TAD --numberOfProcessors 32 --correctForMultipleTesting fdr’ to identify TADs with HiCExplorer85 (v.3.0). The human CTCF motif86 was used as a bait by fimo in MEME87 (v.4.12.0) to identify putative CTCF-binding sites. CTCF densities in every 100 kb non-overlapping sliding window along the platypus sex chromosomes or scaled homologous sequences of echidna, human and chicken were compared.
FISH
BACs were obtained from the Children’s Hospital Oakland Research Institute from the platypus BAC library CH236: CH236-775N6 13q2; CH236-97I3 15p1 and CUGI BAC/EST resource centre from the platypus BAC library Oa_Ab: Oa_Bb-155A12 autosomal (WSB1); Oa_Bb-145P09 Y2; Oa_Bb-397I21 Y3. The Super_Scaffold_40-specific probe was amplified from platypus genomic DNA. Gene ENSOANT00000009075.3 was amplified using primers GTCTAAAGACAAGTGTACATCTGTGAC and GTGACTTCTCTTGCGAACACAC. The 3.9-kb product was cloned into pGEM-T Easy (Promega). BAC probes were directly labelled with dUTP Alexa Fluor 594-dUTP, aminoallyl-dUTP-XX-ATTO-488 (Jena Bioscience) using the Nick Translation Kit (Roche Diagnostics) and the Super_Scaffold_40-specific probe labelled with biotin using the Biotin-Nick Translation Mix (1175824919, Roche Diagnostics). The FISH protocol was carried out on cultured fibroblasts from platypus (authenticated by karyotype, not mycoplasma tested) obtained from animals captured at the Upper Barnard River (New South Wales, Australia) during the breeding season (AEC permits S-49-2006, S-032-2008 and S-2011-146) as previously described88 with the following exceptions. Slides were denatured at 70 °C for 3 min in 70% formamide in 2× SSC, 1 mg DNA probe was used per slide, pre-annealing of repetitive DNA sequences was done at 37 °C for 30–60 min. Detection of biotin-labelled probes was done using Rhodamine Avidin D (Vector Laboratories, A-2002), goat Biotinylated anti-avidin D (Vector Laboratories, BA-0300) and Rhodamine Avidin D. Slides were blocked in 4 × SSC, 1% BSA fraction V, for 30 min at 37 °C. Rhodamine Avidin D and Biotinylated anti-avidin D and the second Rhodamine Avidin D were diluted in 4 × SSC, 1% BSA fraction V and were incubated on slides for 45 min at 37 °C, after each step washes were done in 4 × SSC, 4 × SSC, 0.1% triton, 4 × SSC at room temperature for 10 min each. Slides were mounted in VECTASHIELD with DAPI (Vector Laboratories, H-1200). Sample size was determined according to ref. 89, but was limited by material availability. Images were captured on a Nikon Ti Microscope using NIS-Elements AR 4.20.00 software and processed with ImageJ (v.2.0.0). Fisher’s exact test was performed with matrix containing mean of associated and non-associated cells from the three replicates. No blinding nor randomization was performed.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.
Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41586-020-03039-0.
Supplementary information
Acknowledgements
We thank members of BGI-Shenzhen, China National GeneBank and VGP, and P. Baybayan, R. Hall and J. Howard for help carrying out the sequencing of the platypus and echidna genomes, M. Asahara for discussion, and D. Charlesworth for comments. Work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB31020000), the National Key R&D Program of China (MOST) grant 2018YFC1406901, International Partnership Program of Chinese Academy of Sciences (152453KYSB20170002), Carlsberg foundation (CF16-0663) and Villum Foundation (25900) to G.Z. Q.Z. is supported by the National Natural Science Foundation of China (31722050, 31671319 and 32061130208), Natural Science Foundation of Zhejiang Province (LD19C190001), European Research Council Starting Grant (grant agreement 677696) and start-up funds from Zhejiang University. F.G., L.S.-W. and T.D. are supported by Australian Research Council (FT160100267, DP170104907 and DP110105396). M.B.R., J.C.F. and S.D.J. are supported by the Australian Research Council (LP160101728). We acknowledge the Kyoto University Research Administration Office for support and Human Genome Center, the Institute of Medical Science, the University of Tokyo for the super-computing resource for supporting T.H.’s research facilities. T.H. was financed by JSPS KAKENHI grant numbers 16K18630 and 19K16241 and the Sasakawa Scientific Research Grant from the Japan Science Society. The echidna RNA-sequencing analysis was supported by H.K.’s grant from the European Research Council (615253, OntoTransEvol). This work was supported by Guangdong Provincial Academician Workstation of BGI Synthetic Genomics No. 2017B090904014 (H.Y.), Robert and Rosabel Osborne Endowment, Howard Hughes Medical Institute (E.D.J.), Rockefeller University start-up funds (E.D.J.), Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health (A.R. and A.M.P.), Korea Health Technology R&D Project through the Korea Health Industry Development Institute HI17C2098 (A.R.). This work used the computational resources of BGI-Shenzhen and the NIH HPC Biowulf cluster (https://hpc.nih.gov). Animal icons are from https://www.flaticon.com/ (made by Freepik) and http://phylopic.org/.
Extended data figures and tables
Author contributions
G.Z., Q.Z. and F.G. initiated and designed the monotreme genome project with early input from J.G. G.Z., F.G., Q.L., T.D., B.H., J.M., O.F., E.D.J., O.A.R., H.K., P.D., J.K., T.P., H.Y. and J. Wang coordinated the project and were involved in the collection, extraction and sequencing of samples. A.M.P., A.R., G.Z., Y.Z., K.H., Y.S., J. Wood, Q.L., Q.Z., F.G. and L.S.-W. performed genome assembling, evaluation and chromosome assignment. G.Z., Y.Z., Z.S., Y.G. and F.G. performed the annotation and mammalian macro-evolutionary analysis including divergence time estimation, gene family and MSHCE analysis. G.Z., Y.Z., H.A.L., J.D. and Q.Z. performed ancestral karyotype analysis. Q.Z., J.L., Z.Z., F.G., F.P., G.Z., Y.Z., L.S.-W. and Y.G. performed analysis to sex chromosome and FISH validation. K.B., E.P., Y.C., Y.Z., D.S., Z.S., G.Z. and F.G. performed the immune gene analysis. Y.Z., J.C.F., M.B.R., Z.S. and G.Z. performed the tooth-related gene analysis. N.B., F.G. and Y.Z. performed the digestive gene analysis and PCR validation. T.H., H.S., M.N. and F.G. performed the chemosensory gene analysis. L.S.-W., F.G., Y.Z. and Z.S. performed the haemoglobin-degradation gene analysis. M.B.R., J.C.F. and S.J. performed the reproductive gene analysis. G.Z., Q.Z., F.G., Y.Z., L.S.-W., J.L., M.B.R., J.C.F., K.B., E.P., Y.C., D.S., N.B., F.P., T.H., H.A.L., J.D., A.M.P., A.R., K.H., J. Wood, O.F. and J.G. wrote the manuscript.
Data availability
The platypus whole-genome shotgun project has been deposited at GenBank (project accessions PRJNA489114 and PRJNA489115), CNSA (https://db.cngb.org/cnsa/) of CNGBdb (accession CNP0000130) and GenomeArk (https://vgp.github.io/genomeark-curated-assembly/Ornithorhynchus_anatinus/). The echidna whole-genome shotgun project has been deposited at GenBank (project accession PRJNA576333), CNSA of CNGBdb (accession CNP0000697) and GenomeArk at (https://vgp.github.io/genomeark/Tachyglossus_aculeatus/). Echidna RNA-sequencing data have been deposited at GenBank (project accession PRJNA591380) and CNSA of CNGBdb (accession CNP0000779). Public database used in this study include: NCBI (https://www.ncbi.nlm.nih.gov/), Ensembl (release 87) (http://dec2016.archive.ensembl.org/index.html), Uniprot (https://www.uniprot.org/) and Repbase (https://www.girinst.org/repbase/). Accession codes of genes are available in Supplementary Tables 31, 33, 37, 49, 51.
Code availability
In-house-generated scripts used in this study are shared on GitHub (https://github.com/ZhangLabSZ/MonotremeGenome).
Competing interests
J.K. is an employee of Pacific Biosciences, a company that develops single-molecule sequencing technologies.
Footnotes
Peer review information Nature thanks Janine Deakin, Rebecca Johnson and Hugues Roest Crollius for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Yang Zhou, Linda Shearwin-Whyatt, Jing Li
Contributor Information
Frank Grutzner, Email: frank.grutzner@adelaide.edu.au.
Qi Zhou, Email: zhouqi1982@zju.edu.cn.
Guojie Zhang, Email: guojie.zhang@bio.ku.dk.
Extended data
is available for this paper at 10.1038/s41586-020-03039-0.
Supplementary information
is available for this paper at 10.1038/s41586-020-03039-0.
References
- 1.Ashwell, K. Neurobiology of Monotremes: Brain Evolution in Our Distant Mammalian Cousins (CSIRO PUBLISHING, 2013).
- 2.Warren WC, et al. Genome analysis of the platypus reveals unique signatures of evolution. Nature. 2008;453:175–183. doi: 10.1038/nature06936. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Grützner F, et al. In the platypus a meiotic chain of ten sex chromosomes shares genes with the bird Z and mammal X chromosomes. Nature. 2004;432:913–917. doi: 10.1038/nature03021. [DOI] [PubMed] [Google Scholar]
- 4.Kortschak RD, Tsend-Ayush E, Grützner F. Analysis of SINE and LINE repeat content of Y chromosomes in the platypus, Ornithorhynchus anatinus. Reprod. Fertil. Dev. 2009;21:964–975. doi: 10.1071/RD09084. [DOI] [PubMed] [Google Scholar]
- 5.Boissinot S, Sookdeo A. The evolution of LINE-1 in vertebrates. Genome Biol. Evol. 2016;8:3485–3507. doi: 10.1093/gbe/evw247. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Phillips MJ, Bennett TH, Lee MS. Molecules, morphology, and ecology indicate a recent, amphibious ancestry for echidnas. Proc. Natl Acad. Sci. USA. 2009;106:17089–17094. doi: 10.1073/pnas.0904649106. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Bellott DW, et al. Mammalian Y chromosomes retain widely expressed dosage-sensitive regulators. Nature. 2014;508:494–499. doi: 10.1038/nature13206. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Whittington CM, et al. Defensins and the convergent evolution of platypus and reptile venom genes. Genome Res. 2008;18:986–994. doi: 10.1101/gr.7149808. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Julien P, et al. Mechanisms and evolutionary patterns of mammalian and avian dosage compensation. PLoS Biol. 2012;10:e1001328. doi: 10.1371/journal.pbio.1001328. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Rousselle M, Laverré A, Figuet E, Nabholz B, Galtier N. Influence of recombination and GC-biased gene conversion on the adaptive and nonadaptive substitution rate in mammals versus birds. Mol. Biol. Evol. 2019;36:458–471. doi: 10.1093/molbev/msy243. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Hinch AG, Altemose N, Noor N, Donnelly P, Myers SR. Recombination in the human pseudoautosomal region PAR1. PLoS Genet. 2014;10:e1004503. doi: 10.1371/journal.pgen.1004503. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Burt DW. Origin and evolution of avian microchromosomes. Cytogenet. Genome Res. 2002;96:97–112. doi: 10.1159/000063018. [DOI] [PubMed] [Google Scholar]
- 13.Dohm JC, Tsend-Ayush E, Reinhardt R, Grützner F, Himmelbauer H. Disruption and pseudoautosomal localization of the major histocompatibility complex in monotremes. Genome Biol. 2007;8:R175. doi: 10.1186/gb-2007-8-8-r175. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Cortez D, et al. Origins and functional evolution of Y chromosomes across mammals. Nature. 2014;508:488–493. doi: 10.1038/nature13151. [DOI] [PubMed] [Google Scholar]
- 15.Zhou Q, et al. Complex evolutionary trajectories of sex chromosomes across bird taxa. Science. 2014;346:1246338. doi: 10.1126/science.1246338. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Veyrunes F, et al. Bird-like sex chromosomes of platypus imply recent origin of mammal sex chromosomes. Genome Res. 2008;18:965–973. doi: 10.1101/gr.7101908. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Braasch I, et al. The spotted gar genome illuminates vertebrate evolution and facilitates human–teleost comparisons. Nat. Genet. 2016;48:427–437. doi: 10.1038/ng.3526. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Gruetzner F, Ashley T, Rowell DM, Marshall Graves JA. How did the platypus get its sex chromosome chain? A comparison of meiotic multiples and sex chromosomes in plants and animals. Chromosoma. 2006;115:75–88. doi: 10.1007/s00412-005-0034-4. [DOI] [PubMed] [Google Scholar]
- 19.Golczyk H, Massouh A, Greiner S. Translocations of chromosome end-segments and facultative heterochromatin promote meiotic ring formation in evening primroses. Plant Cell. 2014;26:1280–1293. doi: 10.1105/tpc.114.122655. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.de Waal Malefijt M, Charlesworth B. A model for the evolution of translocation heterozygosity. Heredity. 1979;43:315–331. [Google Scholar]
- 21.Casey AE, Daish TJ, Barbero JL, Grützner F. Differential cohesin loading marks paired and unpaired regions of platypus sex chromosomes at prophase I. Sci. Rep. 2017;7:4217. doi: 10.1038/s41598-017-04560-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Dixon JR, et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012;485:376–380. doi: 10.1038/nature11082. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Griffiths, M. The Biology of Monotremes (Academic, 1978).
- 24.Meredith RW, Zhang G, Gilbert MT, Jarvis ED, Springer MS. Evidence for a single loss of mineralized teeth in the common avian ancestor. Science. 2014;346:1254390. doi: 10.1126/science.1254390. [DOI] [PubMed] [Google Scholar]
- 25.Springer MS, et al. Odontogenic ameloblast-associated (ODAM) is inactivated in toothless/enamelless placental mammals and toothed whales. BMC Evol. Biol. 2019;19:31. doi: 10.1186/s12862-019-1359-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Ordoñez GR, et al. Loss of genes implicated in gastric function during platypus evolution. Genome Biol. 2008;9:R81. doi: 10.1186/gb-2008-9-5-r81. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Hayakawa T, Suzuki-Hashido N, Matsui A, Go Y. Frequent expansions of the bitter taste receptor gene repertoire during evolution of mammals in the Euarchontoglires clade. Mol. Biol. Evol. 2014;31:2018–2031. doi: 10.1093/molbev/msu144. [DOI] [PubMed] [Google Scholar]
- 28.Johnson RN, et al. Adaptation and conservation insights from the koala genome. Nat. Genet. 2018;50:1102–1111. doi: 10.1038/s41588-018-0153-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Liu Z, et al. Dietary specialization drives multiple independent losses and gains in the bitter taste gene repertoire of Laurasiatherian mammals. Front. Zool. 2016;13:28. doi: 10.1186/s12983-016-0161-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Hunnicutt KE, et al. Comparative genomic analysis of the pheromone receptor class 1 family (V1R) reveals extreme complexity in mouse lemurs (genus, Microcebus) and a chromosomal hotspot across mammals. Genome Biol. Evol. 2020;12:3562–3579. doi: 10.1093/gbe/evz200. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Johansen K, Lenfant C, Grigg GC. Respiratory properties of blood and responses to diving of platypus Ornithorhynchus anatinus (Shaw) Comp. Biochem. Physiol. 1966;18:597–608. doi: 10.1016/0010-406x(66)90243-x. [DOI] [PubMed] [Google Scholar]
- 32.Alayash AI. Haptoglobin: old protein with new functions. Clin. Chim. Acta. 2011;412:493–498. doi: 10.1016/j.cca.2010.12.011. [DOI] [PubMed] [Google Scholar]
- 33.Wicher KB, Fries E. Haptoglobin, a hemoglobin-binding plasma protein, is present in bony fish and mammals but not in frog and chicken. Proc. Natl Acad. Sci. USA. 2006;103:4168–4173. doi: 10.1073/pnas.0508723103. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Redmond AK, et al. Haptoglobin is a divergent masp family member that neofunctionalized to recycle hemoglobin via CD163 in mammals. J. Immunol. 2018;201:2483–2491. doi: 10.4049/jimmunol.1800508. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Huttenlocker AK, Farmer CG. Bone microvasculature tracks red blood cell size diminution in Triassic mammal and dinosaur forerunners. Curr. Biol. 2017;27:48–54. doi: 10.1016/j.cub.2016.10.012. [DOI] [PubMed] [Google Scholar]
- 36.Schaer DJ, et al. CD163 is the macrophage scavenger receptor for native and chemically modified hemoglobins in the absence of haptoglobin. Blood. 2006;107:373–380. doi: 10.1182/blood-2005-03-1014. [DOI] [PubMed] [Google Scholar]
- 37.Griffiths, M. Echidnas (Pergamon, 1968).
- 38.Brawand D, Wahli W, Kaessmann H. Loss of egg yolk genes in mammals and the origin of lactation and placentation. PLoS Biol. 2008;6:e63. doi: 10.1371/journal.pbio.0060063. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Pharo EA, et al. The mammary gland-specific marsupial ELP and eutherian CTI share a common ancestral gene. BMC Evol. Biol. 2012;12:80. doi: 10.1186/1471-2148-12-80. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Lefèvre CM, Sharp JA, Nicholas KR. Characterisation of monotreme caseins reveals lineage-specific expansion of an ancestral casein locus in mammals. Reprod. Fertil. Dev. 2009;21:1015–1027. doi: 10.1071/RD09083. [DOI] [PubMed] [Google Scholar]
- 41.Holt C, Carver JA, Ecroyd H, Thorn DC. Invited review: Caseins and the casein micelle: their biological functions, structures, and behavior in foods. J. Dairy Sci. 2013;96:6127–6146. doi: 10.3168/jds.2013-6831. [DOI] [PubMed] [Google Scholar]
- 42.Kawasaki K, Lafont AG, Sire JY. The evolution of milk casein genes from tooth genes before the origin of mammals. Mol. Biol. Evol. 2011;28:2053–2061. doi: 10.1093/molbev/msr020. [DOI] [PubMed] [Google Scholar]
- 43.Cardoso-Moreira M, et al. Gene expression across mammalian organ development. Nature. 2019;571:505–509. doi: 10.1038/s41586-019-1338-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Kajitani R, et al. Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads. Genome Res. 2014;24:1384–1395. doi: 10.1101/gr.170720.113. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Birney E, Clamp M, Durbin R. Genewise and genomewise. Genome Res. 2004;14:988–995. doi: 10.1101/gr.1865504. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Li H, Durbin R. Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics. 2010;26:589–595. doi: 10.1093/bioinformatics/btp698. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094–3100. doi: 10.1093/bioinformatics/bty191. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Jain C, Koren S, Dilthey A, Phillippy AM, Aluru S. A fast adaptive algorithm for computing whole-genome homology maps. Bioinformatics. 2018;34:i748–i756. doi: 10.1093/bioinformatics/bty597. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Rens W, et al. The multiple sex chromosomes of platypus and echidna are not completely identical and several share homology with the avian Z. Genome Biol. 2007;8:R243. doi: 10.1186/gb-2007-8-11-r243. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Bao W, Kojima KK, Kohany O. Repbase update, a database of repetitive elements in eukaryotic genomes. Mob. DNA. 2015;6:11. doi: 10.1186/s13100-015-0041-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Chen N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics. 2004;5:4.10.1–4.10.14. doi: 10.1002/0471250953.bi0410s05. [DOI] [PubMed] [Google Scholar]
- 53.Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999;27:573–580. doi: 10.1093/nar/27.2.573. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Yates A, et al. Ensembl 2016. Nucleic Acids Res. 2016;44:D710–D716. doi: 10.1093/nar/gkv1157. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J. Mol. Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2. [DOI] [PubMed] [Google Scholar]
- 56.Stanke M, Schöffmann O, Morgenstern B, Waack S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics. 2006;7:62. doi: 10.1186/1471-2105-7-62. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.Brawand D, et al. The evolution of gene expression levels in mammalian organs. Nature. 2011;478:343–348. doi: 10.1038/nature10532. [DOI] [PubMed] [Google Scholar]
- 58.Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods. 2015;12:357–360. doi: 10.1038/nmeth.3317. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Pertea M, et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 2015;33:290–295. doi: 10.1038/nbt.3122. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.UniProt Consortium UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 2019;47:D506–D515. doi: 10.1093/nar/gky1049. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Jones P, et al. InterProScan 5: genome-scale protein function classification. Bioinformatics. 2014;30:1236–1240. doi: 10.1093/bioinformatics/btu031. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Bickhart DM, et al. Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome. Nat. Genet. 2017;49:643–650. doi: 10.1038/ng.3802. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Harris, R. S. Improved Pairwise Alignment of Genomic DNA. PhD thesis, Pennsylvania State Univ. (2007).
- 64.Zhang G, et al. Comparative genomics reveals insights into avian genome evolution and adaptation. Science. 2014;346:1311–1320. doi: 10.1126/science.1251385. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30:1312–1313. doi: 10.1093/bioinformatics/btu033. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 2007;24:1586–1591. doi: 10.1093/molbev/msm088. [DOI] [PubMed] [Google Scholar]
- 67.Benton MJ, et al. Constraints on the timescale of animal evolutionary history. Palaeontol. Electronica. 2015;18:1–106. [Google Scholar]
- 68.Blanchette M, et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004;14:708–715. doi: 10.1101/gr.1933104. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.Hubisz MJ, Pollard KS, Siepel A. PHAST and RPHAST: phylogenetic analysis with space/time models. Brief. Bioinform. 2011;12:41–51. doi: 10.1093/bib/bbq072. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 70.Li L, Stoeckert CJ, Jr, Roos DS. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003;13:2178–2189. doi: 10.1101/gr.1224503. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71.Han MV, Thomas GW, Lugo-Martinez J, Hahn MW. Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3. Mol. Biol. Evol. 2013;30:1987–1997. doi: 10.1093/molbev/mst100. [DOI] [PubMed] [Google Scholar]
- 72.Seki R, et al. Functional roles of Aves class-specific cis-regulatory elements on macroevolution of bird-specific features. Nat. Commun. 2017;8:14229. doi: 10.1038/ncomms14229. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73.Supek F, Bošnjak M, Škunca N, Šmuc T. REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS ONE. 2011;6:e21800. doi: 10.1371/journal.pone.0021800. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74.Ma J, et al. Reconstructing contiguous regions of an ancestral genome. Genome Res. 2006;16:1557–1565. doi: 10.1101/gr.5383506. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 75.Jones BR, Rajaraman A, Tannier E, Chauve C. ANGES: reconstructing ANcestral GEnomeS maps. Bioinformatics. 2012;28:2388–2390. doi: 10.1093/bioinformatics/bts457. [DOI] [PubMed] [Google Scholar]
- 76.Deakin JE, et al. Reconstruction of the ancestral marsupial karyotype from comparative gene maps. BMC Evol. Biol. 2013;13:258. doi: 10.1186/1471-2148-13-258. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77.Wang Y, et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 2012;40:e49. doi: 10.1093/nar/gkr1293. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78.Tesler G. GRIMM: genome rearrangements web server. Bioinformatics. 2002;18:492–493. doi: 10.1093/bioinformatics/18.3.492. [DOI] [PubMed] [Google Scholar]
- 79.Kim J, et al. Reconstruction and evolutionary history of eutherian chromosomes. Proc. Natl Acad. Sci. USA. 2017;114:E5379–E5388. doi: 10.1073/pnas.1702012114. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 80.Löytynoja A. Phylogeny-aware alignment with PRANK. Methods Mol. Biol. 2014;1079:155–170. doi: 10.1007/978-1-62703-646-7_10. [DOI] [PubMed] [Google Scholar]
- 81.Talavera G, Castresana J. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst. Biol. 2007;56:564–577. doi: 10.1080/10635150701472164. [DOI] [PubMed] [Google Scholar]
- 82.Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11:R106. doi: 10.1186/gb-2010-11-10-r106. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83.Yanai I, et al. Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification. Bioinformatics. 2005;21:650–659. doi: 10.1093/bioinformatics/bti042. [DOI] [PubMed] [Google Scholar]
- 84.Servant N, et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 2015;16:259. doi: 10.1186/s13059-015-0831-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 85.Ramírez F, et al. High-resolution TADs reveal DNA sequences underlying genome organization in flies. Nat. Commun. 2018;9:189. doi: 10.1038/s41467-017-02525-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 86.Jolma A, et al. DNA-binding specificities of human transcription factors. Cell. 2013;152:327–339. doi: 10.1016/j.cell.2012.12.009. [DOI] [PubMed] [Google Scholar]
- 87.Bailey TL, Elkan C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol. 1994;2:28–36. [PubMed] [Google Scholar]
- 88.Tsend-Ayush E, et al. Higher-order genome organization in platypus and chicken sperm and repositioning of sex chromosomes during mammalian evolution. Chromosoma. 2009;118:53–69. doi: 10.1007/s00412-008-0177-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 89.Ling JQ, et al. CTCF mediates interchromosomal colocalization between Igf2/H19 and Wsb1/Nf1. Science. 2006;312:269–272. doi: 10.1126/science.1123191. [DOI] [PubMed] [Google Scholar]
- 90.Parra ZE, et al. Comparative genomic analysis and evolution of the T cell receptor loci in the opossum Monodelphis domestica. BMC Genomics. 2008;9:111. doi: 10.1186/1471-2164-9-111. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 91.Van Laere AS, Coppieters W, Georges M. Characterization of the bovine pseudoautosomal boundary: documenting the evolutionary history of mammalian sex chromosomes. Genome Res. 2008;18:1884–1895. doi: 10.1101/gr.082487.108. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
The platypus whole-genome shotgun project has been deposited at GenBank (project accessions PRJNA489114 and PRJNA489115), CNSA (https://db.cngb.org/cnsa/) of CNGBdb (accession CNP0000130) and GenomeArk (https://vgp.github.io/genomeark-curated-assembly/Ornithorhynchus_anatinus/). The echidna whole-genome shotgun project has been deposited at GenBank (project accession PRJNA576333), CNSA of CNGBdb (accession CNP0000697) and GenomeArk at (https://vgp.github.io/genomeark/Tachyglossus_aculeatus/). Echidna RNA-sequencing data have been deposited at GenBank (project accession PRJNA591380) and CNSA of CNGBdb (accession CNP0000779). Public database used in this study include: NCBI (https://www.ncbi.nlm.nih.gov/), Ensembl (release 87) (http://dec2016.archive.ensembl.org/index.html), Uniprot (https://www.uniprot.org/) and Repbase (https://www.girinst.org/repbase/). Accession codes of genes are available in Supplementary Tables 31, 33, 37, 49, 51.
In-house-generated scripts used in this study are shared on GitHub (https://github.com/ZhangLabSZ/MonotremeGenome).