Abstract
Ancient polyploidization events have had a lasting impact on vertebrate genome structure, organization and function. Some key questions regarding the number of ancient polyploidization events and their timing in relation to the cyclostome-gnathostome divergence have remained contentious. Here we generate de novo long-read-based chromosome-scale genome assemblies for the Japanese lamprey and elephant shark. Using these and other representative genomes and developing algorithms for the probabilistic macrosynteny model, we reconstruct high-resolution proto-vertebrate, proto-cyclostome and proto-gnathostome genomes. Our reconstructions resolve key questions regarding the early evolutionary history of vertebrates. First, cyclostomes diverged from the lineage leading to gnathostomes after a shared tetraploidization (1R) but before a gnathostome-specific tetraploidization (2R). Second, the cyclostome lineage experienced an additional hexaploidization. Third, 2R in the gnathostome lineage was an allotetraploidization event, and biased gene loss from one of the subgenomes shaped the gnathostome genome by giving rise to remarkably conserved microchromosomes. Thus, our reconstructions reveal the major evolutionary events and offer new insights into the origin and evolution of vertebrate genomes.
Subject terms: Molecular evolution, Genome evolution, Polyploidy
Early vertebrate genomes were shaped by multiple whole-genome duplication (WGD) events of debated timings. Here the authors’ reconstruction of ancestral genomes using the probabilistic macrosynteny model supports a WGD shared by all vertebrates and a gnathostome-specific WGD, and reveals evidence of a cyclostome-specific genome triplication.
Introduction
The emergence of morphologically complex vertebrates from invertebrate chordates is considered a major evolutionary transition that led to the emergence of more than 70,000 vertebrate species (http://vgpdb.snu.ac.kr/splist), including humans. The common ancestor of vertebrates that originated during the Lower Cambrian1 diverged to give rise to the two extant lineages of vertebrates, the cyclostomes (jawless vertebrates) and gnathostomes (jawed vertebrates). Cyclostomes are a monophyletic group2 comprising lampreys and hagfishes, while gnathostomes include cartilaginous fishes (Chondrichthyes, represented by chimaeras, sharks and rays) and bony vertebrates (Osteichthyes, represented by ray-finned fishes and lobe-finned fishes, including tetrapods). Cyclostomes are sometimes thought to be morphologically primitive as compared to gnathostomes as they lack hinged jaws, paired appendages and nostrils, mineralized tissue, and a discrete pancreas3–5. However, recent studies suggest that the lamprey and hagfish lineages independently acquired their seemingly simplified as well as their specialized morphology, and that the ancestral cyclostome already had a complex morphology and physiology distinct from the gnathostome lineage6,7. For example, although cyclostomes lack the major histocompatibility complex and immunoglobulin-based adaptive immune system (AIS) of gnathostomes, they have independently evolved somatically diversifying variable lymphocyte receptors for antigen recognition8.
Evolutionary innovations at the origin of vertebrates have been proposed to be the result of ancient tetraploidization events that generated additional copies of the entire genome9,10. This view is now widely accepted because genome-wide synteny and paralogy analyses11–15 have provided convincing evidence for two rounds of tetraploidization (known as 1R and 2R, respectively) during early vertebrate evolution (see for review refs. 16,17; see also refs. 18,19). However, the timing of 1R and 2R relative to the cyclostome–gnathostome divergence has remained contentious—a significant gap in our knowledge considering the important implications for the genetic basis of the shared and derived features of these two lineages. Previous studies have produced conflicting results supporting each of the three possibilities18–26, i.e. divergence occurring prior to 1R, between 1R and 2R, and after 2R (Fig. 1). This uncertainty has been further compounded by the discovery of six Hox clusters in both lampreys and hagfish19,22,27 compared to four clusters in most gnathostome lineages, suggesting the possibility of an additional tetraploidization or chromosome-scale segmental duplications in the cyclostome ancestor18,19,22.
Resolving these alternative scenarios using gene trees has proved to be challenging. This is partly due to the presence of multiple ‘ohnologues' (paralogous genes generated by polyploidy) created by successive rounds of tetraploidization; lineage-specific secondary losses of some ohnologues28,29; as well as the confounding effects of asymmetric evolutionary divergence between ohnologues30. The possibility of delayed rediploidization after a tetraploidization event28,31,32 has further complicated the interpretation of gene trees, as it uncouples gene duplication time from the divergence time of the ohnologues. In addition, the tendency of lamprey ohnologues to cluster outside gnathostome gene clades due to high GC-content and consequent codon bias22,26,33 has impeded the use of gene trees for determining the timing of 1R and 2R.
An alternative and more effective strategy for the identification of ancient polyploidy is the macrosynteny-based reconstruction of ancestral genomes12–15. In particular, this strategy has the potential to reveal chromosome fusion/fission events that occurred in the interval between 1R and 2R and/or after 2R13,15. Whether such genome rearrangements are shared by cyclostomes and gnathostomes would be potentially informative for determining the timing of the cyclostome–gnathostome divergence in relation to 1R and 2R. A prerequisite for such comparisons is the high-resolution reconstruction of the proto-cyclostome and the proto-gnathostome genomes which require high-quality, chromosome-scale genome assemblies from the most basal vertebrate lineages such as cyclostomes and cartilaginous fishes.
In the present study, we generate de novo chromosome-scale genome assemblies of a cyclostome, the Japanese lamprey (Lethenteron japonicum; also known as the Arctic lamprey Lethenteron camtschaticum), and a cartilaginous fish, the elephant shark (Callorhinchus milii), based on long single-molecule reads and chromatin conformation capture (Hi-C) data. These two species represent two crucial divergence points in the evolution of vertebrates (Fig. 1). We use our recently developed probabilistic macrosynteny model34 to reconstruct the proto-vertebrate and proto-cyclostome genomes. The major advantage of our method is that it has a high tolerance to reconstruction uncertainty caused by small-scale rearrangements that have accumulated over a long evolutionary time34. Using our strategy, we reconstruct the proto-cyclostome genome, in which we integrate information from the Japanese lamprey genome, the sea lamprey genome and the Pacific lamprey linkage markers19. In addition, using the elephant shark genome, we reconstruct the proto-gnathostome genome with a higher coverage of extant gnathostome genomes than previous reconstructions (including 19,343 human genes as compared to 12,137 human genes in ref. 13, and 8,434 human genes in ref. 15).
Our high-resolution reconstructions resolve the number and timing of polyploidization events during early vertebrate evolution and provide new insights into the genetic basis underlying evolutionary innovations during the origin of early vertebrates. In addition, our reconstructions serve as a reliable reference for accurate annotation of ohnologues, which will be especially important for ohnologues with low sequence similarity30 which are difficult to identify by standard approaches.
Results
Genome sequencing, assembly and annotation
We generated de novo chromosome-scale genome assemblies for elephant shark and Japanese lamprey using a combination of PacBio single-molecule real-time (SMRT) sequencing (68- and 87-fold coverage, respectively) and scaffolding aided by ‘Chicago’35 and Hi-C data (see Supplementary Note 1). The resultant genome assemblies of elephant shark and Japanese lamprey span 991 Mb (N50 contig, 1.6 Mb and N50 scaffold, 69 Mb) and 1.07 Gb (N50 contig, 1.6 Mb and N50 scaffold, 10.7 Mb), respectively. These assemblies contain a substantially higher proportion of repetitive sequences (42% and 50%, respectively) compared to the previous short-read assemblies of elephant shark (28%)36 and Japanese lamprey (21%)22, presumably due to the higher contiguity of the long-read assemblies. Using the MAKER pipeline (v2.31.8)37 with evidence-based and ab initio gene predictions, we predicted 18,747 protein-coding genes in the elephant shark genome assembly and 19,455 protein-coding genes in the Japanese lamprey genome assembly.
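For readers less familiar with the N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed from a list of contig or scaffold lengths. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the assemblies reported here.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the assembly is covered
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example with made-up lengths (bp); not real assembly data
toy_contigs = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_contigs)
```

In this toy case the two longest sequences (100 and 80) already cover more than half of the 300 total bases, so the N50 is 80.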
Reconstruction and validation of the proto-vertebrate genome
We reconstructed the proto-vertebrate genome structure by employing the probabilistic macrosynteny model34 and comparing the Japanese lamprey, sea lamprey (Petromyzon marinus)19, amphioxus (Branchiostoma floridae)14, and four gnathostome genomes including human, chicken38, spotted gar39 and elephant shark (see ‘Methods'). In our reconstruction procedure, the lamprey genomes were partitioned into segments of conserved macrosynteny (191 Japanese lamprey segments and 198 sea lamprey segments), where synteny breakpoints were detected using the Japanese lamprey, sea lamprey, elephant shark, spotted gar, chicken and human genomes. Then, our Bayesian inference algorithm reconstructed the proto-vertebrate genome, assuming that individual proto-vertebrate chromosomes (Pvcs) have distinct orthologue distributions over the lamprey segments34. The reconstructed proto-vertebrate genome comprises 18 putative chromosomes (designated as Pvc1–18, with Pvc18 exhibiting only weak macrosynteny conservation in the amphioxus genome) and is largely consistent with previous reconstructions with 17 putative chromosomes14,15 (Supplementary Fig. 3 and Supplementary Table 9 in Supplementary Note 3).
As a validation of our reconstruction we examined conserved macrosynteny with representative invertebrate and gnathostome genomes including the scallop Chlamys farreri40, the placozoan Trichoplax adhaerens41, human and elephant shark (Fig. 2, also see Supplementary Fig. 4). These lineages have been shown to possess relatively slow rates of genome structure evolution12,36,41–43. We therefore expect that a reliable reconstruction should show a highly non-random distribution of orthologues in these genomes. Indeed, we noted that the orthologues are not randomly scattered throughout the modern genomes, but are clustered into a small number of chromosomes (evident as concentration of blue dots in Fig. 2). The fact that we find strong macrosynteny conservation in these invertebrate genomes, which were not used in the proto-vertebrate reconstruction, supports the validity of our reconstruction and indicates that all 18 reconstructed chromosomes existed as separate chromosomes in early metazoan lineages.
Reconstruction of the proto-cyclostome chromosomes and evidence for sixfold duplication of the genome
The generation of a long-read-based high-quality genome assembly for the Japanese lamprey, in addition to the existing ‘hybrid’ genome assembly of the sea lamprey19, permitted us to investigate unresolved issues in cyclostome genome evolution. In particular, the evolutionary steps between the proto-vertebrate and proto-cyclostome genomes have remained contentious, even after the sequencing of the sea lamprey genome18,19,26. For example, the presence of six Hox clusters in two species of lampreys and the inshore hagfish could be due to more than two rounds of tetraploidization (S5 in Fig. 1); alternatively, they could be the result of a single tetraploidization event followed by chromosome duplication events (S8 in Fig. 1). Another possibility is that the cyclostome lineage experienced a hexaploidization event (whole-genome triplication) in addition to a tetraploidization (whole-genome duplication) event (S6 in Fig. 1). These alternative evolutionary models have been discussed in previous studies18,19,22,26 but remained unresolved even with the chromosome-level assembly of the sea lamprey genome.
In the present study, we have generated the first reconstruction of the proto-cyclostome genome by combining lamprey segments (described in the previous section) into 104 proto-cyclostome chromosomes (see ‘Methods' for details). Our algorithm enumerates possible combinations of lamprey segments (see Supplementary Movie 1), and reconstructs proto-cyclostome chromosomes by choosing the combination with the most significant (i.e. non-random) distribution of paralogues and orthologues (partly illustrated in Fig. 3a–c). Importantly, the algorithm explores all alternative models including segmental duplications, chromosome duplications/losses, tetraploidization and hexaploidization, under the assumption that duplicated chromosomes share significantly large numbers of paralogues. The major advantage of this reconstruction method is its robustness against lineage-specific rearrangements and fragmentation of genome assemblies. For example, Japanese lamprey Scaffold2 was partitioned into two segments (Fig. 3a) because each of the segments showed conserved synteny with two different sea lamprey scaffolds; in our reconstruction (Fig. 3b), the two segments on Scaffold2 were assigned to different proto-cyclostome chromosomes because they share a significantly large number of paralogues (dots in Fig. 3c). Thus, our reconstruction-based analysis is more reliable than scaffold-based analyses used in previous studies18,19,26 and provides the first opportunity to conclusively resolve the controversy over the origin of the proto-cyclostome genome.
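The assumption that duplicated chromosomes share significantly large numbers of paralogues can be illustrated with a simple enrichment test. The sketch below is not the reconstruction algorithm used in this study; it swaps in a plain hypergeometric tail test as a stand-in, and it assumes that each gene has at most one paralogue partner. All function names, gene identifiers and counts are hypothetical.

```python
from math import comb


def hypergeom_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    numerator = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(k, 0), upper + 1)
    )
    return numerator / comb(population, draws)


def shared_paralogue_pvalue(segment_a, segment_b, paralogue_of, n_genes_total):
    """Toy test: are paralogues of genes in segment_a over-represented in segment_b?"""
    # Genes in A that have an annotated paralogue partner anywhere in the genome
    with_partner = [g for g in segment_a if g in paralogue_of]
    # Of those, count the partners that fall inside segment B
    hits = sum(1 for g in with_partner if paralogue_of[g] in segment_b)
    return hypergeom_tail(hits, n_genes_total, len(segment_b), len(with_partner))


# Made-up example: 3 of the 4 genes in segment A have their paralogue in segment B
seg_a = {"a1", "a2", "a3", "a4"}
seg_b = {"b1", "b2", "b3", "b4", "b5"}
partners = {"a1": "b1", "a2": "b2", "a3": "b3", "a4": "x1"}
p_value = shared_paralogue_pvalue(seg_a, seg_b, partners, n_genes_total=20)
```

A small p-value for a pair of segments would be read, under these simplifying assumptions, as support for the two segments deriving from duplicated chromosomes; the actual method additionally weighs orthologue distributions and compares alternative combinations of segments.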
To distinguish between alternative polyploidization models (i.e. S5–S8 in Fig. 1), we followed ref. 13 and used a measure we have called multiplicity, i.e. the number of proto-cyclostome chromosomes originating from individual proto-vertebrate chromosomes (Fig. 3d), and counted the numbers of Japanese lamprey genes that map to these chromosomes. If the proto-cyclostome genome was shaped by three rounds of tetraploidization (S5 in Fig. 1), it should be covered by chromosomes of multiplicity eight. Instead, if it experienced a single tetraploidization with subsequent chromosomal duplications (S8 in Fig. 1), the multiplicity should peak at two with gradual decrease toward larger multiplicities. The third possibility is that if the genome went through a single tetraploidization and a hexaploidization (genome triplication) (S6 in Fig. 1) the majority of the genome should be covered by chromosomes of multiplicity six. Our analysis indicates that 9 out of the 18 proto-vertebrate chromosomes were duplicated into six paralogous proto-cyclostome chromosomes, and that the majority (60.3%) of the proto-cyclostome genome was covered by the sixfold duplicated chromosomes. In addition, we confirmed by statistical test (see ‘Methods') that the observed peak of multiplicity (Fig. 3d) is unlikely to have been created by accumulation of chromosome scale or segmental duplications after one () or two () tetraploidization events. Thus, the clear peak at multiplicity of six is compelling evidence of sixfold duplication of the entire genome, probably through a tetraploidization and a hexaploidization event.
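The multiplicity measure described above can be sketched as follows. This is a minimal illustration assuming that each proto-cyclostome chromosome has already been assigned to a single proto-vertebrate chromosome of origin; the identifiers (`Pcc…` for proto-cyclostome chromosomes), the function name and the gene counts are hypothetical and are not the values underlying Fig. 3d.

```python
from collections import Counter, defaultdict


def multiplicity_profile(origin_of, gene_count):
    """origin_of: proto-cyclostome chromosome -> proto-vertebrate chromosome of origin.
    gene_count: proto-cyclostome chromosome -> number of mapped lamprey genes.
    Returns (multiplicity per proto-vertebrate chromosome,
             fraction of genes covered by each multiplicity)."""
    # Multiplicity = number of proto-cyclostome chromosomes derived from each Pvc
    multiplicity = Counter(origin_of.values())
    genes_by_mult = defaultdict(int)
    for pcc, pvc in origin_of.items():
        genes_by_mult[multiplicity[pvc]] += gene_count[pcc]
    total = sum(genes_by_mult.values())
    coverage = {m: g / total for m, g in sorted(genes_by_mult.items())}
    return dict(multiplicity), coverage


# Made-up toy genome: PvcA has six descendants, PvcB four, PvcC two
origin_of = {}
gene_count = {}
for pvc, copies, genes in [("PvcA", 6, 10), ("PvcB", 4, 5), ("PvcC", 2, 10)]:
    for i in range(copies):
        pcc = f"Pcc_{pvc}_{i}"
        origin_of[pcc] = pvc
        gene_count[pcc] = genes

mult, coverage = multiplicity_profile(origin_of, gene_count)
```

In such a profile, three rounds of tetraploidization would be expected to concentrate coverage at multiplicity eight, a single tetraploidization followed by sporadic duplications at multiplicity two, and a tetraploidization plus a hexaploidization at multiplicity six.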
Although the current lamprey genomes might still be incomplete and some chromosomes might be fragmented, such limitations are unlikely to have substantially biased our analysis. First, if the proto-cyclostome genome was shaped by three rounds of tetraploidization, that would additionally require a large number of subsequent chromosome fusions to explain the current genome arrangement (e.g., 45 post-tetraploidization fusions are required to obtain the chromosome number of sea lamprey germline cells: 18 × 8 − 45 = 99). However, we found that the lamprey lineage had remarkably low rates of inter-chromosomal rearrangement (Supplementary Fig. 5) over ∼500 million years44 of cyclostome evolution. Specifically, our proto-cyclostome genome reconstruction shows large-scale fusions and translocations affecting only 22 out of 141 Japanese lamprey scaffolds and only 19 out of 151 sea lamprey scaffolds that have at least 10 genes. The exceptionally low rate of inter-chromosomal rearrangement and the haploid chromosome number of ∼99 in the germline sea lamprey genome45 are consistent with our evolutionary scenario in which the lamprey chromosome number is explained approximately as 18 × 6 = 108 with several subsequent fusions. Second, even though some tiny chromosomes might be missing in the current proto-cyclostome reconstruction, large chromosomes (e.g. Hox-bearing chromosomes duplicated from Pvc1) are unlikely to be missing entirely; therefore, our reconstruction is particularly reliable for the largest five proto-vertebrate chromosomes (i.e. Pvc1, 3, 10, 13 and 17), which consistently exhibited a multiplicity of six. Thus, the high coverage (60.3%) of the Japanese lamprey genome by sixfold duplicated proto-cyclostome chromosomes suggests that extant cyclostome genomes are paleo-dodecaploids (i.e. the chromosome number increased as 18 × 6 due to tetraploidization and hexaploidization), which might be similar to the situation in sturgeon where a species (Acipenser brevirostrum) with ∼180 chromosomes is considered to be a hexaploid of a tetraploid ancestor with ∼60 chromosomes46–48.
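The chromosome-number argument in this paragraph reduces to simple arithmetic, sketched below. The calculation assumes, as a simplification, that fusions are the only process changing chromosome number after polyploidization (fissions and chromosome losses are ignored) and treats the approximate germline count of 99 as exact; the constant and function names are illustrative.

```python
N_PROTO_VERTEBRATE = 18   # proto-vertebrate chromosomes in the reconstruction
GERMLINE_HAPLOID = 99     # approximate haploid number in sea lamprey germline cells


def fusions_required(fold_duplication):
    """Fusions needed to reduce 18 x fold chromosomes to the observed count,
    assuming fusions are the only events that change chromosome number."""
    return N_PROTO_VERTEBRATE * fold_duplication - GERMLINE_HAPLOID


eightfold = fusions_required(8)  # three rounds of tetraploidization: 144 - 99
sixfold = fusions_required(6)    # tetraploidization plus hexaploidization: 108 - 99
```

Under these assumptions the eightfold scenario needs 45 fusions, whereas the sixfold scenario needs only 9, which is easier to reconcile with the low rearrangement rate observed in the lamprey lineage.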
Proto-gnathostome genome and the origin of microchromosomes
Previous reconstructions of the proto-gnathostome genome12–15 included members of only bony vertebrates (Osteichthyes) and lacked representatives of its sister group, the cartilaginous fishes (Chondrichthyes). Here, we produced a substantially improved reconstruction of the proto-gnathostome genome structure with a higher coverage of modern genomes by taking advantage of our newly sequenced, chromosome-scale genome assembly of the elephant shark, in addition to the spotted gar, zebra finch, turkey, chicken, opossum, dog, mouse and human genomes (see ‘Methods', Fig. 4 and Supplementary Fig. 6). The reconstruction provided additional support for the previous finding of two rounds of tetraploidization between the proto-gnathostome and its invertebrate ancestor11–15.
Analysis of this proto-gnathostome genome also revealed the origin of microchromosomes found in some modern gnathostomes. Microchromosomes are tiny chromosomes (typically smaller than 20 Mb), characterized by high GC-content, high gene density and high recombination rate38,49. Although there are no microchromosomes in the human genome, they are present in other tetrapod lineages such as birds and reptiles. Whether microchromosomes were recently created by chromosome fission or were present in the gnathostome ancestor has been controversial (see Supplementary Note 4 for a short review). Although several recent studies supported the ancient origin of microchromosomes13,36,39,49–52, it was still unknown (1) whether the chromosomal features characteristic of modern avian microchromosomes (i.e. high GC-content, high gene density and high recombination rate) were already present in the ancestral gnathostome genome (cf. the chromosomal features were previously reported to be conserved between the spotted gar and chicken genomes39), and (2) why microchromosomes have been conserved in distantly related gnathostome species such as the chicken, spotted gar and elephant shark.
In the present study, our reconstruction shows that at least 15 proto-gnathostome chromosomes have remained intact as microchromosomes in some modern gnathostome genomes such as chicken, spotted gar and elephant shark (Supplementary Fig. 7) even after ∼450 million years of gnathostome evolution44,53. Furthermore, we observed that specific sequence features (namely, chromosome length and gene density) are shared by modern gnathostome chromosomal regions that were derived from such proto-gnathostome chromosomes (Fig. 5). First, the total length of segments originating from individual proto-gnathostome chromosomes is highly conserved in chicken, spotted gar and elephant shark, suggesting that the ancestral gnathostome already possessed the tiny microchromosomes and the large macrochromosomes (Fig. 5a). Second, smaller proto-gnathostome chromosomes tend to have higher gene densities in all species, suggesting that the ancestral gnathostome genome consisted of small chromosomes with high gene densities and large chromosomes with low gene densities (Fig. 5b). Third, smaller proto-gnathostome chromosomes tend to have higher ohnologue densities in individual species, suggesting that the ancestral gnathostome genome had small chromosomes with high ohnologue densities (Fig. 5c). These observations suggest that many of the proto-gnathostome chromosomes might have already exhibited distinctive features (e.g. diminutive chromosomes with high gene density) that are considered characteristics of avian microchromosomes38,49. Thus, the proto-gnathostome lineage might have already possessed many microchromosomes with high gene density, many of which are still retained in several modern gnathostome genomes due to low rates of inter-chromosomal rearrangement. On the other hand, macrochromosomes, large genome sizes and high rates of rearrangement are likely to be derived characteristics of lineages that experienced substantial expansion of repetitive sequences.
The persistence of intact microchromosomes in modern gnathostome genomes is intriguing, and raises questions about the possible mechanism and evolutionary forces maintaining them over such a long evolutionary time49. One possibility is the presence of a high density of genomic regulatory blocks (GRBs) comprising long-range interacting regulatory elements and/or topologically associating domains (TADs) that require long-range linkage to be maintained intact. To test this possibility we analysed the density of GRBs and TADs54 and observed no obvious difference between macrochromosomes and microchromosomes (see Supplementary Fig. 8 and Supplementary Note 4). An alternative possibility is that the persistent synteny conservation is a by-product of the small size and high gene density of microchromosomes49, which is corroborated by previous arguments that gene density and ohnologue density are major factors in decreasing the rates of evolutionary breakage55 and inter-chromosomal rearrangement56, respectively. Consistent with this hypothesis, we find evidence for high density of genes (including ohnologues, which were identified with the method described in Supplementary Note 2) in the proto-gnathostome chromosomes that gave rise to the modern microchromosomes (Fig. 5b, c).
Timing of gnathostome–cyclostome divergence relative to 1R and 2R
The timing of gnathostome–cyclostome divergence relative to the two basal vertebrate tetraploidization events (i.e. 1R and 2R) remains an unresolved issue in the field of vertebrate genome evolution. In order to resolve the divergence timing conclusively, we searched our reconstructions of the proto-vertebrate, proto-cyclostome and proto-gnathostome genomes for evidence of large-scale genomic changes that help distinguish between three alternative divergence models, i.e., divergence before 1R, between 1R and 2R, or after 2R. Our reconstructions revealed nine major fusion events that occurred during the interval between 1R and 2R (see Supplementary Note 3 and Supplementary Fig. 6), but none of these fusions is shared with the proto-cyclostome lineage (Supplementary Fig. 15b, c and Fig. 2), suggesting that the two lineages diverged before the chromosome fusion events and thus before 2R. Furthermore, the orthologue distribution between proto-gnathostome and proto-cyclostome chromosomes demonstrates four-to-six correspondence and a quasi-random gene retention pattern (Supplementary Fig. 15a). This lack of one-to-one or two-to-three orthology relationships indicates that the two lineages diverged shortly after 1R but before rediploidization.
In order to verify the timing of duplications and the gnathostome–cyclostome divergence, we performed a gene tree analysis by inserting lamprey genes into Ensembl gene trees or re-computing the gene trees (see Supplementary Note 5). Then, we classified human and lamprey paralogue pairs by their duplication timing and plotted vertebrate paralogues (i.e. paralogues duplicated before the gnathostome–cyclostome split), gnathostome-specific paralogues and cyclostome-specific paralogues on the proto-gnathostome and proto-cyclostome genomes (Supplementary Figs. 9–15). Intriguingly, we observed a mixture of vertebrate paralogues and cyclostome-specific paralogues between most pairs of homoeologous proto-cyclostome chromosomes, making it difficult to conclusively determine the duplication timing of individual chromosomes. This observation may be explained by (1) difficulties in gene tree inference due to the high GC-content and strong codon bias in the lamprey genomes22,26,33, (2) differential gene loss between cyclostome and gnathostome lineages29, (3) delayed rediploidization28,31,32 creating cyclostome-specific paralogues between proto-cyclostome chromosomes duplicated by 1R, and (4) tetraploidization through hybridization and doubling57–59, which may have created both vertebrate-specific and cyclostome-specific paralogues due to recurrent hybridization among genetically diverse subpopulations57,58 and subsequent genetic drift60. Although these factors may have obscured the duplication timing, the presence of chromosome pairs enriched either with vertebrate-specific paralogues or with cyclostome-specific paralogues is consistent with the model that the proto-cyclostome lineage diverged from the proto-gnathostome lineage shortly after 1R.
Inferred scenario of early vertebrate genome evolution
The findings described above can be brought together into a model describing the steps of early vertebrate genome evolution (Fig. 6). First, our reconstruction indicates that the proto-vertebrate genome (with 18 chromosomes) was similar in structure to the ancestral bilaterian animal genome, as suggested by the strong macrosynteny conservation between the proto-vertebrate and scallop genomes (Fig. 6a), e.g. Pvc2, 5, 6, 7, 9, 10, 16, 17 and 18 retain one-to-one correspondence with chr1, 11, 17, 8, 4, 7, 9, 3 and 13 in scallop, respectively. Second, our analysis suggests that the gnathostome–cyclostome divergence occurred shortly after 1R (Fig. 6b) but before rediploidization (Supplementary Fig. 15). This was followed by nine gnathostome-specific chromosome fusions, which were not shared with the proto-cyclostome lineage (Fig. 6c and Supplementary Fig. 6). Third, the 2R event in the proto-gnathostome was an allotetraploidization event, as our reconstruction shows biased gene loss/retention between duplicated chromosomes (Fig. 6e and Supplementary Note 4)61–63. Indeed, the ratio of retained genes between the two subgenomes in the proto-gnathostome genome is 2.25, which is considerably larger than previously reported ratios of paleo-allopolyploids: 1.47 for Brassica, 1.46 for maize, 1.24 for sorghum, 1.17 for Arabidopsis and 1.35 for Xenopus laevis61,64. A comparison with the modern gnathostome genomes (Fig. 6f and Supplementary Fig. 7) shows that a pair of chromosomes duplicated by 2R typically gave rise to a large chromosome (dashed lines) and a microchromosome (solid lines) in elephant shark and chicken, which suggests that the proto-gnathostome ancestor already possessed microchromosomes as a result of biased fractionation between the subgenomes. (A paper published after the submission of this manuscript suggested a similar evolutionary scenario65.) Fourth, we present evidence that there was a cyclostome-specific hexaploidization (Fig. 6g) that gave rise to the proto-cyclostome genome with 18 × 2 × 3 chromosomes, most of which are still retained in the modern lamprey genomes with ∼99 chromosomes45 due to remarkably low rates of inter-chromosomal rearrangement.
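The ratio of retained genes between subgenomes cited in this section can be illustrated with a short calculation. The sketch below assumes that each pair of 2R-duplicated chromosomes has already been assigned to the two subgenomes (the assignment procedure itself is described in Supplementary Note 4 and is not reproduced here); the function name and gene counts are made up for illustration and do not correspond to the reconstructed proto-gnathostome chromosomes.

```python
def retention_ratio(pairs):
    """pairs: iterable of (genes_in_subgenome_1, genes_in_subgenome_2),
    one tuple per pair of chromosomes duplicated by the same event.
    Returns total retained genes in subgenome 1 divided by subgenome 2."""
    total_1 = sum(a for a, _ in pairs)
    total_2 = sum(b for _, b in pairs)
    if total_2 == 0:
        raise ValueError("subgenome 2 retains no genes; ratio is undefined")
    return total_1 / total_2


# Made-up gene counts for two duplicated chromosome pairs
toy_pairs = [(100, 50), (60, 30)]
toy_ratio = retention_ratio(toy_pairs)  # 160 / 80
```

A ratio close to 1 would indicate unbiased gene loss, as expected after autopolyploidy, whereas a ratio well above 1 indicates biased fractionation of the kind reported for allopolyploids.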
Discussion
To the best of our knowledge our reconstruction is the first reported genome-scale evidence for hexaploidy in the cyclostome lineage. There are several documented examples of hexaploidy giving rise to new evolutionary lineages. Perhaps the most well-known example is wheat, a domesticated crop with three subgenomes (A, B and D). The formation of hexaploid wheat is believed to be a multi-step process where there was an initial tetraploid genome formed by hybridization, and a subsequent hybridization of the tetraploid with a diploid, generating a hexaploid66. Hexaploidy has also been shown in early dicot plant evolution67, the shortnose sturgeon (Acipenser brevirostrum)46, and in the Prussian carp (Carassius gibelio)68,69. In most instances the mechanism of hexaploidy origin has been inferred to have been by serial hybridizations46,66.
These instances suggest that genome hybridization may have played a significant role in the origin and evolution of early vertebrates, as already discussed in previous studies9,70. A few possible mechanisms have been suggested for explaining the establishment of allopolyploid species. First, heterosis, or hybrid vigour, confers selective advantages to the newly formed allopolyploids71,72. Second, asymmetric and unequal contribution from the subgenomes has been reported to have facilitated the evolution of complex phenotypes in some allopolyploids: for example, it was suggested that the allotetraploid cotton produces high-quality fibres by combination of long fibres from the A-genome and short fibres from the D-genome73; in the allohexaploid wheat, the A-genome is responsible for the morphological traits, while the B- and D-genomes contain most genes for response to biotic and abiotic factors74. In line with this argument, previous studies have shown an example of asymmetric contribution from quadruple paralogous regions in the human genome75. Our reconstruction suggests that such asymmetric evolution may not be limited to specific gene clusters but is a genome-scale phenomenon due to the hybrid origin of the allopolyploid proto-gnathostome genome.
In particular, our reconstruction suggests that genome hybridization might have contributed to the origin of the adaptive immune system (AIS), which is a prime example of a major evolutionary innovation in early vertebrates. The human AIS is an intricate defence system characterized by the B cell and T cell receptors and the major histocompatibility complex (MHC), which are highly conserved throughout most gnathostomes, including cartilaginous fishes, but are missing in invertebrates, including the closest relatives of vertebrates, such as sea squirts and amphioxus76,77. The seemingly abrupt emergence of such a complex molecular machinery of AIS has been described as an evolutionary ‘Big Bang’ triggered by macroevolutionary events, including the two rounds of tetraploidization76,78–80.
Of particular interest with regard to the origin of the AIS is a previous hypothesis on the evolution of highly polymorphic genes encoded in the MHC, natural killer gene complex (NKC) and leucocyte receptor complex (LRC), which are essential for the mammalian AIS. It has been proposed that the precursors of MHC, NKC and LRC were physically linked on the proto-MHC chromosome, and the tight linkage facilitated co-evolution of highly polymorphic receptors and ligands, giving rise to a putative ‘immune supercomplex’ that was subsequently fragmented as MHC, NKC and LRC in the human genome76,78–81. However, our reconstruction shows that the hypothesized proto-MHC chromosome was an artefact due to inter 1R–2R chromosome fusions and additional rearrangements in the mammalian lineages. In our reconstruction, the putative proto-MHC chromosome is divided into several proto-vertebrate chromosomes (i.e. Pvc5, 11, 13, 14, 15, 17 and 18) that are clearly separated in the lamprey, amphioxus and scallop genomes (Fig. 2), and the MHC, NKC and LRC gene clusters are derived from these distinct proto-vertebrate chromosomes including Pvc5, 15 and 17.
Intriguingly, our reconstruction shows that MHC, NKC and LRC were located on microchromosomes in the proto-gnathostome genome (i.e. Pgc38, 12 and 27 in Supplementary Data 1). This observation suggests that the post-1R tetraploid species might have already had co-evolving genes encoding the precursors of MHC, NKC and LRC, and the post-2R allopolyploid preserved this interaction network within a subgenome despite the higher rate of gene loss in microchromosomes. This view is also consistent with the previous observation that functionally linked genes involved in ‘response to stimulus’ (e.g. genes involved in adaptive immunity) tend to be retained in cis after 2R, suggesting that interacting gene clusters were preserved despite extensive gene loss82,83. In addition, we observed functional biases between the two subgenomes: the human genes in the segment derived from the shorter subgenome were enriched with ‘defense/immunity protein’ in PANTHER Protein Class (FDR , see Supplementary Note 4). Overall, our reconstruction suggests a possible role of asymmetric contribution from subgenomes for the emergence of gnathostome-like AIS, and corroborates the view that a primordial ‘adaptive’ immune system emerged in the ancestral vertebrate genome and later turned into the intricate gnathostome-like AIS through 2R76,77,80.
Finally, our reconstruction of the proto-gnathostome genome has implications for understanding the intrinsic evolutionary constraints on gnathostome genomes. It has previously been shown that ohnologues are frequently dosage-sensitive and resistant to evolutionary duplication and loss84–86, and that the distribution of ohnologues constrains copy-number variations among human populations87. Interestingly, our reconstruction suggests that high gene and ohnologue densities are conserved features (Fig. 5) associated with microchromosomes created by biased gene loss after 2R (Fig. 6), and that human chromosomal regions with high ohnologue densities originated from microchromosomes in the proto-gnathostome genome. Thus, ohnologue-rich regions that are susceptible to pathogenic copy-number variations may be regarded as a legacy from the allopolyploid proto-gnathostome genome and subsequent asymmetric evolution between the subgenomes. In addition, by referring to the evolutionary origins (Fig. 6), we can (1) identify ohnologue relationships between genes with low sequence similarity30 that might otherwise remain cryptic and (2) prioritize copy-number variations in personal genomes for potential pathogenicity.
In conclusion, we have generated high-quality, chromosome-scale genome assemblies for two phylogenetically opportune organisms, and inferred the genome structures of early vertebrate lineages. This is the first effort to reconstruct the proto-cyclostome genome, which was critical for determining the cryptic origins of the proto-cyclostome and proto-gnathostome genomes. Consequently, our reconstruction resolved several important issues including the number and relative timings of polyploidization events that occurred during the early origin of vertebrates. The resulting model offers unique perspectives on the origin and evolution of vertebrate genomes.
Methods
Probabilistic macrosynteny model
We reconstructed the proto-vertebrate genome by employing the probabilistic macrosynteny model34, which was previously used for inferring the structure of the pre-TGD genome (TGD stands for teleost-specific genome duplication). The details including the probability model, definitions of parameters/variables, algorithm and estimation accuracy can be found in ref. 34 (open access). In short, the macrosynteny model assumes that the individual pre-WGD chromosomes have distinct orthologue distributions over the present-day post-WGD genomes; then, the pre-WGD genome structure can be reconstructed by employing the variational Bayesian inference algorithm. In the present study, we used an algorithm called collapsed variational Bayes (CVB)88,89, which is more efficient than the variational Bayesian expectation-maximization algorithm described in ref. 34.
The CVB algorithm is derived as follows using the same model and variables as defined in ref. 34. In the framework of the probabilistic macrosynteny model, we infer the pre-WGD genome structure as the posterior (i.e. ) of the model parameters () and latent variables () conditioned by the orthologue information (). Since exact computation of the posterior is infeasible, the posterior needs to be approximated by tractable probability density functions. For deriving the CVB algorithm, the posterior is approximated by that can be factorized as
1 |
where S is the number of non-WGD segments and Gs is the number of genes in segment s. Then, assuming this factorization and following the derivation of the CVB (or CVB0) algorithm88,89 for the probabilistic topic model90,91, we obtain Algorithm 1 with the following update formula:
2 |
where and are given by Eqs. (11) and (12) in ref. 34 and C is a constant that cancels by normalizing so that . In the actual computation, we avoided early convergence of to suboptimal values by starting from a less extreme prior distribution with a slightly larger value of as follows: at the th iteration, we replaced with while . We continued updating until after 100 iterations or until converges, satisfying
3 |
where q′ denotes the estimate in the previous iteration.
Below is a pseudocode for the CVB0 algorithm.
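As an illustration of the zero-order collapsed update (CVB0) that the derivation above borrows from probabilistic topic models88,89, the following minimal sketch applies CVB0 to a plain topic model rather than to the macrosynteny model itself. It is not the authors' Algorithm 1: the function name `cvb0_lda`, the symmetric hyperparameters `alpha` and `beta`, the random initialization and the fixed iteration count are all assumptions made for illustration, and the annealing of the prior and the convergence test described above are omitted.

```python
import random

def cvb0_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=50, seed=0):
    """Generic CVB0 for a topic model (illustrative; not the macrosynteny update).

    docs: list of documents, each a list of word ids in range(vocab_size).
    Returns (q, n_dk): per-token topic responsibilities and expected
    document-topic counts.
    """
    rng = random.Random(seed)
    # q[d][i][k]: responsibility of topic k for token i of document d
    q = []
    for doc in docs:
        rows = []
        for _ in doc:
            w = [rng.random() + 1e-3 for _ in range(n_topics)]
            total = sum(w)
            rows.append([x / total for x in w])
        q.append(rows)
    # Expected counts implied by the current responsibilities
    n_dk = [[sum(r[k] for r in rows) for k in range(n_topics)] for rows in q]
    n_kw = [[0.0] * vocab_size for _ in range(n_topics)]
    n_k = [0.0] * n_topics
    for doc, rows in zip(docs, q):
        for w, r in zip(doc, rows):
            for k in range(n_topics):
                n_kw[k][w] += r[k]
                n_k[k] += r[k]
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                old = q[d][i]
                new = []
                for k in range(n_topics):
                    # Subtract this token's own contribution before updating
                    a = n_dk[d][k] - old[k] + alpha
                    b = n_kw[k][w] - old[k] + beta
                    c = n_k[k] - old[k] + vocab_size * beta
                    new.append(a * b / c)
                total = sum(new)
                new = [x / total for x in new]
                for k in range(n_topics):
                    delta = new[k] - old[k]
                    n_dk[d][k] += delta
                    n_kw[k][w] += delta
                    n_k[k] += delta
                q[d][i] = new
    return q, n_dk
```

In the analogy with the macrosynteny model, documents would loosely correspond to genomic segments and topics to pre-WGD chromosomes, but the actual update terms are those given by Eqs. (11) and (12) in ref. 34.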
Reconstruction of the proto-vertebrate genome
We reconstructed the structure of the proto-vertebrate genome in two steps: first, we partitioned the lamprey genomes into blocks of conserved synteny by comparing the lamprey genomes with each other and also with four gnathostome genomes (i.e. human, chicken, spotted gar and elephant shark); second, we inferred the structure of the pre-WGD genome by applying the macrosynteny model to the amphioxus and lamprey genomes. These steps are described below.
Segmentation of the lamprey genomes
We partitioned the lamprey scaffolds (with at least ten genes) into blocks of conserved synteny as described in ref. 34. Specifically, we employed the Bayesian segmentation model92 and computed the optimal segmentation using a dynamic programming algorithm93. Segmentation was performed in two steps: first, we compared the Japanese lamprey and sea lamprey scaffolds, and identified lineage-specific synteny breakpoints; second, we compared the lamprey genomes with human, chicken, spotted gar and elephant shark genomes, and identified breakpoints occurring between the gnathostomes and cyclostomes. Then we merged the two sets of breakpoints and obtained 191 Japanese lamprey segments and 198 sea lamprey segments. These segments have homogeneous distributions of orthologues in the other genomes under comparison, and thus they are likely to have been unaffected by large-scale inter-chromosomal rearrangements in the cyclostome lineages.
Inference of the pre-WGD genome structure
We analysed the 1-to-4 orthologue distribution among the amphioxus scaffolds14 and the Japanese lamprey and sea lamprey segments, by applying the macrosynteny model and CVB0 algorithm with the following parameter values: post-WGD species , numbers of post-WGD segments and , maximum number of co-orthologues for all , , for all , and for all and (see ref. 34 for details of these parameters). As described in ref. 34, individual amphioxus scaffolds were associated with mixture distributions over the proto-vertebrate chromosomes, which represent reconstruction confidence scores. For the sake of simplicity in visualization, we assigned each amphioxus scaffold to the proto-vertebrate chromosome with the largest reconstruction confidence score (i.e. , where denotes expectation), which is calculated by using Eq. (11) in ref. 34. In addition, we assigned each lamprey segment to the proto-vertebrate chromosome with the largest reconstruction confidence score (i.e. ), which is calculated by using Eq. (12) in ref. 34. See also Fig. 1 in ref. 34 for an intuitive explanation.
Number of proto-vertebrate chromosomes
In the macrosynteny model, the number of proto-vertebrate chromosomes is treated as an input parameter () for inferring the optimal pre-WGD genome structure. The previous studies estimated the number of proto-vertebrate chromosomes to be 10–13 in refs. 12,13,18,19,94 or 17 in refs. 14,15, but the exact number is unknown. In order to decide the optimal number of , we reconstructed the proto-vertebrate chromosomes with , and evaluated the quality of those reconstructions by comparing their paralogue distributions as follows.
The underlying assumption is that most lamprey paralogues were created by WGDs (or by chromosome-scale duplications as proposed in ref. 18); then, the paralogue distribution should be highly non-random, with most paralogues found between lamprey segment pairs both deriving from the same proto-vertebrate chromosome. We quantified such non-randomness by using the hypergeometric distribution under the null hypothesis in which paralogues are randomly distributed over the entire genome as described below.
Let $N$ and $M$ be the number of gene pairs and paralogue pairs in the genome, respectively, and $n$ be the number of gene pairs both of which derive from the same proto-vertebrate chromosome. Let $X$ be a random variable representing the number of paralogue pairs both of which derive from the same proto-vertebrate chromosome, and $x$ be the observed number of such paralogue pairs. Then, the significance of $x$ is given as follows:
$$P(X \ge x) = \sum_{k=x}^{\min(n,M)} \frac{\binom{M}{k}\binom{N-M}{n-k}}{\binom{N}{n}} \tag{4}$$
where $P(\cdot)$ denotes the probability and $\binom{\cdot}{\cdot}$ denotes the binomial coefficient.
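The hypergeometric significance described above can be evaluated directly from the four counts. The sketch below is a minimal illustration, not the authors' code; the function and argument names are hypothetical, and it assumes the upper tail (observing at least as many same-chromosome paralogue pairs as were actually seen) is the quantity of interest.

```python
from fractions import Fraction
from math import comb

def hypergeom_upper_tail(total_pairs, paralogue_pairs, same_chrom_pairs, observed):
    """P(X >= observed) when paralogue pairs are scattered at random.

    total_pairs:      number of gene pairs in the genome
    paralogue_pairs:  number of those pairs that are paralogues
    same_chrom_pairs: number of gene pairs from the same ancestral chromosome
    observed:         observed number of same-chromosome paralogue pairs
    """
    upper = min(same_chrom_pairs, paralogue_pairs)
    denom = comb(total_pairs, same_chrom_pairs)
    numer = sum(
        comb(paralogue_pairs, k)
        * comb(total_pairs - paralogue_pairs, same_chrom_pairs - k)
        for k in range(observed, upper + 1)
    )
    return Fraction(numer, denom)
```

Exact rational arithmetic keeps the toy example transparent. For genome-scale counts the binomial coefficients become very large, so a log-space evaluation (for example with `math.lgamma`) would be the more practical choice.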
In both the Japanese lamprey and sea lamprey genomes, the reconstruction with 18 proto-vertebrate chromosomes was the most significant by this criterion (Supplementary Table 8). We then labelled the proto-vertebrate chromosomes as Pvc1–Pvc18 (although we observed that Pvc18 has no clear synteny in gnathostome and invertebrate genomes).
Reconstruction of the proto-cyclostome genome
Although the 2R hypothesis was resolved by a genome-wide synteny analysis11 and reconstruction analyses12–14, the origins of the proto-gnathostome and proto-cyclostome genomes have remained contentious. In particular, the timing of gnathostome–cyclostome divergence and possibility of cyclostome-specific WGD have remained topics of debate even after sequencing of the sea lamprey genome18,22,26. For example, six Hox clusters were found in the cyclostome genomes19,22,26,27, but it was not clear if the number of Hox clusters should be explained by additional cyclostome-specific WGD followed by the loss of two entire clusters, or by chromosome-scale duplications in the cyclostome lineage18,19,22.
We considered that a reconstruction of the proto-cyclostome chromosomes would provide a conclusive answer to this question. In our macrosynteny model analysis, the Hox-bearing proto-vertebrate chromosome comprises 10 Japanese lamprey segments and 11 sea lamprey segments, which are likely to be parts of proto-cyclostome chromosomes fragmented due to inter-chromosomal rearrangements or limited scaffold length in the current genome assemblies. Thus, the reconstruction of proto-cyclostome chromosomes can be formulated as finding the correct combination from a large number of possible combinations of the lamprey segments. The enumeration of all combinations is called ‘set partitioning’ (i.e. partitioning of a set of segments into non-empty subsets), which is computationally infeasible because the number of all set partitions, known as the Bell number, can be extremely large: for example, the Bell number for the 21 lamprey segments is approximately $4.7 \times 10^{14}$. To address this problem, we performed clustering of lamprey segments and reduced the number of set partitions as follows.
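To give a sense of how quickly the number of set partitions grows, Bell numbers can be computed with the Bell triangle. This is a standalone sketch for illustration; the function name is hypothetical and nothing here is taken from the authors' software.

```python
def bell_numbers(n_max):
    """Return [B_0, ..., B_n_max] using the Bell triangle."""
    row = [1]
    bells = [1]
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row;
        # every further entry adds the previous-row value to its left neighbour.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells
```

For 21 items, `bell_numbers(21)[-1]` evaluates to 474869816156751, which is why exhaustive enumeration over raw segments is impractical and a prior clustering step is needed.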
Step 1: Paralogous lamprey segments do not originate from the same proto-cyclostome chromosome. Therefore, we calculated the paralogue significance for each segment pair, and significant pairs were not allowed to be assigned to the same cluster in the subsequent steps.
Step 2: Orthologous segments between Japanese lamprey and sea lamprey originate from the same proto-cyclostome chromosome. Therefore, we performed a single linkage clustering of lamprey segments to make clusters of orthologous segments. First, we defined individual segments as initial clusters. Second, we sorted segment pairs by the significance of the number of orthologues between them. Third, we repeated choosing the most significant segment pair and merged the two clusters if they did not have paralogous segments.
Step 3: Some lamprey scaffolds are expected to be over-fragmented by the synteny segmentation algorithm or by lineage-specific rearrangements. In order to address such over-fragmentation of lamprey scaffolds, we merged two clusters if (i) they had segments on the same scaffold and (ii) they did not have paralogous segments.
Step 4: In addition to Step 3, we utilized the Pacific lamprey linkage markers19 and merged two clusters if (i) the clusters shared a pair of sea lamprey segments having linkage markers on the same Pacific lamprey linkage group and (ii) the clusters did not have paralogous segments.
Step 5: Reliable reconstruction is difficult for short segments having few orthologues and paralogues. Therefore, clusters of lamprey segments were excluded from the proto-cyclostome reconstruction if the clusters had fewer than five genes.
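Steps 1 and 2 above amount to single-linkage clustering under cannot-link constraints. The following sketch shows one way this could look; it is not the authors' implementation. The function name, the input format, and the assumption that segment pairs have already been ranked and thresholded by their significance are all hypothetical.

```python
def constrained_single_linkage(segments, ranked_pairs, paralogous_pairs):
    """Merge clusters along ranked orthologous pairs without joining paralogues.

    segments:         iterable of hashable segment ids
    ranked_pairs:     segment pairs, most significant orthology first
    paralogous_pairs: segment pairs that must stay in different clusters
    """
    cluster_of = {s: frozenset([s]) for s in segments}
    forbidden = {frozenset(p) for p in paralogous_pairs}
    for a, b in ranked_pairs:
        ca, cb = cluster_of[a], cluster_of[b]
        if ca == cb:
            continue  # already in the same cluster
        # Skip the merge if it would place any paralogous pair together
        if any(frozenset((x, y)) in forbidden for x in ca for y in cb):
            continue
        merged = ca | cb
        for s in merged:
            cluster_of[s] = merged
    return set(cluster_of.values())
```

Steps 3 and 4 could reuse the same merge test with pairs drawn from shared scaffolds or shared linkage groups instead of orthology, and Step 5 would then drop clusters below the gene-count threshold.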
For each of Pvc1–Pvc17, we enumerated all set partitions of the clusters, and chose the optimal set partition with the most significant distribution of orthologues and paralogues as the proto-cyclostome chromosomes. During this analysis we found that some Japanese lamprey scaffolds are likely to be haplotype sequences that were not removed from the primary assembly by FALCON during the final stage of the assembly; we therefore excluded the following Japanese lamprey scaffolds from the proto-cyclostome reconstruction: Scaffolds 110, 190, 198, 105, 104, 82, 163, 69, 133, 74, 115, 139, 86, 70, 72, 171 and 192. We left Pvc18 as a single proto-cyclostome chromosome because computation of the optimal set partitioning was infeasible for Pvc18 consisting of 34 segments.
The significance of paralogues, orthologues and set partitions was calculated as follows.
Significance of the number of paralogues
The significance of the number of paralogues between two lamprey segments is calculated as follows. Let $N$ and $M$ be the number of gene pairs and paralogue pairs in the genome, respectively, and $n$ be the number of gene pairs between the two segments. Let $X$ be a random variable representing the number of paralogue pairs between the two segments, and $x$ be the observed number of such paralogue pairs. Then, the probability of observing at least $x$ paralogue pairs between the two segments is given by
$$P(X \ge x) = \sum_{k=x}^{\min(n,M)} \frac{\binom{M}{k}\binom{N-M}{n-k}}{\binom{N}{n}} \tag{5}$$
and the number of paralogue pairs was considered significant if .
Significance of the number of orthologues
The significance of the number of orthologues between two lamprey segments is calculated as follows. Let $N$ and $M$ be the number of gene pairs and orthologue pairs between the Japanese lamprey and sea lamprey genomes, respectively, and $n$ be the number of gene pairs between the two segments. Let $X$ be a random variable representing the number of orthologue pairs between the two segments, and $x$ be the observed number of such orthologue pairs. Then, the probability of observing at least $x$ orthologue pairs between the two segments is given by
$$P(X \ge x) = \sum_{k=x}^{\min(n,M)} \frac{\binom{M}{k}\binom{N-M}{n-k}}{\binom{N}{n}} \tag{6}$$
and the number of orthologue pairs was considered significant if .
Significance of a reconstruction
For each proto-vertebrate chromosome ( for Pvc1,…,17, respectively) and each species ( for Japanese lamprey and for sea lamprey), the significance of paralogues was calculated as follows. Let be the number of gene pairs and be the number of paralogue pairs in species such that both genes derive from proto-vertebrate chromosome . Let be the number of gene pairs between different proto-cyclostome chromosomes (i.e. inter-chromosome gene pairs). Let be a random variable representing the number of inter-chromosome paralogue pairs, and be the observed number of such paralogue pairs. Then, the significance of inter-chromosome paralogue pairs is given by
7 |
In addition, we calculated the significance of the number of orthologues between species and . Let be the numbers of gene pairs and be the number of orthologue pairs between the two species such that both genes derive from proto-vertebrate chromosome . Let be the number of gene pairs between species and , where both genes derive from the same proto-cyclostome chromosome. Let be a random variable representing the number of orthologue pairs, deriving from the same proto-cyclostome chromosome, and be the observed number of such orthologue pairs. Then, the significance of orthologue pairs is given by
8 |
Finally, we defined the significance of the set partition for proto-vertebrate chromosome c as
9 |
This method is an extension of the reconstruction of post-2R chromosomes in ref. 13, which was developed to verify whether genome quadruplication occurred in the proto-vertebrate lineage. In the previous study, set partitions into 2, 3, 4 and 5 post-2R chromosomes were enumerated to show that quadruplication was the most significant; we extended the method to also enumerate set partitions into more than five proto-cyclostome chromosomes.
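The enumerate-and-score procedure for a single proto-vertebrate chromosome can be sketched as below. This is an illustrative outline only: `set_partitions` and `best_partition` are hypothetical names, and `score` stands in for the set-partition significance defined above, with smaller values treated as more significant.

```python
def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or give `first` a block of its own.
        yield [[first]] + partition

def best_partition(clusters, score):
    """Return the partition of `clusters` with the smallest score."""
    return min(set_partitions(clusters), key=score)
```

Because the number of partitions follows the Bell numbers, this brute-force search is only practical once the segments have been merged into a small number of clusters, which is the purpose of Steps 1 to 5.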
Reconstruction of the proto-gnathostome genome
We reconstructed the proto-gnathostome chromosomes by comparing the amphioxus14 and several gnathostome genomes including elephant shark. As illustrated in Fig. 4, we performed reconstruction in three steps: first, we partitioned the gnathostome chromosomes into blocks of conserved synteny (Fig. 4f); second, we applied the CVB0 algorithm and made groups of gnathostome segments that share large numbers of paralogues (Fig. 4g); third, segments in individual groups were further partitioned into several subgroups representing proto-gnathostome chromosomes (Fig. 4h).
Segmentation of the gnathostome genomes
We partitioned the gnathostome genomes as described in ref. 34. Specifically, we performed genome segmentation twice for each gnathostome genome: one with four teleost genomes (i.e. zebrafish, stickleback, medaka and Tetraodon) to identify blocks of doubly conserved synteny, and the other with chicken, turkey, zebra finch, anole lizard and spotted gar to identify additional synteny breakpoints in individual lineages. Then we merged the two sets of synteny breakpoints to define 151 human segments. Similarly, we partitioned the mouse, dog, opossum, chicken, turkey, zebra finch, spotted gar and elephant shark genomes into 258, 212, 163, 70, 69, 60, 78 and 132 segments, respectively. These numbers of segments are slightly different from the previous study34, because we used the spotted gar genome in addition to the non-mammalian amniotes in the synteny segmentation step.
Analysis with the macrosynteny model
We used the macrosynteny model and applied the CVB0 algorithm with the following parameter values: number of post-WGD species , numbers of post-WGD segments and for respectively, maximum number of co-orthologues for all , , for all , and for all and . Then we set and evaluated the reconstruction quality by comparing the significance of paralogue distribution as described for the reconstruction of proto-vertebrate genome. We found that clustering into 10 groups of segments was optimal, which is consistent with the previous study13. However, as we argue in Fig. 4, individual groups might represent multiple proto-vertebrate chromosomes due to inter-chromosomal rearrangements. Therefore, we reconstructed proto-gnathostome chromosomes by employing a rearrangement-aware method as follows.
Reconstruction of proto-gnathostome chromosomes
In this step, we used segments from the human, mouse, dog, opossum, chicken, turkey, zebra finch, spotted gar and elephant shark genomes, and enumerated set partitions for each of the 10 groups after reducing the number of set partitions by clustering of gnathostome segments as in the reconstruction of proto-cyclostome chromosomes:
Step 1: Paralogous gnathostome segments do not originate from the same proto-gnathostome chromosome. Therefore, we calculated the paralogue significance for each segment pair, and significant pairs were not allowed to be assigned to the same cluster in the subsequent steps.
Step 2: Orthologous gnathostome segments originate from the same proto-gnathostome chromosome. Therefore, we performed a single linkage clustering of gnathostome segments to make clusters of orthologous segments. First, we defined individual segments as initial clusters. Second, we sorted segment pairs by the significance of the number of orthologues between them. Third, we repeated choosing the most significant segment pair and merged the two clusters if they did not have paralogous segments.
Step 3: Reliable reconstruction is difficult for short segments having few orthologues and paralogues. Therefore, clusters of gnathostome segments were excluded from the proto-gnathostome reconstruction if the clusters had fewer than five genes.
For the reconstruction of proto-gnathostome chromosomes, we assumed that individual gnathostome segments might have derived from multiple proto-vertebrate chromosomes, since it was reported that several rearrangements occurred between the two WGD events13. Figure 4a illustrates the case of a chromosome fusion occurring between the two WGD events. As a result of the fusion, the grey post-2R chromosomes share large numbers of ohnologues with the black and white chromosomes (represented by red regions in Fig. 4c); on the other hand, there are no ohnologues between black and white chromosomes (white regions). In addition to the case of a chromosome fusion between the two WGD events, our reconstruction method considered other rearrangement scenarios, namely, (A) a chromosome fission event occurring in the period between 1R and 2R and (B) a fusion or translocation after 2R. Scenario A results in the same paralogue distribution pattern as in the case of a fusion between the two WGD events, but the two scenarios can be distinguished by checking the orthologue distribution in invertebrate genomes. In Scenario B, the paralogue distribution is different from Scenario A, since the chromosome created by post-2R fusion is paralogous to the other six post-2R chromosomes. In general, we expect to see a large number of paralogues between a pair of proto-gnathostome chromosomes only if the two chromosomes (1) are duplicated chromosomes or (2) inherit duplicated chromosomes or duplicated segments through rearrangements (fusions, fissions and translocations). These proto-gnathostome chromosome pairs are called ‘red chromosome pairs’ (as in Fig. 4c) in the following text.
Then, for each cluster ( for the ten clusters) and for each species ( for human, mouse, dog, opossum, chicken, turkey, zebra finch, spotted gar and elephant shark, respectively), the significance of paralogues was calculated as follows. Let be the number of gene pairs and be the number of paralogue pairs in species such that both genes are in cluster . Let be the number of gene pairs between red proto-gnathostome chromosome pairs. Let be a random variable representing the number of paralogue pairs between red chromosome pairs, and be the observed number of such paralogue pairs. Then, the significance of intra-chromosome paralogue pairs is given by
10 |
Significance of orthologues between two gnathostome species was given in the same way as in the proto-cyclostome reconstruction. Then, we calculated the significance of a set partition for cluster c by multiplying the significance values for all species and all species pairs:
11 |
In the last step of the proto-gnathostome reconstruction, we chose the most significant set partition and filtered out small unreliable subgroups. This filtering step was necessary because reliable subgroups are expected to have segments from all (or most) of the gnathostome species, whereas reconstruction errors result in spurious small subgroups having short segments from few species. For this reason, we filtered out small subgroups with segments from only a few (<3) species; this removed nine small subgroups consisting of four mouse segments, one chicken segment and nine elephant shark segments. This filtering step should have little influence on the analysis results because the number of genes on the filtered segments is only 288 in total. Finally, the remaining subgroups were defined as the proto-gnathostome chromosomes.
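The species-coverage filter in this last step is simple to express in code. The sketch below assumes a hypothetical data layout in which each subgroup is a list of `(species, segment_id)` tuples; the function name and the default threshold argument are illustrative rather than taken from the authors' software.

```python
def filter_subgroups(subgroups, min_species=3):
    """Keep subgroups whose segments come from at least `min_species` species.

    subgroups: list of subgroups, each a list of (species, segment_id) tuples.
    """
    kept = []
    for group in subgroups:
        species = {sp for sp, _ in group}  # distinct species represented
        if len(species) >= min_species:
            kept.append(group)
    return kept
```

With `min_species=3`, a subgroup made up only of mouse and elephant shark segments would be discarded, while one with human, chicken and spotted gar segments would be retained.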
The proto-cyclostome genome was shaped by sixfold genome duplication
Here, we introduce a framework for calculating the probability that multiplicities of independently duplicating chromosomes converge toward a given ploidy level, where the convergence is measured in terms of the deviation () from the given ploidy level. Application to the proto-cyclostome genome shows that the observed peak of multiplicity at six is unlikely to be created by chance through accumulation of chromosome-scale duplications.
Let us consider the following situation. The proto-vertebrate genome with chromosomes underwent one or two polyploidization events, producing duplicates for each proto-vertebrate chromosome ( for all after 1R or after two rounds of tetraploidization). Subsequently, those chromosomes were duplicated by a series of independent chromosome-scale duplications, eventually creating duplicates for each proto-vertebrate chromosome . As a measure of deviation from a polyploidization-only model, we define , where is the expected multiplicity ( in our model). Assuming that all chromosomes are equally likely to be duplicated, we calculate , the probability that the deviation is smaller than or equal to the observed deviation (i.e. in our reconstruction) conditioned by the total number of proto-cyclostome chromosomes (i.e. in our reconstruction).
The desired probability is calculated as follows. First, the total number of duplication scenarios is given by , where is the gamma function. Second, for given , the number of duplication scenarios in which individual proto-vertebrate chromosomes are eventually duplicated into proto-cyclostome chromosomes is given by
12 |
where is the multinomial coefficient. Then, by enumerating all values, we can calculate the desired probability (i.e. independently duplicating proto-vertebrate chromosomes converging to multiplicity by chance alone) as
13 |
where the summation is taken over all that satisfy and .
In our reconstruction, we have , , and (see Supplementary Table 10 in Supplementary Note 3). We evaluated the following five evolutionary scenarios: (A) chromosome-scale duplications with no tetraploidization, (B) one tetraploidization followed by chromosome-scale duplications, (C) two tetraploidizations followed by chromosome-scale duplications, (D) chromosome-scale duplications followed by one tetraploidization, and (E) first tetraploidization followed by chromosome-scale duplication followed by second tetraploidization. In these scenarios we assume that for all , where we set and for Scenario A; and for Scenario B; and for Scenario C; and for Scenario D; and and for Scenario E. We set for Scenarios A/B/C and for Scenarios D/E, based on the proto-cyclostome genome reconstruction. In addition, we evaluated the case of by setting , since our model requires for all ; we also evaluated the case of , and since larger proto-vertebrate chromosomes are more reliable in our reconstruction and the largest five proto-vertebrate chromosomes have multiplicity six.
Supplementary Table 10 in Supplementary Note 3 shows small probabilities of observing convergence of multiplicities through independent chromosome-scale duplications. Thus, it is unlikely that the proto-cyclostome genome was shaped by a series of independently occurring chromosome-scale duplications.
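The counting argument of this section can be explored on small examples with the sketch below. It rests on several assumptions that are not confirmed by the text: every chromosome starts with the same number of copies, each duplication event copies one existing chromosome chosen uniformly at random, and the deviation is taken to be the sum of absolute differences between each multiplicity and the target ploidy. All function and parameter names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def compositions(total, parts):
    """Yield all tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for head in range(total + 1):
        for tail in compositions(total - head, parts - 1):
            yield (head,) + tail

def scenario_count(final, start):
    """Number of duplication orders taking each chromosome from `start` copies
    to final[i] copies."""
    extra = [m - start for m in final]
    multinomial = factorial(sum(extra))
    for e in extra:
        multinomial //= factorial(e)
    growth = 1
    for m in final:
        growth *= factorial(m - 1) // factorial(start - 1)  # Gamma(m)/Gamma(start)
    return multinomial * growth

def convergence_probability(n_chrom, start, total, target, max_dev):
    """P(sum_i |m_i - target| <= max_dev), given `total` final chromosomes.

    The absolute-deviation measure is an assumption made for illustration.
    """
    steps = total - n_chrom * start
    all_scenarios = factorial(total - 1) // factorial(n_chrom * start - 1)
    hits = 0
    for extra in compositions(steps, n_chrom):
        final = [start + e for e in extra]
        if sum(abs(m - target) for m in final) <= max_dev:
            hits += scenario_count(final, start)
    return Fraction(hits, all_scenarios)
```

For two chromosomes that start as single copies and end with four chromosomes in total, the three possible outcomes (3,1), (2,2) and (1,3) are equally likely under these assumptions, so `convergence_probability(2, 1, 4, 2, 0)` is 1/3. Exhaustive enumeration of this kind scales poorly, and a dynamic-programming formulation would be needed for genome-sized inputs.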
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Acknowledgements
This work is supported by funding from the European Research Council Grant Agreements 309834 and 771419 to A.McL. and the Biomedical Research Council of A*STAR, Singapore to B.V. We acknowledge the National Supercomputing Centre of Singapore for providing computational resources for this project. We thank Karsten Hokamp for technical assistance for computational analysis and Anthony Redmond for critical reading of the manuscript.
Author contributions
B.V. and A.McL. conceived and coordinated the project. P.S., V.R., N.E.P., A.P. and B.V. sequenced and annotated the genomes; Y.N. and A.McL. performed genome reconstruction. Y.N., A.McL. and B.V. wrote the manuscript.
Data availability
The Japanese lamprey and elephant shark genome sequences generated in this study have been deposited at DDBJ/ENA/GenBank under the accession numbers WFAB00000000 and WEZY00000000, respectively. The reconstruction dataset including information of orthologues, paralogues and gene names in individual chromosomal segments is available as Supplementary Data 1.
Code availability
The reconstruction software/code is available on request.
Competing interests
The authors declare no competing interests.
Footnotes
Peer review information Nature Communications thanks Daniel Ocampo Daza, Jeramiah Smith and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
These authors contributed equally: Yoichiro Nakatani, Prashant Shingate.
Change history
7/29/2021
A Correction to this paper has been published: 10.1038/s41467-021-25110-8
Contributor Information
Aoife McLysaght, Email: aoife.mclysaght@tcd.ie.
Byrappa Venkatesh, Email: mcbbv@imcb.a-star.edu.sg.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-021-24573-z.
References
- 1.Morris SC, Caron JB. A primitive fish from the Cambrian of North America. Nature. 2014;512:419–422. doi: 10.1038/nature13414. [DOI] [PubMed] [Google Scholar]
- 2.Miyashita T, et al. Hagfish from the Cretaceous Tethys Sea and a reconciliation of the morphological-molecular conflict in early vertebrate phylogeny. Proc. Natl Acad. Sci. USA. 2019;116:2146–2151. doi: 10.1073/pnas.1814794116. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Osorio J, Retaux S. The lamprey in evolutionary studies. Dev. Genes Evol. 2008;218:221–235. doi: 10.1007/s00427-008-0208-1. [DOI] [PubMed] [Google Scholar]
- 4.Shimeld SM, Donoghue PC. Evolutionary crossroads in developmental biology: cyclostomes (lamprey and hagfish) Development. 2012;139:2091–2099. doi: 10.1242/dev.074716. [DOI] [PubMed] [Google Scholar]
- 5.Janvier, P. in Major Transitions in Vertebrate Evolution (eds. Anderson, J. S. & Sues, H. D.) (Indiana University Press, 2007).
- 6.Heimberg AM, Cowper-Sal-lari R, Sémon M, Donoghue PCJ, Peterson KJ. microRNAs reveal the interrelationships of hagfish, lampreys, and gnathostomes and the nature of the ancestral vertebrate. Proc. Natl Acad. Sci. USA. 2010;107:19379–19383. doi: 10.1073/pnas.1010350107. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Donoghue P. Evolution: divining the nature of the ancestral vertebrate. Curr. Biol. 2017;27:R277–R279. doi: 10.1016/j.cub.2017.02.029. [DOI] [PubMed] [Google Scholar]
- 8.Boehm T, et al. Evolution of alternative adaptive immune systems in vertebrates. Annu. Rev. Immunol. 2018;36:19–42. doi: 10.1146/annurev-immunol-042617-053028. [DOI] [PubMed] [Google Scholar]
- 9. Ohno S. Evolution by Gene Duplication (Springer, 1970).
- 10. Holland PWH, Garcia-Fernàndez J, Williams NA, Sidow A. Gene duplications and the origins of vertebrate development. Development. 1994;1994:125–133.
- 11. Dehal P, Boore JL. Two rounds of whole genome duplication in the ancestral vertebrate. PLoS Biol. 2005;3:e314. doi: 10.1371/journal.pbio.0030314.
- 12. Putnam NH, et al. Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic organization. Science. 2007;317:86–94. doi: 10.1126/science.1139158.
- 13. Nakatani Y, Takeda H, Kohara Y, Morishita S. Reconstruction of the vertebrate ancestral genome reveals dynamic genome reorganization in early vertebrates. Genome Res. 2007;17:1254–1265. doi: 10.1101/gr.6316407.
- 14. Putnam NH, et al. The amphioxus genome and the evolution of the chordate karyotype. Nature. 2008;453:1064–1071. doi: 10.1038/nature06967.
- 15. Sacerdot C, Louis A, Bon C, Berthelot C, Roest Crollius H. Chromosome evolution at the origin of the ancestral vertebrate genome. Genome Biol. 2018;19:166. doi: 10.1186/s13059-018-1559-1.
- 16. Muffato M, Roest Crollius H. Paleogenomics in vertebrates, or the recovery of lost genomes from the mist of time. BioEssays. 2008;30:122–134. doi: 10.1002/bies.20707.
- 17. Van de Peer Y, Maere S, Meyer A. The evolutionary significance of ancient genome duplications. Nat. Rev. Genet. 2009;10:725–732. doi: 10.1038/nrg2600.
- 18. Smith JJ, Keinath MC. The sea lamprey meiotic map improves resolution of ancient vertebrate genome duplications. Genome Res. 2015;25:1081–1090. doi: 10.1101/gr.184135.114.
- 19. Smith JJ, et al. The sea lamprey germline genome provides insights into programmed genome rearrangement and vertebrate evolution. Nat. Genet. 2018;50:270–277. doi: 10.1038/s41588-017-0036-1.
- 20. Fried C, Prohaska SJ, Stadler PF. Independent Hox-cluster duplications in lampreys. J. Exp. Zool. 2003;299B:18–25. doi: 10.1002/jez.b.37.
- 21. Furlong RF, et al. A degenerate ParaHox gene cluster in a degenerate vertebrate. Mol. Biol. Evol. 2007;24:2681–2686. doi: 10.1093/molbev/msm194.
- 22. Mehta TK, et al. Evidence for at least six Hox clusters in the Japanese lamprey (Lethenteron japonicum). Proc. Natl Acad. Sci. USA. 2013;110:16044–16049. doi: 10.1073/pnas.1315760110.
- 23. Escriva H, Manzon L, Youson J, Laudet V. Analysis of lamprey and hagfish genes reveals a complex history of gene duplications during early vertebrate evolution. Mol. Biol. Evol. 2002;19:1440–1450. doi: 10.1093/oxfordjournals.molbev.a004207.
- 24. Stadler PF, et al. Evidence for independent Hox gene duplications in the hagfish lineage: a PCR-based gene inventory of Eptatretus stoutii. Mol. Phylogenet. Evol. 2004;32:686–694. doi: 10.1016/j.ympev.2004.03.015.
- 25. Kuraku S, Meyer A, Kuratani S. Timing of genome duplications relative to the origin of the vertebrates: did cyclostomes diverge before or after? Mol. Biol. Evol. 2009;26:47–59. doi: 10.1093/molbev/msn222.
- 26. Smith JJ, et al. Sequencing of the sea lamprey (Petromyzon marinus) genome provides insights into vertebrate evolution. Nat. Genet. 2013;45:415–421. doi: 10.1038/ng.2568.
- 27. Pascual-Anaya J, et al. Hagfish and lamprey Hox genes reveal conservation of temporal colinearity in vertebrates. Nat. Ecol. Evol. 2018;2:859–866. doi: 10.1038/s41559-018-0526-2.
- 28. Furlong RF, Holland PWH. Were vertebrates octoploid? Philos. Trans. R. Soc. Lond. B Biol. Sci. 2002;357:531–544. doi: 10.1098/rstb.2001.1035.
- 29. Kuraku S. Impact of asymmetric gene repertoire between cyclostomes and gnathostomes. Semin. Cell Dev. Biol. 2013;24:119–127. doi: 10.1016/j.semcdb.2012.12.009.
- 30. Holland PW, Marletaz F, Maeso I, Dunwell TL, Paps J. New genes from old: asymmetric divergence of gene duplicates and the evolution of development. Philos. Trans. R. Soc. Lond. B Biol. Sci. 2017;372:20150480.
- 31. Martin KJ, Holland PWH. Enigmatic orthology relationships between Hox clusters of the African butterfly fish and other teleosts following ancient whole-genome duplication. Mol. Biol. Evol. 2014;31:2592–2611. doi: 10.1093/molbev/msu202.
- 32. Robertson FM, et al. Lineage-specific rediploidization is a mechanism to explain time-lags between genome duplication and evolutionary diversification. Genome Biol. 2017;18:111. doi: 10.1186/s13059-017-1241-z.
- 33. Qiu H, Hildebrand F, Kuraku S, Meyer A. Unresolved orthology and peculiar coding sequence properties of lamprey genes: the KCNA gene family as test case. BMC Genomics. 2011;12:325. doi: 10.1186/1471-2164-12-325.
- 34. Nakatani Y, McLysaght A. Genomes as documents of evolutionary history: a probabilistic macrosynteny model for the reconstruction of ancestral genomes. Bioinformatics. 2017;33:i369–i378. doi: 10.1093/bioinformatics/btx259.
- 35. Putnam NH, et al. Chromosome-scale shotgun assembly using an in vitro method for long-range linkage. Genome Res. 2016;26:342–350. doi: 10.1101/gr.193474.115.
- 36. Venkatesh B, et al. Elephant shark genome provides unique insights into gnathostome evolution. Nature. 2014;505:174–179. doi: 10.1038/nature12826.
- 37. Cantarel BL, et al. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 2008;18:188–196. doi: 10.1101/gr.6743907.
- 38. International Chicken Genome Sequencing Consortium. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature. 2004;432:695–716.
- 39. Braasch I, et al. The spotted gar genome illuminates vertebrate evolution and facilitates human-teleost comparisons. Nat. Genet. 2016;48:427–437. doi: 10.1038/ng.3526.
- 40. Li Y, et al. Scallop genome reveals molecular adaptations to semi-sessile life and neurotoxins. Nat. Commun. 2017;8:1721. doi: 10.1038/s41467-017-01927-0.
- 41. Srivastava M, et al. The Trichoplax genome and the nature of placozoans. Nature. 2008;454:955–960. doi: 10.1038/nature07191.
- 42. Simakov O, et al. Insights into bilaterian evolution from three spiralian genomes. Nature. 2013;493:526–531. doi: 10.1038/nature11696.
- 43. Wang S, et al. Scallop genome provides insights into evolution of bilaterian karyotype and development. Nat. Ecol. Evol. 2017;1:0120. doi: 10.1038/s41559-017-0120.
- 44. Benton, M. J., Donoghue, P. C. J. & Asher, R. J. in The Timetree of Life (eds. Hedges, S. B. & Kumar, S.) (Oxford University Press, 2009).
- 45. Smith JJ, Stuart AB, Sauka-Spengler T, Clifton SW, Amemiya CT. Development and analysis of a germline BAC resource for the sea lamprey, a vertebrate that undergoes substantial chromatin diminution. Chromosoma. 2010;119:381–389. doi: 10.1007/s00412-010-0263-z.
- 46. Fontana F, et al. Evidence of hexaploid karyotype in shortnose sturgeon. Genome. 2008;51:113–119. doi: 10.1139/g07-112.
- 47. Havelka M, Bytyutskyy D, Symonová R, Ráb P, Flajšhans M. The second highest chromosome count among vertebrates is observed in cultured sturgeon and is associated with genome plasticity. Genet. Sel. Evol. 2016;48:12. doi: 10.1186/s12711-016-0194-0.
- 48. Trifonov VA, et al. Evolutionary plasticity of acipenseriform genomes. Chromosoma. 2016;125:661–668. doi: 10.1007/s00412-016-0609-2.
- 49. Burt DW. Origin and evolution of avian microchromosomes. Cytogenet. Genome Res. 2002;96:97–112. doi: 10.1159/000063018.
- 50. Voss SR, et al. Origin of amphibian and avian chromosomes by fission, fusion, and retention of ancestral chromosomes. Genome Res. 2011;21:1306–1312. doi: 10.1101/gr.116491.110.
- 51. Louis A, Roest Crollius H, Robinson-Rechavi M. How much does the amphioxus genome represent the ancestor of chordates? Brief. Funct. Genomics. 2012;11:89–95. doi: 10.1093/bfgp/els003.
- 52. Uno Y, et al. Inference of the protokaryotypes of amniotes and tetrapods and the evolutionary processes of microchromosomes from comparative gene mapping. PLoS ONE. 2012;7:e53027. doi: 10.1371/journal.pone.0053027.
- 53. Inoue JG, et al. Evolutionary origin and phylogeny of the modern Holocephalans (Chondrichthyes: Chimaeriformes): a mitogenomic perspective. Mol. Biol. Evol. 2010;27:2576–2586. doi: 10.1093/molbev/msq147.
- 54. Harmston N, et al. Topologically associating domains are ancient features that coincide with metazoan clusters of extreme noncoding conservation. Nat. Commun. 2017;8:441.
- 55. Berthelot C, Muffato M, Abecassis J, Roest Crollius H. The 3D organization of chromatin explains evolutionary fragile genomic regions. Cell Rep. 2015;10:1913–1924. doi: 10.1016/j.celrep.2015.02.046.
- 56. Lv J, Havlak P, Putnam NH. Constraints on genes shape long-term conservation of macro-synteny in metazoan genomes. BMC Bioinformatics. 2011;12:S11. doi: 10.1186/1471-2105-12-S9-S11.
- 57. Soltis DE, Soltis PS. Polyploidy: recurrent formation and genome evolution. Trends Ecol. Evol. 1999;14:348–352. doi: 10.1016/s0169-5347(99)01638-9.
- 58. Soltis DE, Visger CJ, Soltis PS. The polyploidy revolution then… and now: Stebbins revisited. Am. J. Bot. 2014;101:1057–1078. doi: 10.3732/ajb.1400178.
- 59. Holloway AK, Cannatella DC, Gerhardt HC, Hillis DM. Polyploids with different origins and ancestors form a single sexual polyploid species. Am. Nat. 2006;167:E88–E101. doi: 10.1086/501079.
- 60. Wolfe KH. Yesterday’s polyploids and the mystery of diploidization. Nat. Rev. Genet. 2001;2:333–341. doi: 10.1038/35072009.
- 61. Garsmeur O, et al. Two evolutionarily distinct classes of paleopolyploidy. Mol. Biol. Evol. 2014;31:448–454. doi: 10.1093/molbev/mst230.
- 62. Wendel JF, Lisch D, Hu G, Mason AS. The long and short of doubling down: polyploidy, epigenetics, and the temporal dynamics of genome fractionation. Curr. Opin. Genet. Dev. 2018;49:1–7. doi: 10.1016/j.gde.2018.01.004.
- 63. Cheng F, et al. Gene retention, fractionation and subgenome differences in polyploid plants. Nat. Plants. 2018;4:258–268. doi: 10.1038/s41477-018-0136-7.
- 64. Session AM, et al. Genome evolution in the allotetraploid frog Xenopus laevis. Nature. 2016;538:336–343. doi: 10.1038/nature19840.
- 65. Simakov O, et al. Deeply conserved synteny resolves early events in vertebrate evolution. Nat. Ecol. Evol. 2020;4:820–830. doi: 10.1038/s41559-020-1156-z.
- 66. Marcussen T, et al. Ancient hybridizations among the ancestral genomes of bread wheat. Science. 2014;345:1250092. doi: 10.1126/science.1250092.
- 67. Jaillon O, et al. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature. 2007;449:463–467. doi: 10.1038/nature06148.
- 68. Luo J, et al. Tempo and mode of recurrent polyploidization in the Carassius auratus species complex (Cypriniformes, Cyprinidae). Heredity. 2014;112:415–427. doi: 10.1038/hdy.2013.121.
- 69. Liu XL, et al. Wider geographic distribution and higher diversity of hexaploids than tetraploids in Carassius species complex reveal recurrent polyploidy effects on adaptive evolution. Sci. Rep. 2017;7:5395. doi: 10.1038/s41598-017-05731-0.
- 70. Spring J. Vertebrate evolution by interspecific hybridisation – are we polyploid? FEBS Lett. 1997;400:2–8. doi: 10.1016/s0014-5793(96)01351-8.
- 71. Chen ZJ. Molecular mechanisms of polyploidy and hybrid vigor. Trends Plant Sci. 2010;15:57–71. doi: 10.1016/j.tplants.2009.12.003.
- 72. Chen ZJ. Genomic and epigenetic insights into the molecular bases of heterosis. Nat. Rev. Genet. 2013;14:471–482. doi: 10.1038/nrg3503.
- 73. Renny-Byfield S, Wendel JF. Doubling down on genomes: polyploidy and crop plants. Am. J. Bot. 2014;101:1711–1725. doi: 10.3732/ajb.1400119.
- 74. Feldman M, Levy AA, Fahima T, Korol A. Genomic asymmetry in allopolyploid plants: wheat as a model. J. Exp. Bot. 2012;63:5045–5059. doi: 10.1093/jxb/ers192.
- 75. Abi-Rached L, Gilles A, Shiina T, Pontarotti P, Inoko H. Evidence of en bloc duplication in vertebrate genomes. Nat. Genet. 2002;31:100–105. doi: 10.1038/ng855.
- 76. Flajnik MF, Kasahara M. Origin and evolution of the adaptive immune system: genetic events and selective pressures. Nat. Rev. Genet. 2010;11:47–59. doi: 10.1038/nrg2703.
- 77. Flajnik MF. A cold-blooded view of adaptive immunity. Nat. Rev. Immunol. 2018;18:438–453. doi: 10.1038/s41577-018-0003-9.
- 78. Kasahara M. The 2R hypothesis: an update. Curr. Opin. Immunol. 2007;19:547–552. doi: 10.1016/j.coi.2007.07.009.
- 79. Kaufman J. Unfinished business: Evolution of the MHC and the adaptive immune system of jawed vertebrates. Annu. Rev. Immunol. 2018;36:383–409. doi: 10.1146/annurev-immunol-051116-052450.
- 80. Ohta Y, Kasahara M, O’Connor TD, Flajnik MF. Inferring the “primordial immune complex”: origins of MHC Class I and antigen receptors revealed by comparative genomics. J. Immunol. 2019;203:1882–1896. doi: 10.4049/jimmunol.1900597.
- 81. Sambrook JG, Beck S. Evolutionary vignettes of natural killer cell receptors. Curr. Opin. Immunol. 2007;19:553–560. doi: 10.1016/j.coi.2007.08.002.
- 82. Makino T, McLysaght A. Interacting gene clusters and the evolution of the vertebrate immune system. Mol. Biol. Evol. 2008;25:1855–1862. doi: 10.1093/molbev/msn137.
- 83. Makino T, McLysaght A. Positionally biased gene loss after whole genome duplication: evidence from human, yeast, and plant. Genome Res. 2012;22:2427–2435. doi: 10.1101/gr.131953.111.
- 84. Makino T, McLysaght A. Ohnologs in the human genome are dosage balanced and frequently associated with disease. Proc. Natl Acad. Sci. USA. 2010;107:9270–9274. doi: 10.1073/pnas.0914697107.
- 85. McLysaght A, et al. Ohnologs are overrepresented in pathogenic copy number mutations. Proc. Natl Acad. Sci. USA. 2014;111:361–366. doi: 10.1073/pnas.1309324111.
- 86. Rice AM, McLysaght A. Dosage sensitivity is a major determinant of human copy number variant pathogenicity. Nat. Commun. 2017;8:14366. doi: 10.1038/ncomms14366.
- 87. Makino T, McLysaght A, Kawata M. Genome-wide deserts for copy number variation in vertebrates. Nat. Commun. 2013;4:2283. doi: 10.1038/ncomms3283.
- 88. Teh YW, Newman D, Welling M. A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation. Adv. Neural Inf. Process. Syst. 2007;19:1353–1360.
- 89. Asuncion, A., Welling, M., Smyth, P. & Teh, Y. W. On smoothing and inference for topic models. In Proc. Twenty-Fifth Conference on Uncertainty in Artificial Intelligence 27–34 (2009).
- 90. Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J. Mach. Learn. Res. 2003;3:993–1022.
- 91. Blei DM. Probabilistic topic models. Commun. ACM. 2012;55:77–84.
- 92. Liu JS, Lawrence CE. Bayesian inference on biopolymer models. Bioinformatics. 1999;15:38–52. doi: 10.1093/bioinformatics/15.1.38.
- 93. Auger IE, Lawrence CE. Algorithms for the optimal identification of segment neighborhoods. Bull. Math. Biol. 1989;51:39–54. doi: 10.1007/BF02458835.
- 94. Muffato, M. Reconstruction de génomes ancestraux chez les vertébrés. Université d’Evry-Val d’Essonne (2010).
Data Availability Statement
The Japanese lamprey and elephant shark genome sequences generated in this study have been deposited at DDBJ/ENA/GenBank under the accession numbers WFAB00000000 and WEZY00000000, respectively. The reconstruction dataset, including information on orthologues, paralogues and gene names in individual chromosomal segments, is available as Supplementary Data 1.
The reconstruction software/code is available on request.