Abstract
Cartilaginous fishes are the oldest living phylogenetic group of jawed vertebrates. Here, we demonstrate the value of cartilaginous fish sequences in reconstructing the evolutionary history of vertebrate genomes by sequencing the protocadherin cluster in the relatively small genome (910 Mb) of the elephant shark (Callorhinchus milii). Human and coelacanth contain a single protocadherin cluster with 53 and 49 genes, respectively, that are organized in three subclusters, Pcdhα, Pcdhβ, and Pcdhγ, whereas the duplicated protocadherin clusters in fugu and zebrafish contain >77 and >107 genes, respectively, that are organized in Pcdhα and Pcdhγ subclusters. By contrast, the elephant shark contains a single protocadherin cluster with 47 genes organized in four subclusters (Pcdhδ, Pcdhε, Pcdhμ, and Pcdhν). By comparison with elephant shark sequences, we discovered a Pcdhδ subcluster in teleost fishes, coelacanth, Xenopus, and chicken. Our results suggest that the protocadherin cluster in the ancestral jawed vertebrate contained more subclusters than modern vertebrates, and the evolution of the protocadherin cluster is characterized by lineage-specific differential loss of entire subclusters of genes. In contrast to teleost fish and mammalian protocadherin genes that have undergone gene conversion events, elephant shark protocadherin genes have experienced very little gene conversion. The syntenic block of genes in the elephant shark protocadherin locus is well conserved in human but disrupted in fugu. Thus, the elephant shark genome appears to be less prone to rearrangements than teleost fish genomes. The small and “stable” genome of the elephant shark is a valuable reference for understanding the evolution of vertebrate genomes.
Keywords: ancestral jawed vertebrate, clustered protocadherins, cartilaginous fish, Callorhinchus milii, gene conversion
Reconstructing the evolutionary history of vertebrate genomes, and in particular the human genome, is a major goal of vertebrate comparative genomics. Efforts are underway to reconstruct the evolutionary history of the human genome at the nucleotide level and at the level of large-scale chromosomal rearrangements (1) with the objective of predicting ancestral genomes with high accuracy. The accuracy of the reconstruction of ancestral genomes is greatly influenced by the choice and number of genomes compared. Although a comparison of genomes from all major vertebrate groups is desirable, it is also important to compare genomes from the most distant branches of the phylogenetic tree. Here, we show how the genome sequence from the oldest living phylogenetic group of jawed vertebrates, the cartilaginous fishes, can dramatically change our inferences of the state of the ancestral vertebrate genome.
The protocadherin gene cluster is one of the most evolutionarily dynamic loci in vertebrates. It is composed of tandem arrays of paralogous genes that are highly susceptible to lineage-specific gene losses, tandem duplications, and gene conversion (2–7). These clustered genes differ from nonclustered protocadherin genes in possessing a large coding exon that codes for the entire extracellular domain, the transmembrane domain, and a small portion of the cytoplasmic domain, whereas the extracellular domain of the nonclustered protocadherins is usually encoded by multiple exons (8). Protocadherin cluster proteins are cadherin-like cell adhesion molecules that are highly enriched on synaptic membranes (9). These proteins are believed to be a vertebrate innovation that accompanied the emergence of the neural tube and the elaborate central nervous system and have been hypothesized to provide a molecular code for specifying the enormous diversity in neuronal connectivity (3, 9, 10). No clustered protocadherin gene has been identified in invertebrate genomes (11–13). The protocadherin gene cluster has been characterized in several bony vertebrates (Osteichthyes), including human, coelacanth, and teleost fishes. Human and coelacanth protocadherin cluster genes are organized in three tandem subclusters, designated as Pcdhα, Pcdhβ, and Pcdhγ (3, 4). Each subcluster contains multiple “variable” exons (≈2.4 kb each) that are transcribed from independent promoters. Each variable exon codes for an extracellular domain comprising six repeats of a calcium-binding ectodomain (EC1–6), a transmembrane domain, and a part of the cytoplasmic domain. In addition to variable exons, Pcdhα and -γ subclusters contain three constant exons at the 3′ end of the respective subcluster, to which each variable exon is independently spliced. The constant region exons code for the major part of the cytoplasmic domain that is shared by genes in each subcluster.
Human and coelacanth protocadherin clusters contain a total of 53 and 49 genes, respectively. Because of the “fish-specific” whole-genome duplication, teleost fishes such as fugu and zebrafish contain two unlinked protocadherin clusters, designated Pcdh1 and Pcdh2. Although the fugu Pcdh1 cluster is highly degenerate, the Pcdh2 clusters in both fishes have undergone repeated lineage-specific tandem duplications and gene losses, giving rise to >107 genes in zebrafish (2, 5, 7) and >77 genes in fugu (6). However, these teleost protocadherin genes are orthologous to only Pcdhα and -γ subcluster genes in mammals and coelacanth. Another interesting feature of teleost protocadherin cluster genes is that they have experienced extensive regional gene conversion events resulting in nucleotide identity of >99% in the 3′ region of variable exons among paralogs (5, 6). Based on the organization, gene content, and phylogenetic relationships of protocadherin clusters in bony vertebrates, the ancestral vertebrate protocadherin cluster is predicted to contain either two (Pcdhα and -γ) or three subclusters (Pcdhα, -β, and -γ) that have subsequently undergone lineage-specific gene losses and gene duplications, including the whole-locus duplication in the teleost fish lineage (6).
The living jawed vertebrates (gnathostomes) fall into two major monophyletic groups, the Chondrichthyes (cartilaginous fishes that include elasmobranchs and holocephalians) and the Osteichthyes (bony vertebrates, which include ray-finned fishes, coelacanths, lungfishes, and tetrapods) that shared a common ancestor ≈450 million years ago. The protocadherin cluster has been characterized in several bony vertebrates. Here, we report the sequencing and characterization of the protocadherin cluster from a holocephalian cartilaginous fish, the elephant shark (Callorhinchus milii). The elephant shark possesses a relatively small genome (≈910 Mb) among the cartilaginous fishes and is therefore attractive for genome sequencing and comparative analysis (14). Our study shows that the elephant shark protocadherin cluster contains 47 variable exons arrayed in four subclusters. Interestingly, none of the four subclusters show a direct orthology relationship to the α, β, or γ subcluster in bony vertebrates. Thus, the protocadherin cluster in the common ancestor of jawed vertebrates appears to have contained more subclusters of genes than modern vertebrates.
Results
Elephant Shark Protocadherin Cluster Genes Organized into Four Subclusters.
Screening of an elephant shark BAC library identified 10 positive clones that belong to a single locus. Three representative overlapping BACs were sequenced to obtain ≈409 kb of contiguous sequence spanning the protocadherin cluster. Based on sequence homology and GENSCAN prediction (see Methods), a total of 47 variable exons were identified in this cluster (Fig. 1). In addition, we identified several nonprotocadherin genes flanking the protocadherin cluster [supporting information (SI) Appendix, SI Fig. 5], indicating that the elephant shark protocadherin cluster sequence is complete. The overall gene content of the elephant shark protocadherin cluster (47 variable exons arrayed in four subclusters) is comparable to that in coelacanth (49 variable exons in three subclusters) (4) and human (53 variable exons in three subclusters) (3, 15). However, its gene content is considerably lower than that in fugu (>77 variable exons) (6) and zebrafish (>107 variable exons) (2, 4, 5, 7).
The mammalian and coelacanth protocadherin clusters consist of three subclusters, the Pcdhα, -β, and -γ. The teleost clusters, however, lack the β subcluster. To determine the subcluster organization of the elephant shark protocadherin cluster, we first searched for constant exons by homology search and GENSCAN prediction and identified three sets of constant exons. We then determined the splicing pattern of the variable exons with the constant exons by RT-PCR and 3′RACE. These analyses indicated that the genes are organized into four subclusters. The first subcluster consists of a single variable exon and three constant exons, whereas the second subcluster comprises only variable exons (21 in all), each of which is transcribed independently as a single-exon gene, similar to the coelacanth and mammalian β subcluster genes. The third and fourth subclusters consist of 8 and 17 variable exons, respectively, and three constant exons each. In addition, the fourth subcluster contains an alternative constant exon between the second and third constant exons. The splice sites of the constant exons of the three subclusters are shown in SI Appendix, SI Fig. 6. We designated the four subclusters as the Pcdhδ, Pcdhε, Pcdhμ, and Pcdhν subclusters, respectively (Fig. 1).
To determine the evolutionary relationships between individual protocadherin (Pcdh) genes in the elephant shark, we aligned the amino acid sequences of all variable exons using ClustalX and generated a phylogenetic tree using the Neighbor-joining method. Sequences of only the first three ectodomains (EC1–3) were used, because the C-terminal ectodomains (EC4–6) have been shown to be susceptible to extensive gene conversion in teleost fishes and human, resulting in homogenization of sequences among paralogs (5, 6). The phylogenetic tree shows that most of the elephant shark Pcdh genes segregate into “paralog groups,” each consisting of paralogs that belong to the same subcluster (CmPcdhε2–21, CmPcdhμ1–8, CmPcdhν2–3, and CmPcdhν4–17) (SI Appendix, SI Fig. 7). The only exceptions are the CmPcdhδ1, CmPcdhε1, and CmPcdhν1 genes from different subclusters that form a monophyletic group. Such a relationship between genes from different subclusters indicates they are “intersubcluster paralogs” that shared a common ancestor in the primordial protocadherin cluster. Similar intersubcluster paralogs have been identified in the human protocadherin cluster, in which the Pcdhα genes, c1 and c2, were found to be more closely related to Pcdhγ genes, c4 and c5, than to other genes in the α subcluster (3). The presence of such paralogs indicates that these ancient genes have been highly conserved as single paralogous genes despite the recurrent expansion and loss of genes in their vicinity. The evolutionary conservation of such ancient paralogs between subclusters suggests they may play a fundamental role in protocadherin functions. In support of this hypothesis, it has been shown that, whereas the expression of all other protocadherin genes in the mammalian α subcluster is monoallelic and cell-selective, the c1 and c2 genes are expressed biallelically and in every neuron (16, 17).
It is thus possible that the highly conserved intersubcluster paralogs CmPcdhδ1, CmPcdhε1, and CmPcdhν1 in the elephant shark carry out a fundamental role similar to that of “c” genes in mammals. The absence of such a paralog in the Pcdhμ subcluster is likely due to a subcluster-specific gene degeneration.
Elephant Shark Protocadherin Subclusters Are Not Orthologous to Those in Mammals, Coelacanth, or Teleost Fishes.
To examine the phylogenetic relationships of elephant shark subclusters with those of human, coelacanth, and teleost fishes, we aligned the amino acid sequences of the constant region exons of the CmPcdhδ, -μ, and -ν genes from elephant shark, and Pcdhα and -γ genes from human, mouse, coelacanth, fugu, and zebrafish and constructed a phylogenetic tree using the Neighbor-joining method. We also included sequences of the constant region of Pcdhδ from fugu, zebrafish, coelacanth, Xenopus tropicalis, and chicken that we have identified in the present study (described below). The phylogenetic tree (Fig. 2) indicates that the Pcdhδ sequences from elephant shark and the two teleost fishes, coelacanth and Xenopus, constitute a monophyletic group distinct from other sequences, indicating this is a separate subcluster from the α and γ subclusters. This phylogenetic tree also shows that elephant shark Pcdhμ and -ν subclusters may not be direct orthologs of Pcdhα and -γ subclusters in bony vertebrates (Fig. 2). These elephant shark subclusters are either the orthologs of the α and γ subclusters that have undergone an unusually accelerated rate of evolution involving extensive nucleotide substitutions or simply distinct subclusters that lack orthologs in bony vertebrates. Given there is no evidence that elephant shark genes have been evolving at a rapid rate, and that, in particular, the neighboring Pcdhδ gene shows a normal evolutionary rate, it is more probable that the Pcdhμ and -ν subclusters are distinct protocadherin subclusters that lack orthologs in bony vertebrates.
Because the elephant shark Pcdhε subcluster lacks the constant region, its phylogenetic relationship with other protocadherin subclusters can be inferred only by analyzing the variable exons. To this end, we generated a phylogenetic tree using the alignment of the N-terminal ectodomain sequences (EC1–3) of all variable exons from elephant shark, fugu, and coelacanth. This phylogenetic tree of variable exons suggests that the Pcdhε subcluster is not related to the bony vertebrate β subcluster, which also lacks a constant region, but is an ancient subcluster that has undergone repeated expansion within the elephant shark lineage (SI Appendix, SI Fig. 8). In addition, this analysis showed that the first genes in the fugu Pcdh1 (FrPcdh1α1) and Pcdh2 clusters (FrPcdh2α1) and the coelacanth protocadherin cluster (LmPcdhα1) are more closely related to elephant shark CmPcdhδ1, -ε1, and -ν1 genes than to other α genes in their respective subclusters (SI Appendix, SI Fig. 8). This suggests that the fugu Pcdh1α1 and Pcdh2α1 genes and coelacanth Pcdhα1 gene may not be α genes but belong to a different subcluster.
A Pcdhδ Subcluster in Teleost Fishes, Coelacanth, Xenopus, and Chicken.
To verify this, we searched for constant exons downstream of these variable exons by GENSCAN prediction and TBLASTN by using elephant shark Pcdhδ constant region sequence as a query. We uncovered three new constant exons each in the fugu Pcdh1 and Pcdh2 clusters and in the coelacanth protocadherin cluster. Because the zebrafish Pcdh1α1 gene is orthologous to the fugu Pcdh1α1 gene (6), we extended our study to the zebrafish Pcdh1 locus and identified a similar set of constant exons immediately downstream of the zebrafish Pcdh1α1 gene. RT-PCR analyses with RNA from fugu and zebrafish brain (coelacanth was not included due to unavailability of RNA) confirmed that the first variable exons in the fugu Pcdh1 and Pcdh2 clusters and in the zebrafish Pcdh1 cluster are spliced to the respective newly identified three constant exons and not to the constant region exons in their respective α subclusters.
The absence of an ortholog for the coelacanth Pcdhα1 gene (redesignated as Pcdhδ1) in the human and mouse protocadherin clusters (2, 3, 18, 19) indicates that the Pcdhδ subcluster has been lost in mammals. To determine more precisely when this subcluster was lost, we searched the X. tropicalis and chicken genome assemblies by TBLASTN using the elephant shark Pcdhδ constant region sequence as a query. This search led to the identification of a subcluster comprising a single variable exon plus three constant exons at the 5′ end of the protocadherin clusters in Xenopus (scaffold_177, length 2.1 Mb) and chicken (chromosome 13). Phylogenetic analysis of the amino acid sequences of the variable exon (SI Appendix, SI Fig. 8) and the constant region exons (Fig. 2) of this subcluster from Xenopus and chicken together with their corresponding sequences in fugu, zebrafish, coelacanth, and elephant shark indicated that these genes are indeed orthologs of the elephant shark Pcdhδ subcluster. Thus, we conclude that Pcdhδ is an ancient subcluster that has been conserved in teleost fishes, coelacanth, Xenopus, and chicken protocadherin clusters but lost in the human and mouse protocadherin clusters.
Elephant Shark Protocadherin Cluster Genes Have Experienced Very Little Gene Conversion.
The tandem arrays of paralogous protocadherin genes in fugu and zebrafish have experienced repeated gene conversion events, resulting in extensive regional sequence homogenization among paralogs in the same subcluster (5, 6). Gene conversion in protocadherin clusters is usually restricted to the C-terminal ectodomains from EC4 to EC6 and the cytoplasmic fragment encoded by the variable exons, whereas the EC2 and EC3 ectodomain regions seldom undergo gene conversion, and thus are hypothesized to provide the majority of diversifying signals for protocadherin cluster genes. Protocadherin cluster genes in human have also undergone gene conversions although to a lesser extent compared with teleost fishes, but those in coelacanth have experienced few gene conversions (4). To investigate whether the elephant shark protocadherin cluster genes have undergone gene conversion, we estimated the total number of synonymous substitutions per codon (dS) for each of the elephant shark protocadherin paralog subgroups (CmPcdhε2-21, CmPcdhμ1–8, CmPcdhν4–17). The synonymous substitution rate reflects the frequency of gene conversion events, because purifying selection for protein function does not act on synonymous sites. Because gene conversion in protocadherin clusters is usually restricted to the C-terminal ectodomains, we assessed the synonymous substitution rates separately for each ectodomain. The overall ratio of synonymous substitution rates between the most- and the least-divergent ectodomains among different paralog subgroups of the elephant shark protocadherin genes ranges from 1.8 to 2.3 (Table 1). This indicates a uniform rate of synonymous-site substitutions among different ectodomains in the elephant shark paralogs. In contrast, the ratio between the most- and the least-divergent ectodomains among different subgroups in fugu ranges from 79 to 2,796 (Table 1). 
Such a high ratio indicates that different domains have experienced different rates of substitution, most likely due to gene conversion in some ectodomains. These results suggest that none of the ectodomains in the elephant shark protocadherin cluster genes have undergone gene conversion and sequence homogenization. Thus, like the coelacanth genome, the elephant shark genome appears to be less susceptible to recombination-mediated rearrangements.
Table 1.
Subgroups | dS(EC1–6)† | dS(EChigh)‡ | dS(EClow)§ | dS(EChigh)/dS(EClow) |
---|---|---|---|---|
CmPcdhε2–21 | 0.211 | 0.281 | 0.157 | 1.8 |
CmPcdhμ1–8 | 0.647 | 1.073 | 0.534 | 2.0 |
CmPcdhν4–17 | 0.210 | 0.338 | 0.146 | 2.3 |
FrPcdh2α3–7 | 0.104 | 0.517 | 0.00515 | 100.4 |
FrPcdh2α9–25 | 0.228 | 1.022 | 0.00522 | 195.8 |
FrPcdh2α26–36 | 0.327 | 1.331 | 0.00104 | 1,280 |
FrPcdh2γ1–17 | 0.067 | 0.240 | 0.00302 | 79.5 |
FrPcdh2γ19–31 | 0.227 | 1.308 | 0.00555 | 235.7 |
FrPcdh2γ33–36 | 0.297 | 8.667 | 0.00310 | 2,796 |
†Average dS for each branch in the gene tree of individual subgroups was calculated based on the alignment of paralogs in the subgroup.
‡Average dS per branch calculated based on alignment of the most-divergent ectodomain in each subgroup.
§Average dS per branch calculated based on alignment of the least-divergent ectodomain in each subgroup.
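The last column of Table 1 is simple arithmetic on the two preceding columns, and it can be rechecked with a short script. The following Python sketch is illustrative only: the dS values are copied from Table 1, whereas the names `TABLE1` and `divergence_ratio` are our own and are not part of the original analysis, which used CODEML.

```python
# dS of the most-divergent and least-divergent ectodomain per paralog
# subgroup, copied from Table 1 (columns dS(EChigh) and dS(EClow)).
TABLE1 = {
    "CmPcdhε2–21": (0.281, 0.157),
    "CmPcdhμ1–8": (1.073, 0.534),
    "CmPcdhν4–17": (0.338, 0.146),
    "FrPcdh2α3–7": (0.517, 0.00515),
    "FrPcdh2α9–25": (1.022, 0.00522),
    "FrPcdh2α26–36": (1.331, 0.00104),
    "FrPcdh2γ1–17": (0.240, 0.00302),
    "FrPcdh2γ19–31": (1.308, 0.00555),
    "FrPcdh2γ33–36": (8.667, 0.00310),
}


def divergence_ratio(ds_high, ds_low):
    """Ratio of dS between the most- and least-divergent ectodomains."""
    return ds_high / ds_low


ratios = {name: divergence_ratio(hi, lo) for name, (hi, lo) in TABLE1.items()}

# Split by species prefix: Cm = elephant shark, Fr = fugu.
shark = [value for name, value in ratios.items() if name.startswith("Cm")]
fugu = [value for name, value in ratios.items() if name.startswith("Fr")]

for name, value in ratios.items():
    print(f"{name}: {value:.1f}")
print(f"elephant shark range: {min(shark):.1f} to {max(shark):.1f}")
print(f"fugu range: {min(fugu):.1f} to {max(fugu):.1f}")
```

A ratio close to 1 means that all ectodomains have accumulated synonymous substitutions at a similar rate, whereas a very large ratio means that at least one ectodomain is nearly identical among paralogs, the pattern expected after gene conversion.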
Evolution of Core Promoter Elements of Protocadherin Cluster Genes.
Previous studies have identified a 15-bp core sequence element conserved in the promoter sequences of all mammalian and coelacanth protocadherin genes and in zebrafish Pcdh1γ and Pcdh2γ genes but divergent in zebrafish Pcdh1α and Pcdh2α genes (5, 18, 20). This core element includes a “CGCT” motif implicated in promoter function (20). We searched the promoter regions (480 bp upstream of the start codon) of the elephant shark genes using the MEME algorithm to look for motifs conserved among genes from different subclusters. Our results show that the elephant shark Pcdhε and Pcdhμ subcluster genes share a highly conserved 27-bp promoter element (Fig. 3 A and B). The presence of this element in genes from two different subclusters and the high level of conservation indicate that this is an ancient promoter element that is under purifying selection. The promoter sequences of the elephant shark Pcdhν genes, however, are divergent from this element and also from each other, except for a short CAAT-box-like motif in the 3′ region (Fig. 3C). These genes, located at the 3′ end of the elephant shark protocadherin cluster, are likely to be regulated differently from the Pcdhε and Pcdhμ genes.
The first 15 bp of the 27-bp element in the elephant shark Pcdhε and Pcdhμ genes are well conserved in the Pcdhα, -β, and -γ genes in human and coelacanth (Fig. 3 D–H) and in Pcdhγ genes in fugu (Fig. 3J). The 3′ region of this element, however, is divergent in bony vertebrates, except for the CAAT-box-like motif that is partially conserved in the Pcdhα and Pcdhγ genes in human and coelacanth (Fig. 3 D–G). The substitutions accumulated in the 3′ region of this promoter element in bony vertebrates may have resulted in the acquisition of new expression patterns and contributed to adaptive diversity of protocadherin cluster genes. The promoter elements of fugu Pcdhα subcluster genes are quite divergent from the promoter elements in all other protocadherin clusters. Even the “CGCT” motif is rather weakly conserved in the promoters of these genes (Fig. 3I). Thus, these fugu genes appear to have undergone further adaptive radiation in the teleost lineage.
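The promoter regions searched above were defined as the 480 bp upstream of each start codon. A minimal Python sketch of how such regions could be cut out of a genomic sequence before motif discovery is given below. It is an illustration under stated assumptions, not the procedure actually used: the function name `upstream_region`, the 0-based coordinate convention, and the strand encoding are all hypothetical, and MEME itself is not invoked.

```python
# Complement table for reverse-complementing DNA (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(genome, atg_index, strand="+", length=480):
    """Return up to `length` bases immediately 5' of a start codon.

    genome    : plus-strand sequence of the locus
    atg_index : 0-based plus-strand coordinate of the A of the ATG
                (assumed convention for this sketch)
    strand    : "+" or "-", the strand that encodes the gene
    """
    if strand == "+":
        start = max(0, atg_index - length)
        return genome[start:atg_index]
    if strand == "-":
        # On the minus strand, upstream lies at higher plus-strand coordinates.
        window = genome[atg_index + 1:atg_index + 1 + length]
        return reverse_complement(window)
    raise ValueError("strand must be '+' or '-'")


# Toy example with a 4-bp window instead of 480 bp.
print(upstream_region("CCCCGGGGATGAAA", 8, "+", length=4))
print(upstream_region("TTTCATAACC", 5, "-", length=4))
```

The slices are clipped at the ends of the sequence, so a gene lying closer than 480 bp to the edge of the contig yields a shorter region rather than an error.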
Syntenic Block in the Elephant Shark Protocadherin Locus Is Conserved in Mammals but Rearranged in Fugu.
The elephant shark protocadherin locus contains a block of syntenic genes (Hars, Dnd1, Zmat2, the protocadherin cluster, and Diaph1) spanning ≈409 kb that is conserved in the human protocadherin locus. However, the human locus is considerably larger (≈950 kb) than the elephant shark locus and contains an extra gene, HARS2, which is a duplicated copy of HARS (SI Appendix, SI Fig. 5). The content and order of genes in the orthologous mouse locus are identical to those in human (data not shown). In contrast to the well conserved synteny between the elephant shark and mammalian protocadherin loci, the regions flanking the duplicate protocadherin clusters of fugu have experienced extensive rearrangements (SI Appendix, SI Fig. 5). Comparisons of ≈1.4× whole-genome shotgun sequences of elephant shark derived from paired end sequences of fosmid clones with the human and zebrafish genome assemblies have previously indicated that the level of conserved synteny between the elephant shark and human genomes is higher than that between the elephant shark and zebrafish genomes (21). The highly conserved synteny of genes at the protocadherin locus in the elephant shark and mammals and its disruption in fugu provide further support for this observation and are consistent with the notion that teleost fish genomes have experienced a higher rate of chromosomal rearrangements relative to elephant shark and mammals. Reconstruction of ancestral vertebrate karyotypes based on comparisons of teleost fish genomes (medaka, zebrafish, and Tetraodon) and human genome has suggested that several major chromosomal rearrangements occurred in the fish lineage within a short period of ≈50 million years after the “fish-specific” whole-genome duplication (22). The whole-genome duplication may have triggered an accelerated rate of rearrangements by facilitating recombination between paralogous chromosomal segments.
Discussion
The living jawed vertebrates comprise two major lineages, the cartilaginous fishes and bony vertebrates. In this study, we have characterized the protocadherin cluster from a cartilaginous fish, the elephant shark. The protocadherin cluster in elephant shark contains 47 genes organized into four subclusters, the Pcdhδ, -ε, -μ, and -ν. Surprisingly, phylogenetic analyses indicated that these subclusters are not direct orthologs of Pcdhα, -β, or -γ subcluster previously identified in bony vertebrates. We have shown that Pcdhδ is an ancient subcluster that is conserved in teleost fishes, coelacanth, Xenopus, and chicken but lost in mammals. Thus, our study suggests that the protocadherin cluster in the ancestral jawed vertebrate contained more subclusters than previously proposed based on characterization of protocadherin cluster in bony vertebrates. We propose that the protocadherin cluster in the last common ancestor of jawed vertebrates contained seven subclusters (Pcdhα, -β, -γ, -δ, -ε, -μ, and -ν) with multiple variable exons (Fig. 4). After the divergence of the cartilaginous fish and bony vertebrate lineages, this ancestral cluster has experienced differential loss of entire subclusters of genes in different vertebrate lineages. Although the Pcdhα, -β, and -γ subclusters were lost in the elephant shark lineage, the common ancestor of bony vertebrates lost the Pcdhε, -μ, and -ν subclusters. Of the remaining four subclusters (Pcdhδ, -α, -β, and -γ) in bony vertebrates, the Pcdhβ subcluster was lost in the ray-finned fish lineage, most likely before the “fish-specific” whole-genome duplication event. Another subcluster, Pcdhδ, was lost more recently in the mammalian lineage (Fig. 4). Besides the entire subclusters, the constant region exons of protocadherin genes have also been targeted for lineage-specific losses. 
The constant exons of the Pcdhε and -β subclusters were lost independently in the elephant shark lineage and in a common ancestor of lobe-finned fishes, respectively (Fig. 4). Thus, the characterization of the protocadherin cluster in the elephant shark has demonstrated the importance of cartilaginous fish genome sequence in reconstructing the evolutionary history of vertebrate genomes and in inferring the organization of the ancestral jawed vertebrate genome. Because they are the most ancient phylogenetic group of jawed vertebrates, cartilaginous fishes not only provide a better insight into the now-extinct ancestral jawed vertebrate genome but also serve as an outgroup that helps in identifying lineage-specific adaptive changes in different lineages of bony vertebrates.
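The differential-loss model proposed above can be written down as set arithmetic, which makes it easy to check that the proposed losses reproduce the subcluster content observed in each lineage. The Python sketch below is only an illustration of that reasoning: the names `ANCESTRAL`, `LOSSES`, and `retained` are hypothetical, and only lineages whose subcluster content is stated in the text are included.

```python
# Seven subclusters proposed for the last common ancestor of jawed vertebrates.
ANCESTRAL = frozenset("αβγδεμν")

# Whole-subcluster losses proposed for each lineage, in order of occurrence.
LOSSES = {
    "elephant shark": ["αβγ"],
    "coelacanth": ["εμν"],           # lost in the bony vertebrate ancestor
    "teleost fishes": ["εμν", "β"],  # β lost in the ray-finned fish lineage
    "mammals": ["εμν", "δ"],         # δ lost in the mammalian lineage
}


def retained(lineage):
    """Subclusters expected to remain in a lineage under the model."""
    lost = set().union(*(set(event) for event in LOSSES[lineage]))
    return set(ANCESTRAL) - lost


for lineage in LOSSES:
    print(lineage, "".join(sorted(retained(lineage))))
```

Under these assumptions the model yields δ, ε, μ, and ν for elephant shark; α, β, γ, and δ for coelacanth; α, γ, and δ for teleost fishes; and α, β, and γ for mammals, in agreement with the subclusters described in Results.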
The elephant shark genome was proposed as a model cartilaginous fish genome because of its relatively small size (14). Subsequently, comparative analysis of ≈1.4× coverage elephant shark sequences with the whole-genome assemblies of human, fugu, and zebrafish suggested that noncoding sequences in elephant shark are evolving more slowly than in teleost fishes and that the elephant shark genome has experienced fewer chromosomal rearrangements than teleost fish genomes (21, 23). The characterization of the protocadherin locus in elephant shark has provided further support for the hypothesis that the elephant shark genome is more stable than teleost fish genomes. The conserved synteny of genes at the protocadherin locus in elephant shark and human, the lower turnover of protocadherin genes in the elephant shark protocadherin cluster, and a lack of evidence for gene conversion between paralogous protocadherin genes in elephant shark suggest that the elephant shark genome is less prone to recombination-mediated rearrangements and tandem gene duplications than teleost fish genomes. Although we did not measure the relative substitution rates of the promoter sequences of protocadherin genes, the relatively higher levels of conservation of the 15-bp core promoter element in the elephant shark compared with that in fugu and coelacanth indicate that the regulatory region of elephant shark protocadherin genes is evolving more slowly than that in teleosts and coelacanth. The relatively small and slowly evolving genome of the elephant shark, therefore, is an important “reference” for understanding the evolution of vertebrate genomes.
Methods
Sequencing and Assembly of BACs.
To identify probes for the elephant shark protocadherin cluster, we first performed a TBLASTN search of the whole-genome shotgun sequences of elephant shark (GenBank accession nos. CW854842–CW882785) using the amino acid sequences of human protocadherins as queries. We identified two sequences (GenBank accession nos. CW868635 and CW882727) that showed high similarity to human protocadherin. The shotgun clones of these sequences were used to probe the IMCB_Eshark BAC library (cloned in pCCBAC-EcoRI; unpublished data). In total, 10 positive clones were identified, from among which three representative overlapping clones (49C8, 166L19, and 176N9) were sequenced completely by the standard shotgun sequencing method and assembled using SeqMan (Lasergene). This sequence has been submitted to GenBank (accession no. EF693954).
Sequence Annotation.
The variable and constant exons of elephant shark protocadherin genes and exons of the nonprotocadherin genes flanking the protocadherin cluster were identified and annotated based on homology (TBLASTN and BLASTX) (www.ncbi.nlm.nih.gov) and GENSCAN prediction (24). Human orthologs of the elephant shark nonprotocadherin genes were identified by BLAT search of the human genome using the University of California, Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu). The genomic sequences of human and mouse protocadherin clusters were retrieved from the UCSC Genome Browser, whereas the coelacanth protocadherin cluster sequence was assembled from BAC sequences in GenBank (accession nos. AC150238, AC150284, and AC150308–AC150310) (4). The genomic sequences of fugu protocadherin clusters were also obtained from GenBank (accession nos. DQ986917–DQ986918) (6). Protocadherin sequences of X. tropicalis and chicken were identified by BLAT search of their genome assemblies (Ver. 4.1 and 2.1, respectively; http://genome.ucsc.edu) using the protein sequence of the elephant shark Pcdhδ constant region as query. Gaps in the intronic regions of Xenopus and chicken sequences were filled by PCR amplification of the genomic DNA. The complete genomic sequences of the Xenopus and chicken Pcdhδ subclusters have been submitted to GenBank (accession nos. EU267079 and EU267080).
RT-PCR and 3′RACE Analyses.
Total RNA was extracted from elephant shark, fugu, and zebrafish tissues by the TRIzol method (Invitrogen) and reverse-transcribed using the SMART rapid amplification of cDNA ends (RACE) cDNA Amplification kit (Clontech). PCR consisted of an initial denaturing step at 95°C for 2 min, followed by 35 cycles of 95°C for 30 sec, 55°C for 30 sec, and 72°C for 1–3 min. 3′RACE analysis was performed according to the manufacturer's instructions (Clontech). The RACE products were sequenced completely.
Phylogenetic Analyses.
The amino acid sequences of EC1–3 domains of protocadherin cluster genes from various species were aligned by ClustalX (25). Phylogenetic trees were constructed by the Neighbor-joining method based on a sequence distance matrix and displayed using NJplot (www-igbmc.u-strasbg.fr/BioInfo). The robustness of the tree was determined by bootstrap analysis with 1,000 replicates.
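For readers unfamiliar with the Neighbor-joining method, the following Python sketch shows the core of the algorithm operating on a pairwise distance matrix. It is a generic textbook formulation, not the ClustalX implementation used in this study; the function name `neighbor_joining`, the input layout, and the four-taxon toy distances are assumptions made for illustration, and bootstrap resampling is not shown.

```python
from itertools import combinations


def neighbor_joining(names, pairwise):
    """Build an unrooted tree by neighbor joining.

    names    : list of taxon labels
    pairwise : dict mapping (label_a, label_b) -> distance, one entry per pair
    Returns (newick_string, merges), where merges lists each join as
    (node_a, node_b, branch_length_a, branch_length_b).
    """
    d = {frozenset(pair): float(value) for pair, value in pairwise.items()}
    nodes = list(names)
    merges = []
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all others.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        # Join the pair that minimizes the Q criterion.
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dab = d[frozenset((a, b))]
        len_a = 0.5 * dab + (r[a] - r[b]) / (2 * (n - 2))
        len_b = dab - len_a
        new = f"({a}:{len_a:g},{b}:{len_b:g})"
        # Distances from the new internal node to every remaining node.
        for k in nodes:
            if k not in (a, b):
                d[frozenset((new, k))] = 0.5 * (
                    d[frozenset((a, k))] + d[frozenset((b, k))] - dab
                )
        nodes = [k for k in nodes if k not in (a, b)] + [new]
        merges.append((a, b, len_a, len_b))
    a, b = nodes
    return f"({a},{b}:{d[frozenset((a, b))]:g});", merges


# Toy additive distances for the tree ((A:2,B:3),(C:2,D:5)) with an
# internal branch of length 4.
toy = {("A", "B"): 5, ("A", "C"): 8, ("A", "D"): 11,
       ("B", "C"): 9, ("B", "D"): 12, ("C", "D"): 7}
tree, joins = neighbor_joining(["A", "B", "C", "D"], toy)
print(tree)
```

Because the toy distances are exactly additive, the method recovers the generating topology and branch lengths; with real sequence distances the result is an estimate, which is why bootstrap support is reported alongside the tree.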
Synonymous Substitution Analyses.
The synonymous substitution rates were estimated using the CODEML program in the PAML package with default parameters (26). The nucleotide sequence alignments were generated by the RevTrans program (27), using amino acid sequence alignments (generated by ClustalX) as templates. The synonymous substitution rates were calculated as average dS for each branch in the gene tree of individual subgroups.
ACKNOWLEDGMENTS.
The research work in the W.-P.Y. laboratory is supported by the Biomedical Research Council (BMRC); the National Medical Research Council (NMRC); and the SingHealth Foundation Funds, Singapore; and the work in B.V.'s laboratory is supported by the Agency for Science, Technology and Research (A*STAR), Singapore. B.V. is adjunct staff of the Department of Pediatrics, Yong Loo Lin School of Medicine, National University of Singapore.
Footnotes
The authors declare no conflict of interest.
Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. EF693945, EU267079, and EU267080).
This article contains supporting information online at www.pnas.org/cgi/content/full/0800398105/DC1.
References
1. Ma J, et al. Reconstructing contiguous regions of an ancestral genome. Genome Res. 2006;16:1557–1565. doi: 10.1101/gr.5383506.
2. Wu Q. Comparative genomics and diversifying selection of the clustered vertebrate protocadherin genes. Genetics. 2005;169:2179–2188. doi: 10.1534/genetics.104.037606.
3. Wu Q, Maniatis T. A striking organization of a large family of human neural cadherin-like cell adhesion genes. Cell. 1999;97:779–790. doi: 10.1016/s0092-8674(00)80789-8.
4. Noonan JP, et al. Coelacanth genome sequence reveals the evolutionary history of vertebrate genes. Genome Res. 2004;14:2397–2405. doi: 10.1101/gr.2972804.
5. Noonan JP, Grimwood J, Schmutz J, Dickson M, Myers RM. Gene conversion and the evolution of protocadherin gene cluster diversity. Genome Res. 2004;14:354–366. doi: 10.1101/gr.2133704.
6. Yu WP, Yew K, Rajasegaran V, Venkatesh B. Sequencing and comparative analysis of fugu protocadherin clusters reveal diversity of protocadherin genes among teleosts. BMC Evol Biol. 2007;7:49. doi: 10.1186/1471-2148-7-49.
7. Tada MN, et al. Genomic organization and transcripts of the zebrafish Protocadherin genes. Gene. 2004;340:197–211. doi: 10.1016/j.gene.2004.07.014.
8. Morishita H, Yagi T. Protocadherin family: diversity, structure, and function. Curr Opin Cell Biol. 2007;19:584–592. doi: 10.1016/j.ceb.2007.09.006.
9. Kohmura N, et al. Diversity revealed by a novel family of cadherins expressed in neurons at a synaptic complex. Neuron. 1998;20:1137–1151. doi: 10.1016/s0896-6273(00)80495-x.
10. Shapiro L, Colman DR. The diversity of cadherins and implications for a synaptic adhesive code in the CNS. Neuron. 1999;23:427–430. doi: 10.1016/s0896-6273(00)80796-5.
11. Hill E, Broadbent ID, Chothia C, Pettitt J. Cadherin superfamily proteins in Caenorhabditis elegans and Drosophila melanogaster. J Mol Biol. 2001;305:1011–1024. doi: 10.1006/jmbi.2000.4361.
12. Sasakura Y, et al. A genomewide survey of developmentally relevant genes in Ciona intestinalis. X. Genes for cell junctions and extracellular matrix. Dev Genes Evol. 2003;213:303–313. doi: 10.1007/s00427-003-0320-1.
13. Whittaker CA, et al. The echinoderm adhesome. Dev Biol. 2006;300:252–266. doi: 10.1016/j.ydbio.2006.07.044.
14. Venkatesh B, Tay A, Dandona N, Patil JG, Brenner S. A compact cartilaginous fish model genome. Curr Biol. 2005;15:R82–R83. doi: 10.1016/j.cub.2005.01.021.
15. Vanhalst K, Kools P, Vanden Eynde E, van Roy F. The human and murine protocadherin-beta one-exon gene families show high evolutionary conservation, despite the difference in gene number. FEBS Lett. 2001;495:120–125. doi: 10.1016/s0014-5793(01)02372-9.
16. Ribich S, Tasic B, Maniatis T. Identification of long-range regulatory elements in the protocadherin-alpha gene cluster. Proc Natl Acad Sci USA. 2006;103:19719–19724. doi: 10.1073/pnas.0609445104.
17. Kaneko R, et al. Allelic gene regulation of Pcdh-alpha and Pcdh-gamma clusters involving both monoallelic and biallelic expression in single Purkinje cells. J Biol Chem. 2006;281:30551–30560. doi: 10.1074/jbc.M605677200.
18. Wu Q, et al. Comparative DNA sequence analysis of mouse and human protocadherin gene clusters. Genome Res. 2001;11:389–404. doi: 10.1101/gr.167301.
19. Zou C, Huang W, Ying G, Wu Q. Sequence analysis and expression mapping of the rat clustered protocadherin gene repertoires. Neuroscience. 2007;144:579–603. doi: 10.1016/j.neuroscience.2006.10.011.
20. Tasic B, et al. Promoter choice determines splice site selection in protocadherin alpha and gamma pre-mRNA splicing. Mol Cell. 2002;10:21–33. doi: 10.1016/s1097-2765(02)00578-6.
21. Venkatesh B, et al. Survey sequencing and comparative analysis of the elephant shark (Callorhinchus milii) genome. PLoS Biol. 2007;5:e101. doi: 10.1371/journal.pbio.0050101.
22. Kasahara M, et al. The medaka draft genome and insights into vertebrate genome evolution. Nature. 2007;447:714–719. doi: 10.1038/nature05846.
23. Venkatesh B, et al. Ancient noncoding elements conserved in the human genome. Science. 2006;314:1892. doi: 10.1126/science.1130708.
24. Burge C, Karlin S. Prediction of complete gene structures in human genomic DNA. J Mol Biol. 1997;268:78–94. doi: 10.1006/jmbi.1997.0951.
25. Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 1997;25:4876–4882. doi: 10.1093/nar/25.24.4876.
26. Yang Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 1997;13:555–556. doi: 10.1093/bioinformatics/13.5.555.
27. Wernersson R, Pedersen AG. RevTrans: Multiple alignment of coding DNA from aligned amino acid sequences. Nucleic Acids Res. 2003;31:3537–3539. doi: 10.1093/nar/gkg609.