Abstract
One of the most important measures for detecting molecular adaptation between species or lineages at the gene level is the comparison of the relative fixation rates of synonymous (dS) and non-synonymous (dN) mutations. This study shows that the branch model is sensitive to tree topology and proposes an alternative approach, devogs, which does not require a phylogenetic topology for analysis. We compared devogs with a branch model method using virtual data generated with a varying ω ratio and with the other parameters obtained from real data. The positive predictive value, sensitivity, and specificity of the branch model were affected by the phylogenetic tree topology. Devogs showed a greater positive predictive value, whereas the branch model method had greater sensitivity. In a working example using devogs, a group of human RNA polymerase II-related genes, which are important in mediating alternative splicing, was significantly accelerated compared with four other mammals.
Keywords: devogs, dN/dS, branch model
Introduction
One of the most important measures for detecting molecular adaptation is the comparison of the relative substitution rates of synonymous (dS) and non-synonymous (dN) mutations.1 The non-synonymous/synonymous rate ratio (ω = dN/dS) measures selective pressure, with ω = 1 indicative of neutral mutation, ω < 1 indicative of purifying or negative selection, and ω > 1 indicative of positive diversifying selection. Several methods have been developed to apply this criterion to particular lineages on a phylogeny (branch methods)2,3 or to subsets of gene sites (site methods).4–7 Based on the site or branch-site model, a series of likelihood ratio tests (LRTs) has been used in a comprehensive examination of positively selected genes in six eutherian mammals.8 Current methods for detecting molecular adaptation rely on phylogenetic trees. However, many phylogenetic relationships still require further examination before they can be confirmed: although the relationships among orders have been addressed in several recent molecular studies, those studies have yielded inconsistent results for some ordinal relationships. For example, the phylogenetic positions of rodents, primates, and carnivores remain unclear. Traditional morphology supports a primate–rodent clade,9 but molecular studies support either a primate–rodent clade10 or a primate–carnivore clade.11 Even a primate–cetartiodactyla clade is supported by mitochondrial DNA analysis.12 The phylogenetic tree of Brucella, a genus of host-specific bacteria, is not consistent with host species; for example, Brucella canis (dog host) is closer to Brucella suis (pig host) than to another dog host in the phylogenetic tree.13
Here, we propose an alternative approach for detecting accelerated molecular evolution between species at the individual gene level using relative criteria rather than phylogenetic trees. The method borrows the data structures and analysis approaches used for gene expression microarray data and identifies “differentially evolved genes” (devogs). It can be applied to anything from a single gene up to large comparative genomic data sets, with no prior biological assumptions, to find genes that may be responsible for the evolutionary differentiation between two clades of interest. Using various phylogenetic trees with real data from five mammalian species, we show that the existing branch method is significantly affected by phylogenetic tree topology. Using virtually evolved data based on five real mammals, we also compare devogs with the existing method in terms of positive predictive value (PPV; the proportion of predicted positives that are actually positive), sensitivity, and specificity. Additionally, we use devogs to identify genes accelerated in humans relative to four other mammals. Devogs has already been applied to an evolutionary study of avian lineages in the turkey genome project.14
Methods
Concept
Let ωij(k) be the pairwise ω ratio between species i and j for an orthologous gene k∈O, where i, j∈S, S is the species set, and O is the set of their orthologous genes. For example, four species a, b, c, and d generate six pairs: ωab(k), ωac(k), ωad(k), ωbc(k), ωbd(k), and ωcd(k). Focusing on species a, the pairs ωij(k) can be divided into two groups: ω*a(k) = {ωij(k) | i = a or j = a; i, j∈S}, the pairs that include a, and ω^a(k) = {ωij(k) | i ≠ a and j ≠ a; i, j∈S}, the pairs that do not (Fig. 1). If orthologous gene k is under accelerated evolution within species a, the mean ω ratio of ω*a(k) increases relative to that of ω^a(k). Testing whether the mean ω ratio of ω*a(k) is higher than that of ω^a(k) can therefore be used to detect accelerated evolution within species a.
Figure 1.
Schematic view of the two groups of ω ratios. a, b, c, and d represent species. ωij(k) represents the pairwise ω ratio between species i and j for orthologous gene k.
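The grouping described in the Concept section can be sketched in a few lines of Python. This is only an illustration: the function name `split_pairs` and the species labels are hypothetical and are not part of the devogs implementation.

```python
from itertools import combinations

def split_pairs(species, focal):
    """Split all unordered species pairs into those that include the focal
    species (the ω*a group) and those that do not (the ω^a group)."""
    with_focal, without_focal = [], []
    for pair in combinations(sorted(species), 2):
        if focal in pair:
            with_focal.append(pair)
        else:
            without_focal.append(pair)
    return with_focal, without_focal

# Four species as in the example above, focusing on species a.
with_a, without_a = split_pairs(["a", "b", "c", "d"], "a")
print(with_a)     # pairs contributing to ω*a(k)
print(without_a)  # pairs contributing to ω^a(k)
```

With five species, the same split gives four pairs that include the focal species and six that do not, so each per-gene comparison rests on ten pairwise ω ratios.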
Implementation
A t-test comparing ω*a(k) with ω^a(k) was used to identify differences between the two means. Because the t-test assumes normally distributed data, we transformed and normalized the ω ratios. First, a base-2 logarithm transformation was applied, since the ω ratio is analogous to the red/green (R/G) intensity ratio of two-channel expression microarrays. Quantile normalization Q: x → quantile-normalized x was then performed on {log2Ωij | i, j∈S}, where Ωij = {ωij(k) | k∈O} is the set of ω ratios of all orthologous genes for the species pair i and j, and log2Ωij = {log2ωij(k) | k∈O}. The rationale is that when the mean ω ratio of Ωab is higher than that of Ωcd (a, b, c, and d distinct; a, b, c, d∈S), assuming that a bias exists between Ωab and Ωcd is more reasonable than supposing that evolution between species a and b is faster than that between species c and d. Such bias among {Ωij | i, j∈S} affects the t-test. For example, if Ω*a is higher overall than Ω^a, where Ω*a = {ω*a(k) | k∈O} and Ω^a = {ω^a(k) | k∈O}, the mean of ω*a(k) is higher than that of ω^a(k) for many orthologous genes.
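A minimal sketch of the log2 transformation followed by quantile normalization is given here, assuming one column of ω ratios per species pair and one row per orthologous gene. The function name, the dictionary layout, and the ω values are hypothetical, and ties are broken by input order, which is a simplification.

```python
import math

def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (one column per species pair).

    Each value is replaced by the mean, across columns, of the values that
    share its rank. Ties are broken by input order (a simplification).
    """
    n = len(next(iter(columns.values())))
    sorted_cols = [sorted(col) for col in columns.values()]
    rank_means = [sum(col[r] for col in sorted_cols) / len(sorted_cols)
                  for r in range(n)]
    normalized = {}
    for name, col in columns.items():
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized[name] = out
    return normalized

# Hypothetical pairwise ω ratios for three genes and three species pairs.
omega = {
    "ab": [0.10, 0.40, 0.20],
    "ac": [0.20, 0.80, 0.40],
    "bc": [0.05, 0.20, 0.10],
}
log2_omega = {pair: [math.log2(w) for w in ws] for pair, ws in omega.items()}
q = quantile_normalize(log2_omega)
for pair, values in q.items():
    print(pair, [round(v, 3) for v in values])
```

After normalization every species pair has the same distribution of values, so a pair whose ω ratios are systematically higher no longer shifts the per-gene comparison, while the ranking of genes within each pair is preserved.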
Orthologous gene k was classified as an accelerated gene in species a when two conditions were met: the mean of Q(log2ω*a(k)) was higher than the mean of Q(log2ω^a(k)), and the P-value of the t-test of ω*a(k) against ω^a(k) was below 0.05, which was considered significant. We refer to this method as devogs, for the identification of differentially evolved genes.
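The two-condition rule can be sketched as follows. The text does not state which variant of the t-test was used, so this sketch assumes a pooled-variance Student's t-test with a two-sided P-value; the P-value is obtained by numerically integrating the t density so that only the standard library is needed. All function names and input values are hypothetical.

```python
import math
from statistics import mean, variance

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided P-value via Simpson integration of the density on [0, |t|]."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * area)

def is_accelerated(with_focal, without_focal, alpha=0.05):
    """Apply both conditions: higher mean in the focal group and P < alpha.

    Assumes a pooled-variance t-test and non-identical values in the groups.
    """
    n1, n2 = len(with_focal), len(without_focal)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(with_focal)
              + (n2 - 1) * variance(without_focal)) / df
    t = (mean(with_focal) - mean(without_focal)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    p = two_sided_p(t, df)
    return (t > 0 and p < alpha), t, p

# Hypothetical normalized log2 ω values for one gene and five species.
human_pairs = [-1.0, -0.8, -1.2, -0.9]              # pairs including human
other_pairs = [-2.1, -1.9, -2.3, -2.0, -2.2, -1.8]  # pairs excluding human
call, t, p = is_accelerated(human_pairs, other_pairs)
print(call, round(t, 2), p)
```

Checking the sign of the difference separately from the P-value keeps genes that are significantly slower in the focal species from being reported as accelerated.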
In addition, to correct for multiple hypothesis testing, the Benjamini and Hochberg false discovery rate can be applied.
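For reference, the Benjamini and Hochberg adjustment can be written compactly as below. This is a generic sketch of the standard procedure, not code taken from devogs, and the P-values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical per-gene P-values.
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
fdr = benjamini_hochberg(p_values)
significant = [i for i, value in enumerate(fdr) if value < 0.05]
print(fdr, significant)
```

In this example four genes have raw P-values below 0.05, but only the first two remain below 0.05 after adjustment.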
Validation
Branch model under various phylogenetic tree topologies
To examine how consistent the branch method is across tree structures, we ran it under various tree topologies obtained from real data. We prepared 10,891 orthologs of human, mouse, rat, dog, and opossum from ENSEMBL.15 A tree for each ortholog was generated using MrBayes v3.1.2 (number of generations: 2 × 10^5; burn-in: 2 × 10^3).16 Distinct tree topologies among all orthologs were identified using TOPD-fMtS v3.3 (split method).17 The branch model in codeml from PAML4.2 (F3X4) was then applied to all orthologs, with the tree topology changed to each of the distinct trees in turn.18
Creation of simulated test data and evaluation of devogs and the branch model
To compare the performance of devogs and the branch model in codeml in PAML4.2,18 we generated simulated data with evolved genes. We then used the two methods to evaluate the simulated data and measured their performance.
Generation of test data
To compare the two methods under realistic conditions, we constructed virtual data using real parameters, including tree topology; only ω, the variable being manipulated, was not taken from real data. To reduce the computational burden, a total of 30 orthologous gene sets were selected randomly from the genomes of human, mouse, rat, dog, and opossum obtained from ENSEMBL15 and used to choose parameter values, including codon usage, substitutions per codon, and sequence length. Test DNA sequences for the five species were generated using the evolver program in PAML4.2 (kappa = 2.0, tree length = 0) based on these actual parameters.18 The phylogenetic tree and branch lengths converted to species divergence times, as required by the evolver program, were obtained from a previous report (Fig. 2).19 In addition, the branch lengths were rescaled based on the average number of substitutions per codon between mice and humans. When generating the data, we assigned a higher ω ratio to one accelerated branch so that the human, mouse, rat, dog, and opossum branches could be compared. Differences in the ω ratio between the accelerated branch and the other branches ranged from 0 to 1.2 (in steps of 0.1).
Figure 2.

Branch length for generating virtual data.
Testing devogs and the branch model with virtual data
Orthologous sequences were aligned using ClustalW2,20 and the alignments were converted to codon alignments using pal2nal.21 Both devogs and the branch model were applied to the data, with each species set as the foreground once to test for acceleration. For devogs, we estimated pairwise ω values using codeml in PAML4.2 (runmode = −2 option),18 which uses the maximum-likelihood method. The only difference from the branch model was therefore that tree topology information was not used when estimating the pairwise ω between two species, so the comparison is valid and is not affected by bias in maximum-likelihood methods.
Devogs analysis of real data
We downloaded 1:1 orthologous protein and reference mRNA sequences of human, mouse, rat, dog, and opossum from ENSEMBL.15 The phylogenetic trees were obtained from a previous report.19 A total of 10,891 1:1 orthologous genes for the five species were collected, and the orthologous gene sets were aligned using ClustalW2.20 The devogs method was applied to identify accelerated genes in humans. Orthologs with dS > 3 or ω > 5 were filtered out.22,23 A total of 8,407 orthologs remained and were examined.
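The filtering step might look like the sketch below. The text does not say whether the cutoffs were applied to every pairwise estimate or to a summary value, so this sketch assumes that an ortholog is dropped when any species pair exceeds either cutoff; the gene names and values are hypothetical.

```python
def passes_filter(pairwise_estimates, max_ds=3.0, max_omega=5.0):
    """Return True if every pairwise (dS, omega) estimate is within the cutoffs."""
    return all(ds <= max_ds and omega <= max_omega
               for ds, omega in pairwise_estimates)

# Hypothetical orthologs, each with (dS, omega) for a few species pairs.
orthologs = {
    "geneA": [(0.5, 0.10), (0.6, 0.12), (1.2, 0.08)],
    "geneB": [(3.4, 0.20), (0.7, 0.15), (1.1, 0.11)],  # dS > 3 in one pair
    "geneC": [(0.4, 5.60), (0.5, 0.30), (0.9, 0.25)],  # omega > 5 in one pair
}
kept = [name for name, estimates in orthologs.items() if passes_filter(estimates)]
print(kept)
```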
Results
Branch model under various phylogenetic tree topologies
A total of 10,889 gene trees for the 10,891 orthologs of human, mouse, rat, dog, and opossum from ENSEMBL15 were generated using MrBayes.16 The 10,889 trees were grouped into 14 distinct topologies, which were arranged in order of the number of orthologous genes mapping to each and named A, B, C, … N. The branch model was used to detect genes showing accelerated evolution in each of the five lineages under each of these 14 trees.
Some genes showed evidence of accelerated evolution, but the proportion of genes predicted to be accelerated varied from 9.7% to 15.2% depending on the tree topology. Figure 3 shows the statistical results of the branch model for all orthologous gene sets on the 14 distinct gene trees. For the calculation of PPV, sensitivity, and specificity, the genes detected as accelerated in a lineage under their own gene tree were treated as the true accelerated genes. The PPV varied from 0.17 to 0.98 among gene trees, and its variation across tree topologies was high for mouse and rat. The sensitivities also varied widely, from 0.29 to 0.98, and the decrease in sensitivity on the minor trees was large for human. In contrast to PPV, the sensitivities for mouse and rat were typically higher than those for the other species. Specificities were high for all trees, ranging from 0.77 to 0.98. However, the specificities for mouse and rat on the G, H, I, M, and N trees were relatively low, which reduced the PPV for mouse and rat on those trees despite their high sensitivity.
Figure 3.
Statistical representations of performance for detecting accelerated genes from each foreground lineage using the branch model with real orthologous gene sets on 14 trees. Positive predictive value = TP/(TP + FP). Sensitivity = TP/(TP + FN). Specificity = TN/(TN + FP). TP: number of true-positives; FP: number of false-positives; TN: number of true-negatives; FN: number of false-negatives.
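The three measures defined in the caption of Figure 3 can be computed from sets of calls as in this sketch. The function name and the toy sets are hypothetical; in the analysis above, each element would correspond to one gene tested in one foreground lineage.

```python
def performance(predicted, truth, universe):
    """Compute PPV, sensitivity, and specificity from sets of calls.

    Assumes at least one predicted positive, one true positive case,
    and one true negative case, so that no denominator is zero.
    """
    tp = len(predicted & truth)
    fp = len(predicted - truth)
    fn = len(truth - predicted)
    tn = len(universe - predicted - truth)
    return {
        "ppv": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Toy example: ten tests, four truly accelerated, four predicted.
universe = set(range(10))
truth = {0, 1, 2, 3}
predicted = {0, 1, 2, 7}
scores = performance(predicted, truth, universe)
print(scores)
```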
Comparison of the performance of devogs and the branch model for simulated data
We compared the devogs approach with a branch model method using virtual data for human, mouse, rat, dog, and opossum evolved under varying ω ratios. To compare the two methods under realistic conditions, the parameters, including tree topology but excluding ω, were obtained from real data in ENSEMBL.15 A total of 10,965 orthologous sets were generated after filtering, and devogs and the branch model were each used to detect accelerated evolution in one species at a time (five tests per set), giving a total of 54,825 tests per method. Additionally, we included an altered devogs approach in which normalization and the t-test were replaced with the non-parametric Mann-Whitney U-test.
Of the 54,825 tests, the devogs method predicted that a total of 5,514 genes (10.0%) showed evidence of accelerated evolution, of which 4,630 (84.0%) were true-positives. Devogs with the Mann-Whitney U-test predicted that a total of 5,690 genes (10.4%) showed evidence of accelerated evolution, of which 4,909 (86.3%) were true-positives. The branch model predicted that a total of 7,781 genes (14.2%) showed evidence of accelerated evolution, of which 6,419 (82.5%) were true-positives.
Figure 4 shows the Venn diagram of positives and true-positives from the two types of devogs and the branch model. Many genes overlapped among the models: in every model, more genes overlapped with another model than did not. The branch model contained the largest number of non-overlapped genes (2,609 genes), and devogs with the Mann-Whitney U-test the smallest (234 genes). The ratio of true-positives was higher for overlapped genes than for non-overlapped genes and was highest (95.4%) at the intersection of the three models. As shown in Table S1, the total PPV, sensitivity, and specificity of the methods were similar. The PPV and specificity of devogs with the t-test were slightly higher than those of the branch model, whereas the sensitivity of the branch model was higher than that of devogs. The sensitivity of devogs with the Mann-Whitney U-test was higher than that of devogs with the t-test.
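The Mann-Whitney variant can be illustrated with an exact test, which is feasible here because each gene contributes only ten pairwise ω ratios. The sketch assumes a one-sided alternative (pairs including the focal species tend to have higher ω), which is not stated in the text; the function names and ω values are hypothetical.

```python
from itertools import combinations

def midranks(values):
    """Assign 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_exact(x, y):
    """U statistic for x and exact one-sided P-value for 'x tends to be larger'."""
    n1 = len(x)
    ranks = midranks(list(x) + list(y))
    observed = sum(ranks[:n1])
    u = observed - n1 * (n1 + 1) / 2
    # Enumerate every way of assigning n1 of the pooled ranks to group x.
    combos = list(combinations(range(len(ranks)), n1))
    hits = sum(1 for c in combos if sum(ranks[i] for i in c) >= observed - 1e-9)
    return u, hits / len(combos)

# Hypothetical raw ω ratios for one gene and five species.
human_pairs = [0.52, 0.61, 0.48, 0.55]              # pairs including human
other_pairs = [0.20, 0.25, 0.18, 0.22, 0.30, 0.27]  # pairs excluding human
u, p = mann_whitney_exact(human_pairs, other_pairs)
print(u, p)
```

With four pairs against six and no ties, the smallest one-sided P-value this exact test can return is 1/210, which is reached when all four focal pairs outrank the other six.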
Figure 4.
Venn diagram of positives and true-positives from the two types of devogs and the branch model. Devogs T: devogs with t-test. Devogs MW: devogs with Mann-Whitney U-test.
Next, we measured the performance of the methods as the difference in ω ratio between the accelerated branch and the other branches varied from 0 to 1.2 (in steps of 0.1). Figure 5 shows the statistical measures of performance as a function of the difference between foreground and background ω ratios. The “longs” group consists of human, dog, and opossum, which are separated by long branches, as shown in Figure 2. The “shorts” group consists of mouse and rat, which are separated by a short branch. We divided the species into short- and long-branch groups because the results for the foreground ω tended to fall into two main patterns according to branch length (Fig. S1).
Figure 5.
Statistical assessments of the performance of two types of devogs and the branch model based on differences in the ω ratio between foreground and background lineages from virtually accelerated genes through simulation. The longs are the human, dog, and opossum groups, while the shorts are the mouse and rat groups. Devogs T: devogs with t-test. Devogs MW: devogs with Mann-Whitney U-test.
The PPVs of all methods (excluding the short group in the branch model) were high and increased rapidly with increasing differences in ω ratio between the accelerated branch and the other branches (Fig. 5). The PPV of the branch model in the short group peaked at 0.73 at a difference in ω ratio of 0.2 and then decreased to 0.54. The overall sensitivity increased gradually with increasing differences in the ω ratio (excluding the short group in devogs). The slope of the curve for the short group was lower than that for the long group. The sensitivity in the short group of devogs with the t-test remained very low (<0.10) across all differences in ω ratio, whereas the sensitivity in the short group of devogs with the Mann-Whitney U-test increased gradually to 0.70. Specificities were very high even at a difference in ω ratio of 0 and increased linearly to 1.0, except for the short group in the branch model, whose specificity decreased from 0.97 to 0.79 with increasing differences in the ω ratio.
Devogs analysis of real data
We applied devogs to identify genes with accelerated evolution in humans compared with other mammals, using 1:1 orthologous sequences of human, mouse, rat, dog, and opossum from ENSEMBL.15 Of the 8,407 orthologous genes with dS ≤ 3, a total of 50 showed accelerated evolution in humans compared with the other species at FDR < 0.05 (Table S2). Differences in the ω ratio between the two groups (pairs including human and pairs excluding human) ranged from 0.005 to 0.603, although the ω ratios of the group pairing with humans were <1. The difference in the ω ratio of the NAA30 gene was 0.443. Differences in the ω ratio of ETV6, LEPROTL1, TM4SF19, SMEK1, and HTRA4 were all >0.20. Gene enrichment analysis of GO terms was performed using the DAVID functional annotation tool24 with thresholds (count 2, EASE 0.1). Table 1 shows enriched GO terms of biological process, cellular component, and molecular function. Terms related to RNA polymerase II were also significantly enriched.
Table 1.
Functional annotation of enriched GO terms for human accelerated genes detected using devogs.
| Category | Term | P-value* |
|---|---|---|
| Biological process | ER-associated protein catabolic process | 0.062 |
| Cellular component | DNA-directed RNA polymerase II, core complex | 0.043 |
| | Nuclear DNA-directed RNA polymerase complex | 0.075 |
| | DNA-directed RNA polymerase complex | 0.075 |
| | RNA polymerase complex | 0.078 |
| Molecular function | Protein kinase activity | 0.003 |
| | N-acetyltransferase activity | 0.012 |
| | Acetyltransferase activity | 0.016 |
| | N-acyltransferase activity | 0.017 |
| | Insulin-like growth factor binding | 0.058 |
| | Protein heterodimerization activity | 0.088 |
| | RNA polymerase activity | 0.098 |
| | DNA-directed RNA polymerase activity | 0.098 |

Note: *EASE score.
We also applied the branch model to the same real data. A total of 553 genes were predicted to show accelerated evolution, and 39 of the 50 genes identified by devogs were also identified by the branch model (Table S3). The agreement between devogs and the branch model increased as the P-values of the branch model results decreased (Fig. S3).
Discussion
Performance of the branch model and devogs approaches under various phylogenetic tree topologies
The accuracy of the results obtained using the branch model was affected by the phylogenetic tree topology, and use of an incorrect tree topology reduced overall accuracy. A PPV of 0.17, as shown in Figure 3, indicates that 83% of the genes predicted as accelerated by the branch model under that topology were not predicted under the gene's own tree, although such very low PPVs were observed only for specific orthologous genes in our experiment. In contrast, devogs does not require a phylogenetic tree topology at any point in the analysis and is therefore not affected by it.
Comparison of the performance of devogs and branch model approaches
Both devogs and the branch model showed acceptable overall performance, as shown in Figures 4 and 5. However, both performed poorly at detecting accelerated genes in mouse and rat. This may be because the mouse and rat branches are short relative to the other lineages (Fig. 2), so the numbers of synonymous and non-synonymous substitutions between mouse and rat genes were too small to produce statistically significant differences in ω ratios. The two methods nevertheless responded very differently under these low-performance conditions: sensitivity was markedly reduced in devogs, whereas PPV was reduced in the branch model. Because the only difference in input between the branch model and devogs is tree topology information, this difference in response may reflect the advantages and disadvantages of using topology information, particularly under low-performance conditions. The topology information supplied to the branch model may increase sensitivity, but may also increase the false-positive rate, thus decreasing specificity and PPV. In a real analysis, the true-positive and true-negative status as determined by a gold standard is unknown, and only the positive and negative test outcomes are available. False-positives are therefore problematic when identifying truly accelerated genes, even though the sensitivity of the model has been shown to be high in many studies. PPV is the relevant measure in this setting, since it is defined as the number of true-positives divided by the number of positive test outcomes. An advantage of devogs is that its number of false-positives is low, despite its reduced sensitivity under low-performance conditions.
The t-test used in the devogs analysis has several underlying assumptions. The first is that the data follow a normal distribution under the null hypothesis. The distribution of ω is positively skewed, because ω is the ratio of the non-synonymous rate to the synonymous rate and genes are typically under purifying selection. To fulfill this assumption, we reduced the skewness by log2 transformation to approximate a normal distribution (Fig. S2), although the t-test is sufficiently robust to moderate violations of the normality assumption.25 Another assumption is that the data used in the t-test are sampled independently from the two populations being compared. Each pairwise ω ratio is calculated on the branches connecting the two species. However, ω*a(k) and ω^a(k), which are compared by the t-test, share some branches, as shown in Figure 1. This violates the independence assumption of the t-test and may reduce the sensitivity of devogs. If rapid evolution occurred on a shared branch, the means of both ω-ratio groups would increase; conversely, if evolution were suppressed, both means would decrease. Either effect would reduce the difference in means between the two groups, which may reduce the discriminating power of the t-test and lower the sensitivity of devogs. Alternatively, the Mann-Whitney U-test (a non-parametric test) can be applied. We generated receiver operating characteristic and PPV curves using the P-value (Figs. S4 and S5).
Devogs analysis of real data
A total of 50 genes showed accelerated evolution in humans compared with the other four species when real data were analyzed with devogs. Terms related to RNA polymerase II were significantly enriched in the gene enrichment analysis of this set of 50 genes. Developmentally complex organisms do not appear to be distinguished by the total number of genes they encode, but rather by the number of ways these genes can be expressed and controlled. The surprisingly small number of genes found in the human genome26 illustrates the importance of evolutionary advances in the control of gene expression. RNA polymerase II in animals has a very important role in mediating alternative splicing of exon junctions to produce different tissue-specific or developmentally specific products from the same gene. The C-terminal domain of RNA polymerase II binds the mediator that transduces control signals to the polymerase II promoter complex and recruits serine-arginine proteins and other splicing factors to the elongating message.27 Rapid evolution of genes related to the RNA polymerase promoter or RNA elongation may be related to the observation that approximately 40% of genes in the human genome are subject to such alternative splicing, resulting in a more than three-fold increase in the complexity of gene products over gene content.28 The trends for devogs and the branch model on real data were similar to those in our virtual data analysis (Fig. 4). The branch model predicted many accelerated genes, and non-overlapped genes were primarily those identified by the branch model. Judging from our virtual data analysis (Fig. 4), the ratio of true-positives may be higher among the overlapped genes. Additionally, most genes identified by devogs (39 of 50) were also identified by the branch model, and the concordance between devogs and the branch model increased as the P-value of the branch model decreased.
Devogs can be used to complement the branch model method. It makes no assumptions regarding phylogenetic relationships and yields reliable results, albeit with low sensitivity, under marginal conditions such as short evolutionary distances between lineages. The branch model is accurate with well-defined phylogenetic tree structures and longer distances between lineages. However, the devogs method currently corresponds only to the branch model. Further studies are therefore required to optimize devogs and to extend it to detect positive selection at the site level in particular lineages, corresponding to the branch-site model.29,30
Supplementary Data
Supporting information
Footnotes
Author Contributions
Analyzed the data: K-WK, HK. Wrote the first draft of the manuscript: K-WK. Contributed to the writing of the manuscript: DWB, SC. Agree with manuscript results and conclusions: K-WK, HK, DWB, SC. Made critical revisions and approved final version: K-WK, SC, DWB. All authors reviewed and approved of the final manuscript.
Competing Interests
Author(s) disclose no potential conflicts of interest.
Disclosures and Ethics
As a requirement of publication the authors have provided signed confirmation of their compliance with ethical and legal obligations including but not limited to compliance with ICMJE authorship and competing interests guidelines, that the article is neither under consideration for publication nor published elsewhere, of their compliance with legal and ethical guidelines concerning human and animal research participants (if applicable), and that permission has been obtained for reproduction of any copyrighted material. This article was subject to blind, independent, expert peer review. The reviewers reported no competing interests.
Funding
This work was supported by a grant (PJ009019, PJ009032) from Next-Generation BioGreen 21 Program, Rural Development Administration, Republic of Korea.
References
1. Miyata T, Yasunaga T. Molecular evolution of mRNA: a method for estimating evolutionary rates of synonymous and amino acid substitutions from homologous nucleotide sequences and its application. J Mol Evol. 1980;16(1):23–36. doi: 10.1007/BF01732067.
2. Messier W, Stewart CB. Episodic adaptive evolution of primate lysozymes. Nature. 1997;385(6612):151–4. doi: 10.1038/385151a0.
3. Yang Z. Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol Biol Evol. 1998;15(5):568–73. doi: 10.1093/oxfordjournals.molbev.a025957.
4. Fitch WM, Bush RM, Bender CA, Cox NJ. Long term trends in the evolution of H(3) HA1 human influenza type A. Proc Natl Acad Sci U S A. 1997;94(15):7712–8. doi: 10.1073/pnas.94.15.7712.
5. Suzuki Y, Gojobori T. A method for detecting positive selection at single amino acid sites. Mol Biol Evol. 1999;16(10):1315–28. doi: 10.1093/oxfordjournals.molbev.a026042.
6. Nielsen R, Yang Z. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics. 1998;148(3):929–36. doi: 10.1093/genetics/148.3.929.
7. Yang Z, Nielsen R, Goldman N, Pedersen AM. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics. 2000;155(1):431–49. doi: 10.1093/genetics/155.1.431.
8. Kosiol C, Vinar T, da Fonseca R, et al. Patterns of positive selection in six mammalian genomes. PLoS Genetics. 2008;4(8):e1000144. doi: 10.1371/journal.pgen.1000144.
9. Shoshani J, McKenna MC. Higher taxonomic relationships among extant mammals based on morphology, with selected comparisons of results from molecular data. Mol Phylogenet Evol. 1998;9(3):572–84. doi: 10.1006/mpev.1998.0520.
10. Murphy WJ, Eizirik E, Johnson WE, Zhang YP, Ryder OA, O’Brien SJ. Molecular phylogenetics and the origins of placental mammals. Nature. 2001;409(6820):614–8. doi: 10.1038/35054550.
11. Graur D. Towards a molecular resolution of the ordinal phylogeny of the eutherian mammals. FEBS Lett. 1993;325(1–2):152–9. doi: 10.1016/0014-5793(93)81432-y.
12. Arnason U, Gullberg A, Janke A, Xu X. Pattern and timing of evolutionary divergences among hominoids based on analyses of complete mtDNAs. J Mol Evol. 1996;43(6):650–61. doi: 10.1007/BF02202113.
13. Wattam AR, Williams KP, Snyder EE, et al. Analysis of ten Brucella genomes reveals evidence for horizontal gene transfer despite a preferred intracellular lifestyle. J Bacteriol. 2009;191(11):3569. doi: 10.1128/JB.01767-08.
14. Dalloul RA, Long JA, Zimin AV, et al. Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis. PLoS Biol. 2010;8(9):e1000475. doi: 10.1371/journal.pbio.1000475.
15. Hubbard T, Barker D, Birney E, et al. The Ensembl genome database project. Nucleic Acids Res. 2002;30(1):38–41. doi: 10.1093/nar/30.1.38.
16. Ronquist F, Huelsenbeck J. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003;19(12):1572–4. doi: 10.1093/bioinformatics/btg180.
17. Puigbò P, Garcia-Vallvé S, McInerney J. TOPD/FMTS: a new software to compare phylogenetic trees. Bioinformatics. 2007;23(12):1556–8. doi: 10.1093/bioinformatics/btm135.
18. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24(8):1586–91. doi: 10.1093/molbev/msm088.
19. Thomas JV. Comparative Vertebrate Genomics. Comparative Genomics: Basic and Applied Research. 2007:105.
20. Larkin M, Blackshields G, Brown N, et al. Clustal W and Clustal X version 2.0. Bioinformatics. 2007;23(21):2947–8. doi: 10.1093/bioinformatics/btm404.
21. Suyama M, Torrents D, Bork P. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 2006;34(Web Server issue):W609–12. doi: 10.1093/nar/gkl315.
22. Castillo-Davis C, Kondrashov F, Hartl D, Kulathinal R. The functional genomic distribution of protein divergence in two animal phyla: coevolution, genomic conflict, and constraint. Genome Res. 2004;14(5):802–11. doi: 10.1101/gr.2195604.
23. Peacock CS, Seeger K, Harris D, et al. Comparative genomic analysis of three Leishmania species that cause diverse human disease. Nat Genet. 2007;39(7):839–47. doi: 10.1038/ng2053.
24. da Huang W, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2008;4(1):44–57. doi: 10.1038/nprot.2008.211.
25. Sawilowsky SS, Blair RC. A more realistic look at the robustness and Type II error properties of the t test to departures from population normality. Psychological Bulletin. 1992;111(2):352–60.
26. Venter JC, Adams MD, Myers EW, et al. The sequence of the human genome. Science. 2001;291(5507):1304–51. doi: 10.1126/science.1058040.
27. Stiller JW, Hall BD. Evolution of the RNA polymerase II C-terminal domain. Proc Natl Acad Sci U S A. 2002;99(9):6091–6. doi: 10.1073/pnas.082646199.
28. Brett D, Hanke J, Lehmann G, et al. EST comparison indicates 38% of human mRNAs contain possible alternative splice forms. FEBS Lett. 2000;474(1):83–6. doi: 10.1016/s0014-5793(00)01581-7.
29. Zhang J, Nielsen R, Yang Z. Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol Biol Evol. 2005 Dec;22(12):2472–9. doi: 10.1093/molbev/msi237.
30. Yang Z, dos Reis M. Statistical properties of the branch-site test of positive selection. Mol Biol Evol. 2011;28(3):1217–28. doi: 10.1093/molbev/msq303.