Abstract
The reconstruction of phylogenies is becoming an increasingly simple activity. This is mainly due to two reasons: the democratization of computing power and the increased availability of sophisticated yet user-friendly software. This review describes some of the latest additions to the phylogenetic toolbox, along with some of their theoretical and practical limitations. It is shown that Bayesian methods are under heavy development, as they offer the possibility to solve a number of long-standing issues and to integrate several steps of the phylogenetic analyses into a single framework. Specific topics include not only phylogenetic reconstruction, but also the comparison of phylogenies, the detection of adaptive evolution, and the estimation of divergence times between species.
1. INTRODUCTION
Human cultures have always been fascinated by their origins as a means to define their position in the world, and to justify their hegemony over the rest of the living world. However, scientific (testable) predictions about our origins had to wait for Darwin [1] and his intellectual descendants first to classify [2] and then to reconstruct the natural history of replicating entities, and thereby to kick-start the field of phylogenetics [3, 4]. Rooted in the comparison of morphological characters, phylogenies have for the past four decades focused on the relationships between molecular sequences (e.g., [4]), potentially helped by incorporating morphological information [5], in order to infer ancestor-to-descendant relationships between sequences, populations, or species.
Today, molecular phylogenies are routinely used to infer gene or genome duplication events [6], recombination [7], horizontal gene transfer [8], variation of selective pressures and adaptive evolution [9], divergence times between species [10], the origin of the genetic code [11], the origin of epidemics [12], and host-parasite cospeciation events [13, 14]. As complementary tools for taxonomy (DNA barcoding: [15]), they have also contributed to the formulation of strategies in conservation biology [16]. In addition to untangling the ancestral relationships relating a group of taxa or a set of molecular sequences, phylogenies have also been used for some time outside the realm of biological sciences, for instance in linguistics [17, 18] or in forensics [19, 20].
Most of these applications are beyond the scope of plant genomics, but they all suggest that sophisticated phylogenetic methods are required to address most of today's biological questions. While parsimony-based methods are both intuitive and extremely informative, for instance to disentangle genome rearrangements [21], they also have limitations because they minimize the amount of change [22]. These limitations become particularly apparent when analyzing distantly related taxa. A means to overcome, at least partly, some of these difficulties is to adopt a model-based approach, be it in a maximum likelihood or in a Bayesian framework. These two frameworks are extremely similar in that they both rely on probabilistic models. Bayesian approaches offer a variety of benefits when compared to traditional maximum likelihood, such as computing speed (although this is not necessarily true, especially under complex models), sophistication of the model, and an appropriate treatment of uncertainty, in particular uncertainty about nuisance parameters. As a result, Bayesian approaches often make it possible to address more sophisticated biological questions [23], which usually comes at the expense of longer computing times and higher memory requirements than when using simpler models.
Because it is not possible or even appropriate to discuss all the latest developments in a given field of study, this review focuses on a very limited number of key phylogenetic topics. Notable omissions include recent developments in phylogenetic hidden Markov models [24] and applications that map ancestral states on phylogenies [25]. We focus instead on the very first steps involved in most phylogenetic analyses, ranging from reconstructing a tree to estimating selective pressures or species divergence times. For each of these steps, some of the most recent theoretical developments are discussed, and pointers to relevant software are provided.
2. RECONSTRUCTING PHYLOGENETIC TREES
2.1. Sequence alignment
The first step in reconstructing a phylogenetic tree from molecular data is to obtain a multiple sequence alignment (MSA) where sequence data are arranged in a matrix that specifies which residues are homologous [26]. A large number of methods and programs exist [27] and most have been evaluated against alignment databases [28], so that it is possible to provide some general guidelines.
The easiest sequences to align are probably those of protein-coding genes: proteins diverge more slowly than DNA sequences and, as a result, proteins are easier to align. The rule of thumb is therefore first to translate DNA to amino acid sequences, then to perform the alignment at the protein level, and finally to back-translate to obtain the DNA alignment. This procedure avoids inserting gaps in the final DNA alignment whose lengths are not multiples of three and that would disrupt the reading frame. Translation to amino acid sequences can be done directly when downloading sequences, for instance from the National Center for Biotechnology Information (NCBI: www.ncbi.nlm.nih.gov). A number of programs also allow users to perform this translation locally on their computers from an appropriate translation table (e.g., DAMBE [29], MEGA [30, 31]; see Table 1). The second step is to perform the alignment at the protein level. Again, a number of programs exist, but ProbCons [32] appears to be the most accurate single method [33]. An alternative to using one single alignment method is to use consensus or meta-methods, that is, to combine several methods [27]. Meta-methods such as M-Coffee can return better MSAs almost twice as often as ProbCons [34]. Finally, when the alignment is obtained at the protein level, back-translation to the DNA sequences can be performed either by using a program such as DAMBE or CodonAlign [35], or by using a dedicated server such as protal2dna (http://bioweb.pasteur.fr/seqanal/interfaces/protal2dna.html) or Pal2Nal (coot.embl.de/pal2nal).
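The back-translation step can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure used by DAMBE, CodonAlign, protal2dna, or Pal2Nal; the function name `back_translate` and the toy sequences are assumptions made for the example, and the sketch does not check that each codon actually encodes the aligned residue.

```python
def back_translate(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Thread an unaligned coding sequence onto its aligned protein sequence.

    Each residue consumes the next codon of `cds`; each protein gap becomes
    a three-character gap, so the reading frame is preserved.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is too short for this protein")
    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == gap:
            codons.append(gap * 3)
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


# Hypothetical toy data: a protein alignment and the matching unaligned CDSs.
protein_alignment = {"seq1": "MK-L", "seq2": "M-QL"}
coding_sequences = {"seq1": "ATGAAACTG", "seq2": "ATGCAGCTG"}
dna_alignment = {
    name: back_translate(protein_alignment[name], coding_sequences[name])
    for name in protein_alignment
}
```

All gaps in `dna_alignment` have lengths that are multiples of three, which is the property the protein-guided procedure is meant to guarantee.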
Table 1.
Name | Method | Platform | GUI | Inference | Reference |
---|---|---|---|---|---|
BAMBE | Bayes | DOS, MacOS, Unix | No | Tree | [36] |
BayesPhylogenies | Bayes | DOS, MacOS, Unix | No | Tree | [37] |
BAli-Phy | Bayes | DOS, MacOS, Unix | No | Simultaneous alignment and tree | [38] |
BEAST | Bayes | Windows, MacOS, Unix | Yes | Tree, times | [39] |
CONSEL | ML | DOS, MacOS, Unix | No | Tree comparison | [40] |
DAMBE | Distances, parsimony, ML | Windows | Yes | Tree | [29] |
GARLI | ML (Genetic Algorithm) | Windows, MacOS, Unix | Yes | Tree | [41] |
HyPhy | ML | Windows, MacOS, Unix | Yes | Tree, selection, recombination, tree comparison | [42] |
MEGA | Distances, parsimony | Windows | Yes | Tree, times | [30, 31] |
MrBayes | Bayes | DOS, MacOS, Unix | No | Tree, selection | [43, 44] |
Multidivtime | Bayes | DOS, MacOS, Unix | No | Times | [45–47] |
OmegaMap | Bayes | DOS, MacOS, Unix | No | Simultaneous selection and recombination | [48] |
PAML | ML | DOS, MacOS, Unix | No | Tree, tree comparison, times, selection | [49, 50] |
PAUP∗ | Distances, parsimony, ML | DOS, MacOS, Unix | No | Tree | [51] |
PhyloBayes | Bayes | DOS, MacOS, Unix | No | Tree, tree comparison | [52] |
PHYML | ML | DOS, MacOS, Unix | No | Tree | [53] |
RAxML | ML | DOS, MacOS, Unix | No | Tree | [54] |
r8s | PL | DOS, MacOS, Unix | No | Times | [55] |
The alignment of rRNA genes under the constraint of secondary structure is now frequently used in practical research in molecular evolution and phylogenetics [56–60]. The procedure is first to obtain a reliable secondary structure and then to use this structure to guide the sequence alignment. This has not been automated so far, although both Clustal [61, 62] and DAMBE have some functions to alleviate the difficulties.
What to do with other noncoding genes is still an open question, especially when it comes to aligning a large number (>100) of long (>20,000 residues) and divergent sequences (<25% identity). Some authors have attempted to provide rough guidelines to choose the most accurate program depending on these parameters [28]. However, accuracy figures are typically estimated over a large number of test alignments and may not reflect the accuracy that is expected for any particular alignment [28]. More crucially, most of the alignment programs were developed and benchmarked on protein data, so that their accuracy is generally unknown for noncoding sequences [28]. A very general recommendation is then to use different methods [63] and meta-methods. Those parts of the alignment that are similar across the different methods are probably reliable. The parts that differ extensively are often simply eliminated from the alignment when no external information can be used to decide which positions are homologous. Poorly aligned regions can cause serious problems as, for instance, when analyzing rRNA sequences in which conserved and variable domains have different nucleotide frequencies [60]. A simple test of the reliability of an alignment consists in reversing the orientation of the original sequences and performing the alignment again; because of the symmetry of the problem, reliable MSAs are expected to be identical whichever direction is used to align the sequences [64]. These authors further show that the reliability of MSAs decreases with sequence divergence, and that the chance of reconstructing different phylogenies increases with sequence divergence. More sophisticated methods also permit the direct measure of the accuracy of an alignment or the estimation of a distance between two alignments [65]. Applications of Bayesian inference strictly to pairwise [66] and multiple [67, 68] sequence alignment are still in their infancy.
Whichever method is used to obtain an MSA, a final visual inspection is required, and manual editing is often needed. To this end, a number of editors can be used such as JalView [69].
Because an MSA represents a hypothesis about sitewise homology at all the positions, obtaining an accurate MSA presents some circularity: an accurate MSA often necessitates an accurate guide tree, which in turn demands an accurate alignment. The early realization of this “chicken-egg” conundrum led to the idea that both the MSA and the phylogeny should be estimated simultaneously [70]. Although this initial algorithm was parsimony-based, it was already too complex to analyze more than a half-dozen sequences of 100 sites or more. Subsequent parsimony-based algorithms allowed the analysis of larger data sets [71] but still showed some limitations when sequence divergence increased. More recently, a Bayesian procedure was described and implemented in a program, BAli-Phy, where uncertainties with respect to the alignment, the tree, and the parameters of the substitution model are all taken into account [38] (see also [72]). Uncertain alignments are a potential problem in large-scale genomic studies [73] or in whole-genome alignments [74]. In these contexts, disregarding alignment uncertainty can lead to systematic biases when estimating gene trees or inferring adaptive evolution [73, 74]. However, these complex Bayesian models [38, 72, 73] still require nonnegligible computing time and resources, and to date, their performance in terms of accuracy is still unclear.
2.2. Selection of the substitution model
Once a reliable MSA is obtained, the next step in comparing molecular sequences is to choose a metric to quantify divergence. The most intuitive measure of divergence is simply to count the proportion of differences between two aligned sequences (e.g., [75]). This simple measure is known as the p distance. However, because the size of the state space is finite (four letters for DNA, 20 for amino acids, and 61 for sense codons), multiple changes at a position in the alignment will not be observable, and the p distance will underestimate evolutionary distances even for moderately divergent sequences. This phenomenon is generally referred to as saturation. Corrections were devised early to help compensate for saturation. Some of the most famous named nucleotide substitution models are the Jukes-Cantor model or JC [76], the Kimura two-parameter model or K80 [77], the Hasegawa-Kishino-Yano model or HKY85 [78], the Tamura-Nei model or TN93 [79], and the general time-reversible model or GTR [80] (also called REV). Because substitution rates vary along sequences, two components can be added to these substitution models: a “+I” component that models invariable sites [78] and a “+Γ” component that models among-site rate variation either as a continuous [81] or as a discrete [82] mean-one Γ distribution, the latter being more computationally efficient. Amino acid models can also incorporate a “+F” component so that replacement rates are proportional to the frequencies of both the replaced and resulting residues [83].
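To make the effect of saturation concrete, the following Python sketch computes the p distance between two aligned nucleotide sequences and applies the standard Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3). The function names and the choice to skip gapped positions are assumptions made for this example.

```python
import math


def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap are ignored (an assumption
    of this sketch; programs differ in how they treat gaps).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(p: float) -> float:
    """Jukes-Cantor corrected distance (expected substitutions per site)."""
    if p >= 0.75:
        # The correction is undefined at or beyond full saturation.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


p = p_distance("ACGTACGT", "ACGTACGA")
d = jc69_distance(p)
```

The corrected distance `d` is always at least as large as `p`, and the gap between the two widens as `p` approaches 0.75, the value expected for two unrelated sequences with equal base frequencies.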
Given the variety of substitution models, the first step of any model-based phylogenetic analysis is to select the most appropriate model [84, 85]. The rationale for doing so is to balance bias and variance: a highly parameterized model will describe or fit the data much better than a model that contains a smaller number of parameters; in turn however, each parameter of the highly parameterized model will be estimated with lower accuracy for a given amount of data (e.g., [86]). Besides, both empirical and simulation studies show that the choice of a wrong substitution model can lead not only to less accurate phylogenetic estimation, but also to inconsistent results [87]. The objective of model selection is therefore not to select the “best-fitting” model, as this one will always be the model with the largest number of parameters, but rather to select the most appropriate model that will achieve the optimal tradeoff between bias and variance. The approach followed by all model selection procedures is therefore to penalize the likelihood of the parameter-rich model for the additional parameters. Because most of the nucleotide substitution models are nested (all can be seen as a special case of GTR +Γ+I), the standard approach to model selection is to perform hierarchical likelihood ratio tests or hLRTs [88]. Note that in all rigor, likelihood ratio tests can also be performed on nonnested models; however, the asymptotic distribution of the test statistic (twice the difference in log-likelihoods) under the null hypothesis (the two models perform equally well) is complicated [89] and quite often impractical. When models are nested, the asymptotic distribution of the test statistic under the null hypothesis is simply a χ² distribution whose number of degrees of freedom is the number of additional parameters entering the more complex model (see [90] or [91] for applicability conditions).
With the hLRT, all models are compared in a pairwise manner, by traversing a choice-tree of possible nested models. A number of popular programs allow users to compare pairs of models manually (e.g., PAUP [51], PAML [49, 50]). Readily written scripts that select the most appropriate model among a list of named models also exist, such as ModelTest [92] (which requires PAUP), the R package APE [93], or DAMBE. Free web servers are also available; they are either directly based on ModelTest [94] or implement similar ideas (e.g., FindModel, available at hcv.lanl.gov/content/hcv-db/findmodel/findmodel.html). A similar implementation, ProtTest, exists for protein data [95].
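A single pairwise comparison of nested models, as performed at each step of the hLRT, can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code of ModelTest or PAUP; the function names and the log-likelihood values are hypothetical, and the χ² approximation is subject to the applicability conditions mentioned above.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper tail probability of a chi-square distribution with integer df.

    Uses the closed forms for df = 1 and df = 2 and the recurrence
    Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) exp(-x/2) / Gamma(k/2 + 1).
    """
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    half = x / 2.0
    if df % 2 == 1:
        k, q = 1, math.erfc(math.sqrt(half))
    else:
        k, q = 2, math.exp(-half)
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, extra_params: int):
    """Return (test statistic, P-value) for two nested models."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, extra_params)


# Hypothetical log-likelihoods: a simpler model versus the same model
# with one additional free parameter.
stat, p_value = likelihood_ratio_test(-2350.4, -2341.1, extra_params=1)
```

If `p_value` falls below the chosen significance level, the more complex model is retained and the traversal of the choice-tree continues from there.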
However, performing systematic hLRTs is not the optimal strategy for model selection in phylogenetics [96]. This is because the model that is finally selected can depend on the order in which the pairwise comparisons are performed [97]. The Akaike information criterion (AIC) or its variant developed in the context of regression and time-series analysis in small data sets (AICc, [98]) is commonly used in phylogenetics (e.g., [96]). One advantage of AIC is that it allows nonnested models to be compared, and it is easily implemented. However, in large data sets, both the hLRT and the AIC tend to favor parameter-rich models [99]. A slightly different approach was proposed to overcome this selection bias, the Bayesian information criterion (BIC: [99]), which penalizes parameter-rich models more strongly. All these model selection approaches (AIC, AICc, and BIC) are available in ModelTest and ProtTest. Other procedures exist such as the Decision-Theoretic or DT approach [100]. Although AIC, BIC, and DT are generally based on sound principles, they can in practice select different substitution models [101]. The reason for this discrepancy is not entirely clear, but it is likely due to the data having low information content. One prediction is that, when these model selection procedures end up with different conclusions, all the selected models will return phylogenies that are not significantly different. It is also possible that applying these different criteria outside of the theoretical context in which they were developed might lead to unexpected behaviors [102]. For instance, AICc was derived under Gaussian assumptions for linear fixed-effect models [98], and other bias correction terms exist under different assumptions [86].
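The three information criteria can be written down directly from their standard definitions: AIC = 2k - 2 ln L, AICc = AIC + 2k(k + 1)/(n - k - 1), and BIC = k ln n - 2 ln L, where k is the number of free parameters and n the sample size. The sketch below is not taken from ModelTest or ProtTest; the log-likelihoods and parameter counts are hypothetical, and taking n to be the alignment length is an assumption of this example.

```python
import math


def aic(lnl: float, k: int) -> float:
    return 2.0 * k - 2.0 * lnl


def aicc(lnl: float, k: int, n: int) -> float:
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic(lnl, k) + 2.0 * k * (k + 1) / (n - k - 1)


def bic(lnl: float, k: int, n: int) -> float:
    return k * math.log(n) - 2.0 * lnl


# Hypothetical fits: model name -> (log-likelihood, free substitution parameters).
fits = {"JC": (-2410.0, 0), "HKY85": (-2355.0, 4), "GTR+G": (-2340.0, 9)}
n_sites = 1000  # assumed sample size: number of alignment columns

best_aic = min(fits, key=lambda m: aic(*fits[m]))
best_aicc = min(fits, key=lambda m: aicc(*fits[m], n_sites))
best_bic = min(fits, key=lambda m: bic(*fits[m], n_sites))
```

With these made-up numbers, AIC and AICc prefer the most parameter-rich candidate whereas BIC prefers the intermediate one, which illustrates how the stronger penalty of BIC can lead the criteria to select different models.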
All the above test procedures compare ratios of likelihood values penalized for an increase in the dimension of one of the models, without directly accounting for uncertainty in the estimates of model parameters. This may be problematic, in particular for small data sets. The Bayesian approach to model selection, called the Bayes factor, directly incorporates this uncertainty. It is also more intuitive as it directly assesses if the data are more probable under a given model than under a different one (e.g., [103]). An extension of this approach makes it possible to select the model not only among the set of named models (JC to GTR) but among all 203 nucleotide substitution models that are possible [104]. An alternative use or interpretation of this approach is to integrate directly over the uncertainty about the substitution model, so that the estimated phylogeny fully accounts for several kinds of uncertainty: about the substitution models, and the parameters entering each of these models. MrBayes (version 3.1.2) [43] implements this feature for amino acid models.
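The figure of 203 models can be checked with a short computation, on the understanding (our reading of [104]) that each time-reversible nucleotide model corresponds to one way of grouping the six exchangeability rates into classes of equal rates. The number of such groupings is the sixth Bell number, computed here with the Bell triangle; the function name is an assumption of this sketch.

```python
def bell_number(n: int) -> int:
    """Number of ways to partition n items into nonempty groups (Bell triangle)."""
    row = [1]
    for _ in range(n - 1):
        new_row = [row[-1]]          # each row starts with the previous row's last entry
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]


# Six exchangeability rates: A<->C, A<->G, A<->T, C<->G, C<->T, G<->T.
n_models = bell_number(6)
```

At one extreme all six rates fall in a single class (as in JC-type models), and at the other each rate has its own class (as in GTR).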
There is an element of circularity in model selection, just as in sequence alignment. In theory, when the hLRT is used for model selection, the topology used for all the computations should be that of the maximum likelihood tree. In practice, model selection is based on an initial topology obtained by a fast algorithm such as neighbor-joining [105, 106] (default setting in ModelTest) or by Weighbor [107] (default setting in FindModel) on JC distances without any correction for among-site rate variation. As mentioned above, it is known that the choice of a wrong model can affect the tree that is estimated, but it is not always clear how the choice of a nonoptimal topology to select the substitution model affects the tree that is finally estimated. Again, this issue with model choice disappears with Bayesian approaches that integrate over all possible time-reversible models as in [104].
2.3. Finding the “best” tree and assessing its support
Once the substitution model is selected, the classical approach proceeds to reconstruct the phylogeny [108]. This is probably one area where phylogenetics has seen mixed progress over the last five years, due to both the combinatorial and the computational complexities of phylogenetic reconstruction.
The combinatorial complexity relates to the extremely large number of tree topologies that are possible with a large number of sequences [109]. For instance, with five sequences, there are 105 rooted topologies, but with ten sequences, this number soars to over 34 million. An exhaustive search for the phylogeny that has the highest probability is therefore not practical even with a moderate number of sequences. Besides, while heuristics exist (e.g., stepwise addition [109]; see [4] for a review), almost none of these is guaranteed to converge on the optimum phylogenetic tree. The common practice is then to use one of these heuristics to find a good starting tree, and then to modify its topology repeatedly, more or less dramatically, to explore its neighborhood for better trees until a stopping rule is satisfied [110]. The art here is in designing efficient tree perturbation methods that adaptively strike a balance between large topological modifications (that almost always lead to a very different tree with a poor score) and small modifications (that almost always lead to an extremely similar tree with a lower score). Some of today's challenges are about choosing between methods that successfully explore large numbers of trees but that can be costly in terms of computing time [110], and methods that are faster but may miss some interesting trees [53]. Several programs such as Leaphy, PhyML, and GARLI [41] are among the best-performing software in a maximum likelihood setting. In a Bayesian framework, the basic perturbation schemes were described early [36] and recently updated [111]. Three popular programs are MrBayes, BAMBE [36], and BEAST [39]. Among all these programs and approaches, PHYML, GARLI, and BEAST are probably among the most efficient programs in terms of computational speed, handling of large data sets, and thoroughness of the tree search.
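The numbers quoted above follow from the standard counting formulas for strictly bifurcating trees: (2n - 3)!! rooted topologies and (2n - 5)!! unrooted topologies for n sequences. A small Python sketch, with function names chosen for this example, reproduces them.

```python
def odd_double_factorial(m: int) -> int:
    """Product m * (m - 2) * ... * 3 * 1 for odd m; returns 1 when m <= 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def n_rooted_topologies(n_taxa: int) -> int:
    if n_taxa < 2:
        raise ValueError("need at least two taxa")
    return odd_double_factorial(2 * n_taxa - 3)


def n_unrooted_topologies(n_taxa: int) -> int:
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    return odd_double_factorial(2 * n_taxa - 5)


counts = {n: n_rooted_topologies(n) for n in (5, 10, 20)}
```

The count for ten sequences already exceeds 34 million, and each additional sequence multiplies it by a further odd factor, which is why exhaustive searches quickly become impractical.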
A first aspect of the computational complexity relates to estimating the support of a reconstructed phylogeny. This is more complicated than estimating a confidence interval for a real-valued parameter such as a branch length, because a tree topology is a graph and not a number. The classical approach therefore relies on a nonstandard use of the bootstrap [112]. However, the interpretation of the bootstrap is contentious. Bootstrap proportions P can be perceived as testing the correctness of internal nodes, and failing to do so [113], or 1–P can be interpreted as a conservative probability of falsely supporting monophyly [114]. Since bootstrap proportions are either too liberal or too conservative depending on the exact interpretation given to these values [115], it is difficult to adjust the threshold below which monophyly can be confidently ruled out [116]. Alternatively, an intuitive geometric argument was proposed to explain the conservativeness of bootstrap probabilities [117], but the workaround was never actually used in the community or implemented in any popular software. The introduction of Bayesian approaches in the late 1990s [36, 118] suggested a novel approach to estimate phylogenetic support with posterior probabilities. Clade or bipartition posterior probabilities can be relatively fast to compute, even for large data sets analyzed under complicated substitution models [119]. As in model selection, they have a clear interpretation as they measure the probability that a clade is correct, given the data and the model. But as with bootstrap probabilities, some controversies exist. Early empirical studies found that posterior probabilities of highly supported nodes were much larger than bootstrap probabilities [120], and subsequent simulation studies supported this observation (e.g., [121–124]). 
Some of these differences can be attributed to an artifact of the simulation scheme that was employed [125], but more specific empirical and simulation studies show that prior specifications can dramatically impact posterior probabilities for trees and clades [115, 126, 127]. In the simplest case, the analysis of simulated star trees with four sequences fails to give the expected three unrooted topologies with equal probability (1/3, 1/3, 1/3) but returns large posterior probabilities for an arbitrary topology [115, 126], even when infinitely long sequences are used [128, 129] ([130]). This phenomenon, called the star-tree paradox [126], seems to disappear when polytomies are assigned nonzero prior probabilities and when nonuniform priors force internal branch length towards zero [129]. The second issue surrounding Bayesian phylogenetic methods is about their convergence rate. A theoretical study shows that extremely simple Markov chain Monte Carlo (MCMC) samplers, the technique used to estimate posterior probabilities, could take an extremely long time to converge [131]. In practice, however, MCMC samplers such as those implemented in MrBayes are much more sophisticated. In particular, they include different types of moves [111] and use tempering, where some of the chains of a single run are heated, to improve mixing [43]. As a result, it is unclear whether they suffer from extremely long convergence times. It is also expected that current convergence diagnostic tools such as those implemented in MrBayes would reveal convergence problems [132]. Finally, it is also argued that these controversies such as exaggerated clade support, inconsistently biased priors, and the impossibility of hypothesis testing disappear altogether when posterior probabilities at internal nodes are abandoned in favor of posterior probabilities for topologies [133] (see Section 2.4 below).
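For reference, the resampling step behind the nonparametric bootstrap discussed above can be sketched as follows: alignment columns are drawn with replacement to build pseudo-replicate alignments of the original length. Tree reconstruction on each replicate, and the tallying of how often each clade is recovered, are left out; the function name and the toy alignment are assumptions of this example.

```python
import random


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all sequences must have the same aligned length")
    # The same resampled column indices are applied to every sequence,
    # so that homology within each column is preserved.
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(alignment[name][c] for c in columns) for name in names
    }


toy_alignment = {"A": "ACGTAC", "B": "ACGTTC", "C": "ACCTTC", "D": "TCCTTC"}
rng = random.Random(1)  # fixed seed so that the replicates are reproducible
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(100)]
```

The bootstrap proportion of a clade would then be the fraction of the 100 replicate trees in which that clade appears.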
The most fundamental aspect of the computational complexity in phylogenetics is due to the structure of the phylogenies: these are trees or binary graphs on which computations are nested and interdependent, which makes these computations intractable or NP-hard [134]. As a result, it is difficult to adopt an efficient “divide and conquer” approach, where a large complicated problem would be split into small simpler tasks, and to take advantage of today's commodity computing by distributing the computation over multicore architectures or heterogeneous computer clusters. Current strategies are limited to distributing the computation of the discrete rate categories (when using a “+Γ” substitution model) and part of the search algorithm [54], or simply to distribute different maximum likelihood bootstrap replicates [53, 54] or different MCMC samplers to available processors [44].
2.4. Comparisons of tree topologies
Science proceeds by testing hypotheses, and it is often necessary to compare phylogenies, for instance to test whether a given data set supports the early divergence of gymnosperms with respect to Gnetales and angiosperms (the anthophyte hypothesis), or whether the Gnetales diverged first (the Gnetales hypothesis) [135, 136]. Because of the importance of comparing phylogenies, a number of tests of molecular phylogenies were developed early. The KH test was first developed to compare two random trees [137]. However, this test is invalid if one of the trees is the maximum likelihood tree [138]. In this case, the SH test should be used [139]. Because the SH test can be very conservative, an approximately unbiased version was developed: the AU test [140]. PAUP and PAML only implement the KH and SH tests; CONSEL [40] also implements the AU test. A Bayesian version of these tests also exists [141], but the computations are more demanding.
Indeed, the Bayesian approach to hypothesis testing relies on computing the probability of the data under a particular model. This quantity is usually not available as a closed-form equation, and it must be approximated numerically. The most straightforward approximation is based on the harmonic mean of the likelihood sampled from the posterior distribution [142]. This approximation was described several times in the context of phylogenies [141, 143] and is available from most Bayesian programs such as MrBayes or BEAST. However, the approximation is extremely sensitive to the behavior of the MCMC sampler [52, 142]: if extremely low likelihood values happen to be sampled from the posterior distribution, the harmonic mean will be dramatically affected. To date, a couple of more robust approximations have been described and were shown to be preferable to the harmonic mean estimator [52]. The first is based on thermodynamic integration [52] and is available in PhyloBayes (see Table 1). The second approximation [144] is based on a more direct computation [145], but its availability is currently limited to one specific model of evolution.
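The sensitivity of the harmonic mean estimator is easy to demonstrate numerically. The sketch below computes the estimator on the log scale from a list of sampled log-likelihoods, using the log-sum-exp trick for numerical stability; it is an illustration only, not the code of MrBayes or BEAST, and the sampled values are made up.

```python
import math


def log_harmonic_mean(log_likelihoods: list) -> float:
    """Log of the harmonic mean of likelihoods, given their logarithms.

    log HM = log(n) - log(sum_i exp(-lnL_i)), evaluated with log-sum-exp.
    """
    n = len(log_likelihoods)
    if n == 0:
        raise ValueError("need at least one sample")
    negated = [-x for x in log_likelihoods]
    shift = max(negated)
    log_sum = shift + math.log(sum(math.exp(v - shift) for v in negated))
    return math.log(n) - log_sum


# Hypothetical posterior samples: identical log-likelihoods, then the same
# sample with a single unusually low value.
stable = [-1000.0] * 1000
with_outlier = [-1000.0] * 999 + [-1050.0]

estimate_stable = log_harmonic_mean(stable)
estimate_outlier = log_harmonic_mean(with_outlier)
```

A single low-likelihood draw out of 1000 shifts the estimate by more than 40 log units, a difference that would dominate any Bayes factor computed from it.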
2.5. More realistic models
While model selection is fully justified on the grounds of the bias-variance tradeoff, it should not be forgotten that all these models are simplified representations of the actual substitution process and are all therefore wrong. Stated differently, if AIC selects the GTR +Γ+I to analyze a data set, it should be clear that this conclusion does not imply that the data evolved under this model. All model selection procedures measure a relative model fit. One way to estimate adequacy or absolute model fit is to perform a parametric bootstrap test [146]: first, the selected model is compared with a multinomial model by means of an LRT whose test statistic is s (twice the log-likelihood difference); the following steps determine the distribution of s under the null hypothesis that the selected model was the generating model; second, the selected model is used to simulate a large number of data sets; third, the same comparison (LRT) is repeated on each simulated data set, and the corresponding test statistics s* are recorded; fourth, the P-value is estimated as the proportion of simulated data sets for which the test statistic s* is more extreme (>, for a one-sided test) than the original value of s. The results of such tests suggest that the selected substitution model is generally not an adequate representation of the actual substitution process [85]. Of course, we do not need a model that incorporates all the minute biological features of evolutionary processes. As argued repeatedly (e.g., [147]), we need useful models that capture enough of the reality of substitution processes to make accurate predictions and avoid systematic biases such as long-branch attraction [148].
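The final step of the parametric bootstrap test described above reduces to a simple proportion. In the sketch below, the simulation of data sets and the likelihood computations are assumed to have been done elsewhere; the function name and the numerical values are hypothetical.

```python
def parametric_bootstrap_p_value(observed: float, simulated: list) -> float:
    """One-sided P-value: fraction of simulated statistics exceeding the observed one."""
    if not simulated:
        raise ValueError("need at least one simulated statistic")
    return sum(s_star > observed for s_star in simulated) / len(simulated)


# Hypothetical values: s from the real data, s* from data sets simulated
# under the selected model.
s_observed = 152.3
s_simulated = [98.1, 110.4, 121.7, 105.9, 160.2, 99.5, 130.8, 117.3, 102.6, 125.0]
p_value = parametric_bootstrap_p_value(s_observed, s_simulated)
```

A small P-value indicates that the observed statistic is larger than what the selected model typically produces, that is, the model fits the real data worse than it fits data that it generated itself.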
More realistic models are obtained by accommodating heterogeneities in the evolutionary process at the level of both sites (space) and lineages (time). The simplest site-heterogeneous model is one where the aligned data are partitioned, usually based on some prior information. For instance, first and second codon positions are known to evolve more slowly than third codon positions in protein-coding genes, and exposed residues might evolve faster than buried amino acids in globular proteins. A number of models were suggested to analyze such partitioned data sets (e.g., [149]); these models are implemented in most general-purpose software (e.g., PAML, PAUP, MrBayes) and can be combined with a “+Γ+I” component. A different approach consists in considering that sites can be binned in a number of rate categories; the use of a Dirichlet process prior then makes it possible both to determine the appropriate number of categories and to assign sites to these categories; the application of this method to protein-coding genes was able to recover the underlying codon structure of these genes [150]. However, several studies suggest that evolutionary patterns can be as heterogeneous within a priori partitions as among partitions [37, 151].
Lineage-heterogeneous models or heterotachous models [152] have attracted more attention. In one such approach, different models of evolution are assigned to the different branches of the tree [153], which can make these models extremely parameter-rich. Such a large number of parameters can potentially affect the accuracy of the phylogenetic inference (see the “bias-variance tradeoff” above) and present computational issues (long running times, large memory requirements, and convergence issues). Several simplifications can be made. One assumes that some sets of branches evolve under a particular process [153]. But now these branches must be assigned a priori, and both the determination of the number of sets and their placement on the tree can be difficult (but see Section 4 below for a solution to a similar question). At the other end of the spectrum of heterotachous models lies the simplest model known as the covarion model [154], where sites can either be variable along a branch, or not, and can switch between these two categories across time (e.g., [155], also described in a Bayesian framework [156]).
Between these two extremes are mixture models, which extend the covarion model by allowing more categories of sites. A number of formulations exist, where each site is assumed to have been generated either by several sets of branch lengths [157, 158] or by several rate matrices [37, 96, 151]. One particularity of these models is that they give a semiparametric perspective to the phylogenetic estimation: if a single simple model cannot approximate a complex substitution process, the hope is that mixing several simple substitution models makes our models more realistic. In some applications, mixture models can also be used to avoid the underestimation of uncertainty that occurs when a single model of evolution is first chosen and the uncertainty of that choice is then ignored when estimating the phylogeny. The mixing therefore involves fitting at each site several sets of branch lengths, or several substitution models, to the data, and combining these models using a certain weighting scheme. The difference between the numerous mixture models that have been described lies in the choice of the weight factors, and in how these are obtained. In one approach, known as model averaging, the weights are determined a priori. A first possibility is to assume that all the models are equally probable, which does not work with an infinite number of models (individual weights are zero in this case). More critically in phylogenetics, this assumption is not coherent for nested models since larger models should be more likely than each submodel. A second possibility is to weight the models with respect to their probability of being the generating model given the data. For practical purposes, this posterior probability can be approximated by Akaike weights [96]. The difficulty here is that model averaging requires analyzing the data even for models that, a posteriori, turn out to have extremely small probabilities or weights. This may be seen as a waste of resources (computing time and storage space).
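As a rough illustration of the second weighting scheme, the sketch below computes Akaike weights from the maximized log-likelihoods and the numbers of free parameters of a set of candidate models, using the usual formula w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2), where Δ_i is the difference between the AIC of model i and the smallest AIC. The function name and the numerical values are hypothetical and serve only as an example; they are not taken from any of the programs cited here.

```python
import math

def akaike_weights(log_likelihoods, n_params):
    """Return one Akaike weight per candidate model.

    log_likelihoods: maximized log-likelihood of each model.
    n_params: number of free parameters of each model.
    """
    # AIC = 2k - 2 lnL for each model.
    aic = [2.0 * k - 2.0 * lnl for lnl, k in zip(log_likelihoods, n_params)]
    best = min(aic)
    # Relative likelihood of each model with respect to the best one.
    rel = [math.exp(-0.5 * (a - best)) for a in aic]
    total = sum(rel)
    return [r / total for r in rel]

# Hypothetical fits of three substitution models to the same alignment.
weights = akaike_weights([-1000.0, -998.0, -997.5], [1, 3, 5])
print([round(w, 3) for w in weights])
```

Note that every model in the list must be fitted before any weight can be computed, which is the resource cost mentioned above.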
2.6. Integrated Bayesian approaches
Mixture models can work within the framework of maximum likelihood, but the treatment of the weight factors is complicated. A sound alternative is to resort to a fully Bayesian approach. A prior distribution is set on the weight factors, and a special form of MCMC sampler whose Markov chain moves across models with different numbers of parameters, a reversible-jump MCMC sampler (RJ-MCMC), is constructed. The advantage of RJ-MCMC samplers is that they allow estimating the phylogeny while integrating over the uncertainty pertaining to the parameters of the substitution model and even integrating over the model itself [104]. Mixture models are available in BayesPhylogenies [37] for nucleotide models. Another Bayesian mixture model, named CAT for CATegories, was developed to analyze amino acid alignments. The CAT model recently proved successful in a number of empirical [159, 160] and simulation [161] studies in avoiding the artifact known as long-branch attraction [148]. This model is freely available in the PhyloBayes software (see Table 1).
All these models assume that each site evolves independently. The independence assumption greatly simplifies the computations, but is also highly unrealistic. Models that describe the evolution of doublets in RNA genes [162], triplets in codon models [163, 164], or other models with local or context dependencies [165–167] exist, but complete dependence models are still in their infancy and, so far, have only been implemented in a Bayesian framework [168, 169]. One particularly interesting feature of this approach is that complete dependence models incorporate information about the three-dimensional (3D) structure of proteins and therefore permit the explicit modeling of structural constraints or of any other site-interdependence pattern [170]. The incorporation of 3D structures also allows the establishment of a direct relationship between evolution at the DNA level and at the phenotypic level. This link between genotype and phenotype is established via a proxy that plays the role of a fitness function which, in retrospect, can be used to predict amino-acid sequences compatible with a given target structure, that is, to help in protein design [171].
3. DETECTING POSITIVE SELECTION
Fitness functions are however difficult to determine at the molecular level. In addition, while examples of adaptive evolution at the morphological level abound, from Darwin's finches in the Galapagos [172] to cichlid fishes in the East African lakes [173], the role of natural selection in shaping the evolution of genomes is much more controversial [147, 174]. First, the neutral theory of molecular evolution asserts that much of the variation at the DNA level is due to the random fixation of mutations with no selective advantage [175]. Second, a compelling body of evidence suggests that most of the genomic complexities have emerged by nonadaptive processes [176]. A number of statistical approaches exist either to test neutrality at the population level or to detect positive Darwinian evolution at the species level [147]. A shortcoming of neutrality tests is their dependence on a demographic model [177] and their sensitivity to processes of molecular evolution such as among-site rate variation [178]. They also do not model alternative hypotheses that would permit distinguishing negative selection from adaptive evolution. The development of demographic models based on Poisson random fields [179] and composite likelihoods [180] makes it possible both to estimate the strength of selection and to assess the impact of a variety of scenarios on allele frequency spectra [9]. But demographic singularities such as bottlenecks can still generate spurious signatures of positive selection [180, 181].
When effective population sizes are no longer a concern, for instance in studies at or above the species level, the detection of positive selection in protein-coding genes usually relies on codon models [163, 164] (see [182] for a review including methods based on amino-acid models). Codon models permit distinguishing between synonymous substitutions, which are likely to be neutral, and nonsynonymous substitutions, which are directly exposed to the action of selection. If synonymous and nonsynonymous substitutions accumulate at the same rate, then the protein-coding gene is likely to evolve neutrally. Alternatively, if nonsynonymous substitutions accumulate more slowly than synonymous substitutions, it must be because nonsynonymous substitutions are deleterious, which suggests the action of purifying selection. Conversely, the accumulation of nonsynonymous substitutions faster than synonymous substitutions suggests the action of positive selection. The nonsynonymous to synonymous rate ratio, denoted ω = dN/dS, is therefore interpreted as a measure of selection at the protein level, with ω = 1, ω < 1, and ω > 1 indicating neutral evolution, negative selection, and positive selection, respectively. This ratio is also denoted Ka/Ks, in particular in studies that rely on counts of nonsynonymous and synonymous sites (e.g., [183]). An extension exists to detect selection in noncoding regions [184], and a promising phylogenetic hidden Markov or phylo-HMM model permits detection of selection in overlapping genes [185].
These rate ratios can be estimated by a number of methods implemented in MEGA, DAMBE, HyPhy [42], and PAML. The most intuitive methods, called counting methods, work in three steps: (i) count synonymous and nonsynonymous sites, (ii) count the observed differences at these sites, and (iii) apply corrections for multiple substitutions [186]. Counting methods are however not optimal in the sense that most work on pairs of sequences and therefore, just like neighbor-joining, fail to account for all the information contained in an alignment. In addition, simulations suggest that counting methods can be sensitive to a variety of biases such as unequal transition and transversion rates, or uneven base or codon frequencies [187]. Counting methods that incorporate these biases generally perform better than those that do not, but the maximum likelihood method still appears more robust to severe biases [187]. In addition, the maximum likelihood method, which accounts for all the information in a data set, has good power and good accuracy to detect positive selection [188, 189].
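The three steps of a counting method can be sketched as follows for a single pair of aligned coding sequences. This is a deliberately simplified illustration and not the implementation found in MEGA, DAMBE, HyPhy, or PAML: it assumes the standard genetic code, equal rates for all nucleotide changes, a Jukes-Cantor correction in step (iii), and sequences without gaps, ambiguity codes, or stop codons. Changes to stop codons are simply counted as nonsynonymous, and pathways through stop codons are not excluded, which published counting methods may handle differently. All function names are hypothetical.

```python
import math
from itertools import permutations, product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def syn_sites(codon):
    """Step (i): number of synonymous sites in one codon."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1.0 / 3.0
    return s

def count_diffs(c1, c2):
    """Step (ii): synonymous and nonsynonymous differences between two
    codons, averaged over all orders in which the differing positions
    could have changed."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    paths = list(permutations(diff))
    syn = nonsyn = 0
    for order in paths:
        current = c1
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[step] == CODE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = step
    return syn / len(paths), nonsyn / len(paths)

def jukes_cantor(p):
    """Step (iii): correct a proportion of differences for multiple hits."""
    if p >= 0.75:
        return float("inf")  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def dn_ds(seq1, seq2):
    """Return (dN, dS) for two aligned coding sequences of equal length."""
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    pairs = [(seq1[i:i + 3], seq2[i:i + 3]) for i in range(0, len(seq1), 3)]
    S = sum((syn_sites(a) + syn_sites(b)) / 2.0 for a, b in pairs)
    N = 3.0 * len(pairs) - S
    sd = sum(count_diffs(a, b)[0] for a, b in pairs)
    nd = sum(count_diffs(a, b)[1] for a, b in pairs)
    dS = jukes_cantor(sd / S) if S > 0 else float("nan")
    dN = jukes_cantor(nd / N) if N > 0 else float("nan")
    return dN, dS

# Toy example with one synonymous and one nonsynonymous difference.
dN, dS = dn_ds("ATGCTTAAAGGCCCA", "ATGCTCAGAGGCCCA")
print("dN =", dN, "dS =", dS, "omega =", dN / dS)
```

Because the calculation uses only one pair of sequences at a time, it illustrates the limitation noted above: the information contained in the rest of the alignment is ignored.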
However, the first studies using these methods found little evidence for adaptive evolution, essentially because they were averaging ω rate ratios over both lineages and sites [147]. Branch models were then developed [190, 191], quickly followed by site models [192–196] and by branch-site models [189, 197]. All these approaches, as implemented in PAML, rely on likelihood ratio tests to detect adaptive evolution: a model where adaptive evolution is permitted is compared with a null model where ω cannot be greater than one. Simulations show that some of these tests are conservative [189], so that detection of adaptive evolution should be safe as long as convergence of the analyses is carefully checked [198], including in large-scale analyses [199]. If the model allowing adaptive evolution explains the data significantly better than the null model, then an empirical Bayes approach can be used to identify which sites are likely to evolve adaptively [192]. The empirical Bayes approach relies on estimates of the model parameters, which can have large sampling errors in small data sets. Because these sampling errors can cause the empirical Bayes site identification to be unreliable [200], a Bayes empirical Bayes approach was proposed and was shown to have good power and low false-positive rates [201]. Full Bayesian approaches that allow for uncertain parameter estimates were also proposed [202]. Yet, simulations showed that they did not improve further on Bayes empirical Bayes estimates [203], so that the computational overhead incurred by full Bayes methods may not be necessary in this case. One particular case where a Bayesian approach is nevertheless required is that of telling the signature of adaptive evolution from that of recombination, as these two processes can leave similar signals in DNA sequences. Indeed, simulations show that recombination can lead to false positive rates as large as 90% when trying to detect adaptive evolution [204].
The codon model with recombination implemented in OmegaMap [48] can then be used to tease apart these two processes (e.g., see [205]).
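The empirical Bayes identification of sites mentioned above can be sketched in a few lines. The example assumes a site model with a small number of ω classes whose proportions and per-class site likelihoods have already been estimated by maximum likelihood; these estimates are plugged in as if they were known without error, which is precisely the weakness that the Bayes empirical Bayes approach addresses. The function name and all numbers are hypothetical.

```python
def site_class_posteriors(proportions, site_likelihoods):
    """Posterior probability that a site belongs to each omega class.

    proportions: estimated proportion of sites in each class.
    site_likelihoods: likelihood of the data at this site under each class.
    """
    # Bayes' rule with the estimated proportions used as the prior.
    joint = [p * lik for p, lik in zip(proportions, site_likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical classes with omega = 0.1, 1.0, and 2.5.
post = site_class_posteriors([0.70, 0.25, 0.05], [2e-6, 5e-6, 4e-5])
print("P(site is in the omega > 1 class) =", post[2])
```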
4. ESTIMATING DIVERGENCE TIMES BETWEEN SPECIES
The estimation of the dates when species diverged is often perceived to be as important as estimating the phylogeny itself. This explains why so-called “dating methods” were called for as soon as molecular phylogenies were first reconstructed [206]. In spite of over four decades of history, molecular dating has only recently seen new developments. One of the reasons for this slow progress is that, unlike the other parts of phylogenetic analysis, divergence times are parameters that cannot be estimated directly. Only sitewise likelihood values and distances between pairs of sequences are identifiable, that is, directly estimable. Distances are expressed as a number of substitutions per site (sub/site) and can be decomposed as the product of two quantities: a rate of evolution (sub/site/unit of time) and a time duration (unit of time). As a result, time durations and, likewise, divergence times cannot be estimated without making an additional assumption on the rates of evolution. The simplest assumption is to posit that rates are constant in time, which is known as the molecular clock hypothesis [207]. This hypothesis can be tested, for instance, with PAUP or PAML, by means of a likelihood ratio test that compares a constrained model (clock) with an unconstrained model (no clock). These two models are nested, so that twice the log-likelihood difference asymptotically follows a χ² distribution. If n sequences are analyzed, the constrained model estimates n − 1 divergence times, while the unconstrained model estimates 2n − 3 branch lengths. The number of degrees of freedom of this test is then (2n − 3) − (n − 1) = n − 2 [4]. The systematic test of the molecular clock assumption on recent data shows that this hypothesis is too often untenable [208].
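The likelihood ratio test of the molecular clock described above can be sketched as follows, assuming that the two maximized log-likelihoods have already been obtained from a program such as PAUP or PAML. The function names and the log-likelihood values are hypothetical; the χ² tail probability is computed with the standard library only, using the recurrence that links the survival functions for k and k + 2 degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with df degrees of freedom."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1   # exact result for df = 1
    else:
        q, k = math.exp(-half), 2              # exact result for df = 2
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

def clock_lrt(lnl_clock, lnl_no_clock, n_sequences):
    """Return (test statistic, degrees of freedom, p-value) for the clock test."""
    stat = 2.0 * (lnl_no_clock - lnl_clock)
    df = n_sequences - 2   # (2n - 3) branch lengths minus (n - 1) divergence times
    return stat, df, chi2_sf(stat, df)

# Hypothetical log-likelihoods for an alignment of 10 sequences.
stat, df, p = clock_lrt(-1010.0, -1000.0, 10)
print("2*delta(lnL) =", stat, "df =", df, "p =", p)
```

A small p-value leads to rejection of the clock, in which case one of the relaxed-clock approaches is needed before divergence times can be estimated.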
The most recent work has then focused on relaxing this assumption, and three different directions have emerged [209]. A first possibility is to relax the clock globally on the phylogeny, but to assume that the hypothesis still holds locally for closely related species [210–212]. Recent developments of these local clock models now allow the use of multiple calibration points and of multiple genes [213], the automatic placement of the clocks on the tree [214] and the estimation of the number of local clocks [209]. PAML can be used for most of these computations. However, local clock models still tend to underestimate rapid rate change [209]. The second possibility to relax the global clock assumption is to assume that rates of evolution evolve in an autocorrelated manner along lineages and to minimize the amount of rate change over the entire phylogeny. The most popular approach in the plant community is Sanderson's penalized likelihood [215], implemented in r8s [55]. This approach performs well on data sets for which the actual fossil dates are known [216] but still tends to underestimate the actual amount of rate change [209].
Bayesian methods appear today as the emerging approach to estimate divergence times. Taking inspiration from Sanderson's pioneering work [217], Thorne et al. developed a Bayesian framework where rates of evolution change in an autocorrelated manner across lineages [45–47]: the rate of evolution of a branch depends on the rate of evolution of its parental branch; the branches emanating from the root require a special treatment. These Bayesian models work by modeling how rates of evolution change in time (rate prior), and how the speciation/population process shapes the distribution of divergence times (speciation prior). These prior distributions can actually be interpreted as penalty functions [45, 209], and they can have simple or more complicated forms [218]. The Multidivtime program [45–47] is extremely quick to analyze data thanks to the use of a multivariate normal approximation of the likelihood surface. It assumes that rates of evolution change following a stationary lognormal prior distribution. Further work suggested that it might not always be the best performing rate prior [218–220], but these latter studies had two potential shortcomings: (i) they were based on a speciation prior that was so strong that it biased divergence times towards the age of the fossil root [219, 221], and (ii) they used a statistical procedure, the posterior Bayes factor [222], that is potentially inconsistent. One potential limitation of the Bayesian approach described so far is its dependence on one single tree topology, which must be either known ahead of time or estimated by other means. Recently, Drummond et al. found a way to relax this requirement by positing that rates of evolution are uncorrelated across lineages, while all the branches of the tree are constrained to follow exactly the same rate prior [223]. 
As a result, their approach is able to estimate the most probable tree (given the data and the substitution model), the divergence times, and the position of the root even without any outgroup or without resorting to a nonreversible model of substitution [224]. Drummond et al. further argue that the use of explicit models of rate variation over time might contribute to improved phylogenetic inference [223]. In addition, when the focus is on estimating divergence times, a recent analysis suggests that this uncorrelated model of rate change could outperform the methods described above to accommodate rapid rate change among lineages [209]. Implemented in BEAST, this approach offers a variety of substitution models and prior distributions and presents a graphical user interface that will appeal to numerous researchers [39].
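The difference between the autocorrelated and the uncorrelated rate priors discussed above can be made concrete with a small simulation. The sketch below is only loosely modeled on these approaches and is not the code of Multidivtime or BEAST: in the autocorrelated case, the logarithm of the rate of a branch is drawn from a normal distribution centered on the logarithm of the rate of its parental branch, with a variance proportional to the duration of the branch; in the uncorrelated case, every branch draws its rate independently from the same lognormal distribution. The tree encoding, the fixed starting rate for branches emanating from the root, and all parameter values are assumptions made for this example.

```python
import math
import random

def autocorrelated_rates(parents, durations, root_rate, nu, rng):
    """Draw one rate per branch, each depending on the parental branch.

    parents[i] is the index of the parental branch of branch i, or None
    for a branch emanating from the root; parents must precede children.
    """
    rates = []
    for parent, t in zip(parents, durations):
        start = root_rate if parent is None else rates[parent]
        log_rate = rng.gauss(math.log(start), math.sqrt(nu * t))
        rates.append(math.exp(log_rate))
    return rates

def uncorrelated_rates(n_branches, mean_log, sd_log, rng):
    """Draw one rate per branch, independently, from a single lognormal prior."""
    return [rng.lognormvariate(mean_log, sd_log) for _ in range(n_branches)]

rng = random.Random(42)
parents = [None, None, 0, 0]       # two root branches; branch 0 has two children
durations = [1.0, 2.0, 0.5, 0.5]   # arbitrary units of time
print(autocorrelated_rates(parents, durations, 0.01, 0.2, rng))
print(uncorrelated_rates(len(parents), math.log(0.01), 0.5, rng))
```

Because the uncorrelated draws do not depend on which branch is parental to which, this prior does not require a fixed tree topology, which is what allows the tree and the divergence times to be estimated jointly.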
5. CHALLENGES AND PERSPECTIVES
With the advent of high-throughput sequencing technologies such as the whole-genome shotgun approach by pyrosequencing [225], fast, cheap, and accurate genomic information is becoming available for a growing number of species [226]. Although low coverage limits the complete assembly of many genome projects, it still allows quick access to draft genomes for a growing number of species [227]. As a result, phylogenetic inference can now incorporate large numbers of expressed sequence tags (ESTs), genes [228], and occasionally complete genomes [229]. The motivation for developing these so-called phylogenomic approaches is their presumed ability to return fully resolved and well-supported trees by decreasing both sampling errors [230] and misleading signals due for instance to horizontal gene transfer [231] or to hidden paralogy [232]. In practice, these large-scale studies can give the impression that incongruence is resolved [228], but they can also fail to address systematic errors due to the use of overly simple models [233]. Although the genes incorporated in phylogenomic studies are often concatenated to limit the number of parameters entering the model, it remains important to allow sitewise heterogeneities [234]. While partition models can reduce systematic biases [234], Bayesian mixture models such as CAT [151] appear to be robust to long-branch attraction [159], a rampant issue in phylogenomics [235]. Altogether, the accumulation of genomic data and these latest methodological developments seem to make the reconstruction of the tree of life finally within reach. In comparison, dating the tree of life is still in its infancy, even if a number of initiatives such as the TimeTree server are being developed [236]. These resources are limited to some vertebrates but will hopefully soon be extended to include other large taxonomic groups such as plants.
To achieve this goal, however, phylogenetic studies should systematically incorporate divergence times, as is now routine in some research communities (e.g., [237]). This joint estimation of times and trees is today facilitated by the availability of user-friendly programs such as BEAST. The near future will probably see the development of mixture models for molecular dating and of more sophisticated models that integrate most of the topics discussed here, from sequence alignment to the detection of sites under selection, into a single yet user-friendly [238] toolbox.
ACKNOWLEDGMENTS
Jeff Thorne provided insightful comments and suggestions, and two anonymous reviewers helped in improving the original manuscript. Support was provided by the Natural Sciences Research Council of Canada (DG-311625 to SAB and DG-261252 to XX).
References
- 1. Darwin C. On the Origin of Species by Means of Natural Selection. London, UK: J. Murray; 1859.
- 2. Sokal RR, Sneath PHA. Principles of Numerical Taxonomy. San Francisco, Calif, USA: W. H. Freeman; 1963.
- 3. Cavalli-Sforza LL, Barrai I, Edwards AW. Analysis of human evolution under random genetic drift. Cold Spring Harbor Symposia on Quantitative Biology. 1964;29:9–20. doi: 10.1101/sqb.1964.029.01.006.
- 4. Felsenstein J. Inferring Phylogenies. Sunderland, Mass, USA: Sinauer Associates; 2004.
- 5. Glenner H, Hansen AJ, Sørensen MV, Ronquist F, Huelsenbeck JP, Willerslev E. Bayesian inference of the metazoan phylogeny: a combined molecular and morphological approach. Current Biology. 2004;14(18):1644–1649. doi: 10.1016/j.cub.2004.09.027.
- 6. Pfeil BE, Schlueter JA, Shoemaker RC, Doyle JJ. Placing paleopolyploidy in relation to taxon divergence: a phylogenetic analysis in legumes using 39 gene families. Systematic Biology. 2005;54(3):441–454. doi: 10.1080/10635150590945359.
- 7. Chare ER, Holmes EC. A phylogenetic survey of recombination frequency in plant RNA viruses. Archives of Virology. 2006;151(5):933–946. doi: 10.1007/s00705-005-0675-x.
- 8. Philippe H, Douady CJ. Horizontal gene transfer and phylogenetics. Current Opinion in Microbiology. 2003;6(5):498–505. doi: 10.1016/j.mib.2003.09.008.
- 9. Nielsen R, Bustamante C, Clark AG, et al. A scan for positively selected genes in the genomes of humans and chimpanzees. PLoS Biology. 2005;3(6):e170. doi: 10.1371/journal.pbio.0030170.
- 10. Ramírez SR, Gravendeel B, Singer RB, Marshall CR, Pierce NE. Dating the origin of the Orchidaceae from a fossil orchid with its pollinator. Nature. 2007;448(7157):1042–1045. doi: 10.1038/nature06039.
- 11. Knight RD, Freeland SJ, Landweber LF. Rewiring the keyboard: evolvability of the genetic code. Nature Reviews Genetics. 2001;2(1):49–58. doi: 10.1038/35047500.
- 12. Antonovics J, Hood ME, Baker CH. Molecular virology: was the 1918 flu avian in origin? Nature. 2006;440(7088):E9. doi: 10.1038/nature04824. discussion E9-10.
- 13. Jackson AP, Charleston MA. A cophylogenetic perspective of RNA-virus evolution. Molecular Biology and Evolution. 2004;21(1):45–57. doi: 10.1093/molbev/msg232.
- 14. Huelsenbeck JP, Rannala B, Larget B. A Bayesian framework for the analysis of cospeciation. Evolution. 2000;54(2):352–364. doi: 10.1111/j.0014-3820.2000.tb00039.x.
- 15. Hajibabaei M, Singer GAC, Hebert PDN, Hickey DA. DNA barcoding: how it complements taxonomy, molecular phylogenetics and population genetics. Trends in Genetics. 2007;23(4):167–172. doi: 10.1016/j.tig.2007.02.001.
- 16. Luo S-J, Kim J-H, Johnson WE, et al. Phylogeography and genetic ancestry of tigers (Panthera tigris). PLoS Biology. 2004;2(12):e442. doi: 10.1371/journal.pbio.0020442.
- 17. Howe CJ, Barbrook AC, Spencer M, Robinson P, Bordalejo B, Mooney LR. Manuscript evolution. Endeavour. 2001;25(3):121–126. doi: 10.1016/s0160-9327(00)01367-3.
- 18. Gray RD, Atkinson QD. Language-tree divergence times support the Anatolian theory of Indo-European origin. Nature. 2003;426(6965):435–439. doi: 10.1038/nature02029.
- 19. Hillis DM, Huelsenbeck JP. Support for dental HIV transmission. Nature. 1994;369(6475):24–25. doi: 10.1038/369024a0.
- 20. Salas A, Bandelt H-J, Macaulay V, Richards MB. Phylogeographic investigations: the role of trees in forensic genetics. Forensic Science International. 2007;168(1):1–13. doi: 10.1016/j.forsciint.2006.05.037.
- 21. Sankoff D, Nadeau JH. Chromosome rearrangements in evolution: from gene order to genome sequence and back. Proceedings of the National Academy of Sciences of the United States of America. 2003;100(20):11188–11189. doi: 10.1073/pnas.2035002100.
- 22. Swofford DL, Waddell PJ, Huelsenbeck JP, Foster PG, Lewis PO, Rogers JS. Bias in phylogenetic estimation and its relevance to the choice between parsimony and likelihood methods. Systematic Biology. 2001;50(4):525–539.
- 23. Holder M, Lewis PO. Phylogeny estimation: traditional and Bayesian approaches. Nature Reviews Genetics. 2003;4(4):275–284. doi: 10.1038/nrg1044.
- 24. Siepel A, Haussler D. Phylogenetic hidden Markov models. In: Nielsen R, editor. Statistical Methods in Molecular Evolution. New York, NY, USA: Springer; 2005. pp. 325–351.
- 25. Pagel M, Meade A. Bayesian analysis of correlated evolution of discrete characters by reversible-jump Markov chain Monte Carlo. American Naturalist. 2006;167(6):808–825. doi: 10.1086/503444.
- 26. Kumar S, Filipski A. Multiple sequence alignment: in pursuit of homologous DNA positions. Genome Research. 2007;17(2):127–135. doi: 10.1101/gr.5232407.
- 27. Notredame C. Recent evolutions of multiple sequence alignment algorithms. PLoS Computational Biology. 2007;3(8):e123. doi: 10.1371/journal.pcbi.0030123.
- 28. Edgar RC, Batzoglou S. Multiple sequence alignment. Current Opinion in Structural Biology. 2006;16(3):368–373. doi: 10.1016/j.sbi.2006.04.004.
- 29. Xia X, Xie Z. DAMBE: software package for data analysis in molecular biology and evolution. Journal of Heredity. 2001;92(4):371–373. doi: 10.1093/jhered/92.4.371.
- 30. Kumar S, Tamura K, Nei M. MEGA: molecular evolutionary genetics analysis software for microcomputers. Computer Applications in the Biosciences. 1994;10(2):189–191. doi: 10.1093/bioinformatics/10.2.189.
- 31. Tamura K, Dudley J, Nei M, Kumar S. MEGA4: molecular evolutionary genetics analysis (MEGA) software version 4.0. Molecular Biology and Evolution. 2007;24(8):1596–1599. doi: 10.1093/molbev/msm092.
- 32. Do CB, Mahabhashyam MSP, Brudno M, Batzoglou S. ProbCons: probabilistic consistency-based multiple sequence alignment. Genome Research. 2005;15(2):330–340. doi: 10.1101/gr.2821705.
- 33. Wallace IM, Blackshields G, Higgins DG. Multiple sequence alignments. Current Opinion in Structural Biology. 2005;15(3):261–266. doi: 10.1016/j.sbi.2005.04.002.
- 34. Wallace IM, O'Sullivan O, Higgins DG, Notredame C. M-Coffee: combining multiple sequence alignment methods with T-Coffee. Nucleic Acids Research. 2006;34(6):1692–1699. doi: 10.1093/nar/gkl091.
- 35. Hall BG. Phylogenetic Trees Made Easy: A How-to Manual. Sunderland, Mass, USA: Sinauer Associates; 2008.
- 36. Larget B, Simon DL. Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees. Molecular Biology and Evolution. 1999;16(6):750–759.
- 37. Pagel M, Meade A. A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data. Systematic Biology. 2004;53(4):571–581. doi: 10.1080/10635150490468675.
- 38. Redelings BD, Suchard MA. Joint Bayesian estimation of alignment and phylogeny. Systematic Biology. 2005;54(3):401–418. doi: 10.1080/10635150590947041.
- 39. Drummond AJ, Rambaut A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evolutionary Biology. 2007;7, article 214:1–8. doi: 10.1186/1471-2148-7-214.
- 40. Shimodaira H, Hasegawa M. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics. 2001;17(12):1246–1247. doi: 10.1093/bioinformatics/17.12.1246.
- 41. Zwickl D. Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion. Austin, Tex, USA: Ph.D. thesis, University of Texas at Austin; 2006.
- 42. Kosakovsky Pond SL, Frost SDW, Muse SV. HyPhy: hypothesis testing using phylogenies. Bioinformatics. 2005;21(5):676–679. doi: 10.1093/bioinformatics/bti079.
- 43. Ronquist F, Huelsenbeck JP. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003;19(12):1572–1574. doi: 10.1093/bioinformatics/btg180.
- 44. Altekar G, Dwarkadas S, Huelsenbeck JP, Ronquist F. Parallel Metropolis coupled Markov chain Monte Carlo for Bayesian phylogenetic inference. Bioinformatics. 2004;20(3):407–415. doi: 10.1093/bioinformatics/btg427.
- 45. Thorne JL, Kishino H, Painter IS. Estimating the rate of evolution of the rate of molecular evolution. Molecular Biology and Evolution. 1998;15(12):1647–1657. doi: 10.1093/oxfordjournals.molbev.a025892.
- 46. Kishino H, Thorne JL, Bruno WJ. Performance of a divergence time estimation method under a probabilistic model of rate evolution. Molecular Biology and Evolution. 2001;18(3):352–361. doi: 10.1093/oxfordjournals.molbev.a003811.
- 47. Thorne JL, Kishino H. Divergence time and evolutionary rate estimation with multilocus data. Systematic Biology. 2002;51(5):689–702. doi: 10.1080/10635150290102456.
- 48. Wilson DJ, McVean G. Estimating diversifying selection and functional constraint in the presence of recombination. Genetics. 2006;172(3):1411–1425. doi: 10.1534/genetics.105.044917.
- 49. Yang Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Computer Applications in the Biosciences. 1997;13(5):555–556. doi: 10.1093/bioinformatics/13.5.555.
- 50. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution. 2007;24(8):1586–1591. doi: 10.1093/molbev/msm088.
- 51. Swofford DL. 10th edition. Sunderland, Mass, USA: Sinauer Associates; 2002. PAUP*: Phylogenetic Analysis Using Parsimony (and other Methods) 4.0 Beta.
- 52. Lartillot N, Philippe H. Computing Bayes factors using thermodynamic integration. Systematic Biology. 2006;55(2):195–207. doi: 10.1080/10635150500433722.
- 53. Guindon S, Gascuel O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic Biology. 2003;52(5):696–704. doi: 10.1080/10635150390235520.
- 54. Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006;22(21):2688–2690. doi: 10.1093/bioinformatics/btl446.
- 55. Sanderson MJ. r8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics. 2003;19(2):301–302. doi: 10.1093/bioinformatics/19.2.301.
- 56. Kjer KM. Use of rRNA secondary structure in phylogenetic studies to identify homologous positions: an example of alignment and data presentation from the frogs. Molecular Phylogenetics and Evolution. 1995;4(3):314–330. doi: 10.1006/mpev.1995.1028.
- 57. Notredame C, O'Brien EA, Higgins DG. RAGA: RNA sequence alignment by genetic algorithm. Nucleic Acids Research. 1997;25(22):4570–4580. doi: 10.1093/nar/25.22.4570.
- 58. Hickson RE, Simon C, Perrey SW. The performance of several multiple-sequence alignment programs in relation to secondary-structure features for an rRNA sequence. Molecular Biology and Evolution. 2000;17(4):530–539. doi: 10.1093/oxfordjournals.molbev.a026333.
- 59. Xia X. Phylogenetic relationship among horseshoe crab species: effect of substitution models on phylogenetic analyses. Systematic Biology. 2000;49(1):87–100. doi: 10.1080/10635150050207401.
- 60. Xia X, Xie Z, Kjer KM. 18S ribosomal RNA and tetrapod phylogeny. Systematic Biology. 2003;52(3):283–295. doi: 10.1080/10635150390196948.
- 61.Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research. 1994;22(22):4673–4680. doi: 10.1093/nar/22.22.4673. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Larkin MA, Blackshields G, Brown NP, et al. Clustal W and clustal X version 2.0. Bioinformatics. 2007;23(21):2947–2948. doi: 10.1093/bioinformatics/btm404. [DOI] [PubMed] [Google Scholar]
- 63.Golubchik T, Wise MJ, Easteal S, Jermiin LS. Mind the gaps: evidence of bias in estimates of multiple sequence alignments. Molecular Biology and Evolution. 2007;24(11):2433–2442. doi: 10.1093/molbev/msm176. [DOI] [PubMed] [Google Scholar]
- 64.Landan G, Graur D. Heads or tails: a simple reliability check for multiple sequence alignments. Molecular Biology and Evolution. 2007;24(6):1380–1383. doi: 10.1093/molbev/msm060. [DOI] [PubMed] [Google Scholar]
- 65.Schwartz AS, Myers EW, Pachter L. Alignment metric accuracy. http://arxiv.org/abs/q-bio.QM/0510052, 2005. [Google Scholar]
- 66.Zhu J, Liu JS, Lawrence CE. Bayesian adaptive sequence alignment algorithms. Bioinformatics. 1998;14(1):25–39. doi: 10.1093/bioinformatics/14.1.25. [DOI] [PubMed] [Google Scholar]
- 67.Holmes I, Bruno WJ. Evolutionary HMMs: a Bayesian approach to multiple alignment. Bioinformatics. 2001;17(9):803–820. doi: 10.1093/bioinformatics/17.9.803. [DOI] [PubMed] [Google Scholar]
- 68.Jensen JL, Hein J. Gibbs sampler for statistical multiple alignment. Statistica Sinica. 2005;15(4):889–907. [Google Scholar]
- 69. Clamp M, Cuff J, Searle SM, Barton GJ. The Jalview Java alignment editor. Bioinformatics. 2004;20(3):426–427. doi: 10.1093/bioinformatics/btg430.
- 70. Sankoff D, Cedergren R. Simultaneous comparison of three or more sequences related by a tree. In: Sankoff D, Kruskal JB, editors. Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison. Reading, Mass, USA: Addison-Wesley; 1983. pp. 253–264.
- 71. Hein J. A new method that simultaneously aligns and reconstructs ancestral sequences for any number of homologous sequences, when the phylogeny is given. Molecular Biology and Evolution. 1989;6(6):649–668. doi: 10.1093/oxfordjournals.molbev.a040577.
- 72. Lunter G, Miklós I, Drummond A, Jensen JL, Hein J. Bayesian coestimation of phylogeny and sequence alignment. BMC Bioinformatics. 2005;6, article 83:1–10. doi: 10.1186/1471-2105-6-83.
- 73. Wong KM, Suchard MA, Huelsenbeck JP. Alignment uncertainty and genomic analysis. Science. 2008;319(5862):473–476. doi: 10.1126/science.1151532.
- 74. Lunter G, Rocco A, Mimouni N, Heger A, Caldeira A, Hein J. Uncertainty in homology inferences: assessing and improving genomic sequence alignment. Genome Research. 2008;18(2):298–309. doi: 10.1101/gr.6725608.
- 75. Nei M, Kumar S. Molecular Evolution and Phylogenetics. New York, NY, USA: Oxford University Press; 2000.
- 76. Jukes TH, Cantor CR. Evolution of protein molecules. In: Munro HN, editor. Mammalian Protein Metabolism. New York, NY, USA: Academic Press; 1969. pp. 21–121.
- 77. Kimura M. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution. 1980;16(2):111–120. doi: 10.1007/BF01731581.
- 78. Hasegawa M, Kishino H, Yano T. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution. 1985;22(2):160–174. doi: 10.1007/BF02101694.
- 79. Tamura K, Nei M. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution. 1993;10(3):512–526. doi: 10.1093/oxfordjournals.molbev.a040023.
- 80. Tavaré S. Some probabilistic and statistical problems in the analysis of DNA sequences. In: Lectures on Mathematics in the Life Sciences. Vol. 17. Providence, RI, USA: American Mathematical Society; 1986. pp. 57–86.
- 81. Yang Z. Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Molecular Biology and Evolution. 1993;10(6):1396–1401. doi: 10.1093/oxfordjournals.molbev.a040082.
- 82. Yang Z. Estimating the pattern of nucleotide substitution. Journal of Molecular Evolution. 1994;39(1):105–111. doi: 10.1007/BF00178256.
- 83. Goldman N, Whelan S. A novel use of equilibrium frequencies in models of sequence evolution. Molecular Biology and Evolution. 2002;19(11):1821–1831. doi: 10.1093/oxfordjournals.molbev.a004007.
- 84. Liò P, Goldman N. Models of molecular evolution and phylogeny. Genome Research. 1998;8(12):1233–1244. doi: 10.1101/gr.8.12.1233.
- 85. Whelan S, Liò P, Goldman N. Molecular phylogenetics: state-of-the-art methods for looking into the past. Trends in Genetics. 2001;17(5):262–272. doi: 10.1016/s0168-9525(01)02272-7.
- 86. Burnham KP, Anderson DR. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. New York, NY, USA: Springer; 2002.
- 87. Bruno WJ, Halpern AL. Topological bias and inconsistency of maximum likelihood using wrong models. Molecular Biology and Evolution. 1999;16(4):564–566. doi: 10.1093/oxfordjournals.molbev.a026137.
- 88. Posada D, Crandall KA. Selecting the best-fit model of nucleotide substitution. Systematic Biology. 2001;50(4):580–601.
- 89. Cox DR. Further results on tests of separate families of hypotheses. Journal of the Royal Statistical Society. Series B. 1962;24(2):406–424.
- 90. Goldman N, Whelan S. Statistical tests of gamma-distributed rate heterogeneity in models of sequence evolution in phylogenetics. Molecular Biology and Evolution. 2000;17(6):975–978. doi: 10.1093/oxfordjournals.molbev.a026378.
- 91. Anisimova M, Gascuel O. Approximate likelihood-ratio test for branches: a fast, accurate, and powerful alternative. Systematic Biology. 2006;55(4):539–552. doi: 10.1080/10635150600755453.
- 92. Posada D, Crandall KA. MODELTEST: testing the model of DNA substitution. Bioinformatics. 1998;14(9):817–818. doi: 10.1093/bioinformatics/14.9.817.
- 93. Paradis E, Claude J, Strimmer K. APE: analyses of phylogenetics and evolution in R language. Bioinformatics. 2004;20(2):289–290. doi: 10.1093/bioinformatics/btg412.
- 94. Posada D. ModelTest server: a web-based tool for the statistical selection of models of nucleotide substitution online. Nucleic Acids Research. 2006;34, web server issue:W700–W703. doi: 10.1093/nar/gkl042.
- 95. Abascal F, Zardoya R, Posada D. ProtTest: selection of best-fit models of protein evolution. Bioinformatics. 2005;21(9):2104–2105. doi: 10.1093/bioinformatics/bti263.
- 96. Posada D, Buckley TR. Model selection and model averaging in phylogenetics: advantages of Akaike information criterion and Bayesian approaches over likelihood ratio tests. Systematic Biology. 2004;53(5):793–808. doi: 10.1080/10635150490522304.
- 97. Pol D. Empirical problems of the hierarchical likelihood ratio test for model selection. Systematic Biology. 2004;53(6):949–962. doi: 10.1080/10635150490888868.
- 98. Hurvich CM, Tsai C-L. Regression and time series model selection in small samples. Biometrika. 1989;76(2):297–307.
- 99. Schwarz G. Estimating the dimension of a model. Annals of Statistics. 1978;6(2):461–464.
- 100. Minin VN, Abdo Z, Joyce P, Sullivan J. Performance-based selection of likelihood models for phylogeny estimation. Systematic Biology. 2003;52(5):674–683. doi: 10.1080/10635150390235494.
- 101. Abdo Z, Minin VN, Joyce P, Sullivan J. Accounting for uncertainty in the tree topology has little effect on the decision-theoretic approach to model selection in phylogeny estimation. Molecular Biology and Evolution. 2005;22(3):691–703. doi: 10.1093/molbev/msi050.
- 102. Bao L, Gu H, Dunn KA, Bielawski JP. Methods for selecting fixed-effect models for heterogeneous codon evolution, with comments on their application to gene and genome data. BMC Evolutionary Biology. 2007;7, supplement 1:S5. doi: 10.1186/1471-2148-7-S1-S5.
- 103. Suchard MA, Weiss RE, Sinsheimer JS. Bayesian selection of continuous-time Markov chain evolutionary models. Molecular Biology and Evolution. 2001;18(6):1001–1013. doi: 10.1093/oxfordjournals.molbev.a003872.
- 104. Huelsenbeck JP, Larget B, Alfaro ME. Bayesian phylogenetic model selection using reversible jump Markov chain Monte Carlo. Molecular Biology and Evolution. 2004;21(6):1123–1133. doi: 10.1093/molbev/msh123.
- 105. Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution. 1987;4(4):406–425. doi: 10.1093/oxfordjournals.molbev.a040454.
- 106. Gascuel O, Steel M. Neighbor-joining revealed. Molecular Biology and Evolution. 2006;23(11):1997–2000. doi: 10.1093/molbev/msl072.
- 107. Bruno WJ, Socci ND, Halpern AL. Weighted neighbor-joining: a likelihood-based approach to distance-based phylogeny reconstruction. Molecular Biology and Evolution. 2000;17(1):189–197. doi: 10.1093/oxfordjournals.molbev.a026231.
- 108. Baldauf SL. Phylogeny for the faint of heart: a tutorial. Trends in Genetics. 2003;19(6):345–351. doi: 10.1016/S0168-9525(03)00112-4.
- 109. Cavalli-Sforza LL, Edwards AWF. Phylogenetic analysis. Models and estimation procedures. American Journal of Human Genetics. 1967;19(3, part 1):233–257.
- 110. Whelan S. New approaches to phylogenetic tree search and their application to large numbers of protein alignments. Systematic Biology. 2007;56(5):727–740. doi: 10.1080/10635150701611134.
- 111. Holder MT, Lewis PO, Swofford DL, Larget B. Hastings ratio of the LOCAL proposal used in Bayesian phylogenetics. Systematic Biology. 2005;54(6):961–965. doi: 10.1080/10635150500354670.
- 112. Felsenstein J. Confidence limits on phylogenies: an approach using the bootstrap. Evolution. 1985;39(4):783–791. doi: 10.1111/j.1558-5646.1985.tb00420.x.
- 113. Hillis DM, Bull JJ. An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Systematic Biology. 1993;42(2):182–192.
- 114. Felsenstein J, Kishino H. Is there something wrong with the bootstrap on phylogenies? A reply to Hillis and Bull. Systematic Biology. 1993;42(2):193–200.
- 115. Yang Z, Rannala B. Branch-length prior influences Bayesian posterior probability of phylogeny. Systematic Biology. 2005;54(3):455–470. doi: 10.1080/10635150590945313.
- 116. Berry V, Gascuel O. On the interpretation of bootstrap trees: appropriate threshold of clade selection and induced gain. Molecular Biology and Evolution. 1996;13(7):999–1011.
- 117. Efron B, Halloran E, Holmes S. Bootstrap confidence levels for phylogenetic trees. Proceedings of the National Academy of Sciences of the United States of America. 1996;93(14):7085–7090. doi: 10.1073/pnas.93.14.7085.
- 118. Mau B, Newton MA, Larget B. Bayesian phylogenetic inference via Markov chain Monte Carlo methods. Biometrics. 1999;55(1):1–12. doi: 10.1111/j.0006-341x.1999.00001.x.
- 119. Huelsenbeck JP, Ronquist F, Nielsen R, Bollback JP. Bayesian inference of phylogeny and its impact on evolutionary biology. Science. 2001;294(5550):2310–2314. doi: 10.1126/science.1065889.
- 120. Murphy WJ, Eizirik E, O'Brien SJ, et al. Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science. 2001;294(5550):2348–2351. doi: 10.1126/science.1067179.
- 121. Douady CJ, Delsuc F, Boucher Y, Doolittle WF, Douzery EJP. Comparison of Bayesian and maximum likelihood bootstrap measures of phylogenetic reliability. Molecular Biology and Evolution. 2003;20(2):248–254. doi: 10.1093/molbev/msg042.
- 122. Cummings MP, Handley SA, Myers DS, Reed DL, Rokas A, Winka K. Comparing bootstrap and posterior probability values in the four-taxon case. Systematic Biology. 2003;52(4):477–487. doi: 10.1080/10635150390218213.
- 123. Erixon P, Svennblad B, Britton T, Oxelman B. Reliability of Bayesian posterior probabilities and bootstrap frequencies in phylogenetics. Systematic Biology. 2003;52(5):665–673. doi: 10.1080/10635150390235485.
- 124. Svennblad B, Erixon P, Oxelman B, Britton T. Fundamental differences between the methods of maximum likelihood and maximum posterior probability in phylogenetics. Systematic Biology. 2006;55(1):116–121. doi: 10.1080/10635150500481648.
- 125. Huelsenbeck JP, Rannala B. Frequentist properties of Bayesian posterior probabilities of phylogenetic trees under simple and complex substitution models. Systematic Biology. 2004;53(6):904–913. doi: 10.1080/10635150490522629.
- 126. Lewis PO, Holder MT, Holsinger KE. Polytomies and Bayesian phylogenetic inference. Systematic Biology. 2005;54(2):241–253. doi: 10.1080/10635150590924208.
- 127. Kolaczkowski B, Thornton JW. Effects of branch length uncertainty on Bayesian posterior probabilities for phylogenetic hypotheses. Molecular Biology and Evolution. 2007;24(9):2108–2118. doi: 10.1093/molbev/msm141.
- 128. Steel M, Matsen FA. The Bayesian “star paradox” persists for long finite sequences. Molecular Biology and Evolution. 2007;24(4):1075–1079. doi: 10.1093/molbev/msm028.
- 129. Yang Z. Fair-balance paradox, star-tree paradox, and Bayesian phylogenetics. Molecular Biology and Evolution. 2007;24(8):1639–1655. doi: 10.1093/molbev/msm081.
- 130. Kolaczkowski B, Thornton JW. Is there a star tree paradox? Molecular Biology and Evolution. 2006;23(10):1819–1823. doi: 10.1093/molbev/msl059.
- 131. Mossel E, Vigoda E. Phylogenetic MCMC algorithms are misleading on mixtures of trees. Science. 2005;309(5744):2207–2209. doi: 10.1126/science.1115493.
- 132. Ronquist F, Larget B, Huelsenbeck JP, Kadane JB, Simon D, van der Mark P. Comment on “Phylogenetic MCMC algorithms are misleading on mixtures of trees”. Science. 2006;312(5772):367. doi: 10.1126/science.1123622.
- 133. Wheeler WC, Pickett KM. Topology-Bayes versus clade-Bayes in phylogenetic analysis. Molecular Biology and Evolution. 2008;25(2):447–453. doi: 10.1093/molbev/msm274.
- 134. Chor B, Tuller T. Maximum likelihood of evolutionary trees: hardness and approximation. Bioinformatics. 2005;21, supplement 1:i97–i106. doi: 10.1093/bioinformatics/bti1027.
- 135. Donoghue MJ. Progress and prospects in reconstructing plant phylogeny. Annals of the Missouri Botanical Garden. 1994;81(3):405–418.
- 136. Aris-Brosou S. Least and most powerful phylogenetic tests to elucidate the origin of the seed plants in the presence of conflicting signals under misspecified models. Systematic Biology. 2003;52(6):781–793.
- 137. Kishino H, Hasegawa M. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. Journal of Molecular Evolution. 1989;29(2):170–179. doi: 10.1007/BF02100115.
- 138. Goldman N, Anderson JP, Rodrigo AG. Likelihood-based tests of topologies in phylogenetics. Systematic Biology. 2000;49(4):652–670. doi: 10.1080/106351500750049752.
- 139. Shimodaira H, Hasegawa M. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Molecular Biology and Evolution. 1999;16(8):1114–1116.
- 140. Shimodaira H. An approximately unbiased test of phylogenetic tree selection. Systematic Biology. 2002;51(3):492–508. doi: 10.1080/10635150290069913.
- 141. Aris-Brosou S. How Bayes tests of molecular phylogenies compare with frequentist approaches. Bioinformatics. 2003;19(5):618–624. doi: 10.1093/bioinformatics/btg065.
- 142. Raftery AE. Hypothesis testing and model selection. In: Gilks W, Richardson S, Spiegelhalter DJ, editors. Markov Chain Monte Carlo in Practice. Boca Raton, Fla, USA: Chapman & Hall; 1996. pp. 163–187.
- 143. Nylander JAA, Ronquist F, Huelsenbeck JP, Nieves-Aldrey JL. Bayesian phylogenetic analysis of combined data. Systematic Biology. 2004;53(1):47–67. doi: 10.1080/10635150490264699.
- 144. Choi SC, Hobolth A, Robinson DM, Kishino H, Thorne JL. Quantifying the impact of protein tertiary structure on molecular evolution. Molecular Biology and Evolution. 2007;24(8):1769–1782. doi: 10.1093/molbev/msm097.
- 145. Chib S, Jeliazkov I. Marginal likelihood from the Metropolis-Hastings output. Journal of the American Statistical Association. 2001;96(453):270–281.
- 146. Goldman N. Statistical tests of models of DNA substitution. Journal of Molecular Evolution. 1993;36(2):182–198. doi: 10.1007/BF00166252.
- 147. Yang Z. Computational Molecular Evolution. Oxford, UK: Oxford University Press; 2006.
- 148. Felsenstein J. Cases in which parsimony or compatibility methods will be positively misleading. Systematic Zoology. 1978;27(4):401–410.
- 149. Yang Z. Maximum-likelihood models for combined analyses of multiple sequence data. Journal of Molecular Evolution. 1996;42(5):587–596. doi: 10.1007/BF02352289.
- 150. Huelsenbeck JP, Suchard MA. A nonparametric method for accommodating and testing across-site rate variation. Systematic Biology. 2007;56(6):975–987. doi: 10.1080/10635150701670569.
- 151. Lartillot N, Philippe H. A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Molecular Biology and Evolution. 2004;21(6):1095–1109. doi: 10.1093/molbev/msh112.
- 152. Lopez P, Casane D, Philippe H. Heterotachy, an important process of protein evolution. Molecular Biology and Evolution. 2002;19(1):1–7. doi: 10.1093/oxfordjournals.molbev.a003973.
- 153. Yang Z, Roberts D. On the use of nucleic acid sequences to infer early branchings in the tree of life. Molecular Biology and Evolution. 1995;12(3):451–458. doi: 10.1093/oxfordjournals.molbev.a040220.
- 154. Fitch WM, Markowitz E. An improved method for determining codon variability in a gene and its application to the rate of fixation of mutations in evolution. Biochemical Genetics. 1970;4(5):579–593. doi: 10.1007/BF00486096.
- 155. Tuffley C, Steel M. Modeling the covarion hypothesis of nucleotide substitution. Mathematical Biosciences. 1998;147(1):63–91. doi: 10.1016/s0025-5564(97)00081-3.
- 156. Huelsenbeck JP. Testing a covariotide model of DNA substitution. Molecular Biology and Evolution. 2002;19(5):698–707. doi: 10.1093/oxfordjournals.molbev.a004128.
- 157. Kolaczkowski B, Thornton JW. Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous. Nature. 2004;431(7011):980–984. doi: 10.1038/nature02917.
- 158. Spencer M, Susko E, Roger AJ. Likelihood, parsimony, and heterogeneous evolution. Molecular Biology and Evolution. 2005;22(5):1161–1164. doi: 10.1093/molbev/msi123.
- 159. Lartillot N, Brinkmann H, Philippe H. Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evolutionary Biology. 2007;7, supplement 1:S4. doi: 10.1186/1471-2148-7-S1-S4.
- 160. Jiménez-Guri E, Philippe H, Okamura B, Holland PWH. Buddenbrockia is a cnidarian worm. Science. 2007;317(5834):116–118. doi: 10.1126/science.1142024.
- 161. Philippe H, Zhou Y, Brinkmann H, Rodrigue N, Delsuc F. Heterotachy and long-branch attraction in phylogenetics. BMC Evolutionary Biology. 2005;5, article 50:1–8. doi: 10.1186/1471-2148-5-50.
- 162. Schöniger M, von Haeseler A. A stochastic model for the evolution of autocorrelated DNA sequences. Molecular Phylogenetics and Evolution. 1994;3(3):240–247. doi: 10.1006/mpev.1994.1026.
- 163. Muse SV, Gaut BS. A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Molecular Biology and Evolution. 1994;11(5):715–724. doi: 10.1093/oxfordjournals.molbev.a040152.
- 164. Goldman N, Yang Z. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Molecular Biology and Evolution. 1994;11(5):725–736. doi: 10.1093/oxfordjournals.molbev.a040153.
- 165. Siepel A, Haussler D. Phylogenetic estimation of context-dependent substitution rates by maximum likelihood. Molecular Biology and Evolution. 2004;21(3):468–488. doi: 10.1093/molbev/msh039.
- 166. Hwang DG, Green P. Bayesian Markov chain Monte Carlo sequence analysis reveals varying neutral substitution patterns in mammalian evolution. Proceedings of the National Academy of Sciences of the United States of America. 2004;101(39):13994–14001. doi: 10.1073/pnas.0404142101.
- 167. Christensen OF, Hobolth A, Jensen JL. Pseudo-likelihood analysis of codon substitution models with neighbor-dependent rates. Journal of Computational Biology. 2005;12(9):1166–1182. doi: 10.1089/cmb.2005.12.1166.
- 168. Robinson DM, Jones DT, Kishino H, Goldman N, Thorne JL. Protein evolution with dependence among codons due to tertiary structure. Molecular Biology and Evolution. 2003;20(10):1692–1704. doi: 10.1093/molbev/msg184.
- 169. Rodrigue N, Lartillot N, Bryant D, Philippe H. Site interdependence attributed to tertiary structure in amino acid sequence evolution. Gene. 2005;347(2):207–217. doi: 10.1016/j.gene.2004.12.011.
- 170. Rodrigue N, Philippe H, Lartillot N. Assessing site-interdependent phylogenetic models of sequence evolution. Molecular Biology and Evolution. 2006;23(9):1762–1775. doi: 10.1093/molbev/msl041.
- 171. Kleinman CL, Rodrigue N, Bonnard C, Philippe H, Lartillot N. A maximum likelihood framework for protein design. BMC Bioinformatics. 2006;7, article 326:1–17. doi: 10.1186/1471-2105-7-326.
- 172. Sato A, Tichy H, O'hUigin C, Grant PR, Grant BR, Klein J. On the origin of Darwin's finches. Molecular Biology and Evolution. 2001;18(3):299–311. doi: 10.1093/oxfordjournals.molbev.a003806.
- 173. Salzburger W, Mack T, Verheyen E, Meyer A. Out of Tanganyika: genesis, explosive speciation, key-innovations and phylogeography of the haplochromine cichlid fishes. BMC Evolutionary Biology. 2005;5, article 17:1–15. doi: 10.1186/1471-2148-5-17.
- 174. Hughes AL. Looking for Darwin in all the wrong places: the misguided quest for positive selection at the nucleotide sequence level. Heredity. 2007;99(4):364–373. doi: 10.1038/sj.hdy.6801031.
- 175. Kimura M. The Neutral Theory of Molecular Evolution. New York, NY, USA: Cambridge University Press; 1983.
- 176. Lynch M. The Origins of Genome Architecture. Sunderland, Mass, USA: Sinauer Associates; 2007.
- 177. Nielsen R. Statistical tests of selective neutrality in the age of genomics. Heredity. 2001;86(6):641–647. doi: 10.1046/j.1365-2540.2001.00895.x.
- 178. Aris-Brosou S, Excoffier L. The impact of population expansion and mutation rate heterogeneity on DNA sequence polymorphism. Molecular Biology and Evolution. 1996;13(3):494–504. doi: 10.1093/oxfordjournals.molbev.a025610.
- 179. Bustamante CD, Wakeley J, Sawyer S, Hartl DL. Directional selection and the site-frequency spectrum. Genetics. 2001;159(4):1779–1788. doi: 10.1093/genetics/159.4.1779.
- 180. Zhu L, Bustamante CD. A composite-likelihood approach for detecting directional selection from DNA sequence data. Genetics. 2005;170(3):1411–1421. doi: 10.1534/genetics.104.035097.
- 181. Bamshad M, Wooding SP. Signatures of natural selection in the human genome. Nature Reviews Genetics. 2003;4(2):99–111. doi: 10.1038/nrg999.
- 182. Anisimova M, Liberles DA. The quest for natural selection in the age of comparative genomics. Heredity. 2007;99(6):567–579. doi: 10.1038/sj.hdy.6801052.
- 183. Nei M, Gojobori T. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Molecular Biology and Evolution. 1986;3(5):418–426. doi: 10.1093/oxfordjournals.molbev.a040410.
- 184. Wong WSW, Nielsen R. Detecting selection in noncoding regions of nucleotide sequences. Genetics. 2004;167(2):949–958. doi: 10.1534/genetics.102.010959.
- 185. McCauley S, de Groot S, Mailund T, Hein J. Annotation of selection strengths in viral genomes. Bioinformatics. 2007;23(22):2978–2986. doi: 10.1093/bioinformatics/btm472.
- 186. Yang Z. Adaptive molecular evolution. In: Balding DJ, Bishop M, Cannings C, editors. Handbook of Statistical Genetics. 2nd edition. New York, NY, USA: John Wiley & Sons; 2003. pp. 229–254.
- 187. Yang Z, Nielsen R. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Molecular Biology and Evolution. 2000;17(1):32–43. doi: 10.1093/oxfordjournals.molbev.a026236.
- 188. Wong WSW, Yang Z, Goldman N, Nielsen R. Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites. Genetics. 2004;168(2):1041–1051. doi: 10.1534/genetics.104.031153.
- 189. Zhang J, Nielsen R, Yang Z. Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Molecular Biology and Evolution. 2005;22(12):2472–2479. doi: 10.1093/molbev/msi237.
- 190. Zhang J, Kumar S, Nei M. Small-sample tests of episodic adaptive evolution: a case study of primate lysozymes. Molecular Biology and Evolution. 1997;14(12):1335–1338. doi: 10.1093/oxfordjournals.molbev.a025743.
- 191. Yang Z. Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Molecular Biology and Evolution. 1998;15(5):568–573. doi: 10.1093/oxfordjournals.molbev.a025957.
- 192. Nielsen R, Yang Z. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics. 1998;148(3):929–936. doi: 10.1093/genetics/148.3.929.
- 193. Suzuki Y, Gojobori T. A method for detecting positive selection at single amino acid sites. Molecular Biology and Evolution. 1999;16(10):1315–1328. doi: 10.1093/oxfordjournals.molbev.a026042.
- 194. Yang Z, Nielsen R, Goldman N, Pedersen A-MK. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics. 2000;155(1):431–449. doi: 10.1093/genetics/155.1.431.
- 195. Massingham T, Goldman N. Detecting amino acid sites under positive selection and purifying selection. Genetics. 2005;169(3):1753–1762. doi: 10.1534/genetics.104.032144.
- 196. Kosakovsky Pond SL, Frost SDW. Not so different after all: a comparison of methods for detecting amino acid sites under selection. Molecular Biology and Evolution. 2005;22(5):1208–1222. doi: 10.1093/molbev/msi105.
- 197. Yang Z, Nielsen R. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Molecular Biology and Evolution. 2002;19(6):908–917. doi: 10.1093/oxfordjournals.molbev.a004148.
- 198. Anisimova M, Yang Z. Molecular evolution of the hepatitis delta virus antigen gene: recombination or positive selection? Journal of Molecular Evolution. 2004;59(6):815–826. doi: 10.1007/s00239-004-0112-x.
- 199. Aris-Brosou S. Determinants of adaptive evolution at the molecular level: the extended complexity hypothesis. Molecular Biology and Evolution. 2005;22(2):200–209. doi: 10.1093/molbev/msi006.
- 200. Anisimova M, Bielawski JP, Yang Z. Accuracy and power of Bayes prediction of amino acid sites under positive selection. Molecular Biology and Evolution. 2002;19(6):950–958. doi: 10.1093/oxfordjournals.molbev.a004152.
- 201. Yang Z, Wong WSW, Nielsen R. Bayes empirical Bayes inference of amino acid sites under positive selection. Molecular Biology and Evolution. 2005;22(4):1107–1118. doi: 10.1093/molbev/msi097.
- 202. Huelsenbeck JP, Dyer KA. Bayesian estimation of positively selected sites. Journal of Molecular Evolution. 2004;58(6):661–672. doi: 10.1007/s00239-004-2588-9.
- 203. Aris-Brosou S. Identifying sites under positive selection with uncertain parameter estimates. Genome. 2006;49(7):767–776. doi: 10.1139/g06-038.
- 204.Anisimova M, Nielsen R, Yang Z. Effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites. Genetics. 2003;164(3):1229–1236. doi: 10.1093/genetics/164.3.1229. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 205.Anisimova M, Bielawski J, Dunn K, Yang Z. Phylogenomic analysis of natural selection pressure in Streptococcus genomes. BMC Evolutionary Biology. 2007;7, article 154:1–13. doi: 10.1186/1471-2148-7-154. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 206.Zuckerkandl E, Pauling L. Molecules as documents of evolutionary history. Journal of Theoretical Biology. 1965;8(2):357–366. doi: 10.1016/0022-5193(65)90083-4. [DOI] [PubMed] [Google Scholar]
- 207.Zuckerkandl E, Pauling L. Evolutionary divergence and convergence in proteins. In: Bryson V, Vogel HJ, editors. Evolving Genes and Proteins. New York, NY, USA: Academic Press; 1965. [Google Scholar]
- 208.Bromham L, Penny D. The modern molecular clock. Nature Reviews Genetics. 2003;4(3):216–224. doi: 10.1038/nrg1020. [DOI] [PubMed] [Google Scholar]
- 209.Aris-Brosou S. Dating phylogenies with hybrid local molecular clocks. PLoS ONE. 2007;2(9):e879. doi: 10.1371/journal.pone.0000879. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 210.Kishino H, Hasegawa M. Converting distance to time: application to human evolution. Methods in Enzymology. 1990;183:550–570. doi: 10.1016/0076-6879(90)83036-9. [DOI] [PubMed] [Google Scholar]
- 211.Rambaut A, Bromham L. Estimating divergence dates from molecular sequences. Molecular Biology and Evolution. 1998;15(4):442–448. doi: 10.1093/oxfordjournals.molbev.a025940. [DOI] [PubMed] [Google Scholar]
- 212.Yoder AD, Yang Z. Estimation of primate speciation dates using local molecular clocks. Molecular Biology and Evolution. 2000;17(7):1081–1090. doi: 10.1093/oxfordjournals.molbev.a026389. [DOI] [PubMed] [Google Scholar]
- 213. Yang Z, Yoder AD. Comparison of likelihood and Bayesian methods for estimating divergence times using multiple gene loci and calibration points, with application to a radiation of cute-looking mouse lemur species. Systematic Biology. 2003;52(5):705–716. doi: 10.1080/10635150390235557.
- 214. Yang Z. A heuristic rate smoothing procedure for maximum likelihood estimation of species divergence times. Acta Zoologica Sinica. 2004;50:645–656.
- 215. Sanderson MJ. Estimating absolute rates of molecular evolution and divergence times: a penalized likelihood approach. Molecular Biology and Evolution. 2002;19(1):101–109. doi: 10.1093/oxfordjournals.molbev.a003974.
- 216. Smith AB, Pisani D, Mackenzie-Dodds JA, Stockley B, Webster BL, Littlewood DTJ. Testing the molecular clock: molecular and paleontological estimates of divergence times in the Echinoidea (Echinodermata). Molecular Biology and Evolution. 2006;23(10):1832–1851. doi: 10.1093/molbev/msl039.
- 217. Sanderson MJ. A nonparametric approach to estimating divergence times in the absence of rate constancy. Molecular Biology and Evolution. 1997;14(12):1218–1231.
- 218. Aris-Brosou S, Yang Z. Effects of models of rate evolution on estimation of divergence dates with special reference to the metazoan 18S ribosomal RNA phylogeny. Systematic Biology. 2002;51(5):703–714. doi: 10.1080/10635150290102375.
- 219. Aris-Brosou S, Yang Z. Bayesian models of episodic evolution support a late Precambrian explosive diversification of the Metazoa. Molecular Biology and Evolution. 2003;20(12):1947–1954. doi: 10.1093/molbev/msg226.
- 220. Ho SY, Phillips MJ, Drummond AJ, Cooper A. Accuracy of rate estimation using relaxed-clock models with a critical focus on the early metazoan radiation. Molecular Biology and Evolution. 2005;22(5):1355–1363. doi: 10.1093/molbev/msi125.
- 221. Welch JJ, Fontanillas E, Bromham L. Molecular dates for the “Cambrian explosion”: the influence of prior assumptions. Systematic Biology. 2005;54(4):672–678. doi: 10.1080/10635150590947212.
- 222. Aitkin M. Posterior Bayes factors. Journal of the Royal Statistical Society B. 1991;53(1):111–142.
- 223. Drummond AJ, Ho SY, Phillips MJ, Rambaut A. Relaxed phylogenetics and dating with confidence. PLoS Biology. 2006;4(5):e88. doi: 10.1371/journal.pbio.0040088.
- 224. Huelsenbeck JP, Bollback JP, Levine AM. Inferring the root of a phylogenetic tree. Systematic Biology. 2002;51(1):32–43. doi: 10.1080/106351502753475862.
- 225. Shendure J, Mitra RD, Varma C, Church GM. Advanced sequencing technologies: methods and goals. Nature Reviews Genetics. 2004;5(5):335–344. doi: 10.1038/nrg1325.
- 226. Moore MJ, Dhingra A, Soltis PS, et al. Rapid and accurate pyrosequencing of angiosperm plastid genomes. BMC Plant Biology. 2006;6, article 17:1–13. doi: 10.1186/1471-2229-6-17.
- 227. Green P. 2x genomes--Does depth matter? Genome Research. 2007;17(11):1547–1549. doi: 10.1101/gr.7050807.
- 228. Rokas A, Williams BL, King N, Carroll SB. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature. 2003;425(6960):798–804. doi: 10.1038/nature02053.
- 229. Clark AG, Eisen MB, Smith DR, et al. Evolution of genes and genomes on the Drosophila phylogeny. Nature. 2007;450(7167):203–218. doi: 10.1038/nature06341.
- 230. Delsuc F, Brinkmann H, Philippe H. Phylogenomics and the reconstruction of the tree of life. Nature Reviews Genetics. 2005;6(5):361–375. doi: 10.1038/nrg1603.
- 231. Ge F, Wang LS, Kim J. The cobweb of life revealed by genome-scale estimates of horizontal gene transfer. PLoS Biology. 2005;3(10):e316. doi: 10.1371/journal.pbio.0030316.
- 232. Page RDM. Extracting species trees from complex gene trees: reconciled trees and vertebrate phylogeny. Molecular Phylogenetics and Evolution. 2000;14(1):89–106. doi: 10.1006/mpev.1999.0676.
- 233. Phillips MJ, Delsuc F, Penny D. Genome-scale phylogeny and the detection of systematic biases. Molecular Biology and Evolution. 2004;21(7):1455–1458. doi: 10.1093/molbev/msh137.
- 234. Nishihara H, Okada N, Hasegawa M. Rooting the eutherian tree: the power and pitfalls of phylogenomics. Genome Biology. 2007;8(9):R199. doi: 10.1186/gb-2007-8-9-r199.
- 235. Rodríguez-Ezpeleta N, Brinkmann H, Roure B, Lartillot N, Lang BF, Philippe H. Detecting and overcoming systematic errors in genome-scale phylogenies. Systematic Biology. 2007;56(3):389–399. doi: 10.1080/10635150701397643.
- 236. Hedges SB, Dudley J, Kumar S. TimeTree: a public knowledge-base of divergence times among organisms. Bioinformatics. 2006;22(23):2971–2972. doi: 10.1093/bioinformatics/btl505.
- 237. Janečka JE, Miller W, Pringle TH, et al. Molecular and genomic data identify the closest living relative of primates. Science. 2007;318(5851):792–794. doi: 10.1126/science.1147555.
- 238. Kumar S, Dudley J. Bioinformatics software for biologists in the genomics era. Bioinformatics. 2007;23(14):1713–1717. doi: 10.1093/bioinformatics/btm239.