Abstract
Hadzipasic et al. used ancestral sequence reconstruction to identify historical sequence substitutions that putatively caused Aurora kinases to evolve allosteric regulation. We show that their results arise from using an implausible phylogeny and sparse sequence sampling. Addressing either problem reverses their inferences: allostery and the amino acids that confer it were not gained during the diversification of eukaryotes but were lost in a subgroup of Fungi.
How allosteric regulation of proteins arose during evolution is a critical question in evolutionary biochemistry. Using ancestral sequence reconstruction (ASR) and biochemical experiments, Hadzipasic et al. (1) claim to have identified historical sequence substitutions that caused the acquisition of allostery from a non-allosteric ancestor during the evolution of Aurora kinase (AURK), a eukaryotic cell cycle regulator that is allosterically activated in animals by the TPX2 protein. Inferred ancestral sequences are conditional upon the set of extant sequences used and the phylogeny that describes their relationships, but Hadzipasic et al.’s sequence sampling was extremely sparse, and the phylogeny they used is implausible. We therefore investigated these shortcomings and their effects on the reconstruction of AURK evolution.
The phylogeny inferred by Hadzipasic et al. (Fig. 1A) is highly incongruent with the established phylogeny of eukaryotes (Fig. 1B). The Hadzipasic et al. phylogeny groups animals and plants together to the exclusion of fungi, but the monophyly of Opisthokonta (fungi, animals, and their unicellular relatives) has been extensively corroborated (2, 3). Hadzipasic et al. also place microsporidians as the most basally branching eukaryotic lineage, despite strong evidence for their inclusion within Fungi (4). These incongruences are crucial to the claims of Hadzipasic et al., because the nodes on their phylogeny between which allostery is claimed to have evolved represent ancestral species that in fact never existed: AurANC2, the non-allosteric precursor, would be AURK in the last common ancestor of all eukaryotes except microsporidians, and AurANC3, the first allosteric protein, would be AURK in the last common ancestor of animals and plants to the exclusion of fungi.
Figure 1. A plausible phylogeny reverses Hadzipasic et al.’s ancestral reconstructions.

(A) The phylogeny of AURKs and their nearest outgroup (PLKs) used by Hadzipasic et al. Parentheses indicate the number of sequences in each clade. Circles mark the experimentally characterized ancestors, colored by presence/absence of the allosteric response to TPX2. Labels show inferred posterior probability of each clade. (B) The established phylogeny of the taxa in panel A. (C) Minimum number of gene duplications and losses required to reconcile the panel A phylogeny with panel B. (D) Minimum number of gene transfer and replacement events required to reconcile the panel A phylogeny with panel B. Other scenarios with an equal or greater number of events are also possible. (E) AURK phylogeny when sequences in Hadzipasic et al. were reanalyzed given the constraint in B. Ancestral sequences reconstructed in panel G are labeled. (F) Ancestral reconstruction on the phylogeny of Hadzipasic et al. (panel A). Inferred ancestral states are displayed for a group of 15 sites that experimentally confer allostery when the states from AurANC3 (green) replace those in AurANC2 (orange). Gray, other amino acid state. Row labels correspond to nodes in panel A. Site numbers based on human AURKA. (G) Maximum a posteriori ancestral states reconstructed on the constrained AURK phylogeny in panel E. Sites, states, and colors are as in panel F. Shading shows the posterior probability of each state.
Hadzipasic et al. suggest that their AURK gene tree might be incongruent with the accepted species phylogeny because of gene duplication and loss, but this scenario is implausible: it requires an elaborate history of three gene duplications before the most recent common ancestor (MRCA) of eukaryotes, followed by 14 gene losses distributed so precisely that only a single resulting paralog has been retained in every eukaryote that has been sequenced (Fig. 1C). Hadzipasic et al. also suggest horizontal gene transfer (HGT) as a possible cause, but this would require a complex scenario in which every single AURK sequence on the phylogeny except for one descends from an HGT event, with every transfer replacing the recipient’s original copy and leaving no trace of the event in any extant genome (Fig. 1D); this scenario is especially implausible because HGT between multicellular eukaryotes is rare (5).
A more likely cause of the incongruence of Hadzipasic et al.’s tree with the species phylogeny is long branch attraction (LBA) (6). The branches leading to microsporidians and ascomycetes may have been moved from their established position near animals towards the root, where they attach to an extremely long branch leading to the nearest outgroup (PLKs). Microsporidians have previously been found to be subject to systematic LBA that moves them to an artifactual position as basal eukaryotes, especially when sampling in the Fungi is sparse (7). Strong support for misplaced branches is consistent with systematic bias caused by LBA (7).
We therefore repeated ASR using Hadzipasic et al.’s sequence set of AURKs and PLKs, but we constrained the phylogeny to follow established species relationships (Fig. 1B, E). We focused on the 15 sequence states from AurANC3 that experimentally confer allostery when introduced into the non-allosteric AurANC2 (Fig. 1F). We found that the direction of these substitutions is almost completely reversed compared to the trajectory proposed by Hadzipasic et al. (Fig. 1E–G). The deepest AURK ancestor (AncEukarya) now contains 14 of the 15 states associated with allostery and only one of the non-allosteric states; the other 14 non-allosteric states were all gained within the Fungi. Repairing the major topological errors in Hadzipasic et al.’s phylogeny is therefore sufficient to remove the evidence for their paper’s central claims.
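The dependence of an inferred ancestral state on tree topology can be illustrated with a deliberately simplified sketch. The code below is not the likelihood-based ASR used in the analyses described here; it swaps in Fitch parsimony on a single hypothetical site with four made-up tips, purely to show the logic. The tip states, tree shapes, and all names (`TIP_STATES`, `FUNGI_BASAL`, `FUNGI_WITH_ANIMALS`, `fitch_root_states`) are assumptions introduced for illustration, and the outgroup is omitted.

```python
# Toy single-site illustration (hypothetical data, not the real alignment).
# "A" = state associated with allostery, "N" = non-allosteric state.
TIP_STATES = {
    "animal": "A",
    "plant": "A",
    "ascomycete": "N",
    "microsporidian": "N",
}

# Trees are nested 2-tuples; the outermost tuple is the deepest node.
# Fungal lineages branch off one by one near the root (in the spirit of Fig. 1A).
FUNGI_BASAL = ("microsporidian", ("ascomycete", ("animal", "plant")))
# Fungi form one clade that is sister to animals (in the spirit of Fig. 1B).
FUNGI_WITH_ANIMALS = ("plant", ("animal", ("ascomycete", "microsporidian")))


def fitch_root_states(tree, tip_states):
    """Return the Fitch downpass state set at the deepest node of `tree`."""
    if isinstance(tree, str):
        # A tip contributes its observed state.
        return {tip_states[tree]}
    left, right = (fitch_root_states(child, tip_states) for child in tree)
    shared = left & right
    # Fitch rule: intersection if non-empty, otherwise union.
    return shared if shared else left | right


print(fitch_root_states(FUNGI_BASAL, TIP_STATES))         # {'N'}
print(fitch_root_states(FUNGI_WITH_ANIMALS, TIP_STATES))  # {'A'}
```

With identical tip data, the deepest node is reconstructed as non-allosteric when the fungal lineages are placed at the base of the tree, and as allosteric when they are grouped with animals. A real analysis would use a probabilistic model, branch lengths, and an outgroup, so this shows only the direction of the effect.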
We next studied the effect of improved sequence sampling. AURKs are present across eukaryotes, but the sequence set analyzed by Hadzipasic et al. included only 19 AURKs; all but three of these were from animals and fungi, which account for only a small fraction of eukaryotic diversity. Within fungi, only ascomycetes and a single microsporidian were represented, and only a single species each of plant and amoeba were included. We therefore acquired and aligned 324 AURK and 315 PLK protein sequences, broadly sampled from five major eukaryotic taxa (Fig. 2A): Fungi, Holozoa (animals and unicellular relatives), Archaeplastida (plants and green and red algae), Amoebozoa (amoebae), and SAR (stramenopiles, alveolates, and rhizarians). Within Fungi, we included 137 AURKs from numerous taxonomic groups to better resolve the phylogenetic position of Fungi and the amino acid states within it.
Figure 2. Improved sequence sampling reverses Hadzipasic et al.’s ancestral reconstructions.

(A) The established phylogeny of the major eukaryotic groups. Polytomy, branching order not established. (B) Maximum likelihood phylogeny when AURK and PLK are densely sampled. Numbers in parentheses indicate the number of sequences in each group. Node labels, reconstructed ancestral sequences. Fungi and Holozoa are pink and cyan, respectively. Branch label with arrow, approximate likelihood ratio statistic for Fungi+Holozoa (p < 0.01 by χ² approximation, ref. 13). (C) ML phylogeny given the constraint in panel A. (D) ML phylogeny given the constraint in A, except Fungi are constrained to split first. (E) Most states that confer allostery in Hadzipasic et al. are not conserved in extant AURKs that are allosterically regulated by TPX2. Green and orange, allosteric and non-allosteric states from Hadzipasic et al.; gray, other states. Site numbers are in panel H. (F, G, H) Reconstructed sequences on the phylogenies in panels B, C, and D, respectively. The maximum a posteriori states at the 15 sites that experimentally confer allostery/nonallostery are shown, colored as in panel E and shaded by their posterior probability. Row labels correspond to ancestral nodes in panels B–D.
We used this alignment to reconstruct ancestral sequences on three phylogenies: 1) the unconstrained maximum likelihood (ML) phylogeny, which recovers almost all the established species relationships – including the sister relationship of Fungi and Holozoa – except that Amoebozoa and some SAR sequences are pulled towards the root (Fig. 2B); 2) the “maximum congruence” (MC) phylogeny, which is constrained to reflect established species relationships among the major groups (Fig. 2A, C); and 3) a “Fungi-out” phylogeny, which has the same constraints, but with Fungi as the first-branching eukaryotic lineage (Fig. 2D). The likelihood difference between the ML and MC trees is not significant (p = 0.36, Shimodaira-Hasegawa test (8)), and the MC tree requires no auxiliary events like gene duplications/losses or horizontal transfers, so we consider the MC phylogeny to be the best supported. The Fungi-out phylogeny is implausible, but it allows us to isolate the effect of improving sampling on ASR by imposing the critical features of the Hadzipasic et al. phylogeny.
On all three trees, AncEukarya again has predominantly allosteric states and only one or two of the 15 non-allosteric states that Hadzipasic et al. inferred as ancestral (Fig. 2F–H). All other non-allosteric residues are again derived within Fungi. This result arises because the allosteric states are found not only in animals and plants but also in non-ascomycete fungi and other eukaryotic groups, which Hadzipasic et al. did not include. Improved sequence sampling alone, even on the Fungi-out phylogeny, is therefore sufficient to reverse the direction of evolution of the experimentally important substitutions compared with that inferred by Hadzipasic et al.
On the most plausible MC phylogeny, AncEukarya contains the allosteric state at 11 of 15 sites. The four missing states are not universally required for allostery, because they are absent in one or more extant allosteric AURKs (Fig. 2E) (9–11). The best-supported hypothesis is therefore that AURK of AncEukarya was allosteric, and this feature was lost along the lineage leading to ascomycetes; experiments will be necessary for a direct test. This scenario is consistent with the taxonomic distribution of AURK’s allosteric effector TPX2. Hadzipasic et al. claim that TPX2 evolved after the origin of the AURK protein and before the emergence of allostery, but a reciprocal BLAST search identifies TPX2 orthologs in all major eukaryotic taxa, including all fungal groups except ascomycetes (Supplemental Data Table). The history of TPX2 therefore tracks exactly with the best supported history of AURK allostery: presence in the eukaryotic ancestor, loss in ascomycetes.
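The reciprocal-search criterion mentioned above can be sketched generically. The exact search parameters are not given in this text (the methods are in the Dryad deposit), so the code below shows only the usual reciprocal-best-hit logic: a candidate counts as an ortholog when the best hit of the query in the target proteome has the query as its own best hit in the reverse search. The function name `reciprocal_best_hits` and all sequence identifiers are hypothetical.

```python
def reciprocal_best_hits(forward_best, reverse_best):
    """Return query -> hit pairs whose best hits point back at each other.

    forward_best: best hit in the target proteome for each query sequence.
    reverse_best: best hit in the query proteome for each target sequence.
    """
    return {
        query: hit
        for query, hit in forward_best.items()
        if reverse_best.get(hit) == query
    }


# Hypothetical identifiers, for illustration only.
forward_best = {"query_TPX2": "target_p17", "query_geneB": "target_p03"}
reverse_best = {"target_p17": "query_TPX2", "target_p03": "query_geneC"}

orthologs = reciprocal_best_hits(forward_best, reverse_best)
print(orthologs)  # {'query_TPX2': 'target_p17'}
```

In this made-up example only the first pair passes, because the reverse search from `target_p03` returns a different query-proteome sequence. In practice the best-hit tables would be parsed from search output, with thresholds on E-value and alignment coverage.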
Finally, our analysis indicates that the basal placement of fungi in the phylogeny of Hadzipasic et al. is likely attributable to LBA. The first line of defense against LBA is improved sampling to break up long branches (12). When we analyzed more sequences with greater taxonomic diversity – including numerous fungal groups that branch off the established phylogeny between Microsporidia and ascomycetes, as well as basally branching groups within the other high-level eukaryotic taxa – support for Hadzipasic et al.’s topology was eliminated, and the canonical position of Fungi was restored with strong support (Fig. 2B). One reason for the sparse sampling in Hadzipasic et al. may have been the use of software to co-estimate phylogeny and alignment, which is computationally demanding and therefore limited to very small datasets; although co-estimation is appealing in theory, the AURK sequences align with little ambiguity, and the compromised sampling necessitated by this approach led to severe phylogenetic error.
This case illustrates the importance of sound phylogenetic practice when employing ASR. Comprehensive sequence sampling is essential, especially from taxa that attach to the phylogeny near the nodes of interest and that can break up long branches. Single-protein datasets may not have sufficient signal to resolve difficult phylogenetic problems or overcome LBA, so congruence with well-established relationships should be assessed, and the effect of imposing those relationships on the reconstruction should be explored. Confidence in the functional properties of reconstructed ancestral proteins should always be assessed by examining the distribution of functions among extant sequences across the phylogeny; if a very non-parsimonious history is implied, extra scrutiny is warranted. In the current case, characterization of other extant AURKs, particularly in non-ascomycete fungi, Amoebozoa, and SAR, is essential. These kinds of practices can provide multiple safety checks against erroneous inference by ASR.
Acknowledgements
We thank members of the Thornton laboratory for comments. Supported by NIH R01GM131128 (JWT), R01GM121931 (JWT), and T32-007197 (JEJP).
Footnotes
Methods and supplementary information. Description of methods and supplementary table, phylogenies, alignments, and ancestral sequence reconstructions have been deposited at Dryad (https://doi.org/10.5061/dryad.cvdncjt2b).
References
- 1. Hadzipasic A et al., Science 367, 912 (2020).
- 2. Baldauf SL, Palmer JD, Proc Natl Acad Sci U S A 90, 11558 (1993).
- 3. del Campo J, Ruiz-Trillo I, Mol Biol Evol 30, 802 (2013).
- 4. Keeling PJ, Fast NM, Annu Rev Microbiol 56, 93 (2002).
- 5. Soucy SM, Huang J, Gogarten JP, Nat Rev Genet 16, 472 (2015).
- 6. Philippe H et al., PLoS Biol 9, e1000602 (2011).
- 7. Brinkmann H, van der Giezen M, Zhou Y, Poncelin de Raucourt G, Philippe H, Syst Biol 54, 743 (2005).
- 8. Shimodaira H, Hasegawa M, Mol Biol Evol 16, 1114 (1999).
- 9. Eyers PA, Erikson E, Chen LG, Maller JL, Curr Biol 13, 691 (2003).
- 10. Tomaštíková E et al., Plant Mol Biol Rep 33, 1988 (2015).
- 11. Özlü N et al., Dev Cell 9, 237 (2005).
- 12. Hedtke SM, Townsend TM, Hillis DM, Syst Biol 55, 522 (2006).
- 13. Anisimova M, Gascuel O, Syst Biol 55, 539 (2006).
