PLOS Biology. 2020 Dec 3;18(12):e3000954. doi: 10.1371/journal.pbio.3000954

Primate phylogenomics uncovers multiple rapid radiations and ancient interspecific introgression

Dan Vanderpool 1,*, Bui Quang Minh 2,3, Robert Lanfear 3, Daniel Hughes 4, Shwetha Murali 4, R Alan Harris 4,5, Muthuswamy Raveendran 4, Donna M Muzny 4,5, Mark S Hibbins 1, Robert J Williamson 6, Richard A Gibbs 4,5, Kim C Worley 4,5, Jeffrey Rogers 4,5, Matthew W Hahn 1
Editor: Chris D Jiggins 7
PMCID: PMC7738166  PMID: 33270638

Abstract

Our understanding of the evolutionary history of primates is undergoing continual revision due to ongoing genome sequencing efforts. Bolstered by growing fossil evidence, these data have led to increased acceptance of once controversial hypotheses regarding phylogenetic relationships, hybridization and introgression, and the biogeographical history of primate groups. Among these findings is a pattern of recent introgression between species within all major primate groups examined to date, though little is known about introgression deeper in time. To address this and other phylogenetic questions, here, we present new reference genome assemblies for 3 Old World monkey (OWM) species: Colobus angolensis ssp. palliatus (the black and white colobus), Macaca nemestrina (southern pig-tailed macaque), and Mandrillus leucophaeus (the drill). We combine these data with 23 additional primate genomes to estimate both the species tree and individual gene trees using thousands of loci. While our species tree is largely consistent with previous phylogenetic hypotheses, the gene trees reveal high levels of genealogical discordance associated with multiple primate radiations. We use strongly asymmetric patterns of gene tree discordance around specific branches to identify multiple instances of introgression between ancestral primate lineages. In addition, we exploit recent fossil evidence to perform fossil-calibrated molecular dating analyses across the tree. Taken together, our genome-wide data help to resolve multiple contentious sets of relationships among primates, while also providing insight into the biological processes and technical artifacts that led to the disagreements in the first place.


Combining three newly sequenced primate genomes with other published genomes, this study adapts a little-known method for detecting ancient introgression to genome-scale data, revealing multiple previously unknown examples of hybridization between primate species.

Introduction

Understanding the history of individual genes and whole genomes is an important goal for evolutionary biology. It is only by understanding these histories that we can understand the origin and evolution of traits, whether morphological, behavioral, or biochemical. Until recently, our ability to address the history of genes and genomes was limited by the availability of comparative genomic data. However, genome sequences are now being generated extremely rapidly. In primates alone, there are already 23 species with published reference genome sequences and associated annotations (S1 Table), as well as multiple species with population samples of whole genomes [1–11]. These data can now be used to address important evolutionary questions.

Several studies employing dozens of loci sampled across broad taxonomic groups have provided rough outlines of the evolutionary relationships and divergence times among primates [12,13]. Due to the rapid nature of several independent radiations within primates, these limited data cannot resolve species relationships within some clades [12–14]. For instance, the New World monkeys (NWM) experienced a rapid period of diversification approximately 15 to 18 million years ago (mya) [15] (Fig 1), resulting in ambiguous relationships among the 3 Cebidae subfamilies (Cebinae = squirrel monkeys and capuchins, Aotinae = owl monkeys, and Callitrichinae = marmosets and tamarins) [12–14,16–18]. High levels of incomplete lineage sorting (ILS) driven by short times between the divergence of distinct lineages have led to a large amount of gene tree discordance in the NWM, with different loci favoring differing relationships among taxa. Given the known difficulties associated with resolving short internodes [19–21], as well as the multiple different approaches and datasets used in these analyses, the relationships among cebid subfamilies remain uncertain.

Fig 1. Species tree estimated using ASTRAL III with 1,730 gene trees (the Mus musculus outgroup was removed to allow for a visually finer scale).


Common names for each species can be found in S1 Table. Node labels indicate the bootstrap value from a maximum likelihood analysis of the concatenated dataset as well as the local posterior probability from the ASTRAL analysis. gCFs and sCFs are also reported. Eight fossil calibrations (blue stars; S6 Table) were used to calibrate node ages. Gray bars indicate the minimum and maximum mean age from independent dating estimates. The inset tree with colored branches shows the maximum likelihood branch lengths estimated using a partitioned analysis of the concatenated alignment. Colors correspond to red = Strepsirrhini, cyan = Tarsiiformes, green = Platyrrhini (NWMs), blue = Cercopithecoidea (OWMs), and orange = Hominoidea (Apes). All alignments used for phylogenomic analyses (1730_Alignments_FINAL.tar.gz) and dating analyses (All_Dating_Datasets_DRYAD.tar.gz) are available via Data Dryad: https://doi.org/10.5061/dryad.rfj6q577d [22]. gCF, gene concordance factor; NWM, New World monkey; OWM, Old World monkey; sCF, site concordance factor.

In addition to issues of limited data and rapid radiations, a history of hybridization and subsequent gene flow between taxa means that there is no single dichotomously branching tree that all genes follow. Although introgression once was thought to be relatively rare (especially among animals [23]), genomic studies have uncovered widespread patterns of recent introgression across the tree of life [24]. Evidence for recent or ongoing gene flow is especially common among the primates (e.g., [9,25–27]), sometimes with clear evidence for adaptive introgression (e.g., [28–30]). Whether widespread gene flow among primates is emblematic of their initial radiation (which began 60 to 75 mya [13,31–33]) or is a consequence of current conditions, which include higher environmental occupancy and more secondary contact, remains an open question [34].

Here, we report the sequencing and annotation of 3 new primate genomes, all Old World monkey (OWM) species: Colobus angolensis ssp. palliatus (the black and white colobus), Macaca nemestrina (southern pig-tailed macaque), and Mandrillus leucophaeus (the drill). Together with the published whole genomes of extant primates, we present a phylogenomic analysis including 26 primate species and several closely related non-primates. Incorporating recently discovered fossil evidence [35], we perform fossil-calibrated molecular dating analyses to estimate divergence times, including dates for the crown primates as well as the timing of more recent splits. Compared to recent hybridization, introgression that occurred between 2 or more ancestral lineages (represented by internal branches on a phylogeny) is difficult to detect. To get around this limitation, we modify a previously proposed method for detecting introgression [36] and apply it to our whole-genome datasets, finding additional evidence for gene flow among ancestral primates. Finally, we closely examine the genealogical patterns left behind by the NWM radiation, as well as the biases of several methods that have been used to resolve this topology. We use multiple approaches to provide a strongly supported history of the NWM and primates in general, while also highlighting the large amounts of gene tree discordance across the tree caused by ILS and introgression.

Results and discussion

Primate genome sequencing

The 3 species sequenced here are all OWMs, and each is closely related to an already-sequenced species. This sampling scheme provides us increased power to detect introgression among each of the sub-clades containing these species. The assembly and annotation of each of the 3 species sequenced for this project are summarized here, with further details listed in Table 1. A summary of all published genomes used in this study, including links to the assemblies and NCBI BioProjects, is available in S2 Table. All species were sequenced using standard methods according to Illumina (San Diego, California, United States of America) Hi-seq protocols. Additional long-read sequencing was performed using Pacific Biosciences (Menlo Park, California, USA) technology for M. nemestrina.

Table 1. Genomes sequenced in this study and associated assembly and annotation metrics.

Species name | Assembly accession | Assembly total length (bp) | No. of scaffolds | Scaffold N50 (Mb) | Contig N50 (kb) | Protein-coding genes | BUSCO
Colobus angolensis ssp. palliatus (the black and white colobus) | GCF_000951035.1 | 2,970,124,662 | 13,124 | 7.84 | 38.36 | 20,222 | 95.82%
Macaca nemestrina (pig-tailed macaque) | GCF_000956065.1 | 2,948,703,511 | 9,733 | 15.22 | 106.89 | 21,017 | 95.98%
Mandrillus leucophaeus (drill) | GCF_000951045.1 | 3,061,992,840 | 12,821 | 3.19 | 31.35 | 20,465 | 95.45%

BUSCO percentages reflect the complete and fragmented genes relative to the Euarchontoglires ortholog database v9.

BUSCO, Benchmarking Universal Single-Copy Orthologs.

The sequencing effort for C. angolensis ssp. palliatus produced 514 Gb of data, which are available in the NCBI Short Read Archive (SRA) under the accession SRP050426 (BioProject PRJNA251421). The biological sample used for sequencing was kindly provided by Dr. Oliver Ryder (San Diego Zoo). Assembly of these data resulted in a total assembly length of 2.97 Gb in 13,124 scaffolds (NCBI assembly Cang.pa_1.0; GenBank accession GCF_000951035.1) with an average per base coverage of 86.8X. Subsequent annotation via the NCBI Eukaryotic Genome Annotation Pipeline (annotation release ID: 100) resulted in the identification of 20,222 protein-coding genes and 2,244 noncoding genes. An assessment of the annotation performed using Benchmarking Universal Single-Copy Orthologs (BUSCO) 3.0.2 [37] in conjunction with the Euarchontoglires ortholog database 9 (https://busco-archive.ezlab.org/v3/datasets/euarchontoglires_odb9.tar.gz) indicated that 95.82% single-copy orthologs (91.68% complete and 4.13% fragmented) were present among the annotated protein-coding genes. Comprehensive annotation statistics for C. angolensis ssp. palliatus with links to the relevant annotation products available for download can be viewed at https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Colobus_angolensis_palliatus/100/.

For M. nemestrina, 1,271 Gb of data were produced (SRA accession SRP045960; BioProject PRJNA251427), resulting in an assembled genome length of 2.95 Gb in 9,733 scaffolds (Mnem_1.0; GenBank accession GCF_000956065.1). This corresponds to an average per base coverage of 113.1X when both short- and long-read data are combined (Materials and methods). The biological sample used for sequencing was kindly provided by Drs. Betsy Ferguson and James Ha (Washington National Primate Research Center). The NCBI annotation resulted in 21,017 protein-coding genes and 13,163 noncoding genes (annotation release ID: 101). A BUSCO run to assess the completeness of the annotation (as above) indicated that 95.98% single-copy orthologs (92.23% complete and 3.75% fragmented) were present among the annotated protein-coding genes. Comprehensive annotation statistics for M. nemestrina with links to the relevant annotation products available for download can be viewed at https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Macaca_nemestrina/101/.

Sequencing of M. leucophaeus libraries resulted in 334.1 Gb of data (SRA accession SRP050495; BioProject PRJNA251423) that once assembled resulted in a total assembly length of 3.06 Gb in 12,821 scaffolds (Mleu.le_1.0; GenBank accession GCF_000951045.1) with an average coverage of 117.2X per base. The biological sample used for sequencing was kindly provided by Dr. Oliver Ryder (San Diego Zoo). The NCBI annotation produced 20,465 protein-coding genes and 2,300 noncoding genes (annotation release ID: 100). A BUSCO run to assess the completeness of the annotation (as above) indicated that 95.45% single-copy orthologs (91.38% complete, 4.07% fragmented) were present among the annotated protein-coding genes. The full annotation statistics with links to the associated data can be viewed at https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Mandrillus_leucophaeus/100/.
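The BUSCO totals quoted for the 3 genomes are the sum of the complete and fragmented fractions. A minimal Python check of that arithmetic, using only the percentages reported in the text, might look like the following; the dictionary layout, the function name, and the 0.015 tolerance are illustrative assumptions rather than part of the BUSCO workflow.

```python
# Reported BUSCO percentages copied from the text: (complete, fragmented, total).
busco = {
    "Colobus angolensis ssp. palliatus": (91.68, 4.13, 95.82),
    "Macaca nemestrina": (92.23, 3.75, 95.98),
    "Mandrillus leucophaeus": (91.38, 4.07, 95.45),
}

# Assumed tolerance: every value is rounded to two decimals, so the summed
# parts may differ from the reported total by about 0.01.
TOLERANCE = 0.015


def busco_total_consistent(complete, fragmented, total, tol=TOLERANCE):
    """Return True if complete + fragmented matches the reported total."""
    return abs((complete + fragmented) - total) <= tol


checks = {name: busco_total_consistent(*values) for name, values in busco.items()}
for name, ok in checks.items():
    print(f"{name}: {'consistent' if ok else 'inconsistent'}")
```

The small tolerance is needed because the C. angolensis components sum to 95.81 rather than the reported 95.82, a difference that is compatible with independent rounding of each figure.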

Phylogenetic relationships among primates

To investigate phylogenetic relationships among primates, we selected the longest isoform for each protein-coding gene from 26 primate species and 3 non-primate species (S1 Table). After clustering, aligning, trimming, and filtering (Materials and methods), there were 1,730 single-copy orthologs present in at least 27 of the 29 species (see S3 Table for the orthogroup, protein name, chromosome, and location of each single-copy ortholog in the human genome). The cutoffs used to filter the dataset ensure high species coverage while still retaining a large number of orthologs. The coding sequences (CDS) of these orthologs have an average length of 1,018 bp and 178 parsimony informative characters per gene. Concatenation of these loci resulted in an alignment of 1,761,114 bp, with the fraction of gaps/ambiguities varying from 4.04% (Macaca mulatta) to 18.37% (Carlito syrichta) (S4 Table).
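The occupancy cutoff described above (single-copy orthologs present in at least 27 of the 29 species) can be sketched as a simple filter. This is only an illustration under assumed inputs: the mapping from species to gene IDs, the function name, and the toy orthogroups are hypothetical, and the actual pipeline (Materials and methods) includes clustering, alignment, and trimming steps not shown here.

```python
from typing import Dict, List

MIN_SPECIES = 27  # occupancy cutoff stated in the text (27 of 29 species)


def is_usable_orthogroup(members: Dict[str, List[str]],
                         min_species: int = MIN_SPECIES) -> bool:
    """members maps a species name to its gene IDs in one orthogroup.

    Assumed rule: no species contributes more than one gene (single-copy),
    and at least `min_species` species are represented.
    """
    counts = [len(genes) for genes in members.values() if genes]
    return all(c == 1 for c in counts) and len(counts) >= min_species


# Hypothetical toy data: 29 placeholder species and made-up gene names.
species = [f"sp{i:02d}" for i in range(1, 30)]
og_full = {s: [f"{s}_geneA"] for s in species}          # 29/29 species, single-copy
og_sparse = {s: [f"{s}_geneB"] for s in species[:26]}   # only 26/29 species
og_dup = {s: [f"{s}_geneC"] for s in species}
og_dup["sp01"] = ["sp01_geneC1", "sp01_geneC2"]         # duplicated in one species

orthogroups = {"og_full": og_full, "og_sparse": og_sparse, "og_dup": og_dup}
kept = [name for name, og in orthogroups.items() if is_usable_orthogroup(og)]
print(kept)
```

Only the fully single-copy, well-sampled orthogroup passes; relaxing `min_species` would trade species coverage for a larger number of loci.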

We inferred 1,730 individual gene trees from nucleotide alignments using maximum likelihood in IQ-TREE 2 [38] and then inferred a species tree using these gene tree topologies as input to ASTRAL III ([39]; Materials and methods). We used the mouse, Mus musculus, as an outgroup to root the species tree. This approach resulted in a topology (which we refer to as “ML-ASTRAL”; Fig 1) that largely agrees with previously published phylogenies [12,13]. We also used IQ-TREE to carry out a maximum likelihood analysis of the concatenated nucleotide alignment (a topology we refer to as “ML-CONCAT”). This analysis resulted in a topology that differed from the ML-ASTRAL tree only with respect to the placement of Aotus nancymaae (owl monkey): rather than sister to the Saimiri+Cebus clade (as in Fig 1), the ML-CONCAT tree places Aotus sister to Callithrix jacchus, a minor rearrangement around a very short internal branch (Fig 1). All branches of the ML-ASTRAL species tree are supported by maximum local posteriors, the default support values provided by ASTRAL III [40], except for the branch that defines Aotus as sister to the Saimiri+Cebus clade (0.46 local posterior probability). Likewise, each branch in the ML-CONCAT tree is supported by 100% bootstrap values, including the branch uniting Aotus and Callithrix. We return to this conflict in the next section.

There has been some contention as to the placement of the mammalian orders Scandentia (treeshrews) and Dermoptera (colugos) [41–50]. The controversy concerns whether Dermoptera is sister to Primates, Scandentia is sister to Primates, or Dermoptera and Scandentia are sister groups. As expected, both the ML-ASTRAL and ML-CONCAT trees place these 2 groups outside the Primates with maximal statistical support (i.e., local posterior probabilities of 1.0 and bootstrap values of 100%; Fig 1); they also both point to Dermoptera as the closest sister lineage to the Primates [12,51–53]. However, while support values such as the bootstrap or posterior probability provide statistical confidence in the species tree topology, there can be large amounts of underlying gene tree discordance even for branches with 100% support (e.g., [54–56]). To assess discordance generally, and the relationships among the Primates, Scandentia, and Dermoptera in particular, we used IQ-TREE to calculate both the gene concordance factor (gCF) and site concordance factor (sCF) [57] for each internal branch of the topology in Fig 1. These 2 measures represent the fraction of genes and sites, respectively, which are in agreement with the species tree for any particular branch.

Examining concordance factors helps to explain previous uncertainty in the relationships among Primates, Scandentia, and Dermoptera (Fig 1). Although the bootstrap support is 100% and the posterior probability is 1.0 on the branch leading to the Primate common ancestor, the gCF is 45%, and the sCF is 39%. These values indicate that, of decisive gene trees (n = 1,663), only 45% of them contain the branch that is in the species tree; this branch reflects the Primates as a single clade that excludes Scandentia and Dermoptera. While the species tree represents the single topology supported by the most gene trees (hence the strong statistical support for this branch), the concordance factors also indicate that a majority of gene tree topologies differ from the estimated species tree. In fact, the gCF value indicates that 55% of trees do not support a monophyletic Primate order, with either Dermoptera, Scandentia, or both lineages placed within Primates. Likewise, the sCF value indicates that only 39% of decisive sites in the total alignment support the branch uniting all primates, with 30% favoring Dermoptera as sister to the Primate suborder Strepsirrhini and 31% placing Dermoptera sister to the Primate suborder Haplorrhini. Similarly, only a small plurality of genes and sites have histories that place Dermoptera as sister to the Primates rather than either of the 2 alternative topologies (gCF = 37, sCF = 40; Fig 1), despite the maximal statistical support for these relationships. While discordance at individual gene trees can result from technical problems in tree inference (e.g., long-branch attraction, low phylogenetic signal, poorly aligned sequences, or model misspecification), it also often reflects biological causes of discordance such as ILS and introgression. We further address the possible role of technical errors in generating patterns of discordance in the section entitled “Sources of gene tree discordance” below.
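To make the gCF calculation concrete, the sketch below computes it for one branch from a handful of toy gene trees. It is not IQ-TREE's implementation: the `GeneTree` container, the representation of a tree as a set of splits, and the 5-taxon toy trees are all assumptions made for illustration. A gene tree is treated as decisive when it samples at least one taxon from each of the 4 groups surrounding the branch, and as concordant when it contains the split separating the first 2 groups from the other 2.

```python
from typing import FrozenSet, Iterable, NamedTuple, Set, Tuple


class GeneTree(NamedTuple):
    taxa: FrozenSet[str]
    splits: Set[FrozenSet[str]]  # each internal split stored as one of its sides


def make_tree(taxa: Iterable[str], *splits: Iterable[str]) -> GeneTree:
    return GeneTree(frozenset(taxa), {frozenset(s) for s in splits})


def gene_concordance(trees, clade_a, clade_b, clade_c, clade_d) -> Tuple[float, int]:
    """Return (gCF in percent, number of decisive trees) for the branch AB|CD."""
    decisive = concordant = 0
    for tree in trees:
        groups = [frozenset(g) & tree.taxa for g in (clade_a, clade_b, clade_c, clade_d)]
        if not all(groups):
            continue  # a group is unsampled, so this tree is not decisive
        decisive += 1
        side = groups[0] | groups[1]
        # The split may have been stored from either side of the branch.
        if side in tree.splits or (tree.taxa - side) in tree.splits:
            concordant += 1
    gcf = 100.0 * concordant / decisive if decisive else float("nan")
    return gcf, decisive


# Toy example (hypothetical trees): the branch uniting the Primates.
ALL = {"Strepsirrhini", "Haplorrhini", "Dermoptera", "Scandentia", "Mus"}
toy_trees = [
    make_tree(ALL, {"Strepsirrhini", "Haplorrhini"}, {"Scandentia", "Mus"}),
    make_tree(ALL, {"Strepsirrhini", "Haplorrhini"}, {"Dermoptera", "Scandentia"}),
    make_tree(ALL, {"Strepsirrhini", "Dermoptera"}, {"Scandentia", "Mus"}),
    make_tree(ALL, {"Haplorrhini", "Dermoptera"}, {"Scandentia", "Mus"}),
    make_tree(ALL - {"Dermoptera"}, {"Strepsirrhini", "Haplorrhini"}),  # not decisive
]
gcf, n_decisive = gene_concordance(
    toy_trees, {"Strepsirrhini"}, {"Haplorrhini"}, {"Dermoptera"}, {"Scandentia", "Mus"}
)
print(f"gCF = {gcf:.1f}% from {n_decisive} decisive trees")
```

In the toy data, the tree lacking Dermoptera is excluded from the denominator, which mirrors why the text reports 1,663 decisive trees rather than all 1,730.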

Within the Primates, the phylogenetic affiliation of tarsiers (represented here by C. syrichta) has been debated since the first attempts by Buffon (1765) and Linnaeus (1767 to 1770) to systematically organize described species [58]. Two prevailing hypotheses group tarsiers (Tarsiiformes) with either lemurs and lorises (the “prosimian” hypothesis [59]) or with Simiiformes (the “Haplorrhini” hypothesis [60], where Simiiformes = Apes+OWM+NWM). The ML-ASTRAL and ML-CONCAT analyses place Tarsiiformes with Simiiformes, supporting the Haplorrhini hypothesis (Fig 1). The Strepsirrhines come out as a well-supported group sister to the other primates. Again, our inference of species relationships is consistent with previous genomic analyses [61,62] but also highlights the high degree of discordance in this part of the tree. The rapid radiation of mammalian lineages that occurred in the late Paleocene and early Eocene [32] encompassed many of the basal primate branches, including the lineage leading to Haplorrhini. The complexity of this radiation is likely the reason for low gCF and sCFs (39.5% and 36%, respectively) for the branch leading to Haplorrhini and perhaps explains why previous studies recovered conflicting resolutions for the placement of tarsiers [31,63,64].

The remaining branches of the species tree that define major primate clades all have remarkably high concordance with the underlying gene trees (gCF >80%), though individual branches within these clades do not. The gCFs for the branches defining these clades are Strepsirrhini (lemurs+lorises) = 84.5, Catarrhini (OWM+Apes) = 90.0, Platyrrhini (NWM) = 96.6, Hominoidea (Apes) = 82.7, and Cercopithecidae (OWM) = 92.3 (Fig 1). High gene tree/species tree concordance for these branches is likely due to a combination of more recent divergences (increasing gene tree accuracy) and longer times between branching events [65]. Within these clades, however, we see multiple recent radiations. One of the most contentious has been among the NWMs, a set of relationships we address next.

ML concatenation affects resolution of the New World monkey radiation

Sometime during the mid to late Eocene (approximately 45 to 34 mya), a small number of primates arrived on the shores of South America [15,66]. These monkeys likely migrated from Africa [66] and on arrival underwent multiple rounds of extinction and diversification [15]. Three extant families from this radiation now make up the NWMs (Platyrrhini; Fig 1). Because of the rapidity with which these species spread and diversified across the new continent, relationships at the base of the NWM have been hard to determine [12–14,16–18].

As reported above, the concatenated analysis (ML-CONCAT) gives a different topology than the gene tree-based analysis (ML-ASTRAL). Specifically, the ML-CONCAT analysis supports a symmetrical tree, with Aotus sister to Callithrix (Fig 2A). In contrast, ML-ASTRAL supports an asymmetrical (or “caterpillar”) tree, with Aotus sister to a clade comprised of Saimiri+Cebus (Fig 2B). There are reasons to have doubts about both topologies. It is well known that carrying out maximum likelihood analyses of concatenated datasets can result in incorrect species trees, especially when the time between speciation events is short [67,68]. In fact, the specific error that is made in these cases is for ML concatenation methods to prefer a symmetrical 4-taxon tree over an asymmetrical one, exactly as is observed here. Gene tree-based methods such as ASTRAL are not prone to this particular error, as long as the underlying gene trees are all themselves accurate [69,70]. However, if there is bias in gene tree reconstruction, then there are no guarantees as to the accuracy of the species tree. In addition, the ML-ASTRAL tree is supported by only a very small plurality of gene trees: There are 442 trees supporting this topology, compared to 437 supporting the ML-CONCAT topology and 413 supporting the third topology (Fig 2D). This small excess of supporting gene trees also explains the very low posterior support for this branch in the species tree (Fig 1). Additionally, a polytomy test [71], implemented in ASTRAL and performed using ML gene trees, failed to reject the null hypothesis of “polytomy” for the branch uniting Aotus+(Saimiri,Cebus) (P = 0.47).
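How small the plurality is can be seen with a back-of-the-envelope calculation on the 3 counts reported above. The sketch below uses a plain chi-square goodness-of-fit test against equal frequencies as a simplified stand-in; it is not the ASTRAL polytomy test [71], which works on quartet frequencies, so its P value is not expected to match the reported P = 0.47. Variable and function names are illustrative.

```python
import math

# Gene tree counts for the three NWM resolutions (ML gene trees; Fig 2D).
counts = {"Tree 2 (ML-ASTRAL)": 442, "Tree 1 (ML-CONCAT)": 437, "Tree 3": 413}
DECISIVE_TREES = 1637  # decisive gene trees for these splits (Fig 2 legend)


def chi_square_equal(observed):
    """Chi-square statistic for the null that all categories are equally frequent."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


chi2 = chi_square_equal(list(counts.values()))

# Three categories give 2 degrees of freedom, for which the chi-square
# survival function has the closed form exp(-x / 2).
p_value = math.exp(-chi2 / 2)

# Share of all decisive gene trees supporting each resolution, in percent.
share = {name: 100.0 * n / DECISIVE_TREES for name, n in counts.items()}

print(f"chi-square = {chi2:.3f}, P = {p_value:.3f}")
for name, pct in share.items():
    print(f"{name}: {pct:.1f}% of decisive gene trees")
```

Under this simple test the 3 counts cannot be distinguished from equal frequencies, in line with the failure of the polytomy test to reject a polytomy for this branch when ML gene trees are used.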

Fig 2. The 3 most frequent topologies of NWMs.


(A) Tree 1 is the symmetrical topology inferred by the ML-CONCAT analysis of 1,730 loci (1.76 Mb). (B) Tree 2 is the asymmetrical topology inferred by ASTRAL III using either maximum likelihood (ML-ASTRAL) or maximum parsimony (MP-ASTRAL) gene tree topologies. Using maximum parsimony on the concatenated alignment also returns this tree (MP-CONCAT). (C) Tree 3 is the alternative resolution recovered at high frequency in all gene tree analyses, though it is not the optimal species tree using any of the methods. (D) Number of gene trees supporting each of the 3 resolutions of the NWM clade when maximum likelihood is used to infer gene tree topologies. There are 1,637 decisive gene trees for these splits. (E) Gene tree counts when maximum parsimony is used to infer gene tree topologies. (F) Number of parsimony informative sites in the concatenated alignment supporting each of the 3 resolutions. ML-CONCAT, maximum likelihood concatenated; NWM, New World monkey.

To investigate these relationships further, we carried out additional analyses. The trees produced from concatenated alignments can be biased in situations with high ILS when maximum likelihood is used for inference, but this bias does not affect parsimony methods [21,72]. Therefore, we analyzed exactly the same concatenated 1.76 Mb alignment used as input for ML but carried out a maximum parsimony analysis in PAUP* [73]. As would be expected given the known biases of ML methods, the maximum parsimony tree (which we refer to as “MP-CONCAT”) returns the same tree as ML-ASTRAL, supporting an asymmetric topology of NWMs (Fig 2B). Underlying this result is a relatively large excess of parsimony informative sites supporting this tree (Fig 2F), which results in maximal bootstrap values for every branch. The 2 most diverged species in this clade (Saimiri and Callithrix) are only 3.26% different at the nucleotide level, so there should be little effect of multiple substitutions on the parsimony analysis.

As mentioned above, gene tree-based methods (such as ASTRAL) are not biased when accurate gene trees are used as input. However, in our initial analyses, we used maximum likelihood to infer the individual gene trees. Because protein-coding genes are themselves often a combination of multiple different underlying topologies [74], ML gene trees may be biased and using them as input to gene tree-based methods may still lead to incorrect inferences of the species tree [75]. Therefore, we used the same 1,730 loci as above to infer gene trees using maximum parsimony with MPBoot [76]. Although the resulting topologies still possibly represent the average over multiple topologies contained within a protein-coding gene, using parsimony ensures that this average tree is not a biased topology. These gene trees were used as input to estimate a species tree using ASTRAL; we refer to this as the “MP-ASTRAL” tree. Once again, the methods that avoid known biases of ML lend further support to an asymmetric tree, placing Aotus sister to the Saimiri+Cebus clade (Fig 2B). In fact, the gene trees inferred with parsimony now show a much greater preference for this topology, with a clear plurality of gene trees supporting the species tree (473 versus 417 supporting the second most common tree; Fig 2E). As a consequence, the local posterior for this branch in the MP-ASTRAL tree is 0.92, and the polytomy test performed using MP gene trees rejects (P = 0.037) the null hypothesis of “polytomy” for the branch uniting Aotus+(Saimiri,Cebus). The increased number of concordant gene trees using parsimony suggests that the gene trees inferred using ML may well have been suffering from the biases of concatenation when multiple trees are brought together (as observed in the Great Apes [74]), reducing the observed levels of concordance.

A recent analysis of NWM genomes found Aotus sister to Callithrix, as in the ML-CONCAT tree, despite the use of gene trees to build the species tree [18]. However, the outgroup used in this analysis is a closely related species (Brachyteles arachnoides) that diverged during the NWM radiation and that shares a recent common ancestor with the ingroup taxa [12,13]. If the outgroup taxon used to root a tree shares a more recent common ancestor with subsets of ingroup taxa at an appreciable number of loci, the resulting tree topologies will be biased. A similar problem likely arose in previous studies that have used the Scandentia or Dermoptera as outgroups to Primates. In general, this issue highlights the difficulty in choosing outgroups: Though we may have 100% confidence that a lineage lies outside our group of interest in the species tree, a reliable outgroup must also not have any discordant gene trees that place it inside the ingroup.

Sources of gene tree discordance

As previously mentioned, there are both biological reasons for gene tree discordance (e.g., ILS or introgression) and technical reasons (e.g., long-branch attraction, homoplasy, low phylogenetic signal, poorly aligned sequences, or model misspecification). All of these phenomena may be reflected in gCFs and sCFs, but the proportion of discordance attributable to biological versus technical factors is often difficult to ascertain. We therefore performed additional analyses to assess the impact of error on estimates of concordance factors.

In order to determine the degree to which short alignments or genes with low phylogenetic signal contribute to inaccurate gene trees, we recalculated gCFs and sCFs using the genes with the 200 longest alignments in our dataset (lengths ranging from 1,640 bp to 6,676 bp, with 116 to 2,101 parsimony informative sites). The resulting gCF for the branch leading to the Primate common ancestor increases from 45% to 66%, while the sCFs remain unchanged (S1 Fig). For the branch placing Dermoptera sister to Primates, using trees estimated from the 200 longest alignments resulted in a modest increase in gCF from 37% to 45%. Overall, the gCFs for the 200 longest genes were higher for all branches in the tree, with the average gCF increasing from 65.18% to 79.74%. The consistent increase in gCF but not sCF when using longer genes points to errors in gene tree inference as a small, but significant, factor in our dataset.

Using a single outgroup (mouse) could potentially lead to biases such as long-branch attraction near the base of the tree. To ameliorate these concerns, we performed an additional analysis using 150 randomly chosen single-copy orthologs, with pika (Ochotona princeps) included as a second outgroup. As in the full dataset, maximum likelihood and parsimony were both applied to a concatenated dataset, and gene trees were also inferred via both ML and parsimony. Parsimony analysis of the concatenated alignment resulted in the same topology as in Fig 1, while a maximum likelihood analysis produced the same topology as the full ML-CONCAT tree from 1,730 loci, preferring a symmetric tree for the NWM clade. To assess the effect of including an additional outgroup on concordance factors, we calculated gCFs and sCFs using the 150 single-copy orthologs both with (S2A Fig) and without (S2B Fig) pika (using ML gene trees). In contrast to expectations about any error introduced by long-branch attraction, we observe slightly lower gCFs near the base of the tree when pika is included (S2A Fig). sCFs are not affected by the inclusion of pika. These analyses indicate that including additional outgroups when analyzing the full dataset is unlikely to reduce concordance factors or to change inferences of the species tree.

Technical errors leading to discordance should be more prominent deeper in the tree, as there is more opportunity for long-branch attraction, homoplasy, poor alignments, or model misspecification to cause problems. To determine whether concordance factors for deep branches in the primate tree are disproportionately affected by error, we looked for a correlation between concordance factors and the age of each bifurcation in the tree. For gCFs, we found no correlation with node age (r² = 0.0094), while sCFs were slightly negatively correlated (r² = 0.2998; S3 Fig). The negative correlation found between sCFs and node age is consistent with the expectation that substitutions occurring on deeper branches of the tree are more likely to suffer from the effects of multiple substitutions (homoplasy). While there may still be technical factors affecting gCFs, true discordance throughout the tree is high enough to mask any such effect.
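The correlation reported here is an ordinary Pearson correlation between node age and concordance factor, summarized as r². A minimal sketch of that calculation follows; the function name is illustrative and the node ages and sCF values in the example are hypothetical placeholders, not values from S3 Fig.

```python
import math
from typing import Sequence, Tuple


def pearson_r_and_r2(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (r, r squared) for paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    r = sxy / math.sqrt(sxx * syy)
    return r, r * r


# Hypothetical example: node ages (mya) and sCF values (%) for five branches.
node_age = [5.0, 10.0, 20.0, 40.0, 60.0]
scf = [70.0, 65.0, 55.0, 45.0, 40.0]
r, r2 = pearson_r_and_r2(node_age, scf)
print(f"r = {r:.3f}, r^2 = {r2:.3f}")
```

Because r² discards the sign, the direction of the relationship (negative for sCFs in the text) has to be read from r itself.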

A recent simulation study [77] reported that negative selection, in combination with large differences in effective population size, can generate strong enough asymmetries in gene tree topologies that the most common topology does not match the species tree. Such an effect, if real, would mislead both gene tree-based and concatenation-based approaches to species tree inference. However, previous theoretical results predict that there should be no effect of negative selection on the distribution of tree topologies [78–81], and the new results were obtained using custom simulation software. To clarify this issue, we used the open-source simulator SLiM [82] to study non-recombining loci under the most extreme parameters used by He and colleagues [77] (see Materials and methods). We found no evidence for the bias in gene tree frequencies recently reported (S4A Fig). However, we observed fewer than 1 mutation per locus at the end of our simulations under the parameters exactly replicating He and colleagues [77], suggesting we may not have generated sufficient deleterious variation to observe the effect. To address this, we simulated the same conditions but with the deleterious mutation rate increased by 2 orders of magnitude and still did not observe a bias in topology frequencies (S4B Fig). Our results therefore indicate that weak negative selection does not generate gene tree discordance, consistent with population genetic theory [78–81].

Strongly supported divergence times using fossil calibrations

Fossil-constrained molecular dating was performed using 10 independent datasets, each of which consisted of 40 protein-coding genes randomly selected (without replacement) and concatenated. The resulting datasets had an average alignment length of 39,374 bp (SD = 2.6 × 10³; S5 Table). Although individual discordant trees included in this analysis may have different divergence times, the difference in estimates of dates should be quite small [83]. We used 8 dated fossils (blue stars in Fig 1) from 10 studies for calibration (S6 Table). The most recent of these fossils is approximately 5.7 mya [84], while the most ancient is 55.8 mya [85]. Each separate dataset and the same set of “soft” fossil constraints, along with the species tree in Fig 1, were used as input to PhyloBayes 3.3 [86], which was run twice to assess convergence (Materials and methods).

We observed tight clustering of all estimated node ages across datasets and independent runs of PhyloBayes (Fig 3 and S6 Table). In addition, the ages of most major crown nodes estimated here are largely in agreement with previously published age estimates (Table 2). Some exceptions include the age of the crown Strepsirrhini (47.4 mya) and Haplorrhini (59.0 mya), which are more recent than many previous estimates for these nodes (range in the literature is Strepsirrhini = 51.6 to 68.7, Haplorrhini = 60.6 to 81.3; see Table 2). The crown nodes for Catarrhini, Hominoidea, and Cercopithecidae (28.4, 21.4, and 16.8 mya, respectively) all fall within the range of variation recovered in previous studies (Table 2).

Fig 3. Mean node ages for independent PhyloBayes dating runs.

Box plots show the median, interquartile range, and both minimum and maximum values of the mean node ages for 10 different datasets (with each dataset run twice). An additional run was performed with no sequence data to ascertain the prior on node divergence times in the presence of fossil calibrations (pink asterisks). Some prior ages were too large to include in the plot while still maintaining detail; these ages are given as numeric values. The species tree topology is from Fig 1; 95% HPD intervals for each node are reported in S7 Table. Node age estimates for each independent PhyloBayes run are provided in S1 Data. HPD, highest posterior density.

Table 2. Mean crown node divergence times estimated in this study compared with mean divergence times estimated by 8 prior studies.

| Node | This study | This study, no max* | This study, concord† | Herrera et al. [32] | Kistler et al. [33] | Perez et al. [17] | Springer et al. [13] | Meredith et al. [45] | Perelman et al. [12] | Wilkinson et al. [87] | Chatterjee et al. [31] |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Primates | 61.7 | 67.5 | 63 | 63.9 | 68 | NA | 67.8 | 71.5 | 87.2 | 84.5 | 63.7 |
| Strepsirrhini | 47.4 | 50.2 | 48.4 | 61.4 | 59 | NA | 54.2 | 55.1 | 68.7 | 49.8 | 51.6 |
| Haplorrhini | 59.0 | 63.8 | 59.8 | 61.9 | 67 | 60.6 | 61.2 | 62.4 | 81.3 | NA | NA |
| Catarrhini | 28.4 | 29.0 | 27.2 | 32.1 | 33 | 27.8 | 25.1 | 20.6 | 31.6 | 31.0 | 29.3 |
| Hominoidea | 21.4 | 21.6 | 19.9 | NA | 21 | 18.44 | 17.4 | 14.4 | 20.3 | NA | 21.5 |
| Cercopithecidae | 16.8 | 16.9 | 14.2 | NA | 24 | 13.4 | 13.2 | NA | 17.6 | 14.1 | 23.4 |

Estimates were calculated by averaging the mean times across all runs for 10 independent datasets.

*Refers to the average divergence time of the crown node for the indicated taxonomic group when the 65.8 my maximum constraint was removed from the Primate node.

†Refers to the average divergence time of the crown node for the indicated taxonomic group when divergence times were estimated using the most concordant gene trees. Datasets used in all dating analyses are available via Data Dryad in the archive All_Dating_Datasets_DRYAD.tar.gz, https://doi.org/10.5061/dryad.rfj6q577d [22].

my, million years; NA, not applicable.

Our estimate for the most recent common ancestor of the extant primates (i.e., the last common ancestor of Haplorrhini and Strepsirrhini) is 61.7 mya, which is slightly more recent than several studies [13,31,33,88] and much more recent than other studies [12,87,89] (Table 2). However, our estimate is in good agreement with Herrera and colleagues [32], who used 34 fossils representing extinct and extant lineages (primarily Strepsirrhines) to infer divergence times among primates, concluding that the split occurred approximately 64 mya. Despite limited overlap in taxon sampling, 1 similarity between our study and that of Herrera and colleagues is that we have both used the maximum constraint of 65.8 million years (my) on the ancestral primate node suggested by Benton and colleagues [90], which likely contributes to the more recent divergence. It is worth noting that the soft bounds imposed in our analysis permit older ages to be sampled from the Markov chain, but these represented only a small fraction (median 3.37%) of the total sampled states after burn-in (S6 Table). To determine the effects of imposing the 65.8 my maximum constraint on the Primate node, we analyzed all 10 datasets for a third time with this constraint removed and report the divergence time of major primate clades in Table 2 (“No Max” entries). However, it may be that using genes that have gene trees most similar to the topology being dated will reduce bias caused by concatenation [74]. To determine whether using concordant loci has an impact on the estimated dates, we constructed an 11th dataset consisting of approximately 43 kb from the 20 loci most similar to the species tree in Fig 1 (as determined by Robinson–Foulds distances). There was no consistent difference in the dates estimated with this dataset (“Concord” entries in Table 2).
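The selection of loci most similar to the species tree by Robinson–Foulds distance can be illustrated with a small, dependency-free sketch. Everything here is an assumption made for illustration: trees are written as nested tuples of leaf names, the helper names (`leaf_set`, `bipartitions`, `rf_distance`, `most_concordant`) are hypothetical, and every gene tree is assumed to contain the same taxa as the species tree (loci with missing taxa would first need the species tree pruned to match). It is not the software used in the study.

```python
def leaf_set(tree):
    """Return the set of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    result = frozenset()
    for child in tree:
        result |= leaf_set(child)
    return result


def bipartitions(tree):
    """Non-trivial splits of the unrooted tree, each stored as the side
    that does NOT contain a fixed anchor leaf, so equal splits compare equal."""
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset()
        for child in node:
            below |= visit(child)
        side = all_leaves - below if anchor in below else below
        # Keep only splits with at least 2 leaves on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            splits.add(side)
        return below

    visit(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """Robinson-Foulds distance: number of splits found in only one tree."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must share the same leaf set")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


def most_concordant(gene_trees, species_tree, k):
    """Names of the k gene trees with the smallest RF distance to the species tree."""
    ranked = sorted(gene_trees, key=lambda name: rf_distance(gene_trees[name], species_tree))
    return ranked[:k]
```

With a dictionary mapping locus names to gene trees, `most_concordant(gene_trees, species_tree, 20)` would return the 20 closest loci under these assumptions; ties are broken by input order.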

There are several caveats to our age estimates that should be mentioned. Maximum age estimates for the crown node of any given clade are defined by the oldest divergence among sampled taxa in the clade. This limitation results in underestimates for nearly all crown node ages as, in practice, complete taxon sampling is difficult to achieve. Fossil calibrations are often employed as minimum constraints in order to overcome the limitations imposed by taxon sampling, allowing older dates to be estimated more easily. On the other hand, the systematic underestimation of crown node ages due to taxon sampling is somewhat counteracted by the overestimation of speciation times due to ancestral polymorphism. Divergence times estimated from sequence data represent the coalescence times of sequences, which are necessarily older than the time at which 2 incipient lineages diverged [91,92]. This overestimation will have a proportionally larger effect on recent nodes (such as the Homo/Pan split; Fig 3, node 15), but the magnitude can be no larger than the average level of polymorphism in ancestral populations and will be additionally reduced by post-divergence gene flow.

Introgression during the radiation of primates

There is now evidence for recent interspecific gene flow between many extant primates, including introgression events involving humans [25], gibbons [93,94], baboons [9,27], macaques [95,96], and vervet monkeys [10], among others. While there are several widely used methods for detecting introgression between closely related species (see chapters 5 and 9 in [97]), detecting ancient gene flow is more difficult. One of the most popular methods for detecting recent introgression is the D test (also known as the “ABBA-BABA” test; [98]). This test is based on the expectation that, for any given branch in a species tree, the 2 most frequent alternative resolutions should be present in equal proportions. However, the D test uses individual SNPs to evaluate support for alternative topologies and explicitly assumes an infinite sites model of mutation (i.e., no multiple hits). As this assumption will obviously not hold the further back in time one goes, a different approach is needed.

Fortunately, Huson and colleagues [36] described a method that uses gene trees themselves (rather than SNPs) to detect introgression. Using the same expectations as in the D test, these authors looked for a deviation from the expected equal numbers of alternative tree topologies using a test statistic they refer to as Δ. As far as we are aware, Δ has only rarely been used to test for introgression in empirical data, possibly because of the large number of gene trees needed to assess significance or the assumptions of the parametric method proposed to obtain P values. Here, given our large number of gene trees and large number of internal branches to be tested, we adapt the Δ test for genome-scale data.

To investigate patterns of introgression within primates, we used 1,730 single-copy loci to test for deviations from the null expectation of Δ on each of the 24 internal branches of the primate phylogeny (Materials and methods). To test whether deviations in Δ were significant (i.e., Δ > 0), we generated 2,000 resampled datasets of 1,730 gene tree topologies each. P values were calculated from Z-scores generated from these resampled datasets. Among the 17 branches where at least 5% of topologies were discordant, we found 7 for which Δ had P < 0.05.

To further verify these instances of potential introgression, for each of these 7 branches we increased the number of gene trees used, as well as the alignment length for each locus, by subsampling a smaller set of taxa. We randomly chose 4 taxa for each internal branch tested that also had this branch as an internal branch and then aligned all orthologs present in a single copy in each taxon. These steps resulted in approximately 3,600 to 6,400 genes depending on the branch being tested (S8 Table). Additionally, because instances of hybridization and introgression are well documented among macaques [96,99,100], we similarly resampled orthologs from the 3 Macaca species in our study.

We recalculated Δ using the larger gene sets and found significant evidence (after correcting for m = 17 multiple comparisons by using a cutoff of P = 0.00301) for 6 introgression events, all of which occurred among the Papionini (Fig 4 and see next paragraph). Within the Hominoidea, we found Δ = 0.0518 for the branch leading to the great apes (P = 0.030). The asymmetry in gene tree topologies here suggests that gene flow may have happened between gibbons (represented by Nomascus) and the ancestral branch leading to the African hominoids (humans, chimpanzees, and gorillas), but, like the D test, Δ cannot tell us the direction of introgression. Although currently separated by significant geographic distances (African apes south of the Sahara Desert and gibbons all in Southeast Asia), it is worth noting that fossil hominoids dating from the early to late Miocene had a broad distribution extending from Southern Africa to Europe and Asia [101]. Support for introgression between ancestral hominins and ancestral chimpanzees has been previously reported [102]; our 4-taxon analyses found marginal support for this conclusion (Δ = 0.0917, P = 0.055).
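The significance cutoff quoted above follows from the Dunn–Šidák correction named in Materials and methods. As a brief worked example, assuming a family-wise error rate of α = 0.05 (consistent with the P < 0.05 threshold used in the initial screen) and m = 17 comparisons, the per-test cutoff is 1 − (1 − α)^(1/m). The function name `sidak_cutoff` is ours.

```python
def sidak_cutoff(alpha, m):
    """Per-test significance threshold under the Dunn-Sidak correction."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


cutoff = sidak_cutoff(0.05, 17)
print(f"{cutoff:.5f}")  # prints 0.00301
```

For comparison, the slightly more conservative Bonferroni cutoff for the same values would be 0.05/17 ≈ 0.00294.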

Fig 4. Introgression among Papionini taxa (the species tree is unrooted for clarity).

Arrows indicate that a significant Δ was found in our 4-taxon tests and identify the 2 lineages inferred to have exchanged genes (values underlying these tests are listed in S8 Table). Among the Papionini, there was evidence of introgression between African taxa (Papio, Theropithecus, and Cercocebus) and Asian Macaca species (light gray arrows). Introgression events likely occurred between African taxa and the ancestral Macaca, which had a wide distribution across Northern Africa prior to the radiation throughout Asia 2–3 mya [103]. More recent instances of introgression are inferred between macaque species and among the African Papionini (dark gray arrows). mya, million years ago.

Within the OWM, approximately 40% of Cercopithecine species are known to hybridize in nature [34]. Consistent with this, M. nemestrina and Macaca fascicularis showed a strong signature of gene flow in our data (Δ = 0.1761, P = 1.377e-09). These 2 species have ranges that currently overlap (S5 Fig). In contrast to the clear signal of recent gene flow in the macaques, we detected a complex pattern of ancient introgression between the African Papionini (Cercocebus, Mandrillus, Papio, and Theropithecus) and the Asian Papionini (Macaca) (Fig 4). The Δ test was significant using multiple different subsamples of 4 taxa, suggesting multiple ancestral introgression events. An initial attempt to disentangle these events using PhyloNet v3.8.0 [104] with the 7 Papionini species and an outgroup was unsuccessful, as PhyloNet failed to converge on an optimal network for these taxa. An attempt to infer the network with SNaQ [105] gave similarly ambiguous results. When there are multiple episodes of gene flow within a clade, even complex computational machinery may be unable to infer the correct combination of events.

As an alternative approach, we used 4-taxon trees to estimate Δ for each Macaca species paired with 2 African Papionini (1 from the Papio+Theropithecus clade and 1 from the Mandrillus+Cercocebus clade; see S8 Table) and an outgroup. Significant introgression was detected using each of the Macaca species and 3 of the 4 African Papionini species (Cercocebus, Theropithecus, and Papio). These results suggest gene flow between the ancestor of the 3 Macaca species in our analysis and the ancestors of the 3 African Papionini in our analysis, or 1 introgression event involving the ancestor of all 4 African species coupled with a second event that masked this signal in Mandrillus. This second event may have been either biological (additional introgression events masking the signal) or technical (possibly the lack of continuity or completeness of the Mandrillus reference genome sequence), but in either case, we could not detect introgression in the available drill sequence. The latter scenario would fit better with the current geographic distributions of these species, as they are on 2 different continents. However, the fossil record indicates that by the late Miocene to late Pleistocene, the ancestral distribution of the genus Macaca covered all of North Africa, into the Levant, and as far north as the United Kingdom (S5 Fig; [106]). The fossil record for Theropithecus indicates that several species had distributions that overlapped with Macaca during this time, including in Europe and as far east as India (S5 Fig; [107,108]). Ancestral macaques and ancestral Papionini may therefore have come into contact in the area of the Mediterranean Sea. The Sahara Desert is also responsible for the current disjunct distributions of many of these species. However, this region has experienced periods of increased rainfall or “greenings” over the past several million years [109–111]. Faunal migration through the Sahara, including by hominins, is hypothesized to have occurred during these green periods [110,112,113], resulting in successive cycles of range expansion and contraction [114]. Hybridization and introgression could have occurred between the ancestors of these groups during 1 of these periods.

Our results on introgression come with multiple caveats, both about the events we detected and the events we did not detect. As with the D test, there are multiple alternative explanations for a significant value of Δ besides introgression. Ancestral population structure can lead to an asymmetry in gene tree topologies [115] though it requires a highly specific, possibly unlikely population structure. For instance, if the ancestral population leading to M. nemestrina was more closely related to M. fascicularis than was the ancestral population leading to its sister species, M. mulatta (Fig 4), then there could be an unequal number of alternative topologies. Similarly, any bias in gene tree reconstruction that favors 1 alternative topology over the other could potentially lead to a significant value of Δ. While this scenario is unlikely to affect recent divergences using SNPs, well-known biases that affect topology reconstruction deeper in the tree (such as long-branch attraction) could lead to gene tree asymmetries. However, we did not observe any significant Δ-values for branches more than approximately 10 my old. One alternative approach to avoid biases in reconstruction could be the use of transposon insertions or other rare genomic changes (cf. [116,117]). Future analyses that compare these different approaches to detecting introgression would be especially useful.

There are also multiple reasons why our approach may have missed introgression events, especially deeper in the tree. All methods that use asymmetries in gene tree topologies miss gene flow between sister lineages, as such events do not lead to changes in the proportions of underlying topologies. Similarly, equal levels of gene flow between 2 pairs of non-sister lineages can mask both events, while even unequal levels will lead one to miss the less frequent exchange. More insidiously, especially for events further back in time, extinction of the descendants of hybridizing lineages will make it harder to detect introgression (though extinction of donor lineages is less of a problem than extinction of lineages receiving migrants). Internal branches closer to the root will be on average longer than those near the tips because of extinction [118], and therefore, introgression between non-sister lineages would have to occur longer after speciation in order to be detected. For instance, gene flow among Strepsirrhine species has been detected in many previous analyses of more closely related species (e.g., [119–122]), but the deeper relationships among the taxa sampled here may have made it very difficult to detect introgression. Nevertheless, our analyses were able to detect introgression between many primate species across the phylogeny.

Conclusions

Several previous phylogenetic studies of primates have included hundreds of taxa, but fewer than 70 loci [12,13]. While the species tree topologies produced by these studies are nearly identical to the one recovered in our analysis, the limited number of loci meant that it was difficult to assess gene tree discordance accurately. By estimating gene trees from 1,730 single-copy loci, we were able to assess the levels of discordance present at each branch in the primate phylogeny. Understanding discordance helps to explain why there have been long-standing ambiguities about species relationships near the base of primates and in the radiation of NWMs. Our analyses reveal how concatenation of genes—or even of exons—can mislead maximum likelihood phylogenetic inference in the presence of discordance, but also how to overcome these biases. Discordance also provides a window into introgression among lineages, and here, we have found evidence for exchange among several species pairs. Each instance of introgression inferred from the genealogical data is plausible insofar as it can be reconciled with current and ancestral species distributions.

Materials and methods

Source material and sequencing

The San Diego Zoo and the Washington National Primate Research Center provided biomaterials to the Human Genome Sequencing Center (HGSC), Baylor College of Medicine under an agreement that granted permission to the HGSC to use the biomaterials for academic scientific research. This is the standard agreement between these institutions, which regularly provide this service, and the academic community that uses their biomaterials for various types of analyses. The HGSC does not pay for biomaterials but does cover the costs of shipping the biomaterials from the provider to Baylor.

For the sequencing of the C. angolensis palliatus genome, paired-end (100 bp) libraries were prepared using DNA extracted from heart tissue (isolate OR3802 from the San Diego Zoo). Sequencing was performed using 9 Illumina Hi-Seq 2000 lanes and 4 Illumina Hi-Seq 2500 lanes with subsequent assembly carried out using ALLPATHS-LG software (v. 48744) [123]. Additional scaffolding and gap-filling were performed using Atlas-Link v. 1.1 (https://www.hgsc.bcm.edu/software/atlas-link) and Atlas-GapFill v. 2.2 (https://www.hgsc.bcm.edu/software/atlas-gapfill), respectively. Annotation for all 3 species was carried out using the NCBI Eukaryotic Genome Annotation Pipeline. A complete description of the pipeline can be viewed at https://www.ncbi.nlm.nih.gov/genome/annotation_euk/process/.

For the sequencing of the M. nemestrina genome, DNA was extracted from a blood sample (isolate M95218 from the Washington National Primate Research Center). Paired-end libraries were prepared and sequenced on 20 Illumina Hi-Seq 2000 lanes with the initial assembly performed using ALLPATHS-LG as above. Scaffolding was conducted using Atlas-Link v. 1.1. Additional gap-filling was performed using the original Illumina reads and Atlas-GapFill v. 2.2, as well as long reads generated using the Pacific Biosciences RS (60 SMRT cells) and RSII (50 SMRT cells) platforms. The PacBio reads were mapped to scaffolds to fill remaining gaps in the assembly using PBJelly2 (v. 14.9.9) [124].

For the sequencing of the M. leucophaeus genome, DNA was extracted from heart tissue (isolate KB7577 from the San Diego Zoo). Paired-end libraries were prepared and sequenced on 9 Illumina Hi-Seq 2000 lanes with the initial assembly performed using ALLPATHS-LG as above. Additional scaffolding was completed using Atlas-Link v. 1.1, and additional gap-filling in scaffolds was performed using the original Illumina reads and Atlas-GapFill v. 2.2.

Phylogenomic analyses

The full set of protein-coding genes for 26 primates and 3 non-primates was obtained by combining our newly sequenced genomes with already published data (see S1 Table for references and accessions and Table 1 and S2 Table for genome statistics). Ortholog clustering was performed by first executing an all-by-all BLASTP search [125,126] using the longest isoform of each protein-coding gene from each species. The resulting BLASTP output was clustered using the mcl algorithm [127] as implemented in FastOrtho [128] with various inflation parameters (the maximum number of clusters was obtained with inflation = 5). Orthogroups were then parsed to retain those genes present as a single copy in all 29 taxa (1,180 genes), at least 28 of 29 taxa (1,558 genes), and at least 27 of 29 taxa (1,735 genes). We chose to allow up to 2 missing species per alignment to maximize the data used in our phylogenomic reconstructions while maintaining high taxon occupancy in each alignment.
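The orthogroup parsing step can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the script used in the study: orthogroups are assumed to be available as a dictionary mapping an orthogroup ID to a list of (species, gene ID) pairs, and the function name and data layout are hypothetical rather than FastOrtho's actual output format.

```python
from collections import Counter


def single_copy_orthogroups(orthogroups, n_taxa, max_missing=2):
    """Keep orthogroups with at most one gene per species and with
    at least n_taxa - max_missing species represented."""
    kept = {}
    for group_id, members in orthogroups.items():
        copies = Counter(species for species, _gene in members)
        # Reject any orthogroup containing a paralog in some species.
        if any(count > 1 for count in copies.values()):
            continue
        if len(copies) >= n_taxa - max_missing:
            kept[group_id] = members
    return kept
```

Calling the function with `n_taxa=29` and `max_missing` set to 0, 1, or 2 would correspond to the three nested occupancy thresholds described above.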

CDS for each single-copy orthogroup were aligned, cleaned, and trimmed via a multistep process: First, sequences in each orthogroup were aligned by codon using GUIDANCE2 [129] in conjunction with MAFFT v7.407 [130] with 60 bootstrap replicates. GUIDANCE2 uses multiple bootstrapped alignments to generate quality scores for each column in the final alignment as well as for each taxon sequence in each alignment. Sequence residues in the resulting MAFFT alignment with GUIDANCE scores <0.93 were converted to gaps, and sites with >50% gaps were removed using Trimal v1.4.rev22 [131]. Alignments shorter than 200 bp (full dataset) or 300 bp (4-taxon tests for introgression), and alignments that were invariant or contained no parsimony informative characters, were removed from further analyses. Alignments with high numbers of discordant sites were further inspected for errors and removed from the analysis when warranted. This resulted in 1,730 loci for the full analysis (see S8 Table for gene counts used in 4-taxon tests).
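The masking and trimming logic can be summarized in a simplified sketch. It is not a substitute for GUIDANCE2 or trimAl and does not reproduce their file formats or options: the alignment is assumed to be a dictionary of equal-length sequences keyed by taxon, per-residue confidence scores are assumed to be supplied as parallel lists, and the function names are hypothetical. Codon-aware handling is omitted for brevity.

```python
def mask_and_trim(alignment, scores, min_score=0.93, max_gap_fraction=0.5):
    """Mask low-confidence residues as gaps, then drop gap-rich columns.

    alignment: dict taxon -> aligned sequence (all the same length)
    scores:    dict taxon -> list of per-residue confidence scores
    """
    taxa = list(alignment)
    length = len(alignment[taxa[0]])
    # Step 1: residues scoring below the threshold become gaps.
    masked = {
        t: [base if scores[t][i] >= min_score else "-"
            for i, base in enumerate(alignment[t])]
        for t in taxa
    }
    # Step 2: keep only columns whose gap fraction does not exceed the limit.
    keep = [
        i for i in range(length)
        if sum(masked[t][i] == "-" for t in taxa) / len(taxa) <= max_gap_fraction
    ]
    return {t: "".join(masked[t][i] for i in keep) for t in taxa}


def passes_length_filter(trimmed, min_length=200):
    """True if the trimmed alignment is at least min_length columns long."""
    return len(next(iter(trimmed.values()))) >= min_length
```

Under these assumptions, `min_length=200` would apply to the full dataset and `min_length=300` to the 4-taxon alignments; the additional checks for invariant alignments and parsimony-informative sites are not shown.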

IQ-TREE v2-rc1 was used with all 1,730 aligned loci to estimate a maximum likelihood concatenated tree with an edge-linked, proportional-partition model, and 1,000 ultrafast bootstrap replicates [132,133]. This strategy uses ModelFinder [134] to automatically find the best-fit model for each ortholog alignment (partition). Branch lengths are shared between partitions, with each partition having its own rate that rescales branch lengths, accommodating different evolutionary rates between partitions. The full IQ-TREE command line used was “iqtree -p Directory_of_Gene_Alignments --prefix -m MFP -c 8 -B 1000”. Maximum likelihood gene trees were estimated for each alignment with nucleotide substitution models selected using ModelFinder [134] as implemented in IQ-TREE. The full IQ-TREE command line used was “iqtree -s Directory_of_Gene_Alignments --prefix -m MFP -c 8”. We used the resulting maximum likelihood gene trees to estimate a species tree using ASTRAL III (ML-ASTRAL) [39]. Parsimony gene trees were generated using MPboot [76] and used to estimate a species tree using ASTRAL III (MP-ASTRAL), while PAUP* [73] was used to estimate the concatenated parsimony tree (MP-CONCAT) with 500 bootstrap replicates. IQ-TREE was used to calculate both gCFs and sCFs, with sCFs estimated from 300 randomly sampled quartets using the command line “iqtree --cf-verbose --gcf 1730_GENETREE.treefile -t Species_tree_file --df-tree --scf 300 -p Directory_of_Gene_Alignments -c 4”.

Effects of selection on gene tree distributions

We performed 100 replicate simulations for each mutation rate condition using SLiM version 3.3.1 [82], with tree sequence recording turned on and no neutral mutations. Each replicate simulation consisted of 50 non-recombining loci of 1 kb each, with free recombination between loci, for 3 populations with the phylogenetic relationship ((p2,p3),p1). These simulations closely match the population genetic parameters under the most extreme asymmetry condition reported in He and colleagues [77], with population sizes, selection coefficients, and mutation rates rescaled 2 orders of magnitude for performance (SLiM recipes are available via Data Dryad: https://doi.org/10.5061/dryad.rfj6q577d [22]). These parameters include a per-locus deleterious mutation rate of 3 × 10⁻⁷ per generation; a population-scaled selection coefficient (Ns) of −7.5; an internal branch subtending p2 and p3 of 0.01N generations (where N is the population size of p1 and p2); a population size for p3 that is 0.04 times that of p1 and p2; and tip branch lengths of 8N generations. In the higher mutation rate condition, the per-locus rate was increased to 3 × 10⁻⁵ per generation. We randomly sampled 1 chromosome from each population at each locus at the end of the simulation and obtained the genealogy of these samples recorded in the tree sequence at the locus.

Introgression analyses

For each internal branch of the primate tree where the proportion of discordant trees was >5% of the total, concordance factors were used to calculate the test statistic Δ, where

$$\Delta = \frac{\text{Number of DF1 trees} - \text{Number of DF2 trees}}{\text{Number of DF1 trees} + \text{Number of DF2 trees}},$$

where DF1 trees represent the most frequent discordant topology, and DF2 trees are the second most frequent discordant topology. This is a normalized version of the statistic proposed by Huson and colleagues [36], which only included the numerator of this expression. Note also that, by definition, Δ here is always equal to or greater than 0. To test whether deviations from zero were significant (i.e., Δ > 0), we calculated Δ for 2,000 pseudo-replicate datasets generated by resampling gene trees with replacement. The resulting distribution was used to calculate Z-scores and the resulting P values for the observed Δ value associated with each branch tested [135]. Of the 17 internal branches where >5% of topologies were discordant, 7 were significant at P < 0.05, and selected for more extensive testing. For each of the 7 significant branches in the all-Primates tree, 4 taxa were selected that included the target branch as an internal branch. Single-copy genes present in each taxon were aligned as previously described. Alignments with no variant or parsimony informative sites were removed from the analysis, and gene trees were estimated using maximum likelihood in IQ-TREE 2. The test statistic, Δ, was calculated, and significance was again determined using 2,000 bootstrap replicates with the P value threshold for significance corrected for multiple comparisons (m = 17) using the Dunn–Šidák correction [136,137].
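The resampling test for Δ can be sketched as follows. This is a minimal illustration rather than the authors' implementation, and several details are assumptions: each gene tree is assumed to be pre-labelled as "concordant", "DF1", or "DF2" for the branch under test; the labels are held fixed across pseudo-replicates, so replicate values of Δ may be negative; the Z-score is taken as the observed Δ divided by the standard deviation of the replicate values; and the P value is one-sided under a normal approximation. Function names are hypothetical.

```python
import random
from math import erfc, sqrt
from statistics import stdev


def delta(n_df1, n_df2):
    """Normalized asymmetry between the two discordant topology counts."""
    total = n_df1 + n_df2
    if total == 0:
        raise ValueError("no discordant trees on this branch")
    return (n_df1 - n_df2) / total


def delta_bootstrap_test(topologies, n_replicates=2000, seed=0):
    """Return (observed delta, Z-score, one-sided P value).

    topologies: list with one label per gene tree, each of
                "concordant", "DF1", or "DF2" (DF1 = more frequent).
    """
    rng = random.Random(seed)
    observed = delta(topologies.count("DF1"), topologies.count("DF2"))
    replicates = []
    for _ in range(n_replicates):
        # Resample gene trees with replacement, keeping the dataset size.
        sample = rng.choices(topologies, k=len(topologies))
        n1, n2 = sample.count("DF1"), sample.count("DF2")
        if n1 + n2 > 0:
            replicates.append(delta(n1, n2))
    sd = stdev(replicates)
    if sd == 0:
        raise ValueError("no variation among replicates")
    z = observed / sd
    p = 0.5 * erfc(z / sqrt(2))  # upper-tail probability of a standard normal
    return observed, z, p
```

The resulting P value for each branch would then be compared against the multiple-comparison cutoff rather than against 0.05 directly.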

Molecular dating

Molecular dating analyses were performed on 10 datasets consisting of 40 CDS alignments each sampled randomly without replacement from the 1,730 loci used to estimate the species tree. Gene alignments were concatenated into 10 supermatrices ranging from 36.7 kb to 42.7 kb in length (see S5 Table for the length of each alignment). Each dataset was then analyzed using PhyloBayes 3.3 [86] with sequences modeled using a site-specific substitution process with global exchange rates estimated from the data (CAT-GTR; [138]). Among-site rate variation was modeled using a discrete gamma distribution with 6 rate categories. A relaxed molecular clock [139] with 8, soft-bounded, fossil calibrations (see S6 Table) was used to estimate divergence times on the fixed species tree topology (Fig 1); the analyses were executed using the following command line: pb -x 1 15000 -d Alignment.phy -T Tree_file.tre -r outgroup_file.txt -cal 8_fossil.calib -sb -gtr -cat -bd -dgam 6 -ln -rp 90 90. Each dataset was analyzed for 15,000 generations, sampling every 10 generations, with 5,000 generations discarded as burn-in. Each dataset was analyzed twice to ensure convergence of the average age estimated for each node (Fig 3 shows the node age for both runs). To determine the effect of including a maximum constraint on the root of the Primates, we analyzed each dataset a third time with this constraint removed. Both the constrained and unconstrained node ages for major groups within the Primates are reported in Table 2.
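The construction of the dating datasets can be illustrated with a short sketch. It assumes one reading of "sampled randomly without replacement", namely that no locus appears in more than one dataset; the function names, locus identifiers, and data layout are hypothetical, and taxa absent from a locus are padded with gaps. The datasets actually analyzed are those deposited in Data Dryad.

```python
import random


def draw_dating_datasets(locus_ids, n_datasets=10, loci_per_dataset=40, seed=0):
    """Draw non-overlapping random sets of loci."""
    needed = n_datasets * loci_per_dataset
    if needed > len(locus_ids):
        raise ValueError("not enough loci to sample without replacement")
    rng = random.Random(seed)
    drawn = rng.sample(list(locus_ids), needed)
    return [
        drawn[i * loci_per_dataset:(i + 1) * loci_per_dataset]
        for i in range(n_datasets)
    ]


def concatenate(alignments, loci, taxa):
    """Join per-locus alignments into one supermatrix.

    alignments: dict locus -> dict taxon -> aligned sequence
    Taxa missing from a locus are filled with gap characters.
    """
    supermatrix = {taxon: [] for taxon in taxa}
    for locus in loci:
        aln = alignments[locus]
        length = len(next(iter(aln.values())))
        for taxon in taxa:
            supermatrix[taxon].append(aln.get(taxon, "-" * length))
    return {taxon: "".join(parts) for taxon, parts in supermatrix.items()}
```

Fixing the seed makes the draw reproducible, which is convenient when the same datasets must be reanalyzed under different constraints.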

Single-copy CDS gene alignments, gene trees, dating datasets, SLiM3 recipes, unaligned gene sequences, and PAUP commands can be accessed via the Data Dryad repository located at https://doi.org/10.5061/dryad.rfj6q577d [22].

Supporting information

S1 Fig. Concordance factors for the species tree in Fig 1 calculated using maximum likelihood gene trees and site patterns from the 200 longest single-copy loci alignments used in the 1,730-gene analysis.

In general, gCFs increase, while the sCFs remain the same, indicating that gene tree error is a likely source of some discordance. gCF, gene concordance factor; sCF, site concordance factor.

(PDF)

S2 Fig. Concordance factors calculated using 150 randomly chosen single-copy orthologs, with pika (Ochotona princeps) included as an additional outgroup to mouse.

(A) gCFs and sCFs for these 150 genes when pika is included. (B) gCFs and sCFs for these same genes when pika is not included. We observe slightly higher gCFs near the base of the tree with pika excluded (red boxes). Note that these species trees use unit-length branch lengths for readability of branch labels. gCF, gene concordance factor; sCF, site concordance factor.

(PDF)

S3 Fig. Gene and site concordance factors plotted as a function of node depth (in millions of years).

No correlation was found between gCFs and node depth, whereas a slightly negative correlation was found between sCFs and node depth. This relationship indicates that homoplasy may act to slightly reduce sCFs deeper in the tree. The data underlying mean node ages are provided in S1 Data. gCF, gene concordance factor; sCF, site concordance factor.

(PDF)

S4 Fig. Forward simulations using SLiM3 with the most extreme parameters used by He et al. (2020): population size combination “F” with s = −7.5 × 10⁻⁶ and Δτ = 2,000.

Our results show no significant difference in the distribution of gene tree topologies in the presence of negative selection (A). This result holds for simulations in which we increased the per-locus mutation rate by 2 orders of magnitude (B). SLiM3 recipes are available via Data Dryad at https://doi.org/10.5061/dryad.rfj6q577d [22]. Gene tree counts for both simulations, A and B, are available in S1 Data.

(PDF)

S5 Fig. Present-day species distributions for 4 African Papionini (Papio, Theropithecus, Mandrillus, and Cercocebus) and 3 Asian Macaca species included in the introgression analysis.

The ancestral Macaca distribution (gray shading) is inferred from Macaca fossil localities in Africa and Europe as reviewed in Roos et al. [106]. The ancestral Macaca distribution likely represents only a fraction of the species range from the late Miocene to the late Pleistocene in Africa and Europe. The contemporary distribution of the African Macaca sylvanus (bright green) is included for reference; the current distribution of Macaca nemestrina is completely contained within that of Macaca fascicularis. Fossil localities for Theropithecus species hypothesized to overlap contemporaneously with various ancestral Macaca are included. Citations for spatial data of extant species: M. nemestrina (Richardson et al., 2008), M. fascicularis (Ong and Richardson, 2008), M. sylvanus (Butynski et al., 2008), Macaca mulatta (Timmins et al., 2008), Theropithecus gelada (Gippoliti et al., 2019), Papio anubis (Kingdon et al., 2008), Cercocebus atys (Oates et al., 2016), and Mandrillus leucophaeus (Oates and Butynski, 2008). Base map was obtained from the public domain map database Natural Earth (http://www.naturalearthdata.com/downloads/).

(PDF)

S1 Table. Genomes analyzed in this study with the original NCBI release date, the publication for the reference used, and the accession number for the assembly.

When possible, the most recent version for each genome was used.

(DOCX)

S2 Table. All published genomes used in this study, including links to the assemblies and NCBI BioProjects.

Annotation information is included for each genome at the time of download.

(XLSX)

S3 Table. Orthogroup, protein name, human chromosome number, and coordinates for the single-copy human orthologs used in the 1,730 gene analysis.

Alignment files are named by orthogroup, allowing the use of this table to identify the protein in each alignment.

(XLSX)

S4 Table. Gaps/ambiguities by species and as a percentage of total alignment length.

* denotes species sequenced in this study.

(DOCX)

S5 Table. Lengths for each 40-locus concatenated alignment used in the molecular dating analyses.

Each dataset was analyzed twice until node age estimates converged (15–25k steps) using a log-normal auto-correlated model [139]. Datasets are available via Data Dryad at https://doi.org/10.5061/dryad.rfj6q577d [22].

(DOCX)

S6 Table. Fossil calibrations employed in this study.

Node numbering corresponds to the numbering in Fig 3. Median underflow/overflow for each calibration was calculated from 20 independent runs performed on 10 datasets (2 runs per dataset).

(DOCX)

S7 Table. Mean node age for 20 independent PhyloBayes dating runs.

Node numbers correspond to the numbering in Fig 3. The 95% HPD intervals were calculated by averaging the minimum and maximum of the 95% HPD interval for each dating run. HPD, highest posterior density.

(DOCX)
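The interval summary described for S7 Table can be illustrated with a minimal Python sketch that averages the lower and upper 95% HPD bounds across dating runs. The function name `average_hpd_bounds` and the example values are hypothetical; they are not taken from the study's data or scripts.

```python
from statistics import mean


def average_hpd_bounds(intervals):
    """Average per-run 95% HPD bounds for one node.

    intervals: list of (lower, upper) tuples, one per dating run.
    Returns a (mean_lower, mean_upper) tuple.
    """
    if not intervals:
        raise ValueError("at least one interval is required")
    lowers = [lower for lower, _ in intervals]
    uppers = [upper for _, upper in intervals]
    return mean(lowers), mean(uppers)


# Hypothetical bounds (in my) for a single node from three runs.
example_runs = [(60.0, 70.0), (62.0, 72.0), (61.0, 74.0)]
example_summary = average_hpd_bounds(example_runs)
```

Note that averaging bounds in this way summarizes the runs but is not itself an HPD interval of a pooled posterior.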

S8 Table. Quartets used to test for significant Δ values for internal branches of the primate tree.

Branches tested correspond to the labeled branches in Fig 3. After correcting for multiple comparisons (Dunn–Šidák, P = 0.00301), 3 internal branches and 8 quartets were found to have significant Δ values, indicating a likely introgression event.

(DOCX)
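For readers unfamiliar with the Dunn–Šidák correction cited for S8 Table, the per-test threshold is 1 − (1 − α)^(1/m) for m comparisons at family-wise level α. The sketch below is illustrative only: the caption does not state α or m, so α = 0.05 and m = 17 are assumptions, chosen because together they give a threshold close to the reported P = 0.00301.

```python
def sidak_threshold(alpha, n_tests):
    """Per-test significance threshold under the Dunn-Sidak correction."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


# Assumed values (not stated in the caption): alpha = 0.05, 17 tests.
assumed_threshold = sidak_threshold(0.05, 17)
```

A test would then be called significant when its P value falls below this threshold.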

S1 Data. The Excel workbook contains 5 different tabs.

Tab 1, Fig 3 Data: consists of the node age estimates for all 20 independent PhyloBayes dating analyses as well as the run used to determine the prior for each node; each estimate is plotted separately in Fig 3. Tab 2, Fig 3 Data for R: the same data as in tab 1, but formatted for analysis with the accompanying R script “plot_DATING.R” available via Data Dryad: https://doi.org/10.5061/dryad.rfj6q577d [22]. Tab 3, S3_Fig_Data: the data used to generate S3 Fig. The average node ages estimated in tab 1 are used here to plot age vs. concordance factors estimated for each node in IQ-TREE. Tab 4, S4_Fig_PanelA_Data: contains the tree counts that resulted from the SLiM3 simulation conditions pictured in S4A Fig. Tab 5, S4_Fig_PanelB_Data: contains the tree counts for the SLiM3 simulation conditions pictured in S4B Fig. SLiM3 recipes for both simulations are available via Data Dryad at https://doi.org/10.5061/dryad.rfj6q577d [22].

(XLSX)

Acknowledgments

We thank Yue Liu for assistance in assembling the genomes, and Fábio Mendes and Gregg Thomas for helpful advice.

Abbreviations

BUSCO

Benchmarking Universal Single-Copy Orthologs

CDS

coding sequences

gCF

gene concordance factor

HGSC

Human Genome Sequencing Center

ILS

incomplete lineage sorting

ML-CONCAT

maximum likelihood concatenated

my

million years

mya

million years ago

NWM

New World monkey

OWM

Old World monkey

PAUP*

Phylogenetic Analysis Using Parsimony*

sCF

site concordance factor

SRA

Short Read Archive

Data Availability

The relevant assembly accessions and associated references used in this study are provided in S1 Table. All raw data, assemblies, and annotation information used in these analyses are available through each species’ NCBI BioProject link. The Data Dryad repository associated with this study can be accessed through the following link: https://doi.org/10.5061/dryad.rfj6q577d [22]. The repository contains the following files and archives:

  • 1730_ALIGNMENT_CONCAT.paup.nex: Concatenated alignment with the PAUP block of commands used to generate the parsimony concatenated tree.
  • 1730_Alignments_FINAL.tar.gz: 1,730 single-copy ortholog alignments.
  • 1730_ML_GENETREEs.treefile: Maximum likelihood gene trees estimated from the 1,730 ortholog alignments.
  • ASTRAL_Tree_AVGdates.tre: The ASTRAL topology (Fig 1) with average dates from 10 independent datasets.
  • All_Dating_Datasets_DRYAD.tar.gz: The concatenated alignments used for dating analyses.
  • PARSIMONY_1730_Gene_Trees.tre: All 1,730 parsimony gene trees from MPBoot.
  • Supp_fig4A_F_s6_b1_v2_1.slim: SLiM3 recipe for the S4A Fig simulation.
  • Supp_fig4B_F_s6_b1_v2_highmut_1.slim: SLiM3 recipe for the S4B Fig simulation.
  • All_1735_UNALIGNED_Seqs.tar.gz: All unaligned single-copy gene sequences.
  • plot_DATING.R: R script used for plotting Fig 3.

Funding Statement

Funding for this study was provided by grants from the National Science Foundation, grant numbers DBI-1564611 and DEB-1936187, awarded to M.W.H. Salary was provided to D.V. and M.W.H. by grant number DBI-1564611. Additional salary was provided to M.W.H. by grant number DEB-1936187. Additional funding was provided by the Chan-Zuckerberg Initiative grant for Essential Open Source Software for Science (https://chanzuckerberg.com/eoss/) awarded to B.Q.M. and R.L. Additional funding was provided by the Australian Research Council under grant number DP-200103151 awarded to R.L., B.Q.M., and M.W.H. Additional funding was provided by an Australian National University (https://www.anu.edu.au/) Futures grant awarded to R.L., which paid salary for B.Q.M. The sequencing and assembly of the colobus, pig-tailed macaque, and drill genomes were funded by National Institutes of Health grant number U54-HG006484 awarded to R.G. Salaries from this award were received by R.G., J.R., K.W., S.M., D.H., and D.M. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. The 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. 2011;473:544–544. doi:10.1038/nature09991
  • 2. Fawcett GL, Raveendran M, Deiros DR, Chen D, Yu F, Harris RA, et al. Characterization of single-nucleotide variation in Indian-origin rhesus macaques (Macaca mulatta). BMC Genomics. 2011;12:311. doi:10.1186/1471-2164-12-311
  • 3. Higashino A, Sakate R, Kameoka Y, Takahashi I, Hirata M, Tanuma R, et al. Whole-genome sequencing and analysis of the Malaysian cynomolgus macaque (Macaca fascicularis) genome. Genome Biol. 2012;13:R58. doi:10.1186/gb-2012-13-7-r58
  • 4. Kuhlwilm M, Han S, Sousa VC, Excoffier L, Marques-Bonet T. Ancient admixture from an extinct ape lineage into bonobos. Nat Ecol Evol. 2019;3:957–965. doi:10.1038/s41559-019-0881-7
  • 5. Locke DP, Hillier LW, Warren WC, Worley KC, Nazareth LV, Muzny DM, et al. Comparative and demographic analysis of orang-utan genomes. Nature. 2011;469:529–533. doi:10.1038/nature09687
  • 6. de Manuel M, Kuhlwilm M, Frandsen P, Sousa VC, Desai T, Prado-Martinez J, et al. Chimpanzee genomic diversity reveals ancient admixture with bonobos. Science. 2016;354:477–481. doi:10.1126/science.aag2602
  • 7. Prado-Martinez J, Sudmant PH, Kidd JM, Li H, Kelley JL, Lorente-Galdos B, et al. Great ape genetic diversity and population history. Nature. 2013;499:471–475. doi:10.1038/nature12228
  • 8. Prüfer K, Munch K, Hellmann I, Akagi K, Miller JR, Walenz B, et al. The bonobo genome compared with the chimpanzee and human genomes. Nature. 2012;486:527–531. doi:10.1038/nature11128
  • 9. Rogers J, Raveendran M, Harris RA, Mailund T, Leppälä K, Athanasiadis G, et al. The comparative genomics and complex population history of Papio baboons. Sci Adv. 2019;5:eaau6947. doi:10.1126/sciadv.aau6947
  • 10. Svardal H, Jasinska AJ, Apetrei C, Coppola G, Huang Y, Schmitt CA, et al. Ancient hybridization and strong adaptation to viruses across African vervet monkey populations. Nat Genet. 2017;49:1705–1713. doi:10.1038/ng.3980
  • 11. Zhou X, Wang B, Pan Q, Zhang J, Kumar S, Sun X, et al. Whole-genome sequencing of the snub-nosed monkey provides insights into folivory and evolutionary history. Nat Genet. 2014;46:1303–1310. doi:10.1038/ng.3137
  • 12. Perelman P, Johnson WE, Roos C, Seuánez HN, Horvath JE, Moreira MAM, et al. A molecular phylogeny of living Primates. PLoS Genet. 2011;7:e1001342. doi:10.1371/journal.pgen.1001342
  • 13. Springer MS, Meredith RW, Gatesy J, Emerling CA, Park J, Rabosky DL, et al. Macroevolutionary dynamics and historical biogeography of primate diversification inferred from a species supermatrix. PLoS ONE. 2012;7:e49521. doi:10.1371/journal.pone.0049521
  • 14. Wang X, Lim BK, Ting N, Hu J, Liang Y, Roos C, et al. Reconstructing the phylogeny of new world monkeys (Platyrrhini): evidence from multiple non-coding loci. Curr Zool. 2019;65:579–588. doi:10.1093/cz/zoy072
  • 15. Silvestro D, Tejedor MF, Serrano-Serrano ML, Loiseau O, Rossier V, Rolland J, et al. Early arrival and climatically-linked geographic expansion of New World monkeys from tiny African ancestors. Syst Biol. 2018;68:78–92. doi:10.1093/sysbio/syy046
  • 16. Jameson Kiesling NM, Yi SV, Xu K, Gianluca Sperone F, Wildman DE. The tempo and mode of New World monkey evolution and biogeography in the context of phylogenomic analysis. Mol Phylogenet Evol. 2015;82(Pt B):386–399. doi:10.1016/j.ympev.2014.03.027
  • 17. Perez SI, Tejedor MF, Novo NM, Aristide L. Divergence times and the evolutionary radiation of New World monkeys (Platyrrhini, Primates): An analysis of fossil and molecular data. PLoS ONE. 2013;8:e68029. doi:10.1371/journal.pone.0068029
  • 18. Schrago CG, Seuánez HN. Large ancestral effective population size explains the difficult phylogenetic placement of owl monkeys. Am J Primatol. 2019;55:e22955. doi:10.1002/ajp.22955
  • 19. Degnan JH, Rosenberg NA. Discordance of species trees with their most likely gene trees. PLoS Genet. 2006;2:e68. doi:10.1371/journal.pgen.0020068
  • 20. Huang H, Knowles LL. What is the danger of the anomaly zone for empirical phylogenetics? Syst Biol. 2009;58:527–536. doi:10.1093/sysbio/syp047
  • 21. Mendes FK, Hahn MW. Why concatenation fails near the anomaly zone. Syst Biol. 2018;67:158–169. doi:10.1093/sysbio/syx063
  • 22. Vanderpool D. Primate phylogenomics uncovers multiple rapid radiations and ancient interspecific introgression. Dryad Digital Repository [Internet]. 2020. doi:10.5061/dryad.rfj6q577d
  • 23. Mallet J. Hybridization as an invasion of the genome. Trends Ecol Evol. 2005;20:229–237. doi:10.1016/j.tree.2005.02.010
  • 24. Mallet J, Besansky N, Hahn MW. How reticulated are species? BioEssays. 2016;38:140–149. doi:10.1002/bies.201500149
  • 25. Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, et al. A draft sequence of the Neandertal genome. Science. 2010;328:710–722. doi:10.1126/science.1188021
  • 26. Lima MGM, de Sousa E Silva J, Černý D, Buckner JC, Aleixo A, Chang J, et al. A phylogenomic perspective on the robust capuchin monkey (Sapajus) radiation: First evidence for extensive population admixture across South America. Mol Phylogenet Evol. 2018;124:137–150. doi:10.1016/j.ympev.2018.02.023
  • 27. Wall JD, Schlebusch SA, Alberts SC, Cox LA, Snyder-Mackler N, Nevonen KA, et al. Genomewide ancestry and divergence patterns from low-coverage sequencing data reveal a complex history of admixture in wild baboons. Mol Ecol. 2016;25:3469–3483. doi:10.1111/mec.13684
  • 28. Huerta-Sanchez E, Jin X, Asan, Bianba Z, Peter BM, Vinckenbosch N, et al. Altitude adaptation in Tibetans caused by introgression of Denisovan-like DNA. Nature. 2014;512:194–197. doi:10.1038/nature13408
  • 29. Racimo F, Sankararaman S, Nielsen R, Huerta-Sanchez E. Evidence for archaic adaptive introgression in humans. Nat Rev Genet. 2015;16:359–371. doi:10.1038/nrg3936
  • 30. Racimo F, Gokhman D, Fumagalli M, Ko A, Hansen T, Moltke I, et al. Archaic adaptive introgression in TBX15/WARS2. Mol Biol Evol. 2017;34:509–524. doi:10.1093/molbev/msw283
  • 31. Chatterjee HJ, Ho SYW, Barnes I, Groves C. Estimating the phylogeny and divergence times of primates using a supermatrix approach. BMC Evol Biol. 2009;9:259. doi:10.1186/1471-2148-9-259
  • 32. Herrera JP, Dávalos LM. Phylogeny and divergence times of lemurs inferred with recent and ancient fossils in the tree. Syst Biol. 2016;65:772–791. doi:10.1093/sysbio/syw035
  • 33. Kistler L, Ratan A, Godfrey LR, Crowley BE, Hughes CE, Lei R, et al. Comparative and population mitogenomic analyses of Madagascar’s extinct, giant ‘subfossil’ lemurs. J Hum Evol. 2015;79:45–54. doi:10.1016/j.jhevol.2014.06.016
  • 34. Tung J, Barreiro LB. The contribution of admixture to primate evolution. Curr Opin Genet Dev. 2017;47:61–68. doi:10.1016/j.gde.2017.08.010
  • 35. Stevens NJ, Seiffert ER, O’Connor PM, Roberts EM, Schmitz MD, Krause C, et al. Palaeontological evidence for an Oligocene divergence between Old World monkeys and apes. Nature. 2013;497:611–614. doi:10.1038/nature12161
  • 36. Huson DH, Klöpper T, Lockhart PJ, Steel MA. Reconstruction of reticulate networks from gene trees. Proceedings of RECOMB 2005: The 9th Annual International Conference Research in Computational Molecular Biology. Berlin: Springer; 2005. pp. 233–249. doi:10.1007/11415770_18
  • 37. Waterhouse RM, Seppey M, Simão FA, Manni M, Ioannidis P, Klioutchnikov G, et al. BUSCO applications from quality assessments to gene prediction and phylogenomics. Mol Biol Evol. 2018;35:543–548. doi:10.1093/molbev/msx319
  • 38. Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol. 2020. doi:10.1093/molbev/msaa015
  • 39. Zhang C, Rabiee M, Sayyari E, Mirarab S. ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees. BMC Bioinformatics. 2018;19:153. doi:10.1186/s12859-018-2129-y
  • 40. Sayyari E, Mirarab S. Fast coalescent-based computation of local branch support from quartet frequencies. Mol Biol Evol. 2016;33:1654–1668. doi:10.1093/molbev/msw079
  • 41. Adkins RM, Honeycutt RL. Molecular phylogeny of the superorder Archonta. Proc Natl Acad Sci U S A. 1991;88:10317–10321. doi:10.1073/pnas.88.22.10317
  • 42. Arnason U, Adegoke JA, Bodin K, Born EW, Esa YB, Gullberg A, et al. Mammalian mitogenomic relationships and the root of the eutherian tree. Proc Natl Acad Sci U S A. 2002;99:8151–8156. doi:10.1073/pnas.102164299
  • 43. Bloch JI, Boyer DM. Grasping Primate origins. Science. 2002;298:1606–1610. doi:10.1126/science.1078249
  • 44. Madsen O, Scally M, Douady CJ, Kao DJ, DeBry RW, Adkins R, et al. Parallel adaptive radiations in two major clades of placental mammals. Nature. 2001;409:610–614. doi:10.1038/35054544
  • 45. Meredith RW, Janecka JE, Gatesy J, Ryder OA, Fisher CA, Teeling EC, et al. Impacts of the Cretaceous terrestrial revolution and KPg extinction on mammal diversification. Science. 2011;334:521–524. doi:10.1126/science.1211028
  • 46. Murphy WJ, Eizirik E, Johnson WE, Zhang YP, Ryder OA, O’Brien SJ. Molecular phylogenetics and the origins of placental mammals. Nature. 2001;409:614–618. doi:10.1038/35054550
  • 47. Murphy WJ, Eizirik E, O’Brien SJ, Madsen O, Scally M, Douady CJ, et al. Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science. 2001;294:2348–2351. doi:10.1126/science.1067179
  • 48. Novacek MJ. Mammalian phylogeny: shaking the tree. Nature. 1992;356:121–125. doi:10.1038/356121a0
  • 49. O’Leary MA, Bloch JI, Flynn JJ, Gaudin TJ, Giallombardo A, Giannini NP, et al. The placental mammal ancestor and the post-K-Pg radiation of placentals. Science. 2013;339:662–667. doi:10.1126/science.1229237
  • 50. Poux C, Douzery EJP. Primate phylogeny, evolutionary rate variations, and divergence times: a contribution from the nuclear gene IRBP. Am J Phys Anthropol. 2004;124:1–16. doi:10.1002/ajpa.10322
  • 51. Janečka JE, Miller W, Pringle TH, Wiens F, Zitzmann A, Helgen KM, et al. Molecular and genomic data identify the closest living relative of primates. Science. 2007;318:792–794. doi:10.1126/science.1147555
  • 52. Mason VC, Li G, Minx P, Schmitz J, Churakov G, Doronina L, et al. Genomic analysis reveals hidden biodiversity within colugos, the sister group to primates. Sci Adv. 2016;2:e1600633. doi:10.1126/sciadv.1600633
  • 53. Schmitz J, Ohme M, Suryobroto B, Zischler H. The colugo (Cynocephalus variegatus, Dermoptera): the primates’ gliding sister? Mol Biol Evol. 2002;19:2308–2312. doi:10.1093/oxfordjournals.molbev.a004054
  • 54. Jarvis ED, Mirarab S, Aberer AJ, Li B, Houde P, Li C, et al. Whole-genome analyses resolve early branches in the tree of life of modern birds. Science. 2014;346:1320–1331. doi:10.1126/science.1253451
  • 55. Pease JB, Haak DC, Hahn MW, Moyle LC. Phylogenomics reveals three sources of adaptive variation during a rapid radiation. PLoS Biol. 2016;14:e1002379. doi:10.1371/journal.pbio.1002379
  • 56. Salichos L, Rokas A. Inferring ancient divergences requires genes with strong phylogenetic signals. Nature. 2013;497:327–331. doi:10.1038/nature12130
  • 57. Minh BQ, Hahn MW, Lanfear R. New methods to calculate concordance factors for phylogenomic datasets. bioRxiv. 2018;487801. doi:10.1101/487801
  • 58. Yoder AD. The phylogenetic position of genus Tarsius: whose side are you on? In: Wright PC, Simons EL, Gursky S, editors. Tarsiers Past, Present, and Future. Rutgers University Press; 2003.
  • 59. Gregory WK. On the classification and phylogeny of the Lemuroidea. Bull Geol Soc Am. 1915:426–446.
  • 60. Pocock RI. On the external characters of the lemurs and of Tarsius. Proc Zool Soc Lond. 1918;88:19–53. doi:10.1111/j.1096-3642.1918.tb02076.x
  • 61. Hartig G, Churakov G, Warren WC, Brosius J, Makalowski W, Schmitz J. Retrophylogenomics place tarsiers on the evolutionary branch of anthropoids. Sci Rep. 2013;3:1756. doi:10.1038/srep01756
  • 62. Jameson NM, Hou Z-C, Sterner KN, Weckle A, Goodman M, Steiper ME, et al. Genomic data reject the hypothesis of a prosimian primate clade. J Hum Evol. 2011;61:295–305. doi:10.1016/j.jhevol.2011.04.004
  • 63. Hayasaka K, Gojobori T, Horai S. Molecular phylogeny and evolution of primate mitochondrial DNA. Mol Biol Evol. 1988;5:626–644. doi:10.1093/oxfordjournals.molbev.a040524
  • 64. Jaworski CJ. A reassessment of mammalian alpha A-crystallin sequences using DNA sequencing: implications for anthropoid affinities of tarsier. J Mol Evol. 1995;41:901–908. doi:10.1007/BF00173170
  • 65. Whitfield JB, Lockhart PJ. Deciphering ancient rapid radiations. Trends Ecol Evol. 2007;22:258–265. doi:10.1016/j.tree.2007.01.012
  • 66. Bond M, Tejedor MF, Campbell KE, Chornogubsky L, Novo N, Goin F. Eocene primates of South America and the African origins of New World monkeys. Nature. 2015;520:538–541. doi:10.1038/nature14120
  • 67. Kubatko LS, Degnan JH. Inconsistency of phylogenetic estimates from concatenated data under coalescence. Syst Biol. 2007;56:17–24. doi:10.1080/10635150601146041
  • 68. Roch S, Steel M. Likelihood-based tree reconstruction on a concatenation of aligned sequence data sets can be statistically inconsistent. Theor Popul Biol. 2015;100C:56–62. doi:10.1016/j.tpb.2014.12.005
  • 69. Warnow T. Concatenation analyses in the presence of incomplete lineage sorting. PLoS Curr. 2015;7. doi:10.1371/currents.tol.8d41ac0f13d1abedf4c4a59f5d17b1f7
  • 70. Bryant D, Hahn MW. The concatenation question. In: Scornavacca C, Delsuc F, Galtier N, editors. Phylogenetics in the genomic era. No commercial publisher | Authors open access book; 2020. p. 3.4:1–3.4:23. Available from: https://hal.archives-ouvertes.fr/hal-02535651.
  • 71. Sayyari E, Mirarab S. Testing for polytomies in phylogenetic species trees using quartet frequencies. Genes. 2018;9:132. doi:10.3390/genes9030132
  • 72. Liu L, Edwards SV. Phylogenetic analysis in the anomaly zone. Syst Biol. 2009;58:452–460. doi:10.1093/sysbio/syp034
  • 73. Swofford DL. PAUP*: Phylogenetic Analysis Using Parsimony (*and other methods), Version 4. Sinauer Assoc Sunderland Mass. 2002.
  • 74. Mendes FK, Livera AP, Hahn MW. The perils of intralocus recombination for inferences of molecular convergence. Philos Trans R Soc Lond Ser B Biol Sci. 2019;374:20180244. doi:10.1098/rstb.2018.0244
  • 75. Springer MS, Gatesy J. The gene tree delusion. Mol Phylogenet Evol. 2016;94:1–33. doi:10.1016/j.ympev.2015.07.018
  • 76. Hoang DT, Vinh LS, Flouri T, Stamatakis A, von Haeseler A, Minh BQ. MPBoot: fast phylogenetic maximum parsimony tree inference and bootstrap approximation. BMC Evol Biol. 2018;18:11–11. doi:10.1186/s12862-018-1131-3
  • 77. He C, Liang D, Zhang P. Asymmetric distribution of gene trees can arise under purifying selection if differences in population size exist. Mol Biol Evol. 2020;37:881–892. doi:10.1093/molbev/msz232
  • 78. Golding GB. The effect of purifying selection on genealogies. In: Donnelly P, Tavaré S, editors. Progress in population genetics and human evolution. New York: Springer Verlag; 1997. pp. 271–285.
  • 79. Przeworski M, Charlesworth B, Wall JD. Genealogies and weak purifying selection. Mol Biol Evol. 1999;16:246–252. doi:10.1093/oxfordjournals.molbev.a026106
  • 80. Slade PF. Most recent common ancestor probability distributions in gene genealogies under selection. Theor Popul Biol. 2000;58:291–305. doi:10.1006/tpbi.2000.1488
  • 81. Williamson S, Orive ME. The genealogy of a sequence subject to purifying selection at multiple sites. Mol Biol Evol. 2002;19:1376–1384. doi:10.1093/oxfordjournals.molbev.a004199
  • 82. Haller BC, Messer PW. SLiM 3: Forward genetic simulations beyond the Wright–Fisher model. Mol Biol Evol. 2019;36:632–637. doi:10.1093/molbev/msy228
  • 83. Mendes FK, Hahn MW. Gene tree discordance causes apparent substitution rate variation. Syst Biol. 2016;65:711–721. doi:10.1093/sysbio/syw018
  • 84. Brunet M, Guy F, Pilbeam D, Mackaye HT, Likius A, Ahounta D, et al. A new hominid from the Upper Miocene of Chad. Nature. 2002;418:145–151. doi:10.1038/nature00879
  • 85. Sigé B, Jaeger J-J, Sudre J, Vianey-Liaud M. Altiatlasius koulchii n. gen. et sp., primate omomyidé du Paléocène supérieur du Maroc, et les origines des euprimates. Palaeontogr Abt A. 1990:31–56.
  • 86. Lartillot N, Lepage T, Blanquart S. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics. 2009;25:2286–2288. doi:10.1093/bioinformatics/btp368
  • 87. Wilkinson RD, Steiper ME, Soligo C, Martin RD, Yang Z, Tavaré S. Dating primate divergences through an integrated analysis of palaeontological and molecular data. Syst Biol. 2011;60:16–31. doi:10.1093/sysbio/syq054
  • 88. Goodman M, Porter CA, Czelusniak J, Page SL, Schneider H, Shoshani J, et al. Toward a phylogenetic classification of Primates based on DNA evidence complemented by fossil evidence. Mol Phylogenet Evol. 1998;9:585–598. doi:10.1006/mpev.1998.0495
  • 89. Yoder AD, Yang Z. Divergence dates for Malagasy lemurs estimated from multiple gene loci: geological and evolutionary context. Mol Ecol. 2004;13:757–773. doi:10.1046/j.1365-294x.2004.02106.x
  • 90. Benton MJ, Donoghue PCJ, Asher RJ, Friedman M, Near TJ, Vinther J. Constraints on the timescale of animal evolutionary history. Palaeontol Electron. 2015;181.1FC:1–106. doi:10.26879/424
  • 91. Edwards SV, Beerli P. Perspective: gene divergence, population divergence, and the variance in coalescence time in phylogeographic studies. Evolution. 2000;54:1839–1854. doi:10.1111/j.0014-3820.2000.tb01231.x
  • 92. Rogers J. Levels of the genealogical hierarchy and the problem of hominoid phylogeny. Am J Phys Anthropol. 1994;94:81–88. doi:10.1002/ajpa.1330940107
  • 93. Carbone L, Harris RA, Gnerre S, Veeramah KR, Lorente-Galdos B, Huddleston J, et al. Gibbon genome and the fast karyotype evolution of small apes. Nature. 2014;513:195–201. doi:10.1038/nature13679
  • 94. Veeramah KR, Woerner AE, Johnstone L, Gut I, Gut M, Marques-Bonet T, et al. Examining phylogenetic relationships among gibbon genera using whole genome sequence data using an approximate bayesian computation approach. Genetics. 2015;200:295–308. doi:10.1534/genetics.115.174425
  • 95. Hamada Y, San AM, Malaivijitnond S. Assessment of the hybridization between rhesus (Macaca mulatta) and long-tailed macaques (M. fascicularis) based on morphological characters. Am J Phys Anthropol. 2016;159:189–198. doi:10.1002/ajpa.22862
  • 96. Osada N, Uno Y, Mineta K, Kameoka Y, Takahashi I, Terao K. Ancient genome-wide admixture extends beyond the current hybrid zone between Macaca fascicularis and M. mulatta. Mol Ecol. 2010;19:2884–2895. doi:10.1111/j.1365-294X.2010.04687.x
  • 97. Hahn MW. Molecular population genetics. 1st ed. Oxford, New York: Oxford University Press; 2018.
  • 98. Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, et al. Ancient admixture in human history. Genetics. 2012;192:1065–1093. doi:10.1534/genetics.112.145037
  • 99. Fan Z, Zhao G, Li P, Osada N, Xing J, Yi Y, et al. Whole-genome sequencing of Tibetan macaque (Macaca thibetana) provides new insight into the macaque evolutionary history. Mol Biol Evol. 2014;31:1475–1489. doi:10.1093/molbev/msu104
  • 100. Yan G, Zhang G, Fang X, Zhang Y, Li C, Ling F, et al. Genome sequencing and comparison of two nonhuman primate animal models, the cynomolgus and Chinese rhesus macaques. Nat Biotechnol. 2011;29:1019–1023. doi:10.1038/nbt.1992
  • 101. Koufos GD. Potential hominoid ancestors for Hominidae. In: Henke W, Tattersall I, editors. Handbook of Paleoanthropology. Berlin, Heidelberg: Springer; 2007. pp. 1761–1790. doi:10.1007/978-3-642-39979-4_44
  • 102. Patterson N, Richter DJ, Gnerre S, Lander ES, Reich D. Genetic evidence for complex speciation of humans and chimpanzees. Nature. 2006;441:1103–1108. doi:10.1038/nature04789
  • 103. Fleagle JG. Apes and humans. Primate Adaptation & Evolution. Elsevier; 2013. pp. 151–168.
  • 104. Than C, Ruths D, Nakhleh L. PhyloNet: a software package for analyzing and reconstructing reticulate evolutionary relationships. BMC Bioinformatics. 2008;9:322. doi:10.1186/1471-2105-9-322
  • 105. Solís-Lemus C, Ané C. Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting. PLoS Genet. 2016;12:e1005896. doi:10.1371/journal.pgen.1005896
  • 106. Roos C, Kothe M, Alba DM, Delson E, Zinner D. The radiation of macaques out of Africa: Evidence from mitogenome divergence times and the fossil record. J Hum Evol. 2019;133:114–132. doi:10.1016/j.jhevol.2019.05.017
  • 107. Belmaker M. The presence of a large cercopithecine (cf. Theropithecus sp.) in the ‘Ubeidiya formation (Early Pleistocene, Israel). J Hum Evol. 2010;58:79–89. doi:10.1016/j.jhevol.2009.08.004
  • 108. Hughes JK, Elton S, O’Regan HJ. Theropithecus and “Out of Africa” dispersal in the Plio-Pleistocene. J Hum Evol. 2008;54:43–77. doi:10.1016/j.jhevol.2007.06.004
  • 109. Larrasoaña JC, Roberts AP, Rohling EJ, Winklhofer M, Wehausen R. Three million years of monsoon variability over the northern Sahara. Clim Dyn. 2003;21:689–698. doi:10.1007/s00382-003-0355-z
  • 110. Larrasoaña JC, Roberts AP, Rohling EJ. Dynamics of green Sahara periods and their role in hominin evolution. PLoS ONE. 2013;8:e76514. doi:10.1371/journal.pone.0076514
  • 111. Vaks A, Woodhead J, Bar-Matthews M, Ayalon A, Cliff RA, Zilberman T, et al. Pliocene–Pleistocene climate of the northern margin of Saharan–Arabian Desert recorded in speleothems from the Negev Desert, Israel. Earth Planet Sci Lett. 2013;368:88–100. doi:10.1016/j.epsl.2013.02.027
  • 112. Coulthard TJ, Ramirez JA, Barton N, Rogerson M, Brücher T. Were rivers flowing across the Sahara during the last interglacial? Implications for human migration through Africa. PLoS ONE. 2013;8:e74834. doi:10.1371/journal.pone.0074834
  • 113. Sahnouni M, Parés JM, Duval M, Cáceres I, Harichane Z, van der Made J, et al. 1.9-million- and 2.4-million-year-old artifacts and stone tool-cutmarked bones from Ain Boucherit, Algeria. Science. 2018;362(6420):1297–1301. doi:10.1126/science.aau0008
  • 114. de Menocal PB. African climate change and faunal evolution during the Pliocene–Pleistocene. Earth Planet Sci Lett. 2004;220:3–24. doi:10.1016/S0012-821X(04)00003-2
  • 115. Slatkin M, Pollack JL. Subdivision in an ancestral species creates asymmetry in gene trees. Mol Biol Evol. 2008;25:2241–2246. doi:10.1093/molbev/msn172
  • 116. Kuritzin A, Kischka T, Schmitz J, Churakov G. Incomplete lineage sorting and hybridization statistics for large-scale retroposon insertion data. PLoS Comput Biol. 2016;12:e1004812. doi:10.1371/journal.pcbi.1004812
  • 117. Springer MS, Molloy EK, Sloan DB, Simmons MP, Gatesy J. ILS-aware analysis of low-homoplasy retroelement insertions: Inference of species trees and introgression using quartets. J Hered. 2020;111:147–168. doi:10.1093/jhered/esz076
  • 118.Gernhard T. New analytic results for speciation times in neutral models. Bull Math Biol. 2008;70:1082–1097. 10.1007/s11538-007-9291-0 [DOI] [PubMed] [Google Scholar]
  • 119.Gligor M, Ganzhorn JU, Rakotondravony D, Ramilijaona OR, Razafimahatratra E, Zischler H, et al. Hybridization between mouse lemurs in an ecological transition zone in southern Madagascar. Mol Ecol. 2009;18:520–533. 10.1111/j.1365-294X.2008.04040.x [DOI] [PubMed] [Google Scholar]
  • 120.Pastorini J, Zaramody A, Curtis DJ, Nievergelt CM, Mundy NI. Genetic analysis of hybridization and introgression between wild mongoose and brown lemurs. BMC Evol Biol. 2009;9:32 10.1186/1471-2148-9-32 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 121.Williams RC, Blanco MB, Poelstra JW, Hunnicutt KE, Comeault AA, Yoder AD. Conservation genomic analysis reveals ancient introgression and declining levels of genetic diversity in Madagascar’s hibernating dwarf lemurs. Heredity. 2020;124:236–251. 10.1038/s41437-019-0260-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 122.Wyner YM, Johnson SE, Stumpf RM, Desalle R. Genetic assessment of a white-collared×red-fronted lemur hybrid zone at Andringitra, Madagascar. Am J Primatol. 2002;57:51–66. 10.1002/ajp.10033 [DOI] [PubMed] [Google Scholar]
  • 123.Gnerre S, MacCallum I, Przybylski D, Ribeiro FJ, Burton JN, Walker BJ, et al. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci U S A. 2011;108:1513–1518. 10.1073/pnas.1017351108 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 124.English AC, Richards S, Han Y, Wang M, Vee V, Qu J, et al. Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology. PLoS ONE. 2012;7:e47768 10.1371/journal.pone.0047768 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 125.Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–410. 10.1016/S0022-2836(05)80360-2 [DOI] [PubMed] [Google Scholar]
  • 126.Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421 10.1186/1471-2105-10-421 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 127.van Dongen S. Graph clustering by flow simulation. PhD thesis, University of Utrecht. 2000. Available from: http://www.library.uu.nl/digiarchief/dip/diss/1895620/full.pdf.
  • 128.Wattam AR, Abraham D, Dalay O, Disz TL, Driscoll T, Gabbard JL, et al. PATRIC, the bacterial bioinformatics database and analysis resource. Nucleic Acids Res. 2014;42:D581–D591. 10.1093/nar/gkt1099 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 129.Sela I, Ashkenazy H, Katoh K, Pupko T. GUIDANCE2: accurate detection of unreliable alignment regions accounting for the uncertainty of multiple parameters. Nucleic Acids Res. 2015;43:W7–W14. 10.1093/nar/gkv318 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 130.Katoh K, Standley DM. Mafft multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30:772–780. 10.1093/molbev/mst010 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 131.Capella-Gutiérrez S, Silla-Martínez JM, Gabaldon T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25:1972–1973. 10.1093/bioinformatics/btp348 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 132.Chernomor O, von Haeseler A, Minh BQ. Terrace aware data structure for phylogenomic inference from supermatrices. Syst Biol. 2016;65:997–1008. 10.1093/sysbio/syw037 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 133.Hoang DT, Chernomor O, von Haeseler A, Minh BQ, Vinh LS. UFBoot2: improving the ultrafast bootstrap approximation. Mol Biol Evol. 2018;35:518–522. 10.1093/molbev/msx281 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 134.Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat Methods. 2017;14:587–589. 10.1038/nmeth.4285 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 135.Eaton DAR, Ree RH. Inferring phylogeny and introgression using RADseq data: an example from flowering plants (Pedicularis: Orobanchaceae). Syst Biol. 2013;62:689–706. 10.1093/sysbio/syt032 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 136.Dunn OJ. Confidence intervals for the means of dependent, normally distributed variables. J Am Stat Assoc. 1959;54:613–621. 10.2307/2282541 [DOI] [Google Scholar]
  • 137.Sidak Z. Rectangular confidence regions for the means of multivariate normal distributions. J Am Stat Assoc. 1967;62:626–633. 10.2307/2283989 [DOI] [Google Scholar]
  • 138.Lartillot N, Philippe H. A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol Biol Evol. 2004;21:1095–1109. 10.1093/molbev/msh112 [DOI] [PubMed] [Google Scholar]
  • 139.Thorne JL, Kishino H, Painter IS. Estimating the rate of evolution of the rate of molecular evolution. Mol Biol Evol. 1998;15:1647–1657. 10.1093/oxfordjournals.molbev.a025892 [DOI] [PubMed] [Google Scholar]

Decision Letter 0

Roland G Roberts

22 Apr 2020

Dear Dr Vanderpool,

Thank you for submitting your manuscript entitled "Primate phylogenomics uncovers multiple rapid radiations and ancient interspecific introgression" for consideration as a Research Article by PLOS Biology.

Your manuscript has now been evaluated by the PLOS Biology editorial staff, as well as by an academic editor with relevant expertise, and I'm writing to let you know that we would like to send your submission out for external peer review.

However, before we can send your manuscript to reviewers, we need you to complete your submission by providing the metadata that is required for full assessment. To this end, please login to Editorial Manager where you will find the paper in the 'Submissions Needing Revisions' folder on your homepage. Please click 'Revise Submission' from the Action Links and complete all additional questions in the submission questionnaire.

Please re-submit your manuscript within two working days, i.e. by Apr 24 2020 11:59PM.

Login to Editorial Manager here: https://www.editorialmanager.com/pbiology

During resubmission, you will be invited to opt-in to posting your pre-review manuscript as a bioRxiv preprint. Visit http://journals.plos.org/plosbiology/s/preprints for full details. If you consent to posting your current manuscript as a preprint, please upload a single Preprint PDF when you re-submit.

Once your full submission is complete, your paper will undergo a series of checks in preparation for peer review. Once your manuscript has passed all checks it will be sent out for review.

Given the disruptions resulting from the ongoing COVID-19 pandemic, please expect delays in the editorial process. We apologise in advance for any inconvenience caused and will do our best to minimize impact as far as possible.

Feel free to email us at plosbiology@plos.org if you have any queries relating to your submission.

Kind regards,

Roli Roberts

Roland G Roberts, PhD,

Senior Editor

PLOS Biology

Decision Letter 1

Roland G Roberts

6 Jun 2020

Dear Dr Vanderpool,

Thank you very much for submitting your manuscript "Primate phylogenomics uncovers multiple rapid radiations and ancient interspecific introgression" for consideration as a Research Article at PLOS Biology. Your manuscript has been evaluated by the PLOS Biology editors, an Academic Editor with relevant expertise, and by three independent reviewers. We also recruited a fourth, but they have been unable to submit in timely fashion.

You'll see that all three reviewers were very positive about your study, but each requests a number of textual and presentational changes, and in some cases some additional analyses, which should be addressed before further consideration.

In light of the reviews (below), we will not be able to accept the current version of the manuscript, but we would welcome re-submission of a much-revised version that takes into account the reviewers' comments. We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent for further evaluation by the reviewers.

We expect to receive your revised manuscript within 2 months.

Please email us (plosbiology@plos.org) if you have any questions or concerns, or would like to request an extension. At this stage, your manuscript remains formally under active consideration at our journal; please notify us by email if you do not intend to submit a revision so that we may end consideration of the manuscript at PLOS Biology.

**IMPORTANT - SUBMITTING YOUR REVISION**

Your revisions should address the specific points made by each reviewer. Please submit the following files along with your revised manuscript:

1. A 'Response to Reviewers' file - this should detail your responses to the editorial requests, present a point-by-point response to all of the reviewers' comments, and indicate the changes made to the manuscript.

*NOTE: In your point by point response to the reviewers, please provide the full context of each review. Do not selectively quote paragraphs or sentences to reply to. The entire set of reviewer comments should be present in full and each specific point should be responded to individually, point by point.

You should also cite any additional relevant literature that has been published since the original submission and mention any additional citations in your response.

2. In addition to a clean copy of the manuscript, please also upload a 'track-changes' version of your manuscript that specifies the edits made. This should be uploaded as a "Related" file type.

*Re-submission Checklist*

When you are ready to resubmit your revised manuscript, please refer to this re-submission checklist: https://plos.io/Biology_Checklist

To submit a revised version of your manuscript, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' where you will find your submission record.

Please make sure to read the following important policies and guidelines while preparing your revision:

*Published Peer Review*

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*PLOS Data Policy*

Please note that as a condition of publication PLOS' data policy (http://journals.plos.org/plosbiology/s/data-availability) requires that you make available all data used to draw the conclusions arrived at in your manuscript. If you have not already done so, you must include any data used in your manuscript either in appropriate repositories, within the body of the manuscript, or as supporting information (N.B. this includes any numerical values that were used to generate graphs, histograms etc.). For an example see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5

*Blot and Gel Data Policy*

We require the original, uncropped and minimally adjusted images supporting all blot and gel results reported in an article's figures or Supporting Information files. We will require these files before a manuscript can be accepted so please prepare them now, if you have not already uploaded them. Please carefully read our guidelines for how to prepare and upload this data: https://journals.plos.org/plosbiology/s/figures#loc-blot-and-gel-reporting-requirements

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosbiology/s/submission-guidelines#loc-materials-and-methods

Thank you again for your submission to our journal. We hope that our editorial process has been constructive thus far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Roli Roberts

Roland G Roberts, PhD,

Senior Editor

PLOS Biology

*****************************************************

REVIEWERS' COMMENTS:

Reviewer #1:

[identifies himself as Matt Pennell]

This is a really exceptional contribution to phylogenetics. The analyses were well-thought and thoroughly done -- I feel very confident in your conclusions. And the writing is clear throughout; I appreciated the mix of methodological details with natural history/biological context. I anticipate that I will add this to the reading list for my phylogenetics class next year as a superb example of the power of modern data and approaches to phylogenomics.

I have only a few minor points that I think might be worth addressing before publication.

Line 150: I think this is slightly confusing since the underlying gene trees used in the ASTRAL analysis were also estimated with IQTREE2. The way it is written it makes it seem that IQTREE2 was used only for the concatenated analysis.

Line 163: I will admit my ignorance and say that I didn't know what the phrase "maximal local posteriors" meant and had to look it up. Given that this is a relatively new/uncommon term, I think it might be worth devoting an extra clause or sentence to explaining.

Line 167-168: I would really appreciate a few additional sentences giving an outline of this controversy. As it stands now, it is really hard to know what all the fuss is about. In a later section, you do a really great job discussing the controversies over the placement of tarsiers and I think something similar would be very useful here.

Line 237-238: I realize that this is a throwaway line/citation but I think the evidence for the association between primate diversification and global temperature is pretty weak in my view. I guess it is fine to speculate but this is presented as a clear fact. Given how precise the rest of the paper is, this statement is uncharacteristically flippant.

Line 264: I understand what you are getting at here but I think it is worth spelling out the argument for using parsimony in this case; it is not really clear from the text alone why you think parsimony is going to be a more consistent estimator than ML here.

Line 294: Do you have any (more direct) evidence that intralocus recombination is an issue here? I think it may well be true but I think this is a stretch to infer from the increased concordance for the MP trees.

Line 310: I understand why you had to choose small subsets of the data for the dating analyses but I am wondering what the justification is for selecting genes at *random*. Given all the excellent work you've done to determine the branching pattern, I am thinking that it might be worth leveraging this information to pick genes for the dating analyses to reduce the topological noise. Maybe this is a bad idea (and please tell me if it is) but if you wanted to estimate divergence times, why not select randomly from genes that matched the species topology?

Line 333: Could you please note in the MS why you didn't include most of the fossils used by Herrera et al.? Were most of the ones you didn't use associated with tips rather than nodes?

Line 418: I do not think that the line about PhyloNet adds anything to the paper. Who knows what to make of the failure to converge?

Again, congratulations on an outstanding achievement.

Reviewer #2:

Vanderpool et al. analyze primate genomes (including three new ones) to assess phylogenetic relationships, incomplete lineage sorting (ILS), and introgression among divergent lineages. The paper is generally well done and well written, and the conclusions generally follow from the analyses done by the authors. My primary requirement for revision is to simply post all relevant tree and data files and not just some of these, an essential revision. The impact of the work will be high I think, and otherwise, the statistical analyses are high quality, and the supplementary information and figures are fine.

The authors also might consider the following in revising their manuscript:

1) line 85. The authors note that, "Compared to recent hybridization, introgression that occurred between two or more ancestral lineages (represented by internal branches on a phylogeny) is difficult to detect." Introgression involving completely extinct side branches in these trees might also be a problem, as is hybridization with extant lineages for which genomes have not been published. Is there any way to account for this type of introgression as well? Authors could comment on this point either way, as I fear that sometimes these types of hybridization events can get lost in the sauce of comparative genomic studies.

2) Table 1. What are the contig N50s for each assembly? The genome sequencing and assembly methods in the methods section could be fleshed out a bit more than the current terse text. It is not my specialty, but the description here seems a bit minimal, and it might be difficult for the reader to understand exactly how the genomes were assembled with long reads or without in one case, if I am reading things correctly?

3) line 188. The authors note regarding conflicts among gene trees at a node that, "...the concordance factors also indicate that a majority of individual topologies have histories that differ from the estimated species tree." But, most of this conflict, I would bet, is not due to different gene 'histories', and much of this conflict is surely due simply to gene tree reconstruction error. For example, at the node that groups the tarsier (Carlito) with Simiiformes, the authors report 60% of genes conflicting, but in analyses of retroposon insertions more than 100 such rare genomic events support this node with absolutely zero conflict. Furthermore, simulations have shown for similar situations that there is extensive gene tree reconstruction error for fairly short internodes that are more than 50 million years old. So, I do not agree with the authors' interpretation here. The authors acknowledge this later on line 198, but I do not see any necessity in saying one thing and then ten lines later reversing course. Same for homoplasious sites versus sites that are incongruent due to ILS or introgression; it is a challenge to distinguish these from each other after a certain level of divergence or maybe even for fairly recent divergences such as chimp, human, gorilla. A recent study based on large insertions shows that such rare genomic changes completely uniformly favor the chimp+human grouping with absolutely zero conflict.

4) line 204. As noted just above, for placement of tarsier at least, the authors' argument here on biological causes of gene tree incongruence versus gene tree reconstruction error that causes conflict does not hold up from my perspective. So, I think a more cautious interpretation is warranted given the uncertainty.

5) line 216. I see here that the authors mention the position of tarsier and note that there is extensive conflict among characters and among gene trees at this node, but this is completely contrary to a published analysis of transposon insertions where more than 100 transposons uniquely support this clade with absolutely no conflicts. So, if the authors think that the gene tree and character conflicts are actually real, this needs to be reconciled with the published transposon insertion data. I think it would be a challenge to conclude other than that: 1) the transposon data were made up and are not real (i.e., these other authors cheated), 2) the sequence data analyzed in the current manuscript are homoplasious and gene trees are inaccurately reconstructed, or 3) some sort of weird natural selection drove retroposon insertions to be fixed very rapidly relative to nucleotide substitutions at both silent third codons and selected first and second sites in protein coding genes. I think the third interpretation would be a challenge to argue for, and the first explanation implies that someone else cheated.

6) line 222. The studies that cite conflicting previous placements of the tarsier are ones that analyzed little data. There are many previous analyses that have strongly supported tarsier plus Simiiformes based on extensive data, and I think none based on lots of data that have supported alternatives, except for analyses where researchers have botched their analyses (e.g., Song et al., 2012).

7) line 233. Change "Concatenation Affects Resolution of the New World Monkey Radiation" to "ML Concatenation Affects Resolution of the New World Monkey Radiation"? The parsimony concatenation seems to agree with the ASTRAL tree here, and only the ML concatenation tree conflicts? Also, the large ML concatenation tree of Springer et al. (2012) conflicts with the ASTRAL tree from the current study but agrees with ML concatenation analysis of the current study. Perhaps this is a bias due to ML concatenation but not concatenation generally; what does the Perelman et al. ML concatenation support? If this one also supports the authors' interpretation, this further supports the discussion here. An interesting empirical result as I have seen few conflicts between ML concatenation and ASTRAL in previous published work.

8) line 258. Change "relatively low posterior support" to "very low posterior support". The PP score is really bordering on minimal?

9) line 290. The inference using parsimony gene trees and ASTRAL is novel and quite interesting. There has been an affliction in the systematics community in that workers are obsessed with ML methods due to statistical consistency, but of course, this only applies when there is no ILS and there is a single 'gene tree' for all of the data. Given that there is not unlimited data of this type in any empirical case study, it could be argued that the early simulations that showed parsimony to be lacking/inconsistent are, in the end, bogus and irrelevant. It is not the first time that arrogance of evolutionary modelers has led to mass confusion in the field. I fear that the same thing is currently happening on a much grander scale in ecological modeling of COVID death count predictions, but in this latter example, we are dealing with actual life and death. Sad... But, it is nice that the authors here have actually used a simulation result from the Hahn group and taken the time to run the parsimony trees to reveal this interesting ASTRAL pattern.

10) line 294. Change "biases of concatenation" to "biases of ML concatenation for gene sub-segments with conflicting histories".

11) line 358 and above paragraphs. I found the discussion of divergence times to be reasonable, even-handed, and well stated. The authors discuss most of the critical issues that impact estimation of dates such as these and explain possible reasons for discrepancies with previous work.

12) line 370. The authors note that it is a challenge to infer introgression at deep nodes due to multiple hits. This is true, but recently Springer et al. have argued that this is a rationale for instead using transposon insertion characters for this purpose, given that these characters are generally thought to be lower-homoplasy characters based on their mode of mutation. Have the authors attempted to use these characters for inferring gene flow among divergent lineages? If there are enough such characters, I would expect these data to be much more effective than the strategy taken by the authors who used gene trees, as opposed to individual SNPs. For using gene trees, the challenge is always to determine recombination break points accurately and to deal with different expected average sizes of coalescence-genes for gene trees that support the species tree vs. gene trees that conflict with the species tree (my understanding is that the latter gene trees are generally expected to be shorter). It perhaps goes beyond the scope of the current paper to score transposon insertions, but since genomes are assembled, this would not be too much work to do? If not, it might be worthwhile to at least suggest this strategy for future work in their discussion of this issue, as I think this is the most productive way forward, especially for divergences that are even older, where even more multiple hits are expected and gene tree reconstruction becomes even more challenging.

13) line 376. I am guessing that lack of usage of this test is in large part due to the fact that 'genes' like the ones used in this study often span multiple recombination units, so the protein-coding exons strung together here do not likely represent single gene tree histories, but instead many. The authors acknowledge this somewhat, but I think it is a bigger problem than they maybe let on here?

14) line 418. In addition to PhyloNet, Cecile Ane's group (and others) have developed methods in which introgression pathways are allowed in an ASTRAL-like quartet approach to the ILS+gene flow problem. Might the authors attempt an analysis such as this that accounts for both ILS and gene flow to infer a network using an actual optimality approach (as opposed to PhyloNet which remains a bit mysterious to me in terms of how and why and what it spits out at the end of the process)?

15) line 450. In terms of caveats, I see that the authors stress more here the issue of recombination, population subdivision, etc., but there are also the problematic aspects of negative selection, which impacts neutral MSC assumptions, in particular because different protein-coding loci are likely under a variety of selective constraints, as well as selective sweeps and diversifying selection. Proponents of the coalescence approach have tried to avoid these hard truths, but we are starting to see some movement on this front after a recent (and ongoing) exchange in the literature. For example, He et al. (2019: MBE; "Asymmetric Distribution of Gene Trees Can Arise under Purifying Selection If Differences in Population Size Exist"), but I think the situation is even more dire than this publication lets on... I predict that it will become very clear that negative selection, of various degrees, will be very problematic for MSC inference and interpretation as time moves on (as argued in Springer and Gatesy, 2016, but denied by "leaders in our field" in Edwards et al., 2016). Sometimes sets of leaders are really just followers, perhaps?

16) line 476. Again here, as long as there are enough active retroelements, I think using these to infer gene flow of ancient lineages solves this problem?

17) line 488. The authors conclude that, "Our analyses reveal how concatenation of genes or even of exons can mislead maximum likelihood phylogenetic inference in the presence of discordance, but also how to overcome the biases introduced by concatenation in some cases." I am not necessarily convinced that concatenation has failed in any way in this study relative to coalescence methods such as ASTRAL. A particular brand of this format, ML concatenation, might have failed?

18) line 545. The authors note that for concatenation, they did the following: "IQ-TREE v2-rc1 was used with all 1,730 aligned loci to estimate a maximum likelihood concatenated (ML-CONCAT) tree with an edge-linked, proportional-partition model". It might be best for those who have not used IQ-TREE to describe exactly what is entailed in such an analysis. Was a single substitution model used for the entire concatenation, or were separate substitution models used for each gene in the overall alignment as in the coalescence analyses? Was just one set of branch lengths used, or were different genes allowed to have different branch lengths? Were different codon positions given different branch lengths and/or substitution models?

19) line 575. Why just use four taxa here? This would seem to increase the probability of long branch artifacts as opposed to more complete sampling of taxa that will subdivide long branches? This type of sampling might in fact lead to asymmetry in results, rather than provide information on that asymmetry? How do results, for asymmetry, differ when including all taxa (as in the initial estimates that focused on seven nodes) versus the redos with just four taxa that the authors seem to prefer? Using four taxa automatically gets rid of conflict by removing lineages perhaps and increases the number of relevant loci, but perhaps this is sort of sweeping real issues under the rug?

20) line 584. Given that the authors note much evidence for introgression and also argue that concatenation can be misleading, it is potentially problematic that the clock analyses are all based on concatenation, which would seem to offer distortions. The clock methods used do not take extensive introgression or ILS into account. Perhaps the authors should qualify that ILS and introgression could cause distortions in their concatenated clock analyses (perhaps too old dates due to deep coalescence or perhaps too shallow dates due to introgression, etc.). The authors could run *BEAST which takes ILS into account to get divergence times, as the authors think this is a concern, but note that this coalescence program, like all coalescence methods, is also deeply impacted by introgression (just like concatenation). The authors naively seem to think that introgression does not impact standard coalescence methods, but this is certainly not the case, in particular for what are considered by some the best methods (i.e., *BEAST, which has a least common denominator problem with outlier genes as noted in the original description of the method). The authors should also note how genes were modeled in the concatenated IQ-TREE analyses relative to the concatenated clock analyses. Again, was each gene given a different model or one model for all genes?

21) line 608. The authors note that, "Single-copy ortholog alignments of CDS sequences produced using Guidance2 and the Nexus formatted dated species tree phylogeny are available through DataDryad (https://doi.org/10.5061/dryad.rfj6q577d)." The authors should also provide optimal gene trees, bootstrap consensus gene trees, and concatenated matrices (with gene partitions) used in their IQ-TREE and clock analyses so that their analyses can be replicated by others if need be. This is perhaps my strongest comment on revision. It is hard to tell what was done unless all of this material is posted for other scientists to see. So, this revision is essential from my perspective.

Reviewer #3:

The manuscript submitted by Vanderpool et al. describes the generation of three novel primate genome sequences, the mining of orthologous protein coding genes across 23 published primate genomes and several outgroup genomes, and an in-depth phylogenomic analysis. The authors estimate the phylogeny and divergence times of primates and compare their results to those of previous studies. The methods and analyses are, for the most part, sound. The results confirm relationships identified in many previous studies on primate phylogeny. The real novelty in this manuscript is the detailed analysis of genealogical discordance and evidence for ILS and ancient hybridization within several lineages of primates. This section of the manuscript is quite strong and I found the analyses and interpretations to be well-reasoned and sound, and provide important insights into this aspect of early primate evolutionary history. I believe that the findings are very interesting and publishable, but there are a number of aspects of the analysis that could be improved, which I believe will lend greater confidence to their conclusions.

1. The new reference genomes are fairly phylogenetically restricted. It might help the authors' message if there is some mention as to why the addition of these particular taxa is important or strategic from the standpoint of phylogenetic resolution and hybridization.

2. On page 5, it would be helpful to mention the type of sequencing platforms used for each genome.

3. Given the availability of whole genome sequences from so many primates, I was disappointed that the phylogenetic analysis was restricted to only 1,730 single-copy genes, and did not incorporate non-coding sequence alignments. In many cases they could have better ruled out gene tree reconstruction error if the phylogenetic information content of each locus was increased. For example, primate monophyly is supported for only 55% of the gene trees, which in most cases is probably due to lack of phylogenetically informative sites and probable rooting errors. Does gene tree discordance from the assumed species tree correlate with CDS alignment length? Also, it would have allowed for an assessment of discordance across a more significant majority of the genome, rather than a small proportion of coding sites. In my opinion, the quality of this report would be markedly improved if the scope of the analysis was expanded beyond protein coding loci to include intronic sequences and the alignments based on genomic sequences rather than inferred isoforms, which are prone to annotation errors.

Also, I feel that the authors have missed an opportunity to catalog rare genomic changes (insertion and deletion events) such as retroelement insertions. Because these events occur only once at a single location they are not prone to within locus recombination, and can be analyzed with D statistics. The addition of these types of analyses, particularly with regard to the basal relationships within Cebidae and Colobinae, would strengthen the results.

4. Some important details for the 1,730 loci are missing, such as the chromosome distribution of the loci, whether any are tightly linked, and the breakdown between autosomes and sex chromosomes. Perhaps add a table with this information.

5. I did not see any methods or details surrounding filtration of dubious protein coding sequence alignment regions, given that the gene set is based on the longest isoform from each species, and the ends of isoforms are often poorly annotated. This step is fairly important for the analysis of deeper evolutionary divergences, where alignment anomalies are commonplace and methods have been described to deal with these issues (see Liu et al. PNAS [2017], Mason et al. Sci Adv. [2016], Springer & Gatesy, Systematics and Biodiv. [2018]). Some reporting of these details would be important to have confidence in the assessment of gene and sitewise discordance estimates, and that these values aren't unnecessarily inflated due to poor alignment quality and exon, isoform paralogy. For example, on pgs. 9-10 (lines 199-201) the authors correctly point out the various technical problems (including poorly aligned sequences, etc.) that can result in gene tree error, but they don't provide any in-depth analysis of their own data set in this regard to determine how often gene tree error can be attributed to these different factors.

6. The use of a single outgroup (and most notably the mouse, which is extremely accelerated in terms of nucleotide substitution rate compared to all other placental mammals) is potentially problematic and may have led to long branch attraction and other artifacts that would increase the amount of discordance pertaining to the relationships between primates, colugos and treeshrews. The authors should add at least one lagomorph; the rabbit and pika genomes are of similar quality and annotation as the primate genomes. Also, in mentioning previous literature, the authors might note that studies that used rare genomic changes (indels and retroelement insertions) produced statistically significant results that are not prone to the same bootstrap issues raised by the authors on lines 171-174. **Minor point, the word "probability" is missing after "posterior" on line 172.

7. Claiming parsimony (pg. 13) would not produce a biased topology with concatenation methods seems a bit of an overstatement. Parsimony methods are more prone to long branch attraction artifacts than ML methods. The authors should consider qualifying their comment here.

8. Line 391, "then re-aligned orthologs present in a single copy in each taxon"…please clarify why this was done.

Decision Letter 2

Roland G Roberts

29 Sep 2020

Dear Dr Vanderpool,

Thank you for submitting your revised Research Article entitled "Primate phylogenomics uncovers multiple rapid radiations and ancient interspecific introgression" for publication in PLOS Biology. I have now obtained advice from two of the original reviewers and have discussed their comments with the Academic Editor.

Based on the reviews, we will probably accept this manuscript for publication, assuming that you will modify the manuscript to address the remaining points raised by the reviewers. Please also make sure to address the data and other policy-related requests noted at the end of this email.

IMPORTANT:

a) Please attend to the outstanding requests from rev #2.

b) Please address my Data Policy requests (further down letter).

c) You currently state that an ethics statement is not needed. However, we note that you used samples of primate heart and blood tissue provided by San Diego Zoo and Washington National Primate Research Center. Please could you provide details of the terms and approvals under which those samples were obtained?

We expect to receive your revised manuscript within two weeks. Your revisions should address the specific points made by each reviewer. In addition to the remaining revisions and before we will be able to formally accept your manuscript and consider it "in press", we also need to ensure that your article conforms to our guidelines. A member of our team will be in touch shortly with a set of requests. As we can't proceed until these requirements are met, your swift response will help prevent delays to publication.

To submit your revision, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' to find your submission record. Your revised submission must include the following:

- a cover letter that should detail your responses to any editorial requests, if applicable

- a Response to Reviewers file that provides a detailed response to the reviewers' comments (if applicable)

- a track-changes file indicating any changes that you have made to the manuscript.

*Copyediting*

Upon acceptance of your article, your final files will be copyedited and typeset into the final PDF. While you will have an opportunity to review these files as proofs, PLOS will only permit corrections to spelling or significant scientific errors. Therefore, please take this final revision time to assess and make any remaining major changes to your manuscript.

NOTE: If Supporting Information files are included with your article, note that these are not copyedited and will be published as they are submitted. Please ensure that these files are legible and of high quality (at least 300 dpi) in an easily accessible file format. For this reason, please be aware that any references listed in an SI file will not be indexed. For more information, see our Supporting Information guidelines:

https://journals.plos.org/plosbiology/s/supporting-information

*Published Peer Review History*

Please note that you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*Early Version*

Please note that an uncorrected proof of your manuscript will be published online ahead of the final version, unless you opted out when submitting your manuscript. If, for any reason, you do not want an earlier version of your manuscript published online, uncheck the box. Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us as soon as possible if you or your institution is planning to press release the article.

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosbiology/s/submission-guidelines#loc-materials-and-methods

Please do not hesitate to contact me should you have any questions.

Sincerely,

Roli Roberts

Roland G Roberts, PhD,

Senior Editor,

rroberts@plos.org,

PLOS Biology

------------------------------------------------------------------------

ETHICS STATEMENT:

-- Please include the full name of the IACUC/ethics committee that reviewed and approved the animal care and use protocol/permit/project license. Please also include an approval number.

-- Please include the specific national or international regulations/guidelines to which your animal care and use protocol adhered. Please note that institutional or accreditation organization guidelines (such as AAALAC) do not meet this requirement.

-- Please include information about the form of consent (written/oral) given for research involving human participants. All research involving human participants must have been approved by the authors' Institutional Review Board (IRB) or an equivalent committee, and all clinical investigation must have been conducted according to the principles expressed in the Declaration of Helsinki.

------------------------------------------------------------------------

DATA POLICY:

You may be aware of the PLOS Data Policy, which requires that all data be made available without restriction: http://journals.plos.org/plosbiology/s/data-availability. For more information, please also see this editorial: http://dx.doi.org/10.1371/journal.pbio.1001797

Many thanks for providing raw data, alignments and trees in NCBI BioProject and Dryad. However, we also ask that all individual quantitative observations that underlie the data summarized in the figures and results of your paper be made available in one of the following forms:

1) Supplementary files (e.g., excel). Please ensure that all data files are uploaded as 'Supporting Information' and are invariably referred to (in the manuscript, figure legends, and the Description field when uploading your files) using the following format verbatim: S1 Data, S2 Data, etc. Multiple panels of a single or even several figures can be included as multiple sheets in one excel file that is saved using exactly the following convention: S1_Data.xlsx (using an underscore).

2) Deposition in a publicly available repository. Please also provide the accession code or a reviewer link so that we may view your data before publication.

Regardless of the method selected, please ensure that you provide the individual numerical values that underlie the summary data displayed in the following figure panels as they are essential for readers to assess your analysis and to reproduce it: Figs 3, S3, S4. NOTE: the numerical data provided should include all replicates AND the way in which the plotted mean and errors were derived (it should not present only the mean/average values).

Please also ensure that figure legends in your manuscript include information on where the underlying data can be found (e.g. supplementary file, Dryad, etc.), and ensure your supplemental data file/s has a legend.

Please ensure that your Data Statement in the submission system accurately describes where your data can be found.

------------------------------------------------------------------------

REVIEWERS' COMMENTS:

Reviewer #2:

This is a re-review by reviewer #2, so I will mainly comment on issues that I brought up in the first round. Generally, the authors responded to nearly all of my queries with thoughtful responses and edits to their paper, and they included a more than required response to my query about negative selection by completely redoing a published simulation study that they found lacking. My only two remaining requests have to do with publication of alignments for both genes and transposon insertions.

The authors note that they have posted their trimmed DNA sequence alignments, and they have clarified how this trimming was done. However, for other researchers to exactly follow what was originally aligned and then tossed from each alignment is critical for interpretation of their results and conclusions. This is because many of their conclusions have to do with recombination and possible gene flow. These inferences derive from conflicts among loci and within loci that are caused by these processes. So, in order to track what is what in their overall analysis, the precise homology relationships of sequences should be presented both in the original alignments of what the authors thought are homologous regions of each species' genome (to see whether initial annotations of genes are justified) and also in the final alignments that were used in analyses (which the authors have posted already).

Second, I think it would be important to post a subset of the alignments for transposon insertions for the position of the tarsier relative to bushbaby and human. This would be an important contribution, because as far as I know, no shared derived transposons have been documented for the human+bushbaby clade in the literature, but the authors have discovered possibly as many as 341. Hartig et al. (2013) instead found 104 transposon insertions that cleanly support a human+tarsier clade with no conflicts at this node. To be blunt, I do not really believe the new result that support for this latter clade (435 transposons) is countered by extreme conflict (341 transposons), because the authors have not looked at the transposon insertion site alignments in detail to assess the quality of the alignments as done in Hartig et al. Although I do not expect the authors to go through all of these 341 conflicting transposon alignments, as this is not the primary focus of the paper, they should check a subset of these to be sure that at least a good number of these are convincing in terms of alignment ends, insertion points, homology of the transposon insert, etc., which can only be done currently, I think, by looking at these by eye. Ideally, it would be great if they could document 10-20 'clean' and convincing transposon insertions for the human+bushbaby clade and present these alignments in a supplementary file that would document these to skeptics (like me...).

Aside from these two points, which can be easily addressed without so much new work, I commend the authors on their nice phylogenomic study and in their thoroughness in dealing with reviewers' queries.

Reviewer #3:

I thank the authors for a very detailed and thoughtful response to my previous comments. I am satisfied with the revisions, and I have no further concerns.

Decision Letter 3

Roland G Roberts

2 Nov 2020

Dear Dr Vanderpool,

On behalf of my colleagues and the Academic Editor, Chris D Jiggins, I am pleased to inform you that we will be delighted to publish your Research Article in PLOS Biology.

PRODUCTION PROCESS

Before publication you will see the copyedited word document (within 5 business days) and a PDF proof shortly after that. The copyeditor will be in touch shortly before sending you the copyedited Word document. We will make some revisions at copyediting stage to conform to our general style, and for clarification. When you receive this version you should check and revise it very carefully, including figures, tables, references, and supporting information, because corrections at the next stage (proofs) will be strictly limited to (1) errors in author names or affiliations, (2) errors of scientific fact that would cause misunderstandings to readers, and (3) printer's (introduced) errors. Please return the copyedited file within 2 business days in order to ensure timely delivery of the PDF proof.

If you are likely to be away when either this document or the proof is sent, please ensure we have contact information of a second person, as we will need you to respond quickly at each point. Given the disruptions resulting from the ongoing COVID-19 pandemic, there may be delays in the production process. We apologise in advance for any inconvenience caused and will do our best to minimize impact as far as possible.

EARLY VERSION

The version of your manuscript submitted at the copyedit stage will be posted online ahead of the final proof version, unless you have already opted out of the process. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

PRESS

We frequently collaborate with press offices. If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximise its impact. If the press office is planning to promote your findings, we would be grateful if they could coordinate with biologypress@plos.org. If you have not yet opted out of the early version process, we ask that you notify us immediately of any press plans so that we may do so on your behalf.

We also ask that you take this opportunity to read our Embargo Policy regarding the discussion, promotion and media coverage of work that is yet to be published by PLOS. As your manuscript is not yet published, it is bound by the conditions of our Embargo Policy. Please be aware that this policy is in place both to ensure that any press coverage of your article is fully substantiated and to provide a direct link between such coverage and the published work. For full details of our Embargo Policy, please visit http://www.plos.org/about/media-inquiries/embargo-policy/.

Thank you again for submitting your manuscript to PLOS Biology and for your support of Open Access publishing. Please do not hesitate to contact me if I can provide any assistance during the production process.

Kind regards,

Alice Musson

Publishing Editor,

PLOS Biology

on behalf of

Roland Roberts,

Senior Editor

PLOS Biology

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Concordance factors for the species tree in Fig 1 calculated using maximum likelihood gene trees and site patterns from the 200 longest single-copy loci alignments used in the 1,730-gene analysis.

    In general, gCFs increase, while the sCFs remain the same, indicating that gene tree error is a likely source of some discordance. gCF, gene concordance factor; sCF, site concordance factor.

    (PDF)

    S2 Fig. Concordance factors calculated using 150 randomly chosen single-copy orthologs, with pika (Ochotona princeps) included as an additional outgroup to mouse.

    (A) gCFs and sCFs for these 150 genes when pika is included. (B) gCFs and sCFs for these same genes when pika is not included. We observe slightly higher gCFs near the base of the tree with pika excluded (red boxes). Note that these species trees use unit-length branch lengths for readability of branch labels. gCF, gene concordance factor; sCF, site concordance factor.

    (PDF)

    S3 Fig. Gene and site concordance factors plotted as a function of node depth (in millions of years).

    No correlation was found between gCFs and node depth, whereas a slightly negative correlation was found between sCFs and node depth. This relationship indicates that homoplasy may act to slightly reduce sCFs deeper in the tree. The data underlying mean node ages are provided in S1 Data. gCF, gene concordance factor; sCF, site concordance factor.

    (PDF)

    S4 Fig. Forward simulations using SLiM3 with the most extreme parameters used by He et al. (2020): population size combination “F” with s = −7.5 × 10⁻⁶ and Δτ = 2,000.

    Our results show no significant difference in the distribution of gene tree topologies in the presence of negative selection (A). This result holds for simulations in which we increased the per-locus mutation rate by 2 orders of magnitude (B). SLiM3 recipes are available via Data Dryad at https://doi.org/10.5061/dryad.rfj6q577d [22]. Gene tree counts for both simulations, A and B, are available in S1 Data.

    (PDF)

    S5 Fig. Present-day species distributions for 4 African Papionini (Papio, Theropithecus, Mandrillus, and Cercocebus) and 3 Asian Macaca species included in the introgression analysis.

    The ancestral Macaca distribution (gray shading) is inferred from Macaca fossil localities in Africa and Europe as reviewed in Roos et al. [106]. The ancestral Macaca distribution likely represents only a fraction of the species range from the late Miocene to the late Pleistocene in Africa and Europe. The contemporary distribution of the African Macaca sylvanus (bright green) is included for reference; the current distribution of Macaca nemestrina is completely contained within that of Macaca fascicularis. Fossil localities for Theropithecus species hypothesized to overlap contemporaneously with various ancestral Macaca are included. Citations for spatial data of extant species: M. nemestrina (Richardson et al., 2008), M. fascicularis (Ong and Richardson, 2008), M. sylvanus (Butynski et al., 2008), Macaca mulatta (Timmins et al., 2008), Theropithecus gelada (Gippoliti et al., 2019), Papio anubis (Kingdon et al., 2008), Cercocebus atys (Oates et al., 2016), and Mandrillus leucophaeus (Oates and Butynski, 2008). Base map was obtained from the public domain map database Natural Earth (http://www.naturalearthdata.com/downloads/).

    (PDF)

    S1 Table. Genomes analyzed in this study with the original NCBI release date, the publication for the reference used, and the accession number for the assembly.

    When possible, the most recent version for each genome was used.

    (DOCX)

    S2 Table. All published genomes used in this study, including links to the assemblies and NCBI BioProjects.

    Annotation information is included for each genome at the time of download.

    (XLSX)

    S3 Table. Orthogroup, protein name, human chromosome number, and coordinates for the single-copy human orthologs used in the 1,730 gene analysis.

    Alignment files are named by orthogroup, allowing the use of this table to identify the protein in each alignment.

    (XLSX)

    S4 Table. Gaps/ambiguities by species and as a percentage of total alignment length.

    * denotes species sequenced in this study.

    (DOCX)

    S5 Table. Lengths for each 40-locus concatenated alignment used in the molecular dating analyses.

    Each dataset was analyzed twice until node age estimates converged (15–25k steps) using a log-normal auto-correlated model [139]. Datasets are available via Data Dryad at https://doi.org/10.5061/dryad.rfj6q577d [22].

    (DOCX)

    S6 Table. Fossil calibrations employed in this study.

    Node numbering corresponds to the numbering in Fig 3. Median underflow/overflow for each calibration was calculated from 20 independent runs performed on 10 datasets (2 runs per dataset).

    (DOCX)

    S7 Table. Mean node age for 20 independent PhyloBayes dating runs.

    Node numbers correspond to the numbering in Fig 3. The 95% HPD intervals were calculated by averaging the minimum and maximum of the 95% HPD interval for each dating run. HPD, highest posterior density.

    (DOCX)
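    The HPD summary described for S7 Table (averaging the lower and the upper bounds of the per-run 95% HPD intervals) can be illustrated with a minimal sketch. The function name `average_hpd` and the three example intervals are hypothetical and are not taken from the PhyloBayes output; this is only meant to show the arithmetic, not the authors' script.

```python
from statistics import mean


def average_hpd(intervals):
    """Average the lower and upper bounds of per-run 95% HPD intervals.

    `intervals` is a list of (lower, upper) tuples, one per dating run.
    Returns a single (mean_lower, mean_upper) tuple.
    """
    if not intervals:
        raise ValueError("at least one HPD interval is required")
    lowers = [lower for lower, _ in intervals]
    uppers = [upper for _, upper in intervals]
    return mean(lowers), mean(uppers)


# Hypothetical node-age intervals (in millions of years) from three runs.
example_runs = [(60.0, 70.0), (62.0, 72.0), (61.0, 71.0)]
summary_interval = average_hpd(example_runs)
print(summary_interval)
```

    Note that an interval summarized this way is an average of interval bounds, not an HPD interval computed from the pooled posterior samples.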

    S8 Table. Quartets used to test for significant Δ values for internal branches of the primate tree.

    Branches tested correspond to the labeled branches in Fig 3. After correcting for multiple comparisons (Dunn–Šidák, P = 0.00301), 3 internal branches and 8 quartets were found to have significant Δ values, indicating a likely introgression event.

    (DOCX)
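    The multiple-comparison threshold quoted for S8 Table can be checked with a short sketch. The Dunn–Šidák formula is standard; the family-wise α of 0.05 and the count of 17 tests are assumptions chosen because they reproduce the reported P = 0.00301, and are not stated in the caption. The Δ function uses the common definition of the statistic as the normalized difference between the counts of the two discordant gene tree topologies; the significance test applied to Δ is not reproduced here.

```python
def sidak_threshold(alpha, n_tests):
    """Per-test significance threshold under the Dunn-Sidak correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


def delta_statistic(n_discordant_1, n_discordant_2):
    """Normalized asymmetry between the two discordant topology counts.

    Values near 0 are what incomplete lineage sorting alone predicts;
    large absolute values indicate asymmetric discordance.
    """
    total = n_discordant_1 + n_discordant_2
    if total == 0:
        raise ValueError("no discordant gene trees to compare")
    return (n_discordant_1 - n_discordant_2) / total


# Assumed inputs: alpha = 0.05 and 17 tests (not stated in the caption).
threshold = sidak_threshold(0.05, 17)
print(f"{threshold:.5f}")

# Hypothetical gene tree counts for one quartet.
print(delta_statistic(120, 80))
```

    Under these assumptions the computed threshold rounds to 0.00301, matching the value given in the caption.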

    S1 Data. The Excel workbook contains 5 different tabs.

    Tab 1, Fig 3 Data: consists of the node age estimates for all 20 independent PhyloBayes dating analyses as well as the run used to determine the prior for each node; each estimate is plotted separately in Fig 3. Tab 2, Fig 3 Data for R: the same data as in tab 1, but formatted for analysis with the accompanying R script “plot_DATING.R” available via Data Dryad: https://doi.org/10.5061/dryad.rfj6q577d [22]. Tab 3, S3_Fig_Data: the data used to generate S3 Fig. The average node ages estimated in tab 1 are used here to plot age vs. concordance factors estimated for each node in IQ-TREE. Tab 4, S4_Fig_PanelA_Data: contains the tree counts that resulted from the SLiM3 simulation conditions pictured in S4A Fig. Tab 5, S4_Fig_PanelB_Data: contains the tree counts for the SLiM3 simulation conditions pictured in S4B Fig. SLiM3 recipes for both simulations are available via Data Dryad at https://doi.org/10.5061/dryad.rfj6q577d [22].

    (XLSX)

    Attachment

    Submitted filename: Response_to_Reviewers.docx

    Attachment

    Submitted filename: Response_to_Reviewers_2.docx

    Data Availability Statement

    The relevant assembly accessions and associated references used in this study are provided in S1 Table. All raw data, assemblies, and annotation information used in these analyses are available through each species’ NCBI BioProject (see S1 Table). The Data Dryad repository associated with this study can be accessed through the following link: https://doi.org/10.5061/dryad.rfj6q577d [22]. The repository contains the following files and archives:

    • 1730_ALIGNMENT_CONCAT.paup.nex: Concatenated alignment with PAUP block of commands used to generate the parsimony concatenated tree.
    • 1730_Alignments_FINAL.tar.gz: 1,730 single-copy ortholog alignments.
    • 1730_ML_GENETREEs.treefile: Maximum likelihood gene trees estimated from the 1,730 ortholog alignments.
    • ASTRAL_Tree_AVGdates.tre: The ASTRAL topology (Fig 1) with average dates from 10 independent datasets.
    • All_Dating_Datasets_DRYAD.tar.gz: The concatenated alignments used for dating analyses.
    • PARSIMONY_1730_Gene_Trees.tre: All 1,730 parsimony gene trees from MPBoot.
    • Supp_fig4A_F_s6_b1_v2_1.slim: SLiM3 recipe for the S4A Fig simulation.
    • Supp_fig4B_F_s6_b1_v2_highmut_1.slim: SLiM3 recipe for the S4B Fig simulation.
    • All_1735_UNALIGNED_Seqs.tar.gz: All unaligned single-copy gene sequences.
    • plot_DATING.R: R script used for plotting Fig 3.


    Articles from PLoS Biology are provided here courtesy of PLOS
