Radiocarbon and genomic evidence for the survival of Equus Sussemionus until the late Holocene

Dawei Cai; Siqi Zhu; Mian Gong; Naifan Zhang; Jia Wen; Qiyao Liang; Weilu Sun; Xinyue Shao; Yaqi Guo; Yudong Cai; Zhuqing Zheng; Wei Zhang; Songmei Hu; Xiaoyang Wang; He Tian; Youqian Li; Wei Liu; Miaomiao Yang; Jian Yang; Duo Wu; Ludovic Orlando; Yu Jiang

doi:10.7554/eLife.73346

2022 May 11;11:e73346. doi: 10.7554/eLife.73346

Editor: George H Perry

PMCID: PMC9142152 PMID: 35543411

Abstract

The exceptionally rich fossil record available for the equid family has provided textbook examples of macroevolutionary changes. Horses, asses, and zebras represent three extant subgenera of the Equus lineage, while the Sussemionus subgenus is another remarkable Equus lineage that ranged from North America to Ethiopia in the Pleistocene. We sequenced 26 archaeological specimens from Holocene sites in Northern China that could be assigned morphologically and genetically to Equus ovodovi, a species representative of Sussemionus. We present the first high-quality complete genome of the Sussemionus lineage, which was sequenced to 13.4× depth of coverage. Radiocarbon dating demonstrates that this lineage survived until ~3500 years ago, despite continued demographic collapse during the Last Glacial Maximum and the great human expansion in East Asia. We also confirmed the Equus phylogenetic tree and found that Sussemionus diverged from the ancestor of non-caballine equids ~2.3–2.7 million years ago and possibly remained affected by secondary gene flow post-divergence. We found that low genetic diversity, rather than enhanced inbreeding, limited the species’ chances of survival. Our work adds to the growing literature illustrating how ancient DNA can inform on extinction dynamics and the long-term resilience of species surviving in cryptic population pockets.

Research organism: Other

Introduction

Today, all of the seven extant species forming the horse family belong to one single genus, Equus. It emerged in North America some 4.0–4.5 million years ago (Orlando et al., 2013), and first spread into Eurasia ~2.6 million years ago (Mya), via the Beringia land bridge (Lindsay et al., 1980). This first vicariance and expansion out of America led to the emergence of the ancestors of zebras, hemiones, and donkeys, a group collectively known as non-caballine (or stenonine) equids. Another expansion through Beringia occurred ~0.8–1.0 Mya (Vershinina et al., 2021), which allowed caballine equids (i.e., those most closely related to the horse) to enter into the Old World, where they persisted until the modern era and were domesticated ~5500–4200 years ago (Gaunitz et al., 2018; Librado et al., 2021; Outram et al., 2009).

In recent years, ancient DNA (aDNA) data have revealed that the genetic diversity of non-caballine Equus was considerably larger in the past than it is today (Librado and Orlando, 2021; Orlando, 2020). This was further confirmed as the first mitochondrial DNA (mtDNA) data of Equus (Sussemionus), hereafter referred to as Sussemiones, were collected (Eisenmann, 2010). This lineage radiated across North America, Africa, and Siberia, and developed multiple adaptations to a whole range of arid and humid environments (Eisenmann, 2010). Sussemiones were first believed to have become extinct during the Middle Pleistocene, as the last known specimen showing typical morpho-anatomical characters dated back to ~500 kya (thousand years ago) in southeastern Siberia, Russia (Vasiliev, 2013). However, DNA results obtained on multiple osseous remains within the radiocarbon range and showing morphological traits reminiscent of the Eurasian Sussemiones species indicated that the lineage in fact survived until the Late Pleistocene (Druzhkova et al., 2017; Orlando et al., 2009; Vilstrup et al., 2013; Yuan et al., 2019). Early publications indicated survival dates of 40–50 kya in southeastern Siberia, Russia (Proskuryakova cave) (Orlando et al., 2009; Vilstrup et al., 2013), ~32 kya at the Denisova cave (Druzhkova et al., 2017), and ~12.6 kya in northeastern China (Yuan et al., 2019).

Despite abundant fossil material, only a limited number of Sussemiones specimens have been investigated for ancient mitochondrial DNA. These studies showed that Sussemiones formed a non-caballine lineage that may have diverged first from the lineage ancestral to zebras, hemiones, and donkeys. However, the exact placement of Sussemiones could not be fully resolved and has remained contentious (Heintzman et al., 2017; Orlando et al., 2009; Vilstrup et al., 2013). In this study, we have carried out archaeological excavations in three Holocene sites in China (Honghe, Heilongjiang Province; Muzhuzhuliang, Shaanxi Province; Shatangbeiyuan, Ningxia Province) (Figure 1 and Supplementary file 1a) and uncovered equine samples showing morphological features that may be characteristic of Sussemiones.

Figure 1. — (A) *E. (Sussemionus) ovodovi* geographic range. The three red circles indicate the archaeological sites analyzed in this study. The site (Honghe) that delivered the complete genome sequence at 13.4-fold average depth of coverage (HH06D) is highlighted with a square. The black circles indicate sites that provided complete mitochondrial genome sequences in previous studies (Druzhkova et al., 2017; Orlando et al., 2009; Vilstrup et al., 2013; Yuan et al., 2019). The temporal range covered by the different samples analyzed is given in years before present (YBP) and follows the name of each site. Numbers between parentheses indicate the number of samples for which DNA sequence data could be generated. (B) *Facies masticatoria dentis* of P2, M3, p2, and m3 for the *E. (Sussemionus) ovodovi* samples of the Honghe site analyzed here (a), *E. Sussemionus* (Eisenmann, 2010) (b), and *E. caballus* (Laboratory specimen) (c). 1, 4 protocones; 2, 5 metacones; 3 caballine notch. Teeth from the right side are shown, except for *E. Sussemionus*. The erupted teeth of the samples of the Honghe site appear to be smaller than those of the *E. Sussemionus* specimen.

Equine assemblages predating the late Shang dynasty (ca. 3300 years ago) have documented the presence of wild horses in Northern China during the Late Pleistocene (Yuan and Flad, 2006). The taxonomic status and/or stratigraphic placement of the rare material attributed to Neolithic and early Shang contexts remained, however, contentious, leaving the possibility that Sussemiones or other equid taxa (co-)existed in China at the time, especially at the sites investigated in this study. At the Honghe site (47.20°N, 123.62°E), excavation fieldwork of nearly 20,000 m² has uncovered a late Neolithic settlement site dated to ~3400–4400 years ago, which belonged to a unique, rich fishing and hunting culture characteristic of northeastern China (Figure 1—figure supplement 1). The scale of the moated settlement indicated that there was already social management and relatively high productivity and building technology (Zhang et al., 2020). The Muzhuzhuliang site (38.83°N, 110.50°E) belonged to the ‘Longshan culture.’ It was dated to ~3800–4300 years ago and represents the most complete moated settlement hitherto excavated in the late Neolithic age of Northern China, showing a mixed subsistence economy involving agriculture, animal husbandry, and hunting (Wang et al., 2015). Finally, the Shatangbeiyuan site (35.63°N, 105.11°E) belonged to the early cultural relics of Neolithic ‘Qijia culture,’ which was dated to ~3900–4200 years ago. While millet represented the main crop produced at that time, findings including stone and bone arrowheads have also supported the presence of hunting (Fan et al., 2017). No obvious signs of domestication, including paleopathologies related to horseback riding, bridling, or chariotry (Bendrey, 2007; Taylor and Tuvshinjargal, 2018), were found amongst the equine specimens investigated at the three sites. In contrast, slash marks could be identified on some of the bones (HH13H, HH26H, and MZ104H), together with indications of bone marrow extraction (Figure 1—figure supplement 2). These findings suggest that these specimens were hunted.

In this study, we have sequenced the complete nuclear genome of Sussemiones specimens. This allowed us to not only solve the phylogenetic placement of Sussemiones within the Equus evolutionary tree, but also to time their divergence relative to other non-caballine equids, as well as to reconstruct their demographic trajectory until their extinction during the mid-Holocene.

Results

Archaeological samples and sequencing data

All the equine specimens investigated in this study showed morphological and genetic signatures (short fragments of the mitochondrial hypervariable region) distinct from those of extant horses and donkeys (Figure 1—figure supplement 2). The morphological differences were especially marked in the second and third molars, which appeared to be smaller than in modern horses, and reminiscent of the third molars paracones and metacones observed in Sussemiones specimens (Figure 1B). Combined, these samples were radiocarbon dated to 3456–4460 calibrated years before the present (cal BP), including a mid-second millennium BCE date for the most recent sample, HH13H (3270 ± 30 uncal. BP, i.e., 3456–3616 cal BP) (Supplementary file 1b). They could, thus, represent some of the latest surviving Sussemiones individuals prior to their extinction.
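The correspondence between the calibrated range and the "mid-second millennium BCE" description can be checked with simple arithmetic. The following is a minimal sketch, assuming the usual convention that "BP" counts back from 1950 CE and that there is no year zero between 1 BCE and 1 CE; the function name `cal_bp_to_calendar` is hypothetical and is not part of the study's workflow.

```python
def cal_bp_to_calendar(cal_bp: int) -> str:
    """Convert calibrated years BP (counted back from 1950 CE) to a BCE/CE label."""
    year = 1950 - cal_bp  # astronomical numbering, in which year 0 exists
    if year <= 0:
        # Astronomical year 0 corresponds to 1 BCE, year -1 to 2 BCE, and so on.
        return f"{1 - year} BCE"
    return f"{year} CE"


# Calibrated range reported in the text for sample HH13H.
older_bound, younger_bound = 3616, 3456
print(cal_bp_to_calendar(older_bound), "to", cal_bp_to_calendar(younger_bound))
```

Under these assumptions, both bounds of the HH13H range fall within the second millennium BCE.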

We next aimed to genetically characterize the taxonomic status of these specimens using high-throughput DNA sequencing technologies. We extracted ancient DNA from a total of 26 specimens and sequenced the whole nuclear genome at ~0.002–13.4× coverage, including four samples from Honghe that provided 13.4×, 3.9×, 1.1×, and 1.0× nuclear genome coverage (Supplementary file 1a). Comparison of the X chromosome and autosomal coverage revealed the presence of 15 male and 11 female individuals (Supplementary file 1c).
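The logic behind coverage-based sexing can be sketched as follows: males carry a single X chromosome, so their X-to-autosome depth ratio is expected near 0.5, whereas females are expected near 1.0. This is only an illustration; the function name `infer_sex` and the thresholds `male_max` and `female_min` are assumptions chosen for the example, not the criteria used in the study.

```python
def infer_sex(x_depth: float, autosome_depth: float,
              male_max: float = 0.65, female_min: float = 0.85) -> str:
    """Classify a sample from its X-to-autosome depth-of-coverage ratio.

    The thresholds are illustrative assumptions, not values from the study.
    """
    if autosome_depth <= 0:
        raise ValueError("autosomal depth must be positive")
    ratio = x_depth / autosome_depth
    if ratio <= male_max:
        return "male"      # one X copy: ratio expected near 0.5
    if ratio >= female_min:
        return "female"    # two X copies: ratio expected near 1.0
    return "undetermined"  # ambiguous ratio, e.g. at very low coverage


# Hypothetical depths, for illustration only.
print(infer_sex(x_depth=0.52, autosome_depth=1.0))
print(infer_sex(x_depth=3.8, autosome_depth=3.9))
```

Leaving an "undetermined" band between the two thresholds avoids forcing a call for samples with very low coverage, where the ratio is noisy.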

Taxonomic status

To assess whether the sequenced specimens belonged to the same taxonomic group or comprised different species, we carried out a principal component analysis (PCA), including all the equine species sequenced at the genome level (depth of coverage ≥1×) (Figure 2A, Figure 2—figure supplements 1 and 2). For this, we downloaded 11 previously-published equine genomes representing all extant species of equids and the extinct quagga zebra (Huang et al., 2015; Jónsson et al., 2014; Kalbfleisch et al., 2018; Orlando et al., 2013; Renaud et al., 2018; Supplementary file 1d). All the Chinese specimens analyzed in this study were found to cluster together along the first two PCA components, in a group that was distinct from all other equine species (Figure 2A, Figure 2—figure supplement 1) but closer to non-caballine equine species than to the horse (Figure 2A). This suggested that they were all members of a unique taxonomic group, most related to non-caballine equids.

Figure 2. — The Honghe (HH), Muzhuzhuliang (MZ), and Shatangbeiyuan (BY) specimens are shown in red, while Asian asses, African asses, zebras, and horses are shown in purple, blue, green, and black, respectively. (A) Principal component analysis (PCA) based on genotype likelihoods, including horses and all other extant non-caballine lineages (16,293,825 bp, excluding transitions). Only specimens whose genomes were sequenced at least to 1.0× average depth of coverage are included. (B) Maximum likelihood tree based on six mitochondrial partitions (representing a total of 16,591 bp). Those *E. ovodovi* sequences that were previously published are shown in red. The tree was rooted using *Hippidion saldiasi* and *Haringtonhippus francisci* as outgroups. Node supports were estimated from 1000 bootstrap pseudo-replicates and are displayed only if greater than 50%. The black line indicates the mitochondrial clades A and B. (C) Maximum likelihood tree based on sequences of 19,650 protein-coding genes, considering specimens sequenced at least at a 3.0× average depth of coverage (representing 32,756,854 bp).

Maximum likelihood (ML) and Bayesian phylogenetic analyses including the 17 nearly complete mitochondrial genomes reported in this study (Supplementary file 1a, depth of coverage above 1×) confirmed their clustering with non-caballine equids, within a single monophyletic group that also included five previously characterized Sussemiones specimens (Figure 2B, Figure 2—figure supplements 3 and 4, Supplementary file 1e). This grouping was supported with maximal (100%) bootstrap values. This, and the PCA clustering, indicated that the different excavation sites investigated in this study in fact all provided specimens that belonged to the E. (Sussemionus) ovodovi species.

We also used complete mitogenomes to assess the diversity of maternal lineages present in the E. (Sussemionus) ovodovi lineage. Phylogenetic analyses showed two major clades, of which only one (clade B) was previously characterized. The other clade (A) consisted of two and three individuals from Muzhuzhuliang and Honghe, respectively (Figure 2B).

To further assess phylogenetic affinities, we used the two genomes characterized to at least 3× average depth of coverage (HH04D and HH06D) to place Sussemiones within the equine phylogenetic tree. To achieve this, we used ML phylogenetic reconstruction and an alignment of the coding sequences of the protein-coding genes (Figure 2C, Figure 2—figure supplement 5). This showed that the Chinese ancient specimens branched off before the radiation leading to modern asses and zebras (Figure 2C). Similar tree topologies were recovered using whole-genome SNPs by TreeMix (Pickrell and Pritchard, 2012; Figure 2—figure supplements 6 and 7, Supplementary file 1f). Combined with the analysis of the occlusal surface of the molars, in particular the absence of the caballine notch, the shape of metacones and protocones, and the reduced tooth size (Figure 1B), our analyses consistently supported the material analyzed as small specimens of the extinct Equus (Sussemionus) ovodovi. We, thus, concluded that this lineage survived in China during the Holocene, and until 3456–3616 cal BP, which is ~9000 years after the latest known specimen to date (Druzhkova et al., 2017; Orlando et al., 2009; Vilstrup et al., 2013; Yuan et al., 2019).

Interspecies admixture and demographic modeling

Bifurcating trees fail to capture possible admixture events between lineages. Yet, previous research has unveiled pervasive admixture within equids, even amongst extant equids showing different chromosomal numbers (Jónsson et al., 2014). We thus next assessed whether the genomic data showed evidence for gene flow between Sussemiones and other non-caballine equids. To achieve this, we first applied D-statistics (Soraggi et al., 2018) to the genome sequences underlying 26 individual genomes and detected that E. ovodovi shared an excess of derived polymorphisms with asses relative to zebras (Figure 3—figure supplements 1 and 2). This suggested that at least one admixture event could have taken place between Sussemiones and the ancestor of asses after their divergence from zebras.
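The principle of the D-statistic (ABBA-BABA test) can be illustrated with a minimal sketch. For a tree (((H1, H2), H3), outgroup), sites where H2 and H3 share the derived allele ("ABBA") and sites where H1 and H3 share it ("BABA") are expected in equal numbers without gene flow, so D = (ABBA − BABA) / (ABBA + BABA) departs from zero when H3 shares excess derived alleles with one of the two sister lineages. The code below assumes one sampled allele per taxon per site and omits the significance testing (for instance, a block jackknife) that a real analysis requires; it is not the implementation of Soraggi et al. (2018), and the toy sites are invented for illustration.

```python
def d_statistic(sites):
    """Compute D = (ABBA - BABA) / (ABBA + BABA).

    `sites` is an iterable of (h1, h2, h3, outgroup) alleles, one sampled
    allele per taxon; the outgroup allele is taken as the ancestral state.
    """
    abba = baba = 0
    for h1, h2, h3, out in sites:
        if h3 == out:
            continue  # H3 carries the ancestral allele: site is uninformative
        if h1 == out and h2 == h3:
            abba += 1  # derived allele shared by H2 and H3
        elif h2 == out and h1 == h3:
            baba += 1  # derived allele shared by H1 and H3
    if abba + baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / (abba + baba)


# Toy data (invented): H1 = zebra, H2 = ass, H3 = E. ovodovi, outgroup = horse.
toy_sites = (
    [("A", "G", "G", "A")] * 6    # ABBA pattern
    + [("G", "A", "G", "A")] * 4  # BABA pattern
    + [("A", "A", "G", "A")] * 5  # derived allele private to H3, ignored
)
print(d_statistic(toy_sites))
```

With this ordering of taxa, a positive D corresponds to the pattern described in the text, an excess of derived alleles shared between E. ovodovi and asses relative to zebras.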

We next leveraged the comparative genome panel and the ancient E. (Sussemionus) ovodovi genome characterized to high depth of coverage (HH06D) to reconstruct the equine demographic history using G-PhoCS (Gronau et al., 2011). More specifically, we first selected members of each equine lineage representing a total number of 10 genomes, conditioned analyses on 15,324 ‘neutral’ loci, and assumed that the genus Equus emerged some 4.0–4.5 Mya, following previous estimates (Orlando et al., 2013). G-PhoCS analyses confirmed previous work indicating that the zebra and ass lineages diverged ~2.0 Mya and that the deepest divergence within zebras and asses took place prior to ~1.5 Mya (Jónsson et al., 2014; Figure 3). They revealed that the Sussemiones lineage diverged from the ancestors of extant non-caballine equids ~2.3–2.7 Mya, in line with the fossil record (Eisenmann, 2010). Allowing for migrations provided support for gene flow between Sussemiones and the ancestor of asses and zebras (Figure 3). However, weak to no migrations were detected between Sussemiones and extant equids (Supplementary file 1g). Importantly, the admixture between Sussemiones and the ancestor of extant asses seems to have been stronger than that between Sussemiones and the ancestor of extant zebras, in line with the results of D-statistics. G-PhoCS also supported the presence of significant unidirectional gene flow prior to ~2.3–2.7 Mya, from the horse branch into the ancestral branch to all non-caballine equids, including Sussemiones (probability of gene flow 2.2–8.8%, Supplementary file 1h). This is consistent with previous HMMCoal analyses applied to whole-genome sequences of all extant equine species, which indicated significant gene flow between the deepest branches of the Equus phylogenetic tree until 3.4 Mya, mostly from a caballine lineage into the ancestor of all non-caballine equids (Jónsson et al., 2014).

Figure 3. — Node bars represent 95% confidence intervals. The width of each branch is scaled with respect to effective population sizes (*N_e*). Independent *N_e* values were estimated for each individual branch of the tree, assuming constant effective sizes through time. Migration bands and probabilities of migration (transformed from total migration rates) are indicated with solid arrows. The red triangle indicates the earliest *Sussemionus* evidence found in the fossil record. (Images: *E. caballus* by Infomastern, *E. a. somalicus* by cuatrok77, *E. kiang* by Dunnock_D, *E. a. africanus* by Jay Galvin, *E. hemionus* by Cloudtail the Snow Leopard, *E. z. hartmannae* by calestyo, *E. b. quagga* by Internet Archive Book Images, *E. b. boehmi* by GRIDArendal, and *E. grevyi* by 5of7.)

Dynamic demographic profiles, heterozygosity, and inbreeding levels

We next leveraged the high-coverage Sussemiones genome characterized here to further explore the demographic dynamics until extinction. When modeled as constant through time, population sizes in G-PhoCS indicated that most lineages, including Sussemiones, consisted of small populations, excepting the Burchell’s zebra (Supplementary file 1i). Pairwise sequential Markovian coalescent (PSMC) analyses, however, provided evidence for population size variation through time. First, the PSMC demographic trajectory of Sussemiones was found to diverge from that of other non-caballine equids (specifically, E. hemionus) after ~2.0 Mya, confirming the divergence date estimate retrieved by G-PhoCS (Supplementary file 1i). Second, the Sussemiones demographic trajectory was found to have constantly increased during the last million years but to have remained relatively low for a long period of time, until it reached a peak between 74 and 84 kya. This peak was then followed by an ~45-fold collapse until 13 kya (Figure 4). The lineage maintained extremely reduced population sizes through the Last Glacial Maximum (LGM, 19–26 kya) (Clark et al., 2009) and the Holocene, until it ultimately became extinct.

Figure 4. — The y-axis represents the effective population size (×10,000), and the x-axis is scaled in millions of years before present. Faded lines show bootstrap values.

Importantly, the sample sequenced to sufficient coverage (HH06D) showed minimal heterozygosity and moderate inbreeding levels identified by the fraction of the segments within runs of homozygosity (ROH) (Figure 5). Strikingly, this is true in spite of the increased DNA damage error rates of this genome (Figure 2—figure supplement 9), which likely inflate our estimates. The limited population sizes and resulting genetic diversity, rather than particularly enhanced inbreeding, may, thus, have limited the chances of survival of the species and have ultimately led to extinction.
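The two summary statistics contrasted here, heterozygosity outside ROH and the fraction of the genome in ROH, can be sketched with a toy calculation. The example assumes that ROH segments have already been called and do not overlap, and that every position is callable; the function name `roh_summary` and all input values are hypothetical and do not reproduce the study's pipeline, which also excluded transitions.

```python
def roh_summary(genome_length, roh_segments, het_positions):
    """Return (fraction of genome in ROH, heterozygosity outside ROH).

    roh_segments: non-overlapping half-open intervals (start, end).
    het_positions: coordinates of heterozygous sites.
    """
    roh_length = sum(end - start for start, end in roh_segments)
    non_roh_length = genome_length - roh_length
    if genome_length <= 0 or non_roh_length <= 0:
        raise ValueError("need a positive genome length and some non-ROH sequence")
    f_roh = roh_length / genome_length
    # Keep only heterozygous sites that fall outside every ROH segment.
    het_outside = sum(
        1 for pos in het_positions
        if not any(start <= pos < end for start, end in roh_segments)
    )
    return f_roh, het_outside / non_roh_length


# Toy genome of 1000 positions with two ROH segments (invented values).
f_roh, het = roh_summary(
    genome_length=1000,
    roh_segments=[(100, 300), (600, 700)],
    het_positions=[50, 150, 350, 650, 900],
)
print(f_roh, het)
```

Separating the two quantities matters for the argument in the text: a genome can show low heterozygosity outside ROH, reflecting a small long-term population size, without a large fraction of the genome in ROH, which would instead point to recent inbreeding.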

Figure 5. — (A) Individual heterozygosity outside runs of homozygosity (ROH). (B) Fraction of the genome in ROH. Estimates were obtained excluding transitions and are shown together with their 95% confidence intervals. The colors mirror those from Figure 2.

Discussion

Phylogenetic placement of Equus (Sussemionus) ovodovi

In this study, we have characterized the first nuclear genomes of the now-extinct equine lineage, E. (Sussemionus) ovodovi, the last surviving member of the subgenus Sussemionus. We demonstrated that this lineage survived in China well into the Holocene with the most recent specimens analyzed dating to ~3456–3616 cal BP. This is almost 9000 years after the latest specimens previously documented in the fossil record (Druzhkova et al., 2017; Vilstrup et al., 2013; Yuan et al., 2019). Our work, thus, shows that Sussemionus represents the last currently known Equus subgenus to become extinct. Our work also adds to the list of recently identified members of the horse family that were still alive at the time horses and donkeys were first domesticated, ~5500 years ago (Fages et al., 2019; Gaunitz et al., 2018; Rossel et al., 2008). In contrast to those divergent members that were identified in Siberia (Equus lenensis) and Iberia (IBE), which both belonged to the horse species (Fages et al., 2019; Schubert et al., 2014a), Sussemiones members were most closely related to non-caballine equids. This is in agreement with previous studies (Der Sarkissian et al., 2015; Druzhkova et al., 2017; Heintzman et al., 2017; Orlando et al., 2009; Vilstrup et al., 2013; Yuan et al., 2019), which could, however, not fully resolve the exact phylogenetic placement of this species within non-caballines as topological tests based on mitochondrial genomes received low confidence support (Der Sarkissian et al., 2015; Druzhkova et al., 2017; Heintzman et al., 2017; Orlando et al., 2009; Vilstrup et al., 2013; Yuan et al., 2019). Our study solved this question by reporting the first whole-genome phylogeny of Sussemiones, which confirmed with maximal bootstrap support this species as a basal lineage of non-caballine equids.

Suitable habitat and geographic distribution

Previous zooarchaeological and environmental research indicated an ecological range for Sussemiones overlapping with the grasslands located east of the Altay Mountains and west of the Yenisei River during the Late Pleistocene (Khenzykhenova et al., 2016; Malikov, 2016; Plasteeva et al., 2015; Shchetnikov et al., 2015; Slon et al., 2018). Recent research also reported species occurrence in northeastern China (~12,600–40,200 YBP), where similar climatic and ecological conditions were found at the time (Yuan et al., 2019). It could, thus, be speculated that Sussemiones were adapted to an environment with moderately dry climatic conditions and steppe landscapes (Yuan et al., 2019). However, our study identified Sussemiones specimens in three late Holocene sites from China characterized by mild and humid environmental conditions. Additionally, two distinct mitochondrial haplogroups from 22 individuals have been defined from the six known sites, suggesting possible population structure across various geographic areas and adaptation to local environments. It also suggests that the species could adapt to a wider variety of habitats than previously hypothesized and rejects the contention that the species became extinct as it could not survive in warmer climatic conditions (Yuan et al., 2019).

Interestingly, the Sussemiones specimens identified in this study were excavated from sites in northeastern China located at almost the same latitude as those Sussemiones localities known so far from Russia, but also at lower latitudes (Figure 1A). This implies that the geographic range of E. ovodovi was larger than previously expected and included at least Northern China and Southern Siberia. Although no fossils have yet been identified from Mongolia, the lack of mitochondrial phylogeographic structure allows us to speculate that the two regions were in contact, at least maternally. Further work is necessary to establish whether or not the species survived in other pockets both within and outside China.

Demographic history with ancestral interspecific admixture

Our analyses reveal that the divergence between Sussemiones and the most recent common ancestor of all extant non-caballine equids took place ~2.3–2.7 Mya, prior to the divergence of zebras and asses. Post-divergence admixture events with the lineage ancestral to asses and zebras, on the one hand, and the lineage ancestral to all extant zebras, on the other, were also identified (Figure 3 and Supplementary file 1h). Our results, thus, reveal non-caballine ancestral lineages occupying partly sympatric distributions that were, consequently, different from those of their descendants, in which zebras are restricted to Africa and Asian asses to Asia. Whether the admixture events identified here directly involved the Sussemiones lineage or one (or more) ghost lineage(s) closely related to Sussemiones requires further research.

Limited genetic diversity before extinction

The demographic profile of Sussemiones shows that after the peak of population size culminating ~74 kya, Sussemiones went through a slow and continuous decline until 13 kya (Figure 4). This time period encompasses several major climate changes (especially the LGM, ~19–26 kya) (Clark et al., 2009) and the great human expansion to Eurasia (~35–45 kya) (Henn et al., 2012). The effective size of Sussemiones populations that survived in Northern China until at least ~3500 years ago remained extremely small, as indicated by their extremely reduced heterozygosity levels compared to other extant and extinct equine species. As the inbreeding levels were not particularly high compared to some members of endangered equine species (Figure 5), the reduced genetic diversity available in the lineage may have compromised its long-term survival, in a process partly reminiscent of what was previously described for the woolly mammoth (Palkopoulou et al., 2015). Considering the rapid expansion of domestic horses across Eurasia from about 2000 BCE (Librado et al., 2021), this lineage was ultimately replaced under growing anthropogenic stress.

In conclusion, our study clarifies the phylogenetic placement, speciation timing, and evolutionary history of the now-extinct Sussemionus equine subgenus. This group did not remain in reproductive isolation from other equine lineages, but contributed to the genetic makeup of the ancestors of present-day asses, while receiving genetic material from the ancestors of African zebras. This supports geographic distributions that at least partly overlapped at the time and were, thus, not identical to those observed today. The species' demographic trajectory shows a steady decline from ~74 kya, during a period witnessing both important climatic changes and the great human expansion across Asia (Henn et al., 2012). It survived the Pleistocene-Holocene transition with minimal genetic diversity, and persisted for at least eight millennia before it became extinct, which provides insights into the survival potential of large animals since the Holocene. Given the persistence of Sussemiones throughout the third and second millennia BCE, archaeologists must be exceedingly careful while assigning Asian zooarchaeological material to equine taxa until this period.

Materials and methods

Genome sequencing

The minimum number of individuals (MNI) was determined from the frequency of hip bones and was calculated from the acetabular bone to avoid double counting. MNI was estimated at 31 individuals at Honghe, 4 at Muzhuzhuliang, and 4 at Shatangbeiyuan. DNA preservation conditions were compatible with the recovery of ancient DNA sequences from only 20 of the 31 Honghe samples, 3 of the 4 Muzhuzhuliang samples, and 3 of the 4 Shatangbeiyuan samples (Supplementary file 1a).

All pre-PCR procedures were conducted in a dedicated ancient DNA laboratory at Jilin University (JLU) that is physically separated from the post-PCR laboratory. To remove potential contaminant DNA, working areas and benches were frequently cleaned with bleach and UV exposure. Lab experiments were carried out wearing full-body suits, facemasks, and gloves. To detect contamination, mock controls were included at each experimental step, including DNA extraction, DNA library preparation, and PCR setup.

Prior to DNA extraction, the outer surface of the sample was cleaned with a brush. The cleaned sample was subsequently cut into smaller pieces and soaked in 10% bleach for 20 min, rinsed with ethanol and distilled water, and then UV-irradiated for 30 min on each side. Finally, powder was obtained using a dental drill (Traus 204, Korea). Ancient DNA was extracted from the sample powder using a modified silica spin column method (Yang et al., 1998), in the dedicated ancient DNA facilities at JLU. For each specimen, a total of 200 mg of powder was mixed with 3.9 mL EDTA (0.465 mol/L) and placed in the refrigerator at 4°C for 12 hr for decalcification; 0.1 mL proteinase K (0.4 mg/mL) was then added and the mixture was incubated overnight in a rotating hybridization oven at 50°C (220 rpm). After centrifugation, the supernatant was transferred into an Amicon Ultra-4 centrifugal filter device (Merck Millipore Ltd, 10,000 Nominal Molecular Weight Limit), reduced to less than 100 µL, and purified with the QIAquick PCR Purification Kit (QIAGEN), according to the manufacturer's instructions.

Before preparation of DNA libraries, we first PCR-targeted short fragments of the mitochondrial hypervariable region to select those samples positive for the presence of equine DNA (which was further confirmed through Sanger sequencing). For this, we used two primer pairs, forward primer L15473 5′-CTTCCCCTAAACGACAACAA-3′ with reverse primer H15692 5′-TTTGACTTGGATGGGGTATG-3′, and forward primer L15571 5′-AATGGCCTATGTACGTCGTG-3′ with reverse primer H15772 5′-GGGAGGGTTGCTGATTTC-3′, from Juan et al., 2007, together with the amplification conditions therein.

Double-stranded single-indexed libraries were prepared using the NEBNext Ultra II DNA Library Prep Kit for Illumina (NEB #E7645S) and NEBNext Multiplex Oligos for Illumina Index Primers Set 1 and 2 (NEB #E7335S, #E7500S), following the manufacturer’s instructions with minor modifications. Specifically, the extracted DNA (50 µL) was end-repaired and A-tailed by adding 7 µL of NEBNext Ultra II End Prep Reaction Buffer and 3 µL of NEBNext Ultra II End Prep Enzyme Mix, and incubated for 40 min at 20°C and then 30 min at 65°C. The adaptor was ligated to the dA-tailed DNA fragments by adding 30 µL of NEBNext Ultra II Ligation Master Mix, 1 µL of NEBNext Ligation Enhancer and 2.5 µL of NEBNext Adaptor for Illumina (dilution 1:10), and incubated for 20 min at 20°C. The adaptor was then linearized by adding 3 µL of USER Enzyme and incubating for 15 min at 37°C. The adaptor-ligated DNA was cleaned without size selection using the MinElute PCR Purification Kit (QIAGEN, Germany), following the instructions provided by the manufacturer. PCR enrichment was performed using 30 µL of NEBNext Ultra II Q5 Master Mix, 1 µL of Index Primer, 1 µL of Universal PCR Primer, and 18 µL of adaptor-ligated DNA. PCR cycling conditions comprised an initial denaturation at 98°C for 30 s, 14–16 cycles of 98°C for 10 s and 65°C for 75 s, and a final extension at 65°C for 5 min. PCR-amplified DNA libraries were purified using Agencourt AMPure XP Beads, following the manufacturer’s instructions, and Illumina sequencing was performed on the HiSeq X Ten platform using 150 bp paired-end reads. Overall, we sequenced a total of 28 DNA libraries and generated 2,727,843,803 read pairs (https://www.ebi.ac.uk/ena/browser/view/PRJEB44527?show=reads).

Radiocarbon dating

Radiocarbon dating of the samples was performed at the Beta Analytic Radiocarbon Dating Laboratory, Miami, FL. Bone or tooth pieces of about 2 g were sampled and sent for subsequent dating of collagen (not ultrafiltered). Calibration was carried out using OxCalOnline (https://c14.arch.ox.ac.uk/oxcal.html) and the IntCal20 calibration curve. Calibrated dates are provided in Supplementary file 1b.

Data processing

Sequencing reads were processed and aligned against the horse (EquCab3.0; Kalbfleisch et al., 2018) and donkey (Renaud et al., 2018) reference genomes using the PALEOMIX pipeline (Schubert et al., 2014b) with default parameters, except that we followed the recommendations from Schubert et al., 2012 and disabled seeding. Briefly, paired-end (PE) reads longer than 25 nucleotides were trimmed with AdapterRemoval v2.2 (Schubert et al., 2016) and aligned against the reference genomes using BWA (Li and Durbin, 2009), retaining alignments with mapping qualities superior to 25. PCR duplicates were then removed using Picard (http://broadinstitute.github.io/picard/) (Broad Institute, 2019). Finally, all ancient and modern reads were locally realigned around indels using GATK (McKenna et al., 2010).

Postmortem DNA damage and average sequencing error rates were determined with mapDamage2.0 (Jónsson et al., 2013; Figure 2—figure supplement 8) and ANGSD (Korneliussen et al., 2014; Figure 2—figure supplement 9), respectively. Further rescaling and trimming procedures were implemented following Gaunitz et al., 2018 to limit the impact of remnant nucleotide misincorporations in subsequent analyses. For each of the DNA libraries examined, the base composition of the position preceding read starts on the horse reference genome showed an excess of Guanine and, to a lesser extent, of Adenine residues (Figure 2—figure supplement 8). This is in line with depurination driving postmortem DNA fragmentation (Briggs et al., 2007). Additionally, error rate estimates for each nucleotide substitution class indicated the predominance of C→T and G→A misincorporations (Figure 2—figure supplement 9). Such misincorporation rates were particularly inflated towards read ends, but not read starts (Figure 2—figure supplement 8). This is in line with the DNA nucleotide misincorporation profiles expected for the type of DNA library constructed (Seguin-Orlando et al., 2015), which was caused by the Q5 polymerase being unable to read through 5' uracils, thereby excluding the typical 5' excess of C-to-T. MapDamage profiles were, thus, consistent with Cytosine deamination at 5'-overhanging ends as the most prominent postmortem DNA degradation reactions (Jónsson et al., 2013).

GATK HaplotypeCaller was used to obtain individual gvcf files with “--minPruning 1 --minDanglingBranchLength 1” to increase sensitivity. Then, we employed GATK GenotypeGVCFs for genotyping with the option “--includeNonVariantSites” in order to retain non-variant loci. The vcf files were further filtered in TreeMix and G-PhoCS analysis.

Principal component analysis (PCA)

The genotype likelihood framework implemented in ANGSD helped mitigate various error rates in ancient and modern genomes. Using EquCab3 (Kalbfleisch et al., 2018) as the reference genome, ANGSD was run using the following options: “-only_proper_pairs 1 -uniqueOnly 1 -remove_bads 1 -minQ 20 -minMapQ 25 -C 50 -baq 1 -skipTriallelic 1 -GL 2 -SNP_pval 1e-6 -rmTrans 1”. This provided a dataset consisting of a total of 16,293,825 transversions when the horse was included, and 10,094,431 transversions when the horse was excluded (i.e., when analyses were restricted to non-caballine genomes only). In these analyses, only specimens sequenced to an average depth of coverage ≥1× were retained. PCA was carried out using the PCAngsd package (Meisner and Albrechtsen, 2018; Figure 2A). To assess the impact of potential reference bias, all analyses were repeated after mapping the sequence data against the donkey reference (Figure 2—figure supplement 2).

Phylogenetic inference

Mitochondrial phylogeny

Cleaned reads were mapped against the horse mitochondrial genome (GenBank accession no. NC_001640), following the same procedure as when mapping against the nuclear genome. Samples showing an average depth of coverage <1× were disregarded, leaving a total of 17 individuals for further analyses. After removing duplicates, consensus mitochondrial sequences were generated using ANGSD (-doFasta 2 -doCounts 1 -setMinDepth 3 -uniqueOnly 1 -remove_bads 1 -minQ 25 -minMapQ 25). Multiple alignment was performed together with the comparative mtDNA sequences downloaded from GenBank (Supplementary file 1e) using MUSCLE v3.8.31 (Edgar, 2004), with default parameters. The alignments were then split into six partitions (first, second, and third codon positions, rRNA, tRNA, and control region) by PartitionFinder v2.1.1 (Lanfear et al., 2012).

Two ML trees, one based on all six partitions and one excluding the control region (positions 15,469–16,660 of the horse reference mitochondrial genome), were reconstructed using RAxML-NG v.0.9.0 (Kozlov et al., 2019) with the GTR+GAMMA substitution model. A total of 1000 bootstrap pseudo-replicates were carried out to assess node robustness (Figure 2—figure supplement 3). BEAST 2.6.6 (Bouckaert et al., 2019) was used to perform Bayesian phylogenetic reconstruction and to estimate split times. The six partitions described above were used, for which the best substitution model was determined using modelgenerator (version 0.85, Keane et al., 2006) and a Bayesian information criterion. We calibrated the tree using tip dates (see Supplementary file 1j) and an age of 4–4.5 Mya for the root of crown group E. caballus (normal prior, mean 4.25 Mya, SD: 0.15 Mya) (Orlando et al., 2013). We applied a birth-death model and a relaxed (log-normal) molecular clock and ran the analysis for 1000 million generations (sampling frequency = 1 every 1000), while forcing monophyly for all main lineages, including donkeys, hemiones, horses, ovodovi, and zebras. Convergence was assessed visually using Tracer v1.6 (with all individual ESS >200), and posterior date estimates were retrieved using 25% as burn-in. The final consensus tree was produced by TreeAnnotator 2.6.6 (Drummond and Rambaut, 2007) as the maximum clade credibility tree from 100,000 randomly sampled trees obtained using LogCombiner v2.6.6 (Bouckaert et al., 2019) (burn-in = 20%). The final tree was plotted using ITOL (Letunic and Bork, 2016; Figure 2—figure supplement 4).

Autosomal phylogeny

As for autosomes, we reconstructed an ML phylogenetic tree as implemented in the PALEOMIX phylo_pipeline, which is dedicated to phylogenomic reconstructions (Schubert et al., 2014b). This analysis was based on the coding sequence (CDS) of protein-coding genes annotated in EquCab3.0, partitioning data according to first, second, and third codon positions. ML phylogenetic inference was performed using ExaML v3.0.21 (Kozlov et al., 2015) and RAxML v8.2.12 (Stamatakis, 2014) under the GAMMA substitution model with 100 bootstrap pseudo-replicates (Figure 2C, Figure 2—figure supplement 5A). We also repeated the same procedure after mapping against the donkey reference genome, which returned the same topology (Figure 2—figure supplement 5B).

Additionally, we extracted biallelic single-nucleotide polymorphisms (SNPs) from the dataset generated in the section ‘Variant calling’ using bcftools v1.9 (Li et al., 2009). Both variant datasets obtained following mapping against the horse and donkey reference genomes were used in this analysis to rule out reference bias. We applied the following filters: a minimum Phred-scaled quality score (QUAL) of 20, exclusion of sites with a depth of coverage below 2 or above twice the mean coverage across individuals, and a maximum of three individuals with missing data per site. After disregarding transitions, a total of 18,803,101 (mapping against horse genome) and 19,459,070 (mapping against donkey genome) transversions were finally used as input for TreeMix (Pickrell and Pritchard, 2012) with parameters “-k 500 -root TWI”, and considering an increasing number of migration edges (0 ≤ m ≤ 3; Figure 2—figure supplements 6 and 7, Supplementary file 1f).
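The step of disregarding transitions can be illustrated with a minimal sketch. It is not the pipeline used in the study (which relied on bcftools); the function names and the toy SNP tuples below are hypothetical and only show how a biallelic SNP is classified as a transition or a transversion.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transversion(ref, alt):
    """Return True when a biallelic SNP swaps a purine for a pyrimidine (or the reverse)."""
    ref, alt = ref.upper(), alt.upper()
    # Reject identical alleles and anything that is not a plain A/C/G/T base.
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid biallelic SNP: {ref}>{alt}")
    # Transitions stay within one chemical class; transversions cross classes.
    return (ref in PURINES) != (alt in PURINES)


def keep_transversions(snps):
    """Filter (chrom, pos, ref, alt) tuples, dropping transitions such as C>T and G>A."""
    return [snp for snp in snps if is_transversion(snp[2], snp[3])]


# Hypothetical toy input, not data from the study.
toy_snps = [
    ("chr1", 100, "C", "T"),  # transition, typical of postmortem deamination
    ("chr1", 250, "A", "C"),  # transversion
    ("chr2", 40, "G", "A"),   # transition
    ("chr2", 900, "G", "T"),  # transversion
]
kept = keep_transversions(toy_snps)
```

Restricting to transversions discards the C→T and G→A classes that postmortem deamination inflates, at the cost of a smaller SNP panel.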

Admixture analyses with D-statistics

D-statistics were calculated to investigate potential introgression between E. ovodovi and other non-caballines (Figure 3—figure supplement 1) using the doAbbababa2 program in ANGSD (Soraggi et al., 2018). Individuals were grouped according to their respective species. D-statistics were computed in the form (((H1, H2), H3), Outgroup) considering only the autosomal sites from bam files mapping against the horse reference with the following options: “-minQ 20 -minMapQ 25 -remove_bads 1 -only_proper_pairs 0 -uniqueOnly 1 -baq 1 -C 50”. The horse reference genome was used as the Outgroup. H1 and H2 denoted any non-caballine genomes except E. ovodovi, while H3 denoted E. ovodovi. Confidence intervals were estimated applying a jackknife procedure and 5 Mb windows. Z-scores with absolute values higher than 3 were considered to be statistically significant. To rule out possible reference bias, we also reran the same analysis using sequence alignments against the donkey reference genome (Figure 3—figure supplement 2).
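To make the test statistic concrete, the sketch below computes D = (ABBA − BABA)/(ABBA + BABA) and a Z-score from a delete-one block jackknife. It is a simplified illustration, not ANGSD's implementation: the per-block counts are hypothetical, blocks are treated as equally weighted, and the function names are ours.

```python
import math


def d_statistic(abba, baba):
    """D = (ABBA - BABA) / (ABBA + BABA) for the topology (((H1, H2), H3), Outgroup)."""
    total = abba + baba
    if total == 0:
        raise ValueError("no informative sites")
    return (abba - baba) / total


def jackknife_z(blocks):
    """Return (D, standard error, Z) from per-block (ABBA, BABA) counts.

    Uses an unweighted delete-one jackknife, a simplification that assumes
    blocks (e.g. 5 Mb windows) carry comparable amounts of data.
    """
    n = len(blocks)
    if n < 2:
        raise ValueError("need at least two blocks")
    tot_abba = sum(a for a, _ in blocks)
    tot_baba = sum(b for _, b in blocks)
    d_all = d_statistic(tot_abba, tot_baba)
    # Recompute D with each block left out in turn.
    leave_one_out = [d_statistic(tot_abba - a, tot_baba - b) for a, b in blocks]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((d - mean_loo) ** 2 for d in leave_one_out)
    se = math.sqrt(variance)
    return d_all, se, d_all / se


# Hypothetical counts for five blocks, not values from the study.
toy_blocks = [(120, 80), (110, 90), (130, 85), (115, 95), (125, 75)]
d_all, se, z = jackknife_z(toy_blocks)
significant = abs(z) > 3  # threshold used in the text
```

Under the usual convention, a positive D indicates an excess of derived alleles shared between H2 and H3, and a negative D an excess shared between H1 and H3.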

G-PhoCS demographic model

Data preparation and filtering

In order to model the equine evolutionary history, we selected a total of 10 individuals representing each individual lineage and used their high-coverage genomes as input for G-PhoCS (Gronau et al., 2011). Genotypes were called by GATK and candidate ‘neutral’ loci were identified by applying the following filters:

The simple repeats track available for the reference genome was obtained from Ensembl v99 release; corresponding regions were masked.
All exons of protein-coding genes were discarded together with their 10 kb flanking regions; this was done based on the GTF format annotation file of the reference genome available from Ensembl v99 Genome Browser.
We identified conserved noncoding elements (CNEs) using phastCons scores (based on the 20-way Conservation track provided for the mammal clade according to the genomic coordinates of the human reference) downloaded from the Table Browser of UCSC. All CNEs and their 100 bp flanking regions were masked using liftOver to convert human genome coordinates into EquCab3.0 horse genome coordinates.
Exons of noncoding RNA genes together with their 1 kb flanking regions were removed, based on the annotations available for the reference genome.
Gaps in the reference genome were disregarded.

Besides the various filters described above, regions/sites likely to be enriched for misaligned bases, and to have high false-negative rates during read alignment or variant detection, were masked as missing data. More specifically, different individuals could be treated differently depending on the genotyping results obtained as described in the section ‘Variant calling,’ depending on the presence of (1) indels, (2) triallelic sites, (3) positions with depth of coverage twice the mean depth recorded for each individual, and (4) transition sites.

We selected 1 kb loci located with minimum inter-locus distance of 30 kb from the intervals that pass all the criteria described above. Then, consensus sequences were generated for each individual from the vcf file generated in the section ‘Variant calling’ using bcftools ‘consensus’ command, with IUPAC codes indicating heterozygous genotypes (--iupac-codes) and ‘N’ representing masked sites (--mask and --missing ‘N’).

Finally, we excluded contiguous intervals if the total amount of missing bases was greater than 50% of the region length, resulting in a final collection of 15,324 loci using the horse reference genome (autosomes only). Neighbor-joining trees were constructed to confirm the topology before inferring the population divergence (Figure 3—figure supplement 3).
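The locus selection described above can be sketched as follows. This is one possible greedy implementation under stated assumptions, not the authors' script: intervals are assumed to be sorted, non-overlapping, half-open (chrom, start, end) regions that already passed every filter, and the function names are hypothetical.

```python
def select_loci(intervals, locus_len=1000, min_gap=30000):
    """Greedily place fixed-length loci inside passing intervals.

    Consecutive loci on the same chromosome are kept at least `min_gap`
    bases apart (end of one locus to start of the next).
    """
    loci = []
    last_end = {}  # chromosome -> end coordinate of the last selected locus
    for chrom, start, end in intervals:
        pos = start
        if chrom in last_end:
            pos = max(pos, last_end[chrom] + min_gap)
        while pos + locus_len <= end:
            loci.append((chrom, pos, pos + locus_len))
            last_end[chrom] = pos + locus_len
            pos += locus_len + min_gap
    return loci


def passes_missingness(sequence, max_missing=0.5):
    """Keep a locus unless more than `max_missing` of its bases are masked as 'N'."""
    missing = sequence.upper().count("N") / len(sequence)
    return missing <= max_missing


# Hypothetical passing intervals on one chromosome.
toy_intervals = [("chr1", 0, 2500), ("chr1", 10000, 12000), ("chr1", 40000, 75000)]
toy_loci = select_loci(toy_intervals)
```

In this toy case the second interval yields no locus because it lies within 30 kb of the first selected locus.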

MCMC setup

We used default global settings (Gronau et al., 2011), including a gamma prior distribution (α = 1, β = 10,000) for all mutation-scaled population sizes (θ) and a gamma prior distribution (α = 0.002, β = 0.00001) for all mutation-scaled migration rates (m). The initial parameter value of mutation-scaled divergence times (τ) was first set individually for each population. Then, we ran ~100,000–200,000 iteration tests and manually evaluated the convergence by checking the achieved acceptance ratios (i.e., accept if around 30–70%) or using Tracer v1.6 (http://tree.bio.ed.ac.uk/software/tracer/). For each test, we updated the input of the initial τ and all fine-tuned parameters based on previous results to get the appropriate value. The final results in Figure 3 are based on 500,000 MCMC iterations, considering the first 10% as burn-in.

Parameter calibration

We assumed an average generation time (g) of 8 years. The coalescent time of the Equus (4.0–4.5 Mya) (Orlando et al., 2013) was used to bound the mutation rate μ (per site per year). Effective population sizes (Ne) and divergence times (T) were estimated by scaling θ and τ parameter using g and μ (Supplementary file 1g), and the following formula: Ne = θ/(4 μg) and T = τ/μ (Gronau et al., 2011).
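The calibration formulas above translate directly into code. The sketch below applies Ne = θ/(4μg) and T = τ/μ, and shows how a root age range bounds μ; all numerical inputs are hypothetical placeholders rather than estimates from the study, and the function names are ours.

```python
def mutation_rate_bounds(tau_root, t_min_years, t_max_years):
    """Bound the per-site, per-year mutation rate from the root's mutation-scaled age.

    Since T = tau / mu, an older assumed root age implies a lower rate.
    """
    return tau_root / t_max_years, tau_root / t_min_years


def effective_size(theta, mu_per_year, generation_time):
    """Ne = theta / (4 * mu * g), with mu per site per year and g in years."""
    return theta / (4.0 * mu_per_year * generation_time)


def divergence_time_years(tau, mu_per_year):
    """T = tau / mu, in years."""
    return tau / mu_per_year


# Hypothetical inputs chosen for round numbers only.
mu_low, mu_high = mutation_rate_bounds(tau_root=0.0045, t_min_years=4.0e6, t_max_years=4.5e6)
ne = effective_size(theta=0.00032, mu_per_year=1e-9, generation_time=8)
t_years = divergence_time_years(tau=0.0025, mu_per_year=1e-9)
```

With these placeholder values, θ = 0.00032 corresponds to an effective size of about 10,000 and τ = 0.0025 to a divergence of about 2.5 million years.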

Inferring gene flow

Total migration rates (M) were estimated from the mutation-scaled migration rate (m) as M = m × τ_m, where τ_m is the mutation-scaled time span of the migration band. The total migration rate gives an accumulated migration rate over a long period of time and can therefore exceed 100%. We then converted such rates, M, into a probability of migration using the formula p = 1 − e^(−M) (where p is the probability of gene flow), according to the method presented in vonHoldt et al., 2016.
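As a small worked example of this conversion, the sketch below computes the total migration rate and the corresponding probability of gene flow. The input values are hypothetical and the function names are ours.

```python
import math


def total_migration_rate(m, tau_m):
    """M = m * tau_m: mutation-scaled rate times the mutation-scaled span of the band."""
    return m * tau_m


def migration_probability(total_rate):
    """p = 1 - exp(-M); expm1 keeps precision when M is close to zero."""
    return -math.expm1(-total_rate)


# Hypothetical values: a total rate of 3% and one of 200%.
p_small = migration_probability(total_migration_rate(m=30.0, tau_m=0.001))
p_large = migration_probability(2.0)
```

Because p saturates below 1, a total rate above 100% (here M = 2.0) still maps to a valid probability of roughly 0.86.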

The migration model implemented in G-PhoCS makes it possible to detect gene flow between any two lineages by introducing migration bands manually to the demographic model. However, it remains difficult to detect weak migration events. Additionally, scenarios including a large number of migration bands can lead to spurious results. To address this, we first inferred a demographic model with no migration bands, and then introduced several migration bands corresponding to five independent scenarios (Supplementary file 1g). A migration band was considered significant if the 95% Bayesian credible interval of the total migration rate (M) did not include 0 and if the mean value of M was estimated to be greater than 3%.

Settings for the migration bands between extant caballines are based on previous research (Jónsson et al., 2014). The significant migration band from horse to the non-caballine ancestor was identified (Supplementary file 1g), in line with previous work (Jónsson et al., 2014). However, no other non-negligible (M > 3%) migration bands were found in our analyses (Supplementary file 1g).

We then tried to estimate the migration events between E. ovodovi and other branches. We added all possible migration bands between E. ovodovi and extant non-caballine branches into the demographic model except the migration bands between E. ovodovi and the ancestor of extant non-caballines as the model is often underpowered to infer migration between sister populations. All of the migration bands were separated into four demographic models. Only three migration bands were shown to be significant (Supplementary file 1g).

Finally, the four significant migration bands were combined into one demographic model (Supplementary file 1h), and the resulting estimates were compared to those of the model including no migration (Supplementary file 1i).

We caution that the analyses carried out using TreeMix and G-PhoCS returned partly discordant results. This may be due to TreeMix modeling pulses of admixture in contrast to G-PhoCS, in which situations of continuous gene flow can be accommodated. Additionally, gene flow affecting the two deepest tree branches can be directly accommodated by reducing their divergence. Therefore, the deep admixture inferred by G-PhoCS from the caballine branch into the ancestral branch of Sussemiones and other non-caballine equids cannot be expected to be identified through an individual migration edge with TreeMix as this could simply be modeled through a more limited divergence between both underlying lineages. The same holds true for the asymmetric gene flow inferred by G-PhoCS between Sussemiones and the branch ancestral to all extant asses; TreeMix is likely to only identify the resulting unidirectional contribution of these admixtures, which mainly sources to the branch ancestral to extant asses into Sussemiones. Since G-PhoCS also infers additional admixture from the branch ancestral to extant zebras into Sussemiones, we can expect TreeMix to accommodate both sources of gene flow through a reduced divergence between Sussemiones and the branch ancestral to stenonines. Finally, the discordance may reflect limitations pertaining to the two underlying datasets, which consist of the whole-genome SNP panel for TreeMix and of a further filtered set of 15,324 candidate ‘neutral’ loci for G-PhoCS.

Demographic trajectories with PSMC

PSMC analyses

In order to reconstruct the past demographic dynamics of the E. ovodovi lineage, we applied the PSMC algorithm (version 0.6.5-r67) (Li and Durbin, 2011) to the sample HH06D (12.0×, mapping against horse reference), as well as three other Eurasian equine species (E. caballus TWI, E. hemionus ONA, and E. kiang KIA).

We first obtained the diploid consensus sequences after mapping against the horse genome for the autosomes of each specimen using the bcftools ‘mpileup’ command and the ‘vcf2fq’ command from vcfutils.pl with the following filters: mapping quality ≥ 25; adjust mapping quality = 50; minimum depth of coverage = 8; maximum depth of coverage ≤ 99.5% quantile of the coverage distribution; minimum RMS mapping quality = 10; filtering window size of indels = 5.

After filtering the bases with Phred quality scores strictly lower than 35, we ran PSMC with the following command: ‘psmc -N25 -t15 -r5 -p "4+25*2+4+6"’. Calibration was carried out using a generation time of 8 years and a mutation rate of 7.242 × 10^–9 per generation per site, following previous work (Jónsson et al., 2014). However, given the misincorporation pattern and high error rate of HH06D (Figure 2—figure supplements 8 and 9), we also performed analyses without transitions using a mutation rate of 2.3728 × 10^–9, which was obtained assuming that the most recent common ancestor of living equine species emerged 4 Mya (Orlando et al., 2013).
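For readers unfamiliar with how PSMC output is calibrated, the sketch below applies the standard scaling N0 = θ0/(4μs), time in years = 2·N0·t·g, and Ne = N0·λ. The bin size s = 100 is PSMC's usual default and is an assumption here, since the text does not state it; θ0, t and λ are hypothetical values, and only the generation time and mutation rate are taken from the text.

```python
def scale_psmc(theta0, t_k, lambda_k, mu_per_generation, generation_time, bin_size=100):
    """Convert PSMC's scaled output into years and effective population sizes.

    theta0    : per-bin scaled mutation rate reported by PSMC
    t_k       : interval boundaries in units of 2*N0 generations
    lambda_k  : relative population sizes in units of N0
    bin_size  : bases per bin in the PSMC input (assumed default of 100)
    """
    n0 = theta0 / (4.0 * mu_per_generation * bin_size)
    years = [2.0 * n0 * t * generation_time for t in t_k]
    sizes = [n0 * lam for lam in lambda_k]
    return n0, years, sizes


# Calibration values from the text; theta0, t_k and lambda_k are hypothetical.
n0, years, sizes = scale_psmc(
    theta0=0.028968,
    t_k=[0.0, 0.5],
    lambda_k=[1.0, 2.5],
    mu_per_generation=7.242e-9,
    generation_time=8,
)
```

With these placeholder inputs N0 is about 10,000, so a scaled time of 0.5 corresponds to roughly 80,000 years and λ = 2.5 to an effective size of about 25,000.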

We found a great expansion of HH06D in the past 50,000 years when retaining transitions but not when conditioning on transversions (Figure 4—figure supplement 1). The former is thus likely spurious and at least partly driven by severe postmortem DNA damage signatures in the sequence data. We therefore only used the latter inference when considering the ancient HH06D specimen.

False-negative rate correction

The HH06D genome (12.0×) was corrected assuming a uniform false-negative rate (uFNR) following Orlando et al., 2013, as its average depth of coverage is lower than the recommended 20×. To identify the uFNR correction value for HH06D, we randomly downsampled reads of the SOM genome (21.0×) using the DownsampleSam function of Picard Tools, so as to reduce the sequence data to the same average depth of coverage as that obtained for HH06D. This indicated that a value of 0.22 was the most suitable uFNR value for rescaling the HH06D PSMC profile (Figure 4—figure supplement 2A). The KIA and the ONA genomes, which also showed limited coverage, were rescaled following the same procedure (Figure 4—figure supplement 2B and C). Finally, PSMC confidence intervals were assessed from 100 bootstrap pseudo-replicates (Figure 4).
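The downsampling step amounts to simple arithmetic: the fraction of reads to retain is the ratio of target to source depth. The sketch below computes it for the depths quoted in the text; the function name is ours, and the assumption is that reads are kept independently with this probability.

```python
def downsampling_probability(source_depth, target_depth):
    """Fraction of reads to retain so that average depth drops from source to target."""
    if not 0 < target_depth <= source_depth:
        raise ValueError("target depth must be positive and not exceed source depth")
    return target_depth / source_depth


# Depths quoted in the text: SOM at 21.0x reduced to the 12.0x of HH06D.
keep_fraction = downsampling_probability(source_depth=21.0, target_depth=12.0)
```

Here roughly 57% of the SOM reads would be retained.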

Heterozygosity inference and inbreeding

Global heterozygosity rates and inbreeding levels were inferred for high-coverage individuals (>10×) using ROHan (Renaud et al., 2019) with default parameters, except that transitions were excluded (--tvonly) (Figure 5—figure supplement 1). To limit the impact of remnant misincorporations, we used the attached estimateDamage.pl script to estimate damage for all ancient samples prior to heterozygosity computation. Inbreeding was co-estimated together with genome-wide heterozygosity levels from the total ROH length (Figure 5—figure supplement 2).
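ROHan co-estimates heterozygosity and runs of homozygosity (ROH) with its own model, which is not reproduced here. As a minimal sketch of the underlying summary only, the fraction of the genome covered by ROH can be computed from segment coordinates; the segments, genome length, and function name below are hypothetical.

```python
def roh_fraction(roh_segments, genome_length):
    """Fraction of the genome in ROH, from half-open (start, end) segments.

    Assumes segments do not overlap and are expressed in base pairs.
    """
    if genome_length <= 0:
        raise ValueError("genome length must be positive")
    total_roh = sum(end - start for start, end in roh_segments)
    return total_roh / genome_length


# Hypothetical example: 3.5 Mb of ROH in a 100 Mb callable genome.
toy_segments = [(0, 2_000_000), (5_000_000, 6_500_000)]
f_roh = roh_fraction(toy_segments, genome_length=100_000_000)
```

A larger fraction indicates more of the genome inherited identical by descent, which is how total ROH length relates to inbreeding.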

Acknowledgements

We thank High-Performance Computing (HPC) of Northwest A&F University (NWAFU) for providing computing resources.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Dawei Cai, Email: caidw@jlu.edu.cn.

Ludovic Orlando, Email: ludovic.orlando@univ-tlse3.fr.

Yu Jiang, Email: yu.jiang@nwafu.edu.cn.

George H Perry, Pennsylvania State University, United States.

Funding Information

This paper was supported by the following grants:

Major Program of National Fund of Philosophy and Social Science of China 17ZDA221 to Dawei Cai.
H2020 European Research Council 681605 to Ludovic Orlando.
National Natural Science Foundation of China 31822052 to Yu Jiang.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Writing - original draft, Writing - review and editing.

Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Validation, Visualization, Writing - original draft, Writing - review and editing.

Investigation, Validation.

Data curation, Formal analysis, Investigation, Software, Validation, Visualization.

Investigation, Validation, Visualization.

Investigation, Validation.

Data curation, Investigation, Methodology, Software.

Resources.

Conceptualization, Methodology, Supervision, Writing - review and editing.

Conceptualization, Funding acquisition, Methodology, Project administration, Resources, Supervision, Writing - original draft, Writing - review and editing.

Additional files

Supplementary file 1. Tables that support the analysis and results above.

(a) Sample information. Dates are estimated from either calibrated radiocarbon dating (bold) or from the archaeological context. Sex, inferred from the ratio of the depth of coverage on the X chromosome to that on the autosomes (F, female; M, male; see c), and the average depth of coverage when mapping against both the horse and donkey reference genomes after rescaling and trimming are provided.
(b) Calibrated radiocarbon measurement summary statistics and dating of five ancient horses sequenced in this study. Uncal BP dates were calibrated using OxCalOnline (https://c14.arch.ox.ac.uk/oxcal.html) with the IntCal20 calibration curve.
(c) Sex information. The mean coverage of the autosomes and the X chromosome together with the ratio between them (F, female; M, male).
(d) Comparative Genome Panel.
(e) Mitochondrial sequences used in this study.
(f) Variance explained by TreeMix models from 0 to 3 migration edges excluding transitions. Monotonic increase of the variance explained by the model stopped when considering more than 3 migration edges.
(g) Inference of total migration rates (M) and migration proportions (p) using G-PhoCS (Gronau et al., 2011). A total of five models, including various possible migration bands, were considered. Models 1–4 include migration bands between E. ovodovi and other lineages, while model 5 contains all gene flow events identified in Jónsson et al., 2014. The migration bands with significant gene flow are highlighted in bold (these were defined as having a mean value of M > 3% and 95% credible interval not intercepting 0). They were combined to establish the final demographic model shown in Figure 3.
(h) Migration rate estimates returned by G-PhoCS. The 95% credible intervals of four significant migration bands identified in (g) are shown.
(i) Parameter estimates returned by G-PhoCS, considering models with and without migrations. The topology is in the form of (E. caballus, (E. ovodovi, ((E. a. somalicus, E. a. africanus), (E. kiang, E. hemionus)))), (((E. b. quagga, E. b. boehmi), E. grevyi), E. z. hartmannae). Divergence time and population size are estimated by the 95% Bayesian credible interval using a total of 15,324 candidate ‘neutral’ loci, considering the sequence data aligned against the horse reference genome. The migration model contains the four significant migration bands estimated and provided in (h).
(j) The tip dates (average calibrated radiocarbon dates or dates estimated from the archaeological context) used as sample ages in BEAST analyses.

elife-73346-supp1.docx (152.7 KB, docx)

Transparent reporting form

elife-73346-transrepform1.docx (117.3 KB, docx)

Data availability

Sequencing data have been deposited in the European Nucleotide Archive under the accession number PRJEB44527.

The following dataset was generated:

Zhu S, Gong M, Zhang N, Wen J, Liang Q, Sun W, Shao X, Guo Y, Cai Y, Zheng Z, Zhang W, Hu S, Wang X, Tian H, Li Y, Liu W, Yang M, Yang J, Wu D, Orlando L, Jiang Y, Cai D. 2021. Our sequence data provided 26 mitochondrial genomes and 3 complete nuclear genomes for Equus (Sussemionus) ovodovi. European Nucleotide Archive. PRJEB44527

The following previously published datasets were used:

Renaud G, Petersen B, Seguin-Orlando A, Bertelsen M F, Waller A, Newton R, Paillot R, Bryant N, Vaudin M, Librado P, Orlando L. 2018. This study aims at improving the genome reference of the domestic donkey using the Chicago/HiRiSe technology. European Nucleotide Archive. PRJEB24845

Jonsson H. 2014. Speciation with gene flow in equids despite extensive chromosomal plasticity. European Nucleotide Archive. PRJEB7446

Ginolhac A. 2013. General Sample for Equus asinus asinus, Willy. NCBI BioSample. SAMN02179859

Dugarjaviin M. 2014. Model organism or animal sample from Equus hemionus. NCBI BioSample. SAMN03010637

Kalbfleisch TS. 2014. Sample from Equus caballus. NCBI BioSample. SAMN02953672

Achilli A. 2012. Mitochondrial genomes from modern horses reveal the major haplogroups that underwent domestication. NCBI GenBank. 347361635

Vilstrup JT. 2013. Mitochondrial phylogenomics of modern and ancient equids. NCBI GenBank. JX312719

Vilstrup JT. 2013. Mitochondrial phylogenomics of modern and ancient equids. NCBI GenBank. JX312721

Vilstrup JT. 2013. Mitochondrial phylogenomics of modern and ancient equids. NCBI GenBank. JX312725

Vilstrup JT. 2013. Mitochondrial phylogenomics of modern and ancient equids. NCBI GenBank. JX312730

Vilstrup JT. 2013. Mitochondrial phylogenomics of modern and ancient equids. NCBI GenBank. JX312732

Vilstrup JT. 2013. Mitochondrial phylogenomics of modern and ancient equids. NCBI GenBank. JX312734

Der Sarkissian C. 2015. Mitochondrial genomes reveal the extinct Hippidion as an outgroup to all living equids. NCBI GenBank. KM881671

Librado P. 2015. Tracking the origins of Yakutian horses and the genetic basis for their fast adaptation to subarctic environments. NCBI GenBank. KT368725

Orlando L. 2016. Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. NCBI GenBank. KT757740

Orlando L. 2016. Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. NCBI GenBank. KT757741

Druzhkova AS, Makunin AI, Vorobieva NV, Vasiliev SK, Ovodov ND, Shunkov MV, Trifonov VA, Graphodatsky AS. 2017. Complete mitochondrial genome of an extinct Equus (Sussemionus) ovodovi specimen from Denisova cave (Altai, Russia). NCBI GenBank. KY114520

Heintzman PD, Zazula GD, MacPhee R, Scott E, Cahill JA, McHorse BK, Kapp JD, Stiller M, Wooller MJ, Orlando L, Southon J, Froese DG, Shapiro B. 2018. A new genus of horse from Pleistocene North America. NCBI GenBank. MF134655

Xu X, Gullberg A, Arnason U. 2016. The complete mitochondrial DNA (mtDNA) of the donkey and mtDNA comparisons among four closely related mammalian species-pairs. NCBI GenBank. X97337

References

Bendrey R. New methods for the identification of evidence for bitting on horse remains from archaeological sites. Journal of Archaeological Science. 2007;34:1036–1050. doi: 10.1016/j.jas.2006.09.010. [DOI] [Google Scholar]
Bouckaert R, Vaughan TG, Barido-Sottani J, Duchêne S, Fourment M, Gavryushkina A, Heled J, Jones G, Kühnert D, De Maio N, Matschiner M, Mendes FK, Müller NF, Ogilvie HA, du Plessis L, Popinga A, Rambaut A, Rasmussen D, Siveroni I, Suchard MA, Wu CH, Xie D, Zhang C, Stadler T, Drummond AJ. BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLOS Computational Biology. 2019;15:e1006650. doi: 10.1371/journal.pcbi.1006650. [DOI] [PMC free article] [PubMed] [Google Scholar]
Briggs AW, Stenzel U, Johnson PLF, Green RE, Kelso J, Prüfer K, Meyer M, Krause J, Ronan MT, Lachmann M, Pääbo S. Patterns of damage in genomic DNA sequences from a Neandertal. PNAS. 2007;104:14616–14621. doi: 10.1073/pnas.0704665104. [DOI] [PMC free article] [PubMed] [Google Scholar]
Broad Institute. Picard toolkit. d784ca3. GitHub. 2019. https://github.com/broadinstitute/picard
Clark PU, Dyke AS, Shakun JD, Carlson AE, Clark J, Wohlfarth B, Mitrovica JX, Hostetler SW, McCabe AM. The Last Glacial Maximum. Science (New York, N.Y.) 2009;325:710–714. doi: 10.1126/science.1172873. [DOI] [PubMed] [Google Scholar]
Der Sarkissian C, Vilstrup JT, Schubert M, Seguin-Orlando A, Eme D, Weinstock J, Alberdi MT, Martin F, Lopez PM, Prado JL, Prieto A, Douady CJ, Stafford TW, Willerslev E, Orlando L. Mitochondrial genomes reveal the extinct Hippidion as an outgroup to all living equids. Biology Letters. 2015;11:20141058. doi: 10.1098/rsbl.2014.1058. [DOI] [PMC free article] [PubMed] [Google Scholar]
Drummond AJ, Rambaut A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evolutionary Biology. 2007;7:214. doi: 10.1186/1471-2148-7-214. [DOI] [PMC free article] [PubMed] [Google Scholar]
Druzhkova AS, Makunin AI, Vorobieva NV, Vasiliev SK, Ovodov ND, Shunkov MV, Trifonov VA, Graphodatsky AS. Complete mitochondrial genome of an extinct Equus (Sussemionus) ovodovi specimen from Denisova cave (Altai, Russia). Mitochondrial DNA. Part B, Resources. 2017;2:79–81. doi: 10.1080/23802359.2017.1285209. [DOI] [PMC free article] [PubMed] [Google Scholar]
Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research. 2004;32:1792–1797. doi: 10.1093/nar/gkh340. [DOI] [PMC free article] [PubMed] [Google Scholar]
Eisenmann V. Sussemionus, a new subgenus of Equus (Perissodactyla, Mammalia). Comptes Rendus Biologies. 2010;333:235–240. doi: 10.1016/j.crvi.2009.12.013. [DOI] [PubMed] [Google Scholar]
Fages A, Hanghøj K, Khan N, Gaunitz C, Seguin-Orlando A, Leonardi M, McCrory Constantz C, Gamba C, Al-Rasheid KAS, Albizuri S, Alfarhan AH, Allentoft M, Alquraishi S, Anthony D, Baimukhanov N, Barrett JH, Bayarsaikhan J, Benecke N, Bernáldez-Sánchez E, Berrocal-Rangel L, Biglari F, Boessenkool S, Boldgiv B, Brem G, Brown D, Burger J, Crubézy E, Daugnora L, Davoudi H, de Barros Damgaard P, de Los Ángeles de Chorro Y de Villa-Ceballos M, Deschler-Erb S, Detry C, Dill N, do Mar Oom M, Dohr A, Ellingvåg S, Erdenebaatar D, Fathi H, Felkel S, Fernández-Rodríguez C, García-Viñas E, Germonpré M, Granado JD, Hallsson JH, Hemmer H, Hofreiter M, Kasparov A, Khasanov M, Khazaeli R, Kosintsev P, Kristiansen K, Kubatbek T, Kuderna L, Kuznetsov P, Laleh H, Leonard JA, Lhuillier J, Liesau von Lettow-Vorbeck C, Logvin A, Lõugas L, Ludwig A, Luis C, Arruda AM, Marques-Bonet T, Matoso Silva R, Merz V, Mijiddorj E, Miller BK, Monchalov O, Mohaseb FA, Morales A, Nieto-Espinet A, Nistelberger H, Onar V, Pálsdóttir AH, Pitulko V, Pitskhelauri K, Pruvost M, Rajic Sikanjic P, Rapan Papeša A, Roslyakova N, Sardari A, Sauer E, Schafberg R, Scheu A, Schibler J, Schlumbaum A, Serrand N, Serres-Armero A, Shapiro B, Sheikhi Seno S, Shevnina I, Shidrang S, Southon J, Star B, Sykes N, Taheri K, Taylor W, Teegen W-R, Trbojević Vukičević T, Trixl S, Tumen D, Undrakhbold S, Usmanova E, Vahdati A, Valenzuela-Lamas S, Viegas C, Wallner B, Weinstock J, Zaibert V, Clavel B, Lepetz S, Mashkour M, Helgason A, Stefánsson K, Barrey E, Willerslev E, Outram AK, Librado P, Orlando L. Tracking Five Millennia of Horse Management with Extensive Ancient Genome Time Series. Cell. 2019;177:1419–1435. doi: 10.1016/j.cell.2019.03.049. [DOI] [PMC free article] [PubMed] [Google Scholar]
Fan J, Wang X, Yang J, Liu S, Zhu Z, Lv J, Chen G, Jia C. The Excavation of the Beiyuan Site at Shatang Town in Longde County, Ningxia in 2013. Relics and Museology. 2017;6:3–12. [Google Scholar]
Gaunitz C, Fages A, Hanghøj K, Albrechtsen A, Khan N, Schubert M, Seguin-Orlando A, Owens IJ, Felkel S, Bignon-Lau O, de Barros Damgaard P, Mittnik A, Mohaseb AF, Davoudi H, Alquraishi S, Alfarhan AH, Al-Rasheid KAS, Crubézy E, Benecke N, Olsen S, Brown D, Anthony D, Massy K, Pitulko V, Kasparov A, Brem G, Hofreiter M, Mukhtarova G, Baimukhanov N, Lõugas L, Onar V, Stockhammer PW, Krause J, Boldgiv B, Undrakhbold S, Erdenebaatar D, Lepetz S, Mashkour M, Ludwig A, Wallner B, Merz V, Merz I, Zaibert V, Willerslev E, Librado P, Outram AK, Orlando L. Ancient genomes revisit the ancestry of domestic and Przewalski’s horses. Science (New York, N.Y.) 2018;360:111–114. doi: 10.1126/science.aao3297. [DOI] [PubMed] [Google Scholar]
Gronau I, Hubisz MJ, Gulko B, Danko CG, Siepel A. Bayesian inference of ancient human demography from individual genome sequences. Nature Genetics. 2011;43:1031–1034. doi: 10.1038/ng.937. [DOI] [PMC free article] [PubMed] [Google Scholar]
Heintzman PD, Zazula GD, MacPhee RD, Scott E, Cahill JA, McHorse BK, Kapp JD, Stiller M, Wooller MJ, Orlando L, Southon J, Froese DG, Shapiro B. A new genus of horse from Pleistocene North America. eLife. 2017;6:e29944. doi: 10.7554/eLife.29944. [DOI] [PMC free article] [PubMed] [Google Scholar]
Henn BM, Cavalli-Sforza LL, Feldman MW. The great human expansion. PNAS. 2012;109:17758–17764. doi: 10.1073/pnas.1212380109. [DOI] [PMC free article] [PubMed] [Google Scholar]
Huang J, Zhao Y, Bai D, Shiraigol W, Li B, Yang L, Wu J, Bao W, Ren X, Jin B, Zhao Q, Li A, Bao S, Bao W, Xing Z, An A, Gao Y, Wei R, Bao Y, Bao T, Han H, Bai H, Bao Y, Zhang Y, Daidiikhuu D, Zhao W, Liu S, Ding J, Ye W, Ding F, Sun Z, Shi Y, Zhang Y, Meng H, Dugarjaviin M. Donkey genome and insight into the imprinting of fast karyotype evolution. Scientific Reports. 2015;5:14106. doi: 10.1038/srep14106. [DOI] [PMC free article] [PubMed] [Google Scholar]
Jónsson H, Ginolhac A, Schubert M, Johnson PLF, Orlando L. mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters. Bioinformatics (Oxford, England) 2013;29:1682–1684. doi: 10.1093/bioinformatics/btt193. [DOI] [PMC free article] [PubMed] [Google Scholar]
Jónsson H, Schubert M, Seguin-Orlando A, Ginolhac A, Petersen L, Fumagalli M, Albrechtsen A, Petersen B, Korneliussen TS, Vilstrup JT, Lear T, Myka JL, Lundquist J, Miller DC, Alfarhan AH, Alquraishi SA, Al-Rasheid KAS, Stagegaard J, Strauss G, Bertelsen MF, Sicheritz-Ponten T, Antczak DF, Bailey E, Nielsen R, Willerslev E, Orlando L. Speciation with gene flow in equids despite extensive chromosomal plasticity. PNAS. 2014;111:18655–18660. doi: 10.1073/pnas.1412627111. [DOI] [PMC free article] [PubMed] [Google Scholar]
Juan L, Dawei Z, Julie AD. Increased number and differentiation of neural precursor cells in the brainstem of superoxide dismutase 1(G93A) (G1H) transgenic mouse model of amyotrophic lateral sclerosis. Neurological Research. 2007;29:204–209. doi: 10.1179/174313206X152519. [DOI] [PubMed] [Google Scholar]
Kalbfleisch TS, Rice ES, DePriest MS, Walenz BP, Hestand MS, Vermeesch JR, O’Connell BL, Fiddes IT, Vershinina AO, Saremi NF, Petersen JL, Finno CJ, Bellone RR, McCue ME, Brooks SA, Bailey E, Orlando L, Green RE, Miller DC, Antczak DF, MacLeod JN. Improved reference genome for the domestic horse increases assembly contiguity and composition. Communications Biology. 2018;1:197. doi: 10.1038/s42003-018-0199-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
Keane TM, Creevey CJ, Pentony MM, Naughton TJ, McInerney JO. Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified. BMC Evolutionary Biology. 2006;6:29. doi: 10.1186/1471-2148-6-29. [DOI] [PMC free article] [PubMed] [Google Scholar]
Khenzykhenova FI, Shchetnikov AA, Sato T, Erbajeva MA, Semenei EY, Lipnina EA, Yoshida K, Kato H, Filinov II, Tumurov EG, Alexeeva N, Lokhov DN. Ecosystem analysis of Baikal Siberia using Palaeolithic faunal assemblages to reconstruct MIS 3 - MIS 2 environments and climate. Quaternary International. 2016;425:16–27. doi: 10.1016/j.quaint.2016.06.026. [DOI] [Google Scholar]
Korneliussen TS, Albrechtsen A, Nielsen R. ANGSD: Analysis of Next Generation Sequencing Data. BMC Bioinformatics. 2014;15:356. doi: 10.1186/s12859-014-0356-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kozlov AM, Aberer AJ, Stamatakis A. ExaML version 3: a tool for phylogenomic analyses on supercomputers. Bioinformatics (Oxford, England) 2015;31:2577–2579. doi: 10.1093/bioinformatics/btv184. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kozlov AM, Darriba D, Flouri T, Morel B, Stamatakis A. RAxML-NG: A fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics (Oxford, England) 2019;35:4453–4455. doi: 10.1093/bioinformatics/btz305. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lanfear R, Calcott B, Ho SYW, Guindon S. Partitionfinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Molecular Biology and Evolution. 2012;29:1695–1701. doi: 10.1093/molbev/mss020. [DOI] [PubMed] [Google Scholar]
Letunic I, Bork P. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Research. 2016;44:W242–W245. doi: 10.1093/nar/gkw290. [DOI] [PMC free article] [PubMed] [Google Scholar]
Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (Oxford, England) 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324. [DOI] [PMC free article] [PubMed] [Google Scholar]
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics (Oxford, England) 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352. [DOI] [PMC free article] [PubMed] [Google Scholar]
Li H, Durbin R. Inference of human population history from individual whole-genome sequences. Nature. 2011;475:493–496. doi: 10.1038/nature10231. [DOI] [PMC free article] [PubMed] [Google Scholar]
Librado P, Khan N, Fages A, Kusliy MA, Suchan T, Tonasso-Calvière L, Schiavinato S, Alioglu D, Fromentier A, Perdereau A, Aury J-M, Gaunitz C, Chauvey L, Seguin-Orlando A, Der Sarkissian C, Southon J, Shapiro B, Tishkin AA, Kovalev AA, Alquraishi S, Alfarhan AH, Al-Rasheid KAS, Seregély T, Klassen L, Iversen R, Bignon-Lau O, Bodu P, Olive M, Castel J-C, Boudadi-Maligne M, Alvarez N, Germonpré M, Moskal-Del Hoyo M, Wilczyński J, Pospuła S, Lasota-Kuś A, Tunia K, Nowak M, Rannamäe E, Saarma U, Boeskorov G, Lōugas L, Kyselý R, Peške L, Bălășescu A, Dumitrașcu V, Dobrescu R, Gerber D, Kiss V, Szécsényi-Nagy A, Mende BG, Gallina Z, Somogyi K, Kulcsár G, Gál E, Bendrey R, Allentoft ME, Sirbu G, Dergachev V, Shephard H, Tomadini N, Grouard S, Kasparov A, Basilyan AE, Anisimov MA, Nikolskiy PA, Pavlova EY, Pitulko V, Brem G, Wallner B, Schwall C, Keller M, Kitagawa K, Bessudnov AN, Bessudnov A, Taylor W, Magail J, Gantulga J-O, Bayarsaikhan J, Erdenebaatar D, Tabaldiev K, Mijiddorj E, Boldgiv B, Tsagaan T, Pruvost M, Olsen S, Makarewicz CA, Valenzuela Lamas S, Albizuri Canadell S, Nieto Espinet A, Iborra MP, Lira Garrido J, Rodríguez González E, Celestino S, Olària C, Arsuaga JL, Kotova N, Pryor A, Crabtree P, Zhumatayev R, Toleubaev A, Morgunova NL, Kuznetsova T, Lordkipanize D, Marzullo M, Prato O, Bagnasco Gianni G, Tecchiati U, Clavel B, Lepetz S, Davoudi H, Mashkour M, Berezina NY, Stockhammer PW, Krause J, Haak W, Morales-Muñiz A, Benecke N, Hofreiter M, Ludwig A, Graphodatsky AS, Peters J, Kiryushin KY, Iderkhangai T-O, Bokovenko NA, Vasiliev SK, Seregin NN, Chugunov KV, Plasteeva NA, Baryshnikov GF, Petrova E, Sablin M, Ananyevskaya E, Logvin A, Shevnina I, Logvin V, Kalieva S, Loman V, Kukushkin I, Merz I, Merz V, Sakenov S, Varfolomeyev V, Usmanova E, Zaibert V, Arbuckle B, Belinskiy AB, Kalmykov A, Reinhold S, Hansen S, Yudin AI, Vybornov AA, Epimakhov A, Berezina NS, Roslyakova N, Kosintsev PA, Kuznetsov PF, Anthony D, Kroonen GJ, Kristiansen K, Wincker P, Outram A, Orlando L. The origins and spread of domestic horses from the Western Eurasian steppes. Nature. 2021;598:634–640. doi: 10.1038/s41586-021-04018-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
Librado P, Orlando L. Genomics and the Evolutionary History of Equids. Annual Review of Animal Biosciences. 2021;9:81–101. doi: 10.1146/annurev-animal-061220-023118. [DOI] [PubMed] [Google Scholar]
Lindsay EH, Opdyke ND, Johnson NM. Pliocene dispersal of the horse Equus and late Cenozoic mammalian dispersal events. Nature. 1980;287:135–138. doi: 10.1038/287135a0. [DOI] [Google Scholar]
Malikov DG. The large mammals of North-Minusinsk basin in the Last Glacial period. Quaternary International. 2016;420:208–220. doi: 10.1016/j.quaint.2015.10.055. [DOI] [Google Scholar]
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research. 2010;20:1297–1303. doi: 10.1101/gr.107524.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
Meisner J, Albrechtsen A. Inferring Population Structure and Admixture Proportions in Low-Depth NGS Data. Genetics. 2018;210:719–731. doi: 10.1534/genetics.118.301336. [DOI] [PMC free article] [PubMed] [Google Scholar]
Orlando L, Metcalf JL, Alberdi MT, Telles-Antunes M, Bonjean D, Otte M, Martin F, Eisenmann V, Mashkour M, Morello F, Prado JL, Salas-Gismondi R, Shockey BJ, Wrinn PJ, Vasil’ev SK, Ovodov ND, Cherry MI, Hopwood B, Male D, Austin JJ, Hänni C, Cooper A. Revising the recent evolutionary history of equids using ancient DNA. PNAS. 2009;106:21754–21759. doi: 10.1073/pnas.0903672106. [DOI] [PMC free article] [PubMed] [Google Scholar]
Orlando L, Ginolhac A, Zhang G, Froese D, Albrechtsen A, Stiller M, Schubert M, Cappellini E, Petersen B, Moltke I, Johnson PLF, Fumagalli M, Vilstrup JT, Raghavan M, Korneliussen T, Malaspinas A-S, Vogt J, Szklarczyk D, Kelstrup CD, Vinther J, Dolocan A, Stenderup J, Velazquez AMV, Cahill J, Rasmussen M, Wang X, Min J, Zazula GD, Seguin-Orlando A, Mortensen C, Magnussen K, Thompson JF, Weinstock J, Gregersen K, Røed KH, Eisenmann V, Rubin CJ, Miller DC, Antczak DF, Bertelsen MF, Brunak S, Al-Rasheid KAS, Ryder O, Andersson L, Mundy J, Krogh A, Gilbert MTP, Kjær K, Sicheritz-Ponten T, Jensen LJ, Olsen JV, Hofreiter M, Nielsen R, Shapiro B, Wang J, Willerslev E. Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. Nature. 2013;499:74–78. doi: 10.1038/nature12323. [DOI] [PubMed] [Google Scholar]
Orlando L. The Evolutionary and Historical Foundation of the Modern Horse: Lessons from Ancient Genomics. Annual Review of Genetics. 2020;54:563–581. doi: 10.1146/annurev-genet-021920-011805. [DOI] [PubMed] [Google Scholar]
Outram AK, Stear NA, Bendrey R, Olsen S, Kasparov A, Zaibert V, Thorpe N, Evershed RP. The earliest horse harnessing and milking. Science (New York, N.Y.) 2009;323:1332–1335. doi: 10.1126/science.1168594. [DOI] [PubMed] [Google Scholar]
Palkopoulou E, Mallick S, Skoglund P, Enk J, Rohland N, Li H, Omrak A, Vartanyan S, Poinar H, Götherström A, Reich D, Dalén L. Complete genomes reveal signatures of demographic and genetic declines in the woolly mammoth. Current Biology. 2015;25:1395–1400. doi: 10.1016/j.cub.2015.04.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
Pickrell JK, Pritchard JK. Inference of population splits and mixtures from genome-wide allele frequency data. PLOS Genetics. 2012;8:e1002967. doi: 10.1371/journal.pgen.1002967. [DOI] [PMC free article] [PubMed] [Google Scholar]
Plasteeva NA, Vasiliev SK, Kosintsev PA. Equus (Sussemionus) ovodovi Eisenmann et Vasiliev, 2011 from the Late Pleistocene of Western Siberia. Russian Journal of Theriology. 2015;14:187–200. doi: 10.15298/rusjtheriol.14.2.07. [DOI] [Google Scholar]
Renaud G, Petersen B, Seguin-Orlando A, Bertelsen MF, Waller A, Newton R, Paillot R, Bryant N, Vaudin M, Librado P, Orlando L. Improved de novo genomic assembly for the domestic donkey. Science Advances. 2018;4:eaaq0392. doi: 10.1126/sciadv.aaq0392. [DOI] [PMC free article] [PubMed] [Google Scholar]
Renaud G, Hanghøj K, Korneliussen TS, Willerslev E, Orlando L. Joint Estimates of Heterozygosity and Runs of Homozygosity for Modern and Ancient Samples. Genetics. 2019;212:587–614. doi: 10.1534/genetics.119.302057. [DOI] [PMC free article] [PubMed] [Google Scholar]
Rossel S, Marshall F, Peters J, Pilgram T, Adams MD, O’Connor D. Domestication of the donkey: timing, processes, and indicators. PNAS. 2008;105:3715–3720. doi: 10.1073/pnas.0709692105. [DOI] [PMC free article] [PubMed] [Google Scholar]
Schubert M, Ginolhac A, Lindgreen S, Thompson JF, Al-Rasheid KAS, Willerslev E, Krogh A, Orlando L. Improving ancient DNA read mapping against modern reference genomes. BMC Genomics. 2012;13:178. doi: 10.1186/1471-2164-13-178. [DOI] [PMC free article] [PubMed] [Google Scholar]
Schubert M, Ermini L, Der Sarkissian C, Jónsson H, Ginolhac A, Schaefer R, Martin MD, Fernández R, Kircher M, McCue M, Willerslev E, Orlando L. Characterization of ancient and modern genomes by SNP detection and phylogenomic and metagenomic analysis using PALEOMIX. Nature Protocols. 2014a;9:1056–1082. doi: 10.1038/nprot.2014.063. [DOI] [PubMed] [Google Scholar]
Schubert M, Jónsson H, Chang D, Der Sarkissian C, Ermini L, Ginolhac A, Albrechtsen A, Dupanloup I, Foucal A, Petersen B, Fumagalli M, Raghavan M, Seguin-Orlando A, Korneliussen TS, Velazquez AMV, Stenderup J, Hoover CA, Rubin CJ, Alfarhan AH, Alquraishi SA, Al-Rasheid KAS, MacHugh DE, Kalbfleisch T, MacLeod JN, Rubin EM, Sicheritz-Ponten T, Andersson L, Hofreiter M, Marques-Bonet T, Gilbert MTP, Nielsen R, Excoffier L, Willerslev E, Shapiro B, Orlando L. Prehistoric genomes reveal the genetic foundation and cost of horse domestication. PNAS. 2014b;111:E5661–E5669. doi: 10.1073/pnas.1416991111. [DOI] [PMC free article] [PubMed] [Google Scholar]
Schubert M, Lindgreen S, Orlando L. AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Research Notes. 2016;9:88. doi: 10.1186/s13104-016-1900-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
Seguin-Orlando A, Gamba C, Sarkissian CD, Ermini L, Louvel G, Boulygina E, Sokolov A, Nedoluzhko A, Lorenzen ED, Lopez P, McDonald HG, Scott E, Tikhonov A, Stafford TW, Jr, Alfarhan AH, Alquraishi SA, Al-Rasheid KAS, Shapiro B, Willerslev E, Prokhortchouk E, Orlando L. Pros and cons of methylation-based enrichment methods for ancient DNA. Scientific Reports. 2015;5:11826. doi: 10.1038/srep11826. [DOI] [PMC free article] [PubMed] [Google Scholar]
Shchetnikov AA, Klementiev AM, Filinov IA, Semeney EYu. Large mammals from the Upper Neopleistocene reference sections in the Tunka rift valley, southwestern Baikal Region. Stratigraphy and Geological Correlation. 2015;23:214–236. doi: 10.1134/S0869593815020057. [DOI] [Google Scholar]
Slon V, Mafessoni F, Vernot B, de Filippo C, Grote S, Viola B, Hajdinjak M, Peyrégne S, Nagel S, Brown S, Douka K, Higham T, Kozlikin MB, Shunkov MV, Derevianko AP, Kelso J, Meyer M, Prüfer K, Pääbo S. The genome of the offspring of a Neanderthal mother and a Denisovan father. Nature. 2018;561:113–116. doi: 10.1038/s41586-018-0455-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
Soraggi S, Wiuf C, Albrechtsen A. Powerful Inference with the D-Statistic on Low-Coverage Whole-Genome Data. G3: Genes, Genomes, Genetics. 2018;8:551–566. doi: 10.1534/g3.117.300192. [DOI] [PMC free article] [PubMed] [Google Scholar]
Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics (Oxford, England) 2014;30:1312–1313. doi: 10.1093/bioinformatics/btu033. [DOI] [PMC free article] [PubMed] [Google Scholar]
Taylor W, Tuvshinjargal T. In: Care or Neglect?: Evidence of Animal Disease in Archaeology ; Proceedings of the 6th Meeting of the Animal Palaeopathology Working Group of the International Council for Archaeozoology (ICAZ) Bartosiewicz László, Gál Erika., editors. Oxbow Books; 2018. Horseback riding, asymmetry, and anthropogenic changes to the equine skull: evidence for mounted riding in Mongolia’s late Bronze Age; pp. 134–154. [Google Scholar]
Vasiliev SK. Large Mammal Fauna from the Pleistocene Deposits of Chagyrskaya Cave Northwestern Altai (based on 2007–2011 Excavations). Archaeology, Ethnology and Anthropology of Eurasia. 2013;41:28–44. doi: 10.1016/j.aeae.2013.07.003. [DOI] [Google Scholar]
Vershinina AO, Heintzman PD, Froese DG, Zazula G, Cassatt-Johnstone M, Dalén L, Der Sarkissian C, Dunn SG, Ermini L, Gamba C, Groves P, Kapp JD, Mann DH, Seguin-Orlando A, Southon J, Stiller M, Wooller MJ, Baryshnikov G, Gimranov D, Scott E, Hall E, Hewitson S, Kirillova I, Kosintsev P, Shidlovsky F, Tong HW, Tiunov MP, Vartanyan S, Orlando L, Corbett-Detig R, MacPhee RD, Shapiro B. Ancient horse genomes reveal the timing and extent of dispersals across the Bering Land Bridge. Molecular Ecology. 2021;30:6144–6161. doi: 10.1111/mec.15977. [DOI] [PubMed] [Google Scholar]
Vilstrup JT, Seguin-Orlando A, Stiller M, Ginolhac A, Raghavan M, Nielsen SCA, Weinstock J, Froese D, Vasiliev SK, Ovodov ND, Clary J, Helgen KM, Fleischer RC, Cooper A, Shapiro B, Orlando L. Mitochondrial phylogenomics of modern and ancient equids. PLOS ONE. 2013;8:e55950. doi: 10.1371/journal.pone.0055950. [DOI] [PMC free article] [PubMed] [Google Scholar]
vonHoldt BM, Cahill JA, Fan Z, Gronau I, Robinson J, Pollinger JP, Shapiro B, Wall J, Wayne RK. Whole-genome sequence analysis shows that two endemic species of North American wolf are admixtures of the coyote and gray wolf. Science Advances. 2016;2:e1501714. doi: 10.1126/sciadv.1501714. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wang Z, Guo X, Kang N, Liu X, Hu K, Chen J. Preliminary Report on the Excavation of Muzhuzhuliang Site in Shenmu, Shaanxi. Archaeology and Cultural Relics. 2015;5:2015 [Google Scholar]
Yang DY, Eng B, Waye JS, Dudar JC, Saunders SR. Technical note: improved DNA extraction from ancient bones using silica-based spin columns. American Journal of Physical Anthropology. 1998;105:539–543. doi: 10.1002/(SICI)1096-8644(199804)105:4<539::AID-AJPA10>3.0.CO;2-1. [DOI] [PubMed] [Google Scholar]
Yuan J, Flad R. In: Equids in Time and Space. Mashkour M, editor. Oxbow Books; 2006. Research on Early Horse Domestication in China; pp. 124–131. [Google Scholar]
Yuan JX, Hou XD, Barlow A, Preick M, Taron UH, Alberti F, Basler N, Deng T, Lai XL, Hofreiter M, Sheng G. Molecular identification of late and terminal Pleistocene Equus ovodovi from northeastern China. PLOS ONE. 2019;14:e0216883. doi: 10.1371/journal.pone.0216883. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhang W, Tian H, Li Y, Liu W. The Honghe Site in Qiqihar City, Heilongjiang. Archaeology. 2020;7:20–33. doi: 10.16359/j.cnki.cn11-1963/q.2020.0025. [DOI] [Google Scholar]

eLife. doi: 10.7554/eLife.73346.sa0

Editor's evaluation

George H Perry ¹

This article represents multiple milestones in our understanding of the evolution and extinction of Pleistocene equids, including revising the timing of extinction and clarifying the evolutionary history of Equus (Sussemionus) ovodovi. The discovery of the late persistence of non-caballine equid taxa in Northern China until deep into the late Holocene is particularly important. This finding will be of broad interest to the paleontology, paleoecology, archaeology, and paleogenomic communities and should stimulate important future research into equid extinction processes.

eLife. doi: 10.7554/eLife.73346.sa1

Decision letter

Editor: George H Perry¹

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Ancient genomes redate the extinction of Sussemionus, a subgenus of Equus, to late Holocene" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and George Perry as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential Revisions include providing additional detail on existing datasets, the adjustment and re-processing of some analyses, and the addition of a section on archaeological context and expanded discussion on regional zooarchaeological implications of your findings. These points and more excellent feedback are detailed in the reviews below. In addition, I agree with Reviewer #2's suggestion to revise your title for the next version of your paper.

Overall: Well done on this paper. We look forward to receiving and reviewing your revision. I will share it with the two reviewers to confirm that their comments have been suitably addressed.

Reviewer #2 (Recommendations for the authors):

I suggest the authors reconsider their manuscript title as it is the radiocarbon dates, rather than the ancient genomes, that re-date the extinction.

Given the specimens under study are isolated skeletal elements from a handful of stratigraphic horizons, it is possible that some of the specimens belonged to the same individual. There should be a description in the Methods about the minimum number of individuals at each locality. The mitogenomic data would prove valuable in this regard.
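One way to use the mitogenomic data for this purpose can be sketched as follows. This is a minimal illustration, assuming each specimen has already been assigned a mitochondrial haplotype label; the function name, the record layout, and the site and specimen labels are hypothetical and do not come from the manuscript. Distinct haplotypes at a site give only a lower bound on the number of individuals, since different animals can share a haplotype.

```python
from collections import defaultdict

def min_individuals_by_site(records):
    """Lower bound on individuals per site from distinct mitochondrial haplotypes.

    records: iterable of (site, specimen_id, haplotype) tuples (hypothetical layout).
    """
    haplotypes = defaultdict(set)
    for site, _specimen_id, haplotype in records:
        # Two specimens with different haplotypes cannot be the same individual.
        haplotypes[site].add(haplotype)
    return {site: len(haps) for site, haps in haplotypes.items()}

# Toy records for illustration only; not data from the study.
toy_records = [
    ("site_A", "s1", "hapX"),
    ("site_A", "s2", "hapX"),
    ("site_A", "s3", "hapY"),
    ("site_B", "s4", "hapZ"),
]
mni = min_individuals_by_site(toy_records)
```

Osteological criteria (element, side, age class) would still be needed to separate specimens that share a haplotype.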

The authors use relative X-chromosome and autosomal coverage to distinguish male and female individuals (L104-106). However, these assignments should be justified with data. A new supplementary table with mean coverage of the autosomes and the X chromosome together with the ratio between the two would suffice.
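The requested table could be generated along the following lines. This is a sketch under stated assumptions: the per-chromosome mean depths are taken as already computed, and the two thresholds are illustrative placeholders rather than values used by the authors. In a male the X chromosome is expected at roughly half the autosomal depth, and in a female at roughly equal depth.

```python
from statistics import mean

def x_to_autosome_ratio(autosome_depths, x_depth):
    """Mean X-chromosome depth divided by mean autosomal depth."""
    autosome_mean = mean(autosome_depths)
    if autosome_mean <= 0:
        raise ValueError("autosomal coverage must be positive")
    return x_depth / autosome_mean

def assign_sex(ratio, male_max=0.6, female_min=0.8):
    """Assign sex from the X:autosome ratio.

    male_max and female_min are assumed, illustrative cut-offs.
    """
    if ratio <= male_max:
        return "male"
    if ratio >= female_min:
        return "female"
    return "undetermined"

# Toy depths for illustration only; not values from the study.
ratio = x_to_autosome_ratio([10.0, 12.0, 11.0], 5.5)
sex = assign_sex(ratio)
```

Reporting the ratio alongside the call lets readers judge borderline specimens for themselves.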

There are issues with the BEAST-derived Bayesian phylogeny (Figure 2 —figure supplement 6). First, the ages of the tips do not appear to have been constrained with the known ages of the ancient specimens (or they are out by an order of magnitude). For example, JX312734 looks to be ~400,000 years old, when its age (based on Table S4) is 40,000 years old. Note also that ancient E. caballus individuals have been constrained as modern. Second, the constant population and strict clock models used (L538) are not suitable for the interspecies analysis used here. The authors should consider the Birth-death serially-sampled and relaxed clock models. Third, there is no description of how the molecular clock was calibrated (fossil, previous genomic estimate, and/or fixed mutation rate). I refer the authors to two publications for reference (dois: 10.1111/mec.15977, 10.7554/eLife.29944).

The authors present two D-statistics analyses based on alignment to either the horse or donkey reference genomes, which give very different results (Figure 3 —figure supplement 1 and 2). Presumably the larger D-stats when an African ass is included in the donkey reference analysis is an artifact of reference genome bias, and so the horse genome (outgroup) results should be considered more reliable. The authors should add a statement about this disagreement and an explanation in the Results, as this will not be clear to non-specialists, especially given the statement on L579-581 that the donkey reference genome analysis was included to 'eliminate the bias of the reference genome'.

It is stated on L577-579 that Z-scores ≥3 were considered statistically significant. However, the Z-score data are not presented and so it is not possible to determine which of the D-statistics in Figure 3 —figure supplement 1 and 2 are significant or not (L793, L801).
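For readers unfamiliar with how such Z-scores arise, the calculation can be sketched as below. This is a simplified illustration, assuming per-block ABBA and BABA site counts are available as Python lists and using an unweighted delete-one block jackknife; dedicated software typically weights blocks by size, and the function names here are hypothetical.

```python
import math

def d_statistic(abba, baba):
    """D = (sum ABBA - sum BABA) / (sum ABBA + sum BABA)."""
    return (sum(abba) - sum(baba)) / (sum(abba) + sum(baba))

def jackknife_z(abba, baba):
    """Return (D, standard error, Z) from an unweighted delete-one block jackknife."""
    n = len(abba)
    d_full = d_statistic(abba, baba)
    # Recompute D with each genomic block left out in turn.
    leave_one_out = [
        d_statistic(abba[:i] + abba[i + 1:], baba[:i] + baba[i + 1:])
        for i in range(n)
    ]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((d - mean_loo) ** 2 for d in leave_one_out)
    se = math.sqrt(variance)
    if se == 0:
        raise ValueError("zero jackknife variance; Z-score undefined")
    return d_full, se, d_full / se

def is_significant(z, threshold=3.0):
    """Apply the |Z| >= 3 criterion mentioned in the manuscript."""
    return abs(z) >= threshold

# Toy block counts for illustration only; not data from the study.
d, se, z = jackknife_z([30, 28, 35, 31, 29], [10, 12, 9, 11, 10])
```

Listing D, its standard error, and Z for every comparison would let readers apply the threshold directly.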

The G-PhoCS analysis suggests major introgression between the early non-caballine equid lineages. However, none of these events are recovered in the TreeMix analyses even with up to three migration edges considered. The apparently conflicting signal between these two analyses needs to be explained.

It will not be clear to non-specialists how the total migration rate for a single direction can be >100% (Tables S6 and S7). Please add a statement in the Methods as to why this occurs.
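A possible explanation can be sketched numerically. It rests on an assumption, not on anything stated in the manuscript: that the "total migration rate" is the per-generation rate multiplied by the duration of the migration band, as usually described for G-PhoCS. Under that reading the quantity is an expected number of migration events per lineage rather than a proportion, so it can exceed 1, and it maps to a probability through 1 − exp(−total rate).

```python
import math

def migration_probability(total_rate):
    """Probability that a lineage migrates at least once, given a total rate.

    Assumes total_rate is a rate-times-duration quantity (see lead-in).
    """
    if total_rate < 0:
        raise ValueError("total migration rate cannot be negative")
    return 1.0 - math.exp(-total_rate)

# A total rate of 1.5 ("150%") still corresponds to a probability below 1.
p = migration_probability(1.5)
```

For small total rates the probability and the rate are nearly equal, which is why the two are often used interchangeably.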

More details are needed in the G-PhoCS methods to enable reproducibility. Specifically, how were non-coding RNA genes identified and removed (L599-600), and what thresholds and methods were used to detect enriched misaligned bases and high false negative rates (L602-604)? In this regard, it may be helpful if the authors made their code available for these methods.

There is no discussion of the discordance between the mitochondrial and nuclear genome trees, which the G-PhoCS analysis seems to shed some light upon. I invite the authors to comment on this.

I almost missed that the authors have already made their raw sequence data publicly available (included in eLife manuscript information but not in the manuscript). To ensure readers can easily find the raw data, I suggest that the authors give a link to the European Nucleotide Archive BioProject code on L456.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Ancient DNA research redates the extinction of Sussemionus, a subgenus of Equus, to late Holocene" for further consideration by eLife. Your revised article has been evaluated by George Perry (Senior Editor) and the two reviewers of the previous version of your paper.

The manuscript has been improved but there are some remaining issues that need to be addressed. Detailed points of required revision are noted below. In addition, in my view your revisions in response to the excellent and important points made by the reviewers in their original reviews are too superficial in multiple instances. For example, where expanded discussion or the incorporation of particular concepts into your interpretation was requested, the points were addressed by adding a single short sentence rather than by taking the opportunity to maximally improve the manuscript, which is what we expect.

Thus, in your response, please detail the further revisions you made to the previous set of review comments, with the above in mind, in addition to further point-by-point responses to the specific comments below. This will be the final opportunity to revise your manuscript.

1. The issue with the title is not resolved.

2. Please provide expanded information on the Beta Analytic 14C methods and results, if at all possible. (Please contact the company for more specific information on how the samples were processed, for the sake of methodological completeness and data reproducibility.)

3. The level of detail in the new archaeological background paragraph should be further improved. The revisions do accomplish the important goal of pointing the reader to the relevant background literature/citations, but acknowledgment and/or summary of the state of knowledge of the archaeological record of equids in the study region is incomplete. At the least this should include reference to Yuan and Flad's 2006 summary (Research on Early Horse Domestication in China. In Equids in Time and Space, ed. by Marjan Mashkour, pp. 124-131.)

4. The new sentence on 102-103 should be removed, and the sentence ending this paragraph on lines 104-105 needs to explain much more specifically what "no traces of domestication" means (e.g. no paleopathological problems?) and what "indicates they were hunted for food" (e.g. butchery patterns indicating meat removal? arrowheads imbedded in bone?).

5. It is unclear where the information about minimum number of individuals is coming from. The authors state that there were at least "31 individuals in the Honghe samples" (L460), yet there are only 20 samples from this locality in Supplementary File 1a. Further, the authors need to expand on the statement "the same process is repeated for the other two sites to ensure the specimens are unique" by including comparable counts.

6. We thank the authors for applying some of the suggested changes to the BEAST analysis. However, the tip dates for known sample ages have still not been constrained. Contrary to the rebuttal letter, there should not be any "deviations from the known ages" as these parameters should be fixed.

7. The explanation for the discordance between the G-PhoCS and TreeMix analyses needs to be stated in the manuscript.

8. G-PhoCS burn-in: although the MCMC run settings may use a burn-in of 0, the burn-in needs to be removed during post-processing (as is stated in the G-PhoCS user manual). For example, Vershinina et al. 2021 (doi: 10.1111/mec.15977) used a burn-in of 10%. This needs to be applied.

9. Figure 4 figure supplement 2: This is still confusing to the reader. The panel keys should be updated to reflect that all PSMC plots are based on E. a. somalicus.

10. Supplementary File 1c: correct 'gender' to 'sex'

11. Supplementary File 1e: correct 'yBC'. The age given for Haringtonhippus francisci is uncalibrated.

eLife. 2022 May 11;11:e73346. doi: 10.7554/eLife.73346.sa2

Author response

Reviewer #2 (Recommendations for the authors):

I suggest the authors reconsider their manuscript title as it is the radiocarbon dates, rather than the ancient genomes, that re-date the extinction.

Thank you for your suggestion, and we have revised the title accordingly: “Ancient DNA research redates the extinction of Sussemionus, a subgenus of Equus, to late Holocene”.

Given the specimens under study are isolated skeletal elements from a handful of stratigraphic horizons, it is possible that some of the specimens belonged to the same individual. There should be a description in the Methods about the minimum number of individuals at each locality. The mitogenomic data would prove valuable in this regard.

We thank the reviewer for pointing this out. The description about the minimum number of individuals was added in the Methods (lines 457-461): “Considering the preservation status and quantity, the minimum number of individuals was determined by assigning the frequency of hip bone and was calculated from the acetabular bone to avoid double-counting. Based on counts of skeletal elements, there is a minimum of 31 individuals in the Honghe samples. The same process is repeated for the other two sites to ensure the specimens are unique”. Meanwhile, the mitochondrial genome of each individual was visually checked to ensure they are unique according to your suggestion.

The authors use relative X-chromosome and autosomal coverage to distinguish male and female individuals (L104-106). However, these assignments should be justified with data. A new supplementary table with mean coverage of the autosomes and the X chromosome together with the ratio between the two would suffice.

We apologize for the lack of information here. We have now added Supplementary File 1c with mean coverage of the autosomes and the X chromosome together with the ratio between them.
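The logic behind such a coverage table can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names and the two thresholds are hypothetical, chosen only because a male (one X copy) is expected to show an X-to-autosome ratio near 0.5 and a female (two X copies) a ratio near 1.0.

```python
from statistics import mean


def x_to_autosome_ratio(autosome_coverages, x_coverage):
    """Return X coverage divided by the mean coverage across autosomes."""
    return x_coverage / mean(autosome_coverages)


def infer_sex(ratio, male_max=0.6, female_min=0.8):
    """Classify a sample from its X-to-autosome coverage ratio.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    if ratio <= male_max:
        return "male"
    if ratio >= female_min:
        return "female"
    return "undetermined"


# Hypothetical per-chromosome mean depths for one specimen.
ratio = x_to_autosome_ratio([10.0, 12.0], 5.5)
print(ratio, infer_sex(ratio))
```

Reporting the ratio alongside the call, as the reviewer requested, lets readers judge borderline specimens for themselves.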

There are issues with the BEAST-derived Bayesian phylogeny (Figure 2 —figure supplement 6). First, the ages of the tips do not appear to have been constrained with the known ages of the ancient specimens (or they are out by an order of magnitude). For example, JX312734 looks to be ~400,000 years old, when its age (based on Table S4) is 40,000 years old. Note also that ancient E. caballus individuals have been constrained as modern. Second, the constant population and strict clock models used (L538) are not suitable for the interspecies analysis used here. The authors should consider the Birth-death serially-sampled and relaxed clock models. Third, there is no description of how the molecular clock was calibrated (fossil, previous genomic estimate, and/or fixed mutation rate). I refer the authors to two publications for reference (dois: 10.1111/mec.15977, 10.7554/eLife.29944).

Thank you for this suggestion. First, we have set tip dates according to the ages of species in Supplementary File 1e. However, given the enormous range of time scales involved, a few ancient specimens may show some deviations from the known ages in the Bayesian phylogeny.

Second, we have reconstructed Bayesian phylogeny using Birth-death model and relaxed molecular clock in Figure 2—figure supplement 4 according to your suggestion.

Third, we have added the description in lines 602-604: “we calibrated the tree using an age of 4–4.5 Mya for the root of crown group E. caballus (normal prior, mean 4.25 Mya, stdev: 0.15 Mya)” (see L. Orlando et al. (2013), https://www.nature.com/articles/nature12323).
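As a quick check on how the quoted normal prior relates to the 4–4.5 Mya range, the sketch below computes the prior probability mass falling inside that interval. It uses only the mean and standard deviation quoted above; the variable names are illustrative.

```python
from statistics import NormalDist

# Normal calibration prior quoted in the response (units: million years ago).
prior = NormalDist(mu=4.25, sigma=0.15)

# Probability mass the prior places inside the 4-4.5 Mya window.
mass_in_window = prior.cdf(4.5) - prior.cdf(4.0)

# Central 95% interval implied by the same prior.
lower_95 = prior.inv_cdf(0.025)
upper_95 = prior.inv_cdf(0.975)

print(f"mass in 4-4.5 Mya: {mass_in_window:.3f}")
print(f"95% prior interval: {lower_95:.2f}-{upper_95:.2f} Mya")
```

Under these numbers the window spans about ±1.67 standard deviations, so roughly 90% of the prior mass lies within 4–4.5 Mya and the remainder lies slightly outside it.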

The authors present two D-statistics analyses based on alignment to either the horse or donkey reference genomes, which give very different results (Figure 3 —figure supplement 1 and 2). Presumably the larger D-stats when an African ass is included in the donkey reference analysis is an artifact of reference genome bias, and so the horse genome (outgroup) results should be considered more reliable. The authors should add a statement about this disagreement and an explanation in the Results, as this will not be clear to non-specialists, especially given the statement on L579-581 that the donkey reference genome analysis was included to 'eliminate the bias of the reference genome'.

Thanks for spotting this. A similar statement was given on lines 647-649 in the manuscript, and we have added the sentence “Given the larger D-stats when an African ass is included in the donkey reference analysis is an artifact of reference genome bias, so that the horse reference genome results should be considered more reliable” in lines 649-652 according to the suggestion.

It is stated that Z-scores of ≥3 were considered statistically significant on L577-579. However, the Z-score data are not presented and so it is not possible to determine which of the D-statistics in Figure 3 —figure supplement 1 and 2 are significant or not (L793, L801).

We apologize for omitting the Z-score information from the legends. We have added the sentence “The nonsignificant results are shown in gray” in lines 887-888 and 895-896.
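For readers unfamiliar with how a Z-score is attached to a D-statistic, the following sketch computes D from ABBA and BABA site counts and derives a standard error with a simple unweighted delete-one block jackknife. This is a generic illustration under stated assumptions: the block counts are invented, the function names are hypothetical, and the software actually used in the study may weight blocks differently.

```python
from math import sqrt


def d_statistic(abba, baba):
    """D = (ABBA - BABA) / (ABBA + BABA)."""
    return (abba - baba) / (abba + baba)


def jackknife_d(blocks):
    """Return (D, Z) from per-block (ABBA, BABA) counts.

    Uses an unweighted delete-one block jackknife for the standard error.
    """
    n = len(blocks)
    total_abba = sum(a for a, _ in blocks)
    total_baba = sum(b for _, b in blocks)
    d_full = d_statistic(total_abba, total_baba)

    # Recompute D with each genomic block left out in turn.
    leave_one_out = [
        d_statistic(total_abba - a, total_baba - b) for a, b in blocks
    ]
    loo_mean = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((d - loo_mean) ** 2 for d in leave_one_out)
    std_err = sqrt(variance)
    if std_err == 0:
        raise ValueError("zero jackknife variance; need heterogeneous blocks")
    return d_full, d_full / std_err


def is_significant(z, threshold=3.0):
    """Apply the |Z| >= 3 significance rule mentioned by the reviewer."""
    return abs(z) >= threshold


# Invented counts for five genomic blocks (illustration only).
example_blocks = [(60, 40), (55, 45), (62, 38), (58, 42), (61, 39)]
d_value, z_value = jackknife_d(example_blocks)
print(d_value, z_value, is_significant(z_value))
```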

The G-PhoCS analysis suggests major introgression between the early non-caballine equid lineages. However, none of these events are recovered in the TreeMix analyses even with up to three migration edges considered. The apparently conflicting signal between these two analyses needs to be explained.

We thank the reviewer for the suggestion. Previous studies found that TreeMix models work best when gene flow between populations is restricted to a relatively short time period; situations of continuous migration violate this assumption and lead to unclear results (see Pickrell, Joseph K., and Pritchard, Jonathan K. (2012), https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002967). Compared with the G-PhoCS analysis, the TreeMix analysis is therefore limited when applied across an enormous range of time scales.

It will not be clear to non-specialists how the total migration rate for a single direction can be >100% (Tables S6 and S7). Please add a statement in the Methods as to why this occurs.

Thank you for this suggestion. We have now added a statement in the Methods. “The total migration rate gives an accumulated rate over a long period of time so that it can be >100%.” (lines 714-716)

More details are needed in the G-PhoCS methods to enable reproducibility. Specifically, how were non-coding RNA genes identified and removed (L599-600), and what thresholds and methods were used to detect enriched misaligned bases and high false negative rates (L602-604)? In this regard, it may be helpful if the authors made their code available for these methods.

Thanks for the suggestion. The non-coding RNA genes were identified using GFF annotation files.

We apologize for being unclear, and we have changed the sentence in lines 673-677: “Besides the various hard filters described above, regions/sites likely to be enriched for misaligned bases, and to have high false negative rates during read alignment or variant detection were masked as missing data. So in this case, different individuals may be treated differently depending on the result of genotyping in section “Variant calling” depending on the presence of (1) indels, (2) triallelic sites, (3) positions with depth of coverage twice the mean depth recorded for each individual, and; (4) transition sites”. In other words, sites enriched for misaligned bases or prone to high false negative rates were identified as those showing (1) indels, (2) triallelic sites, (3) positions with depth of coverage twice the mean depth recorded for each individual, and (4) transition sites.
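The four masking criteria can be expressed as a small per-site predicate. The sketch below is an assumption-laden illustration rather than the authors' code: the `Site` record and `is_masked` function are hypothetical, alleles are assumed to be uppercase strings, and criterion (3) is interpreted as depth exceeding twice the individual's mean depth.

```python
from dataclasses import dataclass

# Transitions are purine-purine (A<->G) or pyrimidine-pyrimidine (C<->T) changes.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


@dataclass
class Site:
    ref: str      # reference allele
    alts: tuple   # alternative alleles called for this individual
    depth: float  # depth of coverage at this position


def is_masked(site, mean_depth):
    """Return True if the site should be treated as missing data."""
    alleles = (site.ref, *site.alts)
    if any(len(allele) != 1 for allele in alleles):
        return True  # (1) indel: some allele is not a single base
    if len(site.alts) >= 2:
        return True  # (2) triallelic (or higher) site
    if site.depth > 2 * mean_depth:
        return True  # (3) depth above twice the individual's mean (assumed reading)
    if site.alts and frozenset((site.ref, site.alts[0])) in TRANSITIONS:
        return True  # (4) transition
    return False


# Hypothetical sites evaluated against a mean depth of 10x.
print(is_masked(Site("A", ("C",), 10.0), mean_depth=10.0))  # transversion, normal depth
print(is_masked(Site("A", ("G",), 10.0), mean_depth=10.0))  # transition
```

Because the depth criterion depends on each individual's own mean, the same genomic position can be masked in one specimen and retained in another, which matches the statement that individuals may be treated differently.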

There is no discussion of the discordance between the mitochondrial and nuclear genome trees, which the G-PhoCS analysis seems to shed some light upon. I invite the authors to comment on this.

This is certainly an interesting suggestion. The discordance between the mitochondrial and nuclear genome trees may have two causes. First, mitochondrial DNA is maternally inherited, so its variation reflects the dispersal and history of the maternal lineage only. Second, two mitochondrial Maximum Likelihood trees were reconstructed, one based on all 6 partitions and one excluding the control region (Figure 2—figure supplement 5). Only the former is discordant with the nuclear genome trees, which may be caused by significantly increased damage in the mitochondrial control region.

I almost missed that the authors have already made their raw sequence data publicly available (included in eLife manuscript information but not in the manuscript). To ensure readers can easily find the raw data, I suggest that the authors give a link to the European Nucleotide Archive BioProject code on L456.

Thanks for the suggestion, and we have now given a link to the European Nucleotide Archive BioProject code in line 512.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

The manuscript has been improved but there are some remaining issues that need to be addressed. Detailed points of required revision are noted below, but in addition I will note that in my view your manuscript revisions in response to excellent and important points made by the reviewers in their original review are too superficial in multiple instances, e.g. in cases where expanded discussion or incorporation of particular concepts into your interpretation were requested, but the points were addressed with the addition of a short sentence only rather than taking the opportunity to maximally improve the manuscript, which is what we expect.

Thus, in your response, please detail the further revisions you made to the previous set of review comments, with the above in mind, in addition to further point-by-point responses to the specific comments below. This will be the final opportunity to revise your manuscript.

1. The issue with the title is not resolved.

We have rephrased the title to indicate that it is the combination of both radiocarbon dating and phylogenomics that helps reconsider the extinction/survival of Equus Sussemionus to the late Holocene. Our new title reads as follows:

“Radiocarbon and genomic evidence for the survival of Equus Sussemionus until the late Holocene”

2. Please provide expanded information on the Beta Analytic 14C methods and results, if at all possible. (please contact the company for more specific information on how the samples were processed, for the sake of methodological completeness and data reproducibility).

We now provide the requested information as:

– A dedicated paragraph in the Methods section (page 29, lines 555-560: “Radiocarbon dating of the samples was performed at the Beta Analytic Radiocarbon Dating Laboratory, Miami, Florida. Bone or tooth pieces about 2g were sampled in the bone and sent for subsequent dating of collagen (not ultrafiltered). Calibration was carried out using OxCalOnline (https://c14.arch.ox.ac.uk/oxcal.html) and the IntCal20 calibration curve. Calibrated dates are provided in Supplementary File 1b.”);

– A sentence in the main text (lines 142-146): “Combined, these samples were radiocarbon dated to 3,456-4,460 calibrated years before the present (cal BP), including a mid-second millennium BCE date for the most recent sample, HH13H (3270±30 uncal. BP, i.e. 3,456-3,616 cal BP) (Supplementary File 1b).”

– A table referring to laboratory numbers, uncalibrated estimates and confidence range, and calendar years calibrated estimates (IntCal20) (see Supplementary File 1b).
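To make the link between the calibrated range and the "mid-second millennium BCE" wording explicit, the sketch below converts cal BP values to calendar years. It relies only on the convention that "present" in BP dating is 1950 CE and that the BCE/CE calendar has no year zero; the function name is illustrative. It does not perform calibration itself, which requires the IntCal20 curve.

```python
def cal_bp_to_calendar(cal_bp):
    """Convert a calibrated age (cal BP, with 0 BP = 1950 CE) to a calendar label."""
    astronomical_year = 1950 - cal_bp  # astronomical year 0 corresponds to 1 BCE
    if astronomical_year > 0:
        return f"{astronomical_year} CE"
    return f"{1 - astronomical_year} BCE"


# Calibrated bounds quoted for HH13H and the oldest bound of the full series.
for age in (3456, 3616, 4460):
    print(age, "cal BP =", cal_bp_to_calendar(age))
```

For HH13H, the 3,456-3,616 cal BP interval corresponds to 1667-1507 BCE, which is consistent with a mid-second millennium BCE date.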

3. The level of detail in the new archaeological background paragraph should be further improved. The revisions do accomplish the important goal of pointing the reader to the relevant background literature/citations, but acknowledgment and/or summary of the state of knowledge of the archaeological record of equids in the study region is incomplete. At the least this should include reference to Yuan and Flad's 2006 summary (Research on Early Horse Domestication in China. In Equids in Time and Space, ed. by Marjan Mashkour, pp. 124-131.)

We apologize for being unclear. Based on all excavated equine fossil bones found at Honghe, the Minimum Number of Individuals (MNI) was estimated at 31 individuals. Because of the preservation status, ancient DNA sequences were recovered from only 20 of the 31 samples (Supplementary File 1a). This is now fully detailed at page 26 (lines 491-501): “Minimum number of individuals (MNI) was determined by assigning the frequency of hip bone and was calculated from the acetabular bone to avoid double-counting. MNI was estimated to 31 individuals at Honghe, 4 at Muzhuzhuliang and 4 at Shatangbeiyuan. DNA preservation conditions were compatible with the recovery of ancient DNA sequences from only 20 of the 31 Honghe samples, 3 of the 4 Muzhuzhuliang samples, and 3 of the 4 Shatangbeiyuan samples (Supplementary File 1a).”

4. The new sentence on 102-103 should be removed, and the sentence ending this paragraph on lines 104-105 needs to explain much more specifically what "no traces of domestication" means (e.g. no paleopathological problems?) and what "indicates they were hunted for food" (e.g. butchery patterns indicating meat removal? arrowheads imbedded in bone?).

We have now removed the sentence indicated, and have rephrased the following ones, appearing on lines 117-125: “No obvious signs of domestication, including paleopathologies related to horseback riding, bridling or chariotry (Bendrey, 2007; Taylor and Tuvshinjargal 2018), were found amongst the equine specimens investigated at the three sites. In contrast, slash marks could be identified on some of the bones (HH13H, HH26H, and MZ104H), together with indications of bone marrow extraction (Figure 1—figure supplement 2). These findings suggest these specimens were hunted.”

5. It is unclear where the information about minimum number of individuals is coming from. The authors state that there were at least "31 individuals in the Honghe samples" (L460), yet there are only 20 samples from this locality in Supplementary File 1a. Further, the authors need to expand on the statement "the same process is repeated for the other two sites to ensure the specimens are unique" by including comparable counts.

6. We thank the authors for applying some of the suggested changes to the BEAST analysis. However, the tip dates for known sample ages have still not been constrained. Contrary to the rebuttal letter, there should not be any "deviations from the known ages" as these parameters should be fixed.

We have proceeded according to the editor’s suggestion and have now used tip-calibrations (average calibrated radiocarbon dates) in our BEAST analyses (Supplementary File 1j). The full procedure is now described on lines 633-649, with the resulting tree shown on Figure 2—figure supplement 4.

7. The explanation for the discordance between the G-PhoCS and TreeMix analyses needs to be stated in the manuscript.

We have added the requested explanation in lines 787-805 (pages 39-40): “We caution that the analyses carried out using TreeMix and G-PhoCS returned partly discordant results. This may be due to TreeMix modelling pulses of admixture in contrast to G-PhoCS, in which situations of continuous gene flow can be accommodated. Additionally, gene flow affecting the two deepest tree branches can be directly accommodated by reducing their divergence. Therefore, the deep admixture inferred by G-PhoCS from the caballine branch into the ancestral branch of Sussemiones and other non-caballine equids cannot be expected to be identified through an individual migration edge with TreeMix, as this could simply be modelled through a more limited divergence between both underlying lineages. The same holds true for the asymmetric gene flow inferred by G-PhoCS between Sussemiones and the branch ancestral to all extant asses; TreeMix is likely to only identify the resulting unidirectional contribution of these admixtures, which mainly sources to the branch ancestral to extant asses into Sussemiones; since G-PhoCS also infers additional admixture from the branch ancestral to extant zebras into Sussemiones, we can expect TreeMix to accommodate both sources of gene flow through a reduced divergence between Sussemiones and the branch ancestral to stenonines. Finally, it may reflect limitations pertaining to the two underlying data sets utilized, consisting on the one hand to the whole-genome SNP panel for TreeMix, further filtered for 15,324 candidate ‘neutral’ loci in G-PhoCS.”

8. G-PhoCS burn-in: although the MCMC run settings may use a burn-in of 0, the burn-in needs to be removed during post-processing (as is stated in the G-PhoCS user manual). For example, Vershinina et al. 2021 (doi: 10.1111/mec.15977) used a burn-in of 10%. This needs to be applied.

We apologize for the unclear explanation. Although we ran the MCMC with a pre-set burn-in of 0, the first 10% of iterations were removed as burn-in during post-processing with Tracer v1.6 (http://tree.bio.ed.ac.uk/software/tracer/). This is shown in Author response image 1. Accordingly, we have edited the sentence in lines 742-743 (page 37): “The final results in Figure 3 are based on 500,000 MCMC iterations, considering the first 10% as burn-in.”
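Post-hoc burn-in removal amounts to discarding the leading fraction of the sampled trace before summarizing parameters. The sketch below illustrates the idea on a generic list of samples; it is not Tracer's implementation, the function name is hypothetical, and in a real run the trace holds one row per logged sample rather than one per iteration.

```python
def drop_burn_in(samples, burn_in_percent=10):
    """Return the samples remaining after discarding the leading burn-in portion."""
    # Integer arithmetic avoids floating-point rounding in the cut-off index.
    n_burn = len(samples) * burn_in_percent // 100
    return samples[n_burn:]


# Stand-in trace: indices 0..999 in place of real sampled parameter values.
trace = list(range(1000))
kept = drop_burn_in(trace)
print(len(kept), kept[0])
```

With 500,000 iterations and a 10% burn-in, the first 50,000 iterations are excluded and the remaining 450,000 contribute to the reported estimates.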

9. Figure 4 figure supplement 2: This is still confusing to the reader. The panel keys should be updated to reflect that all PSMC plots are based on E. a. somalicus.

We apologize for being unclear. The figure captions have now been rephrased to describe the procedure followed. It reads as follows (pages 60-61, lines 961-969): “Figure 4—figure supplement 2. Determining the uniform false-negative rate (uFNR) that was necessary for scaling PSMC results. (A) HH06D (11.30×), (B) KIA (10.68×) and (C) ONA (18.38×). The most suitable uFNR values for rescaling the PSMC profile are reported between squared brackets, to the right of the species names considered. The PSMC trajectory retrieved when considering all the sequence data available for the SOM individual is shown in blue. The green line provides the PSMC trajectory reconstructed when down-sampling these data to the average genome depth of coverage obtained for the species examined (top: Equus Sussemionus, red; centre: Equus kiang, purple, and; bottom: E. hemionus, purple).”

10. Supplementary File 1c: correct 'gender' to 'sex'

Done.

11. Supplementary File 1e: correct 'yBC'. The age given for Haringtonhippus francisci is uncalibrated.

Done.

Associated Data


Data Citations

Zhu S, Gong M, Zhang N, Wen J, Liang Q, Sun W, Shao X, Guo Y, Cai Y, Zheng Z, Zhang W, Hu S, Wang X, Tian H, Li Y, Liu W, Yang M, Yang J, Wu D, Orlando L, Jiang Y, Cai D. 2021. Our sequence data provided 26 mitochondrial genomes and 3 complete nuclear genomes for Equus (Sussemionus) ovodovi. European Nucleotide Archive. PRJEB44527
Renaud G, Petersen B, Seguin-Orlando A, Bertelsen M F, Waller A, Newton R, Paillot R, Bryant N, Vaudin M, Librado P, Orlando L. 2018. This study aims at improving the genome reference of the domestic donkey using the Chicago/HiRiSe technology. European Nucleotide Archive. PRJEB24845
Jonsson H. 2014. Speciation with gene flow in equids despite extensive chromosomal plasticity. European Nucleotide Archive. PRJEB7446
Ginolhac A. 2013. General Sample for Equus asinus asinus, Willy. NCBI BioSample. SAMN02179859
Dugarjaviin M. 2014. Model organism or animal sample from Equus hemionus. NCBI BioSample. SAMN03010637
Kalbfleisch TS. 2014. Sample from Equus caballus. NCBI BioSample. SAMN02953672
Achilli A. 2012. Mitochondrial genomes from modern horses reveal the major haplogroups that underwent domestication. NCBI GenBank. 347361635
Vilstrup JT. 2013. Mitochondrial phylogenomics of modern and ancient equids. NCBI GenBank. JX312719
Vilstrup JT. 2013. Mitochondrial phylogenomics of modern and ancient equids. NCBI GenBank. JX312721
Vilstrup JT. 2013. Mitochondrial phylogenomics of modern and ancient equids. NCBI GenBank. JX312725
Vilstrup JT. 2013. Mitochondrial phylogenomics of modern and ancient equids. NCBI GenBank. JX312730
Vilstrup JT. 2013. Mitochondrial phylogenomics of modern and ancient equids. NCBI GenBank. JX312732
Vilstrup JT. 2013. Mitochondrial phylogenomics of modern and ancient equids. NCBI GenBank. JX312734
Der Sarkissian C. 2015. Mitochondrial genomes reveal the extinct Hippidion as an outgroup to all living equids. NCBI GenBank. KM881671
Librado P. 2015. Tracking the origins of Yakutian horses and the genetic basis for their fast adaptation to subarctic environments. NCBI GenBank. KT368725
Orlando L. 2016. Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. NCBI GenBank. KT757740
Orlando L. 2016. Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. NCBI GenBank. KT757741
Druzhkova AS, Makunin AI, Vorobieva NV, Vasiliev SK, Ovodov ND, Shunkov MV, Trifonov VA, Graphodatsky AS. 2017. Complete mitochondrial genome of an extinct Equus (Sussemionus) ovodovi specimen from Denisova cave (Altai, Russia). NCBI GenBank. KY114520
Heintzman PD, Zazula GD, MacPhee R, Scott E, Cahill JA, McHorse BK, Kapp JD, Stiller M, Wooller MJ, Orlando L, Southon J, Froese DG, Shapiro B. 2018. A new genus of horse from Pleistocene North America. NCBI GenBank. MF134655
Xu X, Gullberg A, Arnason U. 2016. The complete mitochondrial DNA (mtDNA) of the donkey and mtDNA comparisons among four closely related mammalian species-pairs. NCBI GenBank. X97337

Supplementary Materials

Supplementary file 1. Tables that support the analysis and results above.

elife-73346-supp1.docx (152.7 KB, docx)

Transparent reporting form

elife-73346-transrepform1.docx (117.3 KB, docx)

Data Availability Statement