Ancient mitogenomes show plateau populations from last 5200 years partially contributed to present-day Tibetans

Manyu Ding; Tianyi Wang; Albert Min-Shan Ko; Honghai Chen; Hui Wang; Guanghui Dong; Hongliang Lu; Wei He; Shargan Wangdue; Haibing Yuan; Yuanhong He; Linhai Cai; Zujun Chen; Guangliang Hou; Dongju Zhang; Zhaoxia Zhang; Peng Cao; Qingyan Dai; Xiaotian Feng; Ming Zhang; Hongru Wang; Melinda A Yang; Qiaomei Fu

doi:10.1098/rspb.2019.2968

2020 Mar 18;287(1923):20192968. doi: 10.1098/rspb.2019.2968

PMCID: PMC7126037 PMID: 32183622

Abstract

The clarification of the genetic origins of present-day Tibetans requires an understanding of their past relationships with the ancient populations of the Tibetan Plateau. Here we successfully sequenced 67 complete mitochondrial DNA genomes of 5200 to 300-year-old humans from the plateau. Apart from identifying two ancient plateau lineages (haplogroups D4j1b and M9a1a1c1b1a) that suggest some ancestors of Tibetans came from low-altitude areas 4750 to 2775 years ago and that some were involved in an expansion of people moving between high-altitude areas 2125 to 1100 years ago, we found limited evidence of recent matrilineal continuity on the plateau. Furthermore, deep learning of the ancient data incorporated into simulation models with an accuracy of 97% supports that present-day Tibetan matrilineal ancestry received partial contribution rather than complete continuity from the plateau populations of the last 5200 years.

Keywords: ancient DNA, population genetics of humans, Tibetan prehistory

1. Introduction

The Tibetan Plateau has an average elevation of over 4000 metres above sea level (m.a.s.l.). It is the highest and largest plateau in the world and covers the Tibet Autonomous Region and Qinghai province of China. A high-altitude East Asian origin has been proposed for the 3150-year-old Himalayan populations, suggesting population continuity in the high-altitude region for at least 3000 years [1]. However, no ancient DNA of approximately that age was available from Tibet for comparison, so it is unclear what population lived in Tibet around that time and might have been replaced. This was an important period in the prehistory of Tibet, because the introduction of cold-tolerant barley from the lower northeastern border of the plateau some 3600 years ago (3.6 ka) enabled a sustained human presence above 2500 m.a.s.l. [2]. It remains unresolved whether this was a transfer of agricultural innovation alone or whether it involved the migration of farmers who experimented with crops at lower altitudes, brought their cultivars to higher altitudes and then, presumably, left descendants among present-day Tibetans. So far, investigating the details of that genetic exchange has been difficult, because no ancient DNA has been recovered from the populations that lived in the core areas of the plateau or near the northeastern edge of the plateau. To address this issue, we collected bone and tooth samples from 5200 to 300-year-old ancient humans exhumed from 27 archaeological sites across the Tibetan Plateau (electronic supplementary material, figure S1).

2. Material and Methods

(a). Ancient sample collection

We collected samples from a total of 73 ancient human individuals. Their archaeological details are provided in the electronic supplementary material. Approval for their use was granted by the respective provincial archaeology institutes or universities that managed the samples. We also gained permission and oversight from the institutional review board (201809050002) at the Institute of Vertebrate Paleontology and Paleoanthropology of the Chinese Academy of Sciences to study their ancient genomes.

(b). Ancient DNA extraction and library preparation

Seventy-three ancient humans underwent DNA extraction using a method described elsewhere [3]. Their complete mitochondrial DNA (mtDNA) genomes were captured by preparing extracts that hybridized with present-day human mtDNA probes [4]. Eighty libraries were generated using a single-stranded protocol [3,5] or a double-stranded (DS) protocol [6,7]. The DS library retained the substitutions in the first three positions at the 5′-end and the last three positions at the 3′-end. Eight of the 80 libraries were partially treated with uracil-DNA-glycosylase [8] so that the characteristic damage to the ancient DNA in the first position at the 5′-end and the last position at the 3′-end would be retained. We denoted these eight libraries as ‘DS.half’.

(c). In solution capture of mitochondrial DNA

Libraries were hybridized to the oligonucleotide probes that overlapped the mitochondrial genome as described elsewhere [4]. The DNA baits were made in-house [4]. A targeted DNA sequence retrieval protocol was used to isolate the mtDNA fragments. It yielded an average mtDNA coverage of over 400-fold. To analyse the mtDNA capture data, we merged the forward and reverse sequence reads that overlapped by at least 11 bp (with one allowed mismatch) into a single sequence to reconstruct the full-length sequence [9]. We then mapped this full-length sequence (at least 30 bp) to the mtDNA revised Cambridge Reference Sequence using BWA [10] (v. 0.5.10-evan.9-1-g44db244, https://bitbucket.org/ustenzel/network-aware-bwa) and removed the duplicates. We also removed those fragments with mapping quality of less than 30. The ends of the DNA fragments were confirmed to show patterns of DNA degradation characteristic for ancient DNA [8,11]. The contamination rates were estimated by calculating the fraction of ancient mtDNA fragments that matched mtDNA sequences of present-day individuals [12]. Six individuals with unacceptable contamination rates were excluded. Further descriptions of the criteria used to select libraries for analysis are provided in the electronic supplementary material.
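The read-merging and filtering thresholds described above can be illustrated with a minimal Python sketch. It is not the software used in this study (merging followed reference [9] and mapping used BWA [10]); the function names `merge_pair` and `keep_fragment` are hypothetical, and the sketch assumes that the reverse read is supplied as sequenced, so that it must be reverse-complemented before comparison.

```python
# Illustrative sketch only; not the pipeline used in the study.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(forward, reverse, min_overlap=11, max_mismatch=1):
    """Merge a forward read with its mate into one full-length sequence.

    The longest candidate overlap is tried first. An overlap qualifies if it
    spans at least `min_overlap` bases with at most `max_mismatch` mismatches.
    Returns None when no overlap qualifies.
    """
    rev = reverse_complement(reverse)
    longest = min(len(forward), len(rev))
    for overlap in range(longest, min_overlap - 1, -1):
        tail, head = forward[-overlap:], rev[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch:
            # Keep the forward read and append the non-overlapping remainder.
            return forward + rev[overlap:]
    return None


def keep_fragment(sequence, mapq, min_length=30, min_mapq=30):
    """Apply the length (>= 30 bp) and mapping-quality (>= 30) filters."""
    return len(sequence) >= min_length and mapq >= min_mapq
```

In this sketch, a mismatch inside the overlap is resolved in favour of the forward read; a production merger would normally use base qualities to choose between the two reads.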

(d). Population analysis

Genetic distance ΦST was calculated based on the complete mtDNA sequence with 10 000 permutations in Arlequin v. 3.5 [13]. Haplogroups were called by Haplogrep 2 [14] and Phylotree 17 [15]. Haplogroup and haplotype sharing was calculated as a pairwise matrix, where sharing was the lowest denomination of counts between two groups and not-sharing was the private counts not found in any other group. The network was constructed based on the complete mtDNA sequence using median joining [16] in the Popart software [17]. Tip-date calibration was performed in BEAST v. 1.10.4 [18], where the complete mtDNA sequence was partitioned into the control region (1122 bp) and coding region (11 395 bp) with respective mutation rates of 9.883 × 10⁻⁸ and 1.708 × 10⁻⁸ substitutions per site per year [19]. The Bayesian information criterion in jModeltest2 [20,21] determined the best substitution models. The Markov chain Monte Carlo analysis was run for one hundred million steps under the Skyline linear model, sampling every 1 × 10⁴ steps and discarding the first 10% of the run as burn-in. All effective sample size values exceeded 1 × 10³ and were verified in Tracer v. 1.7.1. Repeated haplotypes from the same archaeological site were removed from the calculation of genetic distance ΦST, haplogroup and haplotype sharing, network analysis and BEAST, as they probably represented maternally related individuals.
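One possible reading of the sharing calculation can be sketched in Python. The sketch assumes that 'sharing' between two groups is the sum, over haplogroups (or haplotypes), of the smaller of the two group counts, and that 'not-sharing' is the count of lineages seen in no other group. This interpretation, the function names and the toy labels below are assumptions for illustration and are not taken from the study data.

```python
# Illustrative sketch only; the sharing definition is an assumed interpretation.
from collections import Counter
from itertools import combinations


def sharing_matrix(groups):
    """Pairwise sharing: sum over lineages of the smaller of the two counts."""
    counts = {name: Counter(labels) for name, labels in groups.items()}
    shared = {}
    for a, b in combinations(counts, 2):
        # Counter & Counter keeps, for each key, the minimum of the two counts.
        shared[(a, b)] = sum((counts[a] & counts[b]).values())
    return shared


def private_counts(groups):
    """Count, per group, the individuals whose lineage occurs in no other group."""
    counts = {name: Counter(labels) for name, labels in groups.items()}
    private = {}
    for name, own in counts.items():
        elsewhere = set()
        for other, other_counts in counts.items():
            if other != name:
                elsewhere |= other_counts.keys()
        private[name] = sum(n for label, n in own.items() if label not in elsewhere)
    return private


# Toy labels, invented for illustration (not data from this study).
toy = {
    "HTP": ["M9a", "M9a", "D4", "A11"],
    "LTP": ["M9a", "D4", "D4", "G2"],
    "Tibetan": ["M9a", "M9a", "M9a", "F1"],
}
toy_shared = sharing_matrix(toy)
toy_private = private_counts(toy)
```

Dividing each shared count by the relevant group size would give a sharing percentage; which denominator was used in the study is not stated here, so the sketch returns raw counts.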

(e). Population modelling and deep learning

Three models were proposed to approximate the maternal genetic history of Tibetans. For each model, one million simulations were generated by fastsimcoal2 [22]. The observed data for Tibetans and ancient populations were summarized into 19 statistics under the categories of haplotype, polymorphism, diversity, pairwise ΦST and pairwise differences. To determine the goodness-of-fit of simulations, the distance between observed and simulated datasets was compared using principal component analysis on the summary statistics. The superior model was determined by whether the projected observed data fell within the 95% simulation envelope of any of the models. For deep learning, a resilient backpropagation algorithm from the R package neuralnet [23] was used. Each model was trained equally. Performance metrics (accuracy, Cohen's kappa, recall, precision and F1-score) were used to evaluate the quality of each neural network. A systematic search for the best number of hidden layers and nodes was made to maximize the metrics using 1000 cross-validations. After the search was narrowed down to 1–2 layers, every combination of total nodes was tested using 10 nodes at a time. For example, for two hidden layers, testing 40 total nodes would have tested 10, 20 and 30 nodes in layer 1 and 30, 20 and 10 nodes in layer 2, respectively. The relative variable importance was computed using the olden function in the R package NeuralNetTools [24].
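The node-allocation scheme in the architecture search can be made concrete with a short Python sketch. It only enumerates candidate layer sizes and does not train any network; the function name `layer_splits` is hypothetical, and the sketch assumes every hidden layer receives at least one step of 10 nodes.

```python
# Illustrative sketch only; enumerates architectures, does not train them.
def layer_splits(total_nodes, n_layers, step=10):
    """Yield every split of `total_nodes` across `n_layers` in multiples of `step`.

    Each layer receives at least `step` nodes. Splits are yielded as tuples
    ordered from the first hidden layer to the last.
    """
    if n_layers == 1:
        if total_nodes >= step and total_nodes % step == 0:
            yield (total_nodes,)
        return
    # Leave at least `step` nodes for each of the remaining layers.
    upper = total_nodes - step * (n_layers - 1)
    for first in range(step, upper + 1, step):
        for rest in layer_splits(total_nodes - first, n_layers - 1, step):
            yield (first,) + rest


two_layer_40 = list(layer_splits(40, 2))
# -> [(10, 30), (20, 20), (30, 10)], matching the example in the text
```

Under the same assumptions, `layer_splits(90, 2)` yields eight candidate two-layer architectures, each of which would then be scored by cross-validation.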

3. Results

Of the 73 ancient humans that were collected from the Tibetan Plateau, we obtained reproducible results for 67 individuals. The average contamination rate of the 67 mitogenomes was 1.36% (range 0–8.7%) and average coverage was 400-fold (range 14–1921; electronic supplementary material, table S1). Their polymorphic positions are shown in the electronic supplementary material, table S2. Among these 67 individuals, there were potentially maternally related individuals (electronic supplementary material, table S3). There were two individuals sharing one haplotype at the Hejiatai site and six individuals sharing two haplotypes at the Zongri site. Two individuals from two different plateau edge sites (Hualongqunke and Guidehexi) shared one haplotype. Two individuals from the core area (Xiaoenda) and plateau edge (Hualongqunke) shared one haplotype. Proportionally, there appeared to be more maternally related individuals recovered from sites near the edge of the plateau than from the core area of the plateau.

To investigate the genetic exchanges between the populations of the core areas of the plateau and those that lived near the edge of the plateau, we arranged the 67 successfully sequenced individuals into higher altitude (HTP) or lower altitude (LTP) groups (figure 1a) for subsequent population genetic analysis. The HTP group comprised 12 individuals who were recovered from seven archaeological sites at an average elevation of 4000 m.a.s.l. (range 3249–4629) in Western Qinghai and Tibet. Nine of these individuals had an average calibrated radiocarbon date of 2322 years (range 3061–511). The LTP group comprised 55 individuals who were recovered from 20 archaeological sites at an average elevation of 2000 m.a.s.l. (range 1566–2953) in Eastern Qinghai, Gansu and Sichuan. Thirty-three of these individuals had an average calibrated radiocarbon date of 3729 years (range 5213–300).

Furthermore, we added to the HTP group eight individuals from the study of Jeong et al. [1], who date to between 3150 and 1250 years ago and were found at 2800–4000 m.a.s.l. in the Annapurna Himalayan range of Nepal. Altogether, we analysed 75 ancient humans (67 new and eight from reference [1]) from the Tibetan Plateau, of whom 50 individuals have dating information (42 new and eight from reference [1]). We compared these 75 ancient humans with 4656 present-day individuals belonging to 137 populations from North Asia (Altai, Russia), Central Asia (Tajikistan), East Asia (China), South Asia (Nepal, India, Pakistan) and mainland Southeast Asia (MSEA: Myanmar, and populations that connect to the plateau via the Lancang-Mekong River, such as Laos, Thailand, Vietnam and Cambodia; electronic supplementary material, table S4). We emphasized, and so separately grouped, the populations that resided on (Tibetans) and next to (Xinjiang province of China, Nepal, northeast India, Myanmar) the plateau.

In general, the populations of the ancient plateau grouped most closely with the Tibetans (figure 1b–f). First, HTP and LTP were closely related to each other on the multidimensional scaling plot (ΦST values in the electronic supplementary material, figure S2). They also shared haplogroups (4–8%) and haplotypes (2–5%) with each other (electronic supplementary material, tables S5–S7). Second, HTP and LTP shared the most haplogroups (7%) and haplotypes (2%) with the Tibetans. Third, grouping HTP and LTP with the Tibetans explains one of the highest percentages of variance among groups (AMOVA Fct = 3.93%, p-value = 0.08; electronic supplementary material, table S8). Other populations that potentially grouped with the populations of the ancient plateau were the Xinjiang people and Chinese populations; however, the HTP did not share haplotypes with them (electronic supplementary material, table S6).

We hypothesized that some haplogroups of the ancient population persisted in present-day Tibetans. To demonstrate this, we constructed haplogroup networks. The rationale was that ancient humans (either LTP or HTP) with radiocarbon dates under the same network could be informative about the direction of spread. After filtering the 50 ancient humans with date information for those who also have present-day relatives in our dataset of 4656 individuals, there were 21 unique haplogroups for study. To identify a possible source of population expansion, we constructed their median-joining haplogroup networks and checked whether ancient humans occupied the centre of a star-shaped network (electronic supplementary material, figure S3). Having radiocarbon dates additionally allows their haplogroup coalescence times to be estimated more confidently using the BEAST software (electronic supplementary material, table S9). Out of 21 haplogroups, we found only two expanding haplogroup networks with ancient humans implicated as sources.

The first finding was of a spread from LTP to HTP that was revealed by the D4j1b network. In this network, a low-altitude 4750-year-old Qinghai Zongri haplotype was the centre of a star-like expansion. It radiated to the high-altitude 2775-year-old Nepal Chokhopani and a present-day Lhoba Tibetan. Supporting this expansion was that populations at high altitudes (total 36%: Tibetans 27% and Nepal Sherpa 9%) have a lower frequency of D4j1b compared to populations outside the plateau (total 64%: Xinjiang Kyrgyz 18%, northeast Indian 18% and Thai 28%). The coalescence time of D4j1b was 10 508 years ago (95% highest posterior density (HPD), 16 034–5657; electronic supplementary material, table S9; figure 1g), suggesting D4j1b formed about 10 000 years ago, and between 4750 and 2775 years ago, there was evidence of it spreading from the edge of the plateau to higher altitudes. The second finding was an expansion within the HTP. The M9a1a1c1b1a network showed that a 2125-year-old Nepal Mebrak haplotype radiated, leading to the 1500-year-old Nepal Samdzong as well as the 1100-year-old Tibetan Chaxiutang and present-day Tibetans. M9a1a1c1b1a occurred almost exclusively among high-altitude populations (total 94%: Tibetans 91% and Nepal Sherpa 4%) and is found rarely elsewhere (total 6%). Its coalescence time was 6048 years ago (95% HPD, 7890–4353; electronic supplementary material, table S9; figure 1h), suggesting it formed around 6000 years ago, most likely at high altitudes, and there was evidence of it occurring in the ancient people of Nepal. It then spread from Nepal to Tibet between 2125 and 1100 years ago.

To understand the broader implications of these new ancient data, we additionally tested population models (figure 2a). The purpose was to summarize whether present-day Tibetans derived from other maternal sources or mainly from the plateau populations of the last 5200 years. One simulated scenario was that Tibetans descended from a diverse third population and received a partial maternal ancestry from HTP- and LTP-related populations (model 1). A counter scenario was that Tibetans descended from a recent mixture of HTP- and LTP-related populations (models 2 and 3). For each model, we performed one million simulations. Based on the distribution of simulated values, model 1 contained the observed data (figure 2b). However, because the distributions of models overlapped, we used deep learning to assign an overall probability. Machine learning is an emerging powerful tool in population genetics [25]. The optimal structure of the neural network to address this problem was found to require two hidden layers and 90 total nodes (figure 2c). This classifier could achieve an accuracy of 97.2%, a kappa value of 0.96, recall of 97.2%, precision of 97.4% and an F1-score (harmonic mean of precision and recall) of 97.2% in discriminating the three models. The high accuracy informed us that the classifier was efficient at capturing true positives, and a high F1-score indicated that there was unlikely to be bias derived from an imbalance of simulations used. It classified the observed data as likely to be model 1, which indicated that present-day Tibetans were likely to harbour other maternal ancestries not entirely explained by their continuity from the ancient plateau populations from the last 5200 years.
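The reported performance metrics can all be derived from a confusion matrix, as the following Python sketch shows. It is not the R code used in the study; the function name `classification_metrics` is hypothetical, and macro-averaging of precision, recall and F1 across the three models is an assumption, because the averaging scheme is not stated in the text.

```python
# Illustrative sketch only; macro-averaging across classes is an assumption.
def classification_metrics(confusion):
    """Compute accuracy, Cohen's kappa and macro precision/recall/F1.

    `confusion[i][j]` is the number of class-i cases predicted as class j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    diagonal = [confusion[i][i] for i in range(k)]

    accuracy = sum(diagonal) / total
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)

    recall = [d / r if r else 0.0 for d, r in zip(diagonal, row_totals)]
    precision = [d / c if c else 0.0 for d, c in zip(diagonal, col_totals)]
    # F1 is the harmonic mean of precision and recall, computed per class.
    f1 = [2 * p * r / (p + r) if p + r else 0.0 for p, r in zip(precision, recall)]

    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "recall": sum(recall) / k,
        "precision": sum(precision) / k,
        "f1": sum(f1) / k,
    }


# Toy confusion matrix, invented for illustration (not results from this study).
toy_metrics = classification_metrics([[9, 1, 0], [0, 10, 0], [1, 0, 9]])
```

If the three models are equally represented in the evaluation set, chance agreement is 1/3, so an accuracy of 0.972 corresponds to a kappa of (0.972 − 1/3)/(1 − 1/3) ≈ 0.96, which is consistent with the value reported above.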

4. Discussion

The haplogroup networks and haplotype–haplogroup sharing demonstrated that there was partial matrilineal continuity in Tibetans from 5200 years ago. Under this continuity, some people spread from low to high altitudes 4750 to 2775 years ago and some expanded within high-altitude areas from 2125 years ago. The timing coincided with the transformation of high-altitude agriculture by barley, which appeared 5.2 ka near the northeastern edge of the plateau and moved into high altitudes by 3.6 ka [2]. Nonetheless, if agricultural innovation had prompted a rapid permanent human occupation, then any random sampling of ancient individuals from that period should have a very high probability of matching with a present-day Tibetan. We would also expect to find many expanding haplogroup networks as more people moved into unoccupied areas. However, of the 16 haplogroups that have a frequency in Tibetans (a subset of the 21 unique haplogroups), D4j1b and M9a1a1c1b1a would represent only about 13% (2 out of 16) as the footprint of that event. Thus, our findings did not favour a substantial migration of lowland farmers to the high-altitude areas.

An explanation for the surplus of unaccounted maternal lineages could be that there were earlier waves of populations who settled into higher altitudes and underwent isolation by distance [26]. The earlier settlers were potentially hunters and gatherers who left behind no human fossils, perhaps connected to the blade tool assemblages or fossilized handprints and footprints dating to as far back as 40–30 ka [27] or 13–7 ka [28]. Our results could support a recent diffusion of plateau populations into an otherwise stable population continuous with previous high-altitude populations. A similar point of view has been made from analysing the whole genomes of present-day Tibetans [29]. Finally, as our findings only apply to maternal inheritance, studies of the ancient nuclear genomes and Y-chromosomes are required to produce a comprehensive picture of the peopling of Tibet.

Supplementary Material

Supplementary Information, Figures S1 - S3, Tables S1 - S9

rspb20192968supp1.pdf (4.4 MB, pdf)

Reviewer comments

rspb20192968_review_history.pdf (690 KB, pdf)

Acknowledgements

We gratefully acknowledge archaeological teams from Qinghai, Gansu, Sichuan and Tibet, as well as Svante Pääbo and Matthias Meyer for valuable support, and Birgit Nickel for technical assistance.

Data accessibility

The 67 complete ancient mitogenomes reported in this paper have been deposited in the Genome Sequence Archive [30] in BIG Data Center [31], Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, under accession no. HRA000117 that can be accessed at https://bigd.big.ac.cn/gsa-human.

Authors' contributions

Q.F. conceived the study. Q.F., H.C., H.W., G.D., H.L., W.H., S.W., H.Y., Y.H., L.C., Z.C., G.H. and M.Z. did the sample collection. Q.F., Z.Z., P.C., Z.D. and Q.D. performed the experiments. Q.F. and X.F. did the data processing. T.W. and M.D. performed the population analysis. A.M.S.K. performed the population modelling and deep learning analysis. T.W., M.D., A.M.S.K., H.W., M.A.Y. and Q.F. wrote the manuscript. All authors read and approved the final manuscript.

Competing interests

We declare we have no competing interests.

Funding

This work was supported by National Key R&D Program of China (grant no. 2016YFE0203700), the Strategic Priority Research Program (B) (XDB26000000) of CAS, NSFC (grant nos. 41925009, 91731303, 41672021, 41630102), Chinese Academy of Sciences (QYZDBSSW-DQC003, XDA19050102), Tencent Foundation through the XPLORER PRIZE and the Howard Hughes Medical Institute (grant no. 55008731). G.D. is supported by the Second Tibetan Plateau Scientific Expedition and Research Program (STEP), grant no. 2019QZKK0601. H.Y. is supported by China Postdoctoral Science Foundation, grant/award no. 2017M623018; Sichuan University Research Cluster for Regional History and Frontier Studies.

References

1. Jeong C, et al. 2016. Long-term genetic stability and a high-altitude East Asian origin for the peoples of the high valleys of the Himalayan arc. Proc. Natl Acad. Sci. USA 113, 7485–7490. (doi:10.1073/pnas.1520844113)
2. Chen FH, et al. 2015. Agriculture facilitated permanent human occupation of the Tibetan Plateau after 3600 B.P. Science 347, 248–250. (doi:10.1126/science.1259172)
3. Dabney J, et al. 2013. Complete mitochondrial genome sequence of a middle Pleistocene cave bear reconstructed from ultrashort DNA fragments. Proc. Natl Acad. Sci. USA 110, 15 758–15 763. (doi:10.1073/pnas.1314445110)
4. Fu Q, Meyer M, Gao X, Stenzel U, Burbano HA, Kelso J, Paabo S. 2013. DNA analysis of an early modern human from Tianyuan Cave, China. Proc. Natl Acad. Sci. USA 110, 2223–2227. (doi:10.1073/pnas.1221359110)
5. Gansauge MT, Meyer M. 2013. Single-stranded DNA library preparation for the sequencing of ancient or damaged DNA. Nat. Protoc. 8, 737–748. (doi:10.1038/nprot.2013.038)
6. Meyer M, et al. 2012. A high-coverage genome sequence from an archaic Denisovan individual. Science 338, 222–226. (doi:10.1126/science.1224344)
7. Kircher M, Sawyer S, Meyer M. 2012. Double indexing overcomes inaccuracies in multiplex sequencing on the Illumina platform. Nucleic Acids Res. 40, e3. (doi:10.1093/nar/gkr771)
8. Rohland N, Harney E, Mallick S, Nordenfelt S, Reich D. 2015. Partial uracil-DNA-glycosylase treatment for screening of ancient DNA. Phil. Trans. R. Soc. Lond. B 370, 20130624. (doi:10.1098/rstb.2013.0624)
9. Kircher M, Heyn P, Kelso J. 2011. Addressing challenges in the production and analysis of illumina sequencing data. BMC Genomics 12, 382. (doi:10.1186/1471-2164-12-382)
10. Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760. (doi:10.1093/bioinformatics/btp324)
11. Sawyer S, Krause J, Guschanski K, Savolainen V, Paabo S. 2012. Temporal patterns of nucleotide misincorporations and DNA fragmentation in ancient DNA. PLoS ONE 7, e34131. (doi:10.1371/journal.pone.0034131)
12. Reich D, et al. 2012. Reconstructing native American population history. Nature 488, 370–374. (doi:10.1038/nature11258)
13. Excoffier L, Lischer HE. 2010. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol. Ecol. Resour. 10, 564–567. (doi:10.1111/j.1755-0998.2010.02847.x)
14. Weissensteiner H, Pacher D, Kloss-Brandstatter A, Forer L, Specht G, Bandelt HJ, Kronenberg F, Salas A, Schonherr S. 2016. HaploGrep 2: mitochondrial haplogroup classification in the era of high-throughput sequencing. Nucleic Acids Res. 44(W1), W58–W63. (doi:10.1093/nar/gkw233)
15. van Oven M, Kayser M. 2009. Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum. Mutat. 30, E386–E394. (doi:10.1002/humu.20921)
16. Bandelt HJ, Forster P, Röhl A. 1999. Median-joining networks for inferring intraspecific phylogenies. Mol. Biol. Evol. 16, 37–48. (doi:10.1093/oxfordjournals.molbev.a026036)
17. Leigh JW, Bryant D. 2015. popart: full-feature software for haplotype network construction. Methods Ecol. Evol. 6, 1110–1116. (doi:10.1111/2041-210X.12410)
18. Suchard MA, Lemey P, Baele G, Ayres DL, Drummond AJ, Rambaut A. 2018. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 4, vey016. (doi:10.1093/ve/vey016)
19. Soares P, et al. 2009. Correcting for purifying selection: an improved human mitochondrial molecular clock. Am. J. Hum. Genet. 84, 740–759. (doi:10.1016/j.ajhg.2009.05.001)
20. Darriba D, Taboada GL, Doallo R, Posada D. 2012. jModelTest 2: more models, new heuristics and parallel computing. Nat. Methods 9, 772. (doi:10.1038/nmeth.2109)
21. Guindon S, Gascuel O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52, 696–704.
22. Excoffier L, Dupanloup I, Huerta-Sanchez E, Sousa VC, Foll M. 2013. Robust demographic inference from genomic and SNP data. PLoS Genet. 9, e1003905. (doi:10.1371/journal.pgen.1003905)
23. Günther F, Fritsch S. 2010. neuralnet: training of neural networks. R J. 2, 30–38. (doi:10.32614/RJ-2010-006)
24. Beck MW. 2018. NeuralNetTools: visualization and analysis tools for neural networks. J. Stat. Softw. 85, 1–20. (doi:10.18637/jss.v085.i11)
25. Schrider DR, Kern AD. 2018. Supervised machine learning for population genetics: a new paradigm. Trends Genet. 34, 301–312. (doi:10.1016/j.tig.2017.12.005)
26. Qi X, et al. 2013. Genetic evidence of Paleolithic colonization and Neolithic expansion of modern humans on the Tibetan Plateau. Mol. Biol. Evol. 30, 1761–1778. (doi:10.1093/molbev/mst093)
27. Zhang XL, et al. 2018. The earliest human occupation of the high-altitude Tibetan Plateau 40 thousand to 30 thousand years ago. Science 362, 1049–1051. (doi:10.1126/science.aat8824)
28. Meyer MC, Aldenderfer MS, Wang Z, Hoffmann DL, Dahl JA, Degering D, Haas WR, Schlütz F. 2017. Permanent human occupation of the central Tibetan Plateau in the early Holocene. Science 355, 64. (doi:10.1126/science.aag0357)
29. Lu D, et al. 2016. Ancestral origins and genetic history of Tibetan highlanders. Am. J. Hum. Genet. 99, 580–594. (doi:10.1016/j.ajhg.2016.07.002)
30. Wang Y, et al. 2017. GSA: genome sequence archive. Genomics Proteomics Bioinformatics 15, 14–18. (doi:10.1016/j.gpb.2017.01.001)
31. Members BIGDC. 2018. Database Resources of the BIG Data Center in 2018. Nucleic Acids Res. 46, D14–D20. (doi:10.1093/nar/gkx897)

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Information, Figures S1 - S3, Tables S1 - S9

rspb20192968supp1.pdf^{(4.4MB, pdf)}

Reviewer comments

rspb20192968_review_history.pdf^{(690KB, pdf)}

Data Availability Statement

[RSPB20192968C1] 1.Jeong C, et al. 2016. Long-term genetic stability and a high-altitude East Asian origin for the peoples of the high valleys of the Himalayan arc. Proc Natl Acad. Sci. USA 113, 7485–7490. ( 10.1073/pnas.1520844113) [DOI] [PMC free article] [PubMed] [Google Scholar]

[RSPB20192968C2] 2.Chen FH, et al. 2015. Agriculture facilitated permanent human occupation of the Tibetan Plateau after 3600 B.P. Science 347, 248–250. ( 10.1126/science.1259172) [DOI] [PubMed] [Google Scholar]

[RSPB20192968C3] 3.Dabney J, et al. 2013. Complete mitochondrial genome sequence of a middle Pleistocene cave bear reconstructed from ultrashort DNA fragments. Proc. Natl Acad. Sci. USA 110, 15 758–15 763. ( 10.1073/pnas.1314445110) [DOI] [PMC free article] [PubMed] [Google Scholar]

[RSPB20192968C4] 4.Fu Q, Meyer M, Gao X, Stenzel U, Burbano HA, Kelso J, Paabo S. 2013. DNA analysis of an early modern human from Tianyuan Cave, China. Proc. Natl Acad. Sci. USA 110, 2223–2227. ( 10.1073/pnas.1221359110) [DOI] [PMC free article] [PubMed] [Google Scholar]

[RSPB20192968C5] 5.Gansauge MT, Meyer M. 2013. Single-stranded DNA library preparation for the sequencing of ancient or damaged DNA. Nat. Protoc. 8, 737–748. ( 10.1038/nprot.2013.038) [DOI] [PubMed] [Google Scholar]

[RSPB20192968C6] 6.Meyer M, et al. 2012. A high-coverage genome sequence from an archaic Denisovan individual. Science 338, 222–226. ( 10.1126/science.1224344) [DOI] [PMC free article] [PubMed] [Google Scholar]

[RSPB20192968C7] 7.Kircher M, Sawyer S, Meyer M. 2012. Double indexing overcomes inaccuracies in multiplex sequencing on the Illumina platform. Nucleic Acids Res. 40, e3 ( 10.1093/nar/gkr771) [DOI] [PMC free article] [PubMed] [Google Scholar]

[RSPB20192968C8] 8.Rohland N, Harney E, Mallick S, Nordenfelt S, Reich D. 2015. Partial uracil-DNA-glycosylase treatment for screening of ancient DNA. Phil. Trans. R Soc. Lond. B 370, 20130624 ( 10.1098/rstb.2013.0624) [DOI] [PMC free article] [PubMed] [Google Scholar]

[RSPB20192968C9] 9.Kircher M, Heyn P, Kelso J. 2011. Addressing challenges in the production and analysis of illumina sequencing data. BMC Genomics 12, 382 ( 10.1186/1471-2164-12-382) [DOI] [PMC free article] [PubMed] [Google Scholar]

[RSPB20192968C10] 10.Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760. ( 10.1093/bioinformatics/btp324) [DOI] [PMC free article] [PubMed] [Google Scholar]

[RSPB20192968C11] 11.Sawyer S, Krause J, Guschanski K, Savolainen V, Paabo S. 2012. Temporal patterns of nucleotide misincorporations and DNA fragmentation in ancient DNA. PLoS ONE 7, e34131 ( 10.1371/journal.pone.0034131) [DOI] [PMC free article] [PubMed] [Google Scholar]

[RSPB20192968C12] 12.Reich D, et al. 2012. Reconstructing native American population history. Nature 488, 370–374. ( 10.1038/nature11258) [DOI] [PMC free article] [PubMed] [Google Scholar]

[RSPB20192968C13] 13.Excoffier L, Lischer HE. 2010. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol. Ecol. Resour. 10, 564–567. ( 10.1111/j.1755-0998.2010.02847.x) [DOI] [PubMed] [Google Scholar]

[RSPB20192968C14] 14.Weissensteiner H, Pacher D, Kloss-Brandstatter A, Forer L, Specht G, Bandelt HJ, Kronenberg F, Salas A, Schonherr S. 2016. HaploGrep 2: mitochondrial haplogroup classification in the era of high-throughput sequencing. Nucleic Acids Res. 44(W1), W58–W63. ( 10.1093/nar/gkw233) [DOI] [PMC free article] [PubMed] [Google Scholar]

15. van Oven M, Kayser M. 2009. Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum. Mutat. 30, E386–E394. (doi:10.1002/humu.20921)

16. Bandelt HJ, Forster P, Röhl A. 1999. Median-joining networks for inferring intraspecific phylogenies. Mol. Biol. Evol. 16, 37–48. (doi:10.1093/oxfordjournals.molbev.a026036)

17. Leigh JW, Bryant D. 2015. popart: full-feature software for haplotype network construction. Methods Ecol. Evol. 6, 1110–1116. (doi:10.1111/2041-210X.12410)

18. Suchard MA, Lemey P, Baele G, Ayres DL, Drummond AJ, Rambaut A. 2018. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 4, vey016. (doi:10.1093/ve/vey016)

19. Soares P, et al. 2009. Correcting for purifying selection: an improved human mitochondrial molecular clock. Am. J. Hum. Genet. 84, 740–759. (doi:10.1016/j.ajhg.2009.05.001)

20. Darriba D, Taboada GL, Doallo R, Posada D. 2012. jModelTest 2: more models, new heuristics and parallel computing. Nat. Methods 9, 772. (doi:10.1038/nmeth.2109)

21. Guindon S, Gascuel O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52, 696–704.

22. Excoffier L, Dupanloup I, Huerta-Sanchez E, Sousa VC, Foll M. 2013. Robust demographic inference from genomic and SNP data. PLoS Genet. 9, e1003905. (doi:10.1371/journal.pgen.1003905)

23. Günther F, Fritsch S. 2010. neuralnet: training of neural networks. R J. 2, 30–38. (doi:10.32614/RJ-2010-006)

24. Beck MW. 2018. NeuralNetTools: visualization and analysis tools for neural networks. J. Stat. Softw. 85, 1–20. (doi:10.18637/jss.v085.i11)

25. Schrider DR, Kern AD. 2018. Supervised machine learning for population genetics: a new paradigm. Trends Genet. 34, 301–312. (doi:10.1016/j.tig.2017.12.005)

26. Qi X, et al. 2013. Genetic evidence of Paleolithic colonization and Neolithic expansion of modern humans on the Tibetan Plateau. Mol. Biol. Evol. 30, 1761–1778. (doi:10.1093/molbev/mst093)

27. Zhang XL, et al. 2018. The earliest human occupation of the high-altitude Tibetan Plateau 40 thousand to 30 thousand years ago. Science 362, 1049–1051. (doi:10.1126/science.aat8824)

28. Meyer MC, Aldenderfer MS, Wang Z, Hoffmann DL, Dahl JA, Degering D, Haas WR, Schlütz F. 2017. Permanent human occupation of the central Tibetan Plateau in the early Holocene. Science 355, 64. (doi:10.1126/science.aag0357)

29. Lu D, et al. 2016. Ancestral origins and genetic history of Tibetan highlanders. Am. J. Hum. Genet. 99, 580–594. (doi:10.1016/j.ajhg.2016.07.002)

30. Wang Y, et al. 2017. GSA: genome sequence archive. Genomics Proteomics Bioinformatics 15, 14–18. (doi:10.1016/j.gpb.2017.01.001)

31. Members BIGDC. 2018. Database Resources of the BIG Data Center in 2018. Nucleic Acids Res. 46, D14–D20. (doi:10.1093/nar/gkx897)
