Summary
Australia was one of the earliest regions outside Africa to be colonized by fully modern humans, with archaeological evidence for human presence by 47,000 years ago (47 kya) widely accepted [1, 2]. However, the extent of subsequent human entry before the European colonial age is less clear. The dingo reached Australia about 4 kya, indirectly implying human contact, which some have linked to changes in language and stone tool technology to suggest substantial cultural changes at the same time [3]. Genetic data of two kinds have been proposed to support gene flow from the Indian subcontinent to Australia at this time, as well: first, signs of South Asian admixture in Aboriginal Australian genomes have been reported on the basis of genome-wide SNP data [4]; and second, a Y chromosome lineage designated haplogroup C∗, present in both India and Australia, was estimated to have a most recent common ancestor around 5 kya and to have entered Australia from India [5]. Here, we sequence 13 Aboriginal Australian Y chromosomes to re-investigate their divergence times from Y chromosomes in other continents, including a comparison of Aboriginal Australian and South Asian haplogroup C chromosomes. We find divergence times dating back to ∼50 kya, thus excluding the Y chromosome as providing evidence for recent gene flow from India into Australia.
Highlights
- We have sequenced 13 Aboriginal Australian Y chromosomes
- These diverged from Y chromosomes in other continents around 50,000 years ago
- They diverged from Papua New Guinean Y chromosomes soon after this
- We find no evidence for Holocene male gene flow to Australia from South Asia
Bergström et al. show that Aboriginal Australian Y chromosomes diverged from Eurasian, including South Asian, Y chromosomes ∼50,000 years ago. This is around the time that Australia was first populated and thus disproves the previous hypothesis of prehistoric Y chromosome gene flow from India ∼5,000 years ago.
Results and Discussion
Genotyping and Sequencing of Aboriginal Australian Y Chromosomes
144 self-identified Aboriginal Australian males who volunteered to participate in the Genographic Project were previously typed with Y SNPs to assign them to major haplogroups [6]. A large fraction (∼70%) of Aboriginal Australian males today carry Y chromosomes of Eurasian origin (∼59% European) due to admixture in the last ∼200 years after the European colonization of Australia [7]. Among the individuals with indigenous Y chromosomes, 44% belong to haplogroup C, with 42% being C-M347 and 2% the basal C-M130∗. Paragroup K∗ constitutes 56% of indigenous Y chromosomes, with 27% being S-P308, 2% being haplogroup M-M186, and 27% being the basal K-M526∗ [6]. Although we note that other nomenclatures with relevance to these haplogroups exist [8] or could be proposed, these labels suffice for the purposes of our present study, and for simplicity we hereafter refer to C-M347 and C-M130∗ as Aboriginal Australian C, to S-P308 and K-M526 as K∗, and to M-M186 as M. For distinguishing subclades of haplogroup C, we also make use of the haplogroup labels C1, C2, C3, C4, and C5 as they are used in [9]. 31 of the 144 typed individuals carried Y chromosomes belonging to one of the indigenous haplogroups. Among these individuals, five from haplogroup C, six from haplogroup K∗, and two from haplogroup M were re-contacted and agreed to further studies, so their genomes were sequenced to high coverage using the Illumina HiSeq platform (Table 1). Consent was provided to study the history of the uniparental chromosomes, and reads mapping to the Y chromosome were identified. These form the basis for the current study. Comparative data on the sequences of Y chromosomes from other continents were obtained from phase 3 of the 1000 Genomes Project [10], comprising 1,244 samples from 26 populations falling into a wide range of haplogroups, as well as from 12 samples from Papua New Guinea [11] which fall into the haplogroups C, M, and K∗ as expected [12, 13].
Table 1.
ID | Y Coverage | Haplogroup | Key Variant | Paternal Origin |
---|---|---|---|---|
A45 | 19.74 | C | M130 | Uncertain, possibly Normanton, Queensland |
A268 | 13.06 | C | M210 | Atherton Tablelands, Far North Queensland |
A305 | 18.03 | C | M347 | The Karryarra group located near Port Hedland, Western Australia |
A342 | 18.06 | C | M347 | The Karryarra group located near Port Hedland, Western Australia |
A343 | 12.90 | C | M347 | Northwest coast, near Broome, Western Australia |
A136 | 12.61 | K∗ | M526 | Kuranda, Far North Queensland |
A179 | 18.85 | K∗ | M526 | Gunganji tribe, Yarrabah, near Cairns, Far North Queensland |
A201 | 12.19 | K∗ | M526 | Uncertain, but states father’s people from South East Queensland |
A266 | 19.07 | K∗ | M526 | Gunganji tribe, Yarrabah, near Cairns, Far North Queensland |
A293 | 12.77 | K∗ | P308 | Pilbara, Western Australia |
A473 | 13.73 | K∗ | P308 | Mount Isa region, Central Queensland |
A238 | 16.42 | M | M186 | Mer (Murray Island), Torres Strait, Far North Queensland |
A440 | 15.29 | M | M186 | Mer (Murray Island), Torres Strait, Far North Queensland |
“Y coverage” refers to the average depth of sequencing coverage on the Y chromosome. We note that the geographic information on the origin of the paternal line is sometimes uncertain and, due to the widespread movement of Aboriginal people after European colonization, might not reflect deeper geographic origins.
Construction of a Y Chromosome Phylogeny
We used the sequence data to infer a maximum-likelihood phylogenetic tree for the 1,269 Y chromosomes (Figure 1A) (see the Experimental Procedures and Supplemental Experimental Procedures). The overall topology of the tree recapitulates the known Y chromosome phylogeny. In agreement with the prior haplogroup assignments, the Aboriginal Australian and Papuan Y chromosomes fall into two distinct monophyletic clades within the C and K∗/M haplogroups. Both of these clades received high bootstrap support (100% for the haplogroup C samples and 97% for the haplogroup K∗/M samples). The shared phylogenetic history of Aboriginal Australian and Papuan Y chromosomes is consistent with the common origin of these populations as previously inferred from genome-wide data [4, 15, 16, 17].
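As an illustration of what a bootstrap support value expresses, the following is a minimal sketch, not the RAxML procedure itself. It assumes each bootstrap replicate tree has already been reduced to the set of clades it contains, and it reports the percentage of replicates in which a clade of interest is recovered. The tip labels and replicate trees are hypothetical toy data.

```python
from typing import FrozenSet, Iterable, Set

Clade = FrozenSet[str]


def bootstrap_support(clade: Clade, replicate_trees: Iterable[Set[Clade]]) -> float:
    """Percentage of replicate trees in which `clade` appears as a monophyletic group."""
    trees = list(replicate_trees)
    if not trees:
        raise ValueError("need at least one replicate tree")
    hits = sum(1 for tree in trees if clade in tree)
    return 100.0 * hits / len(trees)


# Hypothetical tip labels: two Aboriginal Australian and one Papuan C chromosome.
sahul_c = frozenset({"AUS_C1", "AUS_C2", "PNG_C1"})

# Four toy replicates, each given as the set of clades found in that tree.
replicates = [
    {sahul_c, frozenset({"AUS_C1", "AUS_C2"})},
    {sahul_c, frozenset({"AUS_C2", "PNG_C1"})},
    {frozenset({"AUS_C1", "SAS_C5"}), frozenset({"AUS_C2", "PNG_C1"})},
    {sahul_c, frozenset({"AUS_C1", "PNG_C1"})},
]

support = bootstrap_support(sahul_c, replicates)
print(f"Support for the Sahul C clade: {support:.0f}%")
```

In this toy example the clade is recovered in three of four replicates; the values of 100% and 97% reported above correspond to the same kind of proportion computed over the actual bootstrap replicates.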
Divergence Times between Aboriginal Australian and Other Y Chromosomes
The phylogenetic tree reveals deep divergences between Y chromosomes indigenous to Sahul, the ancient continent that included both Australia and New Guinea, and those from all other populations (Figures 1B and 1C). Complete sequence data allow direct and accurate inference of the timing of these divergences. Applying a point mutation rate of 0.76 × 10⁻⁹ per site per year inferred from the number of missing mutations on the Y chromosome of a ∼45-ky-old radiocarbon-dated Eurasian sample [18], we infer a divergence time of 54.3 ky (95% confidence interval [CI]: 48.0–61.6 ky) between K∗/M chromosomes in Sahul and their closest relatives in the R and Q haplogroups (Figure 1B), and a divergence time of 54.1 ky (95% CI: 47.8–61.4 ky) between Sahul C chromosomes and their closest relatives in the C5 haplogroup (Figure 1C), a distinction noted previously on the basis of a single SNP, M347 [9]. These dates are consistent with the archaeological record documenting human occupation in Australia by ∼47 kya [2] and with genome-wide analyses that have found an early divergence between the ancestors of Eurasian populations and the ancestors of Aboriginal Australians and Papuans [15]. They thus provide no evidence for any later Y chromosome gene flow into Australia between the early separation and the beginning of recent European colonization. Specifically, these results refute earlier findings based on short tandem repeat (STR) variation that Aboriginal Australian Y chromosomes in the C haplogroup descend from populations in southern India and Sri Lanka 1.3–13.3 kya [5]. Although the closest chromosomes to the Aboriginal Australian Cs in our phylogeny are found in South Asian populations, the deep divergence time and the fact that the Aboriginal Australian Cs share a more recent common ancestor with Papuan Cs show that this is not the result of recent genetic contact.
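The relationship between the point estimates and their intervals can be illustrated with simple arithmetic. The sketch below assumes that each 95% CI arises solely from rescaling the point estimate by the bounds of the mutation rate. The rate bounds used (0.67–0.86 × 10⁻⁹ per site per year) are an assumption that is not given in this article and should be checked against [18]; the function name is hypothetical.

```python
MU = 0.76e-9        # point mutation rate per site per year (from the text)
MU_LOW = 0.67e-9    # ASSUMPTION: lower 95% bound on the rate, not stated in this article
MU_HIGH = 0.86e-9   # ASSUMPTION: upper 95% bound on the rate, not stated in this article


def rescale_interval(point_ky, mu=MU, mu_low=MU_LOW, mu_high=MU_HIGH):
    """Rescale a divergence time (ky) by the rate bounds.

    A faster rate implies a younger date, so the upper rate bound
    gives the lower time bound and vice versa.
    """
    lower = point_ky * mu / mu_high
    upper = point_ky * mu / mu_low
    return lower, upper


for label, point in [("K*/M vs. R and Q", 54.3), ("Sahul C vs. C5", 54.1)]:
    lo, hi = rescale_interval(point)
    print(f"{label}: {point} ky (rescaled interval {lo:.1f}-{hi:.1f} ky)")
```

Under these assumptions the rescaled intervals agree with the reported 48.0–61.6 ky and 47.8–61.4 ky to one decimal place, which is consistent with the statement that the CIs reflect the uncertainty of the mutation rate.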
The CIs reported above take into account the uncertainty of the Y chromosome point mutation rate, but not necessarily other possible sources of technical uncertainty (such as read alignment and genotype calling). We tested whether accounting for such additional uncertainty could affect the conclusion of a deep divergence between Aboriginal Australian and South Asian C chromosomes by re-estimating this divergence time from 100 bootstrap samples of sites from the full ∼10 million analyzed Y chromosome sites. The 95% CI for these estimates was 50.9–58.1 kya, and very conservative application of the mutation rate uncertainty multiplicatively to the bootstrap estimates gives a combined CI of 44.9–65.9 kya. Technical uncertainty is thus not large enough to affect our overall conclusion. The disparity between our findings and the earlier report can be attributed to improvements in technology, as none of the methods previously used to study the history of the paternal lineage offered the level of phylogenetic or dating precision afforded by complete Y chromosome sequencing. Redd et al. employed ten widely used Y STRs (three simple trinucleotides, three simple tetranucleotides, one of which was bilocal, and four complex tetranucleotides), applying the same fast genealogical mutation rate of 2.08 × 10⁻³ per STR per 25 years to all of them [5]. It has been shown that Y STRs tend to massively underestimate ancient divergence times [19], perhaps because of a combination of the fast mutation rate assumed, saturation of STR distances, and in this case the short generation time used.
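The multiplicative combination of the two sources of uncertainty can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact procedure used: it takes the bootstrap interval from the text and widens each bound by the corresponding ratio of mutation rates. The rate bounds (0.67–0.86 × 10⁻⁹ per site per year) are an assumption not given in this article, and the function name is hypothetical.

```python
MU = 0.76e-9        # point mutation rate per site per year (from the text)
MU_LOW = 0.67e-9    # ASSUMPTION: lower 95% bound on the rate, not stated in this article
MU_HIGH = 0.86e-9   # ASSUMPTION: upper 95% bound on the rate, not stated in this article


def combine_with_rate_uncertainty(boot_low_ky, boot_high_ky,
                                  mu=MU, mu_low=MU_LOW, mu_high=MU_HIGH):
    """Widen a bootstrap interval (ky) by the mutation-rate bounds.

    The lower bootstrap bound is paired with the fastest rate and the
    upper bound with the slowest rate, which is the conservative choice.
    """
    combined_low = boot_low_ky * mu / mu_high
    combined_high = boot_high_ky * mu / mu_low
    return combined_low, combined_high


# Bootstrap 95% interval for the Aboriginal Australian vs. South Asian C split (from the text).
low, high = combine_with_rate_uncertainty(50.9, 58.1)
print(f"Combined interval: {low:.1f}-{high:.1f} ky")
```

With these inputs the sketch gives an interval within about 0.1 ky of the reported 44.9–65.9 kya; the small difference at the lower bound presumably reflects rounding of the published inputs.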
Although the shared origin of Aboriginal Australians and Papuans is clearly established, and now also supported by the Y chromosome phylogeny presented here, little is known about the history of population separation and gene flow between these groups within Sahul. We observe deep divergences between Aboriginal Australian and Papuan Y chromosomes within the C (50.1 ky; 95% CI: 44.3–56.9 ky) (Figure 1C) and the K∗ (48.4 ky; 95% CI: 42.8–54.9 ky) (Figure 1B) haplogroups. Although this would be consistent with an early split between the populations, we note that our limited sample size makes it very unlikely that we have observed the most recent divergences, and we therefore cannot rule out more recent split times. Within the M haplogroup, which is found at high frequencies in Papua New Guinea and Melanesia [20] but in less than 1% of Aboriginal Australian males [6], we find a divergence time of 10.4 ky (95% CI: 9.2–11.9 ky) (Figure 1B). Although this coincides approximately with the post-glacial geographical separation of Australia and New Guinea after the rise of the sea level ∼6–8 kya [21], the fact that the two Aboriginal Australian males who carry the haplogroup M chromosomes trace their paternal ancestry to the Torres Strait Islands (Table 1) makes it more likely that these are very recent introductions into the mainland Australian gene pool. A larger number of geographically diverse Y chromosomes from the different haplogroups indigenous to Sahul would be needed in order to learn more about population relationships within the continent.
Implications for the Peopling of Australia
Y haplogroups from Australia and Papua New Guinea were estimated to diverge from the nearest non-Sahul lineages ∼54 kya, and divergences within Sahul-specific lineages date to ∼48–53 kya. We note that these times post-date the Mount Toba eruption ∼74 kya [22], supporting a model of the initial peopling of this region by modern humans long after this event. The divergence times are close to, but earlier than, the current conservative archaeological date for entry into Sahul, 47 kya [2]. However, the uncertainty in the lineage divergence estimates and the possibility that earlier archaeological sites may be detected make it impossible to determine whether the initial divergence within the Sahul-specific lineages occurred before or after entry into Sahul. The current evidence is consistent with a simple model of a single entry and subsequent rapid lineage divergence.
Around the mid-Holocene (∼4–6 kya), small stone tools began to be used extensively in Australia [3], the Pama-Nyungan language family spread over most of the mainland [23], and the first archaeological evidence for the dingo appeared [3]. Genetic patterns proposed as indications of gene flow into Australia from South Asia were dated to approximately the same period. One parsimonious interpretation of these diverse findings could be that they were all linked, and thus that there was a substantial and influential population influx at this time.
We have taken advantage of improvements of sequencing technology [10] and calibration of the molecular clock [18] to re-examine the claim for male gene flow revealed by Y chromosome relationships [5]. Our sample of 13 Aboriginal Australian Y chromosomes is small, but it includes the relevant haplogroups and conclusively refutes the original basis for this claim. Although this does not demonstrate the absence of any Holocene gene flow or non-genetic influences from South Asia at this time, and the appearance of the dingo remains as strong evidence for external contacts, the evidence overall is consistent with a complete lack of gene flow and indigenous origins for the technological and linguistic changes.
Australia and Papua New Guinea are currently separated only by the 150-km-wide Torres Strait, in which lie many islands. Gene flow across this Strait is both geographically plausible and demonstrated by our data, although we cannot determine when within the last 10 ky it occurred. The analytical techniques now available, applied to larger genetic datasets, including ancient DNA, have the potential to address such questions and provide more detailed insights into the human history of Sahul.
Experimental Procedures
This study received ethical approval from the La Trobe University Human Ethics Committee, Melbourne, Australia (HEC 05/94, April 11, 2006; amended April 18, 2012, June 26, 2012) and The Wellcome Trust Sanger Institute Human Materials and Data Management Committee, Hinxton, UK (12/055). Conclusions from the study have been returned to the participants. We sequenced the whole genomes of 13 Aboriginal Australian males to high coverage on the Illumina HiSeq platform and then analyzed only the reads mapping to the Y chromosome. We used FreeBayes to determine the genotypes of these individuals, along with those of 1,244 males sequenced to low coverage in the 1000 Genomes Project [10] and 12 males from Papua New Guinea sequenced to high coverage [11], at ∼10 million Y chromosome sites accessible by short read sequencing. We then used RAxML [14] to infer a maximum-likelihood phylogeny of all the 1,269 Y chromosomes. We estimated the divergence times between clades in the tree by applying the ρ statistic [24], aggregating data across low-coverage samples where relevant, and converted divergence times to units of years by applying a mutation rate of 0.76 × 10⁻⁹ per site per year [18]. For more detailed descriptions of the sequence data processing, genotyping and filtering, phylogenetic inference, and dating, see the Supplemental Experimental Procedures. Table S1 provides information on the SNPs called that are phylogenetically informative for the branches of the Y chromosome phylogeny specific to Aboriginal Australians and Papuans (see the Supplemental Experimental Procedures for a description of this table).
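The dating step can be sketched as follows, taking ρ to be the average number of mutations separating the ancestral node of a clade from each of its sampled descendants. This is a simplified illustration that omits the aggregation across low-coverage samples; the per-lineage mutation counts are hypothetical, the site count is the approximate figure quoted above, and the function names are not from any library.

```python
from statistics import mean

MU = 0.76e-9           # mutations per site per year (from the text)
N_SITES = 10_000_000   # approximate number of analyzed Y chromosome sites (from the text)


def rho(mutations_to_tips):
    """Average number of mutations from the ancestral node to each sampled tip."""
    if not mutations_to_tips:
        raise ValueError("need at least one descendant lineage")
    return mean(mutations_to_tips)


def rho_to_years(rho_value, n_sites=N_SITES, mu=MU):
    """Convert rho to years: expected mutations per lineage per year is n_sites * mu."""
    return rho_value / (n_sites * mu)


# HYPOTHETICAL mutation counts on four lineages descending from one node.
counts = [410, 415, 409, 418]
r = rho(counts)
age_years = rho_to_years(r)
print(f"rho = {r:.1f} mutations, age = {age_years / 1000:.1f} ky")
```

With roughly 10 million sites and this rate, one mutation is expected on a lineage about every 130 years, so a node of ∼54 ky corresponds to roughly 400 mutations per lineage.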
Author Contributions
Project design was carried out by Y.X., R.J.M., and C.T.-S.; community engagement, ethics, and sampling by S.W., L. Wilcox, R.A.H.v.O., P.M., L. Williams, and R.J.M.; data generation, processing, and analysis by A.B., N.N., Y.C., S.M., M.O.P., Q.A., Y.X., and C.T.-S.; data interpretation by A.B., N.N., S.W., R.A.H.v.O., P.M., L. Williams, Y.X., R.J.M., and C.T.-S.; and manuscript writing by A.B., Y.X., R.J.M., and C.T.-S.
Acknowledgments
We thank the Aboriginal Australian men and their communities for their interest and participation in this study. We thank the Wellcome Trust Sanger Institute Core Pipelines and NPG groups for their special efforts in arranging access to the Y chromosome data for this project. We also thank the 1000 Genomes Project consortium for sharing data and analysis strategies. The GATK3 program was made available through the generosity of the Medical and Population Genetics program at the Broad Institute. Our work was supported by Wellcome Trust grant 098051.
Published: February 25, 2016
Footnotes
This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Supplemental Information includes Supplemental Experimental Procedures and one table and can be found with this article online at http://dx.doi.org/10.1016/j.cub.2016.01.028.
Contributor Information
R. John Mitchell, Email: john.mitchell@latrobe.edu.au.
Chris Tyler-Smith, Email: cts@sanger.ac.uk.
Accession Numbers
Y chromosome sequence data from the 13 Aboriginal Australians are available for studies of population history under managed access through two separate study accession numbers at the European Genome-phenome Archive (EGA): EGAS00001000315 and EGAS00001000718, both with EGA DAC accession number EGAC00001000205 and EGA policy accession number EGAP00001000210. The correspondence between the sample IDs used in this manuscript and the sample accession numbers is as follows: A45, EGAN00001072788; A136, EGAN00001196788; A179, EGAN00001072787; A201, EGAN00001072789; A238, EGAN00001089015; A266, EGAN00001089016; A268, EGAN00001192719; A293, EGAN00001192720; A305, EGAN00001089017; A342, EGAN00001088616; A343, EGAN00001192721; A440, EGAN00001196789; and A473, EGAN00001196790. Details of SNPs called within the Sahul-specific branches are provided in Table S1. We request that L. Williams (geraniumgroup@gmail.com), R.J.M., and C.T.-S. be consulted before any commercial use is made of novel SNPs within this table.
References
- 1. Roberts R.G., Jones R., Smith M.A. Thermoluminescence dating of a 50,000-year-old human occupation site in northern Australia. Nature. 1990;345:153–156.
- 2. O’Connell J.F., Allen J. The process, biotic impact, and global implications of the human colonization of Sahul about 47,000 years ago. J. Arch. Sci. 2015;56:73–84.
- 3. Brown P. Palaeoanthropology: of humans, dogs and tiny tools. Nature. 2013;494:316–317. doi: 10.1038/494316a.
- 4. Pugach I., Delfin F., Gunnarsdóttir E., Kayser M., Stoneking M. Genome-wide data substantiate Holocene gene flow from India to Australia. Proc. Natl. Acad. Sci. USA. 2013;110:1803–1808. doi: 10.1073/pnas.1211927110.
- 5. Redd A.J., Roberts-Thomson J., Karafet T., Bamshad M., Jorde L.B., Naidu J.M., Walsh B., Hammer M.F. Gene flow from the Indian subcontinent to Australia: evidence from the Y chromosome. Curr. Biol. 2002;12:673–677. doi: 10.1016/s0960-9822(02)00789-3.
- 6. Nagle N., Ballantyne K.N., van Oven M., Tyler-Smith C., Xue Y., Taylor D., Wilcox S., Wilcox L., Turkalov R., van Oorschot R.A., Genographic Consortium. Antiquity and diversity of aboriginal Australian Y-chromosomes. Am. J. Phys. Anthropol. 2015. doi: 10.1002/ajpa.22886. Published online October 30, 2015.
- 7. Taylor D., Nagle N., Ballantyne K.N., van Oorschot R.A., Wilcox S., Henry J., Turakulov R., Mitchell R.J. An investigation of admixture in an Australian Aboriginal Y-chromosome STR database. Forensic Sci. Int. Genet. 2012;6:532–538. doi: 10.1016/j.fsigen.2012.01.001.
- 8. Karafet T.M., Mendez F.L., Sudoyo H., Lansing J.S., Hammer M.F. Improved phylogenetic resolution and rapid diversification of Y-chromosome haplogroup K-M526 in Southeast Asia. Eur. J. Hum. Genet. 2015;23:369–373. doi: 10.1038/ejhg.2014.106.
- 9. Hudjashov G., Kivisild T., Underhill P.A., Endicott P., Sanchez J.J., Lin A.A., Shen P., Oefner P., Renfrew C., Villems R., Forster P. Revealing the prehistoric settlement of Australia by Y chromosome and mtDNA analysis. Proc. Natl. Acad. Sci. USA. 2007;104:8726–8730. doi: 10.1073/pnas.0702928104.
- 10. 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393.
- 11. Raghavan M., Steinrücken M., Harris K., Schiffels S., Rasmussen S., DeGiorgio M., Albrechtsen A., Valdiosera C., Ávila-Arcos M.C., Malaspinas A.S. Genomic evidence for the Pleistocene and recent population history of Native Americans. Science. 2015;349:aab3884. doi: 10.1126/science.aab3884.
- 12. Shi W., Ayub Q., Vermeulen M., Shao R.G., Zuniga S., van der Gaag K., de Knijff P., Kayser M., Xue Y., Tyler-Smith C. A worldwide survey of human male demographic history based on Y-SNP and Y-STR data from the HGDP-CEPH populations. Mol. Biol. Evol. 2010;27:385–393. doi: 10.1093/molbev/msp243.
- 13. Lippold S., Xu H., Ko A., Li M., Renaud G., Butthof A., Schröder R., Stoneking M. Human paternal and maternal demographic histories: insights from high-resolution Y chromosome and mtDNA sequences. Investig. Genet. 2014;5:13. doi: 10.1186/2041-2223-5-13.
- 14. Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30:1312–1313. doi: 10.1093/bioinformatics/btu033.
- 15. Rasmussen M., Guo X., Wang Y., Lohmueller K.E., Rasmussen S., Albrechtsen A., Skotte L., Lindgreen S., Metspalu M., Jombart T. An Aboriginal Australian genome reveals separate human dispersals into Asia. Science. 2011;334:94–98. doi: 10.1126/science.1211177.
- 16. Reich D., Patterson N., Kircher M., Delfin F., Nandineni M.R., Pugach I., Ko A.M., Ko Y.C., Jinam T.A., Phipps M.E. Denisova admixture and the first modern human dispersals into Southeast Asia and Oceania. Am. J. Hum. Genet. 2011;89:516–528. doi: 10.1016/j.ajhg.2011.09.005.
- 17. McEvoy B.P., Lind J.M., Wang E.T., Moyzis R.K., Visscher P.M., van Holst Pellekaan S.M., Wilton A.N. Whole-genome genetic diversity in a sample of Australians with deep Aboriginal ancestry. Am. J. Hum. Genet. 2010;87:297–305. doi: 10.1016/j.ajhg.2010.07.008.
- 18. Fu Q., Li H., Moorjani P., Jay F., Slepchenko S.M., Bondarev A.A., Johnson P.L., Aximu-Petri A., Prüfer K., de Filippo C. Genome sequence of a 45,000-year-old modern human from western Siberia. Nature. 2014;514:445–449. doi: 10.1038/nature13810.
- 19. Wei W., Ayub Q., Xue Y., Tyler-Smith C. A comparison of Y-chromosomal lineage dating using either resequencing or Y-SNP plus Y-STR genotyping. Forensic Sci. Int. Genet. 2013;7:568–572. doi: 10.1016/j.fsigen.2013.03.014.
- 20. Kayser M., Brauer S., Weiss G., Schiefenhövel W., Underhill P., Shen P., Oefner P., Tommaseo-Ponzetta M., Stoneking M. Reduced Y-chromosome, but not mitochondrial DNA, diversity in human populations from West New Guinea. Am. J. Hum. Genet. 2003;72:281–302. doi: 10.1086/346065.
- 21. Woodroffe C.D., Kennedy D.M., Hopley D., Rasmussen C.E., Smithers S.G. Holocene reef growth in Torres Strait. Mar. Geol. 2000;170:331–346.
- 22. Petraglia M., Korisettar R., Boivin N., Clarkson C., Ditchfield P., Jones S., Koshy J., Lahr M.M., Oppenheimer C., Pyle D. Middle Paleolithic assemblages from the Indian subcontinent before and after the Toba super-eruption. Science. 2007;317:114–116. doi: 10.1126/science.1141564.
- 23. Evans N., McConvell P. The enigma of Pama-Nyungan expansion in Australia. In: Blench R., Spriggs M., editors. Archaeology and Language II: Archaeological Data and Linguistic Hypotheses. Routledge - Taylor & Francis; 1997. pp. 174–192.
- 24. Forster P., Harding R., Torroni A., Bandelt H.J. Origin and evolution of Native American mtDNA variation: a reappraisal. Am. J. Hum. Genet. 1996;59:935–945.