Summary
In the June 5th 2012 issue of Current Biology, Agoni et al. [1] reported finding 14 endogenous retrovirus (ERV) loci in the genome sequences of Neanderthal and/or Denisovan fossils (both ∼40,000 years old) that are not found in the human reference genome sequence. They concluded that these retroviruses were infecting the germline of these archaic hominins at or after their divergence from modern humans (∼400,000 years ago). However, in our search for unfixed (polymorphic) ERVs in the modern human population, we have found most of these loci. We explain this apparent contradiction using population genetic theory and suggest that it illustrates an important phenomenon for the study of transposable elements such as ERVs.
Main Text
The genomes of extinct human groups (archaic hominins), such as Neanderthals, are now available thanks to high-throughput sequencing technology, which can produce millions of short sequences (∼100 bases each), called reads, from fossil bones or teeth. An analysis of a Neanderthal and a Denisovan genome identified many reads containing sequences of viral origin, similar to known integrations of retroviruses into the germline of modern humans [1]. These endogenous retroviruses (ERVs) are common, making up ∼5% of our genome. Some of the reads spanned the integration site of an ERV (called here a locus) and thus were part viral DNA and part archaic hominin DNA (Figure 1). For some of these loci, the authors [1] did not find an ERV at the corresponding coordinates in the human reference genome sequence. Instead they found the pre-integration site, i.e. the sequence that existed before the virus inserted a copy of itself into the chromosome. All of these loci belonged to one ERV lineage (family), called HERVK(HML2) or HERVK, which is the only lineage that has continued to replicate in humans during the last few million years [2]. Agoni et al. [1] concluded that these retroviruses had infected the germline of the archaic hominins either after their divergence from modern humans (∼400,000 years ago) or immediately before it (with the integration and pre-integration sites then segregating differently in the separate lineages). However, while searching many new genome sequences of modern humans for ERVs, we have found most of these loci. For example, of the eight Denisovan loci for which Agoni et al. [1] were able to give precise genome coordinates, at least seven exist in modern humans. We have found six in an analysis of 67 cancer patient genomes (Figure 1), and examination of another study of 43 such genomes [3] shows all seven to be present (Supplemental Information). One of them is K113 (19p12b), which is well described and has a frequency of 16% in modern humans [2].
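The idea of a read that spans an integration site can be made concrete with a toy sketch. It assumes a made-up LTR terminal sequence (`LTR_END`) and a made-up reference string (`TOY_REFERENCE`); neither is real HERVK or human sequence, and the helper names `host_flank` and `locate_flank` are hypothetical. A read that runs from the end of the viral LTR into host DNA is split at the LTR end, and the host part is then looked up in the reference. Finding that host sequence in the reference with no ERV at that position corresponds to finding the pre-integration site.

```python
"""Toy sketch: find the host-derived part of a read spanning an ERV integration site.

Assumptions (hypothetical, for illustration only): LTR_END is a made-up
stand-in for the last bases of an ERV long terminal repeat (LTR), and
TOY_REFERENCE is a made-up string, not real human sequence.
"""
from typing import Optional

LTR_END = "GGTTCCTACA"  # hypothetical LTR end; real HERVK LTR sequence differs
TOY_REFERENCE = "ACGTTAGCCGATCGATTGACCATGCAAGT"  # toy 'reference genome'


def host_flank(read: str, ltr_end: str = LTR_END, min_flank: int = 8) -> Optional[str]:
    """Return the part of `read` that follows the LTR end, if long enough.

    Only one orientation (viral sequence first, then host sequence) is
    handled; a real pipeline would also check the reverse complement,
    the other LTR, and allow for sequencing errors.
    """
    idx = read.find(ltr_end)
    if idx == -1:
        return None  # read does not contain the LTR end: not a junction read
    flank = read[idx + len(ltr_end):]
    return flank if len(flank) >= min_flank else None


def locate_flank(flank: str, reference: str = TOY_REFERENCE) -> Optional[int]:
    """Return the 0-based position of an exact match of `flank`, or None."""
    pos = reference.find(flank)
    return None if pos == -1 else pos


if __name__ == "__main__":
    # Toy junction read: a little viral sequence, the LTR end, then host DNA.
    read = "TTAC" + LTR_END + "GATCGATTGACC"
    flank = host_flank(read)
    print(flank, locate_flank(flank))  # prints: GATCGATTGACC 9
```

Exact string matching keeps the sketch short; real ancient-DNA reads are damaged and error-prone, so actual analyses rely on alignment that tolerates mismatches.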
The four reported Denisovan loci lacking coordinates are within repetitive or unassembled regions of the genome, and we can neither confirm nor refute their presence in the modern human population. For example, two loci are in transposable elements called Alus, of which there are ∼1,000,000 copies in the human genome (making up ∼10% of its sequence). When an ERV integrates into another transposable element, finding this ERV locus can be a formidable computational challenge because there are many paralogous copies of the integration site, so reads spanning it cannot be placed uniquely. Two additional loci were reported from the Neanderthal fossil, and we have found one of these in modern humans.
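Why an integration inside a high-copy repeat is hard to place can be illustrated by counting where a host flank matches. In this sketch `REPEAT` is a made-up stand-in for a high-copy element such as an Alu (it is not the Alu sequence), `TOY_GENOME` is invented, and `count_occurrences` and `place_flank` are hypothetical helpers. A flank from unique sequence matches once; a flank lying inside the repeat matches every copy equally well.

```python
"""Toy sketch: why a host flank inside a high-copy repeat cannot be placed.

Assumption (hypothetical): REPEAT is a made-up stand-in for a high-copy
element such as an Alu, and TOY_GENOME is a made-up string containing three
copies of it separated by unique spacer sequence.
"""

REPEAT = "GGCGCGGTGGCTCACG"  # toy repeat unit, not the real Alu sequence
TOY_GENOME = "AATTCC" + REPEAT + "TTGGAA" + REPEAT + "CCAAGG" + REPEAT + "ATATAT"


def count_occurrences(flank: str, genome: str) -> int:
    """Count exact (possibly overlapping) matches of `flank` in `genome`."""
    count, start = 0, 0
    while True:
        pos = genome.find(flank, start)
        if pos == -1:
            return count
        count += 1
        start = pos + 1  # step one base so overlapping matches are counted


def place_flank(flank: str, genome: str = TOY_GENOME) -> str:
    """Classify a flank as 'absent', 'unique' or 'ambiguous' in the genome."""
    n = count_occurrences(flank, genome)
    if n == 0:
        return "absent"
    return "unique" if n == 1 else "ambiguous"


if __name__ == "__main__":
    print(place_flank("TTGGAA"))    # flank in unique spacer sequence
    print(place_flank("GTGGCTCA"))  # flank inside the repeat: one hit per copy
```

With ∼1,000,000 Alu copies instead of three, an ambiguous flank has a correspondingly large number of candidate positions, which is the difficulty described above.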
It is unlikely that these ERV loci in the archaic hominins are contaminants from modern human DNA. Average coverage of the Denisovan genome was only about twofold, so each locus was supported by only a few reads, and the contamination rate among the reads was estimated by several approaches to be less than 1% [4]. We believe that the explanation instead lies in fundamental population genetics. With the exception of co-opted ERV loci such as the syncytins [5], which could increase in frequency due to positive selection, we assume that ERV loci become common by genetic drift, and the mean time for a new neutral allele to go to fixation (given that it does fix) is 4Ne generations, where Ne is the effective population size. Given estimates of long-term human generation time and population size [6], this is ∼800,000 years. The population divergence of modern humans from the Denisovan/Neanderthal lineage is more recent than this, between 170,000 and 700,000 years ago according to a more recent and much deeper sequencing of the same Denisovan fossil [7]. Because this interval is shorter than the typical time to fixation, many loci that were polymorphic at the time of divergence will have persisted, at fluctuating frequencies, in all three lineages.
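The quantitative steps in this paragraph can be checked with back-of-envelope arithmetic. The sketch below assumes Ne = 10,000 and a generation time of 20 years purely because 4 × Ne × 20 reproduces the ∼800,000-year figure; these are not necessarily the values used in [6]. It also assumes a constant population size and independent loci, and the function names are hypothetical. A smaller Ne in the archaic lineages would speed up the loss of polymorphism. The contamination part is a simple binomial tail and overstates the risk, since a contaminant read would also need to carry the insertion.

```python
"""Back-of-envelope checks for the contamination and population-genetics arguments.

Assumptions (illustrative, not taken from refs [4], [6] or [7]):
Ne = 10,000 and 20 years per generation, chosen because 4 * Ne * g then
reproduces the ~800,000-year figure; loci are treated as independent.
"""
import math

NE = 10_000     # assumed long-term effective population size
GEN_YEARS = 20  # assumed generation time in years


def p_at_least_k_contaminated(n_loci: int, k: int, contamination: float,
                              reads_per_locus: int = 1) -> float:
    """Binomial probability that >= k of n loci are supported only by contaminant reads.

    A locus counts as contaminant-derived only if all its supporting reads are
    contaminants. This overstates the risk, because a contaminant read would
    also have to carry the (polymorphic) insertion.
    """
    p_locus = contamination ** reads_per_locus
    return sum(math.comb(n_loci, i) * p_locus ** i * (1 - p_locus) ** (n_loci - i)
               for i in range(k, n_loci + 1))


def mean_fixation_time_years(ne: int = NE, gen_years: float = GEN_YEARS) -> float:
    """Mean time to fixation of a new neutral allele that does fix: 4 Ne generations."""
    return 4 * ne * gen_years


def heterozygosity_remaining(years: float, ne: int = NE,
                             gen_years: float = GEN_YEARS) -> float:
    """Fraction of the initial expected heterozygosity left after drift alone.

    Uses the Wright-Fisher result H_t = H_0 * (1 - 1/(2 Ne))**t for a
    diploid population of constant size Ne.
    """
    generations = years / gen_years
    return (1 - 1 / (2 * ne)) ** generations


if __name__ == "__main__":
    print(f"P(all 7 loci are contaminants): {p_at_least_k_contaminated(7, 7, 0.01):.1e}")
    print(f"Mean fixation time: {mean_fixation_time_years():,.0f} years")
    for years in (170_000, 400_000, 700_000):
        print(f"{years:>7,} years: {heterozygosity_remaining(years):.2f} of H_0 remains")
```

Under these assumptions roughly a third of the ancestral heterozygosity would survive 400,000 years of drift, which is consistent with many polymorphic loci being shared across the three lineages.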
As well as showing that differences in loci between one genome and another must be interpreted cautiously, our finding illustrates how single genomes, whether the human reference or one from an archaic hominin fossil, are likely to contain mainly those ERV loci that have drifted to high frequency over almost a million years. These old loci give us only limited insight into the processes that created them; for example, they will have accrued multiple inactivating mutations during this time. In contrast, loci that integrated recently are more likely to produce proteins and might even still be replicating. Such loci are interesting, perhaps most importantly because they are more likely to be pathogenic. The long-running debate over whether ERVs cause disease in humans has been handicapped by our poor knowledge of ERV polymorphism. Characterising individual loci is necessary to test ERV involvement in disease [8, 9], and will aid the potential exploitation of ERV proteins as targets for cancer and HIV immunotherapy [10].
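Why a single genome over-represents old loci follows from two standard neutral-theory relations, sketched below under assumed Hardy-Weinberg proportions and an illustrative Ne = 10,000. The chance that one diploid genome carries an insertion rises with its population frequency p, and so does the expected age of a neutral allele at that frequency, −4Ne·p·ln(p)/(1 − p) generations (Kimura and Ohta, 1973). The function names are hypothetical; the 16% frequency of K113 is the only number taken from the text.

```python
"""Why a single genome mostly samples high-frequency, and hence old, ERV loci.

Assumptions (illustrative): Hardy-Weinberg proportions, neutral loci, and
Ne = 10,000; the 16% frequency for K113 is the only number from the text.
"""
import math


def p_carried_by_one_genome(freq: float) -> float:
    """Probability that one diploid genome has at least one copy of an insertion."""
    return 1 - (1 - freq) ** 2


def expected_age_generations(freq: float, ne: int = 10_000) -> float:
    """Expected age of a neutral allele at frequency `freq` (Kimura and Ohta 1973).

    E[age] = -4 Ne * p * ln(p) / (1 - p) generations; this tends to 4 Ne
    (the mean fixation time) as p approaches 1.
    """
    if not 0 < freq < 1:
        raise ValueError("freq must lie strictly between 0 and 1")
    return -4 * ne * freq * math.log(freq) / (1 - freq)


if __name__ == "__main__":
    for freq in (0.01, 0.16, 0.5, 0.9):  # 0.16 is the K113 frequency cited above
        print(f"freq={freq:.2f}  in one genome: {p_carried_by_one_genome(freq):.2f}  "
              f"expected age: {expected_age_generations(freq):,.0f} generations")
```

For K113, for instance, only about 29% of individual genomes would be expected to carry the insertion, whereas a locus near fixation appears in almost every genome.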
ERVs in fossil hominins also improve our understanding of both ERV and human evolution. Once the ERV loci in modern humans have been reasonably well sampled, fossil loci will help us build a robust mathematical model of ERV proliferation. Then, because ERV loci are easily detectable and irreversible genetic markers (the common mechanism called ‘recombinational deletion’ removes most of the provirus but leaves a relict structure called a solo-LTR [9], so the insertion remains detectable), they might help us to measure divergence dates and population sizes for these archaic hominins.
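The sense in which ERV loci are irreversible markers can be shown with a toy presence/absence scoring. In this hypothetical sketch the genome names, allele states and the `pairwise_difference` helper are invented; a full-length provirus and a solo-LTR are both scored as "insertion present", because recombinational deletion does not erase the record of integration. The resulting distance is only an illustration of the data type, not a method for estimating divergence dates, which would need a model of ERV proliferation.

```python
"""Toy sketch: scoring ERV loci as presence/absence markers.

Assumptions (hypothetical): the genome names, allele states and state labels
are invented for illustration; all genomes are scored at the same loci, in
the same order. A solo-LTR is still scored as 'insertion present'.
"""
from itertools import combinations
from typing import Dict, List, Tuple

# A provirus and a solo-LTR both record that the integration happened.
INSERTION_PRESENT = {"provirus": True, "solo_LTR": True, "empty": False}


def pairwise_difference(genomes: Dict[str, List[str]]) -> Dict[Tuple[str, str], float]:
    """Fraction of loci at which each pair of genomes differs in presence/absence."""
    presence = {name: [INSERTION_PRESENT[s] for s in states]
                for name, states in genomes.items()}
    result = {}
    for a, b in combinations(sorted(presence), 2):
        diffs = sum(x != y for x, y in zip(presence[a], presence[b]))
        result[(a, b)] = diffs / len(presence[a])
    return result


if __name__ == "__main__":
    toy = {
        "genome_A": ["provirus", "empty", "solo_LTR"],
        "genome_B": ["solo_LTR", "empty", "empty"],
        "genome_C": ["empty", "provirus", "solo_LTR"],
    }
    for pair, d in pairwise_difference(toy).items():
        print(pair, round(d, 2))
```

At the first locus, genome_A carries a provirus and genome_B a solo-LTR, yet they are scored as identical, since both descend from the same integration event.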
Acknowledgments
We are grateful to the TCGA Data Access Committee (project 3504: “Endogenous retroviruses and cancer”), the UCSC CGhub, and the WGS500 Project Consortium for access to the data. The WGS500 project is funded by the Wellcome Trust, Oxford NIHR Biomedical Research Centre and Illumina. E.M., A.K. and R.B. are supported by the Wellcome Trust, and G.M. by an MRC Clinician Scientist Fellowship.
Footnotes
Supplemental Information including experimental procedures, one table and one figure can be found with this article online at http://dx.doi.org/10.1016/j.cub.2013.10.028.
Contributor Information
Gkikas Magiorkinis, Email: gkikas.magiorkinis@zoo.ox.ac.uk.
Robert Belshaw, Email: robert.belshaw@plymouth.ac.uk.
References
1. Agoni L., Golden A., Guha C., Lenz J. Neandertal and Denisovan retroviruses. Curr. Biol. 2012;22:R437–R438. doi: 10.1016/j.cub.2012.04.049.
2. Subramanian R.P., Wildschutte J.H., Russo C., Coffin J.M. Identification, characterization, and comparative genomic distribution of the HERV-K (HML-2) group of human endogenous retroviruses. Retrovirology. 2011;8:90. doi: 10.1186/1742-4690-8-90.
3. Lee E., Iskow R., Yang L., Gokcumen O., Haseley P., Luquette L.J., III, Lohr J.G., Harris C.C., Ding L., Wilson R.K. Landscape of somatic retrotransposition in human cancers. Science. 2012;337:967–971. doi: 10.1126/science.1222077.
4. Reich D., Green R.E., Kircher M., Krause J., Patterson N., Durand E.Y., Viola B., Briggs A.W., Stenzel U., Johnson P.L.F. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature. 2010;468:1053–1060. doi: 10.1038/nature09710.
5. Dupressoir A., Lavialle C., Heidmann T. From ancestral infectious retroviruses to bona fide cellular genes: Role of the captured syncytins in placentation. Placenta. 2012;33:663–671. doi: 10.1016/j.placenta.2012.05.005.
6. Belshaw R., Dawson A.L.A., Woolven-Allen J., Redding J., Burt A., Tristem M. Genomewide screening reveals high levels of insertional polymorphism in the human endogenous retrovirus family HERV-K(HML2): Implications for present-day activity. J. Virol. 2005;79:12507–12514. doi: 10.1128/JVI.79.19.12507-12514.2005.
7. Meyer M., Kircher M., Gansauge M.-T., Li H., Racimo F., Mallick S., Schraiber J.G., Jay F., Prüfer K., de Filippo C. A high-coverage genome sequence from an archaic Denisovan individual. Science. 2012;338:222–226. doi: 10.1126/science.1224344.
8. Young G.R., Stoye J.P., Kassiotis G. Are human endogenous retroviruses pathogenic? An approach to testing the hypothesis. Bioessays. 2013;35:794–803. doi: 10.1002/bies.201300049.
9. Magiorkinis G., Belshaw R., Katzourakis A. “There and back again”: revisiting the pathophysiological roles of Endogenous Retroviruses in the post-genomic era. Phil. Trans. Roy. Soc. B. 2013;368:20120504. doi: 10.1098/rstb.2012.0504.
10. Wang-Johanning F., Rycaj K., Plummer J., Li M., Yin B., Frerich K., Garza J., Shen J., Lin K., Yan P. Immunotherapeutic potential of anti-human endogenous retrovirus-K envelope protein antibodies in targeting breast tumors. J. Natl. Cancer Inst. 2012;104:189–210. doi: 10.1093/jnci/djr540.