Journal of Virology. 2011 Oct;85(19):9863–9876. doi: 10.1128/JVI.00828-11

Widespread Endogenization of Densoviruses and Parvoviruses in Animal and Human Genomes

Huiquan Liu 1,2, Yanping Fu 2, Jiatao Xie 2, Jiasen Cheng 2, Said A Ghabrial 3, Guoqing Li 1,2, Youliang Peng 4, Xianhong Yi 2, Daohong Jiang 1,2,*
PMCID: PMC3196449  PMID: 21795360

Abstract

Parvoviruses infect humans and a broad range of animals, from mammals to crustaceans, and generally are associated with a variety of acute and chronic diseases. However, many parvoviruses cause persistent infections and are not known to be associated with any disease. Viral persistence is likely related to the ability to integrate into the chromosomal DNA and to establish a latent infection. However, there is little evidence for genome integration of parvoviral DNA except for Adeno-associated virus (AAV). Here we performed a systematic search for homologs of parvoviral proteins in publicly available eukaryotic genome databases, followed by experimental verification and phylogenetic analysis. We conclude that parvoviruses have frequently invaded the germ lines of diverse animal species, including mammals, fishes, birds, tunicates, arthropods, and flatworms. The identification of orthologous endogenous parvovirus sequences in the genomes of humans and other mammals suggests that parvoviruses have coexisted with mammals for at least 98 million years. Furthermore, some of the endogenized parvoviral genes were expressed in eukaryotic organisms, suggesting that these viral genes are also functional in the host genomes. Our findings may provide novel insights into parvovirus biology, host interactions, and evolution.

INTRODUCTION

Members of the family Parvoviridae infect a wide variety of hosts, ranging from insects to primates. These viruses contain linear single-stranded DNA (ssDNA) genomes and typically possess two major gene cassettes; one encodes the nonstructural protein (NS or Rep) essential for viral gene expression and DNA replication, and the other encodes the structural proteins of the capsid (CP or VP) (5, 38). Members of this family have been classified into two subfamilies, the Parvovirinae (vertebrate viruses) and the Densovirinae (arthropod viruses) (15).

Generally, parvoviruses cause a wide range of acute or chronic diseases; many, however, are not known to be associated with any disease (6). Parvoviruses frequently cause persistent infections, but the persistence mechanisms remain unknown. Viral persistence is likely related to the ability to integrate into the chromosomal DNA and to establish a latent infection, as is the case for retroviruses (17, 22) and some DNA tumor viruses (11, 36, 50, 51). Adeno-associated virus (AAV), a nonautonomous parvovirus, can establish latency through site-specific genome integration into human chromosome 19 in cell culture (29, 41), and the autonomous parvovirus minute virus of mice (MVM) has been shown to integrate in a site-specific manner into episomes (12). However, it is not known whether integration into the host germ line DNA and consequent transmission to offspring (endogenization) take place.

Recently, Kerr and Boschetti (27) identified some short regions (17 to 26 nucleotides [nt]) of sequence identity between several human and rodent parvoviruses and their respective host genomes; such regions could be biologically relevant to the persistence of these viruses in host tissues. However, there is no clear evidence of integration of these viruses. The presence within the shrimp genome of sequences clearly related to infectious hypodermal and hematopoietic necrosis virus (IHHNV) (46), a common parvovirus of shrimp, implies that integration of autonomous parvoviruses may have occurred widely but has not been well documented. The increasing availability of eukaryotic genome data and viral sequences opens up the scope for further investigating integration events as well as the mechanisms of pathogenesis and persistence of parvoviruses.

Hence, we performed a systematic search for homologs of parvoviral proteins in the publicly available eukaryotic genome databases, and our subsequent phylogenetic analysis confirmed that parvoviruses have been frequently endogenized into the nuclear genomes of various animals. While our paper was being prepared for submission, two independent groups of investigators reported that sequences derived from two genera of the subfamily Parvovirinae, the parvoviruses and the dependoviruses, are integrated in the genomes of vertebrate species (3, 26). Here we report more comprehensive results, based on extensive critical data analysis and laboratory verification. Our studies not only corroborate the endogenization of viruses of the subfamily Parvovirinae in vertebrate species but also show that numerous densoviruses (subfamily Densovirinae) have been endogenized into the genomes of invertebrate species. Furthermore, we identified a syntenic endogenous parvovirus in humans and other mammalian species that dates this endogenization event to at least 98 million years ago. In addition, we confirmed the expression of some endogenous viral genes in host genomes. The implications of these findings for virus-cell interaction and evolution are also discussed.

MATERIALS AND METHODS

Data mining.

All database searches were performed online and were completed in April 2010. To screen for parvovirus-related DNA sequences (PRDs) in eukaryotic genomes, we performed tBLASTn searches using the NS and CP protein sequences of representative parvoviruses as queries against the refseq_genomic, chromosome, whole-genome shotgun (WGS), and eukaryote genomic BLAST databases at NCBI (http://blast.ncbi.nlm.nih.gov/Blast.cgi). Through an iterative process of screening, all nonredundant hits from these searches with E values of ≤1e−5 were extracted. The relationships between sequences found to be similar in these searches were determined by reverse BLAST comparisons, using each extracted hit as a BLASTx query against the nonredundant (NR) protein database. The regions flanking each PRD were scanned for adjacent transposable elements (TEs) or repetitive sequences by a WSCensor (http://www.ebi.ac.uk/Tools/censor/) search against a reference collection of sequence repeats (25).
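The filtering step can be illustrated with a minimal Python sketch. It assumes the tBLASTn hits were exported in the BLAST+ tabular format (`-outfmt 6`, 12 default columns, with subject start, subject end, and E value in columns 9 to 11); the searches in this study were run through the NCBI web interface, so the file format, the function names, and the rule of merging overlapping hits on the same subject sequence into one nonredundant locus are illustrative assumptions rather than the authors' pipeline.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

E_VALUE_CUTOFF = 1e-5  # threshold stated in the text


@dataclass
class Hit:
    query: str
    subject: str
    start: int  # subject coordinates, normalized so that start <= end
    end: int
    evalue: float


def parse_outfmt6(lines: Iterable[str]) -> List[Hit]:
    """Parse BLAST+ tabular (-outfmt 6) rows with the 12 default columns (assumed input)."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        sstart, send = int(cols[8]), int(cols[9])
        # tBLASTn reports minus-strand hits with sstart > send; strand is ignored here.
        hits.append(Hit(cols[0], cols[1], min(sstart, send), max(sstart, send),
                        float(cols[10])))
    return hits


def nonredundant_loci(hits: Iterable[Hit],
                      cutoff: float = E_VALUE_CUTOFF) -> Dict[str, List[Tuple[int, int, float]]]:
    """Keep hits with E <= cutoff and merge overlapping hits on the same subject.

    Hypothetical merge rule: overlapping hits from different query proteins
    collapse into one locus that keeps the best (smallest) E value.
    """
    by_subject: Dict[str, List[Hit]] = {}
    for h in hits:
        if h.evalue <= cutoff:
            by_subject.setdefault(h.subject, []).append(h)
    loci = {}
    for subject, hs in by_subject.items():
        hs.sort(key=lambda hit: hit.start)
        merged: List[Tuple[int, int, float]] = []
        for h in hs:
            if merged and h.start <= merged[-1][1]:
                s, e, ev = merged[-1]
                merged[-1] = (s, max(e, h.end), min(ev, h.evalue))
            else:
                merged.append((h.start, h.end, h.evalue))
        loci[subject] = merged
    return loci


# Hypothetical hits: two NS queries overlap on scaffold_7; the scaffold_9 hit fails the cutoff.
rows = [
    "NS1_virusA\tscaffold_7\t35.2\t210\t120\t4\t10\t220\t5000\t5630\t2e-30\t120",
    "NS1_virusB\tscaffold_7\t30.1\t150\t90\t3\t1\t150\t5600\t6050\t1e-12\t70",
    "NS1_virusA\tscaffold_9\t25.0\t80\t55\t2\t5\t85\t900\t660\t0.01\t30",
]
print(nonredundant_loci(parse_outfmt6(rows)))  # {'scaffold_7': [(5000, 6050, 2e-30)]}
```

Merging by coordinate overlap is one simple way to make "nonredundant" concrete when several representative NS or CP queries hit the same genomic region.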

Examination of possible chimeras or errors in assembling of PRDs.

Sequence similarities between parvoviruses and eukaryotic genomes could be attributed to trivial contamination of eukaryotic genomic DNA with viral sequences during cloning or sequence assembly. To rule out this possibility, we used each PRD together with its flanking cellular sequences as a megaBLAST query against the archived raw sequencing reads of the corresponding eukaryotic genome in the NCBI Trace Archive (http://www.ncbi.nlm.nih.gov/blast/mmtrace.shtml), with a cutoff of >95% nucleotide identity, and carefully examined the junctions between PRD and cellular sequences. Junction coverage statistics, i.e., the number of trace records containing the junctions between PRDs and cellular sequences, are listed in Data Set S1 in the supplemental material.
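Junction coverage can be thought of as the number of trace reads whose alignment runs across a PRD-host boundary. The sketch below counts reads that span a junction coordinate with more than 95% identity, the cutoff given in the text. The `(start, end, identity)` input format, the junction convention, and the 20-nt minimum flank required on each side are hypothetical choices for illustration, not parameters from the study.

```python
from typing import Iterable, Tuple

Alignment = Tuple[int, int, float]  # (start, end, percent identity) on the contig, 1-based inclusive


def junction_coverage(alignments: Iterable[Alignment], junction: int,
                      min_identity: float = 95.0, min_flank: int = 20) -> int:
    """Count reads crossing `junction` with enough aligned sequence on both sides.

    `junction` is taken to be the last host base before the PRD (hypothetical
    convention); `min_flank` is a hypothetical minimum overhang.
    """
    count = 0
    for start, end, identity in alignments:
        if identity <= min_identity:
            continue  # the text requires >95% nucleotide identity
        left = junction - start + 1   # aligned bases up to and including the junction
        right = end - junction        # aligned bases after the junction
        if left >= min_flank and right >= min_flank:
            count += 1
    return count


# Hypothetical read alignments around a junction at position 1200.
reads = [
    (900, 1700, 99.1),   # spans the junction with long flanks
    (1190, 1900, 98.0),  # only 11 host bases before the junction
    (1000, 1500, 94.0),  # fails the identity cutoff
    (1150, 1250, 96.5),  # 51 bases before, 50 after
]
print(junction_coverage(reads, junction=1200))  # 2
```

Requiring a minimum overhang on both sides avoids counting reads that merely touch the boundary, which would say little about whether the PRD and host sequence are physically joined.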

Phylogenetic analyses.

The putative amino acid sequences of all available PRDs, deduced from the BLASTx hits, were used for phylogenetic analysis together with the NS or CP proteins of representative exogenous vertebrate or arthropod parvoviruses. Multiple alignments of protein sequences were constructed using COBALT (37) (http://www.ncbi.nlm.nih.gov/tools/cobalt/cobalt.cgi?link_loc=BlastHomeAd.). In-frame stop codons within PRDs were coded as X. Protein maximum-likelihood (ML) trees were inferred with PhyML-mixtures (20, 30), assuming the EX2 mixture model (30) and the SPR tree topology search strategy (21). Gaps in the alignment were systematically treated as unknown characters. The reliability of internal branches was evaluated with approximate likelihood ratio test (aLRT) statistics (1).
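To illustrate the convention of coding in-frame stop codons as X, here is a minimal translation routine based on the standard genetic code (NCBI translation table 1). It is a sketch rather than the authors' code; the function name and the decision to also render codons containing non-ACGT characters (such as N in assembly gaps) as X are assumptions.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (first base varies slowest); '*' marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_prd(dna: str, frame: int = 0) -> str:
    """Translate a PRD in the given frame, writing stop codons as 'X'.

    Codons with non-ACGT characters also become 'X' (assumption);
    a trailing partial codon is dropped.
    """
    seq = dna.upper().replace("U", "T")
    protein = []
    for i in range(frame, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        protein.append("X" if aa == "*" else aa)
    return "".join(protein)


print(translate_prd("ATGGCTTAAGGNTGG"))  # MAXXW
```

Keeping stops as X rather than truncating the translation preserves the full length of a degraded PRD, so it can still be aligned along its whole length against intact viral proteins.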

Detection of expression of PRDs from animal nuclear genomes.

To investigate whether PRDs can be expressed in animals, we first screened the NCBI expressed sequence tag (EST) database for parvovirus-related cDNA sequences using the method described in “Data mining” above. We then compared the identified parvovirus-related cDNA sequences with animal genomes and parvoviruses by megaBLAST to determine whether they were expressed sequences from animal nuclear genomes or contaminating sequences from exogenous incidental viruses.
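The decision made for each parvovirus-related EST can be written as a simple rule that mirrors the criteria applied in Results: near-identity to a known exogenous parvovirus with no match in the host genome points to contamination, whereas a full-length match to a PRD that extends into its flanking host sequence points to an expressed PRD. The numeric thresholds (95% identity, 90% coverage), the data structure, and the category labels below are hypothetical and only illustrate the logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EstEvidence:
    """Best megaBLAST evidence for one EST; None means no significant hit."""
    virus_identity: Optional[float]   # % nt identity to a known exogenous parvovirus
    genome_identity: Optional[float]  # % nt identity to the host genome (or trace reads)
    genome_coverage: Optional[float]  # fraction of the EST covered by that genomic match
    spans_flank: bool = False         # match extends into host sequence flanking a PRD


def classify_est(ev: EstEvidence, min_identity: float = 95.0,
                 min_coverage: float = 0.9) -> str:
    """Assign an EST to one of three categories (hypothetical thresholds)."""
    in_genome = (ev.genome_identity is not None and ev.genome_identity >= min_identity
                 and ev.genome_coverage is not None and ev.genome_coverage >= min_coverage)
    if in_genome and ev.spans_flank:
        return "expressed PRD"
    if (ev.virus_identity is not None and ev.virus_identity >= min_identity
            and ev.genome_identity is None):
        return "exogenous contaminant"
    # e.g. low-identity ESTs from animals whose genomes are not yet sequenced
    return "unresolved"


print(classify_est(EstEvidence(99.2, None, None)))                   # exogenous contaminant
print(classify_est(EstEvidence(None, 99.8, 1.0, spans_flank=True)))  # expressed PRD
print(classify_est(EstEvidence(62.0, None, None)))                   # unresolved
```

The third category matters: ESTs that resemble PRDs but come from unsequenced genomes cannot be confirmed either way.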

PCR amplification and DNA sequencing.

Bovine, cat, dog, horse, guinea pig, mouse (C57BL/6J), porcine, rabbit, Sprague-Dawley rat, and sheep genomic DNA samples were obtained from Zyagen Laboratories, and Drosophila sechellia, Drosophila persimilis, and Drosophila willistoni genomic DNA samples were acquired from the Drosophila Species Stock Center. To amplify the candidate DNA fragments from these samples by PCR, primer pairs were designed based on the virus-related sequences and their flanking cellular sequences (see Table S1 in the supplemental material for the primer pairs used). PCR products were fractionated by electrophoresis on 1% agarose gels and stained with ethidium bromide. DNA was sequenced by the Sanger method at the Beijing Genomics Institute (BGI).
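Because each primer pair places one primer in the PRD and the other in the flanking host sequence, the expected product size can be predicted from the assembled sequence before running the gel. The minimal in silico PCR sketch below uses exact primer matching only (forward primer on the given strand, reverse primer matched as its reverse complement downstream); the function names, the toy sequences, and the junction coordinate are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def expected_amplicon(template: str, fwd: str, rev: str):
    """Predict a PCR product by exact primer matching (a simplification).

    Returns 0-based, half-open (start, end) coordinates of the product on
    `template`, or None if either primer has no exact binding site.
    """
    t = template.upper()
    f = t.find(fwd.upper())
    if f < 0:
        return None
    # The reverse primer appears on the top strand as its reverse complement,
    # downstream of the forward primer.
    r = t.find(revcomp(rev), f + len(fwd))
    if r < 0:
        return None
    return f, r + len(rev)


def spans_junction(amplicon, junction: int) -> bool:
    """True if the product covers the host/PRD boundary at index `junction`."""
    return amplicon is not None and amplicon[0] < junction < amplicon[1]


host_flank = "ACGTACGGATTCCAGT"  # hypothetical host sequence
prd_part = "TTGACCATGGCAAGCT"    # hypothetical start of a PRD
template = host_flank + prd_part
amp = expected_amplicon(template, "CGGATTCC", "TTGCCATG")
print(amp, amp[1] - amp[0])                  # (5, 29) 24
print(spans_junction(amp, len(host_flank)))  # True
```

A product of the predicted size that spans the boundary, followed by sequencing, is what distinguishes a genuine host-PRD junction from an assembly artifact.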

Nucleotide sequence accession numbers.

New sequences generated in this study were deposited in GenBank under accession numbers HM469386 to HM469391 and HM989956 to HM989958.

RESULTS

Identification of parvovirus-related DNA sequences in animal nuclear genomes.

We systematically screened the assembled genomes of 209 eukaryotes in genomic BLAST databases, as well as other, incomplete eukaryotic genomes in the WGS database, using the methods described in Materials and Methods. This process identified 275 significant matches to parvoviral NS or CP proteins in diverse animal nuclear genomes (Table 1; see Data Set S1 in the supplemental material), which we named parvovirus-related DNA sequences (PRDs). PRD copy numbers varied among animal species. For example, the tammar wallaby genome contains 39 PRDs, even though its sequencing is not yet complete. In contrast, only one PRD, homologous to the parvovirus CP, was found in the human genome, although humans are infected by many isolates belonging to distinct species of parvoviruses. PRDs were also found in the spotted green pufferfish, the blood fluke, and two sea squirt species. Curiously, parvoviruses are one of the known DNA virus families that have not been found in fishes (14, 49), and they have also not yet been reported in tunicates or flatworms (49).

Table 1.

Distribution of parvovirus-related DNA sequences among animal genomes

Organism(a)    No. of parvovirus-related genes: NS    CP    Total
Mammals
    Primates
        Homo sapiens (human) 1 1
        Pan troglodytes (chimpanzee) 1 1
        Gorilla gorilla gorilla (Western lowland gorilla) 1 1
        Pongo abelii (Sumatran orangutan) 1 1
        Macaca mulatta (rhesus monkey) 1 1
        Callithrix jacchus (white-tufted-ear marmoset) 1 1
        Microcebus murinus (gray mouse lemur) 1 1
        Otolemur garnettii (small-eared galago) 1 1
        Tarsius syrichta (Philippine tarsier) 1 1(b)
    Rodents
        Mus musculus (house mouse) 3 2 5
        Rattus norvegicus (Norway rat) 4 2 6
        Dipodomys ordii (Ord's kangaroo rat) 1 1
        Cavia porcellus (domestic guinea pig) 7 3 10
    Carnivores
        Canis lupus familiaris (dog) 1 2 3
        Ailuropoda melanoleuca (giant panda) 1 1 2
        Felis catus (domestic cat) 1 1 2
    Odd-toed ungulate
        Equus caballus (horse) 2 1 3
    Even-toed ungulates
        Bos taurus (cattle) 1 1 2
        Sus scrofa (pig) 1 1
        Lama pacos (alpaca) 1 1
        Ovis aries (sheep) 1 1
    Marsupials
        Monodelphis domestica (gray short-tailed opossum) 6 5 11
        Macropus eugenii (tammar wallaby) 14 25 39
    Monotreme
        Ornithorhynchus anatinus (platypus) 3 1 4
    Bats
        Myotis lucifugus (little brown bat) 2 3 5
        Pteropus vampyrus (large flying fox) 3 3
    Placentals
        Dasypus novemcinctus (nine-banded armadillo) 1 1
        Procavia capensis (cape rock hyrax) 2 1 3
        Echinops telfairi (small Madagascar hedgehog) 2 2
        Loxodonta africana (African savanna elephant) 1 1
    Whale or dolphin
        Tursiops truncatus (bottlenosed dolphin) 2 2 4
    Rabbits and hares
        Oryctolagus cuniculus (rabbit) 1 1 2
        Ochotona princeps (American pika) 1 1
Fish
    Tetraodon nigroviridis (spotted green pufferfish) 2 2
Bird
    Taeniopygia guttata (zebra finch) 1 1
Tunicates
    Ciona intestinalis (sea squirt) 8 8
    Ciona savignyi (Pacific transparent sea squirt) 6 6
Crustacean
    Lepeophtheirus salmonis (salmon louse) strain Pacific 3 3(c)
Arachnid
    Ixodes scapularis (black-legged tick) strain Wikel colony 32 12 44(c)
Insects
    Aphid
        Acyrthosiphon pisum (pea aphid) strain LSR1 30 19 49(c)
    Flies
        Drosophila grimshawi (fruit fly) strain TSC 15287-2541.00 3 5 8(c)
        Drosophila persimilis (fruit fly) strain MSH-3 2 2(c)
        Drosophila willistoni (fruit fly) strain TSC 14030-0811.24 1 1(c)
        Drosophila sechellia (fruit fly) strain Rob3c 1 1 2(c)
    Bug
        Rhodnius prolixus (Triatomid bug) 19 7 26(c)
Flatworm
    Schistosoma mansoni (blood fluke) 1 1(c)
Total 169 106 275
(a) Boldface, completed genomic assembly; underlining, whole-genome shotgun assembly; standard font, unfinished genomic sequence.

(b) The top BLAST hit is the U94 gene of human herpesvirus 6, a homolog of the parvovirus Rep gene.

(c) Densovirus-like gene.
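As an arithmetic check on Table 1, the Total column can be summed by host group. The counts below are transcribed by hand from the Total column, and the grouping (mammals, fish and bird, tunicates, arthropods, flatworm) is chosen only for illustration; the group sums add up to the 275 PRDs in the Total row.

```python
# Total PRD counts per species, transcribed from the "Total" column of Table 1.
TOTALS = {
    "mammals": [1, 1, 1, 1, 1, 1, 1, 1, 1,   # primates
                5, 6, 1, 10,                 # rodents
                3, 2, 2,                     # carnivores
                3,                           # horse
                2, 1, 1, 1,                  # even-toed ungulates
                11, 39,                      # marsupials
                4,                           # platypus
                5, 3,                        # bats
                1, 3, 2, 1,                  # armadillo, hyrax, hedgehog, elephant
                4,                           # dolphin
                2, 1],                       # rabbit, pika
    "fish and bird": [2, 1],
    "tunicates": [8, 6],
    "arthropods": [3, 44, 49, 8, 2, 1, 2, 26],
    "flatworm": [1],
}

group_sums = {group: sum(counts) for group, counts in TOTALS.items()}
print(group_sums)
# {'mammals': 122, 'fish and bird': 3, 'tunicates': 14, 'arthropods': 135, 'flatworm': 1}
print(sum(group_sums.values()))  # 275
```

By this tally, the eight arthropod genomes alone account for 135 of the 275 PRDs, and the two marsupials contribute 50 of the 122 mammalian PRDs.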

Characteristics of PRDs.

Compared to intact parvovirus genomes, endogenous parvoviral sequences in most cases comprised only individual viral genes or gene fragments. Some, however, contained complete copies of viral genomes, and several contained rearranged viral genome structures or were interrupted by TEs or repeated nuclear element insertions (Fig. 1). Some PRDs appear truncated because of sequencing or assembly gaps; truncation may also reflect divergence of parts of a PRD beyond the level detectable by sequence similarity searches. Furthermore, many PRD copies resulted from segmental duplication within animal genomes, as they share clearly homologous flanking cellular sequences (see Fig. S1 in the supplemental material). The 5′ and 3′ noncoding sequences of endogenous parvoviruses were generally not detected, possibly due to sequence degradation. In the genome of the African savanna elephant, however, a PRD located within a reverse transcriptase gene shared significant sequence similarity not only with the complete Rep protein of Bovine adeno-associated virus but also with its 5′ noncoding sequence (Fig. 2). In summary, DNA copies of complete parvovirus genomes or individual genes could have been endogenized into host genomes and subsequently subjected to amplification, deletions, insertions, and rearrangements in the host genome.

Fig. 1.

Schematic representation of some PRDs and their most related viruses. Arrowhead boxes indicate viral-like genes (red, nonstructural proteins; blue, structural proteins). Green rectangular boxes indicate transposable elements. Colored sectors connect corresponding homologous regions, and the percent amino acid identity scores are indicated. Wavy and vertical lines within boxes indicate sequences containing frameshifts and stop codons compared with viral genes, respectively. Black arrowheads indicate primers which were used to amplify the junctions between PRD and host sequences. See Table S1 in the supplemental material for PCR primer sequences and their chromosomal locations.

Fig. 2.

Schematic representation and alignment of a PRD in the African savanna elephant genome and Bovine adeno-associated virus (BAAV). Sequence alignment of 5′ untranslated regions (1) and predicted amino acid sequences (2) of BAAV Rep and PRD are shown. Conserved nucleotides (amino acid residues) are shaded in orange. Green interrupted rectangular boxes indicate transposable elements, the length of which was not drawn to scale. Colored sectors connect corresponding homologous regions; percent nucleotide or amino acid identities are indicated.

Examination of the potential coding capacity of the endogenized viral sequences indicated that most were defective, containing numerous in-frame stop codons, frameshift mutations, and insertions or deletions (Fig. 1; see Data Set S1 in the supplemental material); they are therefore unlikely to be able to function as viruses. This was especially apparent for vertebrate PRDs. The invertebrate PRDs, however, were more conserved, and some of them retained uninterrupted viral open reading frames (see Data Set S1 in the supplemental material).
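Whether a PRD still carries an uninterrupted ORF can be screened by scanning all six reading frames for the longest stop-free stretch and by counting in-frame stops. The sketch below does this with the standard stop codons; it ignores start codons, introns, and frameshift correction, and the function names, return format, and toy sequences are illustrative assumptions.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_stops(dna: str, frame: int = 0) -> int:
    """Number of in-frame stop codons on the forward strand in the given frame."""
    seq = dna.upper()
    return sum(seq[i:i + 3] in STOPS for i in range(frame, len(seq) - 2, 3))


def longest_stop_free_run(dna: str):
    """Longest run of consecutive stop-free codons over all six frames.

    Returns (strand, frame, codon_count); ties keep the first frame scanned.
    """
    seq = dna.upper()
    best = ("+", 0, 0)
    for strand, s in (("+", seq), ("-", seq.translate(COMPLEMENT)[::-1])):
        for frame in range(3):
            run = 0
            for i in range(frame, len(s) - 2, 3):
                run = 0 if s[i:i + 3] in STOPS else run + 1
                if run > best[2]:
                    best = (strand, frame, run)
    return best


intact = "ATG" + "AAA" * 20                       # hypothetical intact ORF
decayed = "ATG" + "AAA" * 5 + "TAG" + "AAA" * 10  # same ORF with a premature stop
print(longest_stop_free_run(intact))  # ('+', 0, 21)
print(count_stops(decayed))           # 1
```

On real PRDs a long stop-free run in the frame matching the viral protein is the signal of interest; long runs in other frames can arise by chance in low-complexity sequence.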

Whereas AAV integrates into the human genome in a site-specific manner (29, 41), the PRDs were scattered in an apparently random fashion within animal genomes. Most of them, however, were adjacent or directly connected to TEs or repeated elements (Fig. 1 and 2; see Data Set S1 in the supplemental material), which might be involved in the integration mechanism.
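The observation that most PRDs lie next to, or directly abut, TEs or other repeats amounts to an interval-distance test. A minimal sketch follows, assuming PRD and repeat coordinates are available as (start, end) pairs on the same scaffold, for example from a CENSOR repeat annotation; the 1,000-bp window and the three labels are hypothetical choices, not values used in the study.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive, same scaffold


def gap(a: Interval, b: Interval) -> int:
    """Number of bases separating two intervals; 0 if they overlap or abut."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]) - 1)


def nearest_repeat_gap(prd: Interval, repeats: List[Interval]) -> Optional[int]:
    """Smallest gap between a PRD and any annotated repeat (None if there are none)."""
    if not repeats:
        return None
    return min(gap(prd, r) for r in repeats)


def repeat_context(prd: Interval, repeats: List[Interval], window: int = 1000) -> str:
    """Label a PRD 'abutting', 'adjacent' (within a hypothetical `window`), or 'isolated'."""
    d = nearest_repeat_gap(prd, repeats)
    if d is None or d > window:
        return "isolated"
    return "abutting" if d == 0 else "adjacent"


prd = (5000, 6200)  # hypothetical PRD coordinates
print(repeat_context(prd, [(4700, 4999), (9000, 9500)]))  # abutting
print(repeat_context(prd, [(6500, 7000)]))                # adjacent (299-bp gap)
print(repeat_context(prd, [(9000, 9500)]))                # isolated
```

Any such count depends strongly on the window chosen and on how repeat-dense the host genome is, so it is best compared against randomly placed intervals of the same length.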

Exclusion of the possibility that PRDs are contaminant sequences.

Several lines of evidence indicate that PRDs are real cellular sequences rather than contaminants originating from exogenous incidental viruses. First, a closer inspection of the raw sequence reads used for WGS assembly indicated deep sequencing coverage across the junctions between PRDs and cellular sequences (see Data Set S1 in the supplemental material). Moreover, identical PRD loci were present in two different assemblies (the NCBI reference assembly and the Celera assembly) of mouse and rat genomes and in three different assemblies (the GRCh37 primary reference assembly and alternate Celera and HuRef assemblies) of the human genome. These results strongly suggest that PRDs were not artifacts of cloning or sequence assembly. Second, most PRDs underwent various degrees of degradation, and several also contained rearranged viral structures or were interrupted by TEs or repeated nuclear element insertions, suggesting that PRDs invaded animal genomes millions of years ago. Finally, many PRDs arose via segmental duplication events within the animal genomes; hence, these represent an established germ line infection.

To validate these observations, we amplified and sequenced some of the proposed junctions between PRDs and cellular sequences from bovine, cat, dog, horse, guinea pig, mouse, porcine, rabbit, rat, sheep, and three fruit fly species (Drosophila sechellia, D. persimilis, and D. willistoni) (Fig. 3; see Table S1 in the supplemental material). The results revealed that the PCR products were of the expected sizes and the experimental sequences were identical to sequences of the animal genomes containing both the expected host sequences and PRDs.

Fig. 3.

PCR using animal total DNAs. PCR products were fractionated by gel electrophoresis on 1% agarose gels and stained with ethidium bromide. Marker, DNA marker DL 2000. Arrowheads indicate bands of the expected sizes in lanes with more than one band. The sequences of bands of the expected sizes from guinea pig, horse, D. sechellia NS, D. persimilis, mouse, pig, rabbit, rat NSCP-2, and cat were deposited in GenBank under accession numbers HM469386 to HM469391 and HM989956 to HM989958.

Phylogenetic analysis of PRDs and exogenous viruses.

To evaluate the genetic relationship between PRDs and exogenous parvoviruses, we performed a comprehensive phylogenetic analysis using the deduced amino acid sequences of all available PRDs together with the NS or CP protein sequences of representative parvoviruses. As shown in Fig. 4, the PRDs were clearly placed within subclades of the phylogenetic trees of subfamily Parvovirinae or Densovirinae, strongly suggesting that the PRDs were derived from members of the family Parvoviridae. Numerous PRDs from different animal species, and even from different lineages, clustered together and formed a sister clade to modern exogenous viruses (see examples in the upper part of the trees in Fig. 4A and B), suggesting that they represent extinct or undescribed lineages sharing a common ancestor with known modern viruses. Moreover, the phylogenetic pattern of these PRDs is not consistent with the evolutionary relatedness of their hosts, indicating that these PRDs most likely entered these species through independent integration events over time rather than through a single event in a common ancestor. The PRDs from two tunicate species showed clustering of PRD units from the same species (Fig. 4B). Thus, the integration of the virus probably occurred before the separation of these two species. However, it is also possible that these PRDs were derived from amplification events after integration into the two genomes. Indeed, the flanking sequences of some PRDs in the sea squirt genome show high sequence similarity (see Fig. S1 in the supplemental material).

Fig. 4.

Phylogenetic trees of exogenous parvoviruses and animal PRDs. (A and B) NS and CP trees of vertebrate parvoviruses and their related PRDs, respectively. The trees were rooted with the densovirus-like Penaeus monodon hepatopancreatic parvovirus. The node of the orthologous PRD clade is marked by a red diamond, and the relevant hosts are indicated by a blue arc in the middle of the tree in panel B. (C and D) NS and CP trees of arthropod parvoviruses and their related PRDs, respectively. The trees were rooted with the parvovirus Aleutian mink disease virus. Only P values of the approximate likelihood ratios (SH test) of >0.5 (50%) are indicated. All scale bars correspond to 0.5 amino acid substitution per site. The PRD branches are printed in red. The taxon names of PRDs possibly derived from recent integration events are shaded in green (see details in text). Animals belonging to the same group are indicated to the right. The sequence accession number is given for each sequence.

Although many PRDs from the same species (such as those in the pea aphid and black-legged tick) clustered together and shared high sequence identity with each other (Fig. 4C and D), their flanking host sequences were not homologous. This suggests that multiple endogenization events involving the same or very similar viruses occurred. In contrast, some PRDs from individual species (such as those in the Triatomid bug, tammar wallaby, Norway rat, and domestic guinea pig) were placed within different clades (Fig. 4A, B, and C), suggesting that distinct parvoviral species integrated into the same host genome. Notably, the sole PRD in the genome of the Philippine tarsier is most closely related to the U94 gene of human herpesvirus 6 (48), a homolog of the parvovirus Rep gene (44, 48), raising the possibility that it was derived directly from a herpesvirus integration event (Fig. 4A; see Fig. S2 in the supplemental material).

The integration of parvoviruses could have occurred over a wide evolutionary time scale. For example, several PRDs were most closely related to a single species of modern exogenous virus in the phylogenetic trees (green-shaded taxa in Fig. 4A, B, and C), clearly suggesting that these integrations were relatively recent events. Other PRDs, in contrast, were located at the base of the extant virus clades and have accumulated numerous degenerative mutations, suggesting that they were derived from integration of ancestral parvoviruses millions of years ago. Furthermore, through genomic synteny analysis, we found that the human PRD-related sequences were present at similar locations in the genomes of primates, carnivores, ungulates, and dolphins but not in that of the African savanna elephant, another placental mammal (Fig. 5A and B; see Table S2 in the supplemental material). These human-related PRDs were located in an intron of orthologs of the human Ellis van Creveld syndrome 2 (limbin) gene. Phylogenetic analysis of these PRDs from different mammalian lineages confirmed that they derive from a single integration event rather than from many independent integration events at the same location, since the tree topology was consistent with mammalian evolution (9, 39) (Fig. 5C). Given the mammalian phylogeny, this finding demonstrates that these PRDs must have appeared in an ancestor of living placental mammals that diverged from primitive placental mammals in the late Cretaceous period (9, 39). This implies that parvoviruses have coexisted with mammals for at least 98 million years (9). Given that the nucleotide substitution rate of ssDNA viruses is close to that of RNA viruses, it is remarkable that the similarity between these PRDs and related exogenous viruses can still be recognized (Fig. 5D). We could not identify sequences orthologous to the human PRD in the rodent lineage (Fig. 5B), suggesting that lineage-specific deletions occurred during evolution.

Fig. 5.

Identification of a syntenic PRD locus in mammal genomes. (A) Schematic representation of the human limbin gene structure. Vertical blue bars indicate putative exons; arcs indicate putative introns. The region of the PRD and flanking sequence is marked with a red rectangular box. (B) The PRD and flanking sequence in the human genome were aligned with the orthologous regions of other mammals using BLASTn. Colored bars indicate the similarity level between human sequences and other mammal sequences as measured by BLAST scores. Asterisks indicate that sequences are truncated due to sequencing gaps. Note that the African savanna elephant and mouse sequences do not contain the PRD. See Table S2 in the supplemental material for accession numbers and positions of mammal sequences used for analysis. SINE, short interspersed repetitive element. (C) Phylogenetic tree of orthologous PRD regions in mammal genomes. The phylogenetic tree was constructed by the neighbor-joining method using the maximum composite likelihood substitution model with the pairwise deletion option in MEGA4 (45). The bootstrap probability is indicated for each interior branch. The scale bar indicates the number of nucleotide substitutions per site. The tree is midpoint rooted, and its topology is consistent with the phylogeny of mammals (9, 39). (D) Alignment of putative amino acid sequences of human PRD and its best-matched virus. The default color scheme for ClustalW alignment in the Jalview program was used. Percent amino acid identity is indicated. The red asterisks and triangle indicate predicted stop codons and frameshift sites in human PRD, respectively. NHP_AAV_CP, capsid protein of nonhuman primate adeno-associated virus (AAO88189.1).
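For readers unfamiliar with the method named in panel C, neighbor joining repeatedly joins the pair of nodes that minimizes the Q criterion and computes branch lengths from the distance matrix. The sketch below implements plain neighbor joining on a precomputed distance matrix; it does not reproduce MEGA4's maximum composite likelihood distances, pairwise deletion, bootstrapping, or midpoint rooting, and the five-taxon matrix is a standard textbook example with additive distances, not data from this study.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Edge = Tuple[str, str, float]  # (internal node, child, branch length)


def neighbor_joining(names: List[str], dist: Dict[FrozenSet[str], float]) -> List[Edge]:
    """Plain neighbor joining on a symmetric distance matrix.

    `dist` maps frozenset({a, b}) to the distance between taxa a and b.
    Returns the edges of the unrooted tree; internal nodes are named node1, node2, ...
    """
    d = dict(dist)
    nodes = list(names)
    edges: List[Edge] = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        total = {x: sum(d[frozenset((x, k))] for k in nodes if k != x) for x in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - total(i) - total(j).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - total[p[0]] - total[p[1]])
        dij = d[frozenset((i, j))]
        li = dij / 2 + (total[i] - total[j]) / (2 * (n - 2))
        counter += 1
        u = f"node{counter}"
        edges += [(u, i, li), (u, j, dij - li)]
        # Distances from the new internal node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (d[frozenset((i, k))] + d[frozenset((j, k))] - dij) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Textbook five-taxon example (additive distances), not data from this study.
pairs = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8, ("b", "c"): 10,
         ("b", "d"): 10, ("b", "e"): 9, ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
tree = neighbor_joining(list("abcde"), {frozenset(k): v for k, v in pairs.items()})
print(tree[:2])                               # [('node1', 'a', 2.0), ('node1', 'b', 3.0)]
print(sum(length for _, _, length in tree))   # 17.0
```

Because the example distances are additive, neighbor joining recovers the generating tree exactly (total length 17); with real sequence distances the branch lengths are estimates and support is assessed by bootstrapping, as in panel C.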

Expression of PRDs in animal nuclear genomes.

We found numerous parvovirus-related cDNA sequences in various organisms by mining the NCBI EST database. Through subsequent sequence comparisons, some of these ESTs were identified as contaminating sequences from exogenous incidental viruses, because they shared high nucleotide identity with sequences of known parvoviruses but lacked sequence similarity to animal genomes (see Data Set S2 in the supplemental material). Other ESTs exhibited low amino acid identity to PRDs or exogenous parvoviruses (see Data Set S2 in the supplemental material), and several also contained frameshifts or internal stop codons. These characteristics closely resemble those of the PRDs detected in nuclear genomes, implying that these ESTs are most likely expressed PRDs from the nuclear genomes of the animals concerned. However, whether they are genuinely expressed PRDs remains to be established, because the genome sequences of the relevant animals are not currently available.

We concluded that the parvovirus-related cDNA sequences from some invertebrates (sea squirt, fruit fly, pea aphid, black-legged tick, and salmon louse) were expressed PRDs from nuclear genomes (see Data Set S3 in the supplemental material), because most of them share high sequence identity with PRDs and with their flanking cellular sequences over their full length (Fig. 6). Although the full-length sequence corresponding to one parvovirus-related cDNA (accession no. EW905967) was not found in the black-legged tick genome assembly, it was present in the tick's Trace-WGS records (Fig. 6G). This suggests that some black-legged tick trace records containing expressed PRDs remain to be assembled into genomic contigs. In addition, two parvovirus-related cDNAs (DY223558 and DY224604) of the pea aphid had no corresponding full-length sequences in either its genome or its Trace-WGS records, and the parvovirus-related cDNAs of the salmon louse contained rearranged structures relative to genomic sequences (Fig. 6H and I). These discrepancies may arise because genome sequencing is not yet complete or did not cover some regions containing PRDs.

Fig. 6.

Schematic representation of some PRDs and their expressed sequences. Colored boxes with arrowheads and swallowtails indicate ORFs with and without start codons, respectively. Red, nonstructural proteins; blue, structural proteins. Green rectangular boxes indicate transposable elements. Wavy and vertical lines within boxes indicate sequences containing frameshifts and stop codons compared with viral genes, respectively. Similar regions of expressed sequences are identified, and the percent nucleotide identity with PRDs is indicated. Note that the full-length sequence corresponding to a parvovirus-related cDNA (EW905967) containing repeated sequences was not identified in the genomic database but occurred in the Trace-WGS database (G), suggesting that some trace records containing expressed PRDs remain to be assembled into genomic contigs. In addition, two parvovirus-related cDNAs (DY223558 and DY224604) containing repeated sequences did not have corresponding full-length sequences in either the genomic database or the Trace-WGS database (H), and the parvovirus-related cDNAs contained rearranged structures relative to genomic sequences (I), which could possibly be due to incomplete genomic sequencing or because sequencing did not cover some regions containing PRDs.

Remarkably, the only endogenous parvoviral sequence detected in the fruit fly Drosophila sechellia has a distinct viral genome organization (Fig. 7). The arrangement and sequence of its NS-like open reading frame (ORF) are similar to those of Dendrolimus punctatus densovirus (DpDNV), whereas the arrangement and sequence of its CP-like ORF are similar to those of Periplaneta fuliginosa densovirus (PfDNV). More than 300 EST sequences were detected for the CP-like gene. Through EST assembly and sequence comparison, we found that the CP-like gene is expressed as multiple transcript variants through alternative splicing involving two additional overlapping ORFs near the 3′ end of this gene. The genome structure and alternatively spliced CP transcripts of this endogenous parvoviral element differ from those of any known parvovirus. Currently, we do not know whether these genes function as viral or cellular genes. It will be interesting to determine whether the element could be activated as an episomal virus under certain conditions in the host.

Fig. 7.

Structural and expression analysis of an endogenous densovirus in the genome of Drosophila sechellia. Colored arrowhead boxes indicate virus-like ORFs. Red, nonstructural proteins; blue, structural proteins; other colors, hypothetical proteins. Gray sectors connect corresponding homologous regions detected by BLASTp; percent amino acid identities are indicated. Black arrows indicate primers which were used to amplify and validate the connections. The sequence of the transposable element-densovirus-like gene boundary is shown above the diagram at the left. Blue bars represent the matched regions of expressed sequences of the endogenous densovirus; arcs indicate introns.

Most of the expressed PRDs were truncated or contained frameshifts or internal stop codons (Fig. 6), suggesting that these no longer generate functional proteins. However, the possibility that these PRDs function at the RNA level cannot be ruled out. We did not detect expressed sequences for any vertebrate PRDs. This is consistent with the observation that most vertebrate PRDs showed multiple defects and therefore may not be functional.

DISCUSSION

The discovery of PRDs demonstrates that parvoviruses are capable of invading diverse animal genomes. Given the sequence divergence between extant viruses and the integrated sequences, as well as the limited number of sequenced animal genomes, endogenous parvoviruses are likely to be more widely distributed than described here. Moreover, for these integrations to be detected, they must have occurred in cells that contributed to the germ line, and the integrated sequences must subsequently have become fixed in the species. Integration events in somatic cells, which leave no heritable trace, may therefore be far more frequent and widespread. Integration of parvoviruses probably involved illegitimate recombination, rearrangements, insertions, deletions, and intragenomic proliferation. Numerous PRDs are adjacent to TEs or other repetitive elements. Genomic regions containing such elements are unstable and prone to double-strand breaks, which could have facilitated viral integration (8, 34, 35).

Although the incorporation of parvoviral sequences is likely to be random and incidental, as is the integration of recombinant AAV vectors (33), the presence of these viral sequences in animal genomes could have functional implications for virus-cell interactions. On the one hand, integration of parvoviral sequences may confer a selective advantage on the host. A comparable antiviral immunity mechanism has recently been proposed to explain the integration of various viruses into the genomes of plants, fungi, and animals (4, 7, 16, 28, 31). DNA double-strand break repair functions have been reported to defend cells against parvovirus infection (47). Parvoviruses can preferentially target genetically unstable, transformed (tumor) cells, which are deficient in DNA repair mechanisms (43), and infection with minute virus of mice (MVM) triggers an innate antiviral response in normal but not in transformed mouse cells (19). Furthermore, we found that host species carrying endogenous parvoviruses were generally not infected by their most closely related exogenous counterparts, which were commonly found in other species; conversely, species infected by these viruses did not appear to carry integrated copies of the invading viral sequences. These observations are consistent with a previous report that shrimp (Penaeus monodon) populations containing integrated IHHNV sequences were not infected by IHHNV (46). Together with these previous reports, our findings suggest that the host can capture parvovirus-specific sequences during the repair of double-stranded DNA (dsDNA) breaks and subsequently use them as a defense against the virus. Moreover, this “integration-based immunity” (28) could be heritable if the viral sequences integrated into the germ line and were vertically transmitted to offspring.

On the other hand, integration of parvoviral sequences may be associated with parvovirus pathogenesis and persistence. Parvoviruses are species specific and appear to have coevolved with their host species to such an extent that infection usually remains subclinical, whereas infections causing lethal disease may be uncommon or aberrant (32, 42). In contrast, cross-species transmission of parvoviruses can result in acute infections (23, 40, 49). Parvoviruses may therefore evade the immune response of their adapted hosts and maintain an inapparent persistent infection. However, when these viruses are transferred to a new host species, before a stable relationship with that host has been established, they might trigger an innate antiviral response that results in integration. With viral propagation then suppressed by PRD-mediated immunity, parvoviruses would persist in their hosts at a low level. Although the host shows only subclinical illness or no symptoms under these conditions, the virus could still be transmitted to the host's offspring or to naive individuals.

Although some animal lineages (such as fishes, tunicates, and flatworms) are not known to be infected by parvoviruses, we found many endogenous parvoviral sequences in their genomes. This suggests that these species can also be infected by parvoviruses, or at least were in the past. We identified an endogenous parvoviral sequence in the human genome. Genomic synteny (orthology) and phylogenetic analysis of this integrated sequence across mammalian species date the endogenization event to at least 98 million years ago, indicating that parvoviruses were already present during the rise of the mammals. As far as we know, this is the oldest known “viral fossil.” Some of the parvovirus-related genes were conserved and transcribed, suggesting that these viral genes are also functional in the host genomes.

Our findings also have potential implications for gene therapy. One possible consequence of using viral vectors for human gene therapy is the inadvertent introduction of foreign DNA into recipient germ cells, resulting in heritable changes in the patients' offspring (18). This would raise profound and far-reaching ethical problems. Parvoviruses, especially AAV, are currently being used for human somatic gene therapy (10, 13). Previous studies of AAV as a gene therapy vector found no evidence that it transduces germ cells (2, 24). Our findings, however, indicate that germ line integration of parvoviruses is possible, raising concern about germ line transmission of gene therapy vectors. The potential risk of germ line integration of AAV vectors during human gene therapy should therefore be assessed experimentally before clinical application.

In summary, our study provides strong evidence that parvoviruses have been endogenized into host genomes and that this endogenization is widespread and has occurred in diverse animal lineages. This discovery extends the known host range of parvoviruses and provides a fossil record of past viral invasions. It should therefore help shed light on the evolutionary history of parvoviruses and their hosts and advance our knowledge of host-virus interactions. Furthermore, the capture and functional assimilation of exogenous viral genes may represent an important force in animal evolution.

Supplementary Material

[Supplemental material]

ACKNOWLEDGMENTS

This research was supported in part by the National Basic Research Program (2006CB101901), the Commonweal Specialized Research Fund of China Agriculture (3-21), the Program for New Century Excellent Talents in University (NCET-06-0665), and the Huazhong Agricultural University Scientific & Technological Self-Innovation Foundation.

Footnotes

Supplemental material for this article may be found at http://jvi.asm.org/.

Published ahead of print on 27 July 2011.

REFERENCES

1. Anisimova M., Gascuel O. 2006. Approximate likelihood-ratio test for branches: a fast, accurate, and powerful alternative. Syst. Biol. 55:539–552.
2. Arruda V. R., et al. 2001. Lack of germline transmission of vector sequences following systemic administration of recombinant AAV-2 vector in males. Mol. Ther. 4:586–592.
3. Belyi V. A., Levine A. J., Skalka A. M. 2010. Sequences from ancestral single-stranded DNA viruses in vertebrate genomes: the Parvoviridae and Circoviridae are more than 40 to 50 million years old. J. Virol. 84:12458–12462.
4. Belyi V. A., Levine A. J., Skalka A. M. 2010. Unexpected inheritance: multiple integrations of ancient bornavirus and ebolavirus/marburgvirus sequences in vertebrate genomes. PLoS Pathog. 6:e1001030.
5. Bergoin M., Tijssen P. 2008. Parvoviruses of arthropods, p. 76–85. In Mahy B. W. J., Van Regenmortel M. H. V. (ed.), Encyclopedia of virology, 3rd ed., vol. 4. Elsevier, Oxford, United Kingdom.
6. Berns K., Parrish C. R. 2007. Parvoviridae, p. 2437–2477. In Knipe D. M., Howley P. M. (ed.), Fields virology, 5th ed. Lippincott-Williams & Wilkins Publishers, Philadelphia, PA.
7. Bertsch C., et al. 2009. Retention of the virus-derived sequences in the nuclear genome of grapevine as a potential pathway to virus resistance. Biol. Direct 4:21.
8. Bill C. A., Summers J. 2004. Genomic DNA double-strand breaks are targets for hepadnaviral DNA integration. Proc. Natl. Acad. Sci. U. S. A. 101:11135–11140.
9. Bininda-Emonds O. R., et al. 2007. The delayed rise of present-day mammals. Nature 446:507–512.
10. Blechacz B., Russell S. J. 2004. Parvovirus vectors: use and optimisation in cancer gene therapy. Expert Rev. Mol. Med. 6:1–24.
11. Clark D. A., et al. 2006. Transmission of integrated human herpesvirus 6 through stem cell transplantation: implications for laboratory diagnosis. J. Infect. Dis. 193:912–916.
12. Corsini J., Tal J., Winocour E. 1997. Directed integration of minute virus of mice DNA into episomes. J. Virol. 71:9008.
13. Daya S., Berns K. I. 2008. Gene therapy using adeno-associated virus vectors. Clin. Microbiol. Rev. 21:583–593.
14. Essbauer S., Ahne W. 2001. Viruses of lower vertebrates. J. Vet. Med. B Infect. Dis. Vet. Public Health 48:403–475.
15. Fauquet C. M., Mayo M. A., Maniloff J., Desselberger U., Ball L. A. 2005. Virus taxonomy: eighth report of the International Committee on Taxonomy of Viruses. Elsevier Academic Press, San Diego, CA.
16. Flegel T. W. 2009. Hypothesis for heritable, anti-viral immunity in crustaceans and insects. Biol. Direct 4:32.
17. Goff S. P. 1992. Genetics of retroviral integration. Annu. Rev. Genet. 26:527–544.
18. Gordon J. W. 1998. Germline alteration by gene therapy: assessing and reducing the risks. Mol. Med. Today 4:468–470.
19. Grekova S., et al. 2010. Activation of an antiviral response in normal but not transformed mouse cells: a new determinant of minute virus of mice oncotropism. J. Virol. 84:516.
20. Guindon S., Gascuel O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696–704.
21. Hordijk W., Gascuel O. 2005. Improving the efficiency of SPR moves in phylogenetic tree search methods based on maximum likelihood. Bioinformatics 21:4338–4347.
22. Hu W. S., Temin H. M. 1990. Retroviral recombination and reverse transcription. Science 250:1227–1233.
23. Hueffer K., Parrish C. R. 2003. Parvovirus host range, cell tropism and evolution. Curr. Opin. Microbiol. 6:392–398.
24. Jakob M., et al. 2005. No evidence for germ-line transmission following prenatal and early postnatal AAV-mediated gene delivery. J. Gene Med. 7:630–637.
25. Kapitonov V. V., Jurka J. 2008. A universal classification of eukaryotic transposable elements implemented in Repbase. Nat. Rev. Genet. 9:411–412. (Author reply, 9:414.)
26. Kapoor A., Simmonds P., Lipkin W. I. 2010. Discovery and characterization of mammalian endogenous parvoviruses. J. Virol. 84:12628–12635.
27. Kerr J. R., Boschetti N. 2006. Short regions of sequence identity between the genomes of human and rodent parvoviruses and their respective hosts occur within host genes for the cytoskeleton, cell adhesion and Wnt signalling. J. Gen. Virol. 87:3567–3575.
28. Koonin E. V. 2010. Taming of the shrewd: novel eukaryotic genes from RNA viruses. BMC Biol. 8:2.
29. Kotin R. M., et al. 1990. Site-specific integration by adeno-associated virus. Proc. Natl. Acad. Sci. U. S. A. 87:2211–2215.
30. Le S. Q., Lartillot N., Gascuel O. 2008. Phylogenetic mixture models for proteins. Philos. Trans. R. Soc. Lond. B Biol. Sci. 363:3965–3976.
31. Liu H. Q., et al. 2010. Widespread horizontal gene transfer from double-stranded RNA viruses to eukaryotic nuclear genomes. J. Virol. 84:11876–11887.
32. Lukashov V. V., Goudsmit J. 2001. Evolutionary relationships among parvoviruses: virus-host coevolution among autonomous primate parvoviruses and links between adeno-associated and avian parvoviruses. J. Virol. 75:2729.
33. McCarty D. M., Young S. M., Jr., Samulski R. J. 2004. Integration of adeno-associated virus (AAV) and recombinant AAV vectors. Annu. Rev. Genet. 38:819–845.
34. Miller D. G., Petek L. M., Russell D. W. 2004. Adeno-associated virus vectors integrate at chromosome breakage sites. Nat. Genet. 36:767–773.
35. Miller D. G., et al. 2005. Large-scale analysis of adeno-associated virus vector integration sites in normal human cells. J. Virol. 79:11434–11442.
36. Orend G., Linkwitz A., Doerfler W. 1994. Selective sites of adenovirus (foreign) DNA integration into the hamster genome: changes in integration patterns. J. Virol. 68:187.
37. Papadopoulos J. S., Agarwala R. 2007. COBALT: constraint-based alignment tool for multiple protein sequences. Bioinformatics 23:1073–1079.
38. Parrish C. R. 2008. Parvoviruses of vertebrates, p. 85–90. In Mahy B. W. J., Van Regenmortel M. H. V. (ed.), Encyclopedia of virology, 3rd ed., vol. 4. Elsevier, Oxford, United Kingdom.
39. Prasad A. B., Allard M. W., Green E. D., NISC Comparative Sequencing Program. 2008. Confirming the phylogeny of mammals by use of large comparative sequence data sets. Mol. Biol. Evol. 25:1795–1808.
40. Roekring S., et al. 2002. Comparison of penaeid shrimp and insect parvoviruses suggests that viral transfers may occur between two distantly related arthropod groups. Virus Res. 87:79–87.
41. Samulski R. J., et al. 1991. Targeted integration of adeno-associated virus (AAV) into human chromosome 19. EMBO J. 10:3941–3950.
42. Shadan F., Villarreal L. P. 1993. Coevolution of persistently infecting small DNA viruses and their hosts linked to host-interactive regulatory domains. Proc. Natl. Acad. Sci. U. S. A. 90:4117.
43. Shadan F. F., Villarreal L. P. 2000. Parvovirus-mediated antineoplastic activity exploits genome instability. Med. Hypotheses 55:1–4.
44. Srivastava A., Lusby E. W., Berns K. I. 1983. Nucleotide sequence and organization of the adeno-associated virus 2 genome. J. Virol. 45:555–564.
45. Tamura K., Dudley J., Nei M., Kumar S. 2007. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol. Biol. Evol. 24:1596–1599.
46. Tang K. F., Lightner D. V. 2006. Infectious hypodermal and hematopoietic necrosis virus (IHHNV)-related sequences in the genome of the black tiger prawn Penaeus monodon from Africa and Australia. Virus Res. 118:185–191.
47. Tauer T. J., Schneiderman M. H., Vishwanatha J. K., Rhode S. L. 1996. DNA double-strand break repair functions defend against parvovirus infection. J. Virol. 70:6446–6449.
48. Thomson B. J., Efstathiou S., Honess R. W. 1991. Acquisition of the human adeno-associated virus type-2 rep gene by human herpesvirus type-6. Nature 351:78–80.
49. Villarreal L. P. 2005. Viruses and the evolution of life. American Society for Microbiology, Washington, DC.
50. Wentzensen N., Vinokurova S., von Knebel Doeberitz M. 2004. Systematic review of genomic integration sites of human papillomavirus genomes in epithelial dysplasia and invasive cancer of the female lower genital tract. Cancer Res. 64:3878–3884.
51. Yang W., Summers J. 1999. Integration of hepadnavirus DNA in infected liver: evidence for a linear precursor. J. Virol. 73:9710–9717.

Articles from Journal of Virology are provided here courtesy of American Society for Microbiology (ASM)
