Journal of Virology. 2012 Jul;86(14):7688–7691. doi: 10.1128/JVI.00769-12

Endogenous Hepadnaviruses in the Genome of the Budgerigar (Melopsittacus undulatus) and the Evolution of Avian Hepadnaviruses

Jie Cui, Edward C Holmes
PMCID: PMC3416299  PMID: 22553337

Abstract

Endogenous hepadnaviruses (hepatitis B viruses [HBVs]) were recently discovered in the genomes of passerine birds. We mined six additional avian genomes and discovered multiple copies of endogenous HBVs in the budgerigar (order Psittaciformes), designated eBHBV. A phylogenetic analysis reveals that the endogenous hepadnaviruses are more diverse than their exogenous counterparts and that the endogenous and exogenous hepadnaviruses form distinct lineages even when sampled from the same avian order, indicative of multiple genomic integration events.

TEXT

Although viruses lack a true fossil record, key aspects of the pattern, process, and time scale of their evolution can be inferred from the analysis of the endogenous viral elements that are present in some eukaryotic genomes (10). The best-documented cases are the retroviruses, which integrate into the genome of host somatic cells as an integral part of their replication cycle. Occasionally, these viruses invade and integrate into the genome of germ line cells and are then vertically inherited as elements of host DNA; these elements are widely known as endogenous retroviruses. Endogenous retroviruses are commonplace in animals, comprising at least 8% of the human genome (11). It was recently observed that nonretroviral viruses from multiple families of both RNA and DNA viruses have also integrated into the genomes of some animal species (2, 10), indicating that a diverse array of viruses has the ability to form endogenous copies.

Hepadnaviruses (family Hepadnaviridae) are partially double-stranded DNA viruses that possess a circular genome of up to ∼3.2 kb and only three or four open reading frames (ORFs). The best-described hepadnavirus is human hepatitis B virus (HBV), which replicates in the liver cells (hepatocytes) of hosts and which may lead to such serious conditions as cirrhosis and liver cancer (12). Notably, hepadnaviruses carry a DNA polymerase (P) that possesses a reverse transcriptase activity similar to that of retroviruses. To date, exogenous hepadnaviruses have been described in a diverse range of animal hosts, including higher primates, some species of rodents, and a variety of birds. Despite the diversity of animal species infected, endogenous hepadnaviruses have been discovered only in avian species (all from the order Passeriformes), namely, some finches (Taeniopygia guttata, Poephila cincta, Lonchura punctulata, and Chloebia gouldiae, family Estrildidae), the olive sunbird (Cyanomitra olivacea, family Nectariniidae), and the dark-eyed junco (Junco hyemalis, family Emberizidae) (5). Phylogenetic analyses of these endogenous sequences indicate that hepadnaviruses integrated into some bird genomes at least 19 million years ago (MYA) (5, 10). However, because of the large numbers of extant avian species, which cover multiple orders and span an evolutionary time scale of 100 million years (7, 8), it is possible that endogenous hepadnaviruses have integrated into other avian species in addition to members of the Passeriformes, a phenomenon which in turn will shed new light on important aspects of hepadnavirus evolution.

We employed a genomic screen method (3) for hepadnaviruses using the whole-genome shotgun assemblies of four avian species available in GenBank (http://www.ncbi.nlm.nih.gov/genome): budgerigar (Melopsittacus undulatus, order Psittaciformes), chicken (Gallus gallus, order Galliformes), turkey (Meleagris gallopavo, order Galliformes), and zebra finch (Taeniopygia guttata, order Passeriformes). Briefly, we used the genomic group BLAST program (http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi) and set a cutoff E value of 1e−10, sequence similarity of 40%, and coverage of at least 10% of the query sequence to signify a positive match. The P (polymerase) ORF of duck hepatitis B virus (DHBV, NC_001344) was used as the query in all searches. We also screened three additional bird species released by the NIH Intramural Sequencing Center (NISC) as part of the Comparative Vertebrate Sequencing Initiative Project (http://www.nisc.nih.gov/): emu (Dromaius novaehollandiae, order Struthioniformes), condor (Gymnogyps californianus, uncertain order), and sparrow (Zonotrichia albicollis, order Passeriformes). In this case, due to a lack of protein annotation, we employed the nucleotide BLASTN program (http://www.ncbi.nlm.nih.gov/blast/) and utilized the complete DHBV NC_001344 genome. Finally, to provisionally determine whether hepadnaviruses might be present in even more diverse taxa, we applied the same genomic search protocol to the genomes of a reptile, the anole lizard (Anolis carolinensis), and an amphibian, the Western clawed frog (Xenopus tropicalis).
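The three screening thresholds described above can be sketched as a simple filter. This is a minimal illustration rather than the authors' pipeline: the `BlastHit` record and `is_positive_match` function are hypothetical names, coverage is assumed to mean aligned length divided by query length, and the comparisons are assumed to be inclusive. The budgerigar values come from Table 1 and the 785-amino-acid query length from the description of the complete DHBV P ORF; the aligned length given for the lizard hit is invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlastHit:
    """Hypothetical record holding the fields needed for the screen."""
    subject: str
    evalue: float
    similarity_pct: float
    aligned_len: int  # number of query residues covered by the hit


def is_positive_match(hit, query_len, max_evalue=1e-10,
                      min_similarity_pct=40.0, min_coverage=0.10):
    """Apply the E value, similarity, and coverage thresholds together."""
    coverage = hit.aligned_len / query_len
    return (hit.evalue <= max_evalue
            and hit.similarity_pct >= min_similarity_pct
            and coverage >= min_coverage)


DHBV_P_LEN = 785  # amino acids in the complete DHBV P ORF

hits = [
    # Budgerigar contig, values from Table 1.
    BlastHit("AGAI01066647", 0.0, 65.0, 418),
    # Anole lizard contig: E value and similarity from the text,
    # aligned length of 120 is an invented placeholder.
    BlastHit("AAWZ02028642", 3e-15, 27.0, 120),
]

positives = [h.subject for h in hits if is_positive_match(h, DHBV_P_LEN)]
print(positives)
```

Under these assumptions the lizard hit passes the E value cutoff but fails the 40% similarity requirement, which matches the decision not to analyze it further.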

This genomic screening revealed that the genome of the budgerigar possesses multiple copies of endogenous hepadnaviruses (designated eBHBVs), with 17 P ORF sequences detected (Table 1). The endogenous hepadnaviruses present in the zebra finch genome (eZHBVs) (5) were also confirmed by our analysis. A reverse BLAST analysis against exogenous HBV protein sequences confirmed that these insertions were indeed closely related to avian hepadnaviruses, although many were present as only partial genomic forms. It was also clear that the eBHBVs and eZHBVs identified to date are nonorthologous, as they did not share the same host flanking genes, such that they represent independent insertions into their respective host genomes (Table 1). Interestingly, the genome of the anole lizard contained viral fragments that are distantly related to HBV (e.g., query AAWZ02028642 matched DHBV with an E value of 3e−15 and a sequence similarity of 27%). However, due to the low sequence similarity to exogenous counterparts, these sequences were not analyzed further. No endogenous hepadnaviruses were observed in the other avian genomes analyzed (although it is important to note that the sparrow genome is available only at low coverage).
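The flanking-gene argument for nonorthology can be expressed as a set comparison. The sketch below is an illustration only: the gene names are transcribed from Table 1, the helper `shared_flanking_genes` is a hypothetical name, and gene symbols are compared case-insensitively as a simplifying assumption. Insertions whose flanking genes are listed as ND (no data) are left out, so they cannot be assessed this way.

```python
# Flanking genes transcribed from Table 1; "ND" (no data) entries are omitted.
budgerigar_flanks = {
    "PCBD1", "GH1", "LOC100548410", "Aanat", "CR1", "LOC100232143",
}
zebra_finch_flanks = {
    "LOC100217595", "ATP2B2", "LOC100221519",
    "LOC100228257", "LOC100225966", "LOC100224031",
}


def shared_flanking_genes(flanks_a, flanks_b):
    """Return gene symbols present in both sets, ignoring letter case."""
    return {g.upper() for g in flanks_a} & {g.upper() for g in flanks_b}


shared = shared_flanking_genes(budgerigar_flanks, zebra_finch_flanks)
print(sorted(shared) if shared else "no shared flanking genes")
```

An empty intersection is consistent with independent insertions in the two hosts, although shared flanking genes alone would not prove orthology either.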

Table 1.

Results of the BLAST analysis of the P ORFs of avian hepadnaviruses

| Species | GenBank accession no. | Contig length (bp) | P length (amino acids) | Similarity (%) | Genome type | Flanking gene(s) | E value | Reverse BLAST (E value) |
|---|---|---|---|---|---|---|---|---|
| Melopsittacus undulatus (budgerigar) | AGAI01066647 | 52,555 | 418 | 65 | Complete | PCBD1, GH1 | 0.0 | DHBV (0.0) |
| | | | 324 | 46 | | | 1e−78 | DHBV (0.0) |
| | | | 108 | 62 | | | 4e−32 | RGHBV (5e−40) |
| | AGAI01070398 | 34,567 | 377 | 66 | Complete | PCBD1 | 4e−140 | RGHBV (2e−179) |
| | | | 265 | 49 | | | 5e−57 | DHBV (4e−73) |
| | AGAI01058091 | 67,933 | 156 | 65 | Partial | PCBD1 | 1e−59 | DHBV (1e−69) |
| | AGAI01048473 | 59,743 | 110 | 55 | Partial | PCBD1 | 3e−32 | CHBV (9e−40) |
| | AGAI01069060 | 4,174 | 198 | 42 | Partial | LOC100548410 | 5e−29 | HHBV (3e−35) |
| | AGAI01070370 | 182,598 | 135 | 46 | Partial | PCBD1 | 1e−27 | ShHBV (1e−34) |
| | AGAI01068110 | 134,081 | 105 | 55 | Partial | PCBD1 | 3e−26 | DHBV (5e−32) |
| | AGAI01067768 | 139,808 | 108 | 61 | Partial | PCBD1 | 3e−26 | DHBV (1e−25) |
| | AGAI01046846 | 42,041 | 95 | 56 | Partial | PCBD1 | 2e−24 | DHBV (4e−30) |
| | AGAI01056960 | 185,067 | 85 | 53 | Partial | PCBD1 | 1e−17 | StHBV (2e−24) |
| | AGAI01048462 | 27,912 | 84 | 50 | Partial | Aanat | 2e−15 | RGHBV (6e−19) |
| | AGAI01067853 | 33,178 | 114 | 44 | Partial | CR1 | 5e−13 | StHBV (7e−21) |
| | AGAI01070664 | 32,569 | 90 | 46 | Partial | LOC100232143 | 2e−12 | RGHBV (7e−17) |
| | AGAI01050253 | 40,494 | 91 | 48 | Partial | PCBD1 | 9e−10 | StHBV (5e−22) |
| Taeniopygia guttata (zebra finch) | ABQF01038718 | 71,638 | 404 | 58 | Complete | LOC100217595 | 9e−138 | DHBV (3e−156) |
| | ABQF01051978 | 1,732 | 159 | 58 | Partial | ND | 2e−48 | DHBV (4e−54) |
| | ABQF01047718 | 82,878 | 206 | 47 | Partial | ATP2B2 | 8e−44 | PHBV (5e−53) |
| | ABQF01007435 | 128,889 | 108 | 42 | Partial | LOC100221519 | 2e−43 | SGHBV (6e−22) |
| | ABQF01039383 | 26,988 | 139 | 51 | Partial | LOC100228257 | 2e−43 | DHBV (3e−34) |
| | ABQF01051981 | 3,171 | 79 | 71 | Partial | ND | 1e−25 | RGHBV (1e−30) |
| | ABQF01105392 | 2,838 | 137 | 52 | Partial | ND | 3e−23 | CHBV (4e−35) |
| | ABQF01097021 | 4,156 | 91 | 56 | Partial | ND | 6e−20 | RGHBV (3e−24) |
| | ABQF01026236 | 39,254 | 91 | 55 | Partial | LOC100225966 | 3e−19 | DHBV (2e−24) |
| | ABQF01051706 | 13,577 | 81 | 54 | Partial | ND | 6e−17 | SGHBV (2e−20) |
| | ABQF01033534 | 67,953 | 100 | 43 | Partial | LOC100224031 | 2e−09 | RGHBV (5e−12) |

ND, no data.

The complete genome of eBHBV was recovered from contig AGAI01066647 of the budgerigar genome. The viral genome was 3,067 bp in length, with a precore (PreC) ORF of 930 bp, a P ORF of 2,410 bp, and a presurface (PreS) ORF of 1,030 bp (Fig. 1). Notably, none of the three ORFs is fully intact: PreC has lost the ATG start codon, P has a 1-bp insertion and a premature stop codon, and PreS also possesses a premature stop codon. Overall, the PreC, P, and PreS ORFs of eBHBV exhibited 32%, 36%, and 35% divergence (uncorrected p [pairwise] distance) from DHBV, respectively.
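Two of the checks mentioned above, ORF integrity and uncorrected p distance, are easy to state in code. The following is a minimal sketch with hypothetical function names and invented toy sequences; it is not the procedure used in the study. It assumes the standard stop codons TAA, TAG, and TGA, treats a length that is not a multiple of three as a possible frameshift, and computes p distance as the proportion of differing sites among aligned positions where neither sequence has a gap.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_defects(nt_seq):
    """List simple defects that would stop a nucleotide ORF being intact."""
    seq = nt_seq.upper()
    defects = []
    if not seq.startswith("ATG"):
        defects.append("missing ATG start codon")
    if len(seq) % 3 != 0:
        defects.append("length not a multiple of 3 (possible frameshift)")
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # A stop codon anywhere before the final codon is premature.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            defects.append(f"premature stop codon at codon {index + 1}")
            break
    return defects


def p_distance(seq_a, seq_b, gap="-"):
    """Uncorrected pairwise distance between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip sites with a gap in either sequence
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Invented toy sequences, for illustration only.
print(orf_defects("ATGAAATAA"))      # intact toy ORF
print(orf_defects("CTGAAATAA"))      # start codon lost
print(orf_defects("ATGTAAAAATAA"))   # internal stop codon
print(p_distance("MKLV", "MKIV"))    # one difference in four sites
```

A p distance of 0.36 from a function like this would correspond to the 36% divergence reported for the P ORF.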

Fig 1.

Genome structure of eBHBV. The major ORFs are shown in comparison to those of duck hepatitis B virus (DHBV; NC_001344). Positions of proteins are also shown in comparison to DHBV. Rectangles with different colors represent the different ORFs of eBHBV, encoding the precore (PreC), core (C), polymerase (P), and surface (PreS and S) proteins.

To establish the phylogenetic positions of avian endogenous HBVs (eBHBVs and eZHBVs), we collected P ORF sequences of eight avian exogenous viruses in the genus Avihepadnavirus: duck hepatitis B virus (DHBV, NC_001344, duck, order Anseriformes), Ross's goose hepatitis B virus (RGHBV, AY494849, Ross's goose, order Anseriformes), crane hepatitis B virus (CHBV, AJ441112, crane, order Gruiformes), stork hepatitis B virus (StHBV, AJ251937, stork, order Ciconiiformes), sheldgoose hepatitis B virus (ShHBV, NC_005890, sheldgoose, order Anseriformes), heron hepatitis B virus (HHBV, NC_001486, heron, order Pelecaniformes), parrot hepatitis B virus (PHBV, JN565944, parrot, order Psittaciformes), and snow goose hepatitis B virus (SGHBV, AF110997, snow goose, order Anseriformes). Because the mammalian hepadnaviruses are far more distantly related (mean p distance of 0.7 to the avian hepadnaviruses), they were not included in this analysis. A total of nine avian endogenous HBVs (eBHBV: AGAI01066647, AGAI01070398, AGAI01068110, and AGAI01067768; eZHBV: ABQF01051978, ABQF01039383, ABQF01051981, ABQF01105392, and ABQF01038718) were analyzed in three different data sets, reflecting the differing lengths of the available P ORF sequences: (i) data set A, comprising amino acid positions 412 to 513 of the P ORF of DHBV (n = 17); (ii) data set B, representing positions 412 to 565 of the P ORF (n = 13); and (iii) data set C, comprising the complete P ORF (785 amino acids; n = 11). Other sequences of eZHBV described previously (5) were considered too short in the resultant sequence alignments for meaningful analysis. All protein sequences were aligned using MUSCLE (4) and checked manually using the Se-Al program (http://tree.bio.ed.ac.uk/software/seal/). Highly divergent and ambiguously aligned regions were then removed using the Gblocks program (13).
Finally, the phylogenetic relationships of both the endogenous and exogenous hepadnaviruses were determined using the maximum likelihood method available in PhyML 3.0 (6), incorporating the best-fit JTT+Γ model of amino acid substitution as determined by ProtTest 2.4 (1). The robustness of each node was determined using 1,000 bootstrap replicates.
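The bootstrap step can be illustrated by the resampling that underlies it: each replicate is built by drawing alignment columns with replacement until the original length is reached, and a tree is then inferred from every replicate. The sketch below covers only the resampling and is not PhyML's implementation; `bootstrap_columns` is a hypothetical helper, the toy alignment is invented, and the seed is arbitrary.

```python
import random


def bootstrap_columns(alignment, rng):
    """Return one pseudo-replicate made by resampling alignment columns."""
    if not alignment:
        raise ValueError("alignment is empty")
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")
    # Draw column indices with replacement, one per original column.
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in picks)
            for name in names}


# Invented toy protein alignment, for illustration only.
toy_alignment = {"seq1": "MKLVA", "seq2": "MKIVA", "seq3": "MRLVG"}

rng = random.Random(2012)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_columns(toy_alignment, rng) for _ in range(1000)]
print(len(replicates), replicates[0])
```

The support for a node is then the percentage of replicate trees that contain the same grouping, which is the value shown on the branches when it reaches the reporting threshold.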

Notably, our phylogenetic analyses show that the endogenous avian hepadnaviruses are divergent from (and hence likely basal to, although the trees are unrooted) all known exogenous avian HBVs, with some of the eZHBVs being strikingly divergent (Fig. 2). Although there was often a lack of phylogenetic resolution, especially in the short A and B data sets, all phylogenetic trees exhibited similar evolutionary patterns, with an intermingling of eBHBVs and eZHBVs (data sets A and B) and a clear separation between the endogenous and exogenous hepadnaviruses.

Fig 2.

Phylogenetic relationships of avian endogenous and exogenous hepadnaviruses. (A) Data set for amino acid residues 412 to 513 of the P ORF of DHBV (n = 17); (B) data set for positions 412 to 565 (n = 13); (C) data set for complete P ORF (n = 11). The two avian endogenous HBVs (with accession numbers) are colored: eBHBVs in blue and eZHBVs in red. Bootstrap values lower than 70% are not shown, and branch lengths are drawn to a scale of amino acid substitutions per site. The trees are midpoint rooted for purposes of clarity only. The letter(s) following each sequence denotes the avian order of the host species: A, Anseriformes; C, Ciconiiformes; G, Gruiformes; Pa, Passeriformes; Pe, Pelecaniformes; Ps, Psittaciformes.

More difficult to determine is the time scale of this evolutionary history, even though previous studies have convincingly shown that at least some endogenous hepadnaviruses diverged millions of years ago (5, 10). In particular, the fact that eBHBVs and eZHBVs do not constitute orthologous sequences means that their time of integration cannot simply be estimated using host divergence times, and the presence of diverse eBHBV and eZHBV lineages on the phylogenetic trees is strongly suggestive of multiple integrations of hepadnaviruses into avian genomes. More notable is the great genetic diversity of endogenous avian hepadnaviruses in comparison to the far shallower diversity of the exogenous forms, suggesting that evolutionary tempos and modes may differ between these two forms of virus. The phylogenetic separation between those exogenous and endogenous hepadnaviruses identified to date is also striking. This is most notable in the case of the budgerigar (endogenous) and parrot (exogenous). Although both these species belong to the order Psittaciformes, their hepadnavirus sequences do not cluster together; rather, the parrot sequence groups closely with the other exogenous avian HBVs, which were sampled from a range of bird orders. Although the available data set is small, at face value this phylogenetic pattern suggests that the distribution of hepadnaviruses that circulate in bird species has changed through time; this is compatible with a macroevolutionary model in which individual viral families experience considerable lineage birth and death (9).
In addition, although endogenous hepadnaviruses are clearly inherited between some bird species for time periods of millions of years (5), the exogenous component of the hepadnavirus phylogeny presents no strong evidence for long-term virus-host codivergence; for example, sequences from Anseriformes fall into multiple locations, and Anseriformes and Gruiformes are not as closely related as is implied by the viral phylogeny (7). Hence, there has clearly been at least some cross-species transmission among the avian hepadnaviruses, which will further complicate attempts to reconstruct a realistic time scale for their evolutionary history.

Although our study has shed new light on the genetic diversity of avian hepadnaviruses, a fuller understanding of the patterns and processes of evolution in this important group of viruses will require an expanded screening of extant bird species (especially passerines) for exogenous viral forms and of high-quality avian genomes for endogenous copies. In addition, because the replication of exogenous hepadnaviruses occurs largely in hepatocytes, it will be important to determine the precise mechanisms by which these viruses achieve germ line integration.

Footnotes

Published ahead of print 2 May 2012

REFERENCES

1. Abascal F, Zardoya R, Posada D. 2005. ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21:2104–2105
2. Belyi VA, Levine AJ, Skalka AM. 2010. Unexpected inheritance: multiple integrations of ancient bornavirus and ebolavirus/marburgvirus sequences in vertebrate genomes. PLoS Pathog. 6:e1001030 doi:10.1371/journal.ppat.1001030
3. Cui J, Holmes EC. 2012. Endogenous lentiviruses in the ferret genome. J. Virol. 86:3383–3385
4. Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32:1792–1797
5. Gilbert C, Feschotte C. 2010. Genomic fossils calibrate the long-term evolution of hepadnaviruses. PLoS Biol. 8:e1000495 doi:10.1371/journal.pbio.1000495
6. Guindon S, et al. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59:307–321
7. Hackett SJ, et al. 2008. A phylogenomic study of birds reveals their evolutionary history. Science 320:1763–1768
8. Hedges SB, Dudley J, Kumar S. 2006. TimeTree: a public knowledge-base of divergence times among organisms. Bioinformatics 22:2971–2972
9. Holmes EC. 2009. The evolution and emergence of RNA viruses. Oxford series in ecology and evolution (OSEE). Oxford University Press, Oxford, United Kingdom
10. Katzourakis A, Gifford RJ. 2010. Endogenous viral elements in animal genomes. PLoS Genet. 6:e1001191 doi:10.1371/journal.pgen.1001191
11. Lander ES, et al. 2001. Initial sequencing and analysis of the human genome. Nature 409:860–921
12. Seeger C, Mason WS. 2000. Hepatitis B virus biology. Microbiol. Mol. Biol. Rev. 64:51–68
13. Talavera G, Castresana J. 2007. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst. Biol. 56:564–577

