Journal of Virology. 2012 Sep;86(17):9510–9513. doi: 10.1128/JVI.01164-12

The First Full-Length Endogenous Hepadnaviruses: Identification and Analysis

Wei Liu, Shaokun Pan, Huijuan Yang, Weiya Bai, Zhongliang Shen, Jing Liu, Youhua Xie
PMCID: PMC3416111  PMID: 22718817

Abstract

In silico screening of metazoan genome data identified multiple endogenous hepadnaviral elements in the budgerigar (Melopsittacus undulatus) genome, most notably two elements comprising about 1.3× and 1.0× the full-length genome. Phylogenetic and molecular dating analyses show that endogenous budgerigar hepatitis B viruses (eBHBV) share an ancestor with extant avihepadnaviruses and infiltrated the budgerigar genome millions of years ago. Identification of full-length genomes with preserved key features like ε signals could enable resurrection of ancient BHBV.

TEXT

Endogenous viruses are generally believed to have arisen from infection of host germ line cells by exogenous viruses that integrated their DNA into cellular genomes, followed by inheritance during evolution (5, 19). Endogenized viral DNA sequences can be regarded as fossil records of ancient viruses and provide invaluable information on both the origin and evolutionary history of extant viruses. Endogenous viral elements have been most frequently identified for retroviruses (2, 6, 8, 12, 14, 15, 24), as integration of viral double-stranded DNA (dsDNA) into the host genome is an indispensable step of the retroviral life cycle.

Hepadnaviridae is a family of small (about 3,000- to 3,200-bp), partially double-stranded circular DNA viruses (23). Extant hepadnaviruses fall into two genera: orthohepadnaviruses, infecting mammals, and avihepadnaviruses, infecting birds. Upon infection, hepadnaviral genomes are converted into covalently closed circular DNA (cccDNA) in the nucleus, which serves as a transcription template for all viral RNA species, and progeny viral genomes are produced through reverse transcription of pregenomic RNA (pgRNA) by viral polymerase. Integration of hepadnaviral DNA into the hepatocyte genome has been demonstrated in vitro and in vivo (1, 18) and has been associated with development of hepatocellular carcinoma (HCC) in humans (9). Although analyses of available mammalian genome sequence data have failed to identify any endogenous hepadnavirus elements (13), a number of endogenous hepadnavirus fragments that collectively cover ∼70% of the hepadnavirus genome have been located on the zebra finch (Taeniopygia guttata) genome (7, 13).

In this study, in silico screening of potential endogenous hepadnaviral sequences in 157 publicly available metazoan genomes (http://www.ncbi.nlm.nih.gov/sites/genome) was performed using tBLASTn and polymerase protein sequences from extant orthohepadnaviruses of human (hepatitis B virus [HBV]), woolly monkey (WMHBV), woodchuck (WHV), ground squirrel (GSHV), and Arctic squirrel (ASHV), as well as avihepadnaviruses of heron (HHBV), Ross's goose (RGHBV), snow goose (SGHBV), stork (STHBV), crane (CCHBV), ashy-headed sheldgoose (AGHBV) (22), and parrot (PHBV) (20), as queries. Employing a cutoff of 35% sequence identity (2), positive matches were found in the zebra finch, of the order Passeriformes, as previously reported (7, 13), and in the budgerigar (Melopsittacus undulatus), of the order Psittaciformes. Since PHBV polymerase generated the best matches in this initial screening, budgerigar genome sequences were analyzed in tBLASTn using PHBV polymerase and PreC/C protein sequences as queries and 35% sequence identity as a cutoff. A total of 61 endogenous hepadnavirus sequences (collectively designated endogenous budgerigar hepatitis B viruses [eBHBV]) were identified in 49 contigs (see Table S1 in the supplemental material). Because budgerigar genomic sequencing data are still in draft status and not yet fully assembled, precise localization of eBHBV sequences at the chromosome level is currently not possible.
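The identity cutoff described above can be sketched as a filter over tBLASTn results. This is a minimal illustration, not the authors' pipeline: it assumes hits were exported in BLAST+ tabular format (`-outfmt 6`, default columns), that the cutoff is inclusive, and the function name `filter_hits` and the example rows are hypothetical.

```python
import csv
import io


def filter_hits(tabular_text, min_identity=35.0):
    """Keep hits whose percent identity meets the cutoff.

    Assumes BLAST+ tabular output (-outfmt 6, default columns):
    column 1 = query ID, column 2 = subject (contig) ID,
    column 3 = percent identity. The inclusive comparison is an assumption.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        query, subject, pident = row[0], row[1], float(row[2])
        if pident >= min_identity:
            hits.append((query, subject, pident))
    return hits


# Synthetic rows for illustration only; not data from the study.
example = (
    "PHBV_pol\tcontig_A\t41.2\t300\t170\t3\t1\t300\t900\t1\t1e-40\t160\n"
    "PHBV_pol\tcontig_B\t28.9\t120\t80\t2\t5\t124\t360\t1\t0.002\t40\n"
)
print(filter_hits(example))  # [('PHBV_pol', 'contig_A', 41.2)]
```

Only the first synthetic hit passes, because its identity (41.2%) is above the 35% threshold.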

Like all previously reported endogenous zebra finch HBV (eZHBV) sequences (7, 13), most eBHBV sequences constitute incomplete viral genome fragments (see Table S1 in the supplemental material). However, two eBHBV sequences containing near- or longer-than-full-length genome were identified: eBHBV1 is ∼3.9 kb in length and comprises about 1.3× the genome with redundant polymerase and PreC/C sequences at the termini, whereas eBHBV2 is ∼3.1 kb in length and comprises almost exactly 1.0× the genome with both termini in PreC/C (Fig. 1A). To confirm the presence of eBHBV1 and eBHBV2 sequences in the budgerigar genome, we amplified genomic DNA extracted from blood samples of four budgerigar birds using primers annealing to flanking host genomic sequences. Amplicons with expected sizes were obtained, and sequencing results of cloned amplicons were nearly identical (>99.5%) with published contig data (see Fig. S1 and S2 in the supplemental material).

Fig 1.

Endogenous full-length hepadnavirus genomes identified in budgerigar. (A) Endogenous budgerigar hepadnavirus eBHBV1 (1.3×) and eBHBV2 (1.0×) sequences are depicted using 1.0× parrot HBV genome as a reference (top). (B) Phylogenetic relationships between exogenous and endogenous hepadnaviruses. eBHBV1 and eBHBV2 sequences are highlighted in red. Bootstrap values greater than 70% are shown.

The availability of full-length viral genome sequences enabled us to conduct a more thorough analysis of eBHBV. Polymerase and core amino acid sequences of eBHBV1, eBHBV2, and 13 representative exogenous hepadnaviruses were aligned in MUSCLE 3.7 (4) and manually adjusted to achieve maximum similarity. The corresponding nucleotide sequence alignment was then used to infer phylogenetic relationships using the maximum likelihood (ML) method in PhyML 3.0 (10). Based on the Akaike information criterion, jModeltest 0.1 (21) suggested GTR+G as the best substitution model for ML analysis. The robustness of the inferred phylogeny was tested by bootstrap analysis using 1,000 pseudoreplicate data sets. The resulting phylogenetic tree places eBHBV1 and eBHBV2 on a separate branch, distinct from all extant exogenous avihepadnaviruses, including PHBV (Fig. 1B), and suggests that eBHBV shares a common ancestor with the ancestor of extant avihepadnaviruses.

Nucleotide sequence divergence between the two long terminal repeats (LTRs) has been used to estimate the insertion time of endogenous retroviruses using molecular dating approaches (2, 12). The presence of 0.3× genome length of redundancy in eBHBV1 offered the opportunity to perform a similar analysis. The two copies exhibit 2.01% divergence at the nucleotide level. Since neutral nuclear substitution rates in birds have been estimated to range from 0.2% to 0.39% substitutions per site per million years (7), it can be calculated that eBHBV1 inserted into the budgerigar genome at least 2.5 to 5.0 million years ago (mya). By applying a similar approach to two putatively duplicated eZHBV elements, the initial insertion of the ancestor element into the zebra finch genome has been estimated to have occurred at least 3.8 to 7.5 mya (7). In addition, molecular dating analysis of the divergence of orthologs of two endogenous hepadnaviral elements in the zebra finch and the closely related dark-eyed junco indicated that insertion of these elements into the common ancestor of these species occurred at least 19.2 and 40 mya, respectively (7). On the other hand, calculations using extant exogenous avihepadnavirus sequences suggested that the time to their most recent common ancestor was less than 6,000 years (25). Phylogenetic and molecular dating analyses of endogenous hepadnaviruses in birds clearly indicate that avian hepadnaviruses have a much longer evolutionary history.
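The eBHBV1 dating arithmetic can be reproduced with a short sketch. It assumes the standard relation used for LTR-style dating, T = d / (2r), in which the two repeat copies accumulate substitutions independently after insertion. The text does not state the formula explicitly, so this is an inference from the reported numbers, and the function name `insertion_age_myr` is hypothetical.

```python
def insertion_age_myr(divergence, rate_per_site_per_myr):
    """Estimate insertion age in million years, assuming T = d / (2 r).

    divergence: fraction of differing sites between the two repeat copies.
    rate_per_site_per_myr: neutral substitution rate per site per million years.
    """
    return divergence / (2.0 * rate_per_site_per_myr)


d = 0.0201                        # 2.01% divergence between the two copies
slow, fast = 0.002, 0.0039        # 0.2% and 0.39% per site per million years

print(round(insertion_age_myr(d, fast), 1))  # 2.6
print(round(insertion_age_myr(d, slow), 1))  # 5.0
```

Under this assumption the estimates come out at roughly 2.6 and 5.0 million years, which is close to the reported range of 2.5 to 5.0 mya.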

The genomic organization of both eBHBV1 and eBHBV2 is nearly identical to that of modern-day avihepadnaviruses (Fig. 1A). Most open reading frames (ORFs) are disrupted by nonsense mutations, except for the coding sequence for eBHBV1 core, which remains intact. Alignment of eBHBV P/PreS/S and PreC/C nucleotide sequences as well as translatable P, PreC/C, and PreS/S amino acid sequences with extant hepadnaviruses showed a higher degree of similarity with avian viruses than with mammalian viruses (see Fig. S3 to S7 in the supplemental material), which is in agreement with their phylogenetic relationship (Fig. 1B). Compared to the situation in extant avihepadnaviruses, the C termini of eBHBV PreC/C extend 15 bp more deeply into the polymerase ORF, resulting in a slightly longer PreC/C ORF.
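One simple way to check whether an ORF is disrupted by nonsense mutations is to scan it for in-frame stop codons before its final codon. The sketch below assumes the standard genetic code and an input that already starts in frame. The function name `premature_stops` and the toy sequences are hypothetical and are not eBHBV sequences.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def premature_stops(orf_nt):
    """Return 0-based codon indices of in-frame stop codons before the last codon.

    Assumes orf_nt starts at the first base of the reading frame.
    """
    seq = orf_nt.upper()
    n_codons = len(seq) // 3
    return [
        i for i in range(n_codons - 1)
        if seq[3 * i: 3 * i + 3] in STOP_CODONS
    ]


# Synthetic toy sequences for illustration only.
intact = "ATGGCTGCTAAATGA"     # ATG GCT GCT AAA TGA
disrupted = "ATGGCTTAGAAATGA"  # ATG GCT TAG AAA TGA

print(premature_stops(intact))     # []
print(premature_stops(disrupted))  # [2]
```

An empty list corresponds to an intact reading frame, as described for the eBHBV1 core, whereas any returned index marks a nonsense mutation.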

According to the current model of hepadnaviral replication (1, 18, 23), the terminal protein (TP) domain of polymerase, the ε packaging signal, and the direct repeat (DR) elements are all essential for viral replication. Interestingly, these features are preserved in both eBHBV1 and eBHBV2. Alignment of eBHBV TP amino acid sequences with extant hepadnaviruses identified tpY101 as the residue most likely to be used for BHBV reverse transcription priming, corresponding to tpY96 in DHBV and tpY63 in HBV (Fig. 2A).

Fig 2.

Endogenous budgerigar hepadnavirus features related to viral replication. (A) Alignment of endogenous and exogenous hepadnavirus polymerase TP domain amino acid sequences. The conserved tyrosine residue used for priming is marked by an asterisk. (B) Comparison of ε secondary structure as well as DR1 sequences of eBHBV and representative exogenous hepadnaviruses of human (HBV), duck (DHBV), and heron (HHBV) (1). Template sites for initial priming of minus DNA synthesis and primers attached to tyrosine on TP are depicted. Red arrows indicate direction of minus DNA elongation post-primer translocation.

Priming of hepadnaviral minus DNA synthesis takes place when polymerase binds to the ε packaging signal near the 5′ end of pgRNA, which depends on the latter's characteristic stem-loop structure. The two copies of ε in eBHBV1 are identical (see Fig. S1 in the supplemental material) and highly similar to eBHBV2 ε (see Fig. S8 in the supplemental material). The secondary structure assumed by the middle part of eBHBV ε RNA predicted by Mfold (26) is similar to what has been established for representative extant hepadnavirus ε signals using experimental approaches (1), except that the upper stem is disrupted (Fig. 2B).

After priming, the short primer attached to TP must translocate to complementary sequences on the copy of DR1 near the 3′ end of pgRNA (DR1*) to continue minus DNA synthesis (1, 18, 23). Between them, eBHBV1 and eBHBV2 contain a total of 6 DR elements of 13 bp in length (Fig. 1A and 2B). Compared to DR of extant avihepadnaviruses, eBHBV DR has an extra T (U on RNA) at the 5′ end, but the remaining 12 nucleotides differ from avihepadnavirus DR at only one position. Moreover, avihepadnavirus DR1 has an adjoining 5′ T (Fig. 2B), whereas DR2 has an adjoining 5′ A, making eBHBV DR more similar to avihepadnavirus DR1.
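Locating exact direct repeats of a fixed length, such as the 13-bp DR elements, can be sketched as a k-mer index. This is a generic illustration rather than the method used in the study. The function name `repeated_kmers` is hypothetical, and the motif and flanking bases are synthetic, not eBHBV DR sequences.

```python
from collections import defaultdict


def repeated_kmers(seq, k=13):
    """Map each k-mer that occurs more than once to its 0-based start positions."""
    positions = defaultdict(list)
    s = seq.upper()
    for i in range(len(s) - k + 1):
        positions[s[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


# Synthetic 13-nt motif placed twice in a toy sequence (illustration only).
motif = "ACGTTGCAAGCTT"
toy = "GGG" + motif + "CCCAAAT" + motif + "TT"

print(repeated_kmers(toy))  # {'ACGTTGCAAGCTT': [3, 23]}
```

This only finds identical copies. Comparing DR elements that differ at one position, as described above, would need a mismatch-tolerant search.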

Taken together, these analyses indicate that eBHBV1 and eBHBV2 not only have preserved full-length genomes of an ancient hepadnavirus but might also have preserved key elements involved in viral replication, which might greatly benefit future efforts to resurrect this fossilized virus for characterization and comparison with modern-day hepadnaviruses (7).

The mechanisms underlying integration of hepadnaviral DNA into the hepatocyte genome are still poorly understood. Analysis of HBV integrations in human HCC samples indicates that about half of the integrated HBV DNA segments have one host-virus junction located near DR1 or DR2 or between DR1 and DR2 (3, 17). Since this region is where the ends of partially double-stranded HBV genomes and aberrant double-stranded replication products lie, it has been suggested that nonhomologous end joining (NHEJ) reactions linking breakpoints in host chromosome DNA and open ends of HBV DNA are probably responsible for initiating these integrations (11, 16, 23). However, except in a few cases, most eBHBV elements, including eBHBV1 and eBHBV2, do not have host-virus junctions near DR1/DR2 (Fig. 1A; see Table S1 in the supplemental material). It is possible that eBHBV2 might have been derived from cccDNA broken up in the PreC/C region and integrated through NHEJ-like mechanisms. On the other hand, although some HBV integrations identified in HCC possess long inverted repeat regions (11, 17), HBV insertions with long direct repeat regions like eBHBV1 have almost never been reported. How eBHBV1 sequences could have formed and integrated is puzzling and warrants further study.

The reasons behind endogenous hepadnavirus elements' survival through millions of years of evolutionary selection are intriguing. Full assembly and annotation of the budgerigar genome will eventually provide information on neighboring host genes that might be affected by eBHBV integration and offer useful hints. On the other hand, full-length eBHBV sequences like eBHBV1 and eBHBV2 might have retained certain viral functions that somehow became advantageous for the host during evolution. Further in vivo experiments are warranted to address such possibilities.

GenBank accession numbers.

Nucleotide sequence data of eBHBV1 and eBHBV2, including flanking host genomic sequence data, are available in the Third Party Annotation Section of the DDBJ/EMBL/GenBank databases under accession numbers BK008520 and BK008521, respectively. Confirmative sequencing results for eBHBV1 and eBHBV2 are available under GenBank accession numbers JQ978774 to JQ978780 and JQ978781 to JQ978784, respectively.

ACKNOWLEDGMENTS

This work was supported by NSTMP (2008ZX10002-010, 2012ZX10002-006, 2012ZX10004-503), the 973 project (2012CB519002), NSFC (31071143, 31170148), Shanghai Science and Technology Committee (11DZ2291900), and the scientific research foundation for young scientists of Shanghai Medical College, Fudan University (11J-5).

We thank Jianyi Fu, Wenli Zhang, and Minglei Xu of Shanghai Zoo for their help with animal samples.

ADDENDUM IN PROOF

While this paper was being reviewed, Cui and Holmes (J. Cui and E. C. Holmes, J. Virol. 86:7688–7691, 2012) reported some of the results independently.

Footnotes

Published ahead of print 20 June 2012

Supplemental material for this article may be found at http://jvi.asm.org/.

Articles from Journal of Virology are provided here courtesy of American Society for Microbiology (ASM)
