The Journal of General Virology. 2016 Mar 1;97(Pt 3):537–542. doi: 10.1099/jgv.0.000393

Proposed reference sequences for hepatitis E virus subtypes

Donald B Smith 1, Peter Simmonds 1, Jacques Izopet 2, Edmilson F Oliveira-Filho 3, Rainer G Ulrich 4,5, Reimar Johne 6, Matthias Koenig 7, Shahid Jameel 8, Tim J Harrison 9, Xiang-Jin Meng 10, Hiroaki Okamoto 11, Wim H M Van der Poel 12, Michael A Purdy 13
PMCID: PMC5588893  NIHMSID: NIHMS878173  PMID: 26743685

Abstract

The nomenclature of hepatitis E virus (HEV) subtypes is inconsistent and makes comparison of different studies problematic. We have provided a table of proposed complete genome reference sequences for each subtype. The criteria for subtype assignment vary between different genotypes and methodologies, and so a conservative pragmatic approach has been favoured. Updates to this table will be posted on the International Committee on Taxonomy of Viruses website (http://talk.ictvonline.org/r.ashx?C). The use of common reference sequences will facilitate communication between researchers and help clarify the epidemiology of this important human pathogen. This subtyping procedure might be adopted for other taxa of the genus Orthohepevirus.


The current literature contains several inconsistencies in the naming of hepatitis E virus (HEV) subtypes, which often creates confusion in the scientific community. HEV is a member of the family Hepeviridae within the genus Orthohepevirus. The genus has three species that infect birds (Orthohepevirus B), rodents, soricomorphs and carnivores (Orthohepevirus C) or bats (Orthohepevirus D), and one species, Orthohepevirus A, comprising seven genotypes that infect humans (HEV-1, -2, -3, -4 and -7), pigs (HEV-3 and -4), rabbit (HEV-3), wild boar (HEV-3, -4, -5 and -6), mongoose (HEV-3), deer (HEV-3), yak (HEV-4) and camel (HEV-7) (Lee et al., 2015; Smith et al., 2014).

This division of HEV into seven genotypes and criteria for their assignment and identification are based on a demarcation p-distance threshold between genotypes of 0.088 for amino acid distances of concatenated ORF1- and ORF2-encoded proteins (lacking hypervariable regions between ORF1 protein amino acid residues 706–778 and 928–929, numbered with reference to GenBank accession no. M73218) (Smith et al., 2014). However, the criteria by which HEV sequences can be assigned to subtypes within genotypes are less consistent and sometimes confusing. When HEV subtypes were first comprehensively tabulated a decade ago, only 49 complete genome sequences were available and many subtype assignments were based on the analysis of subgenomic regions (Lu et al., 2006). Since then, the number of complete genome sequences has increased to almost 300 and most of the subtypes defined by Lu et al. (2006) are now represented by at least one complete genome sequence. However, there is currently no agreed list of reference sequences for these subtypes, although attempts at standardization have been made for HEV-3 (Smith et al., 2015; Vina-Rodriguez et al., 2015). One problem that is encountered in assigning sequences to particular subtypes is that no consistent criteria have been identified that define intra- and inter-subtype distances (Oliveira-Filho et al., 2013; Smith et al., 2013). For example, nucleotide p-distances between subtypes of HEV-1 are all less than 0.12, while those between subtypes of HEV-3 range from 0.12 to 0.26 and from 0.13 to 0.18 for subtypes of HEV-4. In addition, within these genotypes, the ranges of within- and between-subtype distances overlap. As a result, some complete genome sequences have been given conflicting subtype assignments.
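The p-distances quoted above are simply the proportion of aligned sites at which two sequences differ. The following is a minimal sketch of that calculation, not the SSE implementation used in this study; the function name, the pairwise-deletion handling of gaps and ambiguous bases, and the toy sequences are all assumptions made for illustration.

```python
def p_distance(seq_a: str, seq_b: str, skip: str = "-N") -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns in which either sequence carries a symbol listed in `skip`
    (gap or ambiguous base) are ignored. This pairwise-deletion rule is
    an assumption; for amino acid alignments pass skip="-X" instead,
    because 'N' is asparagine there.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip or b in skip:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Toy aligned fragments, invented for illustration (not real HEV sequence).
toy_a = "ATGGCCCATCAGTTTATTAA"
toy_b = "ATGGCTCATCAGTTCATTAA"
print(p_distance(toy_a, toy_b))  # 2 differing sites out of 20 compared
```

Applied to concatenated ORF1 and ORF2 protein alignments, the same proportion would be compared with the 0.088 genotype threshold; applied to nucleotide alignments, it gives the subtype-level distances discussed above.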

An example comes from a recent paper (Lhomme et al., 2015) in which strain TR19 (GenBank accession no. JQ013794) was used as the reference sequence for subtype 3c. The frequency of subtype 3c infections has increased over the last decade in France, similar to the increase in subtype 3c documented previously in England and Wales (Ijaz et al., 2014). However, the ‘subtype 3c’ strains from the UK actually correspond to the subtype 3i reference sequence used in the French study. In other cases, subgenomic sequences used as reference sequences (Thiry et al., 2015) derive from strains for which no further sequence information is available. As a result, it has become difficult to compare the results of phylogenetic analyses carried out using different subgenomic regions or even using the same region in different studies.

To address these issues, we propose a standard reference set of complete genome sequences (Table 1). This table is available online on the International Committee on Taxonomy of Viruses (ICTV) website (http://talk.ictvonline.org/r.ashx?C) and will be updated as new information becomes available. In producing this table, we considered using the most central sequence (the medoid) in each subtype group as the reference sequence. Although this method avoids the possibility of choosing a divergent member of a subtype as the reference sequence, it would also mean that subtype reference sequences might not be stable because the medoid could change as more sequences are obtained or as the structure of the subtype is redefined by the addition or exclusion of divergent strains. In addition, our decision to use the designations of Lu et al. (2006), with priority given to strains with the earliest date of accession, will be minimally disruptive to the existing literature.
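The instability of a medoid-based reference can be illustrated with a short sketch. The medoid is taken here as the sequence with the smallest summed distance to all other members of the group; the sequence names and distances below are invented, and the function is a hypothetical helper rather than part of any tool used in this study.

```python
def medoid(names, dist):
    """Return the member with the smallest summed distance to the others.

    `dist` maps frozenset({a, b}) to the pairwise distance between a and b.
    """
    def total(name):
        return sum(dist[frozenset((name, other))]
                   for other in names if other != name)
    return min(names, key=total)


# Invented pairwise distances for three members of one subtype.
dist = {
    frozenset(("s1", "s2")): 0.04,
    frozenset(("s1", "s3")): 0.05,
    frozenset(("s2", "s3")): 0.08,
}
before = medoid(["s1", "s2", "s3"], dist)

# A newly sequenced strain (s4) that lies close to s3 shifts the centre.
dist.update({
    frozenset(("s1", "s4")): 0.09,
    frozenset(("s2", "s4")): 0.10,
    frozenset(("s3", "s4")): 0.02,
})
after = medoid(["s1", "s2", "s3", "s4"], dist)
print(before, after)
```

In this toy case the medoid moves from s1 to s3 once s4 is added, which is the kind of change a fixed, earliest-accession reference sequence avoids.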

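Table 1. Reference sequences for HEV subtypes

| Genotype | Subtype* | GenBank accession no. | Strain | Subgenomic reference sequences/comments |
|---|---|---|---|---|
| 1 | 1a | M73218 | Burma | |
| 1 | 1b | D11092 | HPECG | |
| 1 | 1c | X98292 | I1 | |
| 1 | 1d | AY230202 | Morocco | |
| 1 | 1e | AY204877 | T3 | |
| 1 | 1f | JF443721 | IND-HEV-AVH5-2010 | |
| 2 | 2a | M74506 | M1 | |
| 2 | 2b | | | AF173231, AF173232, AY903950 (ORF2) |
| 3 | 3a | AF082843 | Meng | |
| 3 | 3b | AP003430 | JRA1 | |
| 3 | 3c | FJ705359 | wbGER27 | |
| 3 | 3d | | | AF296165–AF296167 (ORF2) |
| 3 | 3e | AB248521 | swJ8-5 | |
| 3 | 3f | AB369687 | E116-YKH98C | |
| 3 | 3g | AF455784 | Osh 205 | |
| 3 | 3h | JQ013794 | TR19 | |
| 3 | 3i | FJ998008 | BB02 | |
| 3 | 3j | AY115488 | Arkell | Isolated from pooled material |
| 3 | 3 | AB290312 | swMN06-A1288 | |
| 3 | 3 | JQ953664 | FR-SHEV3c-like | |
| 3 | 3 | AB369689 | E088-STM04C | |
| 3 | 3 | AB290313 | swMN06-C1056 | |
| 3 | 3 | EU360977 | swX07-E1 | |
| 3 | 3 | KJ873911 | FR_R | |
| 3 | 3 | EU723513 | SW627 | |
| 3 | 3ra | FJ906895 | GDC9 | Mostly from rabbit |
| 3 | 3 | KJ013415 | CHN-BJ-r14(9) | From rabbit – divergent within 3ra clade |
| 3 | 3 | JQ013791 | W1-11 | From rabbit – divergent within 3ra clade |
| 4 | 4a | AB197673 | JKO-ChiSai98C | |
| 4 | 4b | DQ279091 | swDQ | |
| 4 | 4c | AB074915 | JAK-Sai | |
| 4 | 4d | AJ272108 | T1 | |
| 4 | 4e | AY723745 | IND-SW-00-01 | |
| 4 | 4f | AB220974 | HE-JA2 | |
| 4 | 4g | AB108537 | CCC220 | |
| 4 | 4h | GU119961 | CHN-XJ-SW13 | |
| 4 | 4i | DQ450072 | swCH31 | |
| 4 | 4 | AB369688 | E087-SAP04C | |
| 5 | 5a | AB573435 | JBOAR135-Shiz09 | From wild boar |
| 6 | 6a | AB602441 | wbJOY_06 | From wild boar |
| 6 | 6 | AB856243 | wbJNN_13 | From wild boar |
| 7 | 7a | KJ496143 | 178C | From camel |
| 7 | 7 | KJ496144 | 180C | From camel |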

Reference sequences not assigned a subtype by Lu et al. (2006) are highlighted in bold.

* Unassigned subtypes are denoted by genotype without a subtype designation.

Subtypes 2b and 3d are defined from Lu et al. (2006) by the subgenomic sequences indicated.

The criteria used were as follows:

  1. To minimize disruption of previous subgenotype assignments, priority was given to the subtype assignments given by Lu et al. (2006).

  2. To enable phylogenetic analyses to be carried out on different fragments of the genome, subtype reference sequences must comprise both the ORF1 and ORF2 coding regions and not be a recombinant between previously assigned subtypes.

  3. If more than one complete genome sequence was available for a subtype, priority was given to the first sequence to be submitted to GenBank or, where submission dates were identical, the lowest alphabetic/numeric accession number.

  4. If a subtype was assigned by Lu et al. (2006) based on the analysis of subgenomic fragments, these fragments were used to identify potential reference sequences by performing a blast search against GenBank. The highest-scoring complete genome sequences were considered as potential reference sequences if nucleotide sequence identities were >90 % and formed a discontinuous distribution compared with identities with previously named complete genome sequences.

  5. Complete genome sequences that were phylogenetically distinct from previously assigned complete genome sequences, and not related to any of the subtypes described by Lu et al. (2006), were only assigned as a new subtype if at least three complete ORF1 and ORF2 sequences were available that were epidemiologically unrelated (from different studies or localities). Unassigned complete genome sequences were labelled ‘genotype_accession number’ (e.g. ‘3_AB369689’).
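Criteria 3 and 5 lend themselves to a short illustration. The sketch below is only one possible reading of them: the record type, its field names, the use of plain string comparison for the 'lowest alphabetic/numeric accession number', and the approximation of 'epidemiologically unrelated' by counting distinct studies or localities are all assumptions, and the placeholder accessions, dates and sources are invented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Genome:
    accession: str   # GenBank accession number
    submitted: date  # date of submission to GenBank
    source: str      # study or locality (hypothetical field)


def pick_reference(candidates):
    """Criterion 3: earliest submission, ties broken by lowest accession."""
    return min(candidates, key=lambda g: (g.submitted, g.accession))


def enough_unrelated(candidates, minimum=3):
    """Criterion 5, approximated by counting distinct studies or localities."""
    return len({g.source for g in candidates}) >= minimum


def unassigned_label(genotype, accession):
    """Label for a complete genome sequence left without a subtype."""
    return f"{genotype}_{accession}"


# Placeholder records: accessions, dates and sources are invented.
candidates = [
    Genome("ZZ000003", date(2010, 5, 1), "study A"),
    Genome("ZZ000002", date(2009, 3, 9), "study B"),
    Genome("ZZ000001", date(2009, 3, 9), "study B"),
]
reference = pick_reference(candidates)
print(reference.accession)              # earliest date, then lowest accession
print(enough_unrelated(candidates))     # only two distinct sources here
print(unassigned_label(3, "AB369689"))  # label format from criterion 5
```

Sorting on the tuple (submission date, accession) keeps the tie-break explicit and makes the choice reproducible when two sequences share a submission date.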

Phylogenetic and sequence analyses

HEV sequences >7000 nt were downloaded from GenBank on 27 October 2015 and aligned using SSE v.1.2 (Simmonds, 2012). Sequences differing by <1 % (HEV-1 and HEV-3) or 2 % (HEV-4) of nucleotide positions were analysed by producing neighbour-joining trees, based on maximum composite likelihood distances using MEGA6 (Tamura et al., 2013), or by analysing the distribution of nucleotide p-distances using SSE. Analyses in sequence sets lacking hypervariable regions or lacking the overlapping ORF2/3 region produced similar results (data not shown).

Genotype 1

Subtypes 1a–1e were all originally assigned on the basis of an analysis of complete genome sequences (Lu et al., 2006). A group of sequences that share a common branch with subtype 1a (JF443721–JF443726 and AB720035, Fig. 1a) are more divergent from subtype 1a (nucleotide p-distances 0.052–0.075, apart from M73218 to JF443726, with a distance of 0.046) than sequences of subtype 1a are from each other (<0.056); these distances are comparable to those between subtypes 1b and 1c (0.058–0.065). We propose that this phylogenetically distinct group of sequences be considered as subtype 1f, although no discontinuity exists in the distribution of pairwise nucleotide p-distances within HEV-1 sequences that distinguishes within- and between-subtype distances. Sequence FJ457024 is intermediate between subtypes 1a and 1f, but bootscan analysis using SSE suggests that it is a recombinant between these two subtypes (data not shown). All p-distances >0.087 derive from comparisons between subtypes 1a, 1b, 1c and 1f and either subtype 1d or 1e (>0.101), or between subtypes 1d and 1e (0.096), supporting the division of HEV-1 into two clades: 1abcf (comprising subtypes 1a, 1b, 1c and 1f) and 1de (comprising subtypes 1d and 1e).

Fig. 1.

Phylogenetic analyses of HEV complete genome sequences. A neighbour-joining tree of maximum-likelihood distances is shown with symbols used to indicate sequences belonging to the same subtype of HEV-1 (a), HEV-3 (b) or HEV-4 (c). Sequences without a symbol are recombinant (1_FJ457024) or have not been assigned to a subtype. Branches supported by >70 % of bootstrap replicates are indicated. Reference sequences are indicated by thick branch lines. Brackets indicate clades 1abcf and 1de (a), and 3abchij and 3efg (b).

Genotype 2

Only a single complete genome sequence has been reported for subtype 2a; subtype 2b was identified from the analysis of a 318 nt ORF2 fragment (Lu et al., 2006).

Genotype 3

The distribution of nucleotide p-distances among HEV-3 subtypes shows a complex pattern with multiple hierarchies of relatedness, even if the more divergent rabbit-derived strains are excluded. Subtypes 3a, 3b, 3c, 3h, 3i and 3j (3abchij) form one major clade (Fig. 1b), while subtypes 3e, 3f and 3g form another (3efg) (Hewitt et al., 2014; Ijaz et al., 2014; Oliveira-Filho et al., 2013; Smith et al., 2015; Widén et al., 2011). The reference sequences for subtypes 3e and 3f were assigned according to their date of accession to GenBank. Five strains belonging to subtype 3c were listed by Lu et al. (2006); their partial ORF1 and ORF2 sequences group with the corresponding regions of the complete genome sequence FJ705359, and separately from JQ013794, previously described as subtype 3c (Izopet et al., 2012). The latter sequence becomes the subtype 3h reference sequence since it groups with the ORF1 and ORF2 sequences of a subtype 3h strain listed by Lu et al. (2006) (AF110390 and AF110387). The other 3h strain (swNZ) listed by Lu et al. (2006) groups separately from all complete genome sequences. Four strains of subtype 3i are listed by Lu et al. (2006); sequences of one of these strains group with FJ998008 for both the ORF1 and ORF2 regions. The other three strains have sequences that are only loosely associated (ORF1) or not associated (ORF2) with this sequence. Accordingly, we have assigned FJ998008 as the 3i reference sequence. Nucleotide p-distances between these subtypes (>0.120) overlap distances within subtypes (<0.123), making it difficult to unambiguously assign some subtypes. For example, nucleotide p-distances between subtype 3f and EU360977 (0.116–0.125) and between 3f and KJ873911 (0.116–0.125) span this range, as does that between 3h and AB290312 (0.120), while AB369689 and AB740232 are equally related to subtypes 3a (nucleotide p-distances 0.124–0.134) and 3b (0.126–0.137). We have chosen not to assign a subtype to these sequences, or to more divergent sequences such as JQ953664 and AB290313. Divergence among the HEV-3 rabbit-derived strains ranges up to 0.255, again with multiple levels of sequence divergence; assignment of these strains into named subtypes within the 3ra clade awaits the availability of further complete genome sequences.
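The overlap between within- and between-subtype distances can be checked mechanically once pairwise distances and provisional subtype labels are available. The following is a minimal sketch under assumed inputs: the sequence names, labels and distance values are invented, and the two helper functions are hypothetical rather than part of SSE or MEGA6.

```python
def split_distances(pair_distances, subtype_of):
    """Separate pairwise distances into within- and between-subtype lists."""
    within, between = [], []
    for (a, b), d in pair_distances.items():
        if subtype_of[a] == subtype_of[b]:
            within.append(d)
        else:
            between.append(d)
    return within, between


def ranges_overlap(within, between):
    """True if the largest within-subtype distance reaches the smallest
    between-subtype distance, so no single threshold separates the two."""
    return max(within) >= min(between)


# Invented labels and distances for four sequences in two subtypes.
subtype_of = {"x1": "A", "x2": "A", "y1": "B", "y2": "B"}
pair_distances = {
    ("x1", "x2"): 0.122,
    ("y1", "y2"): 0.095,
    ("x1", "y1"): 0.121,
    ("x1", "y2"): 0.140,
    ("x2", "y1"): 0.150,
    ("x2", "y2"): 0.160,
}
within, between = split_distances(pair_distances, subtype_of)
print(max(within), min(between), ranges_overlap(within, between))
```

When the two ranges overlap, as in this toy example, a distance cut-off alone cannot assign every sequence, which is why phylogenetic grouping and prior designations were also taken into account.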

Genotype 4

Seven HEV-4 subtypes were defined by Lu et al. (2006) (subtypes 4a–4g). The distribution of nucleotide sequence distances between and within HEV-4 subtypes is nearly continuous, with distances between subtypes (>0.133) overlapping those within subtypes (<0.139), although a peak from 0.15 to 0.18 consists only of distances between subtypes. Phylogenetic analysis also reveals multiple levels of branching (Fig. 1c) but without higher-level groupings akin to those observed for HEV-1 and HEV-3. Consequently, we used a pragmatic approach, adopting previous designations and avoiding the proliferation of new subtype names. One of the 4f subgenomic ORF1 GenBank accession numbers given by Lu et al. (2006) (AY427953) should be AY684253. However, both this sequence and another subtype 4f ORF1 sequence (AB075970) group with the subtype 4a reference sequence (data not shown). Two additional subtype 4f sequences given by Lu et al. (2006) (AB082547 and AB082558) derive from the HE-JA2 strain, for which a complete genome sequence is now available (AB220974) and which is distinct from previously named subtypes, so this becomes the subtype 4f reference sequence. Two additional subtypes (4h and 4i) follow the assignments given in a previous publication (Liu et al., 2012). Sequence AB369688, although distinct from other subtypes, is represented by a single complete genome sequence and therefore remains unassigned.

Genotypes 5–7

The nucleotide p-distance between the two complete genome sequences of HEV-6 (AB602441 and AB856243) is 0.198, and the distances between the three complete genome sequences of HEV-7 (KJ496143, KJ496144 and KT818608) are 0.06–0.147. Comparison with distances between subtypes of HEV-3 and between subtypes of HEV-4 would suggest that both HEV-6 and HEV-7 could also be divided into two subtypes. However, as fewer than three complete genome sequences are currently available for each of these clades, we have not made any subtype assignments except to designate the first sequence of each genotype as subtype ‘a’.

Concluding remarks

A perennial problem in classifying virus diversity is that discrete, man-made categories used for classification become arbitrary as their genetic distinctness blurs into a continuum of variability with the description of additional novel lineages or recombinants. This problem has hindered the assignment of subtypes of HEV because levels of diversity differ between HEV genotypes, and because neither distance-based nor phylogenetic methods provide clear criteria for demarcation between groups. Despite this problem, it is important that researchers have a common set of named reference sequences so that results from different studies can be compared. We hope that our table of subtype reference sequences will assist the interpretation of epidemiological and evolutionary studies of HEV. However, an important caveat is that researchers should use these reference sequences as way-markers in a complex landscape and be cautious about treating subtypes as stable biological or epidemiological entities.

Acknowledgements

This work was supported by The Wellcome Trust (grant 095831/Z/11/Z) to the Centre for Immunity, Infection and Evolution at the University of Edinburgh, UK.

References

  1. Hewitt P. E., Ijaz S., Brailsford S. R., Brett R., Dicks S., Haywood B., Kennedy I. T. R., Kitchen A., Patel P., other authors Hepatitis E virus in blood components: a prevalence and transmission study in southeast England. Lancet. 2014;384:1766–1773. doi: 10.1016/S0140-6736(14)61034-5.
  2. Ijaz S., Said B., Boxall E., Smit E., Morgan D., Tedder R. S. Indigenous hepatitis E in England and Wales from 2003 to 2012: evidence of an emerging novel phylotype of viruses. J Infect Dis. 2014;209:1212–1218. doi: 10.1093/infdis/jit652.
  3. Izopet J., Dubois M., Bertagnoli S., Lhomme S., Marchandeau S., Boucher S., Kamar N., Abravanel F., Guérin J.-L. Hepatitis E virus strains in rabbits and evidence of a closely related strain in humans, France. Emerg Infect Dis. 2012;18:1274–1281. doi: 10.3201/eid1808.120057.
  4. Lee G.-H., Tan B.-H., Teo E.C.-Y., Lim S.-G., Dan Y.-Y., Wee A., Aw P. P. K., Zhu Y., Hibberd M. L., other authors Chronic infection with camelid hepatitis E virus in a liver-transplant recipient who regularly consumes camel meat and milk. Gastroenterology. 2015;150:355–357. doi: 10.1053/j.gastro.2015.10.048.
  5. Lhomme S., Abravanel F., Dubois M., Chapuy-Regaud S., Sandres-Saune K., Mansuy J.-M., Rostaing L., Kamar N., Izopet J. Temporal evolution of the distribution of hepatitis E virus genotypes in southwestern France. Infect Genet Evol. 2015;35:50–55. doi: 10.1016/j.meegid.2015.07.028.
  6. Liu P., Li L., Wang L., Bu Q., Fu H., Han J., Zhu Y., Lu F., Zhuang H. Phylogenetic analysis of 626 hepatitis E virus (HEV) isolates from humans and animals in China (1986–2011) showing genotype diversity and zoonotic transmission. Infect Genet Evol. 2012;12:428–434. doi: 10.1016/j.meegid.2012.01.017.
  7. Lu L., Li C., Hagedorn C. H. Phylogenetic analysis of global hepatitis E virus sequences: genetic diversity, subtypes and zoonosis. Rev Med Virol. 2006;16:5–36. doi: 10.1002/rmv.482.
  8. Oliveira-Filho E. F., König M., Thiel H.-J. Genetic variability of HEV isolates: inconsistencies of current classification. Vet Microbiol. 2013;165:148–154. doi: 10.1016/j.vetmic.2013.01.026.
  9. Simmonds P. SSE: A nucleotide and amino acid sequence analysis platform. BMC Res Notes. 2012;5:50. doi: 10.1186/1756-0500-5-50.
  10. Smith D. B., Purdy M. A., Simmonds P. Genetic variability and the classification of hepatitis E virus. J Virol. 2013;87:4161–4169. doi: 10.1128/JVI.02762-12.
  11. Smith D. B., Simmonds P., Jameel S., Emerson S. U., Harrison T. J., Meng X.-J., Okamoto H., Van der Poel W. H. M., Purdy M. A., International Committee on Taxonomy of Viruses Hepeviridae Study Group Consensus proposals for classification of the family Hepeviridae. J Gen Virol. 2014;95:2223–2232. doi: 10.1099/vir.0.068429-0.
  12. Smith D. B., Ijaz S., Tedder R. S., Hogema B., Zaaijer H. L., Izopet J., Bradley-Stewart A., Gunson R., Harvala H., other authors Variability and pathogenicity of hepatitis E virus genotype 3 variants. J Gen Virol. 2015;96:3255–3264. doi: 10.1099/jgv.0.000264.
  13. Tamura K., Stecher G., Peterson D., Filipski A., Kumar S. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol Biol Evol. 2013;30:2725–2729. doi: 10.1093/molbev/mst197.
  14. Thiry D., Mauroy A., Saegerman C., Licoppe A., Fett T., Thomas I., Brochier B., Thiry E., Linden A. Belgian wildlife as potential zoonotic reservoir of hepatitis E virus. Transbound Emerg Dis. 2015 doi: 10.1111/tbed.12435. (in press)
  15. Vina-Rodriguez A., Schlosser J., Becher D., Kaden V., Groschup M. H., Eiden M. Hepatitis E virus genotype 3 diversity: phylogenetic analysis and presence of subtype 3b in wild boar in Europe. Viruses. 2015;7:2704–2726. doi: 10.3390/v7052704.
  16. Widén F., Sundqvist L., Matyi-Toth A., Metreveli G., Belák S., Hallgren G., Norder H. Molecular epidemiology of hepatitis E virus in humans, pigs and wild boars in Sweden. Epidemiol Infect. 2011;139:361–371. doi: 10.1017/S0950268810001342.

Articles from The Journal of General Virology are provided here courtesy of Microbiology Society
