Highlights
• Not all genomic regions and lengths of hepatitis E virus (HEV) are equally suited for use in genetic analyses.
• Generally, longer nucleic acid stretches yield better accuracy in predicting the viral subtype.
• Analysis of molecular and epidemiological data of HEV in the Netherlands indicates the ability of the 493 nt fragment to detect short term changes in the viral population size.
• In the absence of whole genome sequences the 493 bp fragment can be used for molecular typing and population structure characterization of HEV.
Keywords: Hepatitis E virus, Diagnostics, Phylogenetics, Phylodynamics, Population size, Genotyping
Abstract
The aim of this study was to investigate to what extent fragments of the HEV genome could be used for accurate diagnostics and inference of viral population-scale processes. For this, we selected all the published whole genome sequences from the NCBI GenBank and trimmed them to various fragment lengths (ORF1,2,3, ORF1, ORF2, ORF3, 493 nt in ORF2 and 148 nt in ORF2). Each of the fragment lengths was used to infer the richness and diversity of the viral sequence types, typing accuracy, and potential use in phylodynamics. The results obtained from the different fragments were compared. We observed that, generally, the longer the nucleic acid fragment used in typing, the better the accuracy in predicting the viral subtype. However, the dominant HEV subtypes circulating in Europe were relatively well classified even by the 493 nt fragment, with false negative rates as low as 8 in 1000 typed sequences. Most fragments also give comparable results in analyses of population size, albeit with shorter fragments showing a broader 95 % highest posterior density interval and less obvious increase of the viral effective population size. The reconstructed phylogenies of a heterochronous subset indicated a good concordance between all the fragments, with the major clades following similar branching patterns. Furthermore, we have used the HEV sequence data from the Netherlands available in the HEVnet database as a case study for reconstruction of population size changes in the past decades. This data showed that molecular and epidemiological results are concordant and point to an increase in the viral effective population size underlying the observed increase in incidence of acute HEV infection cases. In the absence of whole genome sequencing data, the 493 bp fragment can be used for analyzing HEV strains currently circulating in Europe, as it is informative for describing short term population-scale processes.
1. Introduction
Hepatitis E virus (HEV) is a zoonotic virus responsible for acute hepatitis E in humans. The World Health Organization (WHO) estimates approximately 20 million HEV infections every year, with three million symptomatic manifestations and almost 60,000 HEV-related deaths (https://www.who.int/news-room/fact-sheets/detail/hepatitis-e). HEV belongs to the species Paslahepevirus balayani, subfamily Orthohepevirinae, in the family Hepeviridae (Purdy et al., 2022) and, since its discovery, an increasing number of HEV genotypes and particularly subtypes have been isolated from humans and animals (Smith et al., 2020, 2014). To date, it is classified into 8 genotypes and 36 subtypes (Smith et al., 2020). Genotypes HEV-1 through HEV-4 can infect humans, with HEV-1 and HEV-2 shown to cause large outbreaks in developing countries due to contamination of water supplies with human waste. Genotypes HEV-3 and HEV-4 are zoonotic and are associated with outbreaks caused by consuming uncooked meat or animal products from pigs, wild boar or deer (Kamar et al., 2012; Matsuda et al., 2003; Tei et al., 2003; Van der Poel, 2014); alternative routes of transmission for these genotypes are surface water or crops, with 17 % of surface water samples in The Netherlands found to be HEV RNA positive (Rutjes et al., 2009), and donation of HEV contaminated blood and blood products from subclinically infected donors. Genotypes HEV-5 and HEV-6 only infect wild boar, and HEV-7 and HEV-8 have been found in camels, with HEV-7 also occasionally identified in humans (Lee et al., 2016).
Though HEV is widespread in a number of animal species (Smith et al., 2020, 2014), the major animal reservoirs in developed countries appear to be domestic and wild pigs. In Europe, more than 50 % of the pig farms are reported to be infected and the seroprevalence within those farms can be over 80 % (Meester et al., 2022; Rutjes et al., 2014, 2007). Molecular analyses have shown that HEV strains detected in pigs and humans in the same geographical regions have a high genetic similarity, indicating that swine are the main source of infection for humans (Hogema et al., 2021). In addition, epidemiological studies showed pork products (especially dry raw sausages) as the main transmission routes for hepatitis E virus to the general population in the Netherlands (Tulen et al., 2019) and elders in Europe (Faber et al., 2018).
Analysis of viral RNA sequences can be useful for following the viral population dynamics, such as shifts of main genotypes or increases in disease incidence. For example, in the Netherlands, in recent years, a shift towards HEV-3c has been observed (Hogema et al., 2021), accompanied by an increase in the number of clinical HEV infection cases (Aspinall et al., 2017; Hogema et al., 2016); whether this is due to a higher virulence or transmissibility of the emerging subtype is unknown and subject to further investigation. Additionally, RNA sequence analysis is useful for identifying possible sources of infection (Davis et al., 2021). To this end genotyping can be performed, preferably based on the whole viral genome sequence. However, generating whole genome sequences of HEV requires positive samples with a sufficiently high RNA concentration (i.e. 10^4 genome copies), and these can often be retrieved only in the acute phase of the viral infection. Since pigs and wild boars infected with HEV do not show overt clinical signs and, in humans, serious clinical disease is mainly seen in hospitalized patients, acquiring samples from the acute phase of infection is challenging. While fecal samples are normally the most appropriate for diagnostics due to their high HEV titers (Choi et al., 2003), they have the disadvantage of a high content of polysaccharides, phenolic and metabolic compounds that act as inhibitors in the amplification reactions (Ward et al., 2009), as well as of host genomic and bacterial DNA and mRNA. Yet another impediment in generating whole genome sequences from low-concentration samples is the presence of some stable stem-loop and hairpin structures in the HEV genome (Huang et al., 2007).
Alternatively, genotyping can be based on smaller fragments of the genome. In order to maintain concordance with the results based on the whole genome, the selected fragment(s) should yield comparable results when employed in various analyses. One RNA fragment that has been sequenced and used extensively for HEV analyses is only 493 nt in length (Boxman et al., 2017; Hogema et al., 2021) and is located in ORF2. Its wide usage across the OneHealth sectors facilitates the integrated surveillance of HEV infections. As the majority of the HEV sequences available from the Netherlands cover only this fragment, we have tested to what extent it is suitable for performing accurate diagnostics or inferring viral population-scale processes. To do this, we selected all the published whole genome sequences from the NCBI GenBank and trimmed them to various fragment lengths (down to 148 nt). Each of the fragment lengths was used to infer the richness and diversity of the viral sequence types, typing accuracy, and viral phylodynamics. Subsequently, the results obtained from the different fragments were compared and conclusions were drawn on the utility of these particular genomic loci for molecular diagnostics, typing, phylogenetics, and phylodynamics of HEV. Furthermore, as a case study, we have used the HEV sequence data from the Netherlands, available in the HEVnet database, for the reconstruction of population size changes in the past decades.
2. Material and methods
2.1. Sequences
Hepatitis E virus sequences with a total length >6500 nt were retrieved from NCBI GenBank (17 January 2023). In addition, we retrieved all HEV sequences available in HEVnet (hosted at https://secure.rivm.nl/mpf/database/hev, and accessed on 17 January 2023) that were labelled with “The Netherlands” as country of sampling (referred to throughout the manuscript as the NL sequences).
2.2. Epidemiological data
The incidence of human acute HEV infections in the Dutch population since 2001 was retrieved from the annual surveillance data of the Dutch National Institute for Public Health and the Environment (RIVM).
2.3. Alignments
The NCBI selected whole genome sequences and the NL sequences were aligned using MUSCLE implemented in the muscle (v3.36.0) package in R (Edgar, 2004). The resulting alignment was manually curated using a custom R script and Jalview (2_11_1_3-windows-x64-java_8). Briefly, the whole genome alignment was trimmed to positions 26:7154 of reference strain MN614141 (NCBI accession number MN614141), and nucleotides 2122:2378, corresponding to the hypervariable region (Smith et al., 2012), have been removed. Subsequently out-of-frame sequences were removed. Hamming distances were calculated on the intermediate alignment and the resulting distance matrix was used to remove extremely divergent sequences (> 2350 nt distance from the cluster of the reference strain, ∼ 33 % divergence) from the alignment. The sequences coming from rabbits as well as duplicates thereof were removed from the dataset. The resulting alignment has been trimmed to generate fragments of loci that are commonly used in either typing or phylogenetic analyses of HEV: ORF1, complete ORF2, non-overlapping ORF2 (to exclude the influence of the ORF3 fraction in the ORF2 fragment), a 493 nt fragment of ORF2, a 148 nt fragment of ORF2, and ORF3 (Fig. 1). Subsequent analyses used trimmed alignments corresponding to the above fragments as well as a concatenated alignment of ORF 1, 2 and 3 (hereinafter referred to as ORF1,2,3 and used as proxy for the whole genome).
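The divergence filter described above can be illustrated with a minimal Python sketch. It is not the custom R script used in the study: the function names, the toy alignment, and the threshold of 3 mismatches are hypothetical, and the distance is measured against a single reference sequence rather than against the cluster of the reference strain. Gap characters are counted as ordinary mismatches here, which is also an assumption.

```python
def hamming(a: str, b: str) -> int:
    """Count aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def drop_divergent(alignment: dict, reference_id: str, max_distance: int) -> dict:
    """Keep only sequences within max_distance mismatches of the reference."""
    ref = alignment[reference_id]
    return {
        name: seq
        for name, seq in alignment.items()
        if hamming(seq, ref) <= max_distance
    }


# Hypothetical toy alignment; real input would be the curated genome alignment.
toy = {
    "ref": "ACGTACGTAC",
    "close": "ACGTACGTAA",
    "far": "TGCATGCATG",
}
kept = drop_divergent(toy, "ref", max_distance=3)
print(sorted(kept))
```

With the full alignment, `max_distance` would be set to the 2350 nt cut-off stated in the text.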
Fig. 1.
Schematic description of the HEV genome.
2.4. Subtype assessment
The whole genome sequences were compared to the reference type sequences of HEV described in (Smith et al., 2020) by calculating the pairwise genetic distances with a K80 substitution model, as implemented in the function dist.dna of the ape v5.7 R package (Paradis and Schliep, 2019). The reference sequence with the smallest distance to the genome was considered to be the corresponding HEV subtype. Similarly, a subtype was predicted using the 493 nt and ORF2 fragments.
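As an illustration of the nearest-reference rule, the following Python sketch computes the Kimura two-parameter (K80) distance, d = -0.5 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions, and assigns the subtype of the closest reference. It is a simplified stand-in for dist.dna in ape, not a reimplementation of it: the function names and toy sequences are hypothetical, and sites with gaps or ambiguous bases are skipped per pair, which may differ from the settings used in the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k80_distance(a: str, b: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences."""
    n = transitions = transversions = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguous bases (assumption)
        n += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if n == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / n, transversions / n
    term1, term2 = 1 - 2 * p - q, 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        return math.inf  # sequences too divergent for the K80 correction
    return -0.5 * math.log(term1 * math.sqrt(term2))


def assign_subtype(query: str, references: dict) -> str:
    """Return the label of the reference with the smallest K80 distance."""
    return min(references, key=lambda label: k80_distance(query, references[label]))


# Hypothetical toy references; real ones would be the published subtype references.
refs = {"3c": "ACGTACGTACGTACGTACGT", "3f": "ACGTTCGTACGAACGTACCT"}
print(assign_subtype("ACGTACGTACGTACGTACGA", refs))
```

The same function can be applied to a trimmed fragment by passing the corresponding slice of both the query and the references.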
2.5. Diagnostic accuracy
The concordance between the subtypes predicted by the various fragments was assessed with the Fowlkes-Mallows (FM) index (sqrt((TP/(TP+FP)) × (TP/(TP+FN)))) (Fowlkes and Mallows, 1983); this index takes values in the interval [0,1], with 0 indicating complete absence of concordance and 1 indicating maximum concordance. The predictive value of the various genomic fragments for the viral subtype was assessed by the overall and balanced accuracy (((TP/(TP+FN)) + (TN/(TN+FP)))/2), as well as by the F1 score (2×TP/(2×TP+FP+FN)), where the actual subtype was considered to be the subtype as inferred from the whole genomes, and TP = the number of true positives, FP = the number of false positives, TN = the number of true negatives, FN = the number of false negatives.
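The three formulas can be written out as a short Python sketch. The one-vs-rest tally of TP, FP, TN and FN per subtype is an assumption made for illustration (the text does not state how the counts were obtained for each metric), and the function names and toy labels are hypothetical.

```python
import math


def confusion_counts(actual, predicted, subtype):
    """One-vs-rest counts (TP, FP, TN, FN) for a single subtype (assumed tally)."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == subtype and p == subtype:
            tp += 1
        elif a != subtype and p == subtype:
            fp += 1
        elif a == subtype and p != subtype:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def _ratio(num, den):
    """Division that returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def fowlkes_mallows(tp, fp, fn):
    return math.sqrt(_ratio(tp, tp + fp) * _ratio(tp, tp + fn))


def balanced_accuracy(tp, fp, tn, fn):
    return (_ratio(tp, tp + fn) + _ratio(tn, tn + fp)) / 2


def f1_score(tp, fp, fn):
    return _ratio(2 * tp, 2 * tp + fp + fn)


# Hypothetical labels: whole-genome subtype vs. subtype predicted from a fragment.
actual = ["3c", "3c", "3f", "4c", "4c", "3f"]
predicted = ["3c", "3c", "3f", "4f", "4c", "3f"]
tp, fp, tn, fn = confusion_counts(actual, predicted, "4c")
print(balanced_accuracy(tp, fp, tn, fn), f1_score(tp, fp, fn), fowlkes_mallows(tp, fp, fn))
```

In this toy example one of the two 4c genomes is misclassified as 4f, so sensitivity for 4c drops to 0.5 while specificity stays at 1.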
2.6. Sequence diversity
Further exploration of the typing accuracy of the various fragments was assessed by calculating haplotype diversity using the function haplotype implemented in pegas v1.2 R package. This was subsequently used in R package iNEXT v3.0.0 to estimate expected diversities for interpolated and extrapolated sample sizes based on three Hill numbers (Hsieh et al., 2016): species richness (q = 0), the exponential of Shannon's entropy (q = 1; referring to Shannon diversity), and the inverse of Simpson's concentration (q = 2; referring to Simpson diversity). The exponential of Shannon's entropy and the inverse of Simpson's concentration capture important facets of taxonomic diversity weighted by abundances of respectively rare and dominant sequences (Chao et al., 2014). Additionally, nucleotide diversity and the number and fraction of segregating sites were inferred, using the same R package pegas (v1.2). Several of the HEV sequences were incomplete, with a variable number of nucleotides missing at one or both ends. In assessing the diversity metrics, the respective sequences were excluded.
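For orientation, the three Hill numbers can be computed directly from observed haplotype counts, as in the Python sketch below. This covers only the empirical values at the observed sample size; it does not reproduce the interpolation and extrapolation performed by iNEXT. The function name and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def hill_number(counts, q):
    """Empirical Hill number of order q from a list of haplotype counts."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    if q == 1:
        # Limit q -> 1: exponential of Shannon's entropy
        return math.exp(-sum(p * math.log(p) for p in props))
    return sum(p ** q for p in props) ** (1 / (1 - q))


# Hypothetical toy data: identical strings are treated as one haplotype.
sequences = ["AAC", "AAC", "AAC", "AAC", "AGC", "AGC", "TTC", "TTC"]
counts = list(Counter(sequences).values())

richness = hill_number(counts, 0)  # q = 0: number of haplotypes
shannon = hill_number(counts, 1)   # q = 1: exp(Shannon entropy)
simpson = hill_number(counts, 2)   # q = 2: inverse Simpson concentration
print(richness, shannon, simpson)
```

Higher orders of q give more weight to abundant haplotypes, which is why the three values diverge when a few haplotypes dominate the sample.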
2.7. Phylodynamics
To test the usability of genome fragments in assessing changes in viral population size we have selected a subset of the full genomes with known sampling dates and country of origin. This consisted of sequences originating from France, as these were the most heterochronous ones (2004–2017). Using this set, root-to-tip regression analyses were performed in TempEst v1.5.3 (Rambaut et al., 2016). Close inspection of the regression plots, and removal of the isolates with extremely high residuals resulted in a subset of 62 HEV sequences.
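The regression step can be sketched in Python as an ordinary least-squares fit of root-to-tip distance on sampling date, followed by flagging of tips with large residuals. This is only an approximation of what TempEst does, since it takes the root-to-tip distances as given and does not optimise the root position. The function names, the residual threshold, and the example values are hypothetical.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance on sampling date.

    Returns (slope, intercept, residuals); the slope approximates the
    substitution rate and -intercept/slope the age of the root.
    """
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(dates, distances)]
    return slope, intercept, residuals


def flag_outliers(residuals, threshold):
    """Indices of tips whose absolute residual exceeds the threshold."""
    return [i for i, r in enumerate(residuals) if abs(r) > threshold]


# Hypothetical sampling years and root-to-tip distances (substitutions/site).
years = [2004, 2008, 2012, 2016]
dists = [0.10, 0.14, 0.18, 0.22]
slope, intercept, residuals = root_to_tip_regression(years, dists)
print(slope, flag_outliers(residuals, threshold=0.01))
```

A positive slope indicates temporal signal; isolates flagged by `flag_outliers` correspond to the ones that would be inspected and possibly removed.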
The phylogenies were inferred in BEAST 2 (v2.6.3) using an Extended Bayesian Skyline demographic model (Bouckaert et al., 2019; Drummond and Rambaut, 2007; Heled and Drummond, 2008), with a relaxed clock model and a substitution model best fitting the sequence data, as inferred for each of the sequenced fragments with ModelTest-NG v0.1.6 (Darriba et al., 2020). To explore the convergence of the posterior estimates of the various parameters sampled by the Markov Chain (effective sample size > 200) we have used Tracer (v.1.7.1) (Rambaut et al., 2018). The log files were combined in LogCombiner after removing a burn-in of 10 % and a maximum clade credibility tree was generated with TreeAnnotator.
To further infer trends of HEV population dynamics in the Netherlands, we have used the 493 nt fragment, and generated a subset thereof that would be similar in composition and size with the subset of sequences from France. The phylogeny and population size history were inferred as described above.
3. Results
3.1. Alignments
The final alignment of the whole genome sequences consisted of 1068 sequences and 6889 nucleotides (Supplementary material 1). The fragments trimmed from it corresponded to the following positions: ORF1 (positions 1–4866; length 4866 nt), complete ORF2 (positions 4904–6889; length 1986 nt), non-overlapping ORF2 (positions 5235–6889; length 1652 nt), a 493 nt fragment of ORF2 (positions 5743–6235), a 148 nt fragment of ORF2 (positions 6079–6226), and ORF3 (positions 4863–5234; length 372 nt).
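Trimming an aligned sequence to these coordinates amounts to slicing with 1-based inclusive positions, as in the following Python sketch. The coordinates are taken from the text above for a subset of the fragments; the dictionary keys, the function name, and the placeholder sequence are hypothetical.

```python
# 1-based inclusive alignment coordinates, as listed in the text.
FRAGMENTS = {
    "ORF1": (1, 4866),
    "ORF2": (4904, 6889),
    "ORF3": (4863, 5234),
    "493nt": (5743, 6235),
    "148nt": (6079, 6226),
}


def extract_fragment(aligned_seq: str, start: int, end: int) -> str:
    """Slice an aligned sequence using 1-based inclusive coordinates."""
    if start < 1 or end > len(aligned_seq) or start > end:
        raise ValueError("coordinates outside the alignment")
    return aligned_seq[start - 1:end]


# Placeholder sequence with the length of the final alignment (6889 columns).
genome = "A" * 6889
lengths = {
    name: len(extract_fragment(genome, start, end))
    for name, (start, end) in FRAGMENTS.items()
}
print(lengths)
```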
From the HEVnet database we retrieved 1013 HEV sequences. Of these, only two covered the whole genome (−7:7225 and 21:7230 of reference strain MN614141), 163 sequences covered a fragment of ORF1 (with variable lengths from 242 to 371 nt, corresponding to positions 77:448 of the reference strain MN614141), and the majority of the sequences covered a fragment of ORF2 (with variable lengths from 148 to 1390 nt, corresponding to positions 5732:7121 of the reference strain MN614141). The alignment of these sequences has been trimmed to correspond to the 493 nt and 148 nt fragments of ORF2 (n = 653). These sequences came from 3 different sources (human n = 395, animal n = 183, food n = 75).
3.2. Subtype assessment
Of the 1068 sequences in the whole genome alignment, 47 were reference sequences. The full genomes belonged to 32 subtypes, of which five contained more than 50 sequences each; these were 3f (n = 258/1021, 25.27 %), 3c (n = 128/1021, 12.54 %), 4c (n = 124/1021, 12.14 %), 3b (n = 101/1021, 9.89 %), and 3a (n = 95/1021, 9.3 %) (Supplementary material 2).
The HEVnet dataset of HEV-3 comprised primarily subtype 3c (n = 582), followed by 3f (n = 40). The remaining subtypes were represented by fewer than 10 sequences each (3e n = 9, 3a n = 6, 3m n = 3, 3i n = 2), and 11 sequences were closer to the HEV-3 reference than to any of the subtypes.
3.3. Diagnostic accuracy
The FM concordance index of the subtypes predicted by the various fragments and the actual subtypes (full genome) decreased with the length of the fragment used, with a maximum of 1 for ORF1, followed by ORF2 (0.98), and the 493 nt fragment (0.91), with ORF3 having the lowest performance (0.73) (calculated based on Supplementary material 2).
The best accuracy metrics were observed for the ORF1 fragment, with an overall accuracy of 0.99, and balanced accuracies for the main subtypes of: 1 for 3f, 1 for 3c, 1 for 4c, 1 for 3b, and 1 for 3a (Supplementary material 3). This was followed by the ORF2 fragment with overall accuracy of 0.97, and balanced accuracies for the main subtypes of: 1 for 3f, 1 for 3c, 0.95 for 4c, 0.99 for 3b, and 1 for 3a. The overall accuracy of the subtype predicted by the 493 nt fragment was 0.86, with balanced accuracies for the main subtypes of: 0.96 for 3f, 1 for 3c, 0.58 for 4c, 0.98 for 3b, and 0.99 for 3a (Supplementary material 3). The F1 score was highest for 3c (1), followed by 3a (0.98), 3b (0.98), 3f (0.96), and 4c (0.28). Among the less prevalent subtypes but still fairly well represented in the dataset (> 20 sequences), better predictive scores were obtained for 3e, 1f, 4i, 4a, 1g, and 3h (all with balanced accuracies and F1 scores > 0.9; Supplementary material 3).
3.4. Sequence diversity
A variable number of sequences was used for assessment of diversity metrics, according to their completeness (Table 1). The highest haplotype diversity was observed for the concatenated ORF1, ORF2, and ORF3 (0.95), followed by ORF1 (0.92), and ORF2 (0.9; Table 1). The 493 nt fragment showed a low haplotype diversity (0.78), but still higher than that of ORF3 (0.73). The highest nucleotide diversity was observed for ORF1 (0.21), followed by ORF1,2,3 (0.19), while the lowest was observed for ORF3 (0.1). The fraction of segregating sites was similar for ORF1 (0.69), ORF3 (0.68), ORF1,2,3 (0.68), and ORF2 (0.67), whereas lower fractions were observed for the 493 nt (0.6) and 148 nt (0.57) fragments (Table 1).
Table 1.
Sequence diversity metrics for the various fragments of the HEV genome.
| Fragment | Length (nt) | Sequences (n) | Haplotypes (n) | Haplotype diversity | Nucleotide diversity | Segregating sites (n) | Fraction of segregating sites |
|---|---|---|---|---|---|---|---|
| 148 nt | 148 | 1068 | 698 | 0.654 | 0.155 | 85 | 0.574 |
| 493 nt | 493 | 1068 | 836 | 0.783 | 0.171 | 298 | 0.604 |
| ORF1 | 4866 | 1010 | 933 | 0.924 | 0.206 | 3363 | 0.691 |
| ORF2 | 1986 | 998 | 899 | 0.901 | 0.162 | 1327 | 0.668 |
| ORF2 (non-overlapping) | 1652 | 998 | 878 | 0.880 | 0.176 | 1104 | 0.668 |
| ORF3 | 375 | 1068 | 777 | 0.728 | 0.101 | 256 | 0.683 |
| ORF1,2,3 | 6893 | 954 | 902 | 0.945 | 0.192 | 4685 | 0.680 |
The 653 sequences of HEV-3 in HEVnet comprised 603 haplotypes, with a haplotype diversity of 0.92 and a nucleotide diversity of 0.08. The number of segregating sites was 275 (segregating sites fraction = 0.56).
The Hill numbers showed the lowest values for the 148 nt and ORF3 fragments, and the highest for the ORF1,2,3, ORF1, and ORF2 fragments (Fig. 2). Extrapolation of the Shannon diversity to double the current sample size indicated an increase in the Shannon diversity of 1.92 for the concatenated ORF1,2,3, and only 1.67 for the 493 nt fragment. Similarly, the increase in the Simpson diversity would be 1.7 for ORF1,2,3 vs. 1.35 for the 493 nt fragment.
Fig. 2.
Diversity of HEV sequences expressed as a function of Hill numbers. The three panels correspond to: species richness (q = 0), Shannon diversity (q = 1), and Simpson diversity (q = 2). The x-axis depicts the sample size, while the y-axis depicts the corresponding value of the diversity index. The various colours correspond to the fragments of the HEV genome used to infer the diversity indices. The dots indicate the values corresponding to the observed sample sizes; a continuous line indicates values corresponding to sample sizes smaller than the ones observed; a dashed line indicates values corresponding to sample sizes larger than the observed ones.
3.5. Phylodynamics
For the ORF1 and ORF2 fragments the selected model was General Time Reversible (GTR) (Rodríguez et al., 1990). For the 148 nt and 493 nt fragments, a TIM3 substitution model (AC=AT, CG=GT) was selected and for the ORF3 fragment a TPM2 model (AC=AT, CG=GT and AG=CT). The root-to-tip regression coefficients indicated generally a good temporal signal (Supplementary material 4).
The reconstructed phylogenies of the test dataset indicated a good concordance between the various genomic fragments and ORF1,2,3, with values of Baker's gamma of 0.9893 for the ORF2, 0.9925 for the 148 nt fragment, 0.9954 for the ORF3, 0.9964 for the 493 nt fragment, and 0.9994 for the ORF1. The major clades in the phylogenies followed the same diversification pattern for all fragments (Fig. 3, Supplementary material 5). The date of the root was however different, with the 148 nt and 493 nt fragments placing it at ∼150, ORF2 ∼223, and ORF1,2,3 ∼256 years in the past.
Fig. 3.
Comparison of time-scaled phylogenies inferred from the 493 nt fragment and the concatenated ORF1,2,3.
The population dynamics of HEV also showed consistent patterns of effective population size increase in recent years, with an HPD interval for the sum (indicators.alltrees) of [1,3]. The relative increase in the population size was higher when inferred from the whole genome than from any of the other fragments. Similarly, the 95 % HPD interval was much narrower when the whole genome was used as compared to the other fragments (Fig. 4). The combination of low resolution and broad HPD for the shorter fragments means that the time point at which a population size increase is detected (i.e. the median of the population size surpasses the upper HPD) is estimated closer to the present day; thus, this time point varies from ∼ 7.5 years in the past for the 148 nt fragment, to ∼ 12.5 years for the 493 nt fragment, to ∼ 27 years for the ORF2, and ∼ 53 years for the combined ORF1,2,3 fragment (Fig. 4).
Fig. 4.
Extended Bayesian skyline plots corresponding to the various fragments of the genome in the test dataset; the x-axis indicates the time from the last sampled sequences to the root of the tree while the y-axis indicates the effective population size of HEV.
An increase in the viral effective population size was also observed from the phylogeny of HEV sequences from The Netherlands (Fig. 5, panel A and Supplementary material 6), both for 3c (Fig. 5, panel B) and for the combined 3c and 3f sequences, although to a lesser extent for the latter (Supplementary material 6). The increase in effective population size of subtype HEV-3c is estimated to have started ∼ 19 years earlier (i.e. ∼2003), and is concordant with the data from epidemiological surveillance of HEV (Fig. 5, panel C).
Fig. 5.
Extended Bayesian skyline plot corresponding to the 493 nt fragment of the genome in the NL HEV3c dataset (B) and the associated phylogeny (A). B: the x-axis indicates the time from the last sampled sequences to the root of the tree while the y-axis indicates the effective population size of HEV. C: number of cases of HEV acute infection in humans in the NL in the past decades, with smoothed temporal trend.
4. Discussion
4.1. Viral population processes can be approximated with a short 493 nt genomic fragment
While whole-genome sequencing has become essential for analyzing biological, ecological, and epidemiological processes in a wide range of organisms, the quality, abundance, and sample matrix of the nucleic acids, and potentially financial limitations, do not always allow the retrieval of a whole genome. In this study, we aimed to determine whether the most commonly sequenced 493 nt fragment of HEV is suitable for performing accurate diagnostics or inferring viral population-scale processes. For this purpose, we have performed comparative analyses of various fragment lengths (ORF1,2,3, ORF1, ORF2, ORF3, 493 nt, and 148 nt) taken from a set of HEV whole genome sequences. These different fragment lengths were used to infer the sequence diversity, typing accuracy, and potential use in phylodynamics.
For diagnostic purposes, a fast and reliable typing method is required, and that often has to make use of a short nucleic acid sequence. We observed that, generally, the longer the nucleic acid fragment used in typing, the better the accuracy in predicting the viral subtype, as inferred from the FM index, the balanced accuracy, and the F1 score, with the exception of ORF3, which performed worse than the 493 nt fragment for all metrics. However, some of the dominant HEV subtypes circulating in Europe (e.g. HEV-3c, HEV-3e) (Hogema et al., 2021; Nicot et al., 2018) were relatively well classified even by the 493 nt fragment, with false negative rates as low as 8 in 1000 typed sequences. While ORF1 and ORF2 were better predictors for most of the genotypes/subtypes, HEV-1 and subtype HEV-1f were better predicted using the shorter 493 nt fragment. A possible explanation for this unusual behavior is that the sites discriminating HEV-1 from HEV-1f are concentrated in the region of the 493 nt fragment, so that adding extra sites to the prediction only dilutes the signal. The overall lower performance of the 493 nt fragment is due specifically to subtype HEV-4c, often misclassified as either subtype HEV-4f or HEV-4h. The low balanced accuracy observed for both ORF2 and 493 nt in several subtypes (i.e. HEV-3g and HEV-3i) could be explained by the differential distribution of the segregating sites across the HEV genome.
Thus, among the individual fragments, the highest haplotype diversity, nucleotide diversity, and segregating sites fraction were observed in the ORF1 fragment. This can be partly explained by the length of this ORF and the ensuing higher number of sites that can segregate, as opposed to the shorter fragments, which become saturated with mutations more quickly and are more likely to diverge from the infinite sites hypothesis, i.e. to have a higher chance of back or recurrent mutations (Philippe et al., 2011). The lower discriminatory power of the shorter fragments was also underlined by the faster flattening of the curves corresponding to the ecological indices of richness, Shannon and Simpson diversity when extrapolated to higher sample sizes than the ones our analysis was based on. Another explanation for the increased nucleotide diversity observed in ORF1 might lie in its biological function. It encodes nonstructural proteins which are required for genomic replication and is responsible for the attenuation of the virus (Pudupakam et al., 2009). On the other hand, the key function of ORF3 is viral particle assembly and release, operating as an ion channel (Ding et al., 2017); as this is a vital function in all HEV strains, it could imply a more conserved RNA sequence and thus lower nucleotide diversity.
Phylogenetics is useful in identifying the origin of a pathogen, but also in inferring evolutionary relationships between various lineages/strains. In this analysis, we have used a time-scaled Bayesian estimate of phylogeny with the Extended Bayesian Skyline Plot method (Heled and Drummond, 2008), as this allows the use of multiple genomic loci, each with its own model of evolution. The analysis of the various genomic fragments yielded comparable results, even between the short 493 nt fragment and the approximation of the whole genome (i.e. the concatenated loci of the three ORFs; Fig. 4). Most branching events are concordant between the two trees with only minor differences. The similar topology of the phylogenetic trees indicates that the evolutionary past of the virus could also be inferred from smaller fragments, albeit with differences in timing: the divergence of subtypes 3c and 3f is dated at about 75 years before the most recent tips in the 493 nt phylogeny, and at about 125 years in the combined phylogeny of the three ORFs. This is likely due to the difference in length between the fragments, as shorter fragments become saturated with mutations sooner (Philippe et al., 2011). The clustering of the subtypes in the two phylogenies is largely concordant with all subtypes forming monophyletic clusters; however, the order of the divergence events differs, with the split of HEV-3m predating the split of the (HEV-3c, HEV-3h, HEV-3l) clade in the 493 nt phylogeny, and postdating it in the whole genome phylogeny.
When looking at the population dynamics of HEV we see consistent patterns of effective population size increase in recent years for all the fragments. This increase is consistent with the previously reported increase in the number of cases in France (Adlhoch et al., 2016; Mansuy et al., 2016). However, the shorter fragments display broader 95 % HPD intervals and a less obvious increase compared to the whole genome. This is to be expected, as it has been previously shown that accurate population size dynamics can only be recovered from multi-locus data (Heled and Drummond, 2008). Nevertheless, the good concordance between the population dynamics as inferred from the 148 nt, 493 nt, ORF2, and ORF1,2,3 fragments indicates that shorter fragments might be successfully used to infer the recent demographic past of the viral population.
4.2. Characterisation of the HEV population in the Netherlands using the 493 nt sequences in HEVnet
The NL sequences in HEVnet come mostly from HEV-3, with an overrepresentation of subtype HEV-3c, indicating that this is the dominant subtype circulating in NL. This is also consistent with previous studies on HEV sequences in the Netherlands (Hogema et al., 2021). The various OneHealth sectors were unevenly represented, with more than half of the sequences coming from humans, and only ∼10 % coming from food produce or farmed pigs. This reflects on the one hand the focus on public health, and on the other hand the difficulty of retrieving viral sequences from food (Boxman et al., 2019) and fecal samples from farmed pigs. These difficulties are related to the scarcity and fragmentation of the genetic material in processed foods (Szabo et al., 2015), the lack of overt clinical symptoms of HEV infection in pigs, and the presence of inhibitory chemical compounds in the fecal samples retrieved from pig stalls (Ward et al., 2009).
Phylogenetic reconstruction based on the subset of HEV-3c and HEV-3f from the Netherlands indicates distinct clades of the two subtypes with a split occurring ∼100 years ago and the diversification of the two clades occurring in the recent past (<50 years ago). New viral variants will develop when the virus has a high prevalence in the reservoir host or when new hosts are present. In the beginning of the twentieth century, there was a clear change in pig holding with the introduction of new pig breeds in the Netherlands (Slachthuis and van der Berg, 2010). In the Netherlands intensive farming has increased from three million pigs in the 1960s to 15 million in the 1990s and the number of pigs per farm has increased fivefold (CBS, 1999). These events could have influenced the observed diversity of HEV.
The viral effective population size, as inferred from the NL HEV sequences, started increasing about 20 years ago and the increase stabilized about 10 years ago. This observation is in line with the increase in incidence of clinical cases that has been reported in many EU member states, including the Netherlands (Aspinall et al., 2017; Hogema et al., 2016). The broader 95 % HPD interval observed in the last few years could be due to an actual decrease in the HEV incidence, as observed from the national HEV surveillance (RIVM, NL) and the blood of donors (Sanquin, NL, personal communication). The steeper increase in the HEV-3c sequences compared to the combined HEV-3c and HEV-3f sequences indicates the increase is mainly caused by HEV-3c. A shift in HEV subtypes in the Netherlands in the past years has been described, particularly in the swine population, from HEV-3f towards HEV-3c, which might be consistent with an increase in the population size of HEV-3c (Hogema et al., 2021).
5. Conclusion
In this study we investigated the suitability of hepatitis E virus genomic fragments of various lengths for reliable diversity and evolutionary analyses. We observed that, although longer genomic fragments perform better, as expected, some shorter fragments, such as the 493 nt fragment of ORF2, provide enough information for analyzing HEV strains currently circulating in Europe. The differences in HEV population structure among large geographical regions (i.e. continents), coupled with the observation that the accuracy of the 493 nt fragment might be highest for HEV-3, might lower the performance of this fragment in analyses of HEV from elsewhere.
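As an illustration of how fragment-based typing can be scored against a reference typing, the following is a minimal Python sketch of a per-subtype false negative rate. It assumes that the subtype assigned from the whole genome serves as the reference label and that a false negative is a sequence of a given subtype that the fragment assigns to any other subtype; this definition, the function name `false_negative_rates` and the toy labels are assumptions made for illustration and are not taken from the study.

```python
from collections import Counter


def false_negative_rates(reference, predicted):
    """Return, per reference subtype, the share of its sequences that the
    fragment-based typing assigned to a different subtype."""
    if len(reference) != len(predicted):
        raise ValueError("label lists must have equal length")
    # Number of sequences per subtype according to the reference typing.
    totals = Counter(reference)
    # Number of sequences per reference subtype that were typed differently.
    misses = Counter(r for r, p in zip(reference, predicted) if r != p)
    return {subtype: misses[subtype] / n for subtype, n in totals.items()}


# Toy labels, invented for illustration only (not study data).
reference = ["3c", "3c", "3c", "3c", "3f", "3f", "3a", "3a"]
predicted = ["3c", "3c", "3c", "3f", "3f", "3f", "3a", "3c"]
rates = false_negative_rates(reference, predicted)
```

In this toy example one of four HEV-3c sequences and one of two HEV-3a sequences are mistyped, so the sketch yields rates of 0.25 and 0.5 for those subtypes and 0.0 for HEV-3f.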
Research funding
This research was funded by the European Union's Horizon 2020 Research and Innovation programme, grant agreement No 773830: One Health European Joint Programme, internal project TRACE.
CRediT authorship contribution statement
Renate W. Hakze-van der Honing: Data curation, Formal analysis, Writing – original draft, Writing – review & editing. Eelco Franz: Funding acquisition, Writing – review & editing. Wim H.M. van der Poel: Funding acquisition, Writing – review & editing. Claudia E. Coipan: Conceptualization, Data curation, Formal analysis, Supervision, Visualization, Writing – original draft, Writing – review & editing.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
The Dutch HEVnet sequences were used with permission from: Ingeborg Boxman, Wageningen Food Safety Research, WUR, the Netherlands (sequences retrieved from samples collected by the Netherlands Food and Consumer Product Safety Authority, the Netherlands); Hans Zaaijer, Sanquin (Dutch blood supply), the Netherlands; and Harry Vennema, National Institute for Public Health and the Environment, the Netherlands. We thank them for the use of the sequences.
Footnotes
Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.virusres.2024.199429.
Data availability
Data will be made available on request.
References
- Adlhoch C., Avellon A., Baylis S.A., Ciccaglione A.R., Couturier E., de Sousa R., Epštein J., Ethelberg S., Faber M., Fehér Á., et al. Hepatitis E virus: assessment of the epidemiological situation in humans in Europe, 2014/15. J. Clin. Virol. 2016;82:9–16. doi: 10.1016/j.jcv.2016.06.010.
- Aspinall E.J., Couturier E., Faber M., Said B., Ijaz S., Tavoschi L., Takkinen J., Adlhoch C., et al. Hepatitis E virus infection in Europe: surveillance and descriptive epidemiology of confirmed cases, 2005 to 2015. Eurosurveillance. 2017;22:30561. doi: 10.2807/1560-7917.ES.2017.22.26.30561.
- Bouckaert R., Vaughan T.G., Barido-Sottani J., Duchêne S., Fourment M., Gavryushkina A., Heled J., Jones G., Kühnert D., De Maio N., et al. BEAST 2.5: an advanced software platform for Bayesian evolutionary analysis. PLoS Comput. Biol. 2019;15 doi: 10.1371/journal.pcbi.1006650.
- Boxman I.L., Jansen C.C., Hägele G., Zwartkruis-Nahuis A., Cremer J., Vennema H., Tijsma A.S. Porcine blood used as ingredient in meat productions may serve as a vehicle for hepatitis E virus transmission. Int. J. Food Microbiol. 2017;257:225–231. doi: 10.1016/j.ijfoodmicro.2017.06.029.
- Boxman I.L., Jansen C.C., Hägele G., Zwartkruis-Nahuis A., Tijsma A.S., Vennema H. Monitoring of pork liver and meat products on the Dutch market for the presence of HEV RNA. Int. J. Food Microbiol. 2019;296:58–64. doi: 10.1016/j.ijfoodmicro.2019.02.018.
- Central Bureau of Statistics (CBS), 1999 www.cbs.nl/nl-nl/nieuws/1999/23/opkomst-van-de-intensieve-veehouderij.
- Chao A., Gotelli N.J., Hsieh T., Sander E.L., Ma K., Colwell R.K., Ellison A.M. Rarefaction and extrapolation with Hill numbers: a framework for sampling and estimation in species diversity studies. Ecol. Monogr. 2014;84:45–67.
- Choi I.-S., Kwon H.-J., Shin N.-R., Yoo H.S. Identification of swine hepatitis E virus (HEV) and prevalence of anti-HEV antibodies in swine and human populations in Korea. J. Clin. Microbiol. 2003;41:3602–3608. doi: 10.1128/JCM.41.8.3602-3608.2003.
- Darriba D., Posada D., Kozlov A.M., Stamatakis A., Morel B., Flouri T. ModelTest-NG: a new and scalable tool for the selection of DNA and protein evolutionary models. Mol. Biol. Evol. 2020;37:291–294. doi: 10.1093/molbev/msz189.
- Davis C.A., Haywood B., Vattipally S., Filipe A.D.S., AlSaeed M., Smollet K., Baylis S.A., Ijaz S., Tedder R.S., Thomson E.C., et al. Hepatitis E virus: whole genome sequencing as a new tool for understanding HEV epidemiology and phenotypes. J. Clin. Virol. 2021;139 doi: 10.1016/j.jcv.2021.104738.
- Ding Q., Heller B., Capuccino J.M., Song B., Nimgaonkar I., Hrebikova G., Contreras J.E., Ploss A. Hepatitis E virus ORF3 is a functional ion channel required for release of infectious particles. Proc. Natl. Acad. Sci. 2017;114:1147–1152. doi: 10.1073/pnas.1614955114.
- Drummond A.J., Rambaut A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 2007;7:1–8. doi: 10.1186/1471-2148-7-214.
- Edgar R.C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucl. Acids Res. 2004;32:1792–1797. doi: 10.1093/nar/gkh340.
- Faber M., Askar M., Stark K. Case-control study on risk factors for acute hepatitis E in Germany, 2012 to 2014. Eurosurveillance. 2018;23:17–00469. doi: 10.2807/1560-7917.ES.2018.23.19.17-00469.
- Fowlkes E.B., Mallows C.L. A method for comparing two hierarchical clusterings. J. Am. Stat. Assoc. 1983;78:553–569. doi: 10.2307/2288117.
- Heled J., Drummond A.J. Bayesian inference of population size history from multiple loci. BMC Evol. Biol. 2008;8:1–15. doi: 10.1186/1471-2148-8-289.
- Hogema B.M., Hakze-Van der Honing R.W., Molier M., Zaaijer H.L., Poel W.H.van der. Comparison of hepatitis E virus sequences from humans and swine, The Netherlands, 1998–2015. Viruses. 2021;13:1265. doi: 10.3390/v13071265.
- Hogema B.M., Molier M., Sjerps M., Waal M.de, Swieten P.van, Laar T.van de, Molenaar-de Backer M., Zaaijer H.L. Incidence and duration of hepatitis E virus infection in Dutch blood donors. Transfusion. 2016;56:722–728. doi: 10.1111/trf.13402.
- Hsieh T., Ma K., Chao A. iNEXT: an R package for rarefaction and extrapolation of species diversity (Hill numbers). Methods Ecol. Evol. 2016;7:1451–1456.
- Huang Y., Opriessnig T., Halbur P., Meng X. Initiation at the third in-frame AUG codon of open reading frame 3 of the hepatitis E virus is essential for viral infectivity in vivo. J. Virol. 2007;81:3018–3026. doi: 10.1128/JVI.02259-06.
- Kamar N., Bendall R., Legrand-Abravanel F., Xia N.-S., Ijaz S., Izopet J., Dalton H.R. Hepatitis E. The Lancet. 2012;379:2477–2488. doi: 10.1016/S0140-6736(11)61849-7.
- Lee G.-H., Tan B.-H., Teo E.C.-Y., Lim S.-G., Dan Y.-Y., Wee A., Aw P.P.K., Zhu Y., Hibberd M.L., Tan C.-K., et al. Chronic infection with camelid hepatitis E virus in a liver transplant recipient who regularly consumes camel meat and milk. Gastroenterology. 2016;150:355–357. doi: 10.1053/j.gastro.2015.10.048.
- Mansuy J.M., Gallian P., Dimeglio C., Saune K., Arnaud C., Pelletier B., Morel P., Legrand D., Tiberghien P., Izopet J. A nationwide survey of hepatitis E viral infection in French blood donors. Hepatology. 2016;63:1145–1154. doi: 10.1002/hep.28436.
- Matsuda H., Okada K., Takahashi K., Mishiro S. Severe hepatitis E virus infection after ingestion of uncooked liver from a wild boar. J. Infect. Dis. 2003;188:944–944. doi: 10.1086/378074.
- Meester M., Bouwknegt M., Hakze-van der Honing R., Vernooij H., Houben M., Oort S.van, Poel W.H.van der, Stegeman A., Tobias T. Repeated cross-sectional sampling of pigs at slaughter indicates varying age of hepatitis E virus infection within and between pig farms. Vet. Res. 2022;53:50. doi: 10.1186/s13567-022-01068-3.
- Nicot F., Jeanne N., Roulet A., Lefebvre C., Carcenac R., Manno M., Dubois M., Kamar N., Lhomme S., Abravanel F., et al. Diversity of hepatitis E virus genotype 3. Rev. Med. Virol. 2018;28:e1987. doi: 10.1002/rmv.1987.
- Paradis E., Schliep K. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics. 2019;35:526–528. doi: 10.1093/bioinformatics/bty633.
- Philippe H., Brinkmann H., Lavrov D.V., Littlewood D.T.J., Manuel M., Wörheide G., Baurain D. Resolving difficult phylogenetic questions: why more sequences are not enough. PLoS Biol. 2011;9 doi: 10.1371/journal.pbio.1000602.
- Pudupakam R., Huang Y., Opriessnig T., Halbur P., Pierson F., Meng X. Deletions of the hypervariable region (HVR) in open reading frame 1 of hepatitis E virus do not abolish virus infectivity: evidence for attenuation of HVR deletion mutants in vivo. J. Virol. 2009;83:384–395. doi: 10.1128/JVI.01854-08.
- Purdy M.A., Drexler J.F., Meng X.J., Norder H., Okamoto H., Van der Poel W.H., Reuter G., de Souza W.M., Ulrich R.G., Smith D.B. ICTV virus taxonomy profile: Hepeviridae 2022. J. General Virol. 2022;103(9) doi: 10.1099/jgv.0.001778.
- Rambaut A., Drummond A.J., Xie D., Baele G., Suchard M.A. Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Syst. Biol. 2018;67:901–904. doi: 10.1093/sysbio/syy032.
- Rambaut A., Lam T.T., Max Carvalho L., Pybus O.G. Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen). Virus Evol. 2016;2:vew007. doi: 10.1093/ve/vew007.
- Rodríguez F., Oliver J.L., Marín A., Medina J.R. The general stochastic model of nucleotide substitution. J. Theor. Biol. 1990;142:485–501. doi: 10.1016/S0022-5193(05)80104-3.
- Rutjes S., Bouwknegt M., Van Der Giessen J., de Roda Husman A.M., Reusken C. Seroprevalence of hepatitis E virus in pigs from different farming systems in the Netherlands. J. Food Prot. 2014;77:640–642. doi: 10.4315/0362-028X.JFP-13-302.
- Rutjes S.A., Lodder W.J., Bouwknegt M., de Roda Husman A.M. Increased hepatitis E virus prevalence on Dutch pig farms from 33 to 55 % by using appropriate internal quality controls for RT-PCR. J. Virol. Methods. 2007;143:112–116. doi: 10.1016/j.jviromet.2007.01.030.
- Rutjes S.A., Lodder W.J., Lodder-Verschoor F., Van den Berg H.H., Vennema H., Duizer E., Koopmans M., Roda Husman A.M.de. Sources of hepatitis E virus genotype 3 in the Netherlands. Emerg. Infect. Dis. 2009;15:381. doi: 10.3201/eid1503.071472.
- Slachthuis H, van der Berg R. (2010) van everzwijn tot vleesvarken. https://edepot.wur.nl/244587.
- Smith D.B., Izopet J., Nicot F., Simmonds P., Jameel S., Meng X.-J., Norder H., Okamoto H., Van der Poel W.H., Reuter G., et al. Update: proposed reference sequences for subtypes of hepatitis E virus (species Orthohepevirus A). J. Gen. Virol. 2020;101:692. doi: 10.1099/jgv.0.001435.
- Smith D.B., Simmonds P., Jameel S., Emerson S.U., Harrison T.J., Meng X.-J., Okamoto H., Van der Poel W.H., Purdy M.A., International Committee on the Taxonomy of Viruses Hepeviridae Study Group. Consensus proposals for classification of the family Hepeviridae. J. Gen. Virol. 2014;95:2223. doi: 10.1099/vir.0.068429-0.
- Smith D.B., Vanek J., Ramalingam S., Johannessen I., Templeton K., Simmonds P. Evolution of the hepatitis E virus hypervariable region. J. Gen. Virol. 2012;93:2408. doi: 10.1099/vir.0.045351-0.
- Szabo K., Trojnar E., Anheyer-Behmenburg H., Binder A., Schotte U., Ellerbroek L., Klein G., Johne R. Detection of hepatitis E virus RNA in raw sausages and liver sausages from retail in Germany using an optimized method. Int. J. Food Microbiol. 2015;215:149–156. doi: 10.1016/j.ijfoodmicro.2015.09.013.
- Tei S., Kitajima N., Takahashi K., Mishiro S. Zoonotic transmission of hepatitis E virus from deer to human beings. Lancet. 2003;362:371–373. doi: 10.1016/S0140-6736(03)14025-1.
- Tulen A.D., Vennema H., Pelt W.van, Franz E., Hofhuis A. A case-control study into risk factors for acute hepatitis E in the Netherlands, 2015–2017. J. Infect. 2019;78:373–381. doi: 10.1016/j.jinf.2019.02.001.
- Van der Poel W.H. Food and environmental routes of hepatitis E virus transmission. Curr. Opin. Virol. 2014;4:91–96. doi: 10.1016/j.coviro.2014.01.006.
- Ward P., Poitras E., Leblanc D., Letellier A., Brassard J., Plante D., Houde A. Comparative analysis of different TaqMan real-time RT-PCR assays for the detection of swine hepatitis E virus and integration of feline calicivirus as internal control. J. Appl. Microbiol. 2009;106:1360–1369. doi: 10.1111/j.1365-2672.2008.04104.x.