Author manuscript; available in PMC: 2023 Apr 1.
Published in final edited form as: J Virol Methods. 2022 Feb 14;302:114493. doi: 10.1016/j.jviromet.2022.114493

Within-host Quantitation of Anellovirus Genome Complexity from Clinical Samples

Peng Peng a,d, Yanjuan Xu a, Rajeev Aurora c, Adrian M Di Bisceglie a,b, Xiaofeng Fan a,b,*
PMCID: PMC8900665  NIHMSID: NIHMS1781202  PMID: 35176352

Abstract

Anellovirus (AV) is a ubiquitous and diverse virus in the human population. An individual can be infected with multiple AV genera and species that form a heterogeneous repertoire, called the anellome. Due to its exceptional genetic diversity, efficient evaluation of anellome complexity remains a methodological challenge. In the current study, AV genome was first enriched from patient serum samples through two-phase rolling circle amplification. Following Illumina sequencing, anellome was analyzed with an advanced bioinformatics pipeline, including read extraction at three similarity levels, de novo assembly, species assignment, and determination of relative abundance among AV variants. The method was validated in the mock sample and then applied to 21 hepatitis C virus (HCV) patients with and without hepatocellular carcinoma (HCC). Overall, there was a large variance regarding AV richness, ranging from 2 to 51 AV species. In contrast to HCV patients without HCC, HCC incidence was associated with reduced richness (12.6±14.4 vs. 35.4±13.6, p=0.001) and Shannon entropy (0.4±0.34 vs. 0.61±0.12, p=0.095) at the AV species level. Interestingly, AV genus beta and gamma expanded in the anellome in 7 of 10 HCC patients. These observations shed light on the potential association between anellome and HCC incidence in patients with chronic HCV infection. The method presented here represents a valuable tool to investigate the role of anellome in human health and disease.

Keywords: Anellovirus, virome, next-generation sequencing, rolling circle amplification, hepatitis C virus, hepatocellular carcinoma

1. Introduction

Anellovirus (AV), also known as torque teno virus (TTV), is a non-enveloped circular single-stranded DNA virus (Simmonds and Sharp, 2016). Following the discovery of AV in 1997 (Nishizawa et al., 1997), its potential link with human diseases has been extensively examined (Spandole et al., 2015). As of now, there is no solid evidence supporting its etiological role in human diseases. Using next-generation sequencing (NGS) or PCR with primers from its conserved domain, AV infection appears to be ubiquitous in the human population (Shulman and Davidson, 2017). AV has also been detected in a wide array of mammals (Li et al., 2015; Manzin et al., 2015; Nishiyama et al., 2014; Zhang et al., 2017). Genome analysis reveals an exceptional diversity of the AVs, which together belong to the newly created viral family Anelloviridae (Simmonds and Sharp, 2016). Based on nucleic acid differences of the AV open reading frame 1 (ORF1), human AVs can be divided into three genera (≥56% sequence difference): alpha-, beta-, and gammatorquevirus. Each genus consists of multiple species (≥36% sequence difference), with currently 29 species in alpha-, 16 in beta-, and 15 in gammatorquevirus in the National Center for Biotechnology Information (NCBI) reference database (Maggi et al., 2009; Simmonds and Sharp, 2016). The number of species in each genus is likely underestimated, as new species are continuously being identified (Ng et al., 2017; Tisza et al., 2020; Arze et al., 2021; Cebriá-Mendoza et al., 2021). An individual can be infected with multiple AV species or genera that constitute a repertoire (Li et al., 2019). By analogy with the gut microbiome, this repertoire has recently been named the anellome (Kaczorowska and van der Hoek, 2020). If AV does have a role in human health and disease, the anellome, rather than individual viral isolates, may show signs of dysbiosis associated with a given disease.
Unlike bacteria, viruses lack conserved and species-distinct domains, such as 16S rRNA, that can be used to categorize the heterogeneity within an ecological niche (Wang, 2020). While NGS is a solution, it has low sensitivity because clinical specimens are dominated by host nucleic acids. Extreme AV genomic diversity further adds to the bioinformatics challenge of precisely measuring anellome heterogeneity. Here, we present a method for the quantitative estimation of within-host AV complexity. The method was applied to profile the serum anellome in hepatitis C virus (HCV) patients with and without hepatocellular carcinoma (HCC).

2. Materials and methods

2.1. Patient samples

A total of twenty-nine serum samples were included in the current study. Of these, eight came from blood donors who were investigated for the human virome in our previous study (Li et al., 2019). These samples, which were either AV-positive (n=6) or AV-negative (n=2), were used to optimize the experimental protocol in the current study. The other twenty-one samples were from chronic hepatitis C virus (HCV) patients with (n=10) and without HCC (n=11) based on histological examination (Supplementary Table 1). These samples were archived in the Saint Louis University Liver Center Sample Repository. At the time of collection, no antiviral therapy had been given to these patients. HCV RNA titers were quantitated by Roche Amplicor HCV Monitor (version 2.0). All patients were infected with HCV genotype 1a as determined by line probe assay (Innogenetics, Belgium). The optimized method was used to profile the anellome complexity of these patients. Prior to sample collection, written informed consent was obtained from each subject (both donors and patients). The entire research protocol for the use of these samples was reviewed and approved by the Saint Louis University Institutional Review Board (IRB protocol: SLU10592).

2.2. Anellome sequencing

The serum virome sequencing protocol is detailed in our previous studies (Li et al., 2019; Ren et al., 2020). However, there are two important modifications for the current study. First, since AV is a DNA virus, total DNA rather than RNA was extracted using the QIAamp Circulating Nucleic Acid Kit. An aliquot of 250 μL of serum mixed with 750 μL of phosphate-buffered saline (PBS) (pH 7.0) was used as the starting material, and DNA was finally eluted with 60 μL of AVE buffer. Second, whole genome amplification was previously conducted using template-dependent multiple displacement amplification (tdMDA) (Wang et al., 2017). To enrich AV genomes, the current study applied two-phase rolling circle amplification (RCA), hereafter 2pRCA, with the sequential use of AV-specific primers and the random pentamer primer from our tdMDA protocol. From all 6-nt strings located in the AV non-coding regions (NCR), two AV-specific primers were selected based on three criteria: (i) both primers had 40–60% GC content; (ii) they were conserved among human AV NCR sequences (Fig. 1); and (iii) as determined using the “digester” function in the program HiCUP (Wingett et al., 2015), they had a low number of perfect matches in the human reference genome (GRCh38) (Schneider et al., 2017). In brief, an aliquot of 5 μL of extracted DNA was used in a 40-μL reaction consisting of 1x phi29 DNA polymerase buffer, 1 mM of dNTPs, 0.4 μM each of primers AV1 and AV2 (Table 1), and 20 units of phi29 DNA polymerase (New England Biolabs, Ipswich, MA). After 12 hours of incubation at 30°C, random pentamer primer C28 was added to a final concentration of 40 μM in a final reaction volume of 50 μL (Table 1). The reaction was incubated for an additional 6 hours at 28°C and then terminated by heating at 65°C for 15 minutes.
After purification with the QIAamp DNA mini kit (Qiagen), 2pRCA product was subjected to library construction with the Nextera XT DNA Sample Preparation kit (Illumina, San Diego, CA) and sequenced on the Illumina MiSeq (1 × 250-nt single reads and mid-output) at MOgene (St. Louis, MO).
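The first two primer selection criteria (GC content and conservation across NCR sequences) can be illustrated with a minimal Python sketch. It is not the authors' script: the function names and the toy NCR fragments are hypothetical, and the third criterion (few perfect matches in GRCh38, assessed with HiCUP) is not covered. Note that for a 6-nt string, a 40–60% GC window is satisfied only by exactly three G/C bases, as is the case for both AV1 (cattcg) and AV2 (ccgaat).

```python
from collections import Counter

def gc_fraction(kmer):
    """Fraction of G or C bases in a k-mer."""
    return sum(base in "GC" for base in kmer.upper()) / len(kmer)

def candidate_kmers(sequences, k=6, gc_min=0.40, gc_max=0.60):
    """Map each k-mer inside the GC window to the fraction of sequences containing it."""
    presence = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Use a set so that each k-mer is counted once per sequence.
        presence.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    n = len(sequences)
    return {
        kmer: count / n
        for kmer, count in presence.items()
        if set(kmer) <= set("ACGT") and gc_min <= gc_fraction(kmer) <= gc_max
    }

# Toy NCR fragments, invented for illustration only.
toy_ncr = [
    "AACATTCGTTGACCGAATGG",
    "TTCATTCGAAGTCCGAATCA",
    "GGCATTCGACTTACGAATTT",
]
scores = candidate_kmers(toy_ncr)
fully_conserved = sorted(k for k, frac in scores.items() if frac == 1.0)
print(fully_conserved)
```

In practice the input would be the aligned or unaligned NCR sequences of the reference AV genomes, and the surviving candidates would then be screened against the human genome.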

Fig. 1. The alignment of the partial NCR sequences from 78 AV species in the NCBI RefSeq.

The two 6-nt primers used in 2pRCA were indicated.

Table 1. The list of primers used in the current study.

An asterisk denotes a phosphorothioate bond in the primers for 2pRCA. A restriction site, either PacI or AsiSI, was included at the 5’ ends of primers b43F2, b43R2, b89F2, and b89R2. C28 had its 5’ end blocked by a C18 spacer. I indicates inosine in primer AMTS. Probe AMTPTU was labeled with 6-carboxyfluorescein and 6-carboxytetramethylrhodamine at the 5’ and 3’ ends, respectively, and contained a propynilic chemical group bound to each thymidine. Primer positions were numbered according to different AV strains: TTV12 (NC_014075) for the b43 series, TTV16 (NC_014091) for the b89 series, TTV1 (NC_002076) for AV1 and AV2, and TTV1a (AB017610) for the TaqMan PCR primers. All primers were synthesized by Integrated DNA Technologies (Newark, NJ). NA, not applicable.

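| Primer | Polarity | Sequence (5’→3’) | Position | Application |
|---|---|---|---|---|
| b43F1 | sense | tgtccacaggccaccaac | 17-34 | Nested PCR TTV12 (1,674 bp) |
| b43F2 | sense | ggatctgacgttaattaaaccgaagtctacgtcgtgc | 33-51 | Nested PCR TTV12 (1,674 bp) |
| b43R1 | antisense | atgtgctttgttgcctttcc | 1816-1835 | Nested PCR TTV12 (1,674 bp) |
| b43R2 | antisense | ggatctgacggcgatcgctccattgtttctggagtttgc | 1686-1706 | Nested PCR TTV12 (1,674 bp) |
| b89F1 | sense | cttatggcgaagtctggtac | 39-58 | Nested PCR TTV16 (1,595 bp) |
| b89F2 | sense | ggatctgacgttaattaacctgggccaggtctacgtc | 73-91 | Nested PCR TTV16 (1,595 bp) |
| b89R1 | antisense | tttgttgtggtgggttgtgg | 1814-1833 | Nested PCR TTV16 (1,595 bp) |
| b89R2 | antisense | ggatctgacggcgatcgcgtgggtaagaagtggatggg | 1648-1667 | Nested PCR TTV16 (1,595 bp) |
| AV1 | antisense | catt*c*g | 104-109 | 2pRCA |
| AV2 | antisense | ccga*a*t | 215-220 | 2pRCA |
| C28 | NA | /5Sp18/NNN*N*N | NA | 2pRCA |
| AMTS | sense | gtgccgiaggtgagttta | 177-194 | TaqMan PCR |
| AMTAS | antisense | agcccggccagtcc | 226-239 | TaqMan PCR |
| AMTPTU | sense | tcaaggggcaattcgggct | 205-223 | TaqMan PCR |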

2.3. Determination of relative abundance of AV species

2.3.1. Strategy

To deal with the exceptional diversity of AV, we extracted AV-related NGS reads from raw NGS data at three similarity levels: mapping (high level), blastx (moderate level), and profile hidden Markov model (HMM) (low level) (Fig. 2). The gapped mapper Bowtie 2 was used for mapping with an index of 1,295 human AV sequences (each ≥ 500 bp) retrieved from the NCBI GenBank (Langmead and Salzberg, 2012). After the removal of human reads, the remaining reads were searched with blastx against a database of all translated sequences from the 1,295 human AV genomes. In the HMM step, reads were scanned in HMMER3 under the default settings against ten AV-relevant HMMs, which were collected from vFam (#112, #116, #354, #874, #990, #1650, #6091, and #4814) and Pfam [PF02956 (TT_ORF1) and PF02957 (TT_ORF2)] (Skewes-Cox et al., 2014; Mistry et al., 2021). All extracted reads were de novo assembled using SPAdes (version 3.15.2) (Bankevich et al., 2012). Owing to extreme AV genome diversity, the resulting contigs (≥100 nt) were collapsed at 90% similarity in CD-HIT to minimize background noise. Thus, the analysis had a resolution of 10% nucleotide difference. These contigs were then assigned to species by blastx against 59 full-length human AV ORF1 sequences retrieved from the NCBI viral reference sequences (Fu et al., 2012). For each patient, all AV ORF1 sequences were extracted based on blastx-defined positions in the contigs. These ORF1 sequences and their supporting reads served as the input for Genome Abundance Similarity Correction (GASiC), which estimates relative genome abundances via read alignment (Lindner and Renard, 2013). GASiC accounts for reference genome similarities through a nonnegative least absolute shrinkage and selection operator (LASSO) approach. Four major scripts in GASiC (run_mappers, create_matrix, correct_abundance, and core tool) were revised to be compatible with Python 3.7.
Bowtie 2 (version 2.4.2) was used as the mapper and Mason (version 0.1.2) served as the read simulator (Holtgrewe, 2010). After determining the relative abundance of AV ORF1 sequences, normalized Shannon entropy was computed using the R package QSutils at the AV species and genus levels (Guerrero-Murillo and Font, 2020). Similarly, anellome richness, the total number of AV variants within a patient, was also counted at the two levels.
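The two complexity measures can be sketched in a few lines of Python. This is an illustration rather than the QSutils implementation used in the study: it assumes that normalization divides the Shannon entropy by the natural log of the number of variants present, and the species labels and abundances are invented.

```python
import math
from collections import defaultdict

def richness(abundances):
    """Number of variants with non-zero abundance."""
    return sum(1 for a in abundances if a > 0)

def normalized_shannon(abundances):
    """Shannon entropy divided by ln(number of variants); 0.0 for fewer than two variants."""
    positive = [a for a in abundances if a > 0]
    if len(positive) < 2:
        return 0.0
    total = sum(positive)
    entropy = -sum((a / total) * math.log(a / total) for a in positive)
    return entropy / math.log(len(positive))

def collapse_by_genus(species_abundance, genus_of):
    """Sum species-level abundances within each genus."""
    totals = defaultdict(float)
    for species, abundance in species_abundance.items():
        totals[genus_of[species]] += abundance
    return dict(totals)

# Hypothetical relative abundances for one patient.
profile = {"sp_a1": 0.70, "sp_a2": 0.20, "sp_b1": 0.08, "sp_g1": 0.02}
genus_of = {"sp_a1": "alpha", "sp_a2": "alpha", "sp_b1": "beta", "sp_g1": "gamma"}
genus_profile = collapse_by_genus(profile, genus_of)

print(richness(profile.values()), normalized_shannon(profile.values()))
print(richness(genus_profile.values()), normalized_shannon(genus_profile.values()))
```

A profile dominated by one variant gives an entropy close to 0, whereas evenly distributed variants give a value of 1, independent of richness.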

Fig. 2. A bioinformatics pipeline to profile AV repertoire.

Major steps are indicated together with the programs used, as detailed in our previous studies (Li et al., 2019; Ren et al., 2020).

2.3.2. Estimation of the optimized expect value in blastx analysis

Defined as the number of expected hits of similar quality that could be found just by chance, the Expect (E) value controls the output of blast analysis (Camacho et al., 2009). Due to the exceptional AV genome diversity, a strict E setting would miss AV-related reads, and relaxed E values could recruit non-AV reads that not only increase the computation load but also affect the accuracy of de novo AV genome assembly. An optimized E value became a crucial factor in our analysis pipeline. Using our previous serum virome data pooled from 30 blood donors (b1 through b30) (Li et al., 2019), AV reads were extracted with the index from 1,295 AV genomes (each ≥500 nt). These reads were then mapped on 108 AV genomes from the NCBI viral reference sequences. Under a series of E values, unmapped reads were blastxed against a custom database that contained 367 protein sequences encoded by 108 AV reference genomes (n=319) and 18 HCV and hepatitis B virus (HBV) genomes (n=48) (HCV: NC_038882, NC_004102, NC_030791, NC_009824, NC_009827, NC_009823, NC_009826, NC_009825. HBV: X70185, AF355781, HW610289, HW610288, HW610287, HW610286, HW610285, HW610284, HW610283, HW610282). An optimal E value would maximize the inclusion of AV reads with the least number of reads hit by HBV and HCV.
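The selection rule described above can be written as a small Python sketch. The tolerance of 0.5% for reads assigned to the HBV/HCV decoy proteins is an assumption made for illustration, as are the function name and all values in the example scan except the E = 0.01 entry (94.3% AV reads, 0.2% decoy hits), which is the one reported in the Results.

```python
def pick_e_value(scan, max_decoy_fraction=0.005):
    """Return the E value with the highest AV read recovery among those within the decoy tolerance.

    scan: list of (e_value, av_fraction, decoy_fraction) tuples.
    """
    eligible = [row for row in scan if row[2] <= max_decoy_fraction]
    if not eligible:
        raise ValueError("no E value satisfies the decoy tolerance")
    return max(eligible, key=lambda row: row[1])[0]

# (E value, fraction of AV reads retrieved, fraction assigned to HBV/HCV).
# Only the 1e-2 row comes from the paper; the others are placeholders.
scan = [
    (1e-5, 0.801, 0.000),
    (1e-3, 0.902, 0.001),
    (1e-2, 0.943, 0.002),
    (1e-1, 0.951, 0.031),
    (1.0, 0.955, 0.120),
]
best = pick_e_value(scan)
print(best)
```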

2.3.3. Validation of experimental and analysis approach

Based on our previous study (Li et al., 2019), two AV genomes (TTV12 and TTV16) that cover the entire NCR were amplified using nested PCR from the blood donors. An aliquot of 5 μL of extracted total DNA was mixed with PCR reagents in a 50-μL reaction containing 1x Q5 polymerase buffer, 1x Q5 high GC enhancer, 0.8 mM dNTPs, primers b43F1 and b43R1 (Table 1), both at 0.4 μM, and 1 U of Q5 DNA polymerase (New England Biolabs). Cycling consisted of 94 °C for 2 minutes; 5 cycles of 94 °C for 1 minute, 60 °C for 1 minute, and 72 °C for 1.6 minutes; 25 cycles in which the annealing temperature was lowered to 55 °C (touch-down protocol); and a final 7-minute incubation at 72 °C. A 2-μL aliquot of the first-round PCR product was used for the second round of PCR with primers b43F2 and b43R2 (Table 1) under the same cycle settings. Similarly, TTV16 was amplified with the b89 primer series (Table 1). PCR product of the expected size was gel-purified, digested with PacI/AsiSI, and cloned into the pClone vector as described previously (Fan and Di Bisceglie, 2010). Recombinant clones were confirmed by colony PCR and propagated in lysogeny broth (LB) medium. Plasmids were purified with the QIAprep spin miniprep kit (Qiagen) and fully validated by Sanger sequencing at Genewiz (South Plainfield, New Jersey). The recombinant TTV12 and TTV16 plasmids, mixed in equimolar amounts, were spiked into an AV-negative donor serum at a final concentration of 1 × 10^6/mL. This artificial serum sample was then subjected to 2pRCA, Illumina sequencing, and data analysis as described above.

2.4. Real-time quantitation of serum AV copy numbers

Based on the amplification of a 63-nt conserved region in the AV NCR, Maggi et al. developed a TaqMan PCR assay to quantitate the viral load of all three AV genera (Pistello et al., 2001; Maggi et al., 2001). The assay has been validated as comparable with the commercial TTV RGENE kit and digital droplet PCR (ddPCR) (Macera et al., 2019). This assay was adopted in the current study. Briefly, 5 μL of the extracted total serum DNA was used in a 25-μL reaction containing 1x TaqMan™ Fast Universal PCR Master Mix, no AmpErase™ UNG (Thermo Fisher Scientific, Waltham, MA), 0.4 μM each of primers AMTS and AMTAS (Table 1), and 20 nM of probe AMTPTU (Table 1). The reaction was carried out on the ABI 7500 instrument (Applied Biosystems) with an initial heating at 95 °C for 10 minutes, followed by 45 cycles of 95 °C for 15 seconds and 58 °C for 60 seconds. An AV-negative serum sample from our previous study was included as a negative control (Li et al., 2019). As described above, mock serum samples were created by spiking 10-fold serial dilutions of the recombinant TTV12 plasmid, with concentrations ranging from zero to 10^8 copies/mL. These samples were used to determine the sensitivity of the assay and were included as standards in each assay. Each sample was run in three technical replicates, and the mean value was taken as the final AV copy number.
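Quantitation against the plasmid standards follows the usual standard-curve logic, sketched below in Python as an illustration. The Ct values are invented (chosen to lie on a line with a slope of -3.3 cycles per log10), the function names are hypothetical, and averaging the replicate copy numbers (rather than the replicate Ct values) is an assumption.

```python
import math

def fit_standard_curve(copies_per_ml, ct_values):
    """Least-squares fit of Ct = slope * log10(copies/mL) + intercept."""
    xs = [math.log10(c) for c in copies_per_ml]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to obtain copies/mL from a Ct value."""
    return 10 ** ((ct - intercept) / slope)

# Hypothetical Ct values for 10-fold standards from 10^2 to 10^8 copies/mL.
standards = [10 ** k for k in range(2, 9)]
standard_ct = [35.0, 31.7, 28.4, 25.1, 21.8, 18.5, 15.2]
slope, intercept = fit_standard_curve(standards, standard_ct)

# Three technical replicates of one hypothetical sample, averaged as copy numbers.
replicate_ct = [25.0, 25.1, 25.2]
sample_copies = sum(copies_from_ct(ct, slope, intercept) for ct in replicate_ct) / len(replicate_ct)
print(round(slope, 2), round(intercept, 2), f"{sample_copies:.3g}")
```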

2.5. Statistical analysis

Among-group comparison (HCV vs. HCC) was done using the two-tailed Student’s t-test. Data were expressed as the mean ± SD (standard deviation), and p<0.05 was considered statistically significant.
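For orientation, the t statistic of a two-sample Student's t-test can be recomputed from summary statistics, as in the Python sketch below. The example uses the species-level richness values reported in the abstract (12.6 ± 14.4 vs. 35.4 ± 13.6) and assumes 10 patients per group, as stated in the legend of Fig. 6. Because the inputs are rounded, the result only approximates the analysis on the raw data, and the p value (which requires the t distribution) is not computed here.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t statistic (equal variances assumed) and degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df

# Species richness: HCC group vs. HCV group without HCC.
t_stat, df = pooled_t(12.6, 14.4, 10, 35.4, 13.6, 10)
print(round(t_stat, 2), df)
```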

2.6. Data availability

Raw sequence data after quality control were deposited in fastq format in the NCBI Sequence Read Archive (SRA) under BioProject ID PRJNA749275.

3. Results

3.1. Enhanced AV detection by 2pRCA

Six AV-positive donor serum samples were sequenced using both the tdMDA and 2pRCA methods. AV-related reads were extracted at three different similarity levels (Fig. 2). Considering the potential sequence similarity between human and AV genomes, mapping was performed directly on the raw NGS data after read quality control (Li et al., 2019; Ren et al., 2020). Human sequences were then subtracted from the remaining reads prior to blastx and HMMER analysis. In the evaluation of an optimized E value for blastx, an E setting of 0.01 retrieved 94.3% of AV reads while 0.2% of AV reads were assigned to HBV/HCV genomes. Beyond this point, a significant portion of the AV reads were assigned to HBV/HCV genomes (Fig. 3). Therefore, for a small AV protein database, 0.01 is an appropriate E value in blastx analysis to minimize hits on non-AV reads without sacrificing authentic AV read assignments. Using the tdMDA method, the percentages of total AV reads in the six samples were 0.55%, 1.81%, 7.70%, 6.61%, 20.77%, and 27.41%, which increased to 3.51%, 6.49%, 11.37%, 9.84%, 27.00%, and 36.47%, respectively, under the 2pRCA protocol. The average enrichment of AV reads was 2.6 ± 2.1x, with the clearest increase in the samples that had low-abundance AV sequences. Samples 1 and 2 had 6.42- and 3.59-fold changes, respectively (Fig. 4). On the human genome (build GRCh38), AV1 and AV2 had only 121,857 and 94,806 matches, respectively, thereby conferring an enriched amplification of AV sequences in 2pRCA.
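The enrichment figures can be rechecked from the percentages quoted above with a few lines of Python. The sketch uses the sample standard deviation (n − 1 denominator), which is an assumption about how the SD was computed; with it, the mean ± SD agrees with the reported 2.6 ± 2.1. The per-sample fold changes are computed from rounded percentages, so they may differ slightly from the reported values (for example, sample 1).

```python
from statistics import mean, stdev

# Percentage of AV reads per sample under each protocol (from the text).
tdmda = [0.55, 1.81, 7.70, 6.61, 20.77, 27.41]
rca2p = [3.51, 6.49, 11.37, 9.84, 27.00, 36.47]

# Fold enrichment of 2pRCA over tdMDA for each sample.
fold = [after / before for before, after in zip(tdmda, rca2p)]

print([round(f, 2) for f in fold])
print(f"{mean(fold):.1f} ± {stdev(fold):.1f}")
```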

Fig. 3. Simulation of E value in blastx analysis.

Fig. 4. Comparison of AV read recovery between two amplification protocols.

Total DNA was amplified by both the tdMDA and 2pRCA methods. AV reads were extracted at three similarity levels and compared between the two methods. The fold change of the enrichment is indicated on the right axis (grey line).

3.2. Determination of relative abundance of AV sequences

To estimate the efficiency of both the experimental and analysis procedures, a mock serum sample, spiked with recombinant TTV12 and TTV16 plasmids, was amplified through 2pRCA and sequenced in three technical replicates. Of the average 237,187 AV reads, mapping, blastx, and HMM contributed 158,203 (66.7%), 75,426 (31.8%), and 3,558 (1.5%) reads, respectively. Blastx- and HMM-extracted AV reads were essential for de novo contig assembly: only the inclusion of all AV reads from mapping, blastx, and HMM yielded contigs that showed 99.9% similarity to the AV sequences used for PCR and plasmid construction (Supplementary material). In the GASiC estimation of the relative abundance of TTV12 and TTV16, the ratio quickly approached the expected value (1:1) as AV genome coverage increased. At a coverage of 6.6x, the TTV12/TTV16 ratio was 0.94, and it remained above this level at higher coverage (Fig. 5), illustrating the feasibility of our experimental strategy to decipher AV complexity from patient serum samples.

Fig. 5. Estimation of relative abundance of TTV12 and TTV16 in a mock serum sample.

AV genome coverage was calculated as the total bases from randomly sampled AV reads divided by the combined length of the TTV12 (1,674 nt) and TTV16 (1,595 nt) amplicons. At a given coverage, the average TTV12/TTV16 ratio is shown with the standard deviation (error bar) from three technical replicates. C/R, coverage/ratio.
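The coverage calculation and the random subsampling can be sketched in Python as follows. This is an illustration under stated assumptions: the function names are hypothetical, and all reads are taken to be 250 nt long (the nominal MiSeq read length), whereas quality-trimmed reads would vary in length.

```python
import random

TTV12_LEN = 1674  # amplicon length in nt
TTV16_LEN = 1595  # amplicon length in nt
TARGET_LEN = TTV12_LEN + TTV16_LEN

def coverage(read_lengths, target_len=TARGET_LEN):
    """Total sampled bases divided by the combined amplicon length."""
    return sum(read_lengths) / target_len

def subsample_to_coverage(read_lengths, target_cov, target_len=TARGET_LEN, seed=0):
    """Randomly draw read indices until the requested coverage is reached."""
    rng = random.Random(seed)
    order = list(range(len(read_lengths)))
    rng.shuffle(order)
    picked, bases = [], 0
    for idx in order:
        if bases / target_len >= target_cov:
            break
        picked.append(idx)
        bases += read_lengths[idx]
    return picked

reads = [250] * 1000  # hypothetical pool of AV reads
picked = subsample_to_coverage(reads, target_cov=6.6)
print(len(picked), round(coverage([reads[i] for i in picked]), 2))
```

Under these assumptions, fewer than a hundred 250-nt reads are enough to reach the 6.6x coverage at which the TTV12/TTV16 ratio was reported to stabilize.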

3.3. Reduced AV complexity in HCV patients with the incidence of HCC

Under an empirical cutoff of 20 reads, AV was undetectable in 1 of the 21 HCV patients. This patient was not included in subsequent analysis. Based on the AV ORF1, within-host AV genome complexity was summarized at the species and genus levels, which corresponded to 36% and 56% sequence differences, respectively. The number of within-host AV variants varied widely, ranging from 2 to 51 AV species. Compared to HCV patients without HCC, HCC was associated with a decreased number of AV variants (richness) at the species level (12.6 ± 14.4 vs. 35.4 ± 13.6, p=0.001) (Fig. 6A). A similar trend, although not statistically significant, was observed when relative abundance was taken into account: normalized Shannon entropy at the species level was lower in HCV patients with HCC (0.4 ± 0.34 vs. 0.61 ± 0.12, p=0.095) (Fig. 6A). At the genus level, 15 of the 20 AV-positive patients were co-infected with all three AV genera, 4 with two genera, and 1 with a single genus. Genus richness did not differ between HCV patients with and without HCC (2.5 ± 0.7 vs. 2.9 ± 0.31, p=0.12) (Fig. 6A). However, at the genus level, normalized Shannon entropy was increased in HCV-associated HCC patients with marginal significance (0.25 ± 0.28 vs. 0.067 ± 0.05, p=0.055) (Fig. 6A).

Fig. 6. AV complexity and copy numbers in HCV patients with and without HCC.

HCV patients with (n=10) or without (n=10) HCC were compared in terms of AV richness (A, left axis), Shannon index (A, right axis), and copy numbers (B). Based on their relative abundance, AV copy numbers were further divided into alphatorquevirus and combined beta- and gammatorquevirus (B).

3.4. Within-host expansion of beta- and gammatorquevirus in HCV-associated HCC

Relative abundance of AV variants was determined and then summarized at the level of AV species. While most patients were infected with all three AV genera, only one genus dominated the population (Fig. 7). Alphatorquevirus (TTV) prevailed in all 10 AV-positive HCV patients without HCC. In contrast, HCV-associated HCC was associated with the expansion of either betatorquevirus (TTMV) or gammatorquevirus (TTMDV) in 7 of 10 cases (Fig. 7). Within a genus, there was no consistent pattern with respect to the prevailing AV species across the patients (Fig. 7). The AV TaqMan assay had a detection limit of 100 copies/mL. The Ct values were linearly associated with the concentrations of spiked TTV12 plasmid from 10^2 to 10^8 copies/mL. Patients with HCC had higher AV copy numbers. However, statistical significance was not reached owing to the large variances (1.44 ± 1.85 × 10^5 vs. 5.61 ± 0.91 × 10^5 copies/mL, p=0.19) (Fig. 6B). When focusing on TTMV/TTMDV copy numbers, the difference between HCV and HCV-associated HCC patients approached marginal significance (0.96 ± 1.52 × 10^5 vs. 0.011 ± 0.018 × 10^5 copies/mL, p=0.06). Both groups showed a similar level of TTV copy numbers (0.48 ± 1.11 × 10^5 vs. 0.54 ± 0.89 × 10^5 copies/mL, p=0.88) (Fig. 6B).

Fig. 7. Comparison of anellome spectrum among patients with and without HCC.

Three AV genera were represented by different colors. Major AV variants were indicated for their species assignment within a genus.

4. Discussion

Applying multiple technical advances, the current study presents a method for the quantitative estimation of AV complexity from patient samples. AV has an exceptional genome diversity that precludes the use of PCR-based approaches. While NGS is a straightforward option, the small genomes of viruses result in very low on-target rates upon direct NGS application to clinical samples (Houldcroft et al., 2017). In this setting, we previously developed tdMDA to eliminate primer-mediated artifacts (Wang et al., 2017). In addition, phi29 DNA polymerase in tdMDA favors the amplification of circular sequences like AV genomes (Nelson, 2014). Yet this advantage is offset by the existence of circular sequences from the host, other microbes, and reagents. In the current study, AV sequences were enriched in 2pRCA by using AV-specific primers in the first-phase RCA. Enriched amplification also made it possible to reduce the reaction time of the second-phase RCA to six hours. Single primers do not trigger exponential amplification with phi29 DNA polymerase (Lizardi et al., 1998), and six hours falls within the log phase of the second-phase RCA (Wang et al., 2017). Therefore, 2pRCA not only increases AV on-target rates in NGS (Fig. 4) but also minimizes the effect of amplification on the relative abundance of individual AV variants in the original templates. This might explain why we successfully reproduced the relative abundance of two recombinant AV plasmids after 2pRCA and NGS. Second, reference mapping is routinely used to detect and extract target viral sequences in metagenomics studies. However, using the NCBI AV reference sequences (n=108), mapping extracted on average only 49% of the AV reads from the 20 patient samples in the current study. Blastx and HMM retrieved 50.4% and 0.6% of AV reads, respectively. In contrast, mapping with HCV reference sequences hit an average of 96.7% of reads in 14 samples from our previous studies (Wang et al., 2017) (Supplementary Fig. 1).
This clearly indicates an exceptional AV genome diversity. AV read extraction at three different similarity levels allows authentic de novo assembly of AV contigs as demonstrated using two recombinant TTV clones. Together with the estimation of relative abundance in GASiC and TaqMan PCR assay, the entire experimental procedure is able to depict a complex AV population in unprecedented detail. Anellome is regarded as a viral counterpart of the gut microbiome (Kaczorowska and van der Hoek, 2020). Given profound roles of the gut microbiome in HCC incidence (Lapidot et al., 2020; Schwabe and Greten, 2020), we applied an optimized method to explore anellome complexity in 21 HCV patients with and without HCC. Most patients had a large number of variants detected at the AV species level. However, only a few variants dominated the population (Fig. 6; Fig. 7). In patients co-infected by three AV genera, only one genus prevailed in the population (Fig. 7). As observed for HCV (Park et al., 2014), a decreased AV complexity in HCC patients might be an adaptation to compromised body immunity or disease stage. In spite of a large variance, AV copy numbers were higher in HCC than in HCV, which is consistent with an early report (Tokita et al., 2002). Likewise, AV copy numbers may also be an adaptation to disease status. Indeed, it is now considered to be a surrogate marker of immune competency in transplant patients (Rezahosseini et al., 2019).

An interesting observation is the expansion of either beta- or gammatorquevirus in HCC patients. It is not known whether such an expansion is also an adaptation or a faculty composition. AV appears to have a broad tropism in human tissues, including the liver. Thus, the extent of liver contribution to the serum anellome remains unknown. Despite the lack of reliable AV cell culture and small animal models, in vitro studies already provide evidence that AV is capable of modulating innate and adaptive immunity in humans (Shulman and Davidson, 2017). Beta- and gammatorquevirus cluster together phylogenetically and are separated from alphatorquevirus (Supplementary Fig. 2). If AV functions in a sequence-dependent manner, it is likely an active player rather than merely an adaptor or passenger. In the current study, AV species assignment was performed with 59 full-length ORF1 sequences. Given ongoing efforts to annotate new AV species (Varsani et al., 2021), our observation warrants further studies with more comprehensive AV reference sequences.

5. Conclusion

By combining multiple technical optimizations, we have established an NGS-based method for the quantitation of within-host anellome complexity. The method was developed with patient serum samples, but it may be applicable to other types of clinical specimens. In HCV patients, the anellome showed a high complexity that was reduced in patients with HCC. Quantitative estimation of anellome complexity would be a valuable tool to explore the role of AV, and therefore the human virome, in human health and disease.

Highlights.

  • Anellovirus is a major component of the human virome

  • An individual can be infected with multiple anellovirus genera and species that form a heterogeneous repertoire, called anellome

  • A method was developed to measure anellome complexity in a quantitative manner

  • The incidence of hepatocellular carcinoma in patients with hepatitis C virus infection was associated with reduced complexity and distinct compositions of the serum anellome

Acknowledgements

This work was supported by the US National Institutes of Health (NIH) grants AI139835 (X.F.), AI117128 (X.F.), and a seed grant from the Saint Louis University Liver Center (X.F.).

Footnotes

Competing Interests Statement

The authors have no conflict of interest to declare with respect to this manuscript.

Data Availability Statement

Raw sequence data in FASTQ format, after quality control, were deposited in the NCBI Sequence Read Archive (SRA) under BioProject ID PRJNA749275.
