Abstract
BACKGROUND.
Serum alanine aminotransferase (ALT) testing is currently required for blood donation in China. One of the major reasons for including such a test is the possible involvement of known or unknown hepatotropic viruses. However, this hypothesis has never been convincingly examined.
STUDY DESIGN AND METHODS.
The study recruited ninety Chinese blood donors, who were divided into three groups according to their ALT values. The serum virome of these donors was explored using a metagenomic approach with enhanced sensitivity, resolved at the level of single sequencing reads.
RESULTS.
Anellovirus and pegivirus C (GBV-C) were detected among these donors, but neither was found exclusively in donors with abnormal liver enzyme levels. Anellovirus was highly prevalent (93.3%), and co-infection with multiple genera (alpha-, beta-, and gammatorquevirus) was more common in donors with normal ALT values than in those with elevated ALT (donors with single/double/triple Anellovirus genera, 1/3/24 vs. 7/7/14 or 6/7/13, p=0.009). For the unmapped reads, which accounted for 15 ± 14.9% of the data, similarity-based (BLASTN, BLASTP, and HMMER3) and similarity-independent (k-mer frequency) analyses identified several circular Rep-encoding single-stranded DNA (CRESS-DNA) genomes. Direct PCR testing indicated that these genomes were likely reagent contaminants.
CONCLUSION.
Viral etiology is not responsible for elevated ALT levels in Chinese blood donors. If the ALT test is not abandoned, its cutoff should be adjusted in response to the donor shortage in China.
Keywords: virome, blood donor, metagenomics, Anellovirus
INTRODUCTION
Alanine aminotransferase (ALT) is a transaminase found mostly in the liver and kidney. In China, ALT testing is currently required for blood donation to ensure transfusion safety. ALT is very sensitive to liver inflammation regardless of etiology. In the general population and among apparently healthy blood donors, a substantial proportion of individuals have ALT values above the upper limit of normal (1). Once serological screening is in place, a single ALT test offers limited additional benefit for preventing transfusion-transmitted diseases such as hepatitis B virus (HBV) and hepatitis C virus (HCV) infection (2). In line with these facts, there is ongoing debate in China as to whether the ALT test should be discarded and, if not, what cutoff appropriately balances transfusion safety against donor supply. China is known to be an endemic area for viral hepatitis. Notably, approximately 1–4% of newly diagnosed hepatitis cases cannot be attributed to known hepatitis viruses, i.e., hepatitis A through E (3). The potential existence of unrecognized hepatitis viruses may therefore be a major reason for retaining the single ALT test in blood donors. Indeed, a steady proportion of patients across a wide spectrum of liver diseases have no identifiable etiology, including acute hepatitis (4), acute liver failure (5), chronic hepatitis (6), cirrhosis (7), and hepatocellular carcinoma (8). While these patients have been the subject of continuous efforts to discover putative viruses, blood donors have received little attention. Recently, we improved a nucleic acid amplification method named template-dependent multiple displacement amplification (tdMDA) (9). Based on tdMDA and state-of-the-art bioinformatics tools, we conducted an in-depth virome analysis of ninety Chinese blood donors with or without abnormal serum ALT values.
MATERIALS AND METHODS
Serum samples
All blood donors were screened for serum ALT (AU Clinical Biochemical Analysis System, Beckman Coulter, Beijing, China) and for known bloodborne pathogens, including hepatitis B virus (HBV), hepatitis C virus (HCV), and HIV. These viruses were examined by antigen testing (HBsAg ELISA kit, Wantai BioPharm, Beijing, China), antibody testing (anti-HCV ELISA kit, InTec Products, Xiamen, China; anti-HIV ELISA kit, Wantai BioPharm, Beijing, China), and viral nucleic acid testing (NAT) (Roche Diagnostics, Shanghai, China). According to the current National Health Examination Criteria of Blood Donors (GB 18467-2011), the ALT cutoff values are 40 U/L and 50 U/L for males and females, respectively. Between January and December 2015, blood donors with elevated ALT (≥50 U/L) and negative pathogen tests completed a questionnaire on pathological and non-pathological factors known to be potentially associated with abnormal ALT values (Supplementary Table 1). Excluding donors with these factors enriched the cohort for a more productive serum virome analysis. Eventually, a total of sixty blood donors were enrolled in the study. Based on ALT levels, they were divided into two groups: group 1 (donors b1 through b30), with ALT values higher than 2.5 times the upper limit (>100 U/L), and group 2 (donors b31 through b60), with ALT values between 50 and 100 U/L. Thirty blood donors with normal ALT were also included as controls (group 3, donors b61 through b90) (Table 1). Written informed consent was obtained from all subjects. As this was a collaborative study, the research protocol for the collection and use of these blood donor samples was reviewed and approved by both the Research Committee of the Wuhan Blood Center and the Saint Louis University Institutional Review Board (Assurance No: FWA00005304).
Table 1.
Demographic characteristics of 90 blood donors.
| Characteristic | Category | Group 1 | Group 2 | Group 3 |
|---|---|---|---|---|
| Number | | 30 | 30 | 30 |
| ALT (U/L) | Mean (SD) | 163.5 (79.8) | 78.5 (12.1) | 23.7 (14.9) |
| | Range | 102–416 | 55.2–97.2 | 3.6–49.9 |
| Sex | Male | 25 | 24 | 23 |
| | Female | 5 | 6 | 7 |
| Age (years) | Mean (SD) | 28.3 (7.7)* | 22.7 (6.9)* | 21.5 (3.2)* |
| | Range | 20–46 | 19–55 | 19–33 |
| BMI | Mean (SD) | 25.3 (2.7)** | 23.9 (2.0)** | 22.8 (2.3)** |
| | Range | 19.33–29.53 | 18.81–28 | 18.52–28.03 |
| Education | ≤High school | 6 | 1 | 5 |
| | College | 24 | 29 | 25 |
| Marital status | Single | 23 | 26 | 28 |
| | Married | 7 | 4 | 2 |
SD, standard deviation; BMI, body mass index.
*Age: group 3 vs. group 1, p=3.74 × 10⁻⁵; group 2 vs. group 3, p=0.38; (group 1 + group 2) vs. group 3, p=0.008.
**BMI: group 3 vs. group 1, p=0.4 × 10⁻⁴; group 2 vs. group 3, p=0.054; (group 1 + group 2) vs. group 3, p=0.001.
RNA extraction, reverse transcription (RT), tdMDA, and Illumina sequencing
Total RNA was extracted from 140 μL of serum and eluted into 60 μL of Tris buffer (pH 8.5) using the QIAamp Viral RNA Mini kit (Qiagen, Valencia, CA). This kit in fact extracts both DNA and RNA longer than 200 bp. Total RNA was used for reverse transcription (RT)-tdMDA as described previously (9). In brief, 10.6 μL of extracted RNA was mixed with 9.4 μL of RT mixture consisting of 1x SuperScript III buffer, 10 mM DTT, 80 μM exonuclease-resistant random pentamer primers with the 5' end blocked by a C18 spacer (9), 2 mM dNTPs (Epicentre), 20 U of RNasin Ribonuclease Inhibitor (Promega), and 200 U of SuperScript III (Life Technologies). The reaction was incubated at 37 °C for 30 minutes and at 50 °C for 30 minutes, and then inactivated at 70 °C for 15 minutes. A 4-μL aliquot of the RT product was used for tdMDA in a 40-μL volume consisting of 1x phi29 DNA polymerase buffer, 1 mM dNTPs, 80 μM of the same random pentamer primers used in the RT, and 20 units of phi29 DNA polymerase (New England Biolabs, Ipswich, MA). The reaction was incubated at 28 °C for 14 hours and then terminated by heating at 65 °C for 15 minutes. After purification with the QIAamp DNA Mini kit (Qiagen), the RT-tdMDA product was subjected to library construction with the Nextera XT DNA Sample Preparation kit (Illumina, San Diego, CA), followed by sequencing on the Illumina NextSeq 500 platform (1 × 250-bp single-end reads, mid-output) at MOgene (St. Louis, MO). In addition to the 90 blood donor samples, two negative controls consisting of background amplification products of RT-tdMDA were also sequenced.
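The volumes above determine how much of the original serum is represented in each tdMDA reaction. The Python sketch below makes that arithmetic explicit. It assumes complete nucleic acid recovery during extraction and uniform mixing at every transfer, which real workflows will not achieve, so the result is an upper bound on input rather than a measured quantity; the function name and parameters are illustrative.

```python
def serum_equivalent_per_tdmda(
    serum_ul: float = 140.0,      # serum volume extracted
    eluate_ul: float = 60.0,      # elution volume
    rna_in_rt_ul: float = 10.6,   # eluate added to the RT
    rt_mix_ul: float = 9.4,       # RT mixture added
    rt_into_mda_ul: float = 4.0,  # RT product used per tdMDA reaction
) -> float:
    """Serum volume (uL) whose nucleic acids enter one tdMDA reaction.

    Assumes 100% extraction recovery and uniform mixing at each step.
    """
    rt_total_ul = rna_in_rt_ul + rt_mix_ul                 # 20 uL RT reaction
    eluate_in_mda_ul = rna_in_rt_ul * rt_into_mda_ul / rt_total_ul
    return eluate_in_mda_ul * serum_ul / eluate_ul


if __name__ == "__main__":
    print(f"{serum_equivalent_per_tdmda():.2f} uL serum per tdMDA reaction")
```

With the stated volumes, 2.12 μL of eluate, or roughly 4.9 μL of serum equivalent, enters each tdMDA reaction.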
Viral categorization
Raw sequence reads in fastq format from each sample were first quality-filtered with PRINSEQ (v0.20) (10), retaining reads with length ≥70 nt, mean quality score ≥25, DUST low-complexity score ≤7, and ≤1% ambiguous bases, and removing all types of duplicates. Using the Bowtie 2 mapper (11), reads mapping to the following references were then sequentially subtracted from the quality reads: human sequences [The National Center for Biotechnology Information (NCBI) GRCh38 build (12)], NCBI microbial reference sequences for bacteria, archaea, fungi, and protists (downloaded on September 11, 2018) (13), and microbial reference genomes from the Human Microbiome Project (HMP) (14). The remaining reads were then mapped onto NCBI viral reference sequences (9,687 complete viral genomes downloaded on September 11, 2018) (13). Each mapped read was assigned to its corresponding viral genome using SAMtools, as described in our previous study (15).
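PRINSEQ performed the actual filtering. To make the listed criteria concrete, the following Python sketch reimplements only part of them: length ≥70 nt, mean Phred quality ≥25 (assuming Phred+33 encoding), ≤1% ambiguous bases, and removal of exact and reverse-complement duplicates. DUST low-complexity filtering and PRINSEQ's 5'/3' duplicate types are not implemented, and the function names are illustrative rather than PRINSEQ options.

```python
from typing import Iterable, Iterator, List, Tuple

Read = Tuple[str, str, str]  # (read name, sequence, quality string)

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield (name, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header[1:].split()[0], seq, qual


def passes_qc(seq: str, qual: str, min_len: int = 70, min_mean_q: float = 25.0,
              max_ambiguous_frac: float = 0.01, phred_offset: int = 33) -> bool:
    """Length, mean-quality and ambiguous-base filters (no DUST filter here)."""
    if len(seq) < min_len:
        return False
    mean_q = sum(ord(c) - phred_offset for c in qual) / len(qual)
    if mean_q < min_mean_q:
        return False
    ambiguous = sum(1 for base in seq.upper() if base not in "ACGT")
    return ambiguous / len(seq) <= max_ambiguous_frac


def filter_reads(reads: Iterable[Read]) -> List[Read]:
    """Keep passing reads, dropping exact and reverse-complement duplicates."""
    seen = set()
    kept = []
    for name, seq, qual in reads:
        if not passes_qc(seq, qual):
            continue
        upper = seq.upper()
        if upper in seen or reverse_complement(upper) in seen:
            continue
        seen.add(upper)
        kept.append((name, seq, qual))
    return kept
```

In use, `filter_reads(read_fastq(open(path)))` would stream a FASTQ file through the filters; the first occurrence of each duplicated sequence is the one kept.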
Analysis of unmapped reads
After viral mapping, reads from each sample were further filtered against reads from the two negative controls. The remaining reads from the ninety samples were pooled into nine subgroups of 10 samples each. Reads from each subgroup were de novo assembled using the short-read assembler SPAdes (16). In each subgroup, contigs were combined with unassembled reads (singletons) to generate a sequence dataset. After re-labelling the sequences from each subgroup using PRINSEQ (10), all nine datasets were combined into a single dataset, which was then compressed at 90% similarity with CD-HIT (17). These sequences were subjected to similarity-based annotation, first by NCBI BLASTN against the NCBI nucleotide collection (database "nt") with a conservative e-value cutoff of 1 × 10−5. Sequences with no BLASTN hits were translated in six frames using a custom script (18). The resulting amino acid sequences were searched by BLASTP against the NCBI non-redundant protein database ("nr") with the same e-value cutoff of 1 × 10−5. Amino acid sequences without BLASTP hits were subjected to a remote homology search using profile hidden Markov model (HMM) analysis in HMMER (v3.2.1) (19). In this approach, we used HMM profiles built from NCBI viral RefSeq excluding phages (vFam) (20), phage profiles (21), and the protein family collection (Pfam, 17,929 entries in version 32) (22). Sequences without HMMER hits were evaluated as possible viruses using the machine-learning method implemented in VirFinder under the model "VF.modEPV_k8.rda", which was trained with 5,800 eukaryotic viruses collected from NCBI (23). Potential viral sequences were then analyzed by cross-mapping, sequence complexity, and tandem repeats (24). Sequences suspected to represent a candidate virus were tested by PCR directly on serum samples.
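The custom six-frame translation script (18) is not described further. One minimal way to do it is sketched below in Python, assuming the standard genetic code (NCBI translation table 1), treating codons with ambiguous bases as X, and splitting each frame at stop codons. The `min_len` default of 19 residues is an assumption chosen to match the shortest fragment reported in the Results, not a documented parameter.

```python
from itertools import product
from typing import List, Tuple

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; codons containing ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_fragments(seq: str, min_len: int = 19) -> List[Tuple[str, int, str]]:
    """Return (strand, frame offset, peptide) for stop-free stretches >= min_len."""
    seq = seq.upper()
    fragments = []
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            for peptide in translate(template[offset:]).split("*"):
                if len(peptide) >= min_len:
                    fragments.append((strand, offset, peptide))
    return fragments
```

Each returned peptide could then be written to FASTA and searched with BLASTP; splitting at stops is what turns one nucleotide sequence into several fragments of varying length.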
Genome diversity of Anellovirus
Anellovirus is a circular single-stranded DNA virus with great genome diversity (25). Using a new analytical approach, we examined its relevance to elevated serum ALT levels. For each subject positive for Anellovirus by viral categorization, Anellovirus-specific sequencing reads were collected by mapping to 48 full-length Anellovirus genomes covering all genera in the family Anelloviridae (Supplementary Table 2). Additional reads with BLASTX or vFam hits to the protein sequences of the 48 Anellovirus isolates were also extracted from the data using Bowtie 2 in "end-to-end" alignment mode with the score setting at zero. These reads were combined to generate contigs by de novo assembly using SPAdes (16). Based on Anellovirus ORF1 (open reading frame 1), each contig was assigned by BLASTX to the Anelloviridae genus of its best hit (smallest e-value). Genetic diversity of Anellovirus at the genus level was then compared among the three groups. For selected genera (alphatorquevirus and gammatorquevirus), phylogenetic trees based on Anellovirus ORF1 were constructed in MEGA (Molecular Evolutionary Genetics Analysis) (26) to look for any tree topology showing phenotype-dependent clustering.
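Best-hit genus assignment and the per-donor genus count used in the Results can be sketched as follows. The Python below assumes BLASTX tabular output (`-outfmt 6`, where columns 11 and 12 are the e-value and bit score), contig IDs prefixed with the donor code (for example `b12|NODE_1`), and a lookup table from reference protein IDs to genera. The ID format and lookup table are hypothetical, and breaking e-value ties by bit score is an added assumption.

```python
import csv
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, Set


def best_hits(blast_tab_lines: Iterable[str]) -> Dict[str, str]:
    """Map each query to the subject of its best hit in BLAST tabular output.

    Best = smallest e-value (column 11); ties broken by larger bit score (column 12).
    """
    best: Dict[str, tuple] = {}
    for row in csv.reader(blast_tab_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = row[0], row[1]
        rank = (float(row[10]), -float(row[11]))
        if query not in best or rank < best[query][0]:
            best[query] = (rank, subject)
    return {query: subject for query, (_, subject) in best.items()}


def genera_per_donor(best: Dict[str, str],
                     subject_to_genus: Dict[str, str],
                     donor_of: Callable[[str], str]) -> Dict[str, Set[str]]:
    """Collect the set of Anellovirus genera assigned to each donor's contigs."""
    genera: Dict[str, Set[str]] = defaultdict(set)
    for contig, subject in best.items():
        genus = subject_to_genus.get(subject)
        if genus is not None:  # subjects outside the lookup stay unassigned
            genera[donor_of(contig)].add(genus)
    return dict(genera)


def genus_count_distribution(genera: Dict[str, Set[str]]) -> Counter:
    """Number of donors carrying 1, 2, 3, ... distinct genera."""
    return Counter(len(g) for g in genera.values())
```

Run per group, `genus_count_distribution` yields the single/double/triple tallies of the kind compared among the three groups.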
Statistical analysis
Between-group comparisons were performed using either the two-tailed Student's t-test or the chi-squared test. Data are expressed as mean ± SD (standard deviation), and p<0.05 was considered statistically significant.
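For the genus-count comparisons, the chi-squared test operates on a 2 × 3 contingency table (donors with single, double, or triple genera in two groups). The Python sketch below computes the Pearson statistic without continuity correction; with df = 2 the p-value is exactly exp(−χ²/2), so no statistics library is needed. The text does not state how each comparison was configured, so this illustrates the standard test rather than reproducing every reported p-value.

```python
import math
from typing import Sequence, Tuple


def chi_square_2x3(row_a: Sequence[int], row_b: Sequence[int]) -> Tuple[float, float]:
    """Pearson chi-squared test of independence for a 2 x 3 table of counts.

    df = (2 - 1) * (3 - 1) = 2, and the chi-squared survival function with
    2 degrees of freedom is exactly exp(-x / 2).
    """
    if len(row_a) != 3 or len(row_b) != 3:
        raise ValueError("expected two rows of three counts")
    table = (row_a, row_b)
    row_totals = [sum(row) for row in table]
    col_totals = [a + b for a, b in zip(row_a, row_b)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, math.exp(-chi2 / 2)


if __name__ == "__main__":
    # Donors with single/double/triple Anellovirus genera: group 1 vs. group 2
    chi2, p = chi_square_2x3([7, 7, 14], [6, 7, 13])
    print(f"chi2 = {chi2:.3f}, p = {p:.2f}")
```

Applied to groups 1 and 2 (7/7/14 vs. 6/7/13), it gives χ² ≈ 0.040 and p ≈ 0.98, consistent with the value reported in the Results.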
Data availability
Raw sequence data in fastq format from 90 blood donors and two negative controls were deposited in the NCBI Sequence Read Archive (SRA) under BioProject ID: PRJNA526976. Three virus-like sequences identified in the current study were deposited in GenBank under accession numbers MK659567 through MK659569.
RESULTS
Viruses detected in circulation by read mapping
Illumina sequencing of the 92 RT-tdMDA products (90 blood donors and 2 negative controls) generated a total of 168,942,410 single-end reads. After quality control, the average number of reads per subject was 1,424,365 ± 522,770, with a read length of 226 ± 6.12 nt. In reference mapping, the human genome received the most reads (83.2 ± 16%), and virus-mapped reads accounted for about 1.43 ± 2.29%. A substantial portion of reads (15 ± 14.9%) could not be mapped (Supplementary Figure 1). Under an empirical cutoff of ≥5 reads, mapping detected a total of 59 viruses present in at least one blood donor. Because of sequence similarity among viral species within a given family, read mapping may inflate the number of viruses detected, a known caveat of short-read sequencing. All viruses could be assigned to six families: Anelloviridae, Flaviviridae, Genomoviridae, Retroviridae, Inoviridae, and Myoviridae (Figure 1). Notably, Anellovirus (Anelloviridae) was detected in 84 of 90 donors (93.3%), all except donors b11, b30, b57, b58, b75, and b81. Two blood donors, b40 and b70, were positive for Pegivirus C, also known as GB virus C or hepatitis G virus (Figure 1). Phages, from either Inoviridae or Myoviridae, were found in the serum of multiple donors (b1, b9, b19, b29, b30, b37, b38, b41, b43, b63, b76, b78, b86, and b90). Given their frequent detection in blood (28, 29), these phage-mapped reads may have originated from the gut microbiota. Similarly, circular single-stranded DNA viruses from Genomoviridae were present across all three groups (b15, b18, b19, b21, b50, b78, b80, b86, and b88) (Figure 1). Some of these viruses were also detected in the controls, suggesting contamination or sequencing background, perhaps owing to their ubiquitous nature (30). Finally, retroviral sequences were detected in two subjects, b27 and b57.
For these two donors, retroviral reads were extracted and de novo assembled by SPAdes into two contigs, one from b27 and one from b57, that shared 100% nucleotide identity. Both contigs mapped to the polymerase region of the Moloney murine leukemia virus (MMLV) genome (GenBank accession number NC_001501) with 98% sequence identity. These retroviral sequences therefore likely came from residual cloned MMLV polymerase sequences carried over from the manufacture of SuperScript III reverse transcriptase, which was used in the RT reaction. Taken together, read mapping mainly detected viruses from two families (Anelloviridae and Flaviviridae) known to circulate in the human population. The other viruses were considered background and/or reagent contamination.
Figure 1. Detection of known viruses by read mapping.
Raw virus-mapped read counts were normalized to the average read number of the 90 donors (1,424,365), expressed as reads per million total reads, and presented as a heatmap (27). The viral family of each of the 59 detected viruses is indicated.
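Scaling a donor's count to the 90-donor average depth and then expressing it per million of that depth reduces to count / total reads × 10⁶, i.e., reads per million (RPM); that reading of the caption is an assumption. The Python sketch below counts primary alignments per viral reference directly from SAM text (a stand-in for the SAMtools step, not its interface) and applies the ≥5-read detection cutoff from the Results.

```python
from collections import Counter
from typing import Dict, Iterable

# SAM FLAG bits: read unmapped, secondary alignment, supplementary alignment.
UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800


def count_reads_per_reference(sam_lines: Iterable[str]) -> Counter:
    """Count primary, mapped alignments per reference name (RNAME, column 3)."""
    counts: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        counts[fields[2]] += 1
    return counts


def reads_per_million(counts: Counter, total_reads: int,
                      min_reads: int = 5) -> Dict[str, float]:
    """Convert counts to reads per million, keeping references with >= min_reads."""
    return {ref: n * 1e6 / total_reads for ref, n in counts.items() if n >= min_reads}
```

Skipping secondary and supplementary records keeps each read counted at most once, which matches the idea of assigning every mapped read to a single viral genome.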
Annotation of unmapped reads
The unmapped reads were annotated using a novel pipeline (Figure 2). Approximately 98.8% of unmapped reads could be assembled into contigs. Combining the contigs with unassembled reads (singletons) from the nine pooled subgroups (n=10 each) gave a total of 31,220 sequences, which CD-HIT reduced to 9,181 at 90% sequence similarity. Of these 9,181 sequences, BLASTN returned 4,404 unique hits in the NCBI "nt" database with e-values less than 1 × 10−5 (Supplementary Table 3). The remaining 4,777 sequences were translated in six frames into 5,428 fragments ranging from 19 to 817 amino acids in length. Subsequent BLASTP analysis returned 2,690 unique hits in the NCBI "nr" database (Supplementary Table 4). The HMMER 3 search of 2,558 fragments without BLASTP matches returned 108, 135, and 192 hits against the vFam, phage, and Pfam profiles, respectively (Supplementary Table 5). Taken together, the similarity-based approaches (BLASTN, BLASTP, and HMMER 3) annotated 7,074 of the 9,181 sequences. Finally, the k-mer-based method (VirFinder) flagged 107 virus-like sequences (p<0.05) among the 2,107 unannotated sequences, which included 409 singletons. Of these 107 virus-like sequences, 76 consisted of multiple tandem repeats ranging from 8 to 217 nt. Another 14 sequences were of low complexity by the DUST method (threshold 7) in PRINSEQ (10). Cross-mapping of the remaining 17 virus-like sequences showed hits in multiple donors across all three groups, suggesting contamination rather than authentic presence in circulation.
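Each similarity-based stage passes on only the sequences the previous stage could not annotate (for example, 9,181 − 4,404 = 4,777 sequences went forward after BLASTN). A minimal Python sketch of that hand-off, assuming BLAST tabular output (`-outfmt 6`, e-value in column 11) and the 1 × 10−5 cutoff from the Methods; the function name and the IDs in the usage are hypothetical.

```python
import csv
from typing import Iterable, Set, Tuple


def partition_by_hits(query_ids: Iterable[str],
                      blast_tab_lines: Iterable[str],
                      max_evalue: float = 1e-5) -> Tuple[Set[str], Set[str]]:
    """Split queries into (annotated, unannotated) from BLAST tabular output.

    A query counts as annotated if any hit has an e-value at or below max_evalue.
    """
    queries = set(query_ids)
    annotated: Set[str] = set()
    for row in csv.reader(blast_tab_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        if row[0] in queries and float(row[10]) <= max_evalue:
            annotated.add(row[0])
    return annotated, queries - annotated
```

The unannotated set from one stage becomes the query set for the next; the same partition logic applies to the later BLASTP and HMMER stages once their hits are tabulated per query.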
Figure 2. The bioinformatics pipeline for the analysis of unmapped reads.
The pipeline is designed to circumvent computationally limiting steps.
Comprehensive analysis of the unmapped reads did not identify any potential viruses or virus-like sequences present exclusively in group 1 and/or group 2 donors. Many sequences with BLASTN, BLASTP, and HMMER 3 (Pfam) hits were attributed to human and bacterial genomes (Supplementary Tables 3, 4, and 5). Most contigs with vFam-based HMMER 3 hits were assigned to Polyomaviridae, Mimiviridae, Phycodnaviridae, and Pandoraviridae. However, their detection across all three groups suggested contamination or sequencing background. Of note, our bioinformatic analysis classified multiple sequences as circular single-stranded DNA viruses. Because viruses of this kind have previously been reported in human plasma and pericardial fluid (31, 32), we conducted nested PCR on three selected candidates: one putative gemycircularvirus from read mapping and two unknown virus-like circular sequences from the final 17 sequences. These sequences contained predicted open reading frames (ORFs) long enough for primer design (Supplementary Table 6). Interestingly, all three amplicons of the predicted sizes were obtained both from selected donors and from negative controls in which water replaced serum for total RNA extraction. Sanger sequencing of the gel-purified PCR amplicons confirmed 100% sequence identity (Supplementary Figure 2). In a phylogenetic tree that included 49 complete gemycircularvirus genomes retrieved from GenBank, the putative gemycircularvirus sequence (2,205 nt) clustered with the human isolates (Supplementary Figure 2). Compared with human-associated gemyvongvirus 1 isolate DB1 (GenBank accession number NC_028459), however, it showed only 8% nucleotide and 22.5% amino acid similarity by Clustal W (33). Our data demonstrated that all three sequences were contaminants from the Qiagen RNA extraction kit.
Reduced Anellovirus genetic diversity in blood donors with elevated serum ALT
Anellovirus-specific reads were recovered at different levels of similarity-based search, including mapping, BLASTN, BLASTP, and HMMER 3 analysis. Given its great genetic diversity and high prevalence (93.3%), we investigated the role of Anellovirus in blood donors with abnormal liver enzyme levels. Collectively, de novo read assembly for each Anellovirus-positive donor generated 1,196 contigs covering the Anellovirus ORF1 domain. Only donors b35 and b43 in group 2 had no ORF1-containing contigs, perhaps owing to their low read numbers. These contigs were assigned by BLASTX to alphatorquevirus, betatorquevirus, or gammatorquevirus in the family Anelloviridae. Nineteen contigs that BLASTX could not assign to a genus were translated in six frames and analyzed by BLASTP (web version). All of these sequences could be placed into known anellovirus genera: 15 in alphatorquevirus, 3 in gammatorquevirus, and 1 in Opossum anellovirus. Infection with multiple genera was common among blood donors. Groups 1 and 2 had similar distributions of donors infected with single, double, or triple Anellovirus genera (7/7/14 vs. 6/7/13, p=0.98). However, these distributions differed significantly from that of group 3 (7/7/14 or 6/7/13 vs. 1/3/24, p=0.009) (Figure 3B). At the genus level, intra-donor Anellovirus diversity was weakly associated with Anellovirus concentration, quantified as normalized read counts (R²=0.24, p=0.02) (Figure 3A). Average Anellovirus read numbers did not differ significantly among the three groups: 21,350 ± 39,193, 24,349 ± 39,301, and 25,500 ± 27,904 for groups 1, 2, and 3, respectively (g1 vs. g2, p=0.77; g2 vs. g3, p=0.9; g1 vs. g3, p=0.65). Therefore, infection with multiple Anellovirus genera in blood donors may result in a higher Anellovirus concentration, but not vice versa.
Below the genus level, phylogenetic analysis showed no group-based tree topology in either alphatorquevirus (Figure 3C) or gammatorquevirus (Figure 3D), suggesting that no particular Anellovirus species is associated with elevated ALT levels in blood donors.
Figure 3. Comparative analysis of intra-donor Anellovirus diversity among three groups.
(A) Linear regression analysis showed a weak association between the number of Anellovirus genera and viral concentration inferred from read counts. (B) The proportions of donors infected with single, double, triple, or more Anellovirus genera differed significantly between group 3 and group 1 or group 2. (C, D) Neighbor-joining trees were constructed with 187 alphatorquevirus ORF1 sequences (882 nt each) and 294 gammatorquevirus ORF1 sequences (474 nt each). Bootstrap values are indicated on major branches. For both genera, sequences from group 1 (red), group 2 (blue), and group 3 (black) showed no apparent group-specific clustering.
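The trees were built in MEGA (26); the distance model and other settings are not given. To show what neighbor joining does with aligned ORF1 sequences, below is a minimal Python sketch of the Saitou–Nei algorithm on uncorrected p-distances (an assumed distance measure). It is not MEGA's implementation, performs no bootstrap, and roots the final edge at its midpoint purely to produce Newick output.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Join = Tuple[str, str, float, float]  # (cluster f, cluster g, branch f->u, branch g->u)


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gaps."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in sites) / len(sites)


def neighbor_joining(labels: Sequence[str],
                     matrix: Sequence[Sequence[float]]) -> Tuple[str, List[Join]]:
    """Saitou-Nei neighbor joining; returns a Newick string and the joins in order."""
    # d[x][y]: distance between current clusters, each keyed by its Newick text.
    d: Dict[str, Dict[str, float]] = {
        a: {b: float(matrix[i][j]) for j, b in enumerate(labels)}
        for i, a in enumerate(labels)
    }
    joins: List[Join] = []
    while len(d) > 2:
        n = len(d)
        r = {x: sum(d[x].values()) for x in d}  # row sums (self-distance is 0)
        # Join the pair minimising Q(x, y) = (n - 2) d(x, y) - r(x) - r(y).
        f, g = min(combinations(d, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        bf = 0.5 * d[f][g] + (r[f] - r[g]) / (2 * (n - 2))
        bg = d[f][g] - bf
        u = f"({f}:{bf:g},{g}:{bg:g})"
        joins.append((f, g, bf, bg))
        # Distances from the new node u to every remaining cluster.
        du = {k: 0.5 * (d[f][k] + d[g][k] - d[f][g]) for k in d if k not in (f, g)}
        del d[f], d[g]
        for k in d:
            del d[k][f], d[k][g]
            d[k][u] = du[k]
        d[u] = {**du, u: 0.0}
    x, y = d
    half = d[x][y] / 2  # midpoint of the last edge, only so the Newick has a root
    return f"({x}:{half:g},{y}:{half:g});", joins
```

In use, `p_distance` would be computed for every pair of aligned ORF1 sequences to fill the matrix passed to `neighbor_joining` along with the donor-tagged labels; group-specific clustering would then show up as donor-group labels gathering in the same subtrees.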
DISCUSSION
Using a metagenomic approach, the current study is the first to characterize the serum virome of Chinese blood donors with elevated ALT levels. Given the comparative experimental design, these virome data provide insights into the methodology, Anellovirus, and transfusion medicine.
Methodologically, we applied RT-tdMDA for whole-genome/transcriptome amplification, which overcomes the low quantity and highly degraded nature of cell-free nucleic acids in circulation. While tdMDA eliminates primer-associated artifacts in amplification (9), it cannot eliminate contamination from multiple sources in the experimental pipeline. Indeed, contamination is a major concern in metagenomics-based virome research and often results in false discoveries, such as the identification of the parvovirus-like NIH-CQV in the sera of chronic hepatitis patients with unknown etiology (34, 35). In the current study, we also identified three virus-like sequences, including one putative gemycircularvirus. However, like NIH-CQV, these sequences were eventually shown to be contaminants from the Qiagen RNA extraction kit (Supplementary Figure 2). Therefore, the inclusion of a control group and a validation step is of utmost importance in virome research aimed at discovering unknown viruses.
Despite the continued isolation of novel Anellovirus species from human specimens, Anellovirus is now considered a commensal virus owing to its high prevalence in humans (36). While co-infection with multiple Anellovirus genotypes has been reported (37), our finding is that donors with normal ALT values (group 3) appear more likely to be infected with multiple Anellovirus genera. This difference between group 3 and group 1 or group 2, although statistically significant, is quantitative rather than qualitative, which rules out an etiological explanation for the differing ALT levels among the three groups. Instead, factors other than etiology, such as immune status, might be the underlying driver. Anellovirus titer has been proposed as a surrogate marker of immune function in patients undergoing solid organ transplantation (38). Given the comparable Anellovirus concentrations inferred from read numbers among the three groups, however, our finding suggests that intra-patient Anellovirus diversity, at least at the genus level, might be a more reliable marker for monitoring immune status.
The current study hasn’t found known or novel viruses that may be potentially associated with elevated ALT levels among sixty blood donors. Owing to the dominance of human genome in the starting material, metagenomics approach is notoriously known for its low sensitivity in virome research (39). However, the low sensitivity is largely referred to the recovery of fragmented but not full-length viral genomes (39). For an efficient viral detection in a qualitative manner, such a concern has been circumvented by using an advanced bioinformatics pipeline with the resolution at single reads, as illustrated in the detection of Pegivirus C and even residual retroviral genome from the reagents. It is unlikely to have authentic known or unknown viruses missed for the detection in the current study. In fact, our results are consistent with recent reports that show no novel virus detectable in blood from Japanese blood donors with abnormal liver function (40), acute liver failure patients with indeterminate etiology (41), the subjects receiving multiple transfusions (42), and patients with a spectrum of liver diseases (43). The elevated ALT values in these blood donors may be attributed to other factors, such as genetics (44). Non-alcoholic fatty liver disease (NAFLD) might also be an explanation as implicated by significantly higher BMI values in group 1 than that from group 3 (Table 1) (45). Certainly, we cannot exclude the possibility for the existence of possible hepatotropic viruses with a non-parenteral transmission. For instance, Wang et al. found a higher prevalence of hepatitis E virus (HEV) antibody (IgG) and antigen among Chinese donors with elevated ALT levels (46). To this point, virome study using liver biopsy becomes an ultimate solution.
In summary, the lack of detectable blood-borne pathogens in donor serum samples supports the effectiveness of current donor screening for transfusion safety. Given the high serum ALT levels in group 1 without any associated viral etiology, our study delivers a clear message that the single ALT test should be dropped or, if not, that there is ample room to adjust its cutoff in response to the donor shortage in China.
Supplementary Material
Acknowledgments
Financial disclosure: The study was funded by the Wuhan Blood Center and partly supported by US National Institutes of Health (NIH) grant AI117128 (X.F.). The funding is acknowledged, and the authors have nothing to disclose.
Footnotes
Conflicts of interest: The authors have no conflict of interest to declare with respect to this manuscript.
References
- 1.Kim WR, Flamm SL, Di Bisceglie AM, Bodenheimer HC. Serum activity of alanine aminotransferase (ALT) as an indicator of health and disease. Hepatology 2008; 47: 1363–70. [DOI] [PubMed] [Google Scholar]
- 2.Li L, Li KY, Yan K, et al. The History and Challenges of Blood Donor Screening in China. Transfus Med Rev 2017; 31:89–3. [DOI] [PubMed] [Google Scholar]
- 3.China Bureau for Disease Prevention and Control. Statistics of Infectious Diseases. Available at: http://www.nhc.gov.cn/jkj/ Accessed 25 February 26, 2019.
- 4.Alter MJ, Gallagher M, Morris TT, et al. Acute non-A-E hepatitis in the United States and the role of hepatitis G virus infection. Sentinel Counties Viral Hepatitis Study Team. N Engl J Med 1997; 336:741–6. [DOI] [PubMed] [Google Scholar]
- 5.Lee WM. Etiologies of acute liver failure. Semin Liver Dis 2008; 28:142–52. [DOI] [PubMed] [Google Scholar]
- 6.Alter HJ, Bradley DW. Non-A, non-B hepatitis unrelated to the hepatitis C virus. Semin Liver Dis 1995; 15:110–20. [DOI] [PubMed] [Google Scholar]
- 7.Kodali VP, Gordon SC, Silverman AL, McCray DG. Cryptogenic liver disease in the United States: further evidence for non-A, non-B, and non-C hepatitis. Am J Gastroenterol 1994; 89:1836–9. [PubMed] [Google Scholar]
- 8.Seeff LB. Hoofnagle JH. Epidemiology of hepatocellular carcinoma in areas of low hepatitis B and hepatitis C endemicity. Oncogene 2006; 25:3771–7. [DOI] [PubMed] [Google Scholar]
- 9.Wang W, Ren Y, Lu Y, Xu Y, Crosby SD, Di Bisceglie AM, Fan X. Template-dependent multiple displacement amplification for profiling human circulating RNA. Biotechniques 2017; 63:21–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Schmieder R, Edwards R. Quality control and preprocessing of metagenomic datasets. Bioinformatics 2011; 27: 863–4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods 2012; 9:357–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Guo Y, Dai Y, Yu H, Zhao S, Samuels DC, Shyr Y. Improvements and impacts of GRCh38 human reference on high throughput sequencing data analysis. Genomics 2017; 109:83–90. [DOI] [PubMed] [Google Scholar]
- 13.Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 2007; 35(Database issue):D61–5 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Lloyd-Price J, Mahurkar A, Rahnavard G, et al. Strains, functions and dynamics in the expanded Human Microbiome Project. Nature 2017; 550:61–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Wang W, Zhang X, Xu Y, Di Bisceglie AM, Fan X. Viral categorization and discovery in human circulation by transcriptome sequencing. Biochem Biophys Res Commun 2013; 436:525–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Bankevich A, Nurk S, Antipov D, et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 2012; 19:455–77. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Niu B, Fu L, Sun S, Li W. Artificial and natural duplicates in pyrosequencing reads of metagenomic data. BMC Bioinformatics 2010; 11:187. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Novichkov PS, Ratnere I, Wolf YI, Koonin EV, Dubchak I. ATGC: a database of orthologous genes from closely related prokaryotic genomes and a research platform for microevolution of prokaryotes. Nucleic Acids Res 2009; 37(Database issue):D448–54. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Johnson LS, Eddy SR, Portugaly E. Hidden Markov Model Speed Heuristic and Iterative HMM Search Procedure. BMC Bioinformatics 2010; 11:431. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Skewes-Cox P, Sharpton TJ, Pollard KS, DeRisi JL. Profile hidden Markov models for the detection of viruses within metagenomic sequence data. PLoS One 2014; 9:e105067. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Grazziotin AL, Koonin EV, Kristensen DM. Prokaryotic Virus Orthologous Groups (pVOGs): a resource for comparative genomics and protein family annotation. Nucleic Acids Res 2017; 45(Database issue): D491–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Finn RD, Coggill P, Eberhardt RY, et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res 2016; 44(D1):D279–85. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Ren J, Ahlgren NA, Lu YY, Fuhrman JA, Sun F. VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data. Microbiome 2017; 5:69. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24. Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 1999; 27:573–80.
- 25. Spandole S, Cimponeriu D, Berca LM, Mihăescu G. Human anelloviruses: an update of molecular, epidemiological and clinical aspects. Arch Virol 2015; 160:893–908.
- 26. Kumar S, Stecher G, Tamura K. MEGA7: Molecular Evolutionary Genetics Analysis version 7.0 for bigger datasets. Mol Biol Evol 2016; 33:1870–4.
- 27. Zhao S, Guo Y, Sheng Q, Shyr Y. Heatmap3: an improved heatmap package with more powerful and convenient features. BMC Bioinformatics 2014; 15(Suppl 10):16.
- 28. Moustafa A, Xie C, Kirkness E, et al. The blood DNA virome in 8,000 humans. PLoS Pathog 2017; 13:e1006292.
- 29. Navarro F, Muniesa M. Phages in the human body. Front Microbiol 2017; 8:566.
- 30. Shulman LM, Davidson I. Viruses with circular single-stranded DNA genomes are everywhere! Annu Rev Virol 2017; 4:159–80.
- 31. Zhang W, Li L, Deng X, et al. Viral nucleic acids in human plasma pools. Transfusion 2016; 56:2248–55.
- 32. Halary S, Duraisamy R, Fancello L, et al. Novel single-stranded DNA circular viruses in pericardial fluid of patient with recurrent pericarditis. Emerg Infect Dis 2016; 22:1839–41.
- 33. Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994; 22:4673–80.
- 34. Xu B, Zhi N, Hu G, et al. Hybrid DNA virus in Chinese patients with seronegative hepatitis discovered by deep sequencing. Proc Natl Acad Sci U S A 2013; 110:10264–9. [Research misconduct found.]
- 35. Naccache SN, Greninger AL, Lee D, et al. The perils of pathogen discovery: origin of a novel parvovirus-like hybrid genome traced to nucleic acid extraction spin columns. J Virol 2013; 87:11966–77.
- 36. Vu DL, Kaiser L. The concept of commensal viruses almost 20 years later: redefining borders in clinical virology. Clin Microbiol Infect 2017; 23:688–90.
- 37. Niel C, Saback FL, Lampe E. Coinfection with multiple TT virus strains belonging to different genotypes is a common event in healthy Brazilian adults. J Clin Microbiol 2000; 38:1926–30.
- 38. Kotton CN. Torque teno virus: predictor of infection after solid organ transplant? J Infect Dis 2018; 218:1185–7.
- 39. Houldcroft CJ, Beale MA, Breuer J. Clinical and biological insights from viral genome sequencing. Nat Rev Microbiol 2017; 15:183–92.
- 40. Furuta RA, Sakamoto H, Kuroishi A, Yasui K, Matsukura H, Hirayama F. Metagenomic profiling of the viromes of plasma collected from blood donors with elevated serum alanine aminotransferase levels. Transfusion 2015; 55:1889–99.
- 41. Sauvage V, Laperche S, Cheval J, et al. Viral metagenomics applied to blood donors and recipients at high risk for blood-borne infections. Blood Transfus 2016; 14:400–7.
- 42. Somasekar S, Lee D, Rule J, et al. Viral surveillance in serum samples from patients with acute liver failure by metagenomic next-generation sequencing. Clin Infect Dis 2017; 65:1477–85.
- 43. Law J, Jovel J, Patterson J, et al. Identification of hepatotropic viruses from plasma using deep sequencing: a next generation diagnostic tool. PLoS One 2013; 8:e60595.
- 44. Chambers JC, Zhang W, Sehmi J, et al. Genome-wide association study identifies loci influencing concentrations of liver enzymes in plasma. Nat Genet 2011; 43:1131–8.
- 45. Chalasani N, Younossi Z, Lavine JE, et al. The diagnosis and management of nonalcoholic fatty liver disease: practice guidance from the American Association for the Study of Liver Diseases. Hepatology 2018; 67:328–57.
- 46. Wang M, He M, Wu B, et al. The association of elevated alanine aminotransferase levels with hepatitis E virus infections among blood donors in China. Transfusion 2017; 57:273–9.
Data Availability Statement
Raw sequence data in FASTQ format from the 90 blood donors and two negative controls were deposited in the NCBI Sequence Read Archive (SRA) under BioProject ID PRJNA526976. The three virus-like sequences identified in this study were deposited in GenBank under accession numbers MK659567 through MK659569.



