Table 2.
Steps | Challenges | Possible solutions |
---|---|---|
Nucleic acid extraction | • Existence of active and silent fractions of viromes • Total nucleic acid isolation (TNAI) protocols: + Allow characterization of microbiome along with virome potential = holistic picture of all components of the microbiome + High-throughput – Lead to inflation of false-positive hits from bacteria in the subsequent data analysis • Viral-like particle (VLP) isolation protocols: + Ensure true positives on viruses due to physical removal of bacteria by filtration – Give a low-concentration output [79] that may complicate the genomic library preparation step – Usually require multiple time-consuming steps of VLP and nucleic acid precipitation [78, 80] | • Combination of TNAI and VLP isolation protocol approaches [81] |
Genomic library preparation | • Limited amount of viral genetic material available | • Use of more sensitive genomic library preparation kits |
| | • MDA may lead to overrepresentation of circular ssDNA viruses [82] and underrepresentation of viruses with extreme GC content [83] | • Restricted use of MDA |
| | • Studying RNA viruses requires additional effort due to the relative instability of RNA genetic material: - Use of reverse transcriptase to convert RNA to cDNA - Restricted usage of RNase in protocols handling both DNA and RNA viruses [84] - May require separate isolation protocol (arising from the previous point) and, therefore, increase of the starting material | • Metatranscriptomics approaches • Use of reverse transcription step |
| | • Studying ssDNA viruses requires additional effort: - Some of the WGA techniques that precede the genomic library preparation procedure might introduce biases into the representation of ssDNA viruses [77, 82, 85] - The majority of current genomic library preparation procedures cannot handle ssDNA genomes due to the use of dsDNA adapters - ssDNA viruses have been shown to have higher mutation rates than dsDNA viruses [86], thus increasing the microdiversity of the metagenome, which limits reference-based approaches | • Use of ssDNA adapters in the adapter-ligation reaction at the genomic library preparation step [77] |
| | • Selection of an appropriate cut-off for coverage is complicated | • Studies report discoveries of a huge number of viruses at a depth of 1–15 × 10^6 reads per sample [60, 78–80] |
Quality control | • Removal of bacterial sequences is complicated by the viral signals from prophages (both cryptic and inducible) carried by bacterial genomes | • Use of tools for identification of prophages in bacterial genomes [87–89], though some are limited to known prophages. The combination of multiple methods has been shown to enrich the set of detected prophages [90] and therefore prevent their concurrent removal with bacterial sequences. |
Data analysis | • Existing databases do not fully represent viral diversity [91] | • Use of de novo assembly approaches |
| | • Rapid evolution and diversity of viral genomes limit reference-based approaches | • Use of reference databases that include both cultured viruses and computationally identified viral contigs [25, 92] • Use of a protein-based search • Use of a profile hidden Markov model based on protein domains, which allows the identification of remote homologs [93] |
| | • De novo assembly approach is sensitive to biases introduced during genomic library preparation and sequencing: - Low DNA input for genomic library preparation decreases the percentage of reads that map back to the corresponding assemblies [94, 95] - Use of a DNA amplification step might affect the distribution of read coverage [94, 96] - Shifts in GC content during genomic library preparation [97] affect the completeness of genomes and cause assembly fragmentation | • Adjustment of the assembly pipeline according to the applied genomic library preparation procedure [96]: use of modes suitable for an uneven distribution of read coverage, such as single-cell SPAdes [98, 99] preceded by read de-duplication [96], or Velvet-SC [100] • Use of genomic library preparation protocols without any amplification procedure (needs high DNA input, probably not applicable for viromics) [101, 102] |
• Reproducibility of assembly results when combining different assemblers is complicated by technical challenges [103, 104] and the possible appearance of chimeric assemblies [104] |