Abstract
The increasing interest in finding new viruses within (meta)genomic datasets has fueled the development of computational tools for virus detection and characterization from environmental samples. One key driver is phage therapy, the treatment of drug-resistant bacteria with tailored bacteriophage cocktails. Yet, keeping up with the growing number of automated virus detection and analysis tools has become increasingly difficult. Both phage biologists with limited bioinformatics expertise and bioinformaticians with little background in virus biology will benefit from this guide. It focuses on navigating routine tasks and tools related to (pro)phage detection, gene annotation, taxonomic classification, and other downstream analyses. We give a brief historical overview of how detection methods evolved, from early sequence-composition assessments to today’s powerful machine-learning and deep-learning techniques, including emerging language models capable of mining large, fragmented, and compositionally diverse metagenomic datasets. We also discuss tools specifically aimed at detecting filamentous phages (Inoviridae), a challenge for most phage predictors. Rather than providing an exhaustive list, we emphasize actively maintained and state-of-the-art tools that are accessible via web or command-line interfaces. This guide provides basic concepts and useful details about automated phage analysis for researchers in different biological and medical disciplines, helping them choose and apply appropriate tools for their quest to explore the genetic diversity and biology of the smallest and most abundant replicators on Earth.
Keywords: bacteriophages, prophages, metagenomics, microbial bioinformatics, gene annotation, phage prediction
Introduction
The growing interest in discovering new viruses in (meta)genomic datasets has led to a rapid increase in newly developed computational tools for virus detection and characterization from environmental samples [1–6]. This interest is also sparked by the potential of phage therapy, the application of phages to treat bacterial infections, especially those involving drug-resistant bacteria [7]. However, this surge of interest in bacteriophages (phages) extends beyond the promise of medical applications and is based on recognizing our planet as a bacterial world [8], where phages play pivotal roles as the most abundant replicators [9], shaping the ecological and evolutionary dynamics of microbial communities and exerting cascading effects on plants, animals, and entire ecosystems [10]. As a result of ever-decreasing sequencing costs, researchers from diverse fields, including microbiology, medicine, ecology, and evolution, have started to explore and identify phages either in their own (meta)genomic datasets or in publicly available databases. Given the rapid pace at which new virus analysis tools have emerged in recent years (Fig. 1), it is becoming increasingly difficult for researchers to select the most appropriate approaches to answer their most relevant questions. Acknowledging these dynamics, here we provide a comprehensive guide that equips researchers with the necessary knowledge to detect and describe bacterial viruses in genomic and metagenomic datasets, enabling an easy entry into this rapidly evolving research field. While recent reviews were aimed at technical users in metagenomics [11], this guide is aimed at both wet-lab phage biologists with limited bioinformatics expertise and bioinformaticians with little background in virus biology.
Figure 1.
Surging interest in computational phage research. The chart depicts the cumulative count of all bioinformatic tools referenced in this review, covering methods of phage detection, annotation, taxonomic classification, host prediction, life cycle inference, and genome quality assessment.
We begin with key concepts in bacteriophage biology and then briefly introduce the computational principles behind phage detection. We then provide a historical account of the evolution of (pro)phage detection tools and highlight modern state-of-the-art algorithmic approaches. After that, we transition to the core part of our review, a step-by-step guide comprising four parts. In these steps, we cover popular and well-maintained tools without claiming to be exhaustive and include methods that support the detection of filamentous phages (Inoviridae), an oft-neglected group that includes the important representative phage M13. Data processing, genome assembly, phylogenetics, and comparative genomics are only briefly addressed, as they fall outside the scope of this review.
Key concepts in bacteriophage biology
Phages display remarkable diversity in genome structure, morphology, and life cycle strategies [12]. Their genomes are encoded as either single- or double-stranded DNA or RNA and are often enclosed in protein shells, either spherical capsids or filamentous coats. Beyond morphology, phages evolved different life cycles (Fig. 2): virulent phages kill their infected hosts via the lytic cycle, ensuring rapid horizontal transmission. Temperate and filamentous phages establish long-term associations with their bacterial hosts, ensuring vertical transmission. While filamentous phages typically persist extrachromosomally without causing lysis, temperate phages insert their genetic material into the host genome; the host carrying the integrated prophage is called a lysogen. Insertion happens either at specific attachment sites via integrases (phage lambda) or at random via transposases (phage Mu). Integrated prophages can exit the host genome spontaneously or in response to stress via a molecular switch, replicate, and subsequently lyse the host. Prophages are widely found in bacterial genomes and can constitute up to a fifth of the host genome [13]; e.g. Escherichia coli O157:H7 strain Sakai harbors 18 prophages [14]. Integrated phages sometimes lose their ability to switch to the lytic cycle or produce viable viral particles following the acquisition of deleterious mutations [15]. Finally, more complex, multipartite viruses exist that are distributed across different genomic segments, each encapsulated in separate particles [16]. One example is RNA-phage phi-6, which infects Pseudomonas phaseolicola [17]. Such a complicated genome organization represents an important challenge for phage prediction tools.
Figure 2.

Illustration of lytic and lysogenic bacteriophage life cycles. In both cycles, the phage binds to the bacterial cell (1) and the phage’s genetic material then enters the host cell (2). During the lytic cycle, the phage multiplies (3a) and releases mature viruses through host cell lysis (4a). The lysogenic cycle is characterized by phage genome integration into the bacterial genome (prophage formation, 3b), vertical inheritance through host replication (4b), and occasional phage genome excision (5) before entering the lytic cycle (3a-4a). Created in BioRender. Wielgoss, S. (2025) https://BioRender.com/4si2h1x
Principles of automated phage prediction
Viral signals can be successfully detected with different computational approaches (Table 1). These approaches differ primarily in how much they rely on similarity to known viral sequences.
Table 1.
Summary of computational approaches to detect viral signals in sequence data
| Approach | Basis | Methods | Advantages | Limitations |
|---|---|---|---|---|
| Sequence similarity (SeqSim) | Infers homology from local sequence alignment | BLAST-based searches, HMM profiles from phage gene databases (e.g. pVOGs) | High accuracy when close reference sequences exist | Poor detection of novel or divergent phages; dependent on reference database completeness |
| Hybrid | Combines similarity-based and homology-independent genomic features | Sequence similarity and agnostic features (like GC/AT skew, gene density, transcription direction, and tRNA presence) | Improved accuracy and flexibility; detects fragmented or novel phages | Computationally more complex; requires integration of diverse signals |
| k-mer-based | Identifies composition patterns using short k-mer frequencies | k-mer frequency profiling, composition clustering | Alignment-free; efficient detection of rearranged or unknown sequences | Sensitive to k-mer size and sequence quality; still somewhat reference-biased |
| Machine learning and deep learning (ML/DL) | Learns patterns from data using statistical models | RFs, SVMs, CNNs, and LSTMs; often use k-mer or protein features | Can detect novel viruses by learning complex, non-obvious patterns | Requires large, high-quality training data; model tuning and validation are non-trivial |
Sequence-similarity-based approaches identify viral regions by homology to known phage proteins, e.g. via BLAST or hidden Markov model (HMM) profiles from databases such as the prokaryotic virus orthologous groups (pVOGs) database. These tools often use sliding windows to detect phage-like regions enriched for phage genes. They depend strongly on database completeness and can miss divergent or unknown phages.
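The sliding-window idea can be illustrated with a minimal sketch. It assumes that each gene along a contig has already been labeled as a phage-like hit or not (e.g. from a BLAST or HMM search); the function name `phage_rich_windows`, the window size, and the hit threshold are hypothetical choices for illustration and do not reproduce the logic of any specific tool.

```python
from typing import List, Tuple


def phage_rich_windows(
    is_phage_hit: List[bool], window: int = 10, min_hits: int = 6
) -> List[Tuple[int, int]]:
    """Return merged gene-index ranges (start, end), end exclusive, covered by
    windows that contain at least `min_hits` phage-like genes."""
    regions: List[Tuple[int, int]] = []
    n = len(is_phage_hit)
    # Slide a fixed-size window one gene at a time along the contig.
    for start in range(0, max(n - window + 1, 1)):
        end = min(start + window, n)
        if sum(is_phage_hit[start:end]) >= min_hits:
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous region: extend it.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


# Toy contig: 20 host genes, 8 phage-like genes, 20 host genes.
flags = [False] * 20 + [True] * 8 + [False] * 20
candidate_regions = phage_rich_windows(flags)
```

Because any window with enough hits is reported in full, the candidate region extends beyond the phage-like genes themselves, which is one reason why many tools add a separate boundary-refinement step.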
Hybrid approaches integrate classical sequence-similarity-based methods with sequence-agnostic approaches. The latter rely on homology-independent features, e.g. GC/AT skew, transcription directionality, gene length, or tRNA occurrence, to achieve higher accuracy and flexibility. Many recent hybrid tools incorporate machine or deep learning (ML/DL) to enhance the detection of fragmented and novel genomes.
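As an illustration of one homology-independent feature, the following sketch computes GC skew, here defined as (G − C)/(G + C), in windows along a sequence. The function name, window size, and step are assumptions made for this example; actual tools combine several such features and use their own definitions and parameters.

```python
from typing import List, Tuple


def gc_skew_windows(
    seq: str, window: int = 1000, step: int = 500
) -> List[Tuple[int, float]]:
    """Return (window start, GC skew) pairs along a nucleotide sequence."""
    seq = seq.upper()
    skews: List[Tuple[int, float]] = []
    for start in range(0, max(len(seq) - window + 1, 1), step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # Windows without any G or C get a skew of 0 to avoid division by zero.
        skew = (g - c) / (g + c) if (g + c) else 0.0
        skews.append((start, skew))
    return skews


# Toy sequence with an abrupt change in strand composition at position 10.
profile = gc_skew_windows("G" * 10 + "C" * 10, window=10, step=10)
```

Abrupt changes in such a profile relative to the surrounding host sequence are the kind of signal that sequence-agnostic methods exploit.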
K-mer-based methods classify sequences using the frequencies of short nucleotide substrings of length k (k-mers), equivalent to “DNA words”. This allows detection of viruses with limited similarity to known phages. These methods are alignment-free and can handle genome rearrangements but are sensitive to the choice of k-mer size and to input quality.
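A k-mer frequency profile can be computed in a few lines. The sketch below is a simplified illustration (the function name `kmer_profile` and the default k = 4 are assumptions); it ignores k-mers containing ambiguous bases and, unlike many tools, does not merge reverse-complementary k-mers.

```python
from collections import Counter
from itertools import product
from typing import Dict


def kmer_profile(seq: str, k: int = 4) -> Dict[str, float]:
    """Return the relative frequency of every possible A/C/G/T k-mer in seq."""
    seq = seq.upper()
    # Count all overlapping substrings of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Only unambiguous k-mers contribute to the total (e.g. those with N are skipped).
    total = sum(counts[km] for km in all_kmers)
    return {km: (counts[km] / total if total else 0.0) for km in all_kmers}


profile = kmer_profile("ACGTACGT", k=2)
```

The resulting fixed-length vector (4^k entries) can be compared between sequences or passed to a classifier, which is why the choice of k directly affects both resolution and memory use.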
ML and DL approaches apply data-driven models to detect unknown or less well-characterized viruses, including ssDNA viruses [18]. ML/DL models learn complex patterns that distinguish viral from microbial and plasmid sequences. Common ML models are random forests (RFs), an ensemble learning method that builds multiple decision trees and combines their outputs for predictions, and support vector machines (SVMs), which identify the optimal boundaries between different groups in the data. Widely used DL models are convolutional neural networks (CNNs), which excel at identifying local data patterns (including k-mer frequencies), and long short-term memory (LSTM) networks, which specialize in capturing relationships in sequential data (including DNA and RNA). Some tools combine several ML/DL approaches and may integrate other approaches (hybrid or k-mer-based).
The evolution of phage prediction tools
Phage prediction tools emerged around the turn of the millennium to identify prophages in single bacterial genomes, at a time when available sequencing data were still scarce [19]. These early tools exploited simple composition-based signals [20], such as sudden shifts in GC content or dinucleotide relative abundance, to identify candidate prophage regions in host genomes [21–23]. However, their narrow scope made them unreliable for identifying low-abundance and cryptic prophages [19]. In the mid-2000s, a second wave of tools strongly enhanced prediction accuracy by integrating sequence composition analysis with homology-based methods. Phage_Finder [24], Prophage Finder [25], and Prophinder [26] integrated protein homology searches, tRNA detection, phage integration site prediction, and HMMs trained on viral genes. Despite these improved capabilities, their use was hampered by the limited diversity of available viral sequences and the technical expertise required for their implementation [19]. In response, tools such as the PHAST suite [19, 27, 28] implemented user-friendly web servers, which made in silico prophage prediction from closed single genomes accessible to many microbiologists and contributed to the suite’s high popularity. Moreover, PhiSpy [29] introduced hybrid approaches, the combination of sequence-similarity searches with sequence-agnostic features to improve the detection of novel and atypical prophages.
Yet, the advent of metagenomic sequencing completely changed the landscape of virus detection software, opening up unprecedented opportunities for phage discovery while exposing limitations in earlier tools designed for prophage detection in single, complete genomes. Metagenomic datasets are significantly larger and more taxonomically diverse, which requires more scalable virus detection methods that can also handle low-coverage and highly fragmented viral sequences. Early tools such as VirSorter [30] and MetaPhinder [31] extended detection to mixed-community data. VirSorter offered broad coverage with modular outputs but suffered from high false-positive rates [47]. MetaPhinder offered higher precision but was constrained by its reliance on close similarity to known reference genomes, limiting its power to identify novel or mosaic phages [32].
The limitations of those initial tools, either prophage detectors constrained by known genome characteristics or early metagenomic tools restricted by reference similarity, boosted the development of a next generation of virus detection approaches starting from the latter half of the 2010s. These tools introduced conceptually distinct innovations to tackle key challenges:
Kraken [33] and Kraken2 [34] forewent alignment altogether and used hash-based k-mer mapping, which improved scalability for fractured metagenomic data.
VirFinder [35] (conventional ML) and DeepVirFinder [36] (CNNs) also use k-mer mapping but replace hand-tuned rules with ML and DL, respectively, further boosting sensitivity for novel or uncharacterized phages.
MARVEL [37] tackled low-abundance viruses by making predictions from metagenome-assembled genomes (MAGs) using RF classifiers.
VirMiner [38] combined ML-based classification with host prediction and gene annotation for deeper ecological insight.
PPR-Meta [39], VIBRANT [32], Seeker [40], and Virtifier [41] employed different DL architectures (CNNs or LSTMs) to learn viral signatures from raw or protein-level data, especially suitable for mosaic or rearranged genomes.
VirSorter2 [42] distinguishes itself through an ensemble ML approach that combines multiple phage-specific classifiers trained on viral and host genomic features, enabling robust and accurate predictions across a wide range of input types.
PhaMer [50], finally, introduces a paradigm shift in phage detection by applying Transformer-based large language models (LLMs) to protein-tokenized phage contigs, enabling the capture of long-range dependencies and hidden sequence patterns characteristic of compositionally atypical or cryptic phages.
PhaMer was integrated into PhaBOX2 [43], a comprehensive and user-friendly pipeline that bridges multiple tools in an end-to-end workflow. The use of such integrated pipelines is a recent trend in the field aimed at fostering reproducibility, scalability, and accessibility. In summary, cutting-edge phage prediction software is increasingly defined by its ability to detect divergent, low-abundance, or structurally complex phages, even from noisy, fragmented metagenomes. This is a significant shift away from earlier static, sequence-similarity-based tools. Today, modern approaches incorporate dynamic, data-driven ML/DL algorithms that have been developed to handle the scale and complexity of large metagenomic datasets.
A detailed step-by-step instruction guide
Our goal is to offer guidance on navigating the expanding landscape of phage analysis tools used for single genomes and metagenomes. To this end, we cover four essential steps (Fig. 3): phage detection, annotation, taxonomic classification, and further downstream analyses (including quality of predicted phages, phage life cycle, and host prediction). We conclude by presenting integrated pipelines that streamline phage analysis by automating most or all of these steps.
Figure 3.

Workflow for phage detection and analysis. This outline reflects the key steps of phage detection, annotation, classification, and further downstream analyses.
Step 1: Phage detection
General considerations
Most automated phage detection tools covered in this review operate on assembled contigs or genomes, e.g. in FASTA or annotated GBK formats. When working with assembled contigs, small genomic fragments can strongly hamper downstream analyses, including host prediction and viral core gene identification [44]. Thus, it is highly recommended to remove contigs <500 bp [35]. Several tools, including Kraken2 [34], VirMiner [38], and the pipelines PhaBOX2 [43] and ViWrap [45], also accept raw reads (FASTQ files). In particular, VirMiner [38] offers built-in modules for read pre-processing and classification, both of which are recommended. In the following, we present tools for phage discovery from metagenomes, then introduce prophage scanners for single genomes, and conclude with a brief section on filamentous phage detection. All prediction tools have been visually categorized (Fig. 4) and tabulated for data type, user expertise, and computational resource demands (Table 2, categories explained in Table 3).
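The recommended length filter can be applied before running any detection tool. The following is a minimal sketch in plain Python, assuming an uncompressed FASTA input; the function names and the example records are hypothetical, and established libraries or command-line utilities can be used for the same purpose.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(handle: TextIO, min_len: int = 500) -> List[Tuple[str, str]]:
    """Keep only contigs of at least `min_len` bases."""
    return [(h, s) for h, s in read_fasta(handle) if len(s) >= min_len]


# Toy input: one 600 bp contig (split over two lines) and one 4 bp contig.
example = io.StringIO(">contig_1\n" + "A" * 300 + "\n" + "C" * 300 + "\n>contig_2\nACGT\n")
kept = filter_contigs(example)  # only contig_1 passes the 500 bp threshold
```

For real data, the `io.StringIO` object would be replaced by an open file handle, and the retained records written back to a new FASTA file.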
Figure 4.

Classification of virus prediction tools by data type and user expertise. Tools are grouped based on their intended input (single genomes versus metagenomes) and anticipated user expertise. Approximate computational requirements are indicated in parentheses (L = low, M = medium, H = high; see also Table 3). Tools applicable to both single genomes and metagenomes are listed in both categories.
Table 2.
Summary table of phage prediction tools.
| Tool | Expertise | Data type | Resources | Approach | Use case |
|---|---|---|---|---|---|
| DBSCAN-SWA [57] | Expert: CLI [115] | Single genomes | Low | Hybrid | Rapid batch processing and prophage detection |
| DeepVirFinder [36] | Skilled: CLI [120] | Metagenomes | Medium | k-mer, DL | CNN-based tool for viral sequence detection from metagenomes |
| DEPhT [58] | Expert: CLI [116] | Single genomes | Medium | Hybrid | Rapid batch processing and prophage detection with boundary detection (focus on Mycobacterium) |
| Inovirus [18, 59] | Expert: CLI [60] | Single- and metagenomes | Medium | Hybrid, ML | ML predictor for filamentous phages from assembled genomes |
| Kraken2 [34] | Skilled: CLI [121] | Metagenomes | High | k-mer | Hash-based taxonomic k-mer sequence classification with high resource (RAM) demands |
| MARVEL [37] | Skilled: CLI [122] | Metagenomes | Medium | Hybrid, ML | RF-based recovery of tailed phage candidates from metagenomic bins; focus on Caudovirales |
| MetaPhinder [31] | Skilled: CLI [123] | Metagenomes | Medium | SeqSim | Phage identification from metagenomes via BLAST searches against custom phage DB; also detects filamentous phages |
| PhaBOX2 [43] | Novice: Web [132] Expert: CLI [133] | Single- and metagenomes | Low, Medium | Hybrid, DL | Integrated workflow for phage identification with lifestyle, host, and taxonomy prediction from contigs with visual outputs |
| PhageBoost [56] | Novice: Web [134] Skilled: CLI [135] | Single- and metagenomes | Low, Medium | ML | RF-based prophage detection with read quality control, assembly, and functional annotation |
| PhageTerm [95] | Skilled: CLI [117] | Single genomes | Medium | Hybrid | Accurate phage termini and packaging inference (requires reads) |
| PhaMer [50] | Skilled: CLI [136] | Single- and metagenomes | Medium | DL | Deep-language-model-based tool for phage detection from metagenomes |
| PHASTEST [52] | Novice: Web [118] | Single genomes | Low | Hybrid | Rapid web-based prophage detection and annotation |
| Phigaro [54] | Skilled: CLI [137] | Single- and metagenomes | Low | Hybrid | Scalable, high-throughput prophage prediction and annotation |
| PhiSpy [29] | Skilled: CLI [53] | Single genomes | Low | Hybrid, ML | RF-based prophage detection from annotated genomes, with boundary refinement |
| PPR-Meta [39] | Expert: CLI [124] | Metagenomes | Medium | DL | CNN-based phage and plasmid prediction |
| ProphET [55] | Skilled: CLI [119] | Single genomes | Medium | SeqSim | Prophage prediction using an auto-updating reference database, is best for known phages |
| Seeker [40] | Skilled: CLI [138] | Single- and metagenomes | Medium | DL | Alignment-free phage detection based on LSTM-models |
| VIBRANT [32] | Skilled: CLI [139] | Single- and metagenomes | Medium | Hybrid, DL | Automated DL tool trained on protein signatures for virus detection, annotation, and life cycle prediction |
| viralVerify [49] | Skilled: CLI [125] | Metagenomes | Medium | ML | Filters viral contigs from metagenomic assemblies; low precision on single-genome prophage scans |
| VirFinder [35] | Skilled: CLI [126] | Metagenomes | Low | k-mer, ML | Fast alignment-free approach to detect viral sequences in metagenomes; biased to known phages |
| VirMiner [38] | Novice: Web [127], Skilled: CLI [128] | Metagenomes | Low, Medium | ML | Highly sensitive RF model for virus and host predictions with functional annotation |
| VirSorter [30] | Skilled: CLI [140] | Single- and metagenomes | Medium | Hybrid | De novo hybrid virus detection from metagenomes with custom probabilistic models |
| VirSorter2 [42] | Expert: CLI [141] | Single- and metagenomes | High | Hybrid, ML, DL | Highly modular ML/DL hybrid pipeline to detect DNA and RNA viruses in complex viromes |
| Virtifier [41] (Seq2Vec) | Skilled: CLI [129] | Metagenomes | Medium | DL | Viral contig identification from metagenomes based on LSTM classifiers; also suitable for contigs <500 bp |
| ViWrap [45] | Expert: CLI [130] | Metagenomes | High | Hybrid, ML | Modular integrated workflow for phage identification, binning, classification, and host prediction |
| What the Phage (WtP) [112] | Skilled: CLI [131] | Metagenomes | High | Hybrid, ML, DL | Scalable phage identification and analysis pipeline; includes ML/DL |
Table 3.
Explanation guide of expected user expertise and computational resource requirements for different viral detection tools (as referred to in Table 2, Fig. 4).
| Level | Definition | Explanations and examples |
|---|---|---|
| Required user expertise | | |
| Novice | Minimal to basic bioinformatics exposure; intuitive web interface or GUI | Web/GUI: point-and-click usage, sequence selection, or upload |
| Skilled | Proficient with CLI (command-line interface) | CLI basics in Bash, GitHub, Conda, Python, or R |
| Expert | Experienced with automation, tool chaining, and high-performance computing (HPC) | HPC usage, Snakemake, Docker, workflow debugging |
| Required computational resources | | |
| Low | Web, standard computer (≤8 GB RAM, 1 CPU) | Ideal for casual, exploratory, or classroom use |
| Medium | Moderate workstation (8–32 GB RAM, multi-core CPU, moderate storage space) | Suitable for most genome and medium-sized metagenomic datasets |
| High | Requires server or HPC resources (>32 GB RAM; multiple threads for parallelization; large storage space) | For demanding high-throughput projects and complex workflows |
Tool guide for analyzing metagenomes
We begin our tool guide with metagenomic phage detection tools, as this branch has become the fastest-growing field for viral bioinformatics. Tool performance can vary widely with input quality, contig fragmentation, and the viral/bacterial reference databases used [11]. To make informed decisions, users must rely on context-aware benchmarks [46–48]. Thorough benchmarks report the following standard metrics for tool cross-comparison:
precision (fraction of predicted viral contigs that are truly viral),
recall (fraction of all true viral contigs that are correctly recovered), and the
F1 score (the balanced, harmonic mean of precision and recall).
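These three metrics follow directly from the counts of true positives (TP), false positives (FP), and false negatives (FN). The sketch below shows the arithmetic; the function name and the example counts are illustrative assumptions and are not taken from any of the cited benchmarks.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 score from contig-level counts."""
    # Precision: fraction of predicted viral contigs that are truly viral.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of true viral contigs that were recovered.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical tool: 90 correct viral calls, 10 false calls, 30 missed viral contigs.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
```

Because the harmonic mean is dominated by the smaller of the two values, a tool cannot reach a high F1 score through high precision or high recall alone.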
Figure 5 summarizes results from the comprehensive benchmark "Gauge your phage" [47], which evaluated 10 widely used metagenomic virus detection tools on artificial contigs created from RefSeq genomes, previously sequenced mock communities, and randomly shuffled sequences. For RefSeq-derived sequences, the top performers were, in ranked order, VIBRANT [32], VirSorter2 [42], and PPR-Meta [39], all with F1 scores above 90% (Fig. 5). These skilled-to-expert-level tools achieve high precision and recall under optimal conditions (non-fragmented contigs of up to 15 kbp in length). VirSorter2 is especially well-suited to intricate viromes; however, its high flexibility comes at the cost of longer runtimes than the other metagenomics tools. The less complex VIBRANT had much shorter runtimes and higher precision than VirSorter2 but lower recall. PPR-Meta, with its resource-optimized DL classifiers, was even faster than VIBRANT or VirSorter2; however, it also produced more false positives than either tool.
Figure 5.

Benchmark performance of virus prediction tools on metagenomic datasets. Bar plots show the F1 score, precision, and recall of 10 metagenomic viral prediction tools, evaluated on either RefSeq-derived sequences (red, upper bar) or synthetic mock community data (blue, lower bar). Each bar represents the average performance for the respective tool and dataset. Tools are sorted by decreasing F1 scores on the RefSeq dataset for clarity. Performance metrics adapted from [47] under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
In contrast, the more complex mock community dataset generally favored k-mer-based tools (Fig. 5). Here, the memory-intensive Kraken2 [34] had, by far, the highest F1-score across tools, which was carried by both excellent precision and recall. DeepVirFinder [36] ranked second due to a lower precision, but offers a highly resource-efficient alternative to Kraken2. As a consequence, k-mer-based methods are especially powerful for detecting phages in highly fragmented, contaminated, or low-abundance metagenomic data, such as environmental samples or ancient DNA. Their alignment-free nature allows for rapid detection of sequence composition patterns, even in the absence of close homologs, making them particularly effective when reference databases are incomplete or when sequence similarity is unreliable [33–36, 47].
Outside the top tier, VirSorter [30] delivered only moderate scores, especially on the more challenging mock-community data, and was clearly outperformed by its successor, VirSorter2. Likewise, VirFinder [35] was clearly surpassed by its successor DeepVirFinder, whose convolutional-network model benefits from a much larger and more diverse viral training set. At the lower end of the spectrum, viralVerify (a module of MetaWRAP [49]) and Seeker [40] generally struggled with most benchmarks. Seeker also failed to represent both alpha- and beta-diversity in mock virome data, meaning it underestimated within-sample viral richness and between-sample community differences [47]. This makes Seeker unsuitable as a primary tool for viral ecology studies focused on diversity patterns or compositional structure. Among widely used tools outside the scope of the benchmark, we want to single out two: MARVEL [37] and PhaMer [50]. MARVEL is a high-throughput tool intended for detecting free, tailed Caudovirales phages from metagenomic bins and is especially suitable for low-abundance and fragmented sequences when binning is feasible. In its original validation [37], MARVEL outperformed VirSorter and VirFinder in recall while maintaining similarly high precision, particularly on simulated MAGs. It is less suitable for detecting viruses from highly fragmented, unbinned contigs. PhaMer [50] utilizes an LLM for classifying phage contigs and is particularly effective at detecting cryptic and compositionally atypical phages. It achieved an F1-score of 0.93 on RefSeq-derived contigs and outperformed VirSorter, (Deep)VirFinder, Seeker, and PPR-Meta on mock metagenomic datasets [50]. While PhaMer requires high computational resources, this limitation is mitigated by its integration into the online workflow PhaBOX2 [43].
As a final note, averaging results from multiple prediction tools does not always improve accuracy, as many tools share overlapping reference biases and interdependent training data [46]. Therefore, tool outputs should be interpreted independently. Moreover, other factors, such as tool interface and computational resource demands, can be equally decisive (Tables 2 and 3, and Fig. 4) and should guide tool choice based on the dataset’s complexity and the user’s expertise.
Tool guide for prophage detection in single genomes
Compared to the metagenome-oriented tools described in the previous section, dedicated single-genome scanners offer higher efficiency and accuracy for identifying prophages in individual bacterial genomes. Figure 6 summarizes benchmark results from the Philympics 2021 study [51], supplemented with performance data from the PHAST suite [52]. Among all evaluated tools, PHASTEST achieved the highest overall performance across all tested metrics; however, it was run on a different dataset (Casjens-54) [20], so its scores are not directly comparable with those from the Philympics benchmark. It is highly recommended for users who prefer GUIs and provides quick but sensitive open reading frame (ORF) annotation. Analyses are typically complete within minutes per genome, with interactive visualizations of prophage locations in the output [52]. Within the Philympics benchmark, the updated version of PhiSpy [53] led the field. It offers robust precision and recall without relying on static reference databases. PhiSpy applies RF classifiers to annotated genomes, includes refined prophage boundary detection, and is especially well-suited for skilled users who value flexibility and parameter control. Among other high-performing tools, Phigaro [54] offers robust throughput by combining Prodigal gene prediction with HMM-based pVOG annotation. ProphET [55] also performed well and is notable for including a self-updating reference database. At the lower end of the performance spectrum, PhageBoost [56] and the batch-processing tool DBSCAN-SWA [57] showed significant drops in precision and boundary resolution. Of note, metagenome-focused virus predictors performed poorly on these single-genome benchmarks, showing lower precision, longer runtimes, and poor phage boundary resolution. This highlights the importance of using tools designed for single-genome prophage detection. While not part of the performance benchmark, DEPhT [58] deserves mention for its precise prophage boundary detection.
As always, tool choice should be guided by the specific research question and available computational resources (Tables 2 and 3, and Fig. 4).
Figure 6.

Benchmark performance of tools used for prophage detection in single genomes. Horizontal bar plots compare F1 score, precision, and recall (panels from left to right) across 13 tools. Orange, data adapted from the PHAST suite benchmark [52], licensed under a Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/); red, data adapted from the Philympics 2021 benchmark [51], licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
Tool guide for detecting filamentous phages (Inoviridae)
Filamentous phages (Inoviridae) are characterized by rod-shaped or long proteinaceous filaments with a circular ssDNA genome of ~5–15 kb that can establish chronic infections. Because of their unique and diverse gene content, most computational approaches are ineffective at detecting their sequences in whole-genome shotgun sequencing data [18, 59]. The ML tool Inovirus [60] implements a two-step pipeline specifically designed for this purpose. In the first step, the program Inovirus_detector scans for conserved Inoviridae marker proteins (especially pI-like proteins) using HMMs. In the second step, an RF classifier detects other characteristic Inoviridae features, such as small structural proteins. These predictions are then passed to the Inovirus_classifier module, which refines the taxonomic ranking of candidate sequences within Inoviridae, based on conserved protein clusters. This approach enables the automated discovery and taxonomic classification of inoviruses. The authors reported high recall and precision values of 92.5% and 99.8%, respectively, on a manually curated reference set [18, 60]. Of note, VirSorter2 [42] is also capable of identifying Inoviridae, as it includes pI-like proteins in its viral marker set.
Once high-confidence phage regions have been identified, users typically proceed to annotate and characterize the predicted viral genes. We describe this step in the following section.
Step 2: Phage gene prediction and annotation
Phage gene prediction
While tools such as GLIMMER [61], GeneMarkS [62], and Prodigal [63] were originally designed for application to bacterial genome annotation, they are frequently applied to phage genomes as well. However, their performance is limited by the compact and atypical architecture of phage genomes, which typically feature more overlapping, short, and embedded genes [64–66]. To address this, the graph-based PHANOTATE [67] was developed specifically for the compact nature of phage genomes. A benchmark with 2133 complete phage genomes showed that PHANOTATE predicted more genes than GLIMMER, GeneMarkS, and Prodigal and had an ~82% agreement with genes predicted by at least one of these tools [67]. Importantly, ~6% of its predictions were unique but mostly evolutionarily conserved. This suggests that PHANOTATE can uncover functional proteins that are not detected with standard approaches. As a best-practice recommendation, the outputs from various gene prediction tools should be compared. To this end, the comparative platforms Phage Commander [68] and MultiPhATE2 [69] assess consensus calls, visualize overlaps, and help select the most plausible gene models.
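The idea of comparing outputs from several gene callers can be sketched in a few lines of Python. This is a simplified illustration and not the method implemented in Phage Commander or MultiPhATE2. It assumes that two calls describe the same gene when they share a stop coordinate and strand, since start codons are the part that gene callers most often disagree on. The caller labels and coordinates are invented.

```python
from collections import Counter

# Hypothetical gene calls as (start, stop, strand); labels and coordinates
# are invented for illustration.
calls = {
    "caller_A": [(100, 400, "+"), (450, 900, "+"), (1200, 1000, "-")],
    "caller_B": [(130, 400, "+"), (450, 900, "+")],
    "caller_C": [(100, 400, "+"), (950, 990, "+"), (1200, 1000, "-")],
}


def consensus_by_stop(calls, min_support=2):
    """Return (stop, strand) keys reported by at least min_support callers."""
    support = Counter()
    for genes in calls.values():
        # A set ensures each caller votes at most once per (stop, strand) key.
        for key in {(stop, strand) for _start, stop, strand in genes}:
            support[key] += 1
    return sorted(key for key, n in support.items() if n >= min_support)


consensus = consensus_by_stop(calls)
print(consensus)
```

Calls supported by only one tool, such as the short ORF ending at position 990 in this toy example, are not necessarily wrong, but they are the ones that most deserve manual inspection.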
Functional gene annotation
Unlike cellular organisms, prokaryotic viruses lack a universal common ancestor, and their proteins exhibit limited conservation levels. Therefore, only a minority of phage genes have known functions, which hampers the functional annotation of newly detected phage genes. To address this, Pharokka [70] integrates the prokaryotic virus remote homologous groups (PHROG) database [71], which clusters viral proteins into orthologous groups based on remote homology and manual curation. Paired with the PHANOTATE gene caller, Pharokka provides appropriate prediction and meaningful annotation for newly identified phages. For users who prefer web-based tools, PhANNs [72] offers an artificial neural network (ANN) ensemble to rapidly classify proteins into 10 structural classes. Finally, highly fragmented metagenomic assemblies present a significant challenge for standard gene callers. In this context, Balrog [73], which employs temporal CNNs, demonstrates strong performance by significantly reducing the number of hypothetical gene predictions. It effectively retains well-conserved genes while removing spurious ORFs, which improves confidence in both gene prediction and downstream annotation.
Step 3: Taxonomic classification
Historically, viruses were primarily classified based on phenotypes, e.g. traits such as tail morphology or capsid shape. Because such morphocentric groupings often lacked monophyly, the International Committee on Taxonomy of Viruses (ICTV) [74] redefined viral taxonomy to be based on genomic and proteomic information. At higher taxonomic ranks (family, order, and class), classification is now done based on viral hallmark genes and whole-proteome comparisons. This approach is implemented by several programs, including VICTOR [75] and ViPTree [76], which both conduct whole-proteome phylogenetic inference; vConTACT2 [77], which groups viruses in terms of common protein clusters; GRAViTy [78], which integrates genomic architecture and protein profile HMMs; and VirClust [79], which uses adaptive homology models for proteins to identify taxonomic clusters across taxonomic levels without sacrificing sensitivity and specificity.
At lower taxonomic levels (genus and species), whole-genome or individual-gene alignments remain essential. However, phages frequently lack a common core genome due to high recombination frequencies or genomic mosaicism [80]. This complicates traditional phylogenetic classification. Clustering based on intergenomic nucleotide identity can circumvent this limitation. For example, VIRIDIC [81] clusters phages based on user-defined similarity levels, while ClassiPhage and ClassiPhages 2.0 [82, 83] utilize HMMs and ANNs to classify phages by conserved features.
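Clustering by a user-defined similarity level can be illustrated with a short Python sketch. It uses single-linkage clustering on a table of pairwise intergenomic similarities, which is a simplification chosen for brevity and not necessarily the clustering procedure used by VIRIDIC. The genome names and similarity values are invented, and the thresholds of 95% and 70% are used here as assumed, commonly cited cut-offs for species and genus.

```python
def cluster_by_threshold(names, similarity, threshold):
    """Single-linkage clustering: join genomes whose similarity >= threshold."""
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the cluster representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), sim in similarity.items():
        if sim >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical pairwise intergenomic similarities in percent.
names = ["phage1", "phage2", "phage3", "phage4"]
similarity = {
    ("phage1", "phage2"): 97.0,
    ("phage1", "phage3"): 75.0,
    ("phage2", "phage3"): 72.0,
    ("phage1", "phage4"): 10.0,
    ("phage2", "phage4"): 12.0,
    ("phage3", "phage4"): 8.0,
}

species = cluster_by_threshold(names, similarity, threshold=95.0)
genera = cluster_by_threshold(names, similarity, threshold=70.0)
print("species-level clusters:", species)
print("genus-level clusters:", genera)
```

In this toy example, phage1 and phage2 fall into one species-level cluster, phage3 joins them only at the genus-level threshold, and phage4 remains a singleton at both levels.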
However, taxonomic classification of novel or highly divergent phages remains challenging. Most lack sufficient similarity to reference sequences, and their modular genome structures impede the application of conventional classification methods. To better reflect evolutionary relatedness in these cases, several studies have adopted genome-wide similarity metrics as complementary approaches: average nucleotide identity (ANI) [84] and weighted gene repertoire relatedness (wGRR) [80].
ANI is the mean nucleotide identity of the orthologous regions shared between two genomes. It is computed by fragmenting the genomes, matching homologous regions, and averaging their nucleotide identity. ANI accurately distinguishes phages at the genus or species levels but is less effective for highly recombinant or mosaic genomes, where alignable regions can be sparse.
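The averaging step can be made concrete with a minimal Python sketch. It assumes that the fragmentation and alignment have already been done by an external aligner and that each query fragment comes with the percent identity and aligned fraction of its best hit. The function name, the filter thresholds, and the input values are illustrative assumptions and do not reproduce any specific ANI program.

```python
from statistics import mean


def ani_from_fragments(hits, min_identity=70.0, min_aligned_fraction=0.7):
    """Average identity over fragments that pass both filters.

    hits: list of (percent_identity, aligned_fraction), one per query fragment.
    Returns (ani, fraction_of_fragments_retained); ani is None if nothing passes.
    """
    kept = [
        identity
        for identity, aligned in hits
        if identity >= min_identity and aligned >= min_aligned_fraction
    ]
    if not kept:
        return None, 0.0
    return mean(kept), len(kept) / len(hits)


# Hypothetical best hits for four fragments of a query phage genome.
hits = [(98.0, 0.95), (96.0, 0.90), (60.0, 0.80), (99.0, 0.30)]
ani, retained = ani_from_fragments(hits)
print(f"ANI={ani:.1f}% based on {retained:.0%} of fragments")
```

Reporting the fraction of retained fragments alongside the ANI value matters for mosaic genomes: a high ANI supported by only a small share of the genome says little about overall relatedness.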
In contrast to ANI, which relies on nucleotide-level similarity, wGRR establishes similarity at the protein level through the detection of reciprocal best hits between genomes and their weighting based on both sequence identity and alignment coverage. This protein-centric approach enables the estimation of evolutionary relatedness to be robust even when nucleotide homology is fragmented or low. While not a taxonomic method per se, wGRR is best applied for clustering phages based on shared gene content and evolutionary patterns, particularly if core genes are lacking or disrupted by recombination.
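As a worked example, the sketch below computes wGRR under one commonly used formulation, which is an assumption here and should be checked against the cited work: the sum of the sequence identities of all reciprocal best hits, divided by the number of proteins in the smaller genome. Identity acts as the weight, and coverage is applied only as a filter. The thresholds (35% identity, 50% coverage) and all input values are illustrative.

```python
def wgrr(bbh_pairs, n_proteins_a, n_proteins_b,
         min_identity=0.35, min_coverage=0.5):
    """Weighted gene repertoire relatedness between two genomes (0 to 1).

    bbh_pairs: list of (identity, coverage) for reciprocal best hits,
               both expressed as fractions between 0 and 1.
    """
    smaller = min(n_proteins_a, n_proteins_b)
    if smaller == 0:
        raise ValueError("both genomes need at least one protein")
    # Sum identities of reciprocal best hits that pass both filters.
    total_identity = sum(
        identity
        for identity, coverage in bbh_pairs
        if identity >= min_identity and coverage >= min_coverage
    )
    return total_identity / smaller


# Hypothetical reciprocal best hits between a 4-protein and a 10-protein phage.
pairs = [(0.90, 0.95), (0.60, 0.80), (0.30, 0.90), (0.80, 0.40)]
score = wgrr(pairs, n_proteins_a=4, n_proteins_b=10)
print(f"wGRR={score:.3f}")
```

Normalizing by the smaller genome means that a small phage whose proteins are all found, at high identity, in a larger phage still obtains a wGRR close to 1.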
Step 4: Further downstream analyses
Following detection and taxonomic classification, different types of downstream analysis can provide key functional, ecological, and evolutionary information. These encompass sequence quality analysis, life cycle prediction, and the prediction of potential hosts, which are especially important for phage-therapeutic design and viral ecology studies. We recommend three main categories: (i) quality assessment, (ii) life cycle prediction, and (iii) host prediction. Other downstream analyses beyond the scope of this guide include core gene prediction and gene transfer [80, 85], viral density estimation (VIRMOTIF [86]), and functional potential prediction of viral communities [87]. Instead of a comprehensive list, we prefer to provide the beginner with a helpful overview of frequently used tools and best practices.
Quality assessment of phage genomes
Assuring high quality of novel genome assemblies is crucial for reliable annotation, taxonomic classification, and ecological interpretation. This is because incomplete or contaminated genomes can obscure significant viral functions or lead to incorrect taxonomic classification. To circumvent these issues, CheckV [44] is the most suitable software for precise assessment of host contamination and genome completeness. It uses reference-based scoring for known phages, HMM-based inference for novel viruses, GC content, and terminal repeat detection to determine completeness level and contamination status. It reports completeness values as a percentage of complete viral genome for each contig. In addition, other tools also measure viral completeness with different approaches: VIBRANT [32] scans for characteristic viral proteins; viralComplete [88] employs reference-length and content; PHASTEST [52] offers ORF-level completeness scores; and Phables [89] reconstructs fragmented metagenomic assemblies into genomes using flow-based graph modeling, a unique feature among existing tools. Completeness estimates are reference-coverage dependent and can miss novel genomes. Therefore, we recommend visually inspecting all datasets.
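To illustrate why completeness estimates depend on the reference, the following Python sketch expresses completeness as contig length relative to an expected genome length taken from a related reference. This is a deliberately simplified illustration and not CheckV's algorithm. The function names, the tier cut-offs (90% and 50%), and the lengths are assumptions made for this example.

```python
def estimate_completeness(contig_length, expected_length):
    """Completeness in percent (capped at 100) relative to an expected length."""
    if expected_length <= 0:
        raise ValueError("expected_length must be positive")
    return min(100.0, 100.0 * contig_length / expected_length)


def quality_tier(completeness):
    """Map a completeness percentage to a coarse tier (illustrative cut-offs)."""
    if completeness >= 90.0:
        return "high-quality"
    if completeness >= 50.0:
        return "medium-quality"
    return "low-quality"


# Hypothetical contigs: (contig length, expected genome length) in bp.
contigs = {"contig_1": (38_000, 40_000), "contig_2": (15_000, 40_000)}
report = {
    name: estimate_completeness(length, expected)
    for name, (length, expected) in contigs.items()
}
for name, value in report.items():
    print(f"{name}: {value:.1f}% complete ({quality_tier(value)})")
```

If no sufficiently related reference exists, the expected length itself is uncertain, and so is any tier derived from it.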
Life cycle prediction
A phage’s lifestyle reflects its ecological role and therapeutic potential. However, most of the current methods predict lysogeny based on conserved markers or a positive hit to known integrases, a characteristic that novel viruses might not have. Furthermore, if only genome structure is considered, it is impossible to determine whether a prophage is biologically active. To ensure strong inferences of phage lifestyles, genomic predictions should be complemented by contextual data, such as gene expression or culture-based strategies. For automated prediction, the tool landscape offers a variety of different approaches, e.g. PHACTS [90] or BACPHLIP [91]. PHACTS employs RF classification to cross-match phage genomes with a reference database of phages with characterized life cycles, and BACPHLIP distinguishes between temperate and virulent phages according to their conserved protein domains. Lytic or temperate life cycles can further be predicted for highly fragmented phages derived from short-contig assemblies (PhaTYP [92]) or metaviromes (DeePhage [93] or PhagePred [94]). Also, PhageTerm [95] can be employed to predict the packaging mechanisms when both sequencing reads and an assembly are available.
Host prediction
The accurate inference of a phage’s host range is crucial for any meaningful ecological interpretation, but also for technical considerations such as microbiome engineering and assessing therapeutic potential for phage therapy. Host ranges are traditionally assessed in the laboratory, which is both time-consuming and restricted to culturable bacteria. In silico host prediction has therefore become critical, especially for large-scale metagenomic data in which many viral sequences have no cultured host. In silico host prediction methods fall broadly into two categories.
Host prediction: Database-driven matching
These methods can be further classified into repositories of documented or predicted phage-host interactions (PHI-base [96], ViralHostRangeDB [97], and MVP [98]) and predictive computational approaches that infer a host from an input phage sequence. The latter tools must strike a good balance between recall and false discovery rate (FDR). Here, PHISDetector [99] and VirHostMatcher-Net [100] show favorable recall values for the task, but they also reported unfavorably high FDRs of >10%. On the other hand, the supervised tool iPHoP [101] gives low FDRs coupled with high recall values for known and even novel phages at the genus level. Technically, it employs an automated approach that integrates database comparison with genome pattern analysis to simplify host prediction. Phage hosts can also be inferred by aligning the query phage with a database of known phage-host pairs, e.g. RaFAH [102], or by analysis of sequence alignment patterns, which can reveal prophage or CRISPR integration (using SpacePHARER [103]).
Host prediction: Alignment-free sequence feature models
These approaches analyze oligonucleotide usage patterns or trained sequence features to infer host identity without alignments. They include a collection of tools that compare k-mer frequencies between the phage genome and candidate host genomes, e.g. WIsH [104], PHIST [105], DeepHost [106], HostPhinder [107], and PHP [108]. Among them, the Prokaryotic virus Host Predictor (PHP) [108] is the most accurate at the genus level. It excels in situations where alignment-based methods fail, supports flexible host prediction from fragmented viral genomes, and is particularly effective in predicting hosts from challenging metaviromes. Two accurate alternatives to PHP available for metaviromes are HoPhage [109], which features both a Markov-chain model and a DL-method for host genus prediction, and CHERRY [110], which combines proteome- and genome-derived feature graphs. To achieve optimal results, researchers are advised to cross-validate predictions among complementary tools and include ecological metadata where available.
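The principle behind alignment-free host prediction can be sketched in Python as follows. The example builds normalized k-mer frequency profiles for a phage and for candidate hosts and picks the host with the smallest Manhattan distance. This is a generic illustration and does not reproduce the models of WIsH, PHP, or any other tool named above. The sequences are short invented strings, and real analyses use whole genomes and typically longer k-mers.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=3):
    """Normalized frequencies of all 4**k DNA k-mers, in a fixed order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in kmers)
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


def manhattan(p, q):
    """Sum of absolute differences between two frequency profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))


def predict_host(phage_seq, host_seqs, k=3):
    """Return (best_host, distances) using the smallest profile distance."""
    phage = kmer_profile(phage_seq, k)
    distances = {
        name: manhattan(phage, kmer_profile(seq, k))
        for name, seq in host_seqs.items()
    }
    return min(distances, key=distances.get), distances


# Hypothetical toy sequences: one AT-rich and one GC-rich candidate host.
hosts = {
    "host_AT_rich": "ATTATAATTTAAATATTAATATTTAATA",
    "host_GC_rich": "GCCGCGGGCCGCGCCGGCGGCCGCGGCG",
}
phage = "TTAATATAAATTTATAATATTAAT"

best, distances = predict_host(phage, hosts)
print(best, distances)
```

The sketch relies on the observation that phages tend to resemble their hosts in oligonucleotide usage. Published tools refine this idea with statistical or learned models, so the distance used here should be read as a placeholder.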
Integrated pipelines
Phage discovery and downstream analysis form a step-by-step process that involves running different dedicated tools in succession or in parallel. To reduce this overhead, several groups have designed integrated virus analysis pipelines that bundle tools into workflows, comprising all or most of the steps outlined in this review (Table 3). Here we provide several examples that demonstrate the range of approaches currently available. PhageCompass (https://phagecompass.ku.dk) is a web application built by an international collaboration for translational phage therapy. It integrates several evaluation tools (including PhageBoost [56]) into a structured and easily accessible web interface, supporting open access and educational outreach. MetaPhage [111] is a Nextflow-based modular pipeline for expert users. It facilitates virus mining from metagenomic data through a multi-step process including read classification, assembly, and virus prediction through an ensemble of tools (including Phigaro, VIBRANT, VirFinder, and VirSorter). The pipeline “What the phage” (WtP) [112] is a reproducible and scalable Nextflow workflow for expert users. It comprises multiple phage detection tools (including VirFinder, PPR-Meta, VirSorter1/2, Seeker, MetaPhinder, DeepVirFinder, and VIBRANT) with subsequent virus annotation and classification (using Phigaro) and offers user-friendly summaries in chart and table format. Finally, PhaBOX2 [43] is suitable for both single genomes and metagenomes and offers a highly accessible, web server-based pipeline. It takes contigs/sequences in FASTA format and runs virus identification (PhaMer [50]), taxonomic classification (PhaGCN [113]), host and lifestyle prediction (CHERRY/HostG [110] and PhaTYP [92]), contamination and provirus integration screening, vOTU grouping, marker gene-based phylogenetic tree inference, and viral protein annotation using recent databases via ICTV 2024. Expert users can run PhaBOX2 locally using a command-line interface (CLI).
In summary, workflows simplify the manual overhead of linking the outputs of multiple tools and produce formatted outputs that can aid reproducibility and interpretation, critical assets in large-scale virome studies and translational applications such as phage therapy.
Conclusion
The advent of new computational ML and DL methods has significantly elevated the speed, accuracy, and sensitivity of virus prediction. Nevertheless, significant challenges persist, such as the identification and taxonomic placement of rare or uncommon phages or the discrimination of closely related viral genomes in highly complex metagenomic data. Given the surging number of new bioinformatic phage tools, and because no single tool is optimal for all research questions, scientists must increasingly match analytical approaches to their specific questions. This guide seeks to support that process by helping researchers make informed decisions that fit their goals and expertise. As the field of viral signal detection in large metagenomic datasets continues to evolve rapidly, our review is a mere snapshot of this ongoing development. We do hope, though, that our historical treatment of the various underlying algorithms will enable users to better grasp and categorize new tools as they emerge. It is essential to harness the full potential of the latest tools, and so we hope that our guide will support phage explorers in their quest to discover novel phage elements from (meta)genomic datasets. In the future, tool design will likely integrate ecological background, metadata standards, and gene-sharing network approaches. For instance, clustering algorithms based on gene sharing, such as vConTACT2 [77], effectively group new viruses irrespective of their taxonomy. Concurrently, initiatives such as MIUViG [114] are setting the necessary metadata standards to improve reproducibility in viral ecology research. Finally, advanced host prediction programs and ML/DL-models that have been trained on ecological or temporal patterns will likely bridge the gap between detection and interpretation.
Key Points
The number and diversity of computational tools for predicting prokaryotic viruses from single genomes and metagenomic data have rapidly expanded over the past decade, reflecting both technical innovation and growing interest in viral applications like phage therapy.
Without claiming to be exhaustive, a wide range of state-of-the-art phage prediction tools are discussed and critically evaluated.
A step-by-step guide is proposed that covers and critically assesses tools for phage prediction, gene annotation, taxonomic classification, and more.
Since user input data can vary and sequence databases differ, it is essential to evaluate how well each tool works under different scenarios using reliable statistical measures and consistent benchmarks, which are discussed for both metagenomes and single genomes.
In conclusion, in silico phage prediction provides valuable, testable hypotheses about phage biology and taxonomy, integration sites, and lifestyle traits, which all should be validated experimentally wherever possible.
Contributor Information
Carolin Charlotte Wendling, Department of Environmental Systems Science, Institute of Integrative Biology, ETH Zürich, Universitätstrasse 16, 8092 Zürich, Switzerland; Max von Pettenkofer-Institut, Ludwig-Maximilians-Universität (LMU), Pettenkoferstraße 9a, 80366 München, Germany.
Marie Vasse, Department of Environmental Systems Science, Institute of Integrative Biology, ETH Zürich, Universitätstrasse 16, 8092 Zürich, Switzerland; CNRS UMR 5164, ImmunoConcept, Université de Bordeaux, Site de Carreire, Bâtiment BBS, 2 Rue Dr Hoffmann Martinot, 33076 Bordeaux Cedex, France.
Sébastien Wielgoss, Department of Environmental Systems Science, Institute of Integrative Biology, ETH Zürich, Universitätstrasse 16, 8092 Zürich, Switzerland.
Author contributions
C.C.W., M.V., and S.W. conceived and drafted the original manuscript. All authors have contributed to revising and editing of later versions of the manuscript. S.W. handled manuscript submission, editorial correspondence, and coordinated revisions. C.C.W. secured funding.
Conflict of interest: None declared.
Funding
This work was supported by the Swiss National Science Foundation (grant number PZ00P3_179743 to C.C.W.). We thank Anne Kupczok for providing helpful comments on the draft version of this manuscript. We thank Dr. Willem van Schaik for sharing raw data from [47].
Data availability
No new data were generated or analysed in support of this research.
References
- 1. Coutinho FH, Silveira CB, Gregoracci GB. et al. Marine viruses discovered via metagenomics shed light on viral strategies throughout the oceans. Nat Commun 2017;8:15955. 10.1038/ncomms15955 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2. Li R, Wang Y, Hu H. et al. Metagenomic analysis reveals unexplored diversity of archaeal virome in the human gut. Nat Commun 2022;13:7978. 10.1038/s41467-022-35735-y [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3. Paez-Espino D, Eloe-Fadrosh EA, Pavlopoulos GA. et al. Uncovering Earth’s virome. Nature 2016;536:425–30. 10.1038/nature19094 [DOI] [PubMed] [Google Scholar]
- 4. Paez-Espino D, Zhou J, Roux S. et al. Diversity, evolution, and classification of virophages uncovered through global metagenomics. Microbiome 2019;7:157. 10.1186/s40168-019-0768-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5. Roux S, Hallam SJ, Woyke T. et al. Viral dark matter and virus-host interactions resolved from publicly available microbial genomes. Elife 2015;4:4. 10.7554/eLife.08490 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6. Vik DR, Roux S, Brum JR. et al. Putative archaeal viruses from the mesopelagic ocean. PeerJ 2017;5:e3428. 10.7717/peerj.3428 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7. Murray CJL, Ikuta KS, Sharara F. et al. Global burden of bacterial antimicrobial resistance in 2019: a systematic analysis. Lancet 2022;399:629–55. 10.1016/S0140-6736(21)02724-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8. Gould SJ. The evolution of life on the earth. Sci Am 1994;271:84–91. 10.1038/scientificamerican1094-84 [DOI] [PubMed] [Google Scholar]
- 9. Wilhelm SW, Suttle CA. Viruses and nutrient cycles in the sea - viruses play critical roles in the structure and function of aquatic food webs. Bioscience 1999;49:781–8. 10.2307/1313569 [DOI] [Google Scholar]
- 10. Wendling CC. Prophage mediated control of higher order interactions - insights from multi-level approaches. Curr Opin Syst Biol 2023;35:100469. 10.1016/j.coisb.2023.100469 [DOI] [Google Scholar]
- 11. Andrade-Martínez JS, Valera LCC, Cárdenas LAC. et al. Computational tools for the analysis of uncultivated phage genomes. Microbiol Mol Biol Rev 2022;86:e00004–21. 10.1128/mmbr.00004-21 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12. Dion MB, Oechslin F, Moineau S. Phage diversity, genomics and phylogeny. Nat Rev Microbiol 2020;18:125–38. 10.1038/s41579-019-0311-5 [DOI] [PubMed] [Google Scholar]
- 13. Casjens S, Palmer N, van Vugt R. et al. A bacterial genome in flux: the twelve linear and nine circular extrachromosomal DNAs in an infectious isolate of the Lyme disease spirochete Borrelia burgdorferi. Mol Microbiol 2000;35:490–516. 10.1046/j.1365-2958.2000.01698.x [DOI] [PubMed] [Google Scholar]
- 14. Asadulghani M, Ogura Y, Ooka T. et al. The defective prophage pool of Escherichia coli O157: prophage-prophage interactions potentiate horizontal transfer of virulence determinants. PLoS Pathog 2009;5:e1000408. 10.1371/journal.ppat.1000408 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15. Lawrence JG, Hendrix RW, Casjens S. Where are the pseudogenes in bacterial genomes? Trends Microbiol 2001;9:535–40. 10.1016/S0966-842X(01)02198-9 [DOI] [PubMed] [Google Scholar]
- 16. Sicard A, Michalakis Y, Gutierrez S. et al. The strange lifestyle of multipartite viruses. PLoS Pathog 2016;12:e1005819. 10.1371/journal.ppat.1005819 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Turgeon N, Toulouse MJ, Martel B et al. Comparison of five bacteriophages as models for viral aerosol studies. Appl Environ Microb 2014;80:4242-50. 10.1128/AEM.00767-14 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18. Roux S, Krupovic M, Daly RA. et al. Cryptic inoviruses revealed as pervasive in bacteria and archaea across Earth’s biomes. Nat Microbiol 2019;4:1895–906. 10.1038/s41564-019-0510-x [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19. Arndt D, Marcu A, Liang YJ. et al. PHAST, PHASTER and PHASTEST: tools for finding prophage in bacterial genomes. Brief Bioinform 2019;20:1560–7. 10.1093/bib/bbx121 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20. Casjens S. Prophages and bacterial genomics: what have we learned so far? Mol Microbiol 2003;49:277–300. 10.1046/j.1365-2958.2003.03580.x [DOI] [PubMed] [Google Scholar]
- 21. Karlin S. Global dinucleotide signatures and analysis of genomic heterogeneity. Curr Opin Microbiol 1998;1:598–610. 10.1016/S1369-5274(98)80095-7 [DOI] [PubMed] [Google Scholar]
- 22. Srividhya KV, Alaguraj V, Poornima G. et al. Identification of prophages in bacterial genomes by dinucleotide relative abundance difference. PloS One 2007;2:e1193. 10.1371/journal.pone.0001193 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23. Nicolas P, Bize L, Muri F. et al. Mining Bacillus subtilis chromosome heterogeneities using hidden Markov models. Nucleic Acids Res 2002;3:1418–26. 10.1093/nar/30.6.1418 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24. Fouts DE. Phage_Finder: automated identification and classification of prophage regions in complete bacterial genome sequences. Nucleic Acids Res 2006;34:5839–51. 10.1093/nar/gkl732 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25. Bose M, Barber RD. Prophage finder: a prophage loci prediction tool for prokaryotic genome sequences. In Silico Biol 2006;6:223–7. 10.3233/ISB-00235 [DOI] [PubMed] [Google Scholar]
- 26. Lima-Mendez G, Van Helden J, Toussaint A. et al. Prophinder: a computational tool for prophage prediction in prokaryotic genomes. Bioinformatics 2008;24:863–5. 10.1093/bioinformatics/btn043 [DOI] [PubMed] [Google Scholar]
- 27. Arndt D, Grant JR, Marcu A. et al. PHASTER: a better, faster version of the PHAST phage search tool. Nucleic Acids Res 2016;44:W16–21. 10.1093/nar/gkw387 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28. Zhou Y, Liang Y, Lynch KH. et al. PHAST: a fast phage search tool. Nucleic Acids Res 2011;39:W347–52. 10.1093/nar/gkr485 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29. Akhter S, Aziz RK, Edwards RA. PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucleic Acids Res 2012;40:e126. 10.1093/nar/gks406 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30. Roux S, Enault F, Hurwitz BL. et al. VirSorter: mining viral signal from microbial genomic data. PeerJ 2015;3:e985. 10.7717/peerj.985 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31. Jurtz VI, Villarroel J, Lund O. et al. MetaPhinder-identifying bacteriophage sequences in metagenomic data sets. PloS One 2016;11:e0163111. 10.1371/journal.pone.0163111 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32. Kieft K, Zhou Z, Anantharaman K. VIBRANT: automated recovery, annotation and curation of microbial viruses, and evaluation of viral community function from genomic sequences. Microbiome 2020;8:90. 10.1186/s40168-020-00867-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33. Wood DE, Salzberg SL. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol 2014;15:R46. 10.1186/gb-2014-15-3-r46 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34. Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol 2019;20:257. 10.1186/s13059-019-1891-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35. Ren J, Ahlgren NA, Lu YY. et al. VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data. Microbiome 2017;5:69. 10.1186/s40168-017-0283-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36. Ren J, Song K, Deng C. et al. Identifying viruses from metagenomic data using deep learning. Quant Biol 2020;8:64–77. 10.1007/s40484-019-0187-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37. Amgarten D, Braga LPP, da Silva AM. et al. MARVEL, a tool for prediction of bacteriophage sequences in metagenomic bins. Front Genet 2018;9:304. 10.3389/fgene.2018.00304 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38. Zheng T, Li J, Ni Y. et al. Mining, analyzing, and integrating viral signals from metagenomic data. Microbiome 2019;7:42. 10.1186/s40168-019-0657-y [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39. Fang Z, Tan J, Wu S. et al. PPR-meta: a tool for identifying phages and plasmids from metagenomic fragments using deep learning. Gigascience 2019;8:8. 10.1093/gigascience/giz066 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40. Auslander N, Gussow AB, Benler S. et al. Seeker: alignment-free identification of bacteriophage genomes by deep learning. Nucleic Acids Res 2020;48:e121. 10.1093/nar/gkaa856 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41. Miao Y, Liu F, Hou T. et al. Virtifier: a deep learning-based identifier for viral sequences from metagenomes. Bioinformatics 2022;38:1216–22. 10.1093/bioinformatics/btab845 [DOI] [PubMed] [Google Scholar]
- 42. Guo J, Bolduc B, Zayed AA. et al. VirSorter2: a multi-classifier, expert-guided approach to detect diverse DNA and RNA viruses. Microbiome 2021;9:37. 10.1186/s40168-020-00990-y [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43. Shang JY, Peng C, Liao HR. et al. PhaBOX: a web server for identifying and characterizing phage contigs in metagenomic data. Bioinform Adv 2023;3:3. 10.1093/bioadv/vbad101 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44. Nayfach S, Camargo AP, Schulz F. et al. CheckV assesses the quality and completeness of metagenome-assembled viral genomes. Nat Biotechnol 2021;39:578–85. 10.1038/s41587-020-00774-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45. Zhou ZC, Martin C, Kosmopoulos JC. et al. ViWrap: a modular pipeline to identify, bin, classify, and predict viral-host relationships for viruses from metagenomes. Imeta 2023;2:2. 10.1002/imt2.118 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46. Hegarty B, Riddell J, Bastien E. et al. Benchmarking informatics approaches for virus discovery: caution is needed when combining identification methods. mSystems 2024;9:9. 10.1128/msystems.01105-23 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47. Ho SFS, Wheeler NE, Millard AD. et al. Gauge your phage: benchmarking of bacteriophage identification tools in metagenomic sequencing data. Microbiome 2023;11:84. 10.1186/s40168-023-01533-x [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48. Wu LY, Wijesekara Y, Piedade GJ. et al. Benchmarking bioinformatic virus identification tools using real-world metagenomic data across biomes. Genome Biol 2024;25:97. 10.1186/s13059-024-03236-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49. Uritskiy GV, DiRuggiero J, Taylor J. MetaWRAP-a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome 2018;6:158. 10.1186/s40168-018-0541-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50. Shang J, Tang X, Guo R. et al. Accurate identification of bacteriophages from metagenomic data using transformer. Brief Bioinform 2022;23:bbac258. 10.1093/bib/bbac258 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51. Roach MJ, McNair K, Michalczyk M. et al. Philympics 2021: prophage predictions perplex programs. F1000Research 2022;10:758. 10.12688/f1000research.54449.2 [DOI] [Google Scholar]
- 52. Wishart DS, Han S, Saha S. et al. PHASTEST: faster than PHASTER, better than PHAST. Nucleic Acids Res 2023;51:W443–50. 10.1093/nar/gkad382 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53. McNair K, Decewicz P, Daniel S. et al. PhiSpy (version 3.4.5). https://github.com/linsalrob/PhiSpy.
- 54. Starikova EV, Tikhonova PO, Prianichnikov NA. et al. Phigaro: high-throughput prophage sequence annotation. Bioinformatics 2020;36:3882–4. 10.1093/bioinformatics/btaa250 [DOI] [PubMed] [Google Scholar]
- 55. Reis-Cunha JL, Bartholomeu DC, Manson AL. et al. ProphET, prophage estimation tool: a stand-alone prophage sequence prediction tool with self-updating reference database. PloS One 2019;14:e0223364. 10.1371/journal.pone.0223364 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56. Sirén K, Millard A, Petersen B. et al. Rapid discovery of novel prophages using biological feature engineering and machine learning. NAR Genom Bioinform 2021;3:lqaa109. 10.1093/nargab/lqaa109 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57. Gan R, Zhou F, Si Y. et al. DBSCAN-SWA: an integrated tool for rapid prophage detection and annotation. Front Genet 2022;13:885048. 10.3389/fgene.2022.885048 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58. Gauthier CH, Abad L, Venbakkam AK. et al. DEPhT: a novel approach for efficient prophage discovery and precise extraction. Nucleic Acids Res 2022;50:e75. 10.1093/nar/gkac273 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59. Roux S, Krupovic M, Daly RA. et al. Author correction: cryptic inoviruses revealed as pervasive in bacteria and archaea across Earth’s biomes. Nat Microbiol 2020;5:527. 10.1038/s41564-020-0681-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60. Roux S. Inovirus. https://github.com/simroux/Inovirus [accessed 25 August 2025].
- 61. Delcher AL, Bratke KA, Powers EC. et al. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 2007;23:673–9. 10.1093/bioinformatics/btm009
- 62. Lomsadze A, Gemayel K, Tang S. et al. Modeling leaderless transcription and atypical genes results in more accurate gene prediction in prokaryotes. Genome Res 2018;28:1079–89. 10.1101/gr.230615.117
- 63. Hyatt D, Chen GL, Locascio PF. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 2010;11:119. 10.1186/1471-2105-11-119
- 64. Cahill J, Rajaure M, O’Leary C. et al. Genetic analysis of the lambda Spanins Rz and Rz1: identification of functional domains. G3 (Bethesda) 2017;7:741–53. 10.1534/g3.116.037192
- 65. Kang CH, Shin Y, Jang S. et al. Characterization of Vibrio parahaemolyticus isolated from oysters in Korea: resistance to various antibiotics and prevalence of virulence genes. Mar Pollut Bull 2017;118:261–6. 10.1016/j.marpolbul.2017.02.070
- 66. Kang HS, McNair K, Cuevas DA. et al. Prophage genomics reveals patterns in phage genome organization and replication. bioRxiv 2017. 10.1101/114819 preprint: not peer reviewed
- 67. McNair K, Zhou C, Dinsdale EA. et al. PHANOTATE: a novel approach to gene identification in phage genomes. Bioinformatics 2019;35:4537–42. 10.1093/bioinformatics/btz265
- 68. Lazeroff M, Ryder G, Harris SL. et al. Phage Commander, an application for rapid gene identification in bacteriophage genomes using multiple programs. Phage (New Rochelle) 2021;2:204–13. 10.1089/phage.2020.0044
- 69. Ecale Zhou CL, Kimbrel J, Edwards R. et al. MultiPhATE2: code for functional annotation and comparison of phage genomes. G3 (Bethesda) 2021;11:jkab074. 10.1093/g3journal/jkab074
- 70. Bouras G, Nepal R, Houtak G. et al. Pharokka: a fast scalable bacteriophage annotation tool. Bioinformatics 2023;39:btac776. 10.1093/bioinformatics/btac776
- 71. Terzian P, Olo Ndela E, Galiez C. et al. PHROG: families of prokaryotic virus proteins clustered using remote homology. NAR Genom Bioinform 2021;3:lqab067. 10.1093/nargab/lqab067
- 72. Cantu VA, Salamon P, Seguritan V. et al. PhANNs, a fast and accurate tool and web server to classify phage structural proteins. PLoS Comput Biol 2020;16:e1007845. 10.1371/journal.pcbi.1007845
- 73. Sommer MJ, Salzberg SL. Balrog: a universal protein model for prokaryotic gene prediction. PLoS Comput Biol 2021;17:e1008727. 10.1371/journal.pcbi.1008727
- 74. Turner D, Shkoporov AN, Lood C. et al. Abolishment of morphology-based taxa and change to binomial species names: 2022 taxonomy update of the ICTV bacterial viruses subcommittee. Arch Virol 2023;168:74. 10.1007/s00705-022-05694-2
- 75. Meier-Kolthoff JP, Goker M. VICTOR: genome-based phylogeny and classification of prokaryotic viruses. Bioinformatics 2017;33:3396–404. 10.1093/bioinformatics/btx440
- 76. Nishimura Y, Yoshida T, Kuronishi M. et al. ViPTree: the viral proteomic tree server. Bioinformatics 2017;33:2379–80. 10.1093/bioinformatics/btx157
- 77. Bin Jang H, Bolduc B, Zablocki O. et al. Taxonomic assignment of uncultivated prokaryotic virus genomes is enabled by gene-sharing networks. Nat Biotechnol 2019;37:632–9. 10.1038/s41587-019-0100-8
- 78. Aiewsakun P, Simmonds P. The genomic underpinnings of eukaryotic virus taxonomy: creating a sequence-based framework for family-level virus classification. Microbiome 2018;6:38. 10.1186/s40168-018-0422-7
- 79. Moraru C. VirClust-a tool for hierarchical clustering, core protein detection and annotation of (prokaryotic) viruses. Viruses 2023;15:1007. 10.3390/v15041007
- 80. Kupczok A, Bailey ZM, Refardt D. et al. Co-transfer of functionally interdependent genes contributes to genome mosaicism in lambdoid phages. Microb Genom 2022;8:mgen000915. 10.1099/mgen.0.000915
- 81. Moraru C, Varsani A, Kropinski AM. VIRIDIC-a novel tool to calculate the intergenomic similarities of prokaryote-infecting viruses. Viruses 2020;12:1268. 10.3390/v12111268
- 82. Chibani CM, Farr A, Klama S. et al. Classifying the unclassified: a phage classification method. Viruses 2019;11:195. 10.3390/v11020195
- 83. Chibani CM, Meinecke F, Farr A. et al. ClassiPhages 2.0: sequence-based classification of phages using artificial neural networks. bioRxiv 2019. 10.1101/558171 preprint: not peer reviewed
- 84. Jain C, Rodriguez-R LM, Phillippy AM. et al. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun 2018;9:5114. 10.1038/s41467-018-07641-9
- 85. de Sousa JAM, Fillol-Salom A, Penades JR. et al. Identification and characterization of thousands of bacteriophage satellites across bacteria. Nucleic Acids Res 2023;51:2759–77. 10.1093/nar/gkad123
- 86. Rajaei P, Jahanian KH, Beheshti A. et al. VIRMOTIF: a user-friendly tool for viral sequence analysis. Genes (Basel) 2021;12:186. 10.3390/genes12020186
- 87. Roux S, Krupovic M, Debroas D. et al. Assessment of viral community functional potential from viral metagenomes may be hampered by contamination with cellular sequences. Open Biol 2013;3:130160. 10.1098/rsob.130160
- 88. Raiko M. viralComplete: BLAST-based viral completeness verification. https://github.com/ablab/viralComplete [accessed 25 August 2025].
- 89. Mallawaarachchi V, Roach MJ, Decewicz P. et al. Phables: from fragmented assemblies to high-quality bacteriophage genomes. Bioinformatics 2023;39:btad586. 10.1093/bioinformatics/btad586
- 90. McNair K, Bailey BA, Edwards RA. PHACTS, a computational approach to classifying the lifestyle of phages. Bioinformatics 2012;28:614–8. 10.1093/bioinformatics/bts014
- 91. Hockenberry AJ, Wilke CO. BACPHLIP: predicting bacteriophage lifestyle from conserved protein domains. PeerJ 2021;9:e11396. 10.7717/peerj.11396
- 92. Shang J, Tang X, Sun Y. PhaTYP: predicting the lifestyle for bacteriophages using BERT. Brief Bioinform 2023;24:bbac487. 10.1093/bib/bbac487
- 93. Wu S, Fang Z, Tan J. et al. DeePhage: distinguishing virulent and temperate phage-derived sequences in metavirome data with a deep learning approach. Gigascience 2021;10:giab056. 10.1093/gigascience/giab056
- 94. Song K. PhagePred. https://github.com/songkai1987/PhagePred [accessed 25 August 2025].
- 95. Garneau JR, Depardieu F, Fortier LC. et al. PhageTerm: a tool for fast and accurate determination of phage termini and packaging mechanism using next-generation sequencing data. Sci Rep 2017;7:8292. 10.1038/s41598-017-07910-5
- 96. Urban M, Cuzick A, Seager J. et al. PHI-base in 2022: a multi-species phenotype database for pathogen-host interactions. Nucleic Acids Res 2022;50:D837–47. 10.1093/nar/gkab1037
- 97. Lamy-Besnier Q, Brancotte B, Menager H. et al. Viral host range database, an online tool for recording, analyzing and disseminating virus-host interactions. Bioinformatics 2021;37:2798–801. 10.1093/bioinformatics/btab070
- 98. Gao NL, Zhang C, Zhang Z. et al. MVP: a microbe-phage interaction database. Nucleic Acids Res 2018;46:D700–7. 10.1093/nar/gkx1124
- 99. Zhou F, Gan R, Zhang F. et al. PHISDetector: a tool to detect diverse in silico phage-host interaction signals for virome studies. Genom Proteom Bioinform 2022;20:508–23. 10.1016/j.gpb.2022.02.003
- 100. Wang W, Ren J, Tang K. et al. A network-based integrated framework for predicting virus-prokaryote interactions. NAR Genom Bioinform 2020;2:lqaa044. 10.1093/nargab/lqaa044
- 101. Roux S, Camargo AP, Coutinho FH. et al. iPHoP: an integrated machine learning framework to maximize host prediction for metagenome-derived viruses of archaea and bacteria. PLoS Biol 2023;21:e3002083. 10.1371/journal.pbio.3002083
- 102. Coutinho FH, Zaragoza-Solas A, Lopez-Perez M. et al. RaFAH: host prediction for viruses of bacteria and archaea based on protein content. Patterns (N Y) 2021;2:100274. 10.1016/j.patter.2021.100274
- 103. Zhang R, Mirdita M, Levy Karin E. et al. SpacePHARER: sensitive identification of phages from CRISPR spacers in prokaryotic hosts. Bioinformatics 2021;37:3364–6. 10.1093/bioinformatics/btab222
- 104. Galiez C, Siebert M, Enault F. et al. WIsH: who is the host? Predicting prokaryotic hosts from metagenomic phage contigs. Bioinformatics 2017;33:3113–4. 10.1093/bioinformatics/btx383
- 105. Zielezinski A, Deorowicz S, Gudys A. PHIST: fast and accurate prediction of prokaryotic hosts from metagenomic viral sequences. Bioinformatics 2022;38:1447–9. 10.1093/bioinformatics/btab837
- 106. Ruohan W, Xianglilan Z, Jianping W. et al. DeepHost: phage host prediction with convolutional neural network. Brief Bioinform 2022;23:bbab385. 10.1093/bib/bbab385
- 107. Villarroel J, Kleinheinz KA, Jurtz VI. et al. HostPhinder: a phage host prediction tool. Viruses 2016;8:116. 10.3390/v8050116
- 108. Lu CY, Zhang Z, Cai ZN. et al. Prokaryotic virus host predictor: a Gaussian model for host prediction of prokaryotic viruses in metagenomics. BMC Biol 2021;19:5. 10.1186/s12915-020-00938-6
- 109. Tan J, Fang Z, Wu S. et al. HoPhage: an ab initio tool for identifying hosts of phage fragments from metaviromes. Bioinformatics 2022;38:543–5. 10.1093/bioinformatics/btab585
- 110. Shang J, Sun Y. CHERRY: a computational metHod for accuratE pRediction of virus-pRokarYotic interactions using a graph encoder-decoder model. Brief Bioinform 2022;23:bbac182. 10.1093/bib/bbac182
- 111. Pandolfo M, Telatin A, Lazzari G. et al. MetaPhage: an automated pipeline for analyzing, annotating, and classifying bacteriophages in metagenomics sequencing data. mSystems 2022;7:e0074122. 10.1128/msystems.00741-22
- 112. Marquet M, Holzer M, Pletz MW. et al. What the phage: a scalable workflow for the identification and analysis of phage sequences. Gigascience 2022;11:giac110. 10.1093/gigascience/giac110
- 113. Shang JY, Jiang JZ, Sun YN. Bacteriophage classification for assembled contigs using graph convolutional network. Bioinformatics 2021;37:I25–33. 10.1093/bioinformatics/btab293
- 114. Roux S, Adriaenssens EM, Dutilh BE. et al. Minimum information about an uncultivated virus genome (MIUViG). Nat Biotechnol 2019;37:29–37. 10.1038/nbt.4306
- 115. Gan R, Zhou F, Si Y. et al. DBSCAN-SWA. https://github.com/HIT-ImmunologyLab/DBSCAN-SWA
- 116. Gauthier CH, Abad L, Venbakkam AK. et al. Detection and Extraction of Phages Tool (DEPhT). https://github.com/chg60/DEPhT [accessed 25 August 2025].
- 117. Garneau JR, Depardieu F, Fortier LC. et al. PhageTerm: Fork of the Code That Is Available via Sourceforge. https://github.com/avilella/phageterm [accessed 25 August 2025].
- 118. Wishart DS, Han S, Saha S. et al. PHASTEST: faster than PHASTER, better than PHAST. https://phastest.ca [accessed 25 August 2025].
- 119. Reis-Cunha JL, Bartholomeu DC, Earl AM. et al. ProphET. https://github.com/jaumlrc/ProphET [accessed 25 August 2025].
- 120. Ren J, Song K, Deng C. et al. DeepVirFinder. https://github.com/jessieren/DeepVirFinder [accessed 25 August 2025].
- 121. Wood D. Kraken2. https://github.com/DerrickWood/kraken2 [accessed 25 August 2025].
- 122. Amgarten DE. MARVEL. https://github.com/LaboratorioBioinformatica/MARVEL [accessed 25 August 2025].
- 123. Jurtz V. MetaPhinder. https://github.com/vanessajurtz/MetaPhinder [accessed 25 August 2025].
- 124. Fang Z, Tan J, Wu S. et al. PPR-Meta. https://github.com/zhenchengfang/PPR-Meta [accessed 25 August 2025].
- 125. Uritskiy GV, DiRuggiero J, Taylor J. viralVerify 2018. https://github.com/ablab/viralVerify [accessed 25 August 2025].
- 126. Ren J, Ahlgren N, Lu Y. et al. VirFinder. https://github.com/jessieren/VirFinder [accessed 25 August 2025].
- 127. Zheng T, Li J, Ni Y. et al. VirMiner. A Web-server for Mining Viral Signals in Metagenomic Data. http://sbb.hku.hk/VirMiner/ [accessed 13 August 2020].
- 128. Zheng T, Li J, Ni Y. et al. VirMiner. https://github.com/TingtZHENG/VirMiner [accessed 25 August 2025].
- 129. Miao Y, Liu F, Liu Y. Seq2Vec (Virtifier). https://github.com/crazyinter/Seq2Vec [accessed 25 August 2025].
- 130. Zhou ZC, Martin C, Kosmopoulos JC. et al. ViWrap: A Modular Pipeline to Identify, Bin, Classify, and Predict Viral-host Relationships for Viruses from Metagenomes. https://github.com/AnantharamanLab/ViWrap [accessed 25 August 2025].
- 131. Marquet M, Holzer M, Pletz MW. et al. What the Phage (WtP): Phage Identification via Nextflow and Docker or Singularity. https://github.com/replikation/What_the_Phage [accessed 25 August 2025].
- 132. Shang JY, Peng C, Liao HR. et al. PhaBOX: A Web Server for Identifying and Characterizing Phage Contigs in Metagenomic Data. https://phage.ee.cityu.edu.hk [accessed 25 August 2025].
- 133. Shang JY, Peng C, Liao HR. et al. PhaBOX: Local Version of the Virus Identification and Analysis Web Server (Tool Set). https://github.com/KennthShang/PhaBOX [accessed 25 August 2025].
- 134. Sirén K, Sicheritz-Pontén T. PhageBoost - Web. https://phageboost.ku.dk [accessed 25 August 2025].
- 135. Sirén K, Millard A, Petersen B. et al. PhageBoost - GitHub. https://github.com/ku-cbd/PhageBoost [accessed 25 August 2025].
- 136. Shang J, Tang X, Guo R. et al. PhaMer - GitHub. https://github.com/KennthShang/PhaMer [accessed 25 August 2025].
- 137. Starikova EV, Tikhonova PO, Prianichnikov NA. et al. Phigaro Is a Scalable Command-line Tool for Predicting Phages and Prophages. https://github.com/bobeobibo/phigaro [accessed 25 August 2025].
- 138. Auslander N, Gussow AB, Benler S. et al. Seeker. https://github.com/gussow/seeker [accessed 25 August 2025].
- 139. Kieft K. VIBRANT: GitHub. https://github.com/AnantharamanLab/VIBRANT [accessed 25 August 2025].
- 140. Roux S. VirSorter. https://github.com/simroux/VirSorter [accessed 25 August 2025].
- 141. Guo J, Bolduc B, Zayed AA. et al. VirSorter2. https://github.com/jiarong/VirSorter2 [accessed 25 August 2025].
Data Availability Statement
No new data were generated or analysed in support of this research.

