Briefings in Bioinformatics. 2025 Sep 3;26(5):bbaf449. doi: 10.1093/bib/bbaf449

Phage quest: a beginner’s guide to explore viral diversity in the prokaryotic world

Carolin Charlotte Wendling 1,2, Marie Vasse 3,4, Sébastien Wielgoss 5
PMCID: PMC12406692  PMID: 40900113

Abstract

The increasing interest in finding new viruses within (meta)genomic datasets has fueled the development of computational tools for virus detection and characterization from environmental samples. One key driver is phage therapy, the treatment of drug-resistant bacteria with tailored bacteriophage cocktails. Yet, keeping up with the growing number of automated virus detection and analysis tools has become increasingly difficult. Both phage biologists with limited bioinformatics expertise and bioinformaticians with little background in virus biology will benefit from this guide. It focuses on navigating routine tasks and tools related to (pro)phage detection, gene annotation, taxonomic classification, and other downstream analyses. We give a brief historical overview of how detection methods evolved, starting with early sequence-composition assessments to today’s powerful machine-learning and deep learning techniques, including emerging language models capable of mining large, fragmented, and compositionally diverse metagenomic datasets. We also discuss tools specifically aimed at detecting filamentous phages (Inoviridae), a challenge for most phage predictors. Rather than providing an exhaustive list, we emphasize actively maintained and state-of-the-art tools that are accessible via web or command-line interfaces. This guide provides basic concepts and useful details about automated phage analysis for researchers in different biological and medical disciplines, helping them choose and apply appropriate tools for their quest to explore the genetic diversity and biology of the smallest and most abundant replicators on Earth.

Keywords: bacteriophages, prophages, metagenomics, microbial bioinformatics, gene annotation, phage prediction

Introduction

The growing interest in discovering new viruses in (meta)genomic datasets has led to a rapid increase in newly developed computational tools for virus detection and characterization from environmental samples [1–6]. This interest is also sparked by the potential of phage therapy, the application of phages to treat bacterial infections, especially those involving drug-resistant bacteria [7]. However, the surge of interest in bacteriophages (phages) extends beyond the promise of medical applications. It rests on the recognition that our planet is a bacterial world [8], in which phages are the most abundant replicators [9], shape the ecological and evolutionary dynamics of microbial communities, and have cascading effects on plants, animals, and entire ecosystems [10]. As a result of ever-decreasing sequencing costs, researchers from diverse fields, including microbiology, medicine, ecology, and evolution, have started to explore and identify phages either in their own (meta)genomic datasets or in publicly available databases. Given the rapid pace at which new virus analysis tools have emerged in recent years (Fig. 1), it has become increasingly difficult for researchers to select the most appropriate approaches for their questions. Acknowledging these dynamics, we provide a comprehensive guide that equips researchers with the knowledge needed to detect and describe bacterial viruses in genomic and metagenomic datasets, enabling an easy entry into this rapidly evolving research field. While recent reviews were aimed at technical users in metagenomics [11], this guide is aimed at both wet-lab phage biologists with limited bioinformatics expertise and bioinformaticians with little background in virus biology.

Figure 1.

Chart depicting the cumulative count of published bioinformatic tools cited in this review covering the past 25 years.

Surging interest in computational phage research. The chart depicts the cumulative count of all bioinformatic tools referenced in this review, covering methods of phage detection, annotation, taxonomic classification, host prediction, life cycle inference, and genome quality assessment.

We begin with key concepts in bacteriophage biology and then briefly introduce the computational principles behind phage detection. We then provide a historical account of the evolution of (pro)phage detection tools and highlight modern state-of-the-art algorithmic approaches. After that, we transition to the core part of our review, a step-by-step guide comprising four parts. In these steps, we cover popular and well-maintained tools without claiming to be exhaustive and include methods that support the detection of filamentous phages (Inoviridae), an oft-neglected group that includes the important representative phage M13. Data processing, genome assembly, phylogenetics, and comparative genomics are only briefly addressed, as they fall outside the scope of this review.

Key concepts in bacteriophage biology

Phages display remarkable diversity in genome structure, morphology, and life cycle strategies [12]. Their genomes are encoded as either single- or double-stranded DNA or RNA and are often enclosed in protein shells, either spherical capsids or filamentous coats. Beyond morphology, phages have evolved different life cycles (Fig. 2): virulent phages kill their infected hosts via the lytic cycle, ensuring rapid horizontal transmission. Temperate and filamentous phages establish long-term associations with their bacterial hosts, ensuring vertical transmission through host replication. While filamentous phages typically persist extrachromosomally without causing lysis, temperate phages insert their genetic material into the host genome; the host carrying the integrated prophage is called a lysogen. Insertion happens either at specific attachment sites via integrases (phage lambda) or at random via transposases (phage Mu). Integrated prophages can exit the host genome spontaneously or in response to stress via a molecular switch, replicate, and subsequently lyse the host. Prophages are widely found in bacterial genomes and can constitute up to a fifth of the host genome [13]; e.g. Escherichia coli O157:H7 strain Sakai harbors 18 prophages [14]. Following the acquisition of deleterious mutations, integrated phages sometimes lose their ability to switch to the lytic cycle or to produce viable viral particles [15]. Finally, more complex, multipartite viruses exist whose genomes are distributed across different genomic segments, each encapsulated in separate particles [16]. One example is RNA-phage phi-6, which infects Pseudomonas phaseolicola [17]. Such a complicated genome organization represents an important challenge for phage prediction tools.

Figure 2.

Graphical illustration of both lytic and lysogenic bacteriophage life cycles.

Illustration of lytic and lysogenic bacteriophage life cycles. In both cycles, the phage binds to the bacterial cell (1) and the phage’s genetic material then enters the host cell (2). During the lytic cycle, the phage multiplies (3a) and releases mature viruses through host cell lysis (4a). The lysogenic cycle is characterized by phage genome integration into the bacterial genome (prophage formation, 3b), vertical inheritance through host replication (4b), and occasional phage genome excision (5) before entering the lytic cycle (3a-4a). Created in BioRender. Wielgoss, S. (2025) https://BioRender.com/4si2h1x

Principles of automated phage prediction

Viral signals can be successfully detected with different computational approaches (Table 1). These approaches differ primarily in how much they rely on similarity to known viral sequences.

Table 1.

Summary of computational approaches to detect viral signals in sequence data

Approach | Basis | Methods | Advantages | Limitations
Sequence similarity (SeqSim) | Infers homology from local sequence alignment | BLAST-based searches, HMM profiles from phage gene databases (e.g. pVOGs) | High accuracy when close reference sequences exist | Poor detection of novel or divergent phages; dependent on reference database completeness
Hybrid | Combines similarity-based and homology-independent genomic features | Sequence similarity and agnostic features (like GC/AT skew, gene density, transcription direction, and tRNA presence) | Improved accuracy and flexibility; detects fragmented or novel phages | Computationally more complex; requires integration of diverse signals
k-mer-based | Identifies composition patterns using short k-mer frequencies | k-mer frequency profiling, composition clustering | Alignment-free; efficient detection of rearranged or unknown sequences | Sensitive to k-mer size and sequence quality; still somewhat reference-biased
Machine learning and deep learning (ML/DL) | Learns patterns from data using statistical models | RFs, SVMs, CNNs, and LSTMs; often use k-mer or protein features | Can detect novel viruses by learning complex, non-obvious patterns | Requires large, high-quality training data; model tuning and validation are non-trivial

  • Sequence-similarity-based approaches identify viral regions by homology to known phage proteins, e.g. via BLAST or hidden Markov model (HMM) profiles from databases such as the prokaryotic virus orthologous groups (pVOGs) database. These tools often use sliding windows to detect phage-like regions enriched for phage genes. They depend strongly on database completeness and can miss divergent or unknown phages.

  • Hybrid approaches integrate classical sequence-similarity-based methods with sequence-agnostic approaches. The latter rely on homology-independent features, e.g. GC/AT skew, transcription directionality, gene length, or tRNA occurrence, to achieve higher accuracy and flexibility. Many recent hybrid tools incorporate machine or deep learning (ML/DL) to enhance the detection of fragmented and novel genomes.

  • K-mer-based methods classify sequences using the frequency of short nucleotide genomic substrings of length k (k-mers), equivalent to “DNA words”. This allows detection of viruses with limited similarity to known phages. These methods are alignment-free and can handle genome rearrangements but are sensitive to different k-mer sizes and input quality.

  • ML and DL approaches apply data-driven models to detect unknown or less well-characterized viruses, including ssDNA viruses [18]. ML/DL models learn complex patterns that distinguish viral from microbial and plasmid sequences. Common ML models are random forests (RFs), ensemble learning methods that build multiple decision trees and combine their output for predictions, and support vector machines (SVMs), which identify the optimal boundaries between different groups in the data. Widely used DL models are convolutional neural networks (CNNs), which excel at identifying local data patterns (including k-mer frequencies), and long short-term memory (LSTM) networks, which specialize in capturing relationships in sequential data (including DNA and RNA). Some tools combine several ML/DL approaches and may integrate other approaches (hybrid or k-mer-based).
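To make the notion of "DNA words" concrete, the following minimal Python sketch computes a relative k-mer frequency profile for a single sequence. It is an illustration only and does not reproduce the implementation of any tool discussed here; the function name kmer_profile, the default k = 4, and the choice to skip windows containing ambiguous bases are assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_profile(sequence: str, k: int = 4) -> dict[str, float]:
    """Return relative frequencies of all 4**k DNA k-mers in `sequence`.

    Windows containing characters other than A, C, G, T (e.g. N) are skipped.
    """
    sequence = sequence.upper()
    alphabet = "ACGT"
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if all(base in alphabet for base in kmer):
            counts[kmer] += 1
    total = sum(counts.values())
    # Emit every possible k-mer in a fixed order so that profiles from
    # different sequences can be compared as vectors of equal length.
    all_kmers = ("".join(p) for p in product(alphabet, repeat=k))
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


# Toy example: dinucleotide (k = 2) profile of a 12-bp sequence.
profile = kmer_profile("ATGCGATATGCG", k=2)
```

Profiles of this kind can serve as fixed-length feature vectors for a classifier. The profile grows as 4^k, which is one reason why results are sensitive to the chosen k-mer size.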

The evolution of phage prediction tools

Phage prediction tools emerged around the turn of the millennium to identify prophages in single bacterial genomes, at a time when available sequencing data were still scarce [19]. These early tools exploited simple composition-based signals [20], such as sudden shifts in GC content or dinucleotide relative abundance, to identify candidate prophage regions in host genomes [21–23]. However, their narrow scope made them unreliable for identifying low-abundance and cryptic prophages [19]. In the mid-2000s, a second wave of tools strongly enhanced prediction accuracy by integrating sequence composition analysis with homology-based methods. Phage_Finder [24], Prophage Finder [25], and Prophinder [26] integrated protein homology searches, tRNA detection, phage integration site prediction, and HMMs trained on viral genes. Despite these improved capabilities, their use was hampered by the limited diversity of available viral sequences and the technical expertise required for their implementation [19]. In response, tools such as the PHAST suite [19, 27, 28] implemented user-friendly web servers, which made in silico prophage prediction from closed single genomes accessible to many microbiologists and contributed to the suite’s high popularity. Moreover, PhiSpy [29] introduced hybrid approaches, the combination of sequence-similarity searches with sequence-agnostic features to improve the detection of novel and atypical prophages.
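The composition-based signal exploited by the earliest tools can be illustrated with a short Python sketch that flags sliding windows whose GC content deviates from the genome-wide average. This is a simplified illustration under stated assumptions, not the algorithm of any cited tool; the function names, window size, step size, and the deviation threshold max_delta are all hypothetical choices.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_outlier_windows(genome: str, window: int = 5000, step: int = 1000,
                       max_delta: float = 0.05) -> list[tuple[int, float]]:
    """Return (start, gc) for windows whose GC content differs from the
    genome-wide GC content by more than `max_delta`."""
    genome_gc = gc_fraction(genome)
    flagged = []
    # max(..., 1) ensures that genomes shorter than one window are still scanned once.
    for start in range(0, max(len(genome) - window + 1, 1), step):
        gc = gc_fraction(genome[start:start + window])
        if abs(gc - genome_gc) > max_delta:
            flagged.append((start, round(gc, 3)))
    return flagged


# Toy genome: an AT-rich backbone with a short GC-rich insert at position 100.
toy_genome = "AT" * 50 + "GC" * 10 + "AT" * 50
candidates = gc_outlier_windows(toy_genome, window=20, step=20, max_delta=0.5)
```

On real genomes, such a scan also flags other horizontally acquired regions and misses prophages whose composition resembles that of the host, which mirrors the limitations described above.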

Yet, the advent of metagenomic sequencing completely changed the landscape of virus detection software, opening up unprecedented opportunities for phage discovery while exposing limitations in earlier tools designed for prophage detection in single, complete genomes. Metagenomic datasets are significantly larger and more taxonomically diverse, which requires more scalable virus detection methods that can also handle low-coverage and highly fragmented viral sequences. Early tools such as VirSorter [30] and MetaPhinder [31] extended detection to mixed-community data. VirSorter offered broad coverage with modular outputs but suffered from high false-positive rates [47]. MetaPhinder offered higher precision but was constrained by its reliance on close similarity to known reference genomes, limiting its power to identify novel or mosaic phages [32].

The limitations of those initial tools, either prophage detectors constrained by known genome characteristics or early metagenomic tools restricted by reference similarity, boosted the development of a next generation of virus detection approaches starting from the latter half of the 2010s. These tools introduced conceptually distinct innovations to tackle key challenges:

  • Kraken [33] and Kraken2 [34] forewent alignment altogether and used hash-based k-mer mapping, which improved scalability for fragmented metagenomic data.

  • VirFinder [35] (conventional ML) and DeepVirFinder [36] (CNNs) also used k-mer frequencies but replaced hand-tuned rules with ML and DL, respectively, further boosting sensitivity for novel or uncharacterized phages.

  • MARVEL [37] tackled low-abundance viruses by making predictions from metagenome-assembled genomes (MAGs) using RF classifiers.

  • VirMiner [38] combined ML-based classification with host prediction and gene annotation for deeper ecological insight.

  • PPR-Meta [39], VIBRANT [32], Seeker [40], and Virtifier [41] employed different DL architectures (CNNs or LSTMs) to learn viral signatures from raw or protein-level data, especially suitable for mosaic or rearranged genomes.

  • VirSorter2 [42] distinguishes itself through an ensemble ML approach that combines multiple phage-specific classifiers trained on viral and host genomic features, enabling robust and accurate predictions across a wide range of input types.

  • PhaMer [50], finally, introduces a paradigm shift in phage detection by applying Transformer-based large language models (LLMs) to protein-tokenized phage contigs, enabling the capture of long-range dependencies and hidden sequence patterns characteristic of compositionally atypical or cryptic phages.

PhaMer was integrated into PhaBOX2 [43], a comprehensive and user-friendly pipeline that bridges multiple tools in an end-to-end workflow. The use of such integrated pipelines is a recent trend in the field aimed at fostering reproducibility, scalability, and accessibility. In summary, cutting-edge phage prediction software is increasingly defined by its ability to detect divergent, low-abundance, or structurally complex phages, even from noisy, fragmented metagenomes. This is a significant shift away from earlier static, sequence-similarity-based tools. Today, modern approaches incorporate dynamic, data-driven ML/DL algorithms that have been developed to handle the scale and complexity of large metagenomic datasets.

A detailed step-by-step instruction guide

Our goal is to offer guidance on navigating the expanding landscape of phage analysis tools used for single genomes and metagenomes. To this end, we cover four essential steps (Fig. 3): phage detection, annotation, taxonomic classification, and further downstream analyses (including quality of predicted phages, phage life cycle, and host prediction). We conclude by presenting integrated pipelines that streamline phage analysis by automating most or all of these steps.

Figure 3.

Graphical depiction of the step-by-step workflow instruction in this article.

Workflow for phage detection and analysis. This outline reflects the key steps of phage detection, annotation, classification, and further downstream analyses.

Step 1: Phage detection

General considerations

Most automated phage detection tools covered in this review operate on assembled contigs or genomes, e.g. in FASTA or annotated GenBank (GBK) formats. When working with assembled contigs, small genomic fragments can strongly hamper downstream analyses, including host prediction and viral core gene identification [44]. Thus, it is highly recommended to remove contigs <500 bp [35]. Several tools, including Kraken2 [34], VirMiner [38], and the pipelines PhaBOX2 [43] and ViWrap [45], also accept raw reads (FASTQ files). In particular, VirMiner [38] offers built-in modules for read pre-processing and classification, both of which are recommended. In the following, we present tools for phage discovery from metagenomes, then introduce prophage scanners for single genomes, and conclude with a brief section on filamentous phage detection. All prediction tools have been visually categorized (Fig. 4) and tabulated for data type, user expertise, and computational resource demands (Table 2, categories explained in Table 3).
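Removing short contigs before phage detection takes only a few lines of code. The sketch below uses the Python standard library alone; the helper names read_fasta and filter_contigs are hypothetical, and the 500-bp cutoff is simply the value quoted above and should be adapted to the dataset and the downstream tool.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) tuples from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(handle, min_length=500):
    """Keep only contigs that are at least `min_length` bp long."""
    return [(h, s) for h, s in read_fasta(handle) if len(s) >= min_length]


# Toy input: one 4-bp contig and one 800-bp contig.
example = io.StringIO(">short\nACGT\n>long\n" + "ACGT" * 200 + "\n")
kept = filter_contigs(example, min_length=500)
```

For a real assembly, the in-memory example would be replaced by an opened file, e.g. `with open("contigs.fasta") as handle:` (the file name is a placeholder). Established libraries for sequence parsing are preferable for production use.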

Figure 4.

Graphical grouping of virus prediction tools according to sequence data and required user expertise.

Classification of virus prediction tools by data type and user expertise. Tools are grouped based on their intended input (single genomes versus metagenomes) and anticipated user expertise. Approximate computational requirements are indicated in parentheses (L = low, M = medium, H = high; see also Table 3). Tools applicable to both single genomes and metagenomes are listed in both categories.

Table 2.

Summary table of phage prediction tools.

Tool | Expertise | Data type | Resources | Approach | Use case
DBSCAN-SWA [57] | Expert: CLI [115] | Single genomes | Low | Hybrid | Rapid batch processing and prophage detection
DeepVirFinder [36] | Skilled: CLI [120] | Metagenomes | Medium | k-mer, DL | CNN-based tool for viral sequence detection from metagenomes
DEPhT [58] | Expert: CLI [116] | Single genomes | Medium | Hybrid | Rapid batch processing and prophage detection with boundary detection (focus on Mycobacterium)
Inovirus [18, 59] | Expert: CLI [60] | Single- and metagenomes | Medium | Hybrid, ML | ML predictor for filamentous phages from assembled genomes
Kraken2 [34] | Skilled: CLI [121] | Metagenomes | High | k-mer | Hash-based taxonomic k-mer sequence classification with high resource (RAM) demands
MARVEL [37] | Skilled: CLI [122] | Metagenomes | Medium | Hybrid, ML | RF-based recovery of tailed phage candidates from metagenomic bins; focus on Caudovirales
MetaPhinder [31] | Skilled: CLI [123] | Metagenomes | Medium | SeqSim | Phage identification from metagenomes via BLAST searches against custom phage DB; also detects filamentous phages
PhaBOX2 [43] | Novice: Web [132]; Expert: CLI [133] | Single- and metagenomes | Low, Medium | Hybrid, DL | Integrated workflow for phage identification with lifestyle, host, and taxonomy prediction from contigs with visual outputs
PhageBoost [56] | Novice: Web [134]; Skilled: CLI [135] | Single- and metagenomes | Low, Medium | ML | RF-based prophage detection with read quality control, assembly, and functional annotation
PhageTerm [95] | Skilled: CLI [117] | Single genomes | Medium | Hybrid | Accurate phage termini and packaging inference (requires reads)
PhaMer [50] | Skilled: CLI [136] | Single- and metagenomes | Medium | DL | Deep-language-model-based tool for phage detection from metagenomes
PHASTEST [52] | Novice: Web [118] | Single genomes | Low | Hybrid | Rapid web-based prophage detection and annotation
Phigaro [54] | Skilled: CLI [137] | Single- and metagenomes | Low | Hybrid | Scalable, high-throughput prophage prediction and annotation
PhiSpy [29] | Skilled: CLI [53] | Single genomes | Low | Hybrid, ML | RF-based prophage detection from annotated genomes, with boundary refinement
PPR-Meta [39] | Expert: CLI [124] | Metagenomes | Medium | DL | CNN-based phage and plasmid prediction
ProphET [55] | Skilled: CLI [119] | Single genomes | Medium | SeqSim | Prophage prediction using an auto-updating reference database, is best for known phages
Seeker [40] | Skilled: CLI [138] | Single- and metagenomes | Medium | DL | Alignment-free phage detection based on LSTM-models
VIBRANT [32] | Skilled: CLI [139] | Single- and metagenomes | Medium | Hybrid, DL | Automated DL tool trained on protein signatures for virus detection, annotation, and life cycle prediction
viralVerify [49] | Skilled: CLI [125] | Metagenomes | Medium | ML | Filters viral contigs from metagenomic assemblies; low precision on single-genome prophage scans
VirFinder [35] | Skilled: CLI [126] | Metagenomes | Low | k-mer, ML | Fast alignment-free approach to detect viral sequences in metagenomes; biased to known phages
VirMiner [38] | Novice: Web [127]; Skilled: CLI [128] | Metagenomes | Low, Medium | ML | Highly sensitive RF model for virus and host predictions with functional annotation
VirSorter [30] | Skilled: CLI [140] | Single- and metagenomes | Medium | Hybrid | De novo hybrid virus detection from metagenomes with custom probabilistic models
VirSorter2 [42] | Expert: CLI [141] | Single- and metagenomes | High | Hybrid, ML, DL | Highly modular ML/DL hybrid pipeline to detect DNA and RNA viruses in complex viromes
Virtifier [41] (Seq2Vec) | Skilled: CLI [129] | Metagenomes | Medium | DL | Viral contig identification from metagenomes based on LSTM classifiers; also, for contigs <500bp
ViWrap [45] | Expert: CLI [130] | Metagenomes | High | Hybrid, ML | Modular integrated workflow for phage identification, binning, classification, and host prediction
What the Phage (WtP) [112] | Skilled: CLI [131] | Metagenomes | High | Hybrid, ML, DL | Scalable phage identification and analysis pipeline; includes ML/DL

Table 3.

Explanation guide of expected user expertise and computational resource requirements for different viral detection tools (as referred to in Table 2, Fig. 4).

Level | Definition | Explanations and examples
Required user expertise
Novice | Minimal to basic bioinformatics exposure; intuitive web interface or GUI | Web/GUI: point-and-click usage, sequence selection, or upload
Skilled | Proficient with CLI (command-line interface) | CLI basics in Bash, GitHub, Conda, Python, or R
Expert | Experienced with automation, tool chaining, and high-performance computing (HPC) | HPC usage, Snakemake, Docker, workflow debugging
Required computational resources
Low | Web, standard computer (≤8 GB RAM, 1 CPU) | Ideal for casual, exploratory, or classroom use
Medium | Moderate workstation (8–32 GB RAM, multi-core CPU, moderate storage space) | Suitable for most genome and medium-sized metagenomic datasets
High | Requires server or HPC resources (>32 GB RAM; multiple threads for parallelization; large storage space) | For demanding high-throughput projects and complex workflows

Tool guide for analyzing metagenomes

We begin our tool guide with metagenomic phage detection tools, as this branch has become the fastest-growing field for viral bioinformatics. Tool performance can vary widely with input quality, contig fragmentation, and the viral/bacterial reference databases used [11]. To make informed decisions, users must rely on context-aware benchmarks [46–48]. Thorough benchmarks report the following standard metrics for tool cross-comparison:

  • precision (fraction of predicted viral contigs that are truly viral),

  • recall (fraction of all true viral contigs that are correctly recovered), and

  • the F1 score (the balanced, harmonic mean of precision and recall).
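For readers who want to compute these metrics on their own test data, the following Python sketch derives all three from a set of predicted viral contigs and a set of known viral contigs. The function name and the contig identifiers are hypothetical; the formulas are the standard definitions given above.

```python
def classification_metrics(predicted: set, truth: set) -> dict:
    """Compute precision, recall, and F1 from predicted and true viral contig IDs."""
    tp = len(predicted & truth)   # viral contigs correctly predicted as viral
    fp = len(predicted - truth)   # non-viral contigs wrongly predicted as viral
    fn = len(truth - predicted)   # viral contigs that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy example: 4 predictions, 5 true viral contigs, 3 of them shared.
metrics = classification_metrics(
    predicted={"c1", "c2", "c3", "c4"},
    truth={"c1", "c2", "c3", "c5", "c6"},
)
```

In this toy case, three of four predictions are correct (precision 0.75) and three of five viral contigs are recovered (recall 0.60), giving an F1 score of about 0.67.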

Figure 5 summarizes results from the comprehensive benchmark "Gauge your phage" [47], which evaluated 10 widely used metagenomic virus detection tools on artificial contigs created from RefSeq genomes, previously sequenced mock communities, and randomly shuffled sequences. For RefSeq-derived sequences, the top performers were VIBRANT [32], VirSorter2 [42], and PPR-Meta [39], in that order, with F1 scores higher than 90% (Fig. 5). These skilled-to-expert-level tools have high precision and recall under optimal conditions (non-fragmented contigs up to 15 kbp in length). VirSorter2 is especially well-suited to dealing with intricate viromes; however, its high flexibility comes at the cost of longer runtimes compared to the other metagenomics tools. The less complex VIBRANT had much shorter runtimes and higher precision than VirSorter2 but lower recall. PPR-Meta was even faster than VIBRANT and VirSorter2 owing to its resource-optimized DL classifiers; however, it also produced more false positives than either of these tools.

Figure 5.

Bar graphs showing the three benchmark performance metrics F1, precision and recall of virus prediction tools on standardized metagenomic datasets.

Benchmark performance of virus prediction tools on metagenomic datasets. Bar plots show the F1 score, precision, and recall of 10 metagenomic viral prediction tools, evaluated on either RefSeq-derived sequences (red, upper bar) or synthetic mock community data (blue, lower bar). Each bar represents the average performance for the respective tool and dataset. Tools are sorted by decreasing F1 scores on the RefSeq dataset for clarity. Performance metrics adapted from [47] under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

In contrast, the more complex mock community dataset generally favored k-mer-based tools (Fig. 5). Here, the memory-intensive Kraken2 [34] had, by far, the highest F1 score across tools, driven by both excellent precision and recall. DeepVirFinder [36] ranked second because of its lower precision but offers a highly resource-efficient alternative to Kraken2. These results indicate that k-mer-based methods are especially powerful for detecting phages in highly fragmented, contaminated, or low-abundance metagenomic data, such as environmental samples or ancient DNA. Their alignment-free nature allows for rapid detection of sequence composition patterns, even in the absence of close homologs, making them particularly effective when reference databases are incomplete or when sequence similarity is unreliable [33–36, 47].

Outside the top tier, VirSorter [30] delivered only moderate scores, especially on the more challenging mock-community data, and was clearly outperformed by its successor, VirSorter2. Likewise, VirFinder [35] was clearly outperformed by its successor DeepVirFinder, whose convolutional-network model benefits from a much larger and more diverse viral training set. At the lower end of the spectrum, viralVerify (a module of metaviralSPAdes [49]) and Seeker [40] generally struggled with most benchmarks. Seeker also failed to represent both alpha- and beta-diversity in mock virome data, meaning it underestimated within-sample viral richness and between-sample community differences [47]. This makes Seeker unsuitable as a primary tool for viral ecology studies focused on diversity patterns or compositional structure. Among widely used tools outside the scope of the benchmark, we want to single out two: MARVEL [37] and PhaMer [50]. MARVEL is a high-throughput tool intended for detecting free, tailed Caudovirales phages from metagenomic bins and is especially suitable for low-abundance and fragmented sequences when binning is feasible. In its original validation [37], MARVEL outperformed VirSorter and VirFinder in recall while maintaining similarly high precision, particularly on simulated MAGs. It is less suitable for detecting viruses from highly fragmented, unbinned contigs. PhaMer [50] utilizes an LLM for classifying phage contigs and is particularly effective at detecting cryptic and compositionally atypical phages. It achieved an F1 score of 0.93 on RefSeq-derived contigs and outperformed VirSorter, (Deep)VirFinder, Seeker, and PPR-Meta on mock metagenomic datasets [50]. While PhaMer requires high computational resources, this limitation is mitigated by its integration into the online workflow PhaBOX2 [43].

As a final note, averaging results from multiple prediction tools does not always improve accuracy, as many tools share overlapping reference biases and interdependent training data [46]. Therefore, tool outputs should be interpreted independently. Moreover, other factors, such as tool interface and computational resource demands, can be equally decisive (Tables 2 and 3, and Fig. 4) and should guide tool choice based on the dataset’s complexity and the user’s expertise.

Tool guide for prophage detection in single genomes

Compared to the metagenome-oriented tools described in the previous section, dedicated single-genome scanners offer higher efficiency and accuracy for identifying prophages in individual bacterial genomes. Figure 6 summarizes benchmark results from the Philympics 2021 study [51] and is supplemented with performance data from the PHAST suite [52]. Among all evaluated tools, PHASTEST achieved the highest overall performance across all tested metrics, though it was run on a different dataset (Casjens-54) [20]. It is highly recommended for users who prefer GUIs and provides quick but sensitive open reading frame (ORF) annotation. Analyses are typically complete within minutes per genome, with interactive visualizations of prophage locations in the output [52]. Within the Philympics benchmark, the updated version of PhiSpy [53] led the field. It offers robust precision and recall without relying on static reference databases. PhiSpy applies RF classifiers to annotated genomes, includes refined prophage boundary detection, and is especially well-suited for skilled users who value flexibility and parameter control. Among other high-performing tools, Phigaro [54] offers robust throughput by combining Prodigal gene prediction with HMM-based pVOG annotation. ProphET [55] also performed well and is notable for including a self-updating reference database. At the lower end of the performance spectrum, PhageBoost [56] and the batch-processing tool DBSCAN-SWA [57] showed significant drops in precision and boundary resolution. Of note, metagenome-focused virus predictors performed poorly on these single-genome benchmarks, showing lower precision, longer runtimes, and poor phage boundary resolution. This highlights the importance of using tools designed for single-genome prophage detection. While not part of the performance benchmark, DEPhT [58] deserves mention for its precise prophage boundary detection. As always, tool choice should be guided by the specific research question and available computational resources (Tables 2 and 3, and Fig. 4).

Figure 6.

Bar graphs showing the three benchmark performance metrics F1, precision and recall of prophage prediction tools on standardized single genome datasets.

Benchmark performance of tools used for prophage detection in single genomes. Horizontal bar plots compare F1 score, precision, and recall (panels from left to right) across 13 tools. Orange, data adapted from the PHAST suite benchmark [52], licensed under a Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/); red, data adapted from the Philympics 2021 benchmark [51], licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

Tool guide for detecting filamentous phages (Inoviridae)

Filamentous phages (Inoviridae) are characterized by rod-shaped or long proteinaceous filaments with a circular ssDNA genome of ~5–15 kb and can establish chronic infections. Because of their unique and diverse gene content, most computational approaches are inefficient at detecting their sequences in whole-genome shotgun sequencing data [18, 59]. The ML tool Inovirus [60] implements a two-step detection pipeline specifically designed for this purpose. In the first step, the program Inovirus_detector scans for conserved Inoviridae marker proteins (especially pI-like proteins) using HMMs. In the second step, an RF classifier detects other characteristic Inoviridae features, such as small structural proteins. These predictions are then passed to the Inovirus_classifier module, which refines the taxonomic ranking of candidate sequences within Inoviridae based on conserved protein clusters. This approach enables the automated discovery and taxonomic classification of inoviruses. The authors reported high recall and precision values of 92.5% and 99.8%, respectively, on a manually curated reference set [18, 60]. Of note, VirSorter2 [42] is also capable of identifying Inoviridae, as it includes pI-like proteins in its viral marker set.

Once high-confidence phage regions have been identified, users typically proceed to annotate and characterize the predicted viral genes. We describe this step in the following section.

Step 2: Phage gene prediction and annotation

Phage gene prediction

While tools such as GLIMMER [61], GeneMarkS [62], and Prodigal [63] were originally designed for application to bacterial genome annotation, they are frequently applied to phage genomes as well. However, their performance is limited by the compact and atypical architecture of phage genomes, which typically feature more overlapping, short, and embedded genes [64–66]. To address this, the graph-based PHANOTATE [67] was developed specifically for the compact nature of phage genomes. A benchmark with 2133 complete phage genomes showed that PHANOTATE predicted more genes than GLIMMER, GeneMarkS, and Prodigal and had an ~82% agreement with genes predicted by at least one of these tools [67]. Importantly, ~6% of its predictions were unique but mostly evolutionarily conserved. This suggests that PHANOTATE can uncover functional proteins that are not detected with standard approaches. As a best-practice recommendation, the outputs from various gene prediction tools should be compared. To this end, the comparative platforms Phage Commander [68] and MultiPhATE2 [69] assess consensus calls, visualize overlaps, and help select the most plausible gene models.
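To illustrate what comparing the outputs of several gene callers can look like, the following minimal sketch counts how many tools support each predicted gene. It keys genes by strand and stop coordinate, on the assumption that callers often agree on the stop codon while disagreeing on the start. The function name `consensus_calls` and all coordinates are hypothetical, and the sketch does not reflect how Phage Commander or MultiPhATE2 work internally.

```python
from collections import defaultdict


def consensus_calls(calls_by_tool):
    """Group gene calls by (strand, stop) and record the start each tool proposed.

    calls_by_tool maps a tool name to a list of (start, stop, strand) tuples.
    Returns a dict: (strand, stop) -> {tool name: start coordinate}.
    """
    support = defaultdict(dict)
    for tool, calls in calls_by_tool.items():
        for start, stop, strand in calls:
            support[(strand, stop)][tool] = start
    return dict(support)


# Hypothetical coordinates for illustration only, not real predictions.
calls = {
    "PHANOTATE": [(100, 400, "+"), (390, 600, "+"), (900, 700, "-")],
    "Prodigal": [(130, 400, "+"), (900, 700, "-")],
    "GLIMMER": [(100, 400, "+")],
}

support = consensus_calls(calls)
for (strand, stop), starts in sorted(support.items(), key=lambda kv: kv[0][1]):
    # Number of supporting tools, then the start each tool proposed.
    print(strand, stop, len(starts), dict(sorted(starts.items())))
```

Genes supported by a single tool (here the short overlapping call ending at position 600) are the ones that deserve manual inspection, as are genes where tools disagree on the start coordinate.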

Functional gene annotation

Unlike cellular organisms, prokaryotic viruses lack a universal common ancestor, and their proteins exhibit limited sequence conservation. Therefore, only a minority of phage genes have known functions, which hampers the functional annotation of newly detected phage genes. To address this, Pharokka [70] integrates the prokaryotic virus remote homologous groups (PHROG) database [71], which clusters viral proteins into orthologous groups based on remote homology and manual curation. Paired with the PHANOTATE gene caller, Pharokka combines phage-specific gene prediction with functional annotation for newly identified phages. For users who prefer web-based tools, PhANNs [72] offers an artificial neural network (ANN) ensemble to rapidly classify proteins into 10 structural classes. Finally, highly fragmented metagenomic assemblies present a significant challenge for standard gene callers. In this context, Balrog [73], which employs temporal CNNs, demonstrates strong performance by significantly reducing the number of hypothetical gene predictions. It effectively retains well-conserved genes while removing spurious ORFs, which improves confidence in both gene prediction and downstream annotation.

Step 3: Taxonomic classification

Historically, viruses were primarily classified based on phenotypes, e.g. traits such as tail morphology or capsid shape. Because such morphocentric groupings often lacked monophyly, the International Committee on Taxonomy of Viruses (ICTV) [74] redefined viral taxonomy to be based on genomic and proteomic information. At higher taxonomic ranks (family, order, and class), classification is now done based on viral hallmark genes and whole-proteome comparisons. This approach is implemented by several programs, including VICTOR [75] and ViPTree [76], which both conduct whole-proteome phylogenetic inference; vConTACT2 [77], which groups viruses in terms of common protein clusters; GRAViTy [78], which integrates genomic architecture and protein profile HMMs; and VirClust [79], which uses adaptive homology models for proteins to identify taxonomic clusters across taxonomic levels without sacrificing sensitivity and specificity.

At lower taxonomic levels (genus and species), whole-genome or individual-gene alignments remain essential. However, phages frequently lack a common core genome due to high recombination frequencies or genomic mosaicism [80]. This complicates traditional phylogenetic classification. Clustering based on intergenomic nucleotide identity can circumvent this limitation. For example, VIRIDIC [81] clusters phages based on user-defined similarity levels, while ClassiPhage and ClassiPhages 2.0 [82, 83] utilize HMMs and ANNs to classify phages by conserved features.
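Clustering on pairwise intergenomic similarity can be sketched as follows. This is a simplified single-linkage grouping over a precomputed similarity table, not VIRIDIC's own algorithm. The genome names, the similarity values, and the 95% and 70% cut-offs are example assumptions standing in for user-defined thresholds.

```python
def cluster_by_similarity(names, similarity, threshold):
    """Single-linkage clustering: link two genomes if similarity >= threshold.

    similarity maps (name_a, name_b) pairs to a percentage similarity.
    Returns a sorted list of sorted clusters.
    """
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), value in similarity.items():
        if value >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical pairwise similarities (%), for illustration only.
names = ["phageA", "phageB", "phageC", "phageD"]
similarity = {
    ("phageA", "phageB"): 97.2,
    ("phageA", "phageC"): 74.5,
    ("phageB", "phageC"): 73.9,
    ("phageA", "phageD"): 12.0,
    ("phageB", "phageD"): 10.4,
    ("phageC", "phageD"): 11.8,
}

tight_clusters = cluster_by_similarity(names, similarity, 95.0)
loose_clusters = cluster_by_similarity(names, similarity, 70.0)
print(tight_clusters)
print(loose_clusters)
```

Lowering the threshold merges clusters, which is why the chosen similarity level should be reported alongside any cluster-based classification.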

However, taxonomic classification of novel or highly divergent phages remains challenging. Most lack sufficient similarity to reference sequences, and their modular genome structures impede the application of conventional classification methods. To better reflect evolutionary relatedness in these cases, several studies have adopted genome-wide similarity metrics as complementary approaches: average nucleotide identity (ANI) [84] and weighted gene repertoire relatedness (wGRR) [80].

ANI is the mean nucleotide identity across the orthologous regions shared by two genomes. It is computed by fragmenting one genome, matching the fragments to homologous regions in the other, and averaging their nucleotide similarity. ANI accurately distinguishes phages at the genus or species levels but is less effective for highly recombinant or mosaic genomes, where alignable regions can be sparse.
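The averaging step can be sketched as below, assuming the fragment alignments have already been produced by an external aligner. The function name, the input layout (identity in percent, aligned fraction of each fragment), and the filtering cut-offs are illustrative assumptions. Reporting the aligned fraction next to ANI makes the limitation for mosaic genomes visible.

```python
def ani_from_fragments(hits, n_fragments, min_identity=30.0, min_coverage=0.7):
    """Average identity over fragments that pass the filters.

    hits: list of (percent_identity, aligned_fraction) for fragments with a match.
    n_fragments: total number of fragments the query genome was cut into.
    Returns (ani, aligned_fraction); ani is None if no fragment passes.
    The cut-off values are example assumptions, not fixed standards.
    """
    kept = [
        identity
        for identity, coverage in hits
        if identity >= min_identity and coverage >= min_coverage
    ]
    if not kept:
        return None, 0.0
    return sum(kept) / len(kept), len(kept) / n_fragments


# Hypothetical fragment hits: only two of ten fragments align well.
hits = [(98.0, 0.95), (96.0, 0.90), (99.0, 0.40), (25.0, 0.90)]
ani, aligned_fraction = ani_from_fragments(hits, n_fragments=10)
print(ani, aligned_fraction)
```

In this toy case the ANI is high, but it rests on only a fifth of the genome, which is the situation in which ANI alone can overstate relatedness.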

In contrast to ANI, which relies on nucleotide-level similarity, wGRR establishes similarity at the protein level through the detection of reciprocal best hits between genomes and their weighting based on both sequence identity and alignment coverage. This protein-centric approach enables the estimation of evolutionary relatedness to be robust even when nucleotide homology is fragmented or low. While not a taxonomic method per se, wGRR is best applied for clustering phages based on shared gene content and evolutionary patterns, particularly if core genes are lacking or disrupted by recombination.
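As a minimal sketch of the idea, the code below sums the identities of reciprocal best hits and divides by the gene count of the smaller genome, which is one commonly used formulation of wGRR. The input layout, the coverage filter of 0.5, the use of an alignment score to pick best hits, and all values are assumptions for illustration. The protein search itself is assumed to have been run beforehand.

```python
def best_hits(hits):
    """Keep the highest-scoring hit per query.

    hits: list of (query, subject, identity, coverage, score) tuples,
    with identity and coverage given as fractions between 0 and 1.
    Returns {query: (subject, identity, coverage)}.
    """
    best = {}
    best_score = {}
    for query, subject, identity, coverage, score in hits:
        if query not in best or score > best_score[query]:
            best[query] = (subject, identity, coverage)
            best_score[query] = score
    return best


def wgrr(hits_ab, hits_ba, n_genes_a, n_genes_b, min_coverage=0.5):
    """Sum identities of reciprocal best hits, normalized by the smaller genome."""
    best_ab = best_hits(hits_ab)
    best_ba = best_hits(hits_ba)
    total = 0.0
    for gene_a, (gene_b, identity, coverage) in best_ab.items():
        reciprocal = best_ba.get(gene_b)
        if reciprocal is not None and reciprocal[0] == gene_a and coverage >= min_coverage:
            total += identity
    return total / min(n_genes_a, n_genes_b)


# Hypothetical protein hits between genome A (4 genes) and genome B (3 genes).
hits_ab = [
    ("a1", "b1", 0.90, 0.95, 200),
    ("a1", "b2", 0.40, 0.60, 50),
    ("a2", "b2", 0.60, 0.80, 120),
    ("a3", "b3", 0.80, 0.30, 90),  # reciprocal, but fails the coverage filter
]
hits_ba = [
    ("b1", "a1", 0.90, 0.95, 200),
    ("b2", "a2", 0.60, 0.80, 120),
    ("b3", "a3", 0.80, 0.30, 90),
]
value = wgrr(hits_ab, hits_ba, n_genes_a=4, n_genes_b=3)
print(round(value, 3))
```

A value near 1 indicates that nearly all genes of the smaller genome have a highly similar counterpart in the other genome, while a value near 0 indicates little shared gene content.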

Step 4: Further downstream analyses

Following detection and taxonomic classification, different types of downstream analysis can provide key functional, ecological, and evolutionary information. These encompass sequence quality analysis, life cycle prediction, and the prediction of potential hosts, which are particularly important for phage-therapeutic design and viral ecology studies. We recommend three main categories: (i) quality assessment, (ii) prediction of the phage life cycle, and (iii) prediction of the phage host. Other downstream analyses beyond the scope of this guide include core gene prediction and gene transfer [80, 85], viral density estimation (VIRMOTIF [86]), and functional potential prediction of viral communities [87]. Rather than a comprehensive list, we provide the beginner with an overview of frequently used tools and best practices.

Quality assessment of phage genomes

Ensuring high quality of novel genome assemblies is crucial for reliable annotation, taxonomic classification, and ecological interpretation, because incomplete or contaminated genomes can obscure significant viral functions or lead to incorrect taxonomic classification. To circumvent these issues, CheckV [44] is the most suitable software for precise assessment of host contamination and genome completeness. It uses reference-based scoring for known phages, HMM-based inference for novel viruses, GC content, and terminal repeat detection to determine completeness level and contamination status. For each contig, it reports completeness as a percentage of the expected complete viral genome. Other tools measure viral completeness with different approaches: VIBRANT [32] scans for characteristic viral proteins; viralComplete [88] draws on reference length and content; PHASTEST [52] offers ORF-level completeness scores; and Phables [89] reconstructs fragmented metagenomic assemblies into genomes using flow-based graph modeling, a unique feature among existing tools. Completeness estimates depend on reference coverage and can miss novel genomes. Therefore, we recommend visually inspecting all datasets.
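The arithmetic behind such percentages can be sketched as follows. This is a deliberately simplified illustration that assumes an expected genome length and the number of host-derived bases are already known. It is not CheckV's implementation, and the function names and numbers are hypothetical.

```python
def completeness_percent(contig_length, expected_genome_length):
    """Contig length as a percentage of the expected genome length, capped at 100."""
    if expected_genome_length <= 0:
        raise ValueError("expected_genome_length must be positive")
    return min(100.0, 100.0 * contig_length / expected_genome_length)


def contamination_percent(host_bases, contig_length):
    """Share of the contig (in percent) that was flagged as host-derived."""
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    return 100.0 * host_bases / contig_length


# Hypothetical contig: 30 kb long, expected genome 40 kb, 6 kb flagged as host.
completeness = completeness_percent(30_000, 40_000)
contamination = contamination_percent(6_000, 30_000)
print(completeness, contamination)
```

Because the expected length comes from reference genomes or models, a contig from a truly novel phage may receive an unreliable estimate, which is the caveat raised above.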

Life cycle prediction

A phage’s lifestyle shapes its ecological role and therapeutic potential. However, most current methods predict lysogeny based on conserved markers or a positive hit to known integrases, features that novel viruses might lack. Furthermore, if only genome structure is considered, it is impossible to determine whether a prophage is biologically active. To ensure strong inferences of phage lifestyles, genomic predictions should be complemented by contextual data, such as gene expression or culture-based strategies. For automated prediction, the tool landscape offers a variety of approaches, e.g. PHACTS [90] and BACPHLIP [91]. PHACTS employs RF classification to compare phage genomes with a reference database of phages with characterized life cycles, whereas BACPHLIP distinguishes between temperate and virulent phages according to their conserved protein domains. Lytic or temperate life cycles can further be predicted for highly fragmented phages derived from short-contig assemblies (PhaTYP [92]) or metaviromes (DeePhage [93] or PhagePred [94]). In addition, PhageTerm [95] can be employed to predict packaging mechanisms when both sequencing reads and an assembly are available.

Host prediction

The accurate inference of a phage’s host range is crucial for any meaningful ecological interpretation, but also for technical considerations such as microbiome engineering and assessing therapeutic potential for phage therapy. Host ranges are traditionally assessed in the laboratory, which is both time-consuming and restricted to culturable bacteria. In silico host prediction is therefore now critical, especially for large-scale metagenomic data for which cultured hosts do not exist for viral sequences. In silico host prediction methods fall broadly into two categories.

Host prediction: Database-driven matching

These methods can be further classified into repositories of documented or predicted phage-host interactions (PHI-base [96], ViralHostRangeDB [97], and MVP [98]) and predictive computational approaches that infer a host from an input phage sequence. The latter tools must strike a good balance between recall and false discovery rate (FDR). PHISDetector [99] and VirHostMatcher-Net [100] show favorable recall values, but also come with unfavorably high reported FDRs of >10%. In contrast, the supervised tool iPHoP [101] gives low FDRs coupled with high recall values for known and even novel phages at the genus level. Technically, it employs an automated approach that integrates database comparison with genome pattern analysis to simplify host prediction. Phage hosts can also be inferred by aligning the query phage with a database of known phage-host pairs, e.g. RaFAH [102], or by analysis of sequence alignment patterns, which can reveal prophage or CRISPR integration (using SpacePHARER [103]).
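For readers less familiar with these metrics, the short sketch below computes recall, FDR, precision, and F1 from counts of true positives (TP), false positives (FP), and false negatives (FN). The counts are hypothetical and do not come from any benchmark cited here.

```python
def recall(tp, fn):
    """Fraction of true phage-host pairs that the tool recovered."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def false_discovery_rate(tp, fp):
    """Fraction of the tool's predictions that are wrong."""
    return fp / (tp + fp) if (tp + fp) else 0.0


def precision(tp, fp):
    """Fraction of the tool's predictions that are correct (1 - FDR)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical counts for one host prediction tool.
tp, fp, fn = 80, 12, 20
print(round(recall(tp, fn), 3))
print(round(false_discovery_rate(tp, fp), 3))
print(round(precision(tp, fp), 3))
print(round(f1_score(tp, fp, fn), 3))
```

With these counts the tool recovers 80% of the true hosts, yet about 13% of its predictions are wrong, which illustrates how good recall can coexist with an FDR above 10%.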

Host prediction: Alignment-free sequence feature models

These approaches analyze oligonucleotide usage patterns or trained sequence features to infer host identity without alignments. They include tools that compare host genome k-mer frequencies with those of the phage genome, e.g. WIsH [104], PHIST [105], DeepHost [106], HostPhinder [107], and PHP [108]. Among them, the Prokaryotic virus Host Predictor (PHP) [108] is the most accurate at the genus level. It excels in situations where alignment-based methods fail, supports host prediction from fragmented viral genomes, and is particularly effective in predicting hosts from challenging metaviromes. Two accurate alternatives to PHP available for metaviromes are HoPhage [109], which features both a Markov-chain model and a DL-method for host genus prediction, and CHERRY [110], which combines proteome- and genome-derived feature graphs. To achieve optimal results, researchers are advised to cross-validate predictions among complementary tools and include ecological metadata where available.
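The underlying idea of comparing k-mer usage can be sketched in a few lines. The example builds tetranucleotide frequency profiles and ranks candidate hosts by the Manhattan distance to the phage profile. This is a toy simplification and not the model used by WIsH, PHP, or any other tool named above. The sequences, the choice of k = 4, and the distance measure are assumptions for illustration.

```python
from collections import Counter


def kmer_profile(seq, k=4):
    """Relative k-mer frequencies of one strand (reverse complement is ignored here)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def profile_distance(p, q):
    """Manhattan distance between two frequency profiles (0 = identical, 2 = disjoint)."""
    return sum(abs(p.get(kmer, 0.0) - q.get(kmer, 0.0)) for kmer in set(p) | set(q))


def rank_hosts(phage_seq, host_seqs, k=4):
    """Return (host name, distance) pairs, most similar k-mer usage first."""
    phage = kmer_profile(phage_seq, k)
    scored = [
        (name, profile_distance(phage, kmer_profile(seq, k)))
        for name, seq in host_seqs.items()
    ]
    return sorted(scored, key=lambda item: item[1])


# Toy sequences: the phage and hostA are AT-rich, hostB is GC-rich.
phage = "ATATATATGCATATAT" * 5
hosts = {
    "hostA": "ATATATATATGCATAT" * 10,
    "hostB": "GCGCGGCCGCGCGGCG" * 10,
}
ranking = rank_hosts(phage, hosts)
for name, distance in ranking:
    print(name, round(distance, 3))
```

Real tools train on thousands of genomes and use richer models, but the ranking step has the same shape: the candidate host whose sequence composition most resembles the phage is reported first.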

Integrated pipelines

Phage discovery and downstream analysis is a step-by-step approach that includes the successive or parallel employment of different dedicated tools. In response, several groups have designed integrated virus analysis pipelines that bundle tools into workflows, comprising all or most of the steps outlined in this review (Table 3). Here we provide several examples that demonstrate the range of approaches currently available. PhageCompass (https://phagecompass.ku.dk) is a web application built by an international collaboration for translational phage therapy. It integrates several evaluation tools (including PhageBoost [56]) into a structured and easily accessible web interface, supporting open access and educational outreach. MetaPhage [111] is a Nextflow-based modular pipeline for expert users. It facilitates virus mining from metagenomic data through a multi-step process including read classification, assembly, and virus prediction through an ensemble of tools (including Phigaro, VIBRANT, VirFinder, and VirSorter). The pipeline “What the phage” (WtP) [112] is a reproducible and scalable Nextflow workflow for expert users comprising multiple phage detection tools (including VirFinder, PPR-Meta, VirSorter1/2, Seeker, MetaPhinder, DeepVirFinder, and VIBRANT) with subsequent virus annotation and classification (using Phigaro) and offers user-friendly summaries in chart and table format. Finally, PhaBOX2 [43] is suitable for both single genomes and metagenomes and offers a highly accessible, web server-based pipeline. It takes contigs/sequences in FASTA format and runs virus identification (PhaMer [50]), taxonomic classification (PhaGCN [113]), host and lifestyle prediction (CHERRY/HostG [110] and PhaTYP [92]), contamination and provirus integration screening, vOTU grouping, marker gene-based phylogenetic tree inference, and viral protein annotation using recent databases via ICTV 2024. Expert users can run PhaBOX2 locally using a command-line interface (CLI).

In summary, workflows reduce the manual overhead of linking the outputs of multiple tools and produce formatted outputs that can aid reproducibility and interpretation, critical assets in large-scale virome studies and translational applications such as phage therapy.

Conclusion

The advent of new computational ML and DL methods has significantly improved the speed, accuracy, and sensitivity of virus prediction. Nevertheless, significant challenges persist, such as the identification and taxonomic placement of rare or uncommon phages or the discrimination of closely related viral genomes in highly complex metagenomic data. Given the surging number of new bioinformatic phage tools, and because no single tool is optimal for all research questions, scientists increasingly must match analytical approaches to their specific questions. This guide seeks to support that process by helping researchers make informed decisions that fit their goals and abilities. As the field of viral signal detection in large metagenomic datasets continues to evolve rapidly, our review is a mere snapshot of this ongoing development. We do hope, though, that our historical treatment of the various underlying algorithms will enable users to better grasp and categorize new tools as they emerge. It is essential to harness the full potential of the latest tools, and so we hope that our guide will support phage explorers in their quest to discover novel phage elements from (meta)genomic datasets. In the future, tool design will likely integrate ecological background, metadata standards, and gene-sharing network approaches. For instance, clustering algorithms based on gene sharing, such as vConTACT2 [77], effectively group new viruses irrespective of their taxonomy. Concurrently, initiatives such as MIUViG [114] are setting the necessary metadata standards to improve reproducibility in viral ecology research. Finally, advanced host prediction programs and ML/DL-models that have been trained on ecological or temporal patterns will likely bridge the gap between detection and interpretation.

Key Points

  • The number and diversity of computational tools for predicting prokaryotic viruses from single genomes and metagenomic data have rapidly expanded over the past decade, reflecting both technical innovation and growing interest in viral applications like phage therapy.

  • Without claiming to be exhaustive, a wide range of state-of-the-art phage prediction tools are discussed and critically evaluated.

  • A step-by-step guide is proposed that covers and critically assesses tools for phage prediction, gene annotation, taxonomic classification, and more.

  • Since user input data can vary and sequence databases differ, it is essential to evaluate how well each tool works under different scenarios using reliable statistical measures and consistent benchmarks, which are discussed for both metagenomes and single genomes.

  • In conclusion, in silico phage prediction provides valuable, testable hypotheses about phage biology and taxonomy, integration sites, and lifestyle traits, which all should be validated experimentally wherever possible.

Contributor Information

Carolin Charlotte Wendling, Department of Environmental Systems Science, Institute of Integrative Biology, ETH Zürich, Universitätstrasse 16, 8092 Zürich, Switzerland; Max von Pettenkofer-Institut, Ludwig-Maximilians-Universität (LMU), Pettenkoferstraße 9a, 80336 München, Germany.

Marie Vasse, Department of Environmental Systems Science, Institute of Integrative Biology, ETH Zürich, Universitätstrasse 16, 8092 Zürich, Switzerland; CNRS UMR 5164, ImmunoConcept, Université de Bordeaux, Site de Carreire, Bâtiment BBS, 2 Rue Dr Hoffmann Martinot, 33076 Bordeaux Cedex, France.

Sébastien Wielgoss, Department of Environmental Systems Science, Institute of Integrative Biology, ETH Zürich, Universitätstrasse 16, 8092 Zürich, Switzerland.

Author contributions

C.C.W., M.V., and S.W. conceived and drafted the original manuscript. All authors have contributed to revising and editing of later versions of the manuscript. S.W. handled manuscript submission, editorial correspondence, and coordinated revisions. C.C.W. secured funding.

Conflict of interest: None declared.

Funding

This work was supported by the Swiss National Science Foundation (grant number PZ00P3_179743 to C.C.W.). We thank Anne Kupczok for providing helpful comments on the draft version of this manuscript. We thank Dr. Willem van Schaik for sharing raw data from [47].

Data availability

No new data were generated or analysed in support of this research.

References

  • 1. Coutinho  FH, Silveira  CB, Gregoracci  GB. et al.  Marine viruses discovered via metagenomics shed light on viral strategies throughout the oceans. Nat Commun  2017;8:15955. 10.1038/ncomms15955 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. Li  R, Wang  Y, Hu  H. et al.  Metagenomic analysis reveals unexplored diversity of archaeal virome in the human gut. Nat Commun  2022;13:7978. 10.1038/s41467-022-35735-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Paez-Espino  D, Eloe-Fadrosh  EA, Pavlopoulos  GA. et al.  Uncovering Earth’s virome. Nature  2016;536:425–30. 10.1038/nature19094 [DOI] [PubMed] [Google Scholar]
  • 4. Paez-Espino  D, Zhou  J, Roux  S. et al.  Diversity, evolution, and classification of virophages uncovered through global metagenomics. Microbiome  2019;7:157. 10.1186/s40168-019-0768-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Roux  S, Hallam  SJ, Woyke  T. et al.  Viral dark matter and virus-host interactions resolved from publicly available microbial genomes. Elife  2015;4:4. 10.7554/eLife.08490 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Vik  DR, Roux  S, Brum  JR. et al.  Putative archaeal viruses from the mesopelagic ocean. PeerJ  2017;5:e3428. 10.7717/peerj.3428 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Murray  CJL, Ikuta  KS, Sharara  F. et al.  Global burden of bacterial antimicrobial resistance in 2019: a systematic analysis. Lancet  2022;399:629–55. 10.1016/S0140-6736(21)02724-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Gould  SJ. The evolution of life on the earth. Sci Am  1994;271:84–91. 10.1038/scientificamerican1094-84 [DOI] [PubMed] [Google Scholar]
  • 9. Wilhelm  SW, Suttle  CA. Viruses and nutrient cycles in the sea - viruses play critical roles in the structure and function of aquatic food webs. Bioscience  1999;49:781–8. 10.2307/1313569 [DOI] [Google Scholar]
  • 10. Wendling  CC. Prophage mediated control of higher order interactions - insights from multi-level approaches. Curr Opin Syst Biol  2023;35:100469. 10.1016/j.coisb.2023.100469 [DOI] [Google Scholar]
  • 11. Andrade-Martínez  JS, Valera  LCC, Cárdenas  LAC. et al.  Computational tools for the analysis of uncultivated phage genomes. Microbiol Mol Biol Rev  2022;86:e00004–21. 10.1128/mmbr.00004-21 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12. Dion  MB, Oechslin  F, Moineau  S. Phage diversity, genomics and phylogeny. Nat Rev Microbiol  2020;18:125–38. 10.1038/s41579-019-0311-5 [DOI] [PubMed] [Google Scholar]
  • 13. Casjens  S, Palmer  N, van  Vugt  R. et al.  A bacterial genome in flux: the twelve linear and nine circular extrachromosomal DNAs in an infectious isolate of the Lyme disease spirochete Borrelia burgdorferi. Mol Microbiol  2000;35:490–516. 10.1046/j.1365-2958.2000.01698.x [DOI] [PubMed] [Google Scholar]
  • 14. Asadulghani  M, Ogura  Y, Ooka  T. et al.  The defective prophage pool of Escherichia coli O157: prophage-prophage interactions potentiate horizontal transfer of virulence determinants. PLoS Pathog  2009;5:e1000408. 10.1371/journal.ppat.1000408 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Lawrence  JG, Hendrix  RW, Casjens  S. Where are the pseudogenes in bacterial genomes?  Trends Microbiol  2001;9:535–40. 10.1016/S0966-842X(01)02198-9 [DOI] [PubMed] [Google Scholar]
  • 16. Sicard  A, Michalakis  Y, Gutierrez  S. et al.  The strange lifestyle of multipartite viruses. PLoS Pathog  2016;12:e1005819. 10.1371/journal.ppat.1005819 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Turgeon  N, Toulouse  MJ, Martel  B. et al.  Comparison of five bacteriophages as models for viral aerosol studies. Appl Environ Microb  2014;80:4242–50. 10.1128/AEM.00767-14 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Roux  S, Krupovic  M, Daly  RA. et al.  Cryptic inoviruses revealed as pervasive in bacteria and archaea across Earth’s biomes. Nat Microbiol  2019;4:1895–906. 10.1038/s41564-019-0510-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19. Arndt  D, Marcu  A, Liang  YJ. et al.  PHAST, PHASTER and PHASTEST: tools for finding prophage in bacterial genomes. Brief Bioinform  2019;20:1560–7. 10.1093/bib/bbx121 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Casjens  S. Prophages and bacterial genomics: what have we learned so far?  Mol Microbiol  2003;49:277–300. 10.1046/j.1365-2958.2003.03580.x [DOI] [PubMed] [Google Scholar]
  • 21. Karlin  S. Global dinucleotide signatures and analysis of genomic heterogeneity. Curr Opin Microbiol  1998;1:598–610. 10.1016/S1369-5274(98)80095-7 [DOI] [PubMed] [Google Scholar]
  • 22. Srividhya  KV, Alaguraj  V, Poornima  G. et al.  Identification of prophages in bacterial genomes by dinucleotide relative abundance difference. PloS One  2007;2:e1193. 10.1371/journal.pone.0001193 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23. Nicolas  P, Bize  L, Muri  F. et al.  Mining Bacillus subtilis chromosome heterogeneities using hidden Markov models. Nucleic Acids Res  2002;30:1418–26. 10.1093/nar/30.6.1418 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24. Fouts  DE. Phage_Finder: automated identification and classification of prophage regions in complete bacterial genome sequences. Nucleic Acids Res  2006;34:5839–51. 10.1093/nar/gkl732 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25. Bose  M, Barber  RD. Prophage finder: a prophage loci prediction tool for prokaryotic genome sequences. In Silico Biol  2006;6:223–7. 10.3233/ISB-00235 [DOI] [PubMed] [Google Scholar]
  • 26. Lima-Mendez  G, Van Helden  J, Toussaint  A. et al.  Prophinder: a computational tool for prophage prediction in prokaryotic genomes. Bioinformatics  2008;24:863–5. 10.1093/bioinformatics/btn043 [DOI] [PubMed] [Google Scholar]
  • 27. Arndt  D, Grant  JR, Marcu  A. et al.  PHASTER: a better, faster version of the PHAST phage search tool. Nucleic Acids Res  2016;44:W16–21. 10.1093/nar/gkw387 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Zhou  Y, Liang  Y, Lynch  KH. et al.  PHAST: a fast phage search tool. Nucleic Acids Res  2011;39:W347–52. 10.1093/nar/gkr485 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Akhter  S, Aziz  RK, Edwards  RA. PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucleic Acids Res  2012;40:e126. 10.1093/nar/gks406 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Roux  S, Enault  F, Hurwitz  BL. et al.  VirSorter: mining viral signal from microbial genomic data. PeerJ  2015;3:e985. 10.7717/peerj.985 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Jurtz  VI, Villarroel  J, Lund  O. et al.  MetaPhinder-identifying bacteriophage sequences in metagenomic data sets. PloS One  2016;11:e0163111. 10.1371/journal.pone.0163111 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Kieft  K, Zhou  Z, Anantharaman  K. VIBRANT: automated recovery, annotation and curation of microbial viruses, and evaluation of viral community function from genomic sequences. Microbiome  2020;8:90. 10.1186/s40168-020-00867-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Wood  DE, Salzberg  SL. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol  2014;15:R46. 10.1186/gb-2014-15-3-r46 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34. Wood  DE, Lu  J, Langmead  B. Improved metagenomic analysis with Kraken 2. Genome Biol  2019;20:257. 10.1186/s13059-019-1891-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35. Ren  J, Ahlgren  NA, Lu  YY. et al.  VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data. Microbiome  2017;5:69. 10.1186/s40168-017-0283-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36. Ren  J, Song  K, Deng  C. et al.  Identifying viruses from metagenomic data using deep learning. Quant Biol  2020;8:64–77. 10.1007/s40484-019-0187-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Amgarten  D, Braga  LPP, da  Silva  AM. et al.  MARVEL, a tool for prediction of bacteriophage sequences in metagenomic bins. Front Genet  2018;9:304. 10.3389/fgene.2018.00304 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Zheng  T, Li  J, Ni  Y. et al.  Mining, analyzing, and integrating viral signals from metagenomic data. Microbiome  2019;7:42. 10.1186/s40168-019-0657-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39. Fang  Z, Tan  J, Wu  S. et al.  PPR-meta: a tool for identifying phages and plasmids from metagenomic fragments using deep learning. Gigascience  2019;8:8. 10.1093/gigascience/giz066 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Auslander  N, Gussow  AB, Benler  S. et al.  Seeker: alignment-free identification of bacteriophage genomes by deep learning. Nucleic Acids Res  2020;48:e121. 10.1093/nar/gkaa856 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41. Miao  Y, Liu  F, Hou  T. et al.  Virtifier: a deep learning-based identifier for viral sequences from metagenomes. Bioinformatics  2022;38:1216–22. 10.1093/bioinformatics/btab845 [DOI] [PubMed] [Google Scholar]
  • 42. Guo  J, Bolduc  B, Zayed  AA. et al.  VirSorter2: a multi-classifier, expert-guided approach to detect diverse DNA and RNA viruses. Microbiome  2021;9:37. 10.1186/s40168-020-00990-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Shang  JY, Peng  C, Liao  HR. et al.  PhaBOX: a web server for identifying and characterizing phage contigs in metagenomic data. Bioinform Adv  2023;3:3. 10.1093/bioadv/vbad101 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44. Nayfach  S, Camargo  AP, Schulz  F. et al.  CheckV assesses the quality and completeness of metagenome-assembled viral genomes. Nat Biotechnol  2021;39:578–85. 10.1038/s41587-020-00774-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45. Zhou  ZC, Martin  C, Kosmopoulos  JC. et al.  ViWrap: a modular pipeline to identify, bin, classify, and predict viral-host relationships for viruses from metagenomes. Imeta  2023;2:2. 10.1002/imt2.118 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46. Hegarty  B, Riddell  J, Bastien  E. et al.  Benchmarking informatics approaches for virus discovery: caution is needed when combining identification methods. mSystems  2024;9:9. 10.1128/msystems.01105-23 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47. Ho  SFS, Wheeler  NE, Millard  AD. et al.  Gauge your phage: benchmarking of bacteriophage identification tools in metagenomic sequencing data. Microbiome  2023;11:84. 10.1186/s40168-023-01533-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48. Wu  LY, Wijesekara  Y, Piedade  GJ. et al.  Benchmarking bioinformatic virus identification tools using real-world metagenomic data across biomes. Genome Biol  2024;25:97. 10.1186/s13059-024-03236-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49. Uritskiy  GV, DiRuggiero  J, Taylor  J. MetaWRAP-a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome  2018;6:158. 10.1186/s40168-018-0541-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50. Shang  J, Tang  X, Guo  R. et al.  Accurate identification of bacteriophages from metagenomic data using transformer. Brief Bioinform  2022;23:bbac258. 10.1093/bib/bbac258 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51. Roach  MJ, McNair  K, Michalczyk  M. et al.  Philympics 2021: prophage predictions perplex programs. F1000Research  2022;10:758. 10.12688/f1000research.54449.2 [DOI] [Google Scholar]
  • 52. Wishart  DS, Han  S, Saha  S. et al.  PHASTEST: faster than PHASTER, better than PHAST. Nucleic Acids Res  2023;51:W443–50. 10.1093/nar/gkad382 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53. McNair  K, Decewicz  P, Daniel  S. et al. PhiSpy (version 3.4.5). https://github.com/linsalrob/PhiSpy.
  • 54. Starikova  EV, Tikhonova  PO, Prianichnikov  NA. et al.  Phigaro: high-throughput prophage sequence annotation. Bioinformatics  2020;36:3882–4. 10.1093/bioinformatics/btaa250 [DOI] [PubMed] [Google Scholar]
  • 55. Reis-Cunha  JL, Bartholomeu  DC, Manson  AL. et al.  ProphET, prophage estimation tool: a stand-alone prophage sequence prediction tool with self-updating reference database. PloS One  2019;14:e0223364. 10.1371/journal.pone.0223364 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56. Sirén  K, Millard  A, Petersen  B. et al.  Rapid discovery of novel prophages using biological feature engineering and machine learning. NAR Genom Bioinform  2021;3:lqaa109. 10.1093/nargab/lqaa109 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57. Gan  R, Zhou  F, Si  Y. et al.  DBSCAN-SWA: an integrated tool for rapid prophage detection and annotation. Front Genet  2022;13:885048. 10.3389/fgene.2022.885048 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58. Gauthier  CH, Abad  L, Venbakkam  AK. et al.  DEPhT: a novel approach for efficient prophage discovery and precise extraction. Nucleic Acids Res  2022;50:e75. 10.1093/nar/gkac273 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59. Roux  S, Krupovic  M, Daly  RA. et al.  Author correction: cryptic inoviruses revealed as pervasive in bacteria and archaea across Earth’s biomes. Nat Microbiol  2020;5:527. 10.1038/s41564-020-0681-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60. Roux  S.  Inovirus. https://github.com/simroux/Inovirus [accessed 25 August 2025].
  • 61. Delcher  AL, Bratke  KA, Powers  EC. et al.  Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics  2007;23:673–9. 10.1093/bioinformatics/btm009 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62. Lomsadze  A, Gemayel  K, Tang  S. et al.  Modeling leaderless transcription and atypical genes results in more accurate gene prediction in prokaryotes. Genome Res  2018;28:1079–89. 10.1101/gr.230615.117 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63. Hyatt  D, Chen  GL, Locascio  PF. et al.  Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics  2010;11:119. 10.1186/1471-2105-11-119 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64. Cahill  J, Rajaure  M, O’Leary  C. et al.  Genetic analysis of the lambda Spanins Rz and Rz1: identification of functional domains. G3 (Bethesda)  2017;7:741–53. 10.1534/g3.116.037192 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65. Kang  CH, Shin  Y, Jang  S. et al.  Characterization of Vibrio parahaemolyticus isolated from oysters in Korea: resistance to various antibiotics and prevalence of virulence genes. Mar Pollut Bull  2017;118:261–6. 10.1016/j.marpolbul.2017.02.070 [DOI] [PubMed] [Google Scholar]
  • 66. Kang  HS, McNair  K, Cuevas  DA. et al.  Prophage genomics reveals patterns in phage genome organization and replication. bioRxiv  2017. 10.1101/114819  preprint: not peer reviewed [DOI] [Google Scholar]
  • 67. McNair  K, Zhou  C, Dinsdale  EA. et al.  PHANOTATE: a novel approach to gene identification in phage genomes. Bioinformatics  2019;35:4537–42. 10.1093/bioinformatics/btz265 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68. Lazeroff  M, Ryder  G, Harris  SL. et al.  Phage Commander, an application for rapid gene identification in bacteriophage genomes using multiple programs. Phage (New Rochelle)  2021;2:204–13. 10.1089/phage.2020.0044 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69. Ecale Zhou  CL, Kimbrel  J, Edwards  R. et al.  MultiPhATE2: code for functional annotation and comparison of phage genomes. G3 (Bethesda)  2021;11:jkab074. 10.1093/g3journal/jkab074 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70. Bouras  G, Nepal  R, Houtak  G. et al.  Pharokka: a fast scalable bacteriophage annotation tool. Bioinformatics  2023;39:btac776. 10.1093/bioinformatics/btac776 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71. Terzian  P, Olo Ndela  E, Galiez  C. et al.  PHROG: families of prokaryotic virus proteins clustered using remote homology. NAR Genom Bioinform  2021;3:lqab067. 10.1093/nargab/lqab067 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72. Cantu  VA, Salamon  P, Seguritan  V. et al.  PhANNs, a fast and accurate tool and web server to classify phage structural proteins. PLoS Comput Biol  2020;16:e1007845. 10.1371/journal.pcbi.1007845 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73. Sommer  MJ, Salzberg  SL. Balrog: a universal protein model for prokaryotic gene prediction. PLoS Comput Biol  2021;17:e1008727. 10.1371/journal.pcbi.1008727 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74. Turner  D, Shkoporov  AN, Lood  C. et al.  Abolishment of morphology-based taxa and change to binomial species names: 2022 taxonomy update of the ICTV bacterial viruses subcommittee. Arch Virol  2023;168:74. 10.1007/s00705-022-05694-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75. Meier-Kolthoff  JP, Goker  M. VICTOR: genome-based phylogeny and classification of prokaryotic viruses. Bioinformatics  2017;33:3396–404. 10.1093/bioinformatics/btx440 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76. Nishimura  Y, Yoshida  T, Kuronishi  M. et al.  ViPTree: the viral proteomic tree server. Bioinformatics  2017;33:2379–80. 10.1093/bioinformatics/btx157 [DOI] [PubMed] [Google Scholar]
  • 77. Bin Jang  H, Bolduc  B, Zablocki  O. et al.  Taxonomic assignment of uncultivated prokaryotic virus genomes is enabled by gene-sharing networks. Nat Biotechnol  2019;37:632–9. 10.1038/s41587-019-0100-8 [DOI] [PubMed] [Google Scholar]
  • 78. Aiewsakun  P, Simmonds  P. The genomic underpinnings of eukaryotic virus taxonomy: creating a sequence-based framework for family-level virus classification. Microbiome  2018;6:38. 10.1186/s40168-018-0422-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79. Moraru  C. VirClust-a tool for hierarchical clustering, core protein detection and annotation of (prokaryotic) viruses. Viruses  2023;15:1007. 10.3390/v15041007 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80. Kupczok  A, Bailey  ZM, Refardt  D. et al.  Co-transfer of functionally interdependent genes contributes to genome mosaicism in lambdoid phages. Microb Genom  2022;8:mgen000915. 10.1099/mgen.0.000915 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81. Moraru  C, Varsani  A, Kropinski  AM. VIRIDIC-a novel tool to calculate the intergenomic similarities of prokaryote-infecting viruses. Viruses  2020;12:1268. 10.3390/v12111268 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82. Chibani  CM, Farr  A, Klama  S. et al.  Classifying the unclassified: a phage classification method. Viruses  2019;11:195. 10.3390/v11020195 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83. Chibani  CM, Meinecke  F, Farr  A. et al.  ClassiPhages 2.0: sequence-based classification of phages using artificial neural networks. bioRxiv  2019. 10.1101/558171  preprint: not peer reviewed [DOI] [Google Scholar]
  • 84. Jain  C, Rodriguez-R  LM, Phillippy  AM. et al.  High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun  2018;9:5114. 10.1038/s41467-018-07641-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85. de  Sousa  JAM, Fillol-Salom  A, Penades  JR. et al.  Identification and characterization of thousands of bacteriophage satellites across bacteria. Nucleic Acids Res  2023;51:2759–77. 10.1093/nar/gkad123 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 86. Rajaei  P, Jahanian  KH, Beheshti  A. et al.  VIRMOTIF: a user-friendly tool for viral sequence analysis. Genes (Basel)  2021;12:186. 10.3390/genes12020186 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87. Roux  S, Krupovic  M, Debroas  D. et al.  Assessment of viral community functional potential from viral metagenomes may be hampered by contamination with cellular sequences. Open Biol  2013;3:130160. 10.1098/rsob.130160 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88. Raiko  M.  viralComplete: BLAST-based viral completeness verification. https://github.com/ablab/viralComplete [accessed 25 August 2025].
  • 89. Mallawaarachchi  V, Roach  MJ, Decewicz  P. et al.  Phables: from fragmented assemblies to high-quality bacteriophage genomes. Bioinformatics  2023;39:btad586. 10.1093/bioinformatics/btad586 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90. McNair  K, Bailey  BA, Edwards  RA. PHACTS, a computational approach to classifying the lifestyle of phages. Bioinformatics  2012;28:614–8. 10.1093/bioinformatics/bts014 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 91. Hockenberry  AJ, Wilke  CO. BACPHLIP: predicting bacteriophage lifestyle from conserved protein domains. PeerJ  2021;9:e11396. 10.7717/peerj.11396 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92. Shang  J, Tang  X, Sun  Y. PhaTYP: predicting the lifestyle for bacteriophages using BERT. Brief Bioinform  2023;24:bbac487. 10.1093/bib/bbac487 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93. Wu  S, Fang  Z, Tan  J. et al.  DeePhage: distinguishing virulent and temperate phage-derived sequences in metavirome data with a deep learning approach. Gigascience  2021;10:giab056. 10.1093/gigascience/giab056 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 94. Song  K.  PhagePred. https://github.com/songkai1987/PhagePred [accessed 25 August 2025].
  • 95. Garneau  JR, Depardieu  F, Fortier  LC. et al.  PhageTerm: a tool for fast and accurate determination of phage termini and packaging mechanism using next-generation sequencing data. Sci Rep  2017;7:8292. 10.1038/s41598-017-07910-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96. Urban  M, Cuzick  A, Seager  J. et al.  PHI-base in 2022: a multi-species phenotype database for pathogen-host interactions. Nucleic Acids Res  2022;50:D837–47. 10.1093/nar/gkab1037 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 97. Lamy-Besnier  Q, Brancotte  B, Menager  H. et al.  Viral host range database, an online tool for recording, analyzing and disseminating virus-host interactions. Bioinformatics  2021;37:2798–801. 10.1093/bioinformatics/btab070 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98. Gao  NL, Zhang  C, Zhang  Z. et al.  MVP: a microbe-phage interaction database. Nucleic Acids Res  2018;46:D700–7. 10.1093/nar/gkx1124 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 99. Zhou  F, Gan  R, Zhang  F. et al.  PHISDetector: a tool to detect diverse in silico phage-host interaction signals for virome studies. Genom Proteom Bioinform  2022;20:508–23. 10.1016/j.gpb.2022.02.003 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 100. Wang  W, Ren  J, Tang  K. et al.  A network-based integrated framework for predicting virus-prokaryote interactions. NAR Genom Bioinform  2020;2:lqaa044. 10.1093/nargab/lqaa044 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 101. Roux  S, Camargo  AP, Coutinho  FH. et al.  iPHoP: an integrated machine learning framework to maximize host prediction for metagenome-derived viruses of archaea and bacteria. PLoS Biol  2023;21:e3002083. 10.1371/journal.pbio.3002083 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 102. Coutinho  FH, Zaragoza-Solas  A, Lopez-Perez  M. et al.  RaFAH: host prediction for viruses of bacteria and archaea based on protein content. Patterns (N Y)  2021;2:100274. 10.1016/j.patter.2021.100274 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 103. Zhang  R, Mirdita  M, Levy Karin  E. et al.  SpacePHARER: sensitive identification of phages from CRISPR spacers in prokaryotic hosts. Bioinformatics  2021;37:3364–6. 10.1093/bioinformatics/btab222 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 104. Galiez  C, Siebert  M, Enault  F. et al.  WIsH: who is the host? Predicting prokaryotic hosts from metagenomic phage contigs. Bioinformatics  2017;33:3113–4. 10.1093/bioinformatics/btx383 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 105. Zielezinski  A, Deorowicz  S, Gudys  A. PHIST: fast and accurate prediction of prokaryotic hosts from metagenomic viral sequences. Bioinformatics  2022;38:1447–9. 10.1093/bioinformatics/btab837 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 106. Ruohan  W, Xianglilan  Z, Jianping  W. et al.  DeepHost: phage host prediction with convolutional neural network. Brief Bioinform  2022;23:bbab385. 10.1093/bib/bbab385 [DOI] [PubMed] [Google Scholar]
  • 107. Villarroel  J, Kleinheinz  KA, Jurtz  VI. et al.  HostPhinder: a phage host prediction tool. Viruses  2016;8:116. 10.3390/v8050116 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 108. Lu  CY, Zhang  Z, Cai  ZN. et al.  Prokaryotic virus host predictor: a Gaussian model for host prediction of prokaryotic viruses in metagenomics. BMC Biol  2021;19:5. 10.1186/s12915-020-00938-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 109. Tan  J, Fang  Z, Wu  S. et al.  HoPhage: an ab initio tool for identifying hosts of phage fragments from metaviromes. Bioinformatics  2022;38:543–5. 10.1093/bioinformatics/btab585 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 110. Shang  J, Sun  Y. CHERRY: a computational metHod for accuratE pRediction of virus-pRokarYotic interactions using a graph encoder-decoder model. Brief Bioinform  2022;23:bbac182. 10.1093/bib/bbac182 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 111. Pandolfo  M, Telatin  A, Lazzari  G. et al.  MetaPhage: an automated pipeline for analyzing, annotating, and classifying bacteriophages in metagenomics sequencing data. mSystems  2022;7:e0074122. 10.1128/msystems.00741-22 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 112. Marquet  M, Holzer  M, Pletz  MW. et al.  What the phage: a scalable workflow for the identification and analysis of phage sequences. Gigascience  2022;11:giac110. 10.1093/gigascience/giac110 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 113. Shang  JY, Jiang  JZ, Sun  YN. Bacteriophage classification for assembled contigs using graph convolutional network. Bioinformatics  2021;37:I25–33. 10.1093/bioinformatics/btab293 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 114. Roux  S, Adriaenssens  EM, Dutilh  BE. et al.  Minimum information about an uncultivated virus genome (MIUViG). Nat Biotechnol  2019;37:29–37. 10.1038/nbt.4306 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 115. Gan  R, Zhou  F, Si  Y. et al. DBSCAN-SWA. https://github.com/HIT-ImmunologyLab/DBSCAN-SWA
  • 116. Gauthier  CH, Abad  L, Venbakkam  AK. et al. Detection and Extraction of Phages Tool (DEPhT).  https://github.com/chg60/DEPhT [accessed 25 August 2025].
  • 117. Garneau  JR, Depardieu  F, Fortier  LC. et al. PhageTerm: Fork of the Code That Is Available via Sourceforge. https://github.com/avilella/phageterm [accessed 25 August 2025].
  • 118. Wishart  DS, Han  S, Saha  S. et al. PHASTEST: faster than PHASTER, better than PHAST. https://phastest.ca [accessed 25 August 2025].
  • 119. Reis-Cunha  JL, Bartholomeu  DC, Earl  AM. et al. ProphET. https://github.com/jaumlrc/ProphET [accessed 25 August 2025].
  • 120. Ren  J, Song  K, Deng  C. et al. DeepVirFinder. https://github.com/jessieren/DeepVirFinder [accessed 25 August 2025].
  • 121. Wood  D.  Kraken2. https://github.com/DerrickWood/kraken2 [accessed 25 August 2025].
  • 122. Amgarten  DE. MARVEL. https://github.com/LaboratorioBioinformatica/MARVEL [accessed 25 August 2025].
  • 123. Jurtz  V.  MetaPhinder. https://github.com/vanessajurtz/MetaPhinder [accessed 25 August 2025].
  • 124. Fang  Z, Tan  J, Wu  S. et al. PPR-Meta. https://github.com/zhenchengfang/PPR-Meta [accessed 25 August 2025].
  • 125. Uritskiy  GV, DiRuggiero  J, Taylor  J. viralVerify 2018. https://github.com/ablab/viralVerify [accessed 25 August 2025].
  • 126. Ren  J, Ahlgren  N, Lu  Y. et al. VirFinder. https://github.com/jessieren/VirFinder [accessed 25 August 2025].
  • 127. Zheng  T, Li  J, Ni  Y. et al.  VirMiner. A Web-server for Mining Viral Signals in Metagenomic Data. http://sbb.hku.hk/VirMiner/ [accessed 13 August 2020].
  • 128. Zheng  T, Li  J, Ni  Y. et al. VirMiner. https://github.com/TingtZHENG/VirMiner [accessed 25 August 2025].
  • 129. Miao  Y, Liu  F, Liu  Y. Seq2Vec (Virtifier). https://github.com/crazyinter/Seq2Vec [accessed 25 August 2025].
  • 130. Zhou  ZC, Martin  C, Kosmopoulos  JC. et al.  ViWrap: A Modular Pipeline to Identify, Bin, Classify, and Predict Viral-host Relationships for Viruses from Metagenomes. https://github.com/AnantharamanLab/ViWrap [accessed 25 August 2025].
  • 131. Marquet  M, Holzer  M, Pletz  MW. et al.  What the Phage (WtP): Phage Identification via Nextflow and Docker or Singularity. https://github.com/replikation/What_the_Phage [accessed 25 August 2025].
  • 132. Shang  JY, Peng  C, Liao  HR. et al.  PhaBOX: A Web Server for Identifying and Characterizing Phage Contigs in Metagenomic Data. https://phage.ee.cityu.edu.hk [accessed 25 August 2025].
  • 133. Shang  JY, Peng  C, Liao  HR. et al.  PhaBOX: Local Version of the Virus Identification and Analysis Web Server (Tool Set). https://github.com/KennthShang/PhaBOX [accessed 25 August 2025].
  • 134. Sirén  K, Sicheritz-Pontén  T. PhageBoost - Web. https://phageboost.ku.dk [accessed 25 August 2025].
  • 135. Sirén  K, Millard  A, Petersen  B. et al.  PhageBoost - GitHub. https://github.com/ku-cbd/PhageBoost [accessed 25 August 2025].
  • 136. Shang  J, Tang  X, Guo  R. et al.  PhaMer - GitHub. https://github.com/KennthShang/PhaMer [accessed 25 August 2025].
  • 137. Starikova  EV, Tikhonova  PO, Prianichnikov  NA. et al.  Phigaro Is a Scalable Command-line Tool for Predicting Phages and Prophages. https://github.com/bobeobibo/phigaro [accessed 25 August 2025].
  • 138. Auslander  N, Gussow  AB, Benler  S. et al.  Seeker. https://github.com/gussow/seeker [accessed 25 August 2025].
  • 139. Kieft  K. VIBRANT: GitHub. https://github.com/AnantharamanLab/VIBRANT [accessed 25 August 2025].
  • 140. Roux  S. VirSorter. https://github.com/simroux/VirSorter [accessed 25 August 2025].
  • 141. Guo  J, Bolduc  B, Zayed  AA. et al.  VirSorter2. https://github.com/jiarong/VirSorter2 [accessed 25 August 2025].

Associated Data

Data Availability Statement

No new data were generated or analysed in support of this research.


Articles from Briefings in Bioinformatics are provided here courtesy of Oxford University Press
