Skip to main content
Briefings in Bioinformatics logoLink to Briefings in Bioinformatics
. 2026 Feb 26;27(1):bbag085. doi: 10.1093/bib/bbag085

Towards accurate artificial intelligence models for strain-level phage–host prediction

Chris J Malajczuk 1,2, Andrew Vaitekenas 3,4, Joshua J Iszatt 5,6, Stephen M Stick 7, Anthony Kicic 8,9,10, Yuliya V Karpievitch, on behalf of PhageWA11,12,13,
PMCID: PMC12936788  PMID: 41744225

Abstract

Strain-level prediction of phage–host interactions (PHIs) is essential for developing targeted phage therapies. Traditional empirical and homology-based methods often lack the resolution and scalability needed for precision applications. Recently, a new generation of artificial intelligence-driven models has emerged leveraging genomic information to infer PHIs at strain-level resolution. Here, we review recent advances in strain-level PHI prediction, spanning biologically grounded feature-based models, hybrid representation-learning frameworks, phylogeny-agnostic machine learning approaches, and end-to-end deep learning architectures. We examine how these modelling strategies navigate shared structural constraints arising from sparse and imbalanced outcome data, assay-dependent labels, infection complexity, and limited generalization. We further analyse how evaluation design, negative definition, and train-test splitting strategies shape apparent strain-level performance, and why inappropriate benchmarking can inflate claims of biological resolution. Framing these issues in the context of clinical phage therapy, we examine how current strain-level PHI prediction frameworks perform under the biological, experimental, and data constraints characteristic of real-world therapeutic settings. Finally, we outline pragmatic pathways toward more robust, interpretable, and clinically translatable PHI prediction systems.

Keywords: phage–host interactions, bacteriophage, strain-level prediction, artificial intelligence, predictive modelling, phage therapy

Introduction

Phage-based technologies, including phage therapy, are emerging as precision tools to combat the growing threat of antibiotic-resistant bacteria [1–3]. Central to their success is the accurate and rapid prediction of phage–host interactions (PHIs), especially at the strain (intraspecies) level, where bacterial susceptibility can hinge on subtle genomic and structural variation [4, 5]. Many phages display narrow host ranges, sometimes infecting only a few strains within a species [6], making the identification of therapeutically effective phages a non-trivial and increasingly data-intensive task. This has driven growing interest in scalable, fine-grained PHI prediction methods to streamline phage candidate selection.

Two biological concepts underpin PHI prediction: host range and infection outcome. Host range describes the set of bacterial strains a phage can recognize and infect, shaped by receptor compatibility, bacterial defence systems, and phage-encoded countermeasures. Infection outcomes span failed adsorption, abortive infection, lysogeny, or productive lytic replication. For therapeutic applications, productive lytic infections are typically prioritized, as they result in efficient bacterial clearance while avoiding horizontal gene transfer and other risks associated with lysogeny [7].

Experimentally, PHIs are assessed using microbiological assays such as spot tests, plaque assays, and liquid co-culture growth experiments (time-kill or ‘kill curve’ assays). While foundational, these approaches are labor-intensive and often inconsistent across studies due to variation in host panels, growth conditions, and outcome interpretation [8, 9]. Such variability complicates data integration and limits the standardization of host-range datasets, yet these empirical measurements remain indispensable and form the basis of all PHI prediction frameworks (Fig. 1; bottom step).

Figure 1.

ALT Text: Flow diagram illustrating the progression of methodologies for predicting PHIs across four steps, beginning with experimental assays, followed by alignment-based computational methods, and culminating in advanced machine-learning and AI models that enable scalable, high-resolution predictions.

Evolution of methodologies for predicting PHIs. Step 1: Traditional experimental laboratory assays are foundational for all PHI predictions but are limited in their reproducibility and scalability prompting the development of early computational approaches. Step 2: Alignment-based bioinformatics methods partially addressed scalability but are limited by dependency on known interactions and genomic annotations. Steps 3 and 4: Modern ML- and AI-based models have emerged as sophisticated alternatives, capable of scalable predictions, generalizable beyond known interactions, and uniquely able to achieve genus-, species-, and strain-level specificity; yet still present distinct challenges.

Early computational approaches to PHI prediction relied primarily on bioinformatics-based similarity methods, including sequence alignment and CRISPR spacer matching, to infer interactions by homology to previously annotated phage–host pairs [10, 11] (Fig. 1; second step). Tools such as PhARIS [12], which compare phage receptor-binding proteins (RBPs) against curated reference sets, can achieve high accuracy in well-characterized systems. However, these approaches struggle to generalize to novel phages or under-explored bacterial strains, particularly where receptor profiles or defence systems are poorly annotated. Small sequence changes in RBPs [13] or genes involved in overcoming host defences [14] can substantially alter infection outcomes, limiting the robustness of purely homology-based inference.

To address these limitations, a growing body of artificial intelligence-based models has emerged to improve the scalability, flexibility, and generalizability of PHI prediction (Fig. 1; top two steps). Throughout this review, we use the term ‘artificial intelligence’ (AI) broadly to encompass data-driven approaches including classical machine learning (ML), ensemble methods, graph-based models, and deep learning (DL). Where relevant, we refer explicitly to specific architectural paradigms to avoid conflating general AI-driven inference with particular modelling choices.

Contemporary AI-based PHI predictors integrate diverse biological signals, including nucleotide composition, protein domains, and ecological or network-derived features, and span a wide architectural spectrum from feature-based ML to DL, graph neural networks, and hybrid frameworks. While these methods overcome some limitations of homology-based tools by learning latent patterns directly from data, most have been developed for genus- or species-level prediction rather than strain-level resolution [8, 15]. Recent reviews by Nie et al. [15] and Howell et al. [8] provide comprehensive overviews of these first- and second-generation models, detailing their architectures, input features, and taxonomic scope.

Tools such as vHULK [16], RaFAH [17], and PHIAF [18], along with alignment-free feature-based approaches such as PredPHI [19] and PB-LKS [20], have introduced ensemble learning, k-mer composition, protein clustering, and synthetic data augmentation, achieving high performance in genus-level and metagenomic settings. Other tools including DeepHost [21], PHISGAE [22], DSPHI [23], PHPGAT [24], and GE-PHI [25] illustrate emerging methodological trends such as attention-based learning, graph-based inference, and multimodal integration. However, the utility of these models for clinical applications requiring fine-grained specificity and biological interpretability remains limited.

AI-based models explicitly developed for strain-level prediction are typically designed for specific pathogens or phage systems and often require iterative updating to accommodate the rapid evolution of phage–host landscapes (Fig. 1; top step). It is therefore important to distinguish strain-level PHI prediction from the application of broader taxonomic-level models to strain-resolved datasets. Several predictors primarily developed for genus- or species-level tasks, including PredPHI [19], PHIAF [18], PB-LKS [20], and PHPRBP [26] have been evaluated on strain-resolved PHI matrices or rank-stratified benchmarks and can recover predictive signal under certain conditions [20, 26, 27]. However, these models were not explicitly designed to resolve fine-scale host specificity between closely related bacterial isolates, and observed performance may reflect implicit exploitation of phylogenetic structure, taxonomic constraints, or dataset composition, rather than genuine discrimination of strain-specific genomic compatibility. Throughout this review, we therefore reserve the term ‘strain-level PHI prediction’ for models explicitly developed and evaluated to resolve strain-specific compatibility using genomic or sequence-derived information.

The emergence of AI-based strain-level PHI prediction frameworks across the past 2 years coincides with continued progress in species- and genus-level models (Fig. 2), reflecting not a shift away from broader taxonomic inference, but a maturation and diversification of the field. As data availability, computational capacity, and representation learning techniques have advanced, PHI prediction has begun to branch toward increasingly specialized and application-driven objectives, including high-resolution tasks such as rational phage cocktail design for therapy [6]. This diversification underscores the need for careful assessment of how modelling assumptions, data structures, and evaluation strategies differ across resolution scales.

Figure 2.

ALT Text: Timeline illustrating notable AI-based PHI prediction models from the past 5 years, with strain-level models positioned above the timeline, species- and genus-level models below, and colour-coded markers indicating different model architectures.

Timeline of the notable AI-based PHI prediction models from the last 5 years, with strain-level models shown above the timeline and species/genus-level models below. Models are coloured according to model architecture with red flags for feature-based, blue flags for DL, green flags for graph neural-network, yellow flags for hybrid, and purple flags indicating a structure-based approach.

Accordingly, this review focuses specifically on AI-based models developed for and benchmarked against strain-level PHI prediction tasks. Our aims are to (i) critically evaluate the methodological design, predictive scope, and biological assumptions of current strain-level PHI models, and (ii) identify cross-cutting challenges and strategic priorities for advancing their precision, interpretability, and clinical applicability. By clarifying the emerging methodological landscape and its associated trade-offs, we seek to guide the development of strain-level PHI prediction tools that are not only accurate, but also biologically grounded and practically deployable.

Biological and methodological challenges in phage–host interaction prediction

Accurate prediction of PHIs at strain-level resolution is constrained by biological complexity, experimental limitations, and structural deficiencies in available datasets. These include sparse strain-resolved PHI matrices, inconsistent metadata, taxonomic bias toward model organisms, and heterogeneity in interaction labelling. Many of these challenges are not merely amplified but fundamentally altered at strain resolution, where subtle biological differences exert disproportionate influence on infection outcome. Together, these factors have impeded the development of generalizable and scalable PHI prediction tools (Fig. 3). This section briefly outlines the key biological and methodological barriers underlying this limitation.

Figure 3.

ALT Text: Conceptual schematic illustrating multiple upstream biological and methodological challenges feeding into a central strain-level PHI prediction framework, highlighting their combined impact on prediction performance.

Biological and methodological challenges in predicting PHIs. Accurately predicting PHIs is complicated by multiple interdependent upstream biological and methodological factors that feed into the strain-level PHI prediction model to ultimately inform predictive accuracy.

Complexities of phage biology and host specificity

Phages exhibit extensive genetic diversity, rapid evolutionary dynamics, and high ecological adaptability [28, 29]. Host specificity is typically initiated through binding between phage-encoded RBPs and specific host cell surface receptors [6, 30, 31], yet even minor variation in receptor structure, expression, or regulation can generate pronounced strain-level differences in susceptibility [32, 33]. At this resolution, micro-variation rather than overall genomic similarity often determines infection outcome [34–39].

Importantly, successful adsorption does not guarantee productive infection. Following entry, phages must overcome multiple intracellular barriers, including restriction-modification systems, CRISPR-Cas immunity, abortive infection pathways, and blocks to transcription, translation, or assembly [40]. Phages may counter these defences through dedicated anti-defence systems [30, 41] or broader regulatory manipulation, such as sigma factor hijacking and anti-repressor activity [42, 43]. These interactions are highly context-dependent [44] and shaped by evolutionary trade-offs. For example, amplification of the anti-defence gene tifA in T4 phages has been shown to coincide with compensatory loss of accessory genes important for infecting alternative hosts, thereby constraining host range [34].

Such dynamics reflect the evolutionary plasticity of phage genomes, which evolve rapidly via mutation, recombination, and horizontal gene transfer [45]. Even small genomic changes, including point mutations [35–37] or domain swaps [38, 39], can substantially alter infection capacity, limiting the reliability of predictions based solely on global sequence similarity.

Experimental constraints and dataset limitations

Laboratory assays remain the foundation of PHI determination, with spot tests, plaque assays, and liquid co-culture methods providing complementary perspectives on infection outcomes [46, 47]. Semi-quantitative measures such as efficiency of plating [48], or virulence and centroid indices derived from growth curves [49, 50], offer additional resolution, but each assay carries intrinsic limitations affecting reproducibility, interpretability, and scalability.

Assay outcomes are highly sensitive to experimental conditions, including growth medium, host physiological state, bacterial density, and incubation parameters [51]. Minor protocol variations can substantially alter observed infection dynamics [52, 53], complicating cross-study comparison and introducing hidden heterogeneity into PHI datasets.

Moreover, many commonly used assays prioritize visible lysis rather than mechanistic resolution of infection outcome. Spot tests, e.g. can overestimate virulence by conflating productive lysis with abortive infection or lysis from without [52], while apparent non-infective outcomes may reflect incomplete adsorption, suppressed replication, or alternative infection modes [54–57]. Without follow-up assays such as adsorption measurements, time-resolved growth monitoring, or sequencing, these biologically distinct states remain collapsed into coarse outcome labels, contributing to label noise and reduced interpretability in downstream models.

Assay accessibility further shapes dataset composition. Interactions involving slow-growing, fastidious, or difficult-to-culture strains are systematically underrepresented [58, 59], biasing PHI datasets towards well-characterized laboratory strains [60–62]. In clinical contexts, however, many priority pathogens fall precisely into these underrepresented categories [63–65], limiting the relevance of existing datasets for translational prediction tasks.

Data quality and taxonomic coverage

The performance and generalizability of PHI prediction models depends critically on the quality and representativeness of training data. Yet most available datasets exhibit limited strain-level resolution, inconsistent annotation practices, and uneven taxonomic coverage. Public resources such as the Viral Host Range Database [61] and Virus–Host DB [62] remain heavily skewed toward model genera such as Escherichia and Klebsiella, reflecting historical research focus rather than clinical need.

A major limitation is the absence of standardized frameworks for defining and reporting PHI outcomes. Interaction results are frequently reduced to binary or semi-quantitative labels without clear distinction between productive and non-productive infections. While such ambiguity may be tolerated at broader taxonomic scales [66], it becomes a dominant failure mode at strain resolution, where small phenotypic differences carry substantial predictive weight.

Additional data-quality challenges include missing or inconsistent metadata, non-standardized isolate identifiers, and incomplete reporting of experimental parameters such as multiplicity of infection, incubation time, or phage titre. These deficiencies hinder dataset integration, benchmarking, and reuse, further constraining the development of robust and generalizable strain-level PHI prediction models.

Artificial intelligence-based models for strain-level phage–host interaction prediction

The challenges outlined in Section 1 impose stringent and often competing demands on strain-level PHI prediction models. Effective frameworks must operate under sparse and noisy supervision, accommodate assay-dependent outcome labels, and resolve fine-scale host specificity without overfitting to lineage structure or experimental artefacts. In response, a small but rapidly evolving set of AI-based models has emerged, each reflecting distinct assumptions about how biological knowledge should be encoded, how uncertainty should be handled, and how predictive power should be balanced against interpretability and data availability.

Existing strain-level predictors span a continuum of architectural and representational strategies. These include the Phage-by-Phage Classifier (PxPC) [6], a supervised ML model based on curated genomic features and semi-quantitative lytic outcomes; PhageHostLearn [67], which integrates phage RBP and bacterial capsular protein sequence embeddings in a pairwise learning framework; DeepPBI-KG [68], which applies deep neural networks to engineered, system-specific genomic features; a phylogeny-agnostic ML framework (PAML) that leverages large-scale, feature-agnostic genomic representations across diverse taxa [69]; and the Phage–Bacterium Interaction Predictor (PBIP) [27], an end-to-end DL model that learns strain-level compatibility directly from pretrained protein sequence embeddings. A comparative summary of these models is provided in Table 1.

Table 1.

Summary table of AI-based strain-level PHI prediction models.

Model Host taxonomy PHI data Model input Predictor/ classifier Evaluation/ train-test strategy Evaluation metrics Primary application/design intent
N. Phages, N. Strains Total PHIs (positive PHI %) Assay type Class imbalance handling Sequence type Features / embeddings Pair representation
PxPC
Gaborieau et al. [6]
Escherichia coli 96 403 36 688 (20.6%) Multi-concentration plaque assay with semi-quantitative MLC scoring; binarized to lytic (MLC > 0) and non-lytic (MLC = 0). Per-phage class weighting during training; evaluation includes PR-based metrics; model selection by AUROC. Genome Curated, biologically motivated binary/categorical features (no learned embeddings). Phage-specific modelling: one model per phage; each predicts infectivity from the host feature vector (pair handled implicitly by per-phage framing). Random Forest classifier per phage; outputs probability of lytic infection. Phage-wise cross-validation within fixed host–phage system. AUROC (model selection), plus precision, recall, F1, average precision (AUPRC). Rational phage cocktail design and prioritization for E. coli
PhageHostLearn
Boeckaerts et al. [67]
Klebsiella spp. 105 200 10 006 (3.3%) Spot test screening with planktonic killing assay confirmation (growth inhibition); binarized to productive / non-productive. Not explicitly described as resampling/weighting; trained/evaluated on sparse positives with ranking-style evaluation. Proteome ESM-2 embeddings per protein; mean pooling to phage- and host-level vectors. Concatenate mean-pooled phage RBP vector with mean-pooled bacterial capsule vector (fixed-length). XGBoost classifier on concatenated embeddings. Stratified PHI-pair cross-validation within system. ROC AUC and hit ratio@k (retrieval/ranking). Pairwise host-range prediction to support phage prioritization.
DeepPBI-KG
Wei et al. [68]
Klebsiella pneumonia 110 113 12 430 (25.5%) Spot test screening; binarized to productive / non-productive. Explicit negative sampling/balancing; retraining with controlled class splits. Genome & proteome Engineered sequence-derived features (protein + DNA descriptors). Concatenation of engineered phage and host key-gene features into one vector. Deep neural network (multi-layer fully connected). Held-out validation set of unseen K. pneumonia phage–strain pairs. AUROC, AUPRC. General PHI prediction with retraining for intraspecies host-range analysis.
PAML
Arkin et al. [69]
Escherichia coli 94 402 37 788 (20.7%) Same multi-concentration plaque assay with semi-quantitative MLC scoring as PxPC but excluding non-infections phages and no-infection strains; binarized to lytic (MLC > 0) and non-lytic (MLC = 0). Phage-specific inverse-frequency class weights; emphasizes imbalance-aware evaluation (MCC/PR). Genome Engineered genome features: protein family presence/absence + k-mer composition, with feature filtering/reduction. Concatenation of independent phage and bacterial genome feature vectors into a single fixed-length feature vector per interaction. CatBoost gradient boosting reported as best-performing. Conservative PHI-pair splitting with imbalance-aware evaluation. Emphasizes MCC; also reports AUROC/PR-style evaluation. Phylogeny-agnostic host-range prediction for broad screening and downstream cocktail design
Klebsiella pneumonia 59 62 3658 (4.9%) Replicated spot tests with dilution-based confirmation of productive infection; binarized to lytic/non-lytic.
Mixed Klebsiella spp. 46 138 6348 (2.4%) Binarized spot tests.
Pseudomonas aeruginosa 19 23 437 (36.2%) Quantitative efficiency of plating; binarized to lytic/non-lytic.
Mixed Vibronaceae 248 256 63 488 (2.1%) Consensus spot testing validated by planktonic killing assays with phage progeny confirmation; binarized to lytic/non-lytic.
PBIP
Ma et al. [27]
Klebsiella pneumonia 104 120 12 480 (22.8%) Spot tests with associates ‘transmissivity’ score 0–3, averaged across replicates and thresholded to binary. Explicitly evaluates SMOTE during benchmarking; balanced test set; discusses imbalance sensitivity across models. Proteome UniRep embeddings per protein; mean pooling across all proteins to fixed phage and host vectors. Concatenate pooled vectors. DL architecture: CNN + Bi-GRU + attention. Random PHI-pair splits with data augmentation; no external validation. AUROC, AUPRC, ACC, precision, recall, F1; plus benchmarking comparisons. High-resolution strain-level infectivity prediction for clinical phage selection.

Phage-by-phage classifier, a feature-based model

Feature-based AI approaches to PHI prediction encode prior biological knowledge into supervised learning frameworks using curated genomic descriptors. At strain-level resolution, their success depends on the specificity and relevance of selected features, annotation quality, and the reliability of interaction labels. PxPC, introduced by Gaborieau et al. [6], provides a clear example of how biologically grounded feature selection can support accurate strain-level PHI prediction when paired with systematically generated experimental data.

PxPC was developed for virulent Escherichia phages, leveraging a system with well-defined adsorption phenotypes and systematically curated host-range data [6]. Rather than training a single global model, PxPC adopts a phage-specific strategy, training independent binary classifiers for each of 96 phages. Each model predicts susceptibility of Escherichia coli strains to a given phage, explicitly reflecting the narrow host ranges typical of these phages [6] and allowing phage-specific interaction patterns to be learned without dilution in pooled models.

Computationally, PxPC models phage-host compatibility implicitly through host-side genomic features. Bacterial genomes are encoded as fixed-length vectors of binary presence-absence indicators representing surface-associated adsorption determinants, including O-antigen serotypes, lipopolysaccharide (LPS) outer-core types, and ABC-transporter-dependent capsule (K-antigen) loci. These features were derived through systematic genomic annotation pipelines [6, 70–73]. Candidate defence systems were also identified via DefenseFinder [74] and evaluated for predictive signal, however these features were excluded from the final feature set due to their limited predictive value in this Escherichia system [6].

Encoding adsorption determinants as binary features affords interpretability and computational simplicity but treats receptor identity as categorical, without capturing fine-scale sequence or structural variation known to influence RBP-receptor compatibility within receptor classes [75–77]. PxPC therefore resolves coarse-grained adsorption compatibility rather than molecular interaction detail.

A major strength of PxPC lies in its training data. The authors generated a dense, internally consistent interaction matrix by systematically screening all pairwise combinations between 403 E. coli isolates and 96 phages using plaque assays, yielding 38 688 validated PHIs (20.6% considered lytic interactions) [6]. Interactions were scored using a minimum lytic concentration (MLC) metric and subsequently binarized for model training, a pragmatic choice that stabilized learning under extreme class imbalance and aligned with downstream goals such as phage cocktail design [6].

Within this system, PxPC achieved high predictive performance (AUROC ≈ 0.85–0.86) and has since served as a benchmark dataset for subsequent strain-level modelling efforts, including PAML [69]. Overall, PxPC illustrates how mechanistic priors and high-quality outcome data can enable interpretable strain-level PHI prediction within well-characterized systems, while also foregrounding scalability and transferability trade-offs revisited throughout Section 3.

PhageHostLearn, a deep learning-augmented feature-based model

DL offers a route to higher-resolution PHI prediction by learning distributed representations of biological sequences that extend beyond manually curated features. In principle, pretrained protein language models (pLMs) can encode evolutionary and contextual constraints without explicit annotation or alignment [78]. In practice, however, the scarcity of large, consistently labelled strain-level PHI datasets and the combinatorial complexity of PHIs have limited the applicability of fully end-to-end DL models. As a result, recent strain-level approaches have favoured hybrid strategies that integrate pretrained sequence embeddings within biologically informed supervised learning frameworks [67, 78].

PhageHostLearn [67] exemplifies this design philosophy. Rather than encoding adsorption determinants as binary features, the model represents key adsorption-related proteins using pretrained pLM embeddings while retaining a conventional ML classifier. As with PxPC, PhageHostLearn is grounded in a clear biological assumption: that productive infection in Klebsiella is strongly influenced by interactions between phage RBPs and the bacterial capsule. Host-side representation is therefore restricted to K-locus (capsular) proteins, reflecting the central role of capsule diversity in Klebsiella phage susceptibility [79, 80].

Operationally, genes in each phage genome are predicted using PHANOTATE [81] followed by domain-based RBP detection to identify putative RBPs among translated proteins (filtered to 200–1500 aa) [82]. In parallel, bacterial capsule K-locus proteins are identified using Kaptive [83, 84]. Protein sequences from both sides are embedded using the ESM-2 transformer-based pLM [85]. For each phage and host, embeddings are mean-pooled across all detected RBPs or capsule proteins, concatenated to form a phage–host pair representation, and classified using a binary XGBoost model [86].

This architecture balances interpretability and representational richness. Compared with the binary adsorption features used in PxPC, embedding-based representations relax assumptions about feature identity and allow graded similarity between related protein variants to influence predictions. However, aggregation through pooling necessarily summarizes adsorption machinery at a coarse level, collapsing heterogeneity among individual RBPs or capsule proteins. As a result, PhageHostLearn models average adsorption compatibility rather than resolving fine-grained protein-specific determinants, a limitation revisited in Section 3.2.

PhageHostLearn was trained on 10 006 pairwise spot–test interactions between 105 Klebsiella phages and 200 clinical Klebsiella strains, yielding 333 (3.33%) positive interactions [67, 79]. In cross-validation, PhageHostLearn achieved an AUROC of 0.818 [67]. Prospective evaluation on 28 multidrug-resistant Klebsiella pneumoniae clinical isolates yielded comparable performance (AUROC 0.793), with at least one experimentally confirmed interaction identified for 16 isolates following targeted validation of top-ranked predictions [67].

Validation analyses highlighted key limitations. Several predicted interactions were only observed at high phage titres, consistent with incomplete or non-productive infection states, and no consistent associations were detected between intracellular defence systems and infection outcomes. These findings underscore that adsorption-focused representations alone may not fully resolve post-adsorption barriers. Accordingly, the authors proposed a layered modelling strategy in which PhageHostLearn serves as an adsorption-centric first stage, to be complemented by predictors targeting intracellular defence mechanisms [67].

Overall, PhageHostLearn occupies an intermediate position between strictly feature-based models such as PxPC and fully representation-driven DL approaches. Its performance demonstrates the value of integrating learned sequence representations with mechanistic priors under current data constraints, while also highlighting the need for broader representations to capture downstream and condition-dependent infection determinants.

Feature-agnostic and data-driven machine learning

Beyond biologically constrained, adsorption-centric models, recent work has explored whether strain-level PHI prediction can be achieved by relaxing explicit mechanistic assumptions about infection determinants. Two representative approaches occupying this intermediate design space are DeepPBI-KG [68] and the PAML framework [69]. While both move beyond narrowly defined adsorption features, they reflect contrasting philosophies regarding how biological knowledge, interaction uncertainty, and dataset scale should be incorporated into strain-level prediction.

DeepPBI-KG [68] adopts a feature-engineered, data-driven strategy in which strain-level PHIs are inferred from curated genomic descriptors rather than direct modelling of specific infection mechanisms. Rather than operating directly on raw genome sequences, the framework constructs structured representations of phage and host genomes by first identifying subsets of ‘key genes’ associated with PHIs and then summarizing sequence-derived properties of those genes prior to classification. In the authors’ workflow, phage and bacterial genomes are annotated using Prokka [87] to identify coding sequences, and translated proteins are additionally screened against conserved domain databases to support biological interpretation and validation of candidate RBPs.

Feature construction proceeds in two stages. First, auxiliary gene-count vectors are built for phages and hosts, encoding the presence and frequency of gene types across genomes. These concatenated phage-host vectors are subjected to ensemble-based feature-importance ranking using random forests, and reduced gene sets are retained based on cumulative importance thresholds. Second, for coding sequences corresponding to these selected genes, a comprehensive panel of DNA- and protein-level descriptors is computed and aggregated using summary statistics to produce fixed-length genome representations. Phage and host feature vectors are then concatenated to form pairwise inputs for classification using a multi-layer neural network.

This design substantially expands the representational scope beyond the small set of curated adsorption features used in PxPC [6], allowing intracellular and non-adsorptive determinants to contribute indirectly through distributed genomic signal. However, unlike PhageHostLearn [67], representation learning is not performed end-to-end from sequence. Instead, predictive performance remains tightly coupled with annotation quality, feature-engineering choices, and system-specific gene selection, limiting transferability beyond the target host system.

DeepPBI-KG also adopts an aggressive strategy to address outcome uncertainty and class imbalance. Negative interactions are inferred from untested phage–host pairs, and balanced training sets are constructed via clustering-based subsampling of negatives. While these steps stabilize optimization and yield strong apparent performance within the Klebsiella system, they entangle predictive success with dataset construction and sampling strategy, complicating interpretation of reported metrics and limiting evidence for broader generalizability. Within its target system, DeepPBI-KG outperformed several retrained species-level baselines, including PredPHI and PHIAF [68], but remains best understood as a locally optimized, system-specific solution.

In contrast, PAML [69] was explicitly designed to test whether strain-level PHI prediction could be achieved without reliance on curated receptor annotations, host taxonomy, or predefined infection mechanisms. Rather than optimizing for a single bacterial system, PAML aggregates heterogeneous PHI datasets spanning multiple genera and experimental contexts while retaining strain-resolved interaction labels.

Genomic representations in PAML are derived directly from whole-genome sequences using annotation-light feature spaces. Specifically, phage and bacterial genomes are represented as protein family presence-absence profiles generated by clustering predicted coding sequences with MMSeqs2 [88] across the full dataset, producing a pangenome-like binary feature matrix in which each dimension corresponds to a protein family rather than a predefined functional category. In parallel, amino-acid k-mer composition features are computed directly from translated protein sequences, enabling sequence-level variation to contribute to prediction without requiring homology-based annotation. Importantly, neither representation assumes prior knowledge of receptors, RBPs, or defence systems.

Given the resulting high-dimensional feature space and extreme outcome sparsity, PAML places substantial emphasis on feature filtering, dimensionality control, and evaluation design. The workflow applies iterative recursive feature elimination combined with ensemble training to identify predictive features while mitigating overfitting. Features exhibiting strong phylogenetic linkage or restricted strain distribution are systematically filtered, explicitly addressing the risk that models learn lineage signatures rather than interaction determinants. Across extensive parameter sweeps, feature-selection strategy and training protocol were found to exert a larger influence on performance than the precise choice of genomic representation, underscoring that workflow design, rather than architectural complexity, dominates model behaviour under sparse supervision.

Prediction is performed using ensemble ML classifiers, with CatBoost gradient-boosted decision trees [89] selected as the final architecture based on stability and performance across datasets [69]. Interestingly, class imbalance is addressed through phage-specific inverse-frequency weighting, reflecting a deliberate decision to avoid reshaping class geometry in ways that may inflate apparent performance. Importantly, train-test splits are constructed using hierarchical clustering of host strains, ensuring that closely related strains are not trivially shared across folds. PAML explicitly evaluates multiple generalization regimes, including prediction for unseen strains, unseen phages, and entirely unseen strain–phage pairs, providing a stringent assessment of real-world deployment scenarios than random pairwise splitting.

Evaluation strategy is central to the PAML framework. The authors prioritize Matthews correlation coefficient (MCC) and precision-recall (PR) analysis, while explicitly cautioning against reliance on AUROC alone in sparse interaction matrices. Performance scales strongly with dataset size and interaction coverage, with the largest gains observed when datasets are expanded within genera rather than merged indiscriminately across distant taxa [69]. This scaling behaviour reinforces a core design premise of PAML: in the absence of strong mechanistic priors, data breadth, diversity, and outcome consistency become the dominant determinants of strain-level predictive power.

Despite its agnostic design, PAML systematically interrogates biological relevance using feature-attribution analyses based on SHapely Additive exPlanations [90]. Predictive signal maps back to protein families and k-mers corresponding to known receptors, defence systems, and anti-defence mechanisms, supported by experimental validation using large-scale host-range assays and transposon sequencing screens [69]. Together, these results demonstrate that biologically meaningful determinants can be recovered without explicit mechanistic priors when evaluation and generalization are handled conservatively.

Taken together, DeepPBI-KG and PAML illustrate two distinct pathways toward adsorption-agnostic strain-level PHI prediction. DeepPBI-KG expands the hypothesis space through engineered genomic features within a species-specific context, while PAML distributes predictive signal across genome-wide representations under conservative evaluation and generalization regimes. Both demonstrate that relaxing explicit mechanistic assumptions can recover strain-level signal, but also reinforce that predictive performance remains tightly coupled to dataset coverage, outcome structure, and workflow design rather than architectural expressiveness alone.

Phage–bacterium interaction predictor: an end-to-end deep learning framework

PBIP, introduced by Ma et al. [27], represents the most fully representation-driven strain-level PHI predictor reported to date. In contrast to feature-engineered or hybrid approaches, PBIP seeks to infer phage-host compatibility directly from pretrained protein sequence embeddings, without explicit specification of receptors, RBPs, or defence systems.

PBIP was developed using a tightly controlled, strain-resolved PHI dataset comprising 120 K. pneumoniae clinical isolates tested against 104 Klebsiella phages. Initial spot-test screening was followed by planktonic killing assays to confirm bactericidal activity, yielding a dense and internally consistent PHI matrix comprising 12 480 phage–host pairs [27].

At the representation level, PBIP encodes protein sequences from both phages and bacterial hosts using the pretrained pLM UniRep [91]. Rather than selecting or annotating specific infection determinants, PBIP applies a two-stage aggregation strategy: UniRep embeddings are first computed for each predicted protein and then mean-pooled across all proteins within a phage and within a host, respectively. These organism-level representations are subsequently concatenated to form a composite phage-host input vector for classification. The concatenated embeddings are processed by a deep neural network comprising a convolutional module for local feature extraction, a bidirectional gated recurrent unit to model longer-range dependencies, and an attention mechanism to weight salient features prior to final prediction [27].

This design introduces several strong but explicit inductive assumptions. It assumes that pretrained protein embeddings encode information relevant to infection compatibility despite being optimized for general sequence modelling rather than PHI-specific tasks, and that aggregation across all proteins yields a meaningful summary of phage-host compatibility despite large disparities in proteome size between bacterial hosts and phages. PBIP does not explicitly normalize for proteome size or constrain which proteins contribute most strongly to predictions, leaving the biological attribution of learned signal unresolved. While this abstraction enables end-to-end learning without manual feature engineering, it also compresses interaction structure in ways that complicate mechanistic interpretation.

A defining feature of PBIP is its explicit treatment of extreme class imbalance. Positive interactions are augmented using the synthetic minority oversampling technique (SMOTE) [92] directly within the learned embedding space. Ablation analyses show that removal of this augmentation step results in a substantial reduction in predictive performance (~7%–8%) [27], underscoring both the effectiveness of oversampling and the sensitivity of DL architectures to class balance under sparse positive sampling [93]. Notably, similar sensitivity to oversampling was observed when benchmarking against retrained species- and genus-level predictors, suggesting that part of PBIP’s apparent gains reflect reshaped decision boundaries under sparse supervision rather than improved biological resolution per se.

Across benchmarking experiments, PBIP outperformed several existing PHI predictors on strain-level tasks, particularly when training and test data shared high sequence similarity [27]. Performance declined as sequence divergence increased, indicating reduced robustness when extrapolating beyond densely sampled, system-specific interaction spaces.

Taken together, PBIP illustrates both the promise and the current constraints of fully end-to-end DL for strain-level PHI prediction. Under tightly controlled conditions with dense, high-quality training data, representation-driven architectures can achieve strong predictive performance without explicit mechanistic feature engineering. However, sensitivity to dataset composition, reliance on synthetic augmentation, and limited biological attribution indicate that architectural flexibility alone does not resolve the fundamental challenges of strain-level PHI modelling.

Bridging practical and conceptual gaps in strain-level PHI prediction

Recent advances in strain-level PHI prediction demonstrate that meaningful computational signal can be recovered when modelling strategies are aligned with biological context, assay design, and data availability. As reviewed across Section 2, several approaches achieve encouraging strain-resolved performance under carefully controlled experimental conditions. However, these successes are largely confined to narrow experimental and taxonomic settings, revealing structural constraints that persist across modelling paradigms, independent of architectural complexity.

Progress towards precise, interpretable, and generalizable strain-level PHI prediction therefore requires reframing current limitations in terms of downstream application requirements rather than individual model designs. Clinical phage therapy, in particular, represents a high-value but high-complexity use case, demanding rapid and reliable phage prioritization across diverse, often under-characterized bacterial strains. While regulatory and logistical considerations associated with clinical translation lie beyond the scope of this review (but are discussed elsewhere [94–96]), substantial modelling challenges remain intrinsic to strain-level PHI prediction itself; particularly with respect to data availability, infection complexity, evaluation at strain resolution, and generalizability across experimental and biological contexts.

Data and label constraints

At strain resolution, the primary constraint on PHI prediction is not model architecture but the structure, density, and biological resolution of available outcome data. Discriminating between closely related bacterial strains requires PHI matrices that are dense, internally consistent, and phenotypically informative. In practice, such datasets remain rare. Even intensively curated experimental screens typically yield only a small number of confirmed productive infections embedded within a vast space of non-infective, ambiguous, or untested outcomes, reflecting both the biological reality of narrow host ranges and the limitations of scalable infection assays [6, 40, 67].

This data landscape imposes two interrelated challenges. First, strain-level PHI labels are intrinsically uncertain: many phage–host pairs are never assayed, others yield assay-dependent or context-specific outcomes, and negative results may reflect experimental conditions rather than true biological incompatibility. Second, this uncertainty manifests statistically as extreme and structured class imbalance, with positive interactions forming a sparse minority unevenly distributed across phages, hosts, and lineages. Importantly, this imbalance is not a technical artefact, but a direct consequence of how strain-level PHI data are generated, labelled, and interpreted [97].

A common modelling response to sparcity is the binarization of semi-quantitative infection outcomes into permissive and non-permissive classes. For example, PxPC [6] and subsequently PAML [69] treat any non-zero MLC score as productive infection, increasing the number of positive PHIs available for training and stabilizing optimization under extreme imbalance. However, this strategy necessarily collapses heterogeneous biological states (including partial lysis, inefficient infection, and abortive processes such as lysis from without) into a single positive label. Similar ambiguities arise in spot assays, where treatment of partial or turbid plaques can materially affect class assignment. While such simplifications facilitate model training and evaluation, they discard information about infection efficiency, intracellular progression, and context dependence.

Different modelling frameworks operationalize these ambiguities in different ways. Feature-based and hybrid models such as PxPC and PhageHostLearn primarily address imbalance at the optimization and evaluation stages through class weighting, stratified sampling, and threshold calibration [6, 67]. More data-driven approaches extend imbalance correction into the data domain itself, as illustrated by DeepPBI-KG’s inferred negatives [68] and PBIP’s use of SMOTE within embedding space [27]. The strong sensitivity of performance to these choices, observed both for strain-level models and retrained higher-level predictors [18, 19, 27], underscores that imbalance mitigation can substantially reshape decision boundaries without necessarily improving biological resolution.

PAML adopts a more conservative stance, treating imbalance as an inseparable consequence of incomplete outcome knowledge [69]. By prioritizing restrained sampling, careful outcome definition, and imbalance-aware evaluation metrics such as PR analysis and MCC, it reduces susceptibility to skew-driven artefacts, implicitly accepting that predictive power scales with dataset breadth, diversity, and coverage rather than architectural expressiveness alone.

Representation choice is tightly coupled to these data constraints. Feature-informed approaches deliberately restrict hypothesis space to biologically interpretable determinants of infection, enabling tractable learning under sparse and noisy supervision. In contrast, feature-agnostic and relational models developed at broader taxonomic scales, including graph-based approaches such as PHIS-GAE and MI-RGC [22, 98], rely on dense interaction graphs or extensive local sequence relationships to distribute learning signals. These assumptions degrade at strain resolution precisely because PHI matrices are sparse, uneven, and often dominated by a small number of highly connected nodes, increasing the risk that models amplify degree- or connectivity-driven sampling artefacts rather than resolve true strain-specific compatibility.

Large-scale genomic foundation models, such as Evo and its successor Evo 2 [99, 100], introduce a distinct representational paradigm by learning broad sequence distributions rather than task-aligned infection determinants. By encoding genome-wide evolutionary constraints without requiring explicit annotation, these models capture a comprehensive view of known genomic variation across all domains of life. However, whole-genome representations also encompass extensive sequence variation unrelated to infection compatibility, which may dilute relevant signal under the sparse and highly imbalanced supervision of host-range datasets. While recent work demonstrates that foundation models can generate biologically viable synthetic genomes [101], genome viability is not coupled with strain-resolved infection outcomes. A critical next step will therefore be to evaluate Evo-based models fine-tuned on all available curated phage host-range data, to assess whether their global genomic knowledge can be effectively aligned with interaction-specific supervision. At present, foundation models are complementary to, rather than replacements for, outcome-driven PHI predictors.

Taken together, these considerations define an upper bound on what strain-level PHI models can learn from existing datasets. Even with optimal representations and careful evaluation, predictive performance remains constrained by outcome ambiguity, imbalance, and incomplete sampling. These limitations motivate a second, orthogonal challenge: whether current modelling strategies meaningfully capture the biological processes that determine infection at strain resolution.

Infection biology beyond adsorption

Addressing data and label constraints is necessary for strain-level PHI prediction, but it is not sufficient. These factors define what models can learn, not whether their predictions faithfully reflect infection biology. Productive phage infection is a multistage, context-dependent process shaped by numerous host and phage factors that vary at strain resolution and are sensitive to environmental conditions. Growth phase, host physiology, and experimental context can modulate receptor accessibility, defence activity, and intracellular pathways, leading to assay-dependent or discordant outcomes even among closely related strains [102–105]. As a result, strain-level PHI labels often represent a coarse projection of a deeper, multi-layered infection trajectory that may not generalize across assays or translate in vivo.

In principle, training datasets that capture this complexity through quantitative, time-resolved, or assay-consensus outcomes would provide a more faithful substrate for modelling. In practice, the experimental burden required to generate such data at scale remains prohibitive, as each added layer of biological resolution imposes a disproportionate cost in throughput and standardization. This tension motivates a central question for strain-level PHI prediction: how much biological detail is required to generate predictions that are clinically useful, rather than biologically exhaustive?

Most existing strain-level models respond to this trade-off through explicitly reductive strategies. PxPC and PhageHostLearn prioritize adsorption determinants because these features were empirically shown to be among the most predictive and reproducible signals in their respective systems [6, 67]. By focusing on well-characterized receptor-associated features, these frameworks enable tractable learning under sparse supervision, but necessarily under-represent downstream intracellular barriers to infection [106]. DeepPBI-KG and PBIP relax reliance on curated adsorption features through engineered genomic descriptors and end-to-end embeddings, respectively, yet remain constrained by the coarse biological resolution of the underlying outcome labels [27, 68].

An important and often overlooked consequence of these reductive strategies lies in how protein-level signal is aggregated prior to prediction. In PhageHostLearn, information from multiple RBPs or capsule-associated proteins is compressed into fixed-length vectors via pooling and concatenation [67]. While this stabilizes learning under sparse supervision, it collapses heterogeneity among individual proteins, discarding information about dominance, combinatorial effects, or context-dependent contributions. As a result, such models learn average compatibility between phage and host adsorption machinery rather than resolving protein-specific determinants that may drive strain-level susceptibility.

This limitation becomes more pronounced as models rely increasingly on representation learning. PBIP demonstrates that pretrained pLM embeddings can recover strain-level signals when paired with dense, system-specific datasets [27]. However, sequence-only representations encode infection determinants indirectly, abstracting away explicit biophysical properties of RBP-receptor recognition such as conformational flexibility, binding interfaces, and surface electrostatics, which are known to influence host specificity at fine scales [107–110]. Consistent with this, PHPRBP shows progressively stronger performance as prediction is relaxed from strain to higher taxonomic ranks, suggesting that rich sequence representations amplify adsorption-driven signal at coarse scales while failing to resolve fine-grained strain-level determinants [26].

Broader analyses of pLM scaling reinforce these limitations. Mutation-effect prediction performance is not monotonic with model size, peaking at intermediate perplexity and degrading at both extremes, where residue-level discrimination collapses [111]. Notably, viral proteins show systematically lower performance even at comparable perplexity levels, consistent with distribution shift and distinct evolutionary pressures. Together, these findings caution against assuming that pLM embeddings inherently capture strain-level, infection-relevant variation, particularly for phage RBPs.

Structure-aware approaches provide a complementary avenue to probe these limits. PHIStruct [112] integrates predicted three-dimensional protein structure with sequence information using SaProt embeddings [113–115], making aspects of molecular recognition more explicit than sequence-only representations. Although currently evaluated at genus-level resolution, it illustrates how structural context may recover binding-specific information that sequence embeddings capture only indirectly. However, uncertainty in predicted structures, task misalignment, and receptor diversity constrain universal applicability. In systems dominated by non-proteinaceous or structurally heterogeneous receptors, such as LPS-mediated adsorption in Escherichia and Pseudomonas, direct receptor modelling remains challenging. PxPC nonetheless demonstrates that carefully curated adsorption features can still provide strong predictive signal in such systems [6], suggesting that structural refinement is best viewed as augmentative rather than substitutive.

Alternative frameworks respond to infection complexity by deliberately relaxing mechanistic assumptions rather than attempting to resolve specific stages of infection. PAML [69] exemplifies this epistemically conservative approach. By explicitly acknowledging outcome ambiguity and constraining predictive claims through careful outcome definition, restrained handling of negatives, and imbalance-aware evaluation, PAML recognizes that inferring fine-grained infection stages from incomplete strain-level PHI matrices risks overinterpretation rather than improved biological insight.

The value of this restraint is particularly evident in biologically complex hosts such as Pseudomonas aeruginosa. PAML reports its lowest strain-level predictive performance in this genus [69], likely reflecting both smaller, more heterogeneous datasets and the intrinsically multifactorial infection biology of P. aeruginosa, in which determinants are genomically dispersed, variably expressed, and often poorly annotated [30, 116, 117]. Importantly, this relatively reduced performance does not indicate model failure; rather, it exposes the intrinsic difficulty of resolving strain-level compatibility under such conditions.

Finally, PHIs unfold within a dynamic coevolutionary landscape that varies across temporal and spatial scales [118]. Current strain-level models are trained on static datasets and therefore fail to capture evolutionary turnover in receptors, defence systems, and phage counter-adaptations [119–121]. While some bacterial populations exhibit spatiotemporal stability over multi-year periods [122], it remains unclear how generalizable this stability is across species and ecological contexts. This uncertainty complicates decisions about model update frequency and clinical durability. Iterative, feedback-driven updating strategies such as periodic re-training or re-ranking using contemporary data, may partially mitigate this limitation, but impose substantial experimental and computational costs. Ultimately, richer longitudinal PHI datasets will be required to fully integrate evolutionary dynamics into strain-level prediction frameworks.

Evaluation, benchmarking, and apparent strain-level performance

As strain-level PHI prediction models grow in architectural sophistication, evaluation methodology becomes a central determinant of perceived success. Apparent gains in predictive accuracy do not necessarily reflect genuine resolution of fine-scale host specificity, particularly in sparse, imbalanced, and system-specific PHI datasets. Under these conditions, models can achieve strong headline metrics by exploiting statistical regularities that are orthogonal to the biological question of strain-level compatibility, creating an illusion of strain-level performance driven by dataset structure rather than mechanistic inference.

A core challenge is that many commonly reported performance metrics were not designed for strain-resolved inference. Metrics such as accuracy and AUROC remain informative under balanced or moderately imbalanced conditions, but they are poorly aligned with strain-level PHI matrices, where positive interactions are rare and unevenly distributed (Section 3.1). In such settings, AUROC can remain deceptively high even when precision for the minority class is low, as it disproportionately rewards true-negative discrimination and is insensitive to false-positive burden. As a result, models that perform poorly in practical screening scenarios may nonetheless appear competitive when evaluated using AUROC alone [123].

Metrics derived from PR analysis provide a more informative perspective in rare-positive regimes, as they directly capture the trade-off between sensitivity and false discovery [123]. Similarly, the MCC offers a single-value summary that incorporates all four confusion-matrix terms and is less dominated by the majority class. Reflecting this, PAML explicitly prioritizes PR metrics and MCC over AUROC [69], while PxPC, DeepPBI-KG, and PBIP also emphasize PR-based evaluation [6, 27, 68]. Together, these choices reflect a growing consensus that imbalance-aware metrics are essential for meaningful assessment of strain-level PHI prediction performance.

Metric choice alone, however, is insufficient to ensure valid evaluation. A more subtle but equally consequential source of inflated performance arises from how training and test splits are constructed. Random pairwise splitting of PHI matrices can place highly similar bacterial strains or closely related phages on both sides of the train-test boundary, particularly in datasets derived from single laboratories or narrowly sampled cohorts. Under such conditions, models may achieve high test-set performance by learning lineage-level or system-specific signatures rather than determinants that truly distinguish permissive from non-permissive strains.

Several studies have begun to address this risk explicitly. For example, PAML quantified genomic similarity between phages in its training and test sets using Dashing [124], reporting moderate overlap and concluding that the resulting split was appropriate for evaluation [69]. While such analyses represent an important step toward transparency, the absence of established similarity thresholds and the continuous nature of genomic relatedness complicate interpretation. Without explicit controls on phylogenetic or genomic similarity, strain-level evaluation risks collapsing into a softened form of species- or lineage-level prediction. This limitation applies across modelling paradigms, affecting feature-based, hybrid, and DL approaches alike.

The construction and treatment of negative examples introduce an additional layer of complexity. As discussed in Section 3.1, strain-level PHI datasets contain a mixture of verified negatives, untested pairs, and context-dependent outcomes. Different studies operationalize this ambiguity in different ways, ranging from conservative masking to aggressive negative inference or synthetic resampling. Analyses in DeepPBI-KG and PBIP illustrate that imbalance mitigation strategies can substantially reshape decision boundaries and metric values, not only for the proposed models but also for broader taxonomic baselines [27, 68]. Improved performance under such regimes may therefore reflect altered class geometry rather than enhanced biological resolution.

Together with split design and metric choice, these considerations suggest that evidence for strain-level predictive capability should depend on the generalization regime being claimed, not solely on evaluation within a strain-resolved matrix. At minimum, evaluation should use grouped splits that limit leakage of closely related strains (and, for global models, closely related phages), rather than random pairwise splits. Stress tests such as leave-one-host-lineage and (where applicable) leave-one-phage-lineage validation better probe robustness to novelty. Where comparable datasets exist, cross-dataset or cross-laboratory evaluation provides the strongest test of clinical transfer, but should be interpreted explicitly as domain shift, given assay and label heterogeneity.

Within this context, benchmarking strain-level models against broader taxonomic predictors can be informative, but only if interpreted cautiously. PBIP’s comparison against species- and genus-level models such as PredPHI and PHIAF illustrates this point: competitive performance under certain evaluation regimes indicates that higher-level models can recover predictive signal in strain-resolved datasets, but it does not imply resolution of strain-specific determinants [27]. Rather, such comparisons establish practical baselines under shared evaluation constraints and underscore the difficulty of disentangling true strain-level discrimination from implicit exploitation of phylogenetic structure or dataset composition.

Towards generalizable and clinically useful models

The preceding sections highlight a central tension in strain-level PHI prediction: models can achieve strong apparent performance within narrowly defined experimental settings [6, 27, 67–69], yet fail to generalize to the heterogeneous and uncertain conditions encountered in clinical practice. As discussed, this limitation is structural rather than architectural, arising from sparse outcome data, assay-dependent labels, and evaluation regimes that can overestimate true strain-level resolution. Generalizability in this context should therefore be understood not as universal predictive accuracy, but as the capacity to provide robust, uncertainty-aware, and interpretable decision support when confronted with novel strains, incomplete data, and shifting biological conditions.

This distinction is particularly salient for clinical phage therapy, where model outputs may directly influence experimental prioritization or treatment decisions. Predictive frameworks must extrapolate beyond their training distributions while remaining transparent and interrogatable to support user trust and regulatory expectations [125–129]. Furthermore, patient-derived bacterial isolates may differ substantially from reference strains, carrying novel resistance determinants, surface structures, or regulatory features absent from model training datasets [130–133]. Likewise, candidate therapeutic phages may exhibit infection strategies not captured in the initial assay design or belong to genera that are underrepresented in the original model training datasets [134]. Under such distributional shift, even models with strong internal validation may degrade sharply, making overconfident reliance on static in silico predictions a clinical risk.

As outlined in Section 3.2, improving generalizability will likely require closer engagement with biological and experimental complexity. Training data must offer not only scale, but relevance: binary outcomes derived from spot tests or single-timepoint assays capture only a limited slice of the infection process and often correlate poorly with therapeutic efficacy [52]. In clinical workflows, actionable PHI labels typically reflect consensus across multiple assays, including spot tests, time-kill curves, and growth inhibition across phage-to-bacteria ratios [47]. Where generating such datasets at scale is impractical, the feasibility of fully autonomous strain-level PHI prediction remains uncertain.

A pragmatic alternative is to embed PHI models within hybrid, feedback-driven workflows. In this design, in silico predictions guide initial phage prioritization, followed by rapid phenotypic screening on the target isolate, with empirical outcomes used to recalibrate or re-rank candidates. Such hybrid systems align naturally with clinical phage therapy practice, repositioning strain-level PHI models as adaptive decision-support tools rather than standalone arbiters of efficacy: a framing, i.e. more consistent with current data realities and biological uncertainty.

Key Points.

  • Strain-level PHI prediction is feasible under controlled conditions, particularly in well-characterized phage–host systems, but apparent performance is highly sensitive to dataset composition, evaluation regime, and experimental context, limiting generalizability across taxa, laboratories, and clinical settings.

  • The dominant constraint is structural rather than architectural: strain-level PHI datasets are sparse, imbalanced, and assay-dependent, and predictive performance is tightly coupled to outcome definition, label resolution, and negative handling rather than model complexity alone.

  • Infection outcomes collapse multistage, context-dependent biology, and increasing representational flexibility does not inherently resolve this limitation. Structure-aware descriptors and annotation-light gene–genome compatibility can enhance prediction in selected systems, but no single representation universally captures strain-level specificity.

  • Evaluation choices critically shape perceived strain-level performance. Conventional metrics and random test-train splits can overestimate resolution by exploiting phylogenetic structure, sampling density, or class imbalance.

  • Clinically useful strain-level PHI prediction is best framed as decision support, not automation. Hybrid, modular systems that combine mechanistic priors, learned representations, and iterative phenotypic feedback offer the most credible path towards interpretable, adaptable, and translationally relevant models.

Contributor Information

Chris J Malajczuk, Wal-yan Respiratory Research Centre, The Kids Research Institute  Australia, Western Australia, Perth, 6009, Australia; UWA Centre for Child Health Research, University of Western Australia, Western Australia, Perth, 6009, Australia.

Andrew Vaitekenas, Wal-yan Respiratory Research Centre, The Kids Research Institute  Australia, Western Australia, Perth, 6009, Australia; UWA Centre for Child Health Research, University of Western Australia, Western Australia, Perth, 6009, Australia.

Joshua J Iszatt, Wal-yan Respiratory Research Centre, The Kids Research Institute  Australia, Western Australia, Perth, 6009, Australia; UWA Centre for Child Health Research, University of Western Australia, Western Australia, Perth, 6009, Australia.

Stephen M Stick, Department of Respiratory and Sleep Medicine, Perth Children’s Hospital, Western Australia, Perth, 6009, Australia.

Anthony Kicic, Wal-yan Respiratory Research Centre, The Kids Research Institute  Australia, Western Australia, Perth, 6009, Australia; UWA Centre for Child Health Research, University of Western Australia, Western Australia, Perth, 6009, Australia; School of Population Health, Curtin University, Western Australia, Perth, 6102, Australia.

Yuliya V Karpievitch, Wal-yan Respiratory Research Centre, The Kids Research Institute  Australia, Western Australia, Perth, 6009, Australia; School of Population Health, Curtin University, Western Australia, Perth, 6102, Australia; School of Biomedical Sciences, University of Western Australia, Western Australia, Perth, 6009, Australia.

Author contributions

C.J.M. conceptualized the manuscript and prepared the full first draft, followed by all subsequent redrafts and refinements. A.V. and J.J.I. contributed to the conceptual development and were actively involved in reviewing and editing, with a particular focus on the Introduction section and Section 1. S.M.S. and A.K. provided supervisory support and contributed to shaping the manuscript through broader perspective and guidance. Y.V.K. provided hands-on supervisory oversight throughout the manuscript preparation process and offered substantial feedback during the review and submission stages. All authors reviewed and approved the final version of the manuscript.

Conflict of interest

The authors have declared that no competing interests exist.

Funding

This work was supported by Cystic Fibrosis Foundation of Western Australia (Australian Cystic Fibrosis Research Trust: CFWA Golf Classic Innovation Grant to C.J.M., WA Cystic Fibrosis Collaborative Program 2025 Fellowship to Y.V.K.); Cure4 Cystic Fibrosis Foundation (Holckner Family CF Impact Grant), and the Western Australian Future Health Research & Innovation Fund (FHIRF Innovation Challenge—GAIA) [IC2023-GAIA/21].

Data availability

No new data were generated or analysed in support of this research.

References

  • 1. Pirnay  J-P, Djebara  S, Steurs  G  et al. Personalized bacteriophage therapy outcomes for 100 consecutive cases: a multicentre, multinational, retrospective observational study. Nat Microbiol  2024;9:1434–53. 10.1038/s41564-024-01705-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. Czajkowski  R, Roca  A, Matilla  MA. Harnessing bacteriophages for sustainable crop protection in the face of climate change. Microb Biotechnol  2025;18:e70108. 10.1111/1751-7915.70108 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Toribio-Celestino  L, San Millan  A. Plasmid-bacteria associations in the clinical context. Trends Microbiol  2025;33:937–47. [DOI] [PubMed] [Google Scholar]
  • 4. Markwitz  P, Lood  C, Olszak  T  et al. Genome-driven elucidation of phage-host interplay and impact of phage resistance evolution on bacterial fitness. ISME J  2022;16:533–42. 10.1038/s41396-021-01096-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Bai  J, Raustad  N, Denoncourt  J  et al. Genome-wide phage susceptibility analysis in Acinetobacter baumannii reveals capsule modulation strategies that determine phage infectivity. PLoS Pathog  2023;19:e1010928. 10.1371/journal.ppat.1010928 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Gaborieau  B, Vaysset  H, Tesson  F  et al. Prediction of strain level phage–host interactions across the Escherichia genus using only genomic information. Nat Microbiol  2024;9:2847–61. 10.1038/s41564-024-01832-5 [DOI] [PubMed] [Google Scholar]
  • 7. Strathdee  SA, Hatfull  GF, Mutalik  VK  et al. Phage therapy: from biological mechanisms to future directions. Cell  2023;186:17–31. 10.1016/j.cell.2022.11.017 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Howell  AA, Versoza  CJ, Pfeifer  SP. Computational host range prediction—the good, the bad, and the ugly. Virus Evolution  2024;10:vead083. 10.1093/ve/vead083 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9. Edwards  RA, McNair  K, Faust  K  et al. Computational approaches to predict bacteriophage–host relationships. FEMS Microbiol Rev  2016;40:258–72. 10.1093/femsre/fuv048 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Zhang  R. Computational Methods for Prokaryotic (Meta) Genomic Annotation and Function Prediction Using Horizontal Information Transfer. Doctoral thesis, Georg-August-Universität Göttingen, 2024. 10.53846/goediss-10594 [DOI]
  • 11. Zielezinski  A, Barylski  J, Karlowski  WM. Taxonomy-aware, sequence similarity ranking reliably predicts phage–host relationships. BMC Biol  2021;19:1–14. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12. Krusche  J, Beck  C, Lehmann  E  et al. Characterization and host range prediction of Staphylococcus aureus phages through receptor-binding protein analysis. Cell Rep  2025;44:115369. 10.1016/j.celrep.2025.115369 [DOI] [PubMed] [Google Scholar]
  • 13. Boon M, Holtappels D, Lood C  et al.  Host range expansion of pseudomonas virus LUZ7 is driven by a conserved tail Fiber mutation. Phage (New Rochelle)  2020;1:87–90. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Yamashita  W, Chihara  K, Azam  AH  et al. Phage engineering to overcome bacterial Tmn immunity in Dhillonvirus. Commun Biol  2025;8:290. 10.1038/s42003-025-07730-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Nie  W, Qiu  T, Wei  Y  et al. Advances in phage–host interaction prediction: in silico method enhances the development of phage therapies. Brief Bioinform  2024;25:bbae117. 10.1093/bib/bbae117 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. Amgarten  D, Iha BKV, Piroupo CM et al. vHULK, a new tool for bacteriophage host prediction based on annotated genomic features and neural networks. Phage  2022;3:204–12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Coutinho  FH, Zaragoza-Solas  A, López-Pérez  M  et al. RaFAH: host prediction for viruses of bacteria and archaea based on protein content. Patterns  2021;2:100274. 10.1016/j.patter.2021.100274 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Li  M, Zhang  W. PHIAF: prediction of phage-host interactions with GAN-based data augmentation and sequence-based feature fusion. Brief Bioinform  2022;23:bbab348. 10.1093/bib/bbab348 [DOI] [PubMed] [Google Scholar]
  • 19. Li  M, Wang Y, Li F  et al. A deep learning-based method for identification of bacteriophage-host interaction. IEEE/ACM Trans Comput Biol Bioinform  2020;18:1801–10. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Qiu  J, Nie  W, Ding  H  et al. PB-LKS: a python package for predicting phage–bacteria interaction through local K-mer strategy. Brief Bioinform  2024;25:bbae010. 10.1093/bib/bbae010 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21. Ruohan  W, Xianglilan  Z, Jianping  W  et al. DeepHost: phage host prediction with convolutional neural network. Brief Bioinform  2022;23:bab385. 10.1093/bib/bbab385 [DOI] [PubMed] [Google Scholar]
  • 22. Xiao  Z, Sun H, Wei A  et al. A novel framework for predicting phage-host interactions via host specificity-aware graph autoencoder. IEEE J Biomed Health Inform  2024;29:3069–78. [DOI] [PubMed] [Google Scholar]
  • 23. Wei  A, Zhan  H, Xiao  Z  et al. A novel framework for phage-host prediction via logical probability theory and network sparsification. Brief Bioinform  2025;26:bbae708. 10.1093/bib/bbae708 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24. Liu  F, Zhao  Z, Liu  Y. PHPGAT: predicting phage hosts based on multimodal heterogeneous knowledge graph with graph attention network. Brief Bioinform  2025;26:bbaf017. 10.1093/bib/bbaf017 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25. Pan  J, Wang  R, Liu  W  et al. Predicting phage-host interaction via hyperbolic Poincaré graph embedding and large-scale protein language technique. iScience  2025;28:111647. 10.1016/j.isci.2024.111647 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. Gao  P, Xu L, Bai Y  et al. Phage host prediction using deep neural network with multi-source protein language models and squeeze-and-excitation attention mechanism. IEEE J Biomed Health Inform  2025;30:1759–71. [DOI] [PubMed] [Google Scholar]
  • 27. Ma  L, Gao  P, Liu  G  et al. PBIP: a deep learning framework for predicting phage–bacterium interactions at the strain level. Brief Bioinform  2025;26:bbaf656. 10.1093/bib/bbaf656 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Cuevas  JM, Duffy  S, Sanjuán  R. Point mutation rate of bacteriophage ΦX174. Genetics  2009;183:747–9. 10.1534/genetics.109.106005 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Kupczok  A, Neve H, Huang, KD  et al. Rates of mutation and recombination in Siphoviridae phage genome evolution over three decades. Mol Biol Evol  2018;35:1147–59. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Burke  KA, Urick  CD, Mzhavia  N  et al. Correlation of Pseudomonas aeruginosa phage resistance with the numbers and types of antiphage systems. Int J Mol Sci  2024;25:1424. 10.3390/ijms25031424 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Müller  DM, Pourtois JD, Kim MK et al. Bacterial receptors but not anti-phage defence mechanisms determine host range for a pair of Pseudomonas aeruginosa lytic Phages. bioRxiv  2024. 10.1101/2024.04.30.591980 [DOI] [Google Scholar]
  • 32. Wright  RC, Friman VP, Smith MC  et al. Resistance evolution against phage combinations depends on the timing and order of exposure. MBio  2019;10:10–1128. 10.1128/mbio.01652-19 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Burmeister  AR, Turner  PE. Trading-off and trading-up in the world of bacteria–phage evolution. Curr Biol  2020;30:R1120–4. 10.1016/j.cub.2020.07.036 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34. Srikant  S, Guegler  CK, Laub  MT. The evolution of a counter-defense mechanism in a virus constrains its host range. elife  2022;11:e79549. 10.7554/eLife.79549 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35. Subramanian  S, Dover  JA, Parent  KN  et al. Host range expansion of shigella phage Sf6 evolves through point mutations in the Tailspike. J Virol  2022;96:e00929–2. 10.1128/jvi.00929-22 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36. Yuan  Y, Peng  Q, Zhang  S  et al. Phage reduce stability for regaining infectivity during antagonistic coevolution with host bacterium. Viruses  2019;11:118. 10.3390/v11020118 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Broniewski  JM, Meaden  S, Paterson  S  et al. The effect of phage genetic diversity on bacterial resistance evolution. ISME J  2020;14:828–36. 10.1038/s41396-019-0577-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Górniak  M, Zalewska  A, Jurczak-Kurek  A. Recombination events in putative tail fibre gene in Litunavirus phages infecting Pseudomonas aeruginosa and their phylogenetic consequences. Viruses  2022;14:2669. 10.3390/v14122669 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39. Xiao  Y, Luo M, Dolan AE  et al. Structure basis for RNA-guided DNA degradation by Cascade and Cas3. Science  2018;361:eaat0839. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. de  Jonge  PA, Nobrega  FL, Brouns  SJJ  et al. Molecular and evolutionary determinants of bacteriophage host range. Trends Microbiol  2019;27:51–63. 10.1016/j.tim.2018.08.006 [DOI] [PubMed] [Google Scholar]
  • 41. Murtazalieva  K, Mu  A, Petrovskaya  A  et al. The growing repertoire of phage anti-defence systems. Trends Microbiol  2024;32:1212–28. 10.1016/j.tim.2024.05.005 [DOI] [PubMed] [Google Scholar]
  • 42. Pedersen  M, Neergaard  JT, Cassias  J  et al. Repression of the lysogenic P(R) promoter in bacteriophage TP901-1 through binding of a CI-MOR complex to a composite O(M)-O(R) operator. Sci Rep  2020;10:8659. 10.1038/s41598-020-65493-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Wahl  MC, Sen  R. Exploiting phage strategies to modulate bacterial transcription. Transcription  2019;10:222–30. 10.1080/21541264.2019.1684137 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44. Holtappels  D, Alfenas-Zerbini  P, Koskella  B. Drivers and consequences of bacteriophage host range. FEMS Microbiol Rev  2023;47:fuad038. 10.1093/femsre/fuad038 [DOI] [PubMed] [Google Scholar]
  • 45. Moura de Sousa  JA, Pfeifer  E, Touchon  M  et al. Causes and consequences of bacteriophage diversification via genetic exchanges across lifestyles and bacterial taxa. Mol Biol Evol  2021;38:2497–512. 10.1093/molbev/msab044 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46. Panteleev  V, Kulbachinskiy  A, Gelfenbein  D. Evaluating phage lytic activity: from plaque assays to single-cell technologies. Front Microbiol  2025;16:1659093. 10.3389/fmicb.2025.1659093 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47. Daubie  V, Chalhoub H, Blasdel B  et al. Determination of phage susceptibility as a clinical diagnostic tool: a routine perspective. Front Cell Infect Microbiol  2022;12:1000721. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48. Ellis  EL, Delbrück  M. The growth of bacteriophage. J Gen Physiol  1939;22:365–84. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49. Storms  ZJ, Teel  MR, Mercurio  K  et al. The virulence index: a metric for quantitative analysis of phage virulence. Phage  2020;1:27–36. 10.1089/phage.2019.0001 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50. Hosseini  N, Chehreghani  M, Moineau  S  et al. Centroid of the bacterial growth curves: a metric to assess phage efficiency. Communications Biology  2024;7:673. 10.1038/s42003-024-06379-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51. Glonti  T, Pirnay  J-P. In vitro techniques and measurements of phage characteristics that are important for phage therapy success. Viruses  2022;14:1490. 10.3390/v14071490 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52. Khan Mirzaei  M, Nilsson  AS. Isolation of phages for phage therapy: a comparison of spot tests and efficiency of plating analyses for determination of host range and efficacy. PLoS One  2015;10:e0118557. 10.1371/journal.pone.0118557 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53. Nour, El-Din HT, Kettle  M  et al. Isolation, characterization, and genomic analysis of bacteriophages against Pseudomonas aeruginosa clinical isolates from early and chronic cystic fibrosis patients for potential phage therapy. Microorganisms  2025;13:511. 10.3390/microorganisms13030511 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54. Xu  X, Gu  P. Overview of phage Defense Systems in Bacteria and Their Applications. Int J Mol Sci  2024;25:13316. 10.3390/ijms252413316 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55. Thanki  AM, Taylor-Joyce  G, Dowah  A  et al. Unravelling the links between phage adsorption and successful infection in Clostridium difficile. Viruses  2018;10:411. 10.3390/v10080411 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56. Storms  ZJ, Sauvageau  D. Modeling tailed bacteriophage adsorption: insight into mechanisms. Virology  2015;485:355–62. 10.1016/j.virol.2015.08.007 [DOI] [PubMed] [Google Scholar]
  • 57. Mäntynen  S, Laanto  E, Oksanen  HM  et al. Black box of phage–bacterium interactions: exploring alternative phage infection strategies. Open Biol  2021;11:210188. 10.1098/rsob.210188 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58. Hernández  S, Vives  MJ. Phages in anaerobic systems. Viruses  2020;12:10. 10.3390/v12101091 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59. Pham  VHT, Kim  J. Cultivation of unculturable soil bacteria. Trends Biotechnol  2012;30:475–84. [DOI] [PubMed] [Google Scholar]
  • 60. Zhou  F, Gan  R, Zhang  F  et al. PHISDetector: a tool to detect diverse In silico phage–host interaction signals for Virome studies. Genomics Proteomics Bioinformatics  2022;20:508–23. 10.1016/j.gpb.2022.02.003 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61. Lamy-Besnier  Q, Brancotte  B, Ménager  H  et al. Viral host range database, an online tool for recording, analyzing and disseminating virus-host interactions. Bioinformatics  2021;37:2798–801. 10.1093/bioinformatics/btab070 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62. Mihara  T, Nishimura  Y, Shimizu  Y  et al. Linking virus genomes with host taxonomy. Viruses  2016;8:66. 10.3390/v8030066 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63. Saito  M, Katayama  Y, Hishinuma  T  et al. Slow VISA; a novel phenotype of vancomycin resistance, found In vitro in heterogeneous vancomycin-intermediate Staphylococcus aureus strain Mu3. Antimicrob Agents Chemother  2014;58:5024–35. 10.1128/AAC.02470-13 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64. Geaman  W, Choi  BI, Kaindl  J  et al. Microbroth dilution method for antibiotic susceptibility testing of fastidious and anaerobic bacteria of the urinary microbiome. Microbiology Spectrum  2024;12:e00314–24. 10.1128/spectrum.00314-24 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65. Mobed  A, Baradaran B, de la Guardia, M  et al. Advances in detection of fastidious bacteria: from microscopic observation to molecular biosensors. TrAC Trends Anal Chem  2019;113:157–71. [Google Scholar]
  • 66. Ding  C, Pereira  T, Xiao  R  et al. Impact of label noise on the learning based models for a binary classification of physiological signal. Sensors (Basel)  2022;22:7166. 10.3390/s22197166 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67. Boeckaerts  D, Stock  M, Ferriol-González  C  et al. Prediction of klebsiella phage-host specificity at the strain level. Nat Commun  2024;15:4355. 10.1038/s41467-024-48675-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68. Wei  T, Lu  C, du  H  et al. DeepPBI-KG: a deep learning method for the prediction of phage-bacteria interactions based on key genes. Brief Bioinform  2024;25:bbae484. 10.1093/bib/bbae484 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69. Arkin  A, Noonan A, Moriniere  et al. Phylogeny-agnostic strain-level prediction of phage-host interactions from genomes. Research Square  2025. 10.21203/rs.3.rs-8176565/v1 [DOI]
  • 70. Bessonov  K, Laing  C, Robertson  J  et al. ECTyper: in silico Escherichia coli serotype and species prediction from raw and assembled whole-genome sequence data. Microbial genomics  2021;7:000728. 10.1099/mgen.0.000728 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71. Rendueles  O, Garcia-Garcerà  M, Néron  B  et al. Abundance and co-occurrence of extracellular capsules increase environmental breadth: implications for the emergence of pathogens. PLoS Pathog  2017;13:e1006525. 10.1371/journal.ppat.1006525 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72. Néron  B, Denise  R, Coluzzi  C  et al. MacSyFinder v2: improved modelling and search engine to identify molecular systems in genomes. Peer Community Journal  2023;3:e28. 10.24072/pcjournal.250 [DOI] [Google Scholar]
  • 73. Abby  SS, Néron  B, Ménager  H  et al. MacSyFinder: a program to mine genomes for molecular systems with an application to CRISPR-Cas systems. PLoS One  2014;9:e110726. 10.1371/journal.pone.0110726 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74. Tesson  F, Planel  R, Egorov  AA  et al. A comprehensive resource for exploring antiphage defense: DefenseFinder webservice, wiki and databases. Peer Community Journal  2024;4:4. 10.24072/pcjournal.470 [DOI] [Google Scholar]
  • 75. Strobel  HM, Labador  SD, Basu  D  et al. Viral receptor-binding protein evolves new function through mutations that cause trimer instability and functional heterogeneity. Mol Biol Evol  2024;41:msae056. 10.1093/molbev/msae056 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76. Petrie  KL, Palmer  ND, Johnson  DT  et al. Destabilizing mutations encode nongenetic variation that drives evolutionary innovation. Science  2018;359:1542–5. 10.1126/science.aar1954 [DOI] [PubMed] [Google Scholar]
  • 77. Strobel  HM, Horwitz  EK, Meyer  JR. Viral protein instability enhances host-range evolvability. PLoS Genet  2022;18:e1010030. 10.1371/journal.pgen.1010030 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78. Yang  Y, Dufault-Thompson  K, Yan  W  et al. Large-scale genomic survey with deep learning-based method reveals strain-level phage specificity determinants. GigaScience  2024;13:giae017. 10.1093/gigascience/giae017 [DOI] [PMC free article] [PubMed]
  • 79. Beamud  B, García-González  N, Gómez-Ortega  M  et al. Genetic determinants of host tropism in klebsiella phages. Cell Rep  2023;42:112048. 10.1016/j.celrep.2023.112048 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80. Squeglia  F, Maciejewska  B, Łątka  A  et al. Structural and functional studies of a klebsiella phage capsule depolymerase tailspike: mechanistic insights into capsular degradation. Structure  2020;28:613–624.e4. 10.1016/j.str.2020.04.015 [DOI] [PubMed] [Google Scholar]
  • 81. McNair  K, Zhou  C, Dinsdale  EA  et al. PHANOTATE: a novel approach to gene identification in phage genomes. Bioinformatics  2019;35:4537–42. 10.1093/bioinformatics/btz265 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82. Boeckaerts  D, Stock  M, de Baets  B  et al. Identification of phage receptor-binding protein sequences with hidden Markov models and an extreme gradient boosting classifier. Viruses  2022;14:1329. 10.3390/v14061329 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83. Lam  MM, Wick  RR, Judd  LM  et al. Kaptive 2.0: updated capsule and lipopolysaccharide locus typing for the Klebsiella pneumoniae species complex. Microbial. Genomics  2022;8:000800. 10.1099/mgen.0.000800 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84. Wyres  KL, Wick  RR, Gorrie  C  et al. Identification of klebsiella capsule synthesis loci from whole genome data. Microbial genomics  2016;2:e000102. 10.1099/mgen.0.000102 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85. Lin  Z, Akin H, Rao R  et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction. BioRxiv  2022;2022:500902. [Google Scholar]
  • 86. Chen  T, Guestrin  C. Xgboost: A scalable tree boosting system. In: Krishnapuram B, Shah M, Smola AJ, Aggarwal CC, Shen D, Rastogi R (eds). Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM; 2016:785–94. 10.1145/2939672.2939785 [DOI]
  • 87. Seemann  T. Prokka: rapid prokaryotic genome annotation. Bioinformatics  2014;30:2068–9. 10.1093/bioinformatics/btu153 [DOI] [PubMed] [Google Scholar]
  • 88. Steinegger  M, Söding  J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol  2017;35:1026–8. 10.1038/nbt.3988 [DOI] [PubMed] [Google Scholar]
  • 89. Prokhorenkova  L, Gusev G, Vorobev, A  et al. CatBoost: unbiased boosting with categorical features. Adv Neural Inf Proces Syst  2018;31. [Google Scholar]
  • 90. Lundberg  SM, Lee  S-I. A unified approach to interpreting model predictions. Adv Neural Inf Proces Syst  2017;30. [Google Scholar]
  • 91. Alley  EC, Khimulya  G, Biswas  S  et al. Unified rational protein engineering with sequence-based deep representation learning. Nat Methods  2019;16:1315–22. 10.1038/s41592-019-0598-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92. Chawla  NV, Bowyer, KW, Hall LO  et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res  2002;16:321–57. [Google Scholar]
  • 93. Buda  M, Maki  A, Mazurowski  MA. A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw  2018;106:249–59. 10.1016/j.neunet.2018.07.011 [DOI] [PubMed] [Google Scholar]
  • 94. Eloranta  S, Boman  M. Predictive models for clinical decision making: deep dives in practical machine learning. J Intern Med  2022;292:278–95. 10.1111/joim.13483 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 95. Bernstam  EV, Shireman  PK, Meric-Bernstam  F  et al. Artificial intelligence in clinical and translational science: successes, challenges and opportunities. Clin Transl Sci  2022;15:309–21. 10.1111/cts.13175 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96. Terranova  N, Venkatakrishnan  K, Benincosa  LJ. Application of machine learning in translational medicine: current status and future opportunities. AAPS J  2021;23:74. 10.1208/s12248-021-00593-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 97. Khan  AA, Chaudhari  O, Chandra  R. A review of ensemble learning and data augmentation models for class imbalanced problems: combination, implementation and evaluation. Expert Syst Appl  2024;244:122778. [Google Scholar]
  • 98. Wei  A, Xiao  Z, Fu  L  et al. Predicting phage–host interactions via feature augmentation and regional graph convolution. Brief Bioinform  2024;26:bbae672. 10.1093/bib/bbae672 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 99. Brixi  G, Durrant MG, Ku J  et al. Genome modeling and design across all domains of life with evo 2. BioRxiv. 2025. 10.1101/2025.02.18.638918 [DOI]
  • 100. Nguyen  E, Poli M, Durrant MG  et al. Sequence modeling and design from molecular to genome scale with evo. Science  2024;386:eado9336. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 101. King  SH, Driscoll CL, Li DB  et al. Generative design of novel bacteriophages with genome language models. bioRxiv. 2025. https://dio.10.1101/2025.09.12.675911
  • 102. Daugelavičius  R, Daujotaitė  G, Bamford  DH. Lysis physiology of Pseudomonas aeruginosa infected with ssRNA phage PRR1. Viruses  2024;16:645. 10.3390/v16040645 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 103. Pourcel  C, Midoux  C, Vergnaud  G  et al. The basis for natural multiresistance to phage in Pseudomonas aeruginosa. Antibiotics (Basel)  2020;9:339. 10.3390/antibiotics9060339 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 104. Fister  S, Robben C, Witte AK  et al. Influence of environmental factors on phage-bacteria interaction and on the efficacy and infectivity of phage P100. Front Microbiol  2016;7:1152. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 105. Igler  C. Phenotypic flux: the role of physiology in explaining the conundrum of bacterial persistence amid phage attack. Virus Evol  2022;8:veac086. 10.1093/ve/veac086 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 106. Egido  JE, Costa  AR, Aparicio-Maldonado  C  et al. Mechanisms and clinical importance of bacteriophage resistance. FEMS Microbiol Rev  2022;46:fuab048. 10.1093/femsre/fuab048 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 107. Leprince, A, J.  Mahillon, Phage adsorption to gram-positive bacteria.  Viruses, 2023. 15:196. 10.3390/v15010196. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 108. Bertozzi Silva  J, Storms  Z, Sauvageau  D. Host receptors for bacteriophage adsorption. FEMS Microbiol Lett  2016;363:fnw002. 10.1093/femsle/fnw002 [DOI] [PubMed] [Google Scholar]
  • 109. Dunne  M, Prokhorov  NS, Loessner  MJ  et al. Reprogramming bacteriophage host range: design principles and strategies for engineering receptor binding proteins. Curr Opin Biotechnol  2021;68:272–81. 10.1016/j.copbio.2021.02.006 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 110. Dunne  M, Hupfeld  M, Klumpp  J  et al. Molecular basis of bacterial host interactions by gram-positive targeting bacteriophages. Viruses  2018;10:397. 10.3390/v10080397 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 111. Hou  C, Liu D, Zafar A  et al. Understanding protein language model scaling on mutation effect prediction. bioRxiv. 2025. 10.1101/2025.04.25.650688 [DOI]
  • 112. Gonzales  MEM, Ureta  JC, Shrestha  AM. PHIStruct: improving phage–host interaction prediction at low sequence similarity settings using structure-aware protein embeddings. Bioinformatics  2025;41:btaf016. 10.1093/bioinformatics/btaf016 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 113. Jumper  J, Evans  R, Pritzel  A  et al. Highly accurate protein structure prediction with AlphaFold. Nature  2021;596:583–9. 10.1038/s41586-021-03819-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 114. Mirdita  M, Schütze  K, Moriwaki  Y  et al. ColabFold: making protein folding accessible to all. Nat Methods  2022;19:679–82. 10.1038/s41592-022-01488-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 115. Su  J, Han C, Zhou Y  et al. Saprot: protein language modeling with structure-aware vocabulary. bioRxiv. 2023. 10.1101/2023.10.01.560349 [DOI]
  • 116. De Smet  J, Hendrix H, Blasdel BG  et al. Pseudomonas predators: understanding and exploiting phage–host interactions. Nat Rev Microbiol  2017;15:517–30. 10.1038/nrmicro.2017.61 [DOI] [PubMed] [Google Scholar]
  • 117. Pelegrin  AC, Palmieri  M, Mirande  C  et al. Pseudomonas aeruginosa: a clinical and genomics update. FEMS Microbiol Rev  2021;45:fuab026. 10.1093/femsre/fuab026 [DOI] [PubMed] [Google Scholar]
  • 118. Jdeed  G, Kravchuk  B, Tikunova  NV. Factors affecting phage-bacteria coevolution dynamics. Viruses  2025;17:2. 10.3390/v17020235 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 119. Piel  D, Bruto M, Labreuche Y  et al. Phage–host coevolution in natural populations. Nat Microbiol  2022;7:1075–86. [DOI] [PubMed] [Google Scholar]
  • 120. Mavrich  TN, Hatfull  GF. Bacteriophage evolution differs by host, lifestyle and genome. Nat Microbiol  2017;2:1–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 121. Koskella  B, Parr  N. The evolution of bacterial resistance against bacteriophages in the horse chestnut phyllosphere is general across both space and time. Philos Trans R Soc Lond Ser B Biol Sci  2015;370:20140297. 10.1098/rstb.2014.0297 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 122. Koncz  M, Stirling  T, Hadj Mehdi  H  et al. Genomic surveillance as a scalable framework for precision phage therapy against antibiotic-resistant pathogens. Cell  2024;187:5901–5918.e28. 10.1016/j.cell.2024.09.009 [DOI] [PubMed] [Google Scholar]
  • 123. Hancock  JT, Khoshgoftaar  TM, Johnson  JM. Evaluating classifier performance with highly imbalanced big data. J Big Data  2023;10:42. 10.1186/s40537-023-00724-5 [DOI] [Google Scholar]
  • 124. Baker  DN, Langmead  B. Dashing: fast and accurate genomic distances with HyperLogLog. Genome Biol  2019;20:265. 10.1186/s13059-019-1875-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 125. Allen  B. The promise of explainable AI in digital health for precision medicine: a systematic review. Journal of Personalized Medicine  2024;14:277. 10.3390/jpm14030277 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 126. Rosenbacke  R, Melhus  Å, McKee  M  et al. How explainable artificial intelligence can increase or decrease clinicians’ trust in AI applications in health care: systematic review. Jmir Ai  2024;3:e53207. 10.2196/53207 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 127. Van Kolfschooten  H, Van Oirschot  J. The EU artificial intelligence act (2024): implications for healthcare. Health Policy  2024;149:105152. [DOI] [PubMed] [Google Scholar]
  • 128. Aboy  M, Minssen  T, Vayena  E. Navigating the EU AI act: implications for regulated digital medical products. NPJ Digital Medicine  2024;7:237. 10.1038/s41746-024-01232-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 129. Vardas  EP, Marketou  M, Vardas  PE. Medicine, healthcare and the AI act: gaps, challenges and future implications. European Heart Journal-Digital Health  2025;6:833–9. 10.1093/ehjdh/ztaf041 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 130. Rzhepishevska  O, Limanska  N, Galkin  M  et al. Characterization of clinically relevant model bacterial strains of Pseudomonas aeruginosa for anti-biofilm testing of materials. Acta Biomater  2018;76:99–107. 10.1016/j.actbio.2018.06.019 [DOI] [PubMed] [Google Scholar]
  • 131. Del Mar Cendra  M, Torrents  E. Differential adaptability between reference strains and clinical isolates of Pseudomonas aeruginosa into the lung epithelium intracellular lifestyle. Virulence  2020;11:862–76. 10.1080/21505594.2020.1787034 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 132. Fux  CA, Shirtliff  M, Stoodley  P  et al. Can laboratory reference strains mirror “real-world” pathogenesis?  Trends Microbiol  2005;13:58–63. [DOI] [PubMed] [Google Scholar]
  • 133. Handa  VL, Patel  BN, Bhattacharya  DA  et al. A study of antibiotic resistance pattern of clinical bacterial pathogens isolated from patients in a tertiary care hospital. Front Microbiol  2024;15–:1383989. 10.3389/fmicb.2024.1383989 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 134. Zrelovs  N, Dislers  A, Kazaks  A. Motley crew: overview of the currently available phage diversity. Front Microbiol  2020;11:579452. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Citations

  1. Yang  Y, Dufault-Thompson  K, Yan  W  et al. Large-scale genomic survey with deep learning-based method reveals strain-level phage specificity determinants. GigaScience  2024;13:giae017. 10.1093/gigascience/giae017 [DOI] [PMC free article] [PubMed]

Data Availability Statement

No new data were generated or analysed in support of this research.


Articles from Briefings in Bioinformatics are provided here courtesy of Oxford University Press

RESOURCES