Prophage Tracer: precisely tracing prophages in prokaryotic genomes using overlapping split-read alignment

Kaihao Tang; Weiquan Wang; Yamin Sun; Yiqing Zhou; Pengxia Wang; Yunxue Guo; Xiaoxue Wang

doi:10.1093/nar/gkab824

Nucleic Acids Res. 2021 Sep 22;49(22):e128. doi: 10.1093/nar/gkab824

Kaihao Tang ¹,², Weiquan Wang ¹,²,³, Yamin Sun ⁴, Yiqing Zhou ¹,²,³, Pengxia Wang ¹,²,³, Yunxue Guo ¹,²,³, Xiaoxue Wang ¹,²,³,✉

¹ Key Laboratory of Tropical Marine Bio-resources and Ecology, Guangdong Key Laboratory of Marine Materia Medica, Innovation Academy of South China Sea Ecology and Environmental Engineering, South China Sea Institute of Oceanology, Chinese Academy of Sciences, No. 1119, Haibin Road, Nansha District, Guangzhou 511458, China

² Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), No. 1119, Haibin Road, Nansha District, Guangzhou 511458, China

³ University of Chinese Academy of Sciences, Beijing, China

⁴ Research Center for Functional Genomics and Biochip, 23 Hongda St., Tianjin 300457, China

✉ To whom correspondence should be addressed. Tel: +86 20 8926 7515; Email: xxwang@scsio.ac.cn

PMCID: PMC8682789 PMID: 34551431

Abstract

The life cycle of temperate phages includes a lysogenic stage in which the phage integrates into the host genome and becomes a prophage. However, the identification of prophages that are highly divergent from known phages remains challenging. In this study, by taking advantage of the lysis-lysogeny switch of temperate phages, we designed Prophage Tracer, a tool for recognizing active prophages in prokaryotic genomes using short-read sequencing data, independent of phage gene similarity searching. Prophage Tracer uses the criterion of overlapping split-read alignment to recognize discriminative reads that contain bacterial (attB) and phage (attP) att sites representing prophage excision signals. Performance testing showed that Prophage Tracer could predict known prophages with precise boundaries, as well as novel prophages. Two novel prophages, one dsDNA and one ssDNA, encoding highly divergent major capsid proteins, were identified in coral-associated bacteria. Prophage Tracer is a reliable data mining tool for the identification of novel temperate phages and mobile genetic elements. The code for Prophage Tracer is publicly available at https://github.com/WangLab-SCSIO/Prophage_Tracer.

INTRODUCTION

Temperate phages can integrate into the bacterial chromosome to become prophages and enter lysogeny, maintaining a long-term association with their bacterial hosts. Lysogeny may be more prevalent than lytic cycles in bacteria-phage interactions and may become increasingly important in ecosystems with high microbial densities (1,2). The majority of commensal bacteria within the human and murine gut, as well as in coral microbiota (3–5), were found to be lysogens, and prophages can be spontaneously induced as active phages (4). Prophages may constitute up to 20% of a bacterium's genome (6) and serve as regulatory switches that regulate bacterial genes via genome excision (7,8). A novel family of non-tailed dsDNA viruses, Autolykiviridae, was identified recently and revealed a large number of previously unrecognized prophages in various bacterial taxa (9). Although the metagenomic analysis of geographically diverse samples contributes to the identification of new viruses (10,11), identifying novel prophages in prokaryotic genomes remains challenging.

Many tools have been developed to predict prophages using various strategies (12–18). Most of these methods, including Phage_Finder, PHASTER, VirSorter and Prophage Hunter, are mainly dependent on sequence similarity searching against a built-in validated dataset containing known phages to recognize phage-related gene enriched regions. However, phages are highly divergent and evolve rapidly. Sequence conservation among phage structural proteins, such as major capsid proteins (MCPs), decreases rapidly, even over short evolutionary distances (19,20), and therefore may not indicate readily detectable similarity with identified phages. In addition, known phages may represent only a small portion of phage diversity (10,11), and a previous analysis demonstrated that most identified prophages are derived from a small number of host phyla (21). Furthermore, auxiliary metabolic genes are prevalent in phages (11,22,23), which may also blur the boundaries between prophages and host genome sequences. Therefore, sequence-similarity-independent approaches are needed to identify novel temperate phages.

Compared to obligate lytic phages, the life cycle of temperate phages includes a lysis-lysogeny decision-making process. The lytic conversion of active prophages can affect individual cells, as well as entire communities, and is central to bacterial physiology, metabolism and evolution. Cryptic prophages, which are incapable of forming plaques, can also provide multiple benefits to the host for surviving adverse environmental conditions (24). We previously discovered that the cryptic prophage CP4So in Shewanella oneidensis excises specifically to increase the survival of the host at cold temperatures (25), and recently we further revealed that the excision of CP4So relies on temperature-dependent phosphorylation of the host H-NS (26). Indeed, the spontaneous induction of various prophages at low rates has been observed in various bacterial taxa (27–29). Moreover, stress conditions, such as UV and oxidative stress, and biofilm formation also trigger prophage induction and/or prophage excision (24,30,31). Conventional whole-genome sequencing or resequencing of microbes can generate millions of short or long DNA sequencing reads. Among these reads, a large number are not properly aligned when mapped to the reference genomes, which may be attributable to horizontal gene transfer, genome rearrangement, and the activities of mobile DNA elements (32). These improperly aligned reads, including split reads and discordant read pairs, are usually overlooked during the genome assembly process. However, they may provide extra information on prophage induction and/or excision. Therefore, we reasoned that the split reads generated from prophage induction and/or prophage excision may provide an important genetic resource to identify unknown prophages hidden in various microbial hosts.

Accordingly, we designed Prophage Tracer, a simple algorithm that uses overlapping split-read alignment to identify active and cryptic prophages hidden in DNA sequencing data. The basic logic of Prophage Tracer is that the direct-repeat attachment sites flanking a prophage (attL and attR) recombine to form the bacterial (attB) and phage (attP) att sites (att sites representing the common core sequences of attL/R/B/P), and reads containing attB or attP can generate overlapping split-read alignments. These discriminative signals can facilitate the prediction of prophages, requiring a minimum of only one split read. In this study, using simulated reads and DNA sequencing reads of a variety of bacterial species, we demonstrate that Prophage Tracer can predict highly diverse active prophages, both known and novel, with precise boundaries. This approach is independent of phage gene similarity searching. Taking advantage of DNA sequencing data, Prophage Tracer is a reliable data mining tool and is complementary to other current state-of-the-art tools for the study of prophages.

MATERIALS AND METHODS

Prophage Tracer workflow

For the chromosome-level assembled genome, split reads and discordant read pairs were extracted from the alignment in SAM (Sequence Alignment/Map) format generated by the Burrows-Wheeler Aligner (BWA-MEM algorithm) (33,34). Split reads are reads that cannot be represented as a single linear alignment; instead, they are split into two or more parts that align to different parts of the reference genome. First, split reads were preliminarily extracted according to CIGAR strings matching aSbM and FLAG values matching 145, 81, 99 or 163, or CIGAR strings matching aMbS and FLAG values matching 97, 161, 147 or 83. The integer values of a and b were allowed to range from 10 to 150 for paired-end reads (2 × 150 bp) generated by commonly used Illumina instruments. These reads were extracted for further BlastN (35) searching against the reference genome. If a read was split into two parts spanning R₁–R₂ and R₃–R₄ on the query read, these locations had to satisfy R₁ < R₃ < R₂ < R₄, and the two parts had to be aligned to two different regions of the reference genome by BlastN, ensuring an overlapping split-read alignment. Reads containing attB or attP can be differentiated by the FLAG values and the alignment locations on the reference genomes. The R₁ to R₄ locations represent the endpoints of attL and attR of prophage candidates. These filtered reads were subsequently clustered and summarized according to the R₁ to R₄ locations. Furthermore, discordant read pairs were extracted according to CIGAR strings matching dM (integer values of d > 130) and FLAG values matching 97, 145, 81 or 161, and were merged with the previously clustered split reads according to the values of the POS and MRNM fields in the SAM file and whether they spanned the R₁ to R₄ locations. The positions between discordant read pairs representing attB and attP were also considered in the clustering process. The positions of representative extracted discordant read pairs are shown in Supplementary Figure S1. Finally, prophage candidates were filtered according to att site length (default >2 bp), prophage size (default >5000 and <150 000 bp), and attB/attP event count (default both ≥1). The default parameters of att site length and prophage size were established according to previous studies (17,18).
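
The preliminary extraction step can be pictured with the short Python sketch below. It is only an illustration under stated assumptions, not the Prophage Tracer implementation: the function names and the toy SAM records are hypothetical, and only the CIGAR patterns (aSbM, aMbS, dM), the FLAG values and the numeric ranges are taken from the description above.

```python
import re

# CIGAR patterns named in the text: soft-clipped part (S) next to a matched part (M).
SOFTCLIP_MATCH = re.compile(r"^(\d+)S(\d+)M$")   # aSbM
MATCH_SOFTCLIP = re.compile(r"^(\d+)M(\d+)S$")   # aMbS
FULL_MATCH = re.compile(r"^(\d+)M$")             # dM

# FLAG values paired with each pattern for chromosome-level genomes.
SPLIT_FLAGS_SM = {145, 81, 99, 163}
SPLIT_FLAGS_MS = {97, 161, 147, 83}
DISCORDANT_FLAGS = {97, 145, 81, 161}


def is_candidate_split_read(flag, cigar, lo=10, hi=150):
    """True if FLAG/CIGAR match one of the two split-read patterns with a, b in [lo, hi]."""
    for pattern, flags in ((SOFTCLIP_MATCH, SPLIT_FLAGS_SM),
                           (MATCH_SOFTCLIP, SPLIT_FLAGS_MS)):
        m = pattern.match(cigar)
        if m and flag in flags:
            return all(lo <= int(x) <= hi for x in m.groups())
    return False


def is_candidate_discordant_read(flag, cigar, min_match=130):
    """True if the read is fully matched (dM, d > min_match) with a discordant FLAG."""
    m = FULL_MATCH.match(cigar)
    return bool(m) and flag in DISCORDANT_FLAGS and int(m.group(1)) > min_match


def classify_sam_record(line):
    """Classify one SAM alignment line (not a header line) as 'split', 'discordant' or None."""
    fields = line.rstrip("\n").split("\t")
    flag, cigar = int(fields[1]), fields[5]   # SAM columns 2 (FLAG) and 6 (CIGAR)
    if is_candidate_split_read(flag, cigar):
        return "split"
    if is_candidate_discordant_read(flag, cigar):
        return "discordant"
    return None


# Toy records (hypothetical read names, positions and sequences).
split_rec = "read1\t145\tchr\t785238\t60\t51S99M\t=\t797700\t0\tACGT\tFFFF"
disc_rec = "read2\t97\tchr\t100\t60\t150M\t=\t40000\t0\tACGT\tFFFF"
proper_rec = "read3\t99\tchr\t100\t60\t150M\t=\t400\t0\tACGT\tFFFF"
print(classify_sam_record(split_rec), classify_sam_record(disc_rec),
      classify_sam_record(proper_rec))
```

Reads that pass this pre-filter are only candidates; in the workflow described above they are still re-aligned with BlastN and kept only if the two parts overlap on the read.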

For contig-level assembled genomes, further steps were employed to extract split reads and discordant read pairs. Briefly, if an intact prophage was located in two separate contigs, then, in consideration of the four possible orientations, CIGAR strings matching aSbM and FLAG values matching 113 or 117, or CIGAR strings matching aMbS and FLAG values matching 65 or 129, were additionally used to extract split reads. CIGAR strings matching dM and FLAG values matching 177, 113, 129 or 65 were additionally used to extract discordant read pairs.
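
To make the numeric FLAG values easier to read, the following Python sketch decodes a FLAG into its bit names. The bit meanings follow the SAM specification; the helper name `decode_flag` and the short labels are our own choices for illustration.

```python
# Bit meanings from the SAM specification; labels are shortened for readability.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


for value in (99, 145, 65, 113):
    print(value, decode_flag(value))
```

Decoded this way, 65 and 129 describe pairs in which both mates map to the forward strand, and 113 and 177 describe pairs in which both map to the reverse strand, which is the kind of pair orientation that can arise when the two halves of a prophage lie on contigs assembled in different orientations.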

Comparison with LUMPY using simulated data

To simulate genomes containing prophages, we used a custom shell script available via the Prophage Tracer GitHub repository (https://github.com/WangLab-SCSIO/Prophage_Tracer). Genomes of ∼4 Mbp containing one prophage each were simulated. The length of the att site was randomly selected from 2 to 145 bp (with a 1–2 bp mismatch if att site > 2 bp) and the prophage size from 5000 to 150 000 bp. The GC content across genomes was allowed to be 20–80%. The corresponding bacterial host genomes with the prophage excised (containing attB) and circular prophage genomes (containing attP) were also generated. Paired reads of 2 × 150 bp at four different sequencing depths (10×, 20×, 50× and 100×) were generated using the sequencing read simulator GemSIM (36) in metagenomic mode, which was used to simulate four different ratios of the host genome, the host genome with the prophage excised, and the circular prophage genome (WT: attB: attP). A total of 320 sequencing read data points from 20 genomes were simulated, and this step was repeated three times. Simulated sequencing reads were aligned to reference genomes by the Burrows-Wheeler Aligner (BWA-MEM algorithm) (33,34), and duplicates were removed by sambamba (37). The outputs were then compared to evaluate the sensitivity of LUMPY and Prophage Tracer at various sequencing depths and att site lengths. The default parameters and data pre-processing steps used in the LUMPY procedure were the same as indicated on the LUMPY GitHub (https://github.com/hall-lab/lumpy-sv).
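
The three molecules that are mixed in each simulation (lysogen, host with the prophage excised, circular prophage) can be sketched in plain Python as follows. This is a toy illustration and not the authors' shell script: the function names and the small sizes are assumptions, the att repeats are identical (the 1–2 bp mismatches are omitted), and the prophage size is taken as the distance from the start of attL to the start of attR, which matches the sizes listed in Table 1.

```python
import random


def random_dna(length, rng, gc=0.5):
    """Random DNA string with the requested GC fraction."""
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # A, C, G, T
    return "".join(rng.choices("ACGT", weights=weights, k=length))


def simulate_lysogen(host_len, prophage_len, att_len, seed=0, gc=0.5):
    """Build a lysogen plus the two products of prophage excision.

    host_len     -- length of the prophage-free host (it contains one att copy, attB)
    prophage_len -- distance from attL start to attR start
    att_len      -- length of the direct repeat
    """
    rng = random.Random(seed)
    att = random_dna(att_len, rng, gc)
    site = rng.randrange(1, host_len - att_len)        # 0-based start of attL
    left = random_dna(site, rng, gc)
    right = random_dna(host_len - site - att_len, rng, gc)
    body = random_dna(prophage_len - att_len, rng, gc)
    return {
        "site": site,
        "att": att,
        "lysogen": left + att + body + att + right,    # attL ... attR
        "excised_host": left + att + right,            # single att copy = attB
        "circular_phage": att + body,                  # circle linearized at attP
    }


sim = simulate_lysogen(host_len=2000, prophage_len=500, att_len=12, seed=1)
print(len(sim["lysogen"]), len(sim["excised_host"]), len(sim["circular_phage"]))
```

In this picture, a read from `excised_host` that crosses the att copy matches the lysogen reference both at attL (left flank plus att) and at attR (att plus right flank), which is why the two partial alignments overlap on the read.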

Identification and characterization of prophages in coral-associated bacteria

The genomes of seven bacteria belonging to Alphaproteobacteria, Gammaproteobacteria, and Flavobacteriia were sequenced on the Illumina and PacBio platforms, and complete genomes were assembled and annotated by the NCBI Prokaryotic Genome Annotation Pipeline (38). Short-read data from Illumina were used to predict active prophages with Prophage Tracer, and genome sequences were analyzed using the LUMPY, PHASTER and Prophage Hunter web portals. The prophage excision and predicted attB and attP sites were confirmed by a PCR-based assay followed by sequencing using primers flanking each prophage (Supplementary Table S1). The prophage excision rate was evaluated by quantitative PCR (qPCR) as previously described (25). The relative amounts of the excised prophages were determined using the reference gene gyrB. qPCR was performed in technical triplicate for each biological replicate. Primer pairs are listed in Supplementary Table S1. Sequencing depths (i.e. coverage) across the genomes of the seven coral-associated bacteria were plotted using karyoploteR with a window size of 1000 bp (39).

Prediction of prophages in publicly available genomes

Prophage Tracer was tested using publicly available chromosome-level genomes that had their corresponding short-read sequencing data also deposited in NCBI (Supplementary Table S2). To evaluate the capability of Prophage Tracer on both chromosome-level and contig-level assemblies, these genomes were reassembled to the contig level from their short-read sequencing data alone by Shovill v1.1.0 with default parameters (T. Seemann, https://github.com/tseemann/shovill). Short-read sequencing data were pre-processed by Trimmomatic v0.39 (40) to remove low-quality (phred33) regions and adapters. Predicted prophages from the two levels of genome assembly were manually checked for the presence of phage structural genes or other phage-related genes annotated using the CDD database (41). Chromosome-level, contig-level and prophage genomes of each strain were aligned by QUAST v5.0.2 (42) to confirm the locations of contigs and prophages on the chromosomes.

Phylogenetic analysis

The major capsid protein sequence of each representative prophage was used as a query for PSI-BLAST (43) against the NR database, and sequences with e-value < 0.05 were collected. All recovered sequences were clustered at 70% identity using the CD-HIT suite (44). The filtered sequences were aligned by MAFFT (45) and further edited by trimAl (46). Each final data set was used for maximum likelihood (ML) phylogenetic analysis with W-IQ-TREE (47). The best-fit substitution model was automatically determined, and the reliability of internal branches was tested by 1000 ultrafast bootstrap replicates (48) in the W-IQ-TREE web interface. The tree was further annotated with the iTOL tool (49).

RESULTS

Overview of Prophage Tracer

Prophage Tracer employs a simple principle: prophage induction and/or excision can generate genetic structural variations, including circular prophage DNAs and/or large genomic deletions on the bacterial chromosome. This process leads to some sequencing reads being improperly aligned to the reference genome during genome assembly. These improperly aligned reads can be utilized to identify prophages and to locate prophage boundaries. This strategy does not rely on known phage sequences and has the potential to identify novel prophages.

The overall Prophage Tracer workflow is shown in Figure 1A. Prophage Tracer takes aligned reads in SAM format as input. First, split reads and discordant read pairs are preliminarily extracted according to their FLAG values and CIGAR strings (as defined by the SAM specification). As illustrated in Figure 1B, if a split read contains an attB or attP site, this site matches both attL and attR of the reference genome. Therefore, aligning the split read to the reference genome generates an overlapping region inside the split read, suggesting that this region contains a potential attB or attP site. The concept of overlapping split-read alignment is simple but critical for Prophage Tracer to precisely identify candidate prophages. Next, the overlapping alignments from the BlastN output are used to infer the precise positions of the attL and attR sites (Figure 1B and Supplementary Figure S2A). Candidate prophage boundaries are clustered by judging the proximity of all four attL and attR site positions, and discordant read pairs are merged according to the candidate prophage boundaries. Meanwhile, attB/attP events are counted for each candidate prophage. Finally, candidate prophages are filtered by attB/attP event count, prophage size and att site length.
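
A minimal Python sketch of how two overlapping partial alignments of one read could be turned into attL/attR coordinates is given below. It is an illustration under simplifying assumptions, not the tool's code: both alignments are taken to be on the plus strand, coordinates are 1-based and inclusive (as in BlastN tabular output), attB and attP reads are told apart only by the order of the two hits on the reference, and the names `Hsp`, `Junction`, `infer_junction` and `passes_default_filters` are hypothetical. The prophage size is computed as attR_start minus attL_start, which reproduces the sizes in Table 1.

```python
from typing import NamedTuple, Optional, Tuple


class Hsp(NamedTuple):
    """One local alignment of a read part: query (read) and subject (reference) ends."""
    q_start: int
    q_end: int
    s_start: int
    s_end: int


class Junction(NamedTuple):
    kind: str                 # "attB" or "attP"
    attL: Tuple[int, int]     # (start, end) on the reference
    attR: Tuple[int, int]


def infer_junction(first: Hsp, second: Hsp) -> Optional[Junction]:
    """Apply R1 < R3 < R2 < R4 and map the shared read segment to attL and attR."""
    r1, r2, r3, r4 = first.q_start, first.q_end, second.q_start, second.q_end
    if not (r1 < r3 < r2 < r4):
        return None                       # the two parts do not overlap on the read
    shared = r2 - r3 + 1                  # bases covered by both alignments
    if second.s_start > first.s_end:
        # Host junction: left flank + att maps at attL, att + right flank at attR.
        att_l = (first.s_end - shared + 1, first.s_end)
        att_r = (second.s_start, second.s_start + shared - 1)
        return Junction("attB", att_l, att_r)
    # Circular-phage junction: phage end + att maps at attR, att + phage start at attL.
    att_r = (first.s_end - shared + 1, first.s_end)
    att_l = (second.s_start, second.s_start + shared - 1)
    return Junction("attP", att_l, att_r)


def passes_default_filters(j: Junction, n_attB: int, n_attP: int) -> bool:
    """Defaults quoted in the Methods: att > 2 bp, 5000 < size < 150000, both counts >= 1."""
    att_len = j.attL[1] - j.attL[0] + 1
    size = j.attR[0] - j.attL[0]
    return att_len > 2 and 5000 < size < 150000 and n_attB >= 1 and n_attP >= 1


# Toy 150-bp reads constructed around the Pf4 coordinates listed in Table 1.
attb = infer_junction(Hsp(1, 99, 785238, 785336), Hsp(51, 150, 797699, 797798))
attp = infer_junction(Hsp(1, 99, 797649, 797747), Hsp(51, 150, 785288, 785387))
print(attb)
print(attp)
print(passes_default_filters(attb, n_attB=1, n_attP=1))
```

Both toy reads recover the same pair of att intervals, which is what allows split reads of the two kinds to be clustered into one candidate prophage.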

This approach can eliminate the overwhelming number of false-positive split reads that other types of unknown structural variation generate when routine bacterial genome sequencing reads are mapped. It can be applied to chromosome- or contig-level genomes. For an intact prophage located in a complete genome or within one contig of a contig-level genome, Prophage Tracer can provide the precise positions of the att sites and the length of the prophage. For an intact prophage split across the termini of two contigs of a contig-level genome, Prophage Tracer can still provide the precise positions of the att sites and the approximate length of the prophage, which might be useful for screening whether contigs are worth closing into complete genomes for the extraction of intact prophages. Furthermore, mobile genetic elements that rely on site-specific recombinases can also be detected by Prophage Tracer. The CPU and memory requirements of Prophage Tracer are low, and the runtime is ∼30–60 s per run. A typical output of Prophage Tracer contains the positions of attL and attR, evidence counts of attB and attP, and the overlapping split-read alignment, which enables further manual determination of the potential impact on genes disrupted at the integration sites.

Comparison with LUMPY using simulated data

Since Prophage Tracer employs a strategy based on the detection of split reads and discordant read pairs, we compared it with LUMPY, which employs a similar strategy (50). LUMPY is designed for the detection of structural variation and is primarily employed for human genome analysis, as well as for bacterial resequencing analysis (51,52). Overall, Prophage Tracer performed better than LUMPY on simulated data with low prophage excision rates and low sequencing depths (Figure 2). Prophage Tracer was able to detect prophage excision signals when the prophage excision rate (attB/WT) was ∼1/1000 (without replication, which was calculated from attP/attB) at a minimum sequencing depth of 50× (left panel of Figure 2A). At this excision rate, if the abundance of circular prophage DNA was 10 times higher, Prophage Tracer could detect prophage excision signals at sequencing depths as low as 10× (middle panels of Figure 2A). In comparison, LUMPY required a higher prophage excision rate and a higher abundance of circular prophages, and it performed as well as Prophage Tracer only when both the prophage excision rate (attB/WT ≥ 1%) and replication (attP/attB = 99) were high (right panel of Figure 2A).

Figure 2. Performance comparison of Prophage Tracer and LUMPY using simulated data. (A) Comparison of sensitivity for prophage detection. Sensitivity is defined as the average ratio of positive hits over three rounds of simulated data (each round with 20 genomes). The ratio of the host genome, host genome with prophage excised and circular prophage genome (WT: attB: attP) is shown at the top of each panel. (B) The average relative ratio of recovered split reads between LUMPY and Prophage Tracer. (C) The split reads recovered by Prophage Tracer and LUMPY from simulated data with att sites ranging from 2 to 160 bp. Expected split reads in the SAM file of the simulated data were extracted according to CIGAR strings of aMbS or aSbM (integer values of a and b from 1–149) mapping at the expected prophage positions. Detailed information on the simulated data is listed in Supplementary Table S3.

By manually checking the simulated data, we found that Prophage Tracer could detect more split reads than LUMPY at all four sequencing depths (Figure 2B). Further simulation analysis (using att site lengths of 2–160 bp) revealed that Prophage Tracer could extract prophage split reads with att site lengths ranging from 2 bp to 130 bp, while LUMPY could only extract split reads with att site lengths from 2 to 50 bp (Figure 2C and Supplementary Table S3). In addition, the ability of LUMPY to detect split reads was greatly reduced as att site length increased beyond 20 bp. We further checked the scripts of LUMPY and found that the algorithm used by LUMPY to recognize split reads relies on the prior assignment of the ‘SA’ or ‘XA’ tags by BWA-MEM in SAM files. According to BWA-MEM and the SAM format specification (33,53), an alignment of a read can be linear or chimeric. A chimeric alignment contains a set of linear alignments that do not have large overlaps. If a chimeric alignment contains two linear alignments spanning R₁–R₂ and R₃–R₄ on the query read (Supplementary Figure S3), the assignment of ‘SA’ or ‘XA’ tags to the alignment depends on the length of the overlap (R₂ – R₃) and the proportion of the overlap in each linear alignment [(R₂ – R₃)/(R₂ – R₁) and (R₂ – R₃)/(R₄ – R₃)]. This limits the ability of LUMPY to detect split reads containing att sites larger than 50 bp. Instead, in Prophage Tracer we employed BlastN to generate reliable overlapping alignments from the output of BWA-MEM, together with a custom algorithm to extract split reads. This strategy enabled Prophage Tracer to precisely detect prophage induction or excision signals as long as the discriminative split reads contain attB or attP, even with a low prophage excision rate and a low sequencing depth.
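
The overlap quantities named above can be computed with a few lines of Python. This sketch only evaluates the three expressions from the text; the function name is hypothetical, and no BWA-MEM thresholds are implied because the text does not state them.

```python
def overlap_metrics(r1, r2, r3, r4):
    """Return (R2 - R3, (R2 - R3)/(R2 - R1), (R2 - R3)/(R4 - R3)) for two read parts.

    The parts span R1-R2 and R3-R4 on the read and must satisfy R1 < R3 < R2 < R4.
    With 1-based inclusive coordinates the number of shared bases is R2 - R3 + 1;
    the expressions here follow the text as written.
    """
    if not (r1 < r3 < r2 < r4):
        raise ValueError("parts must overlap: R1 < R3 < R2 < R4")
    overlap = r2 - r3
    return overlap, overlap / (r2 - r1), overlap / (r4 - r3)


# Two hypothetical 150-bp reads: a 49-bp att site and a 130-bp att site.
short_att = overlap_metrics(1, 99, 51, 150)
long_att = overlap_metrics(1, 140, 11, 150)
print(short_att)
print(long_att)
```

In these two toy reads the overlap makes up roughly half of each partial alignment for the 49-bp att site and more than 90% for the 130-bp att site, which illustrates why long att sites produce the large overlaps discussed above.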

Validation of the Prophage Tracer workflow

To validate the capability of Prophage Tracer to predict prophages, publicly available whole-genome sequencing data of bacterial isolates with identified prophages were utilized. Active and cryptic prophages, including the Pf4 prophage in Pseudomonas aeruginosa PAO1, the CP4So and LambdaSo prophages in S. oneidensis MR-1, the rac prophage in Escherichia coli K-12 and Φ10403S in Listeria monocytogenes 10403S, were successfully detected by Prophage Tracer (Table 1 and Supplementary Table S4). Split reads representing the attP events of Pf4 were detected, which was consistent with the presence of replicative-form Pf4 molecules in the liquid culture of P. aeruginosa PAO1 (54). Using published E. coli K-12 resequencing data (55), the prophage rac was identified, and only a small number of split reads representing attP events were observed in various samples, suggesting that rac can be spontaneously induced at low rates, which was consistent with the results of our previous research (24,56). Using our resequencing data of S. oneidensis MR-1 cultured at 4°C, both the CP4So and LambdaSo prophages were predicted. In contrast, only the LambdaSo prophage was predicted at 30°C. This result was in agreement with the results of our previous study, which demonstrated that CP4So was induced only at low temperatures (25) and that LambdaSo had a relatively high excision rate (57). Furthermore, the impact of prophage excision on genes at the integration loci could be determined from the Prophage Tracer output. For example, the excision of CP4So was shown to cause the deletion of a U at the 3′-end of the tmRNA (SsrA), destroying a G·U wobble base pair (25) (Supplementary Figure S2B). The integration of prophage Φ10403S within comK in L. monocytogenes 10403S (8,58) was also predicted; this prophage has a 3-bp att site and encodes a serine-type recombinase. Overall, Prophage Tracer is able to precisely predict known active prophages.

Table 1.

Prediction of known prophages in four representative strains

Strains	Prophage	Contig	attL_start	attL_end	attR_start	attR_end	Size (bp)	Length of att site	References
Pseudomonas aeruginosa PAO1	Pf4	NC_002516.2	785288	785336	797699	797747	12411	49	(54)
Shewanella oneidensis MR-1	CP4So	NC_004347.2	1501853	1501946	1538064	1538157	36211	94	(25)
Shewanella oneidensis MR-1	LambdaSo	NC_004347.2	3074594	3074605	3126435	3126446	51841	12	(57)
Escherichia coli K-12	rac	NZ_CP009273.1	1406156	1406198	1429216	1429258	23060	43	(24,55,56)
Listeria monocytogenes 10403S	Φ10403S	NC_017544.1	2319845	2319847	2357456	2357458	37611	3	(8,58)


Comparison with PHASTER/Prophage Hunter/LUMPY to predict prophages

PHASTER (12) and Prophage Hunter (18) are designed for the detection of prophages in prokaryotic genomes using similarity searching. To evaluate the potential of Prophage Tracer to predict prophages, seven different bacterial strains isolated from the stony coral Galaxea fascicularis (11,59,60) were sequenced and analyzed by Prophage Tracer and these two methods. In total, nine candidate prophages were predicted by Prophage Tracer (Table 2). In comparison, LUMPY missed four of them because the number of split reads was too low or the length of att sites was too long to be detected by LUMPY, which was consistent with our tests on the simulated data above (Supplementary Table S5). In addition, among these nine candidate prophages, PHASTER also identified five of them and Prophage Hunter identified eight of them to some degree (Supplementary Table S5). The annotation of these five prophages demonstrated intact phage structural and regular proteins, such as capsid, head, tail, terminase, portal and integrase (Supplementary Table S6). Next, we checked the boundaries and attachment sites of these prophages using PCR primers to specifically amplify the region containing the attB or attP region, and we subsequently sequenced these regions (Supplementary Figure S4). The boundaries and attachment sites of these prophages predicted by Prophage Tracer agreed well with the results of the PCR-based assay (Supplementary Table S7). In contrast, some prophage boundaries predicted by Prophage Hunter or PHASTER were not accurate (Table 2).

Table 2.

Comparison of outputs of the predicted active prophages by Prophage Tracer with PHASTER or Prophage Hunter in seven coral-associated bacterial strains^a

		Prophage Tracer
Strain name	Prophage	attL_start	attL_end	attR_start	attR_end	Size (bp)	PHASTER^b	Prophage Hunter^c
Erythrobacter aquimaris SCSIO 43205	Pea1	1888722	1888741	1936851	1936870	48129	Questionable (70): 1895962–1915899	Active (0.9): 1885416–1903092; Active (0.97): 1888722–1936870; Active (0.91): 1917844–1948048
Ruegeria conchae SCSIO 43209	Prc1	1373446	1373460	1379997	1380011	6551	–	Inactive (0.12): 1362384–1392851
Halomonas meridiana SCSIO 43005	Phm1	292609	292683	333351	333425	40742	Intact (150): 293153–331437	Inactive (0.14): 274307–310741; Active (0.9): 292613–333425
Halomonas meridiana SCSIO 43005	Phm2	1064123	1064145	1100156	1100178	36033	Intact (150): 1075299–1101737	Ambiguous (0.73): 1064268–1100128
Halomonas meridiana SCSIO 43005	Phm3	2090511	2090576	2139945	2140010	49434	Incomplete (20): 2090437–2116834	Active (0.93): 2077440–2104589; Active (0.97): 2090511–2140010; Ambiguous (0.76): 2124591–2138678
Vibrio nigripulchritudo SCSIO 43132 (contig1)	Pvn1	353280	353303	367745	367768	14465	–	Ambiguous (0.77): 350448–374105
Marixanthomonas ophiurae SCSIO 43207	Pmo1	2643352	2643371	2676198	2676217	32846	–	–
Mesoflavibacter sabulilitoris SCSIO 43206	Pms1	2668021	2668042	2679241	2679262	11220	–	Inactive (0.34): 2648530–2674443
Zunongwangia mangrovi SCSIO 43204	Pzm1	1472262	1472314	1512357	1512409	40095	Incomplete (30): 1486303–1511274	Ambiguous (0.72): 1461300–1484415; Active (0.92): 1469187–1485662; Active (0.95): 1487346–1517819; Inactive (0.26): 1510309–1532089


^a Full outputs of these three tools and LUMPY are shown in Supplementary Table S6.

^b Outputs of prophage regions predicted by PHASTER (the scores are in parentheses and the predicted ends are shown). ‘–’ indicates ‘not detected’.

^c Outputs of prophage regions predicted by Prophage Hunter (the scores are in parentheses and the predicted ends are shown). ‘–’ indicates ‘not detected’.

Among the five prophages, Phm3 is integrated into the tRNA-Leu gene of Halomonas meridiana SCSIO 43005 (Supplementary Table S7). Further analysis showed that this prophage was similar to a metagenome-assembled prokaryotic dsDNA virus (MK892487.1) (Supplementary Figure S5 and Supplementary Figure S6) from the virome obtained during the Tara Oceans and Malaspina research expeditions (61,62), indicating that this dsDNA virus is a temperate phage. The MCP of Phm3 showed ∼30% sequence identity with the MCPs of characterized Myoviridae viruses.

Capability to predict novel prophages

Of the nine prophages predicted by Prophage Tracer, three may represent novel temperate phages (Table 2). The annotation of the potential capsid proteins of these prophages showed only remote homologs in other viruses (Supplementary Table S6). In particular, these three prophages were not detected as prophages (intact, incomplete or questionable) by PHASTER. Prophage Hunter generated up to 106 ambiguous or inactive candidate prophage regions for the seven strains tested, and some of these regions partially overlapped with the three novel prophages predicted by Prophage Tracer. However, these hits either had low scores or were far away (> 10 kb) from the ones predicted by Prophage Tracer (Table 2 and Supplementary Table S5).

Prophage Prc1 in Ruegeria conchae SCSIO 43209 has a 6 551-bp circular genome with nine predicted genes and belongs to the Microviridae family according to its genome content and the phylogenetic analysis of MCPs (Figure 3A, D). Closely related homologs of the Prc1 MCP were found in other Alphaproteobacteria and in metagenome-assembled Microviridae spp. They fell into a separate clade distinct from the two Microviridae subfamilies (Gokushovirinae and Bullavirinae) and from the recently identified Ruegeria phage vB_RpoMi-Mini (63) and Citromicrobium phage vB_Cib_ssDNA_P1 (64). Temperate Microviridae phages are prevalent in the human gut and have been found integrated in the genomes of Firmicutes, Bacteroidetes, and Proteobacteria (65). Similarly, Microviridae sequences were dominant in coral virome communities (66), and their abundance increased in stressed/bleached corals (67,68). These results suggest that temperate Microviridae phages in coral are more diverse than previously thought.

Figure 3. — Gene maps and phylogenetic analysis of major capsid proteins of representative prophages. Gene maps of Prc1 (A), Pvn1 (B) and Pms1 (C). The orientation of the circular genomes was adjusted so that the major capsid protein genes are aligned. All the genomes are on the same scale as indicated. Genes are represented by block arrows and are colored according to gene function. Homologs of hypothetical proteins in (B) are indicated in black. Unrooted maximum likelihood trees of MCP homologs of Prc1 (D), Pvn1 (E) and Pms1 (F). MCPs from isolated or uncultured viruses are highlighted in the trees, and MCPs from prophages are indicated as branches. Branch lengths are proportional to the number of amino acid substitutions.

In addition, prophage Pvn1 in Vibrio nigripulchritudo SCSIO 43132 has a 14 465-bp circular genome with 27 predicted genes, is integrated within the tRNA-dihydrouridine synthase A (dusA) gene and encodes a double jelly roll (DJR) MCP. The genome organization of Pvn1 is similar to that of Pseudoalteromonas phage PM2 (69) and the recently identified prophages in Vibrio species (9) (Figure 3B, E). Moreover, prophage Pms1 in Mesoflavibacter sabulilitoris SCSIO 43206 has an 11 220-bp circular genome with 18 predicted genes (Supplementary Table S6). BlastP searches revealed no sequence similarity of the phage structural genes to known viruses. The remote homology detection tool HHpred identified more phage-related genes in Pms1 (Figure 3C), especially INR78_12270, whose product showed no detectable amino acid sequence similarity but was structurally similar to the MCP of Flavobacterium phage FLiP (70), and INR78_12275, whose product showed 27% identity with the ssDNA replication protein of Cellulophaga phage phi48:2 (71). FLiP group phages are unusual lipid-containing ssDNA bacteriophages encoding a DJR MCP, a fold that is mainly found in dsDNA bacteriophages (71,72). One representative FLiP group phage was isolated from red snapper tissue samples (73). The MCPs found in the FLiP group primarily belong to marine Bacteroidetes, and the MCP of Pms1 was classified into a clade distinct from other known FLiP group phages in the phylogenetic tree (Figure 3F). These results indicate that prophage Pms1 may represent a novel temperate bacteriophage that is similar to FLiP. Additionally, the Pmo1 element in Marixanthomonas ophiurae SCSIO 43207 is integrated into a tRNA gene and encodes an integrase. This element contains genes encoding various transporters, virulence associated protein E, VirE and outer membrane protein TolC, and no phage structural genes were identified. This suggests that it may be another type of mobile genetic element.

The genomes of the seven coral-associated bacteria tested were complete-level genomes. To further evaluate the capability of Prophage Tracer to detect prophages in contig-level genomes, contig-level genomes of the same seven strains were re-assembled from their corresponding short-read sequencing data. In a contig-level genome, an intact prophage may be integrated into an intact contig surrounded by host sequences, separated at the termini of two contigs, or assembled into its own contig. Of the above eight prophages and one mobile genetic element, six were in one intact contig and three (Pea1, Phm1 and Phm3) were at the termini of two or three contigs (Figure 4A, C). For Pea1, Phm1 and Phm3, Prophage Tracer detected almost identical split reads and discordant read pairs using either complete-level or contig-level genomes (Figure 4A and Supplementary Table S8). Furthermore, we tested Prophage Tracer using publicly available chromosome-level genomes for which the corresponding short-read sequencing data have also been deposited. A total of 81 candidate prophages or other mobile genetic elements with tyrosine-type or serine-type recombinases were predicted in 51 archaeal and bacterial genomes (Supplementary Table S2). Among them, 32 strains containing 48 prophage regions with high sequencing quality were chosen for assembling contig-level genomes. Using these contig-level genomes, Prophage Tracer predicted that 18 prophage regions were integrated into a contig and 15 were separated at the termini of two contigs. The remaining 15 prophage regions were not predicted in contig-level genomes, partly because they were assembled into their own separate contigs. These results indicate that our approach may be useful as a preliminary screening tool for prophages in contig-level genomes, to determine whether it is worth converting contigs to complete genomes in order to extract intact prophages for subsequent study.

Figure 4. — Prophage Tracer combined with qPCR to estimate the fold-change of prophage excision rate with or without mitomycin C. (A) Read counts in the outputs of Prophage Tracer of seven coral-associated bacterial strains with or without mitomycin C. SR, split read; DRP, discordant read pair. ‘–’ indicates ‘not detected’ or ‘unable to calculate’. The fold-change of the excision rate was calculated from read counts in the outputs of Prophage Tracer (if a zero is in the dividend, one is used instead of zero). Prophage Tracer outputs using contig-level genomes are shown at the bottom left. ‘::’ indicates a potential junction of two contigs. ‘ = contig’ indicates left junction and ‘contig = ’ indicates right junction. Full outputs including positions of *att* sites on each contig are shown in Supplementary Table S8. (B) Excision rates of Phm1, Phm2 and Phm3 prophages in SCSIO 43005 quantified by qPCR. Fold-changes are indicated for Phm1 and Phm3, and significant changes are marked with one asterisk for P < 0.05. (C) Alignments of prophages to contig-level genomes.

Prophage Tracer can predict prophages not only from the above sequencing data derived from pure cultures, but also from data derived from enriched mixed cultures. A recently discovered manganese-oxidizing bacterium, ‘Candidatus Manganitrophus noduliformans’, cannot be isolated in pure culture and can only be enriched in a mixed culture with other bacteria (74). Using the sequencing data of the mixed culture downloaded from NCBI, Prophage Tracer detected two potential prophage regions in ‘Candidatus Manganitrophus noduliformans’ with accurate boundaries (Supplementary Table S2). One region contains genes encoding typical phage structural proteins, suggesting that it is an active prophage. The other region does not contain phage genes but contains genes encoding conjugal elements, a transposase, and defense systems (i.e. retron (75) and type 3 BREX system (76)), suggesting that it is a defense island. Taken together, our results indicate that Prophage Tracer, which does not depend on a built-in database, is a reliable tool for predicting novel prophages and other mobile genetic elements.

Application and limits of Prophage Tracer to detect prophages

To further explore whether Prophage Tracer can be employed to detect prophage excision under stress conditions, H. meridiana SCSIO 43005 was treated with 0.2 μg/mL mitomycin C for 4 hours and subjected to genome resequencing analysis. As shown in Figure 4A, compared to the untreated control, the number of extracted split reads and discordant read pairs containing the att sites of Phm1 and Phm3, relative to the total sequencing reads, was much higher under the mitomycin C-induced condition. Our analysis of simulated data showed that the number of detected split reads of a prophage was highly correlated with the att site length at the same sequencing depth (Figure 2C); thus, Prophage Tracer is not appropriate for calculating the excision rate of each prophage. However, for one specific prophage, read counts in the Prophage Tracer output can be used to estimate the fold-change of the excision rate under different conditions, as shown in Figure 4A. By this measure, mitomycin C induced the excision of Phm1 and Phm3. Next, we performed qPCR to check the reliability of detecting changes in prophage excision using Prophage Tracer. Since Prophage Tracer can accurately predict the att sites of the prophages (Table 2), two pairs of qPCR primers were designed for each prophage to amplify the regions containing attB and attP (product size 200–300 bp; Supplementary Table S1), and used for quantifying prophage excision. Consistently, the qPCR results showed that the excision rates of Phm1 and Phm3 of SCSIO 43005 were greatly increased by mitomycin C (Figure 4B). The fold-changes of the excision rate quantified by qPCR were similar to those estimated from the read counts in the outputs of Prophage Tracer (Figure 4A, B). The remaining six strains were also treated with mitomycin C and resequenced, and the excision rate of the Pzm1 prophage of Z. mangrovi SCSIO 43204 was found to be significantly increased by the mitomycin C treatment (Figure 4A). Thus, Prophage Tracer can be applied to detect changes in prophage excision under various conditions. Furthermore, the att sites precisely predicted by Prophage Tracer can be used to design qPCR primers for subsequent quantification of the prophage excision rate under a given condition.
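
The fold-change estimate described above can be sketched in a few lines of Python. This is a minimal illustration rather than part of Prophage Tracer: the function name, its parameters and the example counts are hypothetical, and replacing a zero count by one in either condition is an assumption based on the substitution mentioned in the Figure 4 legend. Counts are normalized to the total number of sequencing reads in each sample, as in the comparison above.

```python
def excision_fold_change(treated_count, treated_total,
                         control_count, control_total):
    """Fold-change of att-containing read counts, normalized to total reads."""
    # Assumption: a zero count is replaced by one so the ratio stays defined,
    # mirroring the substitution described in the Figure 4 legend.
    treated = max(treated_count, 1)
    control = max(control_count, 1)
    return (treated / treated_total) / (control / control_total)


# Hypothetical counts for illustration only (not taken from the paper).
fc = excision_fold_change(treated_count=120, treated_total=4_000_000,
                          control_count=6, control_total=5_000_000)
print(f"estimated fold-change: {fc:.1f}")
```

Because the number of detected split reads depends on the att site length, such a ratio is only meaningful when the same prophage is compared across conditions, not when different prophages are compared with each other.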

Next, we investigated the power of Prophage Tracer to detect prophages with low excision rates and/or low replication rates at different sequencing depths. In our real sequencing data, Prophage Tracer detected Phm1 (excision rate (attB/gyrB) of 2.6 × 10⁻³; replication (attP/gyrB) of 1.4 × 10⁻²), Phm2 (excision rate of 0.27 × 10⁻³; replication of 0.81 × 10⁻³) and Phm3 (excision rate of 4.2 × 10⁻³; replication of 1.3 × 10⁻¹) at ∼290 × sequencing depth in the absence of mitomycin C (Figure 4). Prophage Tracer can also detect a prophage that is not excised but can replicate. As shown above, the Pf4 prophage was not excised in the liquid culture of P. aeruginosa PAO1, but Prophage Tracer detected the presence of the replicative form of Pf4 based on the split reads containing attP at ∼170 × sequencing depth (Supplementary Table S4). Based on our analysis, a sequencing depth of 100–1000× for a genome is recommended in order to detect prophages with low excision rates. In this range of sequencing depth, Prophage Tracer can detect hidden prophages with excision rates (attB/gyrB) >10⁻³ and/or replication (attP/gyrB) >10⁻³ in host genomes. Otherwise, more effort should be devoted to exploring the special conditions that can trigger prophage activation or excision so that the hidden prophages can be detected by Prophage Tracer.
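
A rough way to reason about the interplay between sequencing depth and excision or replication rate is a back-of-the-envelope sampling estimate. The sketch below is not the procedure used to derive the recommendation above. It assumes that the expected number of reads covering a junction is approximately depth × rate and that read sampling is Poisson-distributed. It ignores read length, the minimum overlap required for a split read, discordant read pairs, and the fact that either attB or attP signals may suffice. All function names are hypothetical.

```python
import math


def expected_junction_reads(depth, rate):
    """Expected number of reads covering a junction present at `rate` copies per genome."""
    return depth * rate


def prob_at_least_one(depth, rate):
    """Poisson probability of sampling at least one junction-covering read."""
    return 1.0 - math.exp(-expected_junction_reads(depth, rate))


# Print the estimate for a few depths and rates in the range discussed above.
for depth in (100, 290, 1000):
    for rate in (1e-3, 1e-2, 1e-1):
        p = prob_at_least_one(depth, rate)
        print(f"depth={depth:>4}x rate={rate:.0e} P(>=1 read)={p:.3f}")
```

Under these simplifying assumptions, the chance of observing a junction read rises with both depth and rate, which is consistent with the recommendation to sequence more deeply when low excision rates are expected.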

Finally, we explored whether Prophage Tracer missed any prophages with high excision rates in the seven coral-associated bacteria by analyzing the sequencing depth across the genomes. Briefly, a genomic region with unusually high sequencing depth indicates the possible presence of a prophage in that region. As shown in Supplementary Figure S7, the regions containing the three prophages with high excision or replication rates (Pea1, Phm3 and Pzm1) showed unusually high sequencing depths and were all predicted by Prophage Tracer. However, one additional genomic region in strain SCSIO 43204 also showed high sequencing depth but was missed by Prophage Tracer. Further analysis showed that this prophage encodes proteins similar to Gp1 (protease I), Gp29 (DUF935 family) and Gp36 (DUF1320 family) of Mu phages, suggesting that it is a Mu-like prophage capable of packaging host DNA with variable ends (Supplementary Table S6). Likewise, we used Prophage Tracer to reanalyze the phage DNA sequencing data of a published study in which three mitomycin C-induced prophages, BLi_Pp2, BLi_Pp3 and BLi_Pp6, were experimentally identified in Bacillus licheniformis DSM13 (77). Prophage Tracer detected BLi_Pp3 and BLi_Pp6 but not BLi_Pp2 (Supplementary Table S2), and that study showed that prophage BLi_Pp2 can randomly package DNA of the host genome (77). Notably, the sequencing depths of the other six prophages in the seven coral-associated bacteria were indistinguishable from those of the rest of the host genomes, yet they were still captured by Prophage Tracer (Supplementary Figure S7).
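
The depth-based check described above can be illustrated with a short Python sketch. It is only a hedged example and not the analysis behind Supplementary Figure S7. It assumes a list of per-base depths (for instance, as could be exported with `samtools depth -a`), and the window size and fold threshold are hypothetical parameters chosen for illustration.

```python
from statistics import median


def high_depth_windows(depths, window=1000, fold=2.0):
    """Return (start, end, mean_depth) for windows whose mean depth
    exceeds `fold` times the genome-wide median depth."""
    baseline = median(depths)
    hits = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        mean_depth = sum(chunk) / len(chunk)
        if mean_depth > fold * baseline:
            hits.append((start, start + len(chunk), mean_depth))
    return hits


# Toy per-base depth profile: 100x background with a 2-kb region at 400x.
toy = [100] * 5000 + [400] * 2000 + [100] * 5000
print(high_depth_windows(toy))
# → [(5000, 6000, 400.0), (6000, 7000, 400.0)]
```

A screen of this kind only flags elements that excise or replicate at high rates. As noted above, prophages whose depth is indistinguishable from that of the host genome would not be flagged and instead require junction-based detection.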

Here, we showed that the power of Prophage Tracer to detect a prophage is limited by the nature of the prophage: detection fails when the prophage has either a very low excision rate or variable ends. Collectively, at a sequencing depth of 100–1000 ×, Prophage Tracer can detect hidden prophages with precise boundaries if they excise with stable att sites at an excision rate higher than 10⁻³.

DISCUSSION

Prophage-host interactions are currently recognized as being often mutualistic, rather than purely parasitic (78). Prophages are an important component of bacterial genomes and play critical roles in bacterial adaptation and evolution (7). The identification of active prophages is of central importance to the study of phage-host interactions. Prophage Tracer was validated on simulated reads, where it outperformed LUMPY, and it was superior to PHASTER and Prophage Hunter in predicting novel and highly divergent prophages in coral-associated bacteria. Furthermore, the predicted prophage boundaries were accurate, and read counts in the output can be used to estimate the fold-change of the excision rate under different conditions for a given prophage. The impact of prophage excision on genes containing attB or attP can also be manually analyzed in the Prophage Tracer output of overlapping split-read alignment. The accurate detection of prophage boundaries is important because prophages are usually integrated within functional bacterial genes (e.g. tRNA and tmRNA genes), and integration or excision may inactivate or reactivate target genes, which may affect the adaptation of bacterial hosts to diverse environments (7,79). Recent advances in DNA sequencing technologies have yielded overwhelming quantities of publicly available data on bacterial and archaeal genomes and their corresponding raw sequence reads. Mining active prophages with accurate integration sites in these genomes may facilitate the study of phage ecology. Furthermore, we expect that the application of Prophage Tracer will lead to the discovery of prophages in bacterial or archaeal taxa that are slow-growing and hard to cultivate, such as SAR11 (80) and ‘Asgard’ archaea (81). Additional functionality of the tool includes the identification of other families of mobile genetic elements that rely on site-specific recombinases, such as phage-inducible chromosomal islands, gene transfer agents, and integrative elements (82).

Because of its underlying logic, Prophage Tracer has a few limitations. First, the tool cannot recognize prophages that do not excise or replicate during sample preparation for sequencing, or for which the sequencing depth is too low to capture even one read containing attB or attP. Second, Mu-like prophages excise with variable ends, and extrachromosomal/plasmid prophages do not generate new junctions during their life cycle, so neither can be detected by Prophage Tracer. Third, Prophage Tracer was designed for prophages with att sites shorter than the read length. For att sites longer than the read length, discordant read pairs can still be used to estimate the boundaries. Lastly, Prophage Tracer may miss some prophages in contig-level genomes that have higher excision and replication activities and are assembled into their own separate contigs. In this case, evaluating the sequencing depth of contigs may help to distinguish which contigs are prophages. Therefore, Prophage Tracer is complementary to other tools, such as PHASTER and Prophage Hunter, and a combined approach would enable a more accurate prediction of prophages. Additionally, the performance of Prophage Tracer on long-read sequencing data has not been determined. Third-generation sequencing with Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) methods generates long-read sequencing data, and these methods are now widely employed for genome sequencing. Further efforts to optimize performance on long-read sequencing data could expand the application of Prophage Tracer.

In theory, circularized prophage sequences resulting from prophage genome excision can also be recognized by Prophage Tracer in metagenomic sequencing data. Several tools have been developed for the identification of viral sequences in assembled metagenomic data. Seeker recognizes bacteriophage genomes through deep learning with Long Short-Term Memory (LSTM) neural network models (83). DeepVirFinder also utilizes deep learning to identify viral sequences (84). VirFinder employs k-mer frequency and machine learning to distinguish viral from bacterial contigs (85). Excised prophages could be an important component of the virome in various ecosystems (4,10). However, these tools cannot detect excised circular or linear prophage DNA unless the prophage is excised or replicates at a rate high enough for it to be assembled into a viral contig. Prophage Tracer may recognize rare prophage excision signals in the metagenome if the host genome can be assembled. In this case, Prophage Tracer could be complementary to other current state-of-the-art tools for the study of prophages in metagenomes.

DATA AVAILABILITY

Prophage Tracer is written as a shell script that uses the Unix awk utility and is publicly available (https://github.com/WangLab-SCSIO/Prophage_Tracer). Bacterial genomes and sequencing read data have been deposited under GenBank BioProject numbers PRJNA668462 and PRJNA682846.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

National Science Foundation of China [31625001, 91951203, 41706172, 31970037, 32070175]; National Key R&D Program of China [2018YFC1406500]; Guangdong Local Innovation Team Program [2019BT02Y262]; Key Special Project for Introduced Talents Team of Southern Marine Science and Engineering Guangdong laboratory (Guangzhou) [GML2019ZD0407]; Guangdong Major Project of Basic and Applied Basic Research [2019B030302004]. Funding for open access charge: National Science Foundation of China [31625001, 91951203, 41706172, 31970037, 32070175]; National Key R&D Program of China [2018YFC1406500]; Guangdong Local Innovation Team Program [2019BT02Y262]; Key Special Project for Introduced Talents Team of Southern Marine Science and Engineering Guangdong laboratory (Guangzhou) [GML2019ZD0407]; Guangdong Major Project of Basic and Applied Basic Research [2019B030302004].

Conflict of interest statement. None declared.

REFERENCES

1. Silveira C.B., Rohwer F.L. Piggyback-the-Winner in host-associated microbial communities. NPJ Biofilms Microbiomes. 2016; 2:16010.
2. Knowles B., Silveira C.B., Bailey B.A., Barott K., Cantu V.A., Cobian-Guemes A.G., Coutinho F.H., Dinsdale E.A., Felts B., Furby K.A. et al. Lytic to temperate switching of viral communities. Nature. 2016; 531:466–470.
3. Minot S., Bryson A., Chehoud C., Wu G.D., Lewis J.D., Bushman F.D. Rapid evolution of the human gut virome. Proc. Natl. Acad. Sci. U.S.A. 2013; 110:12450–12455.
4. Kim M.S., Bae J.W. Lysogeny is prevalent and widely distributed in the murine gut microbiota. ISME J. 2018; 12:1127–1141.
5. Bouvy M., Combe M., Bettarel Y., Dupuy C., Rochelle-Newall E., Charpy L. Uncoupled viral and bacterial distributions in coral reef waters of Tuamotu Archipelago (French Polynesia). Mar. Pollut. Bull. 2012; 65:506–515.
6. Casjens S. Prophages and bacterial genomics: what have we learned so far. Mol. Microbiol. 2003; 49:277–300.
7. Feiner R., Argov T., Rabinovich L., Sigal N., Borovok I., Herskovits A.A. A new perspective on lysogeny: prophages as active regulatory switches of bacteria. Nat. Rev. Microbiol. 2015; 13:641–650.
8. Rabinovich L., Sigal N., Borovok I., Nir-Paz R., Herskovits A.A. Prophage excision activates Listeria competence genes that promote phagosomal escape and virulence. Cell. 2012; 150:792–802.
9. Kauffman K.M., Hussain F.A., Yang J., Arevalo P., Brown J.M., Chang W.K., VanInsberghe D., Elsherbini J., Sharma R.S., Cutler M.B. et al. A major lineage of non-tailed dsDNA viruses as unrecognized killers of marine bacteria. Nature. 2018; 554:118–122.
10. Paez-Espino D., Eloe-Fadrosh E.A., Pavlopoulos G.A., Thomas A.D., Huntemann M., Mikhailova N., Rubin E., Ivanova N.N., Kyrpides N.C. Uncovering Earth's virome. Nature. 2016; 536:425–430.
11. Roux S., Brum J.R., Dutilh B.E., Sunagawa S., Duhaime M.B., Loy A., Poulos B.T., Solonenko N., Lara E., Poulain J. et al. Ecogenomics and potential biogeochemical impacts of globally abundant ocean viruses. Nature. 2016; 537:689–693.
12. Arndt D., Grant J.R., Marcu A., Sajed T., Pon A., Liang Y., Wishart D.S. PHASTER: a better, faster version of the PHAST phage search tool. Nucleic Acids Res. 2016; 44:W16–W21.
13. Zhou Y., Liang Y., Lynch K.H., Dennis J.J., Wishart D.S. PHAST: a fast phage search tool. Nucleic Acids Res. 2011; 39:W347–W352.
14. Akhter S., Aziz R.K., Edwards R.A. PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucleic Acids Res. 2012; 40:e126.
15. Fouts D.E. Phage_Finder: automated identification and classification of prophage regions in complete bacterial genome sequences. Nucleic Acids Res. 2006; 34:5839–5851.
16. Lima-Mendez G., Van Helden J., Toussaint A., Leplae R. Prophinder: a computational tool for prophage prediction in prokaryotic genomes. Bioinformatics. 2008; 24:863–865.
17. Roux S., Enault F., Hurwitz B.L., Sullivan M.B. VirSorter: mining viral signal from microbial genomic data. PeerJ. 2015; 3:e985.
18. Song W., Sun H.X., Zhang C., Cheng L., Peng Y., Deng Z., Wang D., Wang Y., Hu M., Liu W. et al. Prophage Hunter: an integrative hunting tool for active prophages. Nucleic Acids Res. 2019; 47:W74–W80.
19. Krupovic M., Dolja V.V., Koonin E.V. Origin of viruses: primordial replicators recruiting capsids from hosts. Nat. Rev. Microbiol. 2019; 17:449–458.
20. Abrescia N.G., Bamford D.H., Grimes J.M., Stuart D.I. Structure unifies the viral universe. Annu. Rev. Biochem. 2012; 81:795–822.
21. Roux S., Hallam S.J., Woyke T., Sullivan M.B. Viral dark matter and virus-host interactions resolved from publicly available microbial genomes. Elife. 2015; 4:e08490.
22. Thompson L.R., Zeng Q., Kelly L., Huang K.H., Singer A.U., Stubbe J., Chisholm S.W. Phage auxiliary metabolic genes and the redirection of cyanobacterial host carbon metabolism. Proc. Natl. Acad. Sci. U.S.A. 2011; 108:E757–E764.
23. Hurwitz B.L., Hallam S.J., Sullivan M.B. Metabolic reprogramming by viruses in the sunlit and dark ocean. Genome Biol. 2013; 14:R123.
24. Wang X., Kim Y., Ma Q., Hong S.H., Pokusaeva K., Sturino J.M., Wood T.K. Cryptic prophages help bacteria cope with adverse environments. Nat. Commun. 2010; 1:147.
25. Zeng Z., Liu X., Yao J., Guo Y., Li B., Li Y., Jiao N., Wang X. Cold adaptation regulated by cryptic prophage excision in Shewanella oneidensis. ISME J. 2016; 10:2787–2800.
26. Liu X., Lin S., Liu T., Zhou Y., Wang W., Yao J., Guo Y., Tang K., Chen R., Benedik M.J. et al. Xenogeneic silencing relies on temperature-dependent phosphorylation of the host H-NS protein in Shewanella. Nucleic Acids Res. 2021; 49:3427–3440.
27. Alexeeva S., Guerra Martinez J.A., Spus M., Smid E.J. Spontaneously induced prophages are abundant in a naturally evolved bacterial starter culture and deliver competitive advantage to the host. BMC Microbiol. 2018; 18:120.
28. Nanda A.M., Thormann K., Frunzke J. Impact of spontaneous prophage induction on the fitness of bacterial populations and host-microbe interactions. J. Bacteriol. 2015; 197:410–419.
29. Liu X., Tang K., Zhang D., Li Y., Liu Z., Yao J., Wood T.K., Wang X. Symbiosis of a P2-family phage and deep-sea Shewanella putrefaciens. Environ. Microbiol. 2019; 21:4212–4232.
30. Wang X., Kim Y., Wood T.K. Control and benefits of CP4-57 prophage excision in Escherichia coli biofilms. ISME J. 2009; 3:1164–1179.
31. Secor P.R., Sweere J.M., Michaels L.A., Malkovskiy A.V., Lazzareschi D., Katznelson E., Rajadas J., Birnbaum M.E., Arrigoni A., Braun K.R. et al. Filamentous bacteriophage promote biofilm assembly and function. Cell Host Microbe. 2015; 18:549–559.
32. Darmon E., Leach D.R. Bacterial genome instability. Microbiol. Mol. Biol. Rev. 2014; 78:1–39.
33. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. 2013; arXiv doi: https://arxiv.org/abs/1303.3997v2, 26 May 2013, preprint: not peer reviewed.
34. Li H., Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009; 25:1754–1760.
35. Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. Basic local alignment search tool. J. Mol. Biol. 1990; 215:403–410.
36. McElroy K.E., Luciani F., Thomas T. GemSIM: general, error-model based simulator of next-generation sequencing data. BMC Genomics. 2012; 13:74.
37. Tarasov A., Vilella A.J., Cuppen E., Nijman I.J., Prins P. Sambamba: fast processing of NGS alignment formats. Bioinformatics. 2015; 31:2032–2034.
38. Tatusova T., DiCuccio M., Badretdin A., Chetvernin V., Nawrocki E.P., Zaslavsky L., Lomsadze A., Pruitt K.D., Borodovsky M., Ostell J. NCBI prokaryotic genome annotation pipeline. Nucleic Acids Res. 2016; 44:6614–6624.
39. Gel B., Serra E. karyoploteR: an R/Bioconductor package to plot customizable genomes displaying arbitrary data. Bioinformatics. 2017; 33:3088–3090.
40. Bolger A.M., Lohse M., Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014; 30:2114–2120.
41. Marchler-Bauer A., Bo Y., Han L., He J., Lanczycki C.J., Lu S., Chitsaz F., Derbyshire M.K., Geer R.C., Gonzales N.R. et al. CDD/SPARCLE: functional classification of proteins via subfamily domain architectures. Nucleic Acids Res. 2017; 45:D200–D203.
42. Gurevich A., Saveliev V., Vyahhi N., Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013; 29:1072–1075.
43. Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., Lipman D.J. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997; 25:3389–3402.
44. Huang Y., Niu B., Gao Y., Fu L., Li W. CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics. 2010; 26:680–682.
45. Katoh K., Kuma K., Toh H., Miyata T. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 2005; 33:511–518.
46. Capella-Gutierrez S., Silla-Martinez J.M., Gabaldon T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009; 25:1972–1973.
47. Trifinopoulos J., Nguyen L.T., von Haeseler A., Minh B.Q. W-IQ-TREE: a fast online phylogenetic tool for maximum likelihood analysis. Nucleic Acids Res. 2016; 44:W232–W235.
48. Minh B.Q., Nguyen M.A., von Haeseler A. Ultrafast approximation for phylogenetic bootstrap. Mol. Biol. Evol. 2013; 30:1188–1195.
49. Letunic I., Bork P. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Res. 2016; 44:W242–W245.
50. Layer R.M., Chiang C., Quinlan A.R., Hall I.M. LUMPY: a probabilistic framework for structural variant discovery. Genome Biol. 2014; 15:R84.
51. Guerillot R., Kostoulias X., Donovan L., Li L., Carter G.P., Hachani A., Vandelannoote K., Giulieri S., Monk I.R., Kunimoto M. et al. Unstable chromosome rearrangements in Staphylococcus aureus cause phenotype switching associated with persistent infections. Proc. Natl. Acad. Sci. U.S.A. 2019; 116:20135–20140.
52. Massonnet M., Morales-Cruz A., Minio A., Figueroa-Balderas R., Lawrence D.P., Travadon R., Rolshausen P.E., Baumgartner K., Cantu D. Whole-genome resequencing and pan-transcriptome reconstruction highlight the impact of genomic structural variation on secondary metabolite gene clusters in the grapevine esca pathogen Phaeoacremonium minimum. Front. Microbiol. 2018; 9:1784. [DOI] [PMC free article] [PubMed] [Google Scholar]
53. Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R.Genome Project Data Processing, S. . The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009; 25:2078–2079. [DOI] [PMC free article] [PubMed] [Google Scholar]
54. Li Y., Liu X., Tang K., Wang P., Zeng Z., Guo Y., Wang X.. Excisionase in Pf filamentous prophage controls lysis-lysogeny decision-making in Pseudomonas aeruginosa. Mol. Microbiol. 2019; 111:495–513. [DOI] [PMC free article] [PubMed] [Google Scholar]
55. Luo H., Hansen A.S.L., Yang L., Schneider K., Kristensen M., Christensen U., Christensen H.B., Du B., Özdemir E., Feist A.M.et al.. Coupling S-adenosylmethionine–dependent methylation to growth: Design and uses. PLoS Biol. 2019; 17:e2007050. [DOI] [PMC free article] [PubMed] [Google Scholar]
56. Liu X., Li Y., Guo Y., Zeng Z., Li B., Wood T.K., Cai X., Wang X.. Physiological function of rac prophage during biofilm formation and regulation of rac excision in Escherichia coli K-12. Sci. Rep. 2015; 5:16074. [DOI] [PMC free article] [PubMed] [Google Scholar]
57. Guo Q., Chen B., Tu Y., Du S., Chen X.. Prophage LambdaSo uses replication interference to suppress reproduction of coexisting temperate phage MuSo2 in Shewanella oneidensis MR-1. Environ. Microbiol. 2019; 21:2079–2094. [DOI] [PubMed] [Google Scholar]
58. Trudelle D.M., Bryan D.W., Hudson L.K., Denes T.G.. Cross-resistance to phage infection in Listeria monocytogenes serotype 1/2a mutants. Food Microbiol. 2019; 84:103239. [DOI] [PubMed] [Google Scholar]
59. Tang K.H., Zhan W.N., Zhou Y.Q., Xu T.Q., Chen X.Q., Wang W.Q., Zeng Z.S., Wang Y., Wang X.X.. Antagonism between coral pathogen Vibrio coralliilyticus and other bacteria in the gastric cavity of scleractinian coral Galaxea fascicularis. Sci. China Earth Sci. 2020; 63:157–166. [Google Scholar]
60. Zhou Y., Tang K., Wang P., Wang W., Wang Y., Wang X.. Identification of bacteria-derived urease in the coral gastric cavity. Sci. China Earth Sci. 2020; 63:1553–1563. [Google Scholar]
61. Duarte C.M. Seafaring in the 21st century: the Malaspina 2010 Circumnavigation Expedition. Limnol. Oceanogr. Bull. 2015; 24:11–14. [Google Scholar]
62. Karsenti E., Acinas S.G., Bork P., Bowler C., De Vargas C., Raes J., Sullivan M., Arendt D., Benzoni F., Claverie J.M.et al.. A holistic approach to marine eco-systems biology. PLoS Biol. 2011; 9:e1001177. [DOI] [PMC free article] [PubMed] [Google Scholar]
63. Zhan Y., Chen F.. The smallest ssDNA phage infecting a marine bacterium. Environ. Microbiol. 2019; 21:1916–1928. [DOI] [PubMed] [Google Scholar]
64. Zheng Q., Chen Q., Xu Y., Suttle C.A., Jiao N.. A virus infecting marine photoheterotrophic Alphaproteobacteria (Citromicrobium spp.) defines a new lineage of ssDNA viruses. Front. Microbiol. 2018; 9:1418. [DOI] [PMC free article] [PubMed] [Google Scholar]
65. Fujimoto K., Kimura Y., Shimohigoshi M., Satoh T., Sato S., Tremmel G., Uematsu M., Kawaguchi Y., Usui Y., Nakano Y.et al.. Metagenome data on intestinal phage-bacteria associations aids the development of phage therapy against pathobionts. Cell Host Microbe. 2020; 28:380–389. [DOI] [PubMed] [Google Scholar]
66. Laffy P.W., Wood-Charlson E.M., Turaev D., Jutz S., Pascelli C., Botte E.S., Bell S.C., Peirce T.E., Weynberg K.D., van Oppen M.J.H.et al.. Reef invertebrate viromics: diversity, host specificity and functional capacity. Environ. Microbiol. 2018; 20:2125–2141. [DOI] [PubMed] [Google Scholar]
67. Littman R., Willis B.L., Bourne D.G.. Metagenomic analysis of the coral holobiont during a natural bleaching event on the Great Barrier Reef. Environ. Microbiol. Rep. 2011; 3:651–660. [DOI] [PubMed] [Google Scholar]
68. Soffer N., Brandt M.E., Correa A.M., Smith T.B., Thurber R.V.. Potential role of viruses in white plague coral disease. ISME J. 2014; 8:271–283. [DOI] [PMC free article] [PubMed] [Google Scholar]
69. Cota-Robles E., Espejo R.T., Haywood P.W.. Ultrastructure of bacterial cells infected with bacteriophage PM2, a lipid-containing bacterial virus. J. Virol. 1968; 2:56–68. [DOI] [PMC free article] [PubMed] [Google Scholar]
70. Laanto E., Mantynen S., De Colibus L., Marjakangas J., Gillum A., Stuart D.I., Ravantti J.J., Huiskonen J.T., Sundberg L.R.. Virus found in a boreal lake links ssDNA and dsDNA viruses. Proc. Natl. Acad. Sci. U.S.A. 2017; 114:8378–8383. [DOI] [PMC free article] [PubMed] [Google Scholar]
71. Holmfeldt K., Solonenko N., Shah M., Corrier K., Riemann L., Verberkmoes N.C., Sullivan M.B.. Twelve previously unknown phage genera are ubiquitous in global oceans. Proc. Natl. Acad. Sci. U.S.A. 2013; 110:12798–12803. [DOI] [PMC free article] [PubMed] [Google Scholar]
72. Yutin N., Backstrom D., Ettema T.J.G., Krupovic M., Koonin E.V.. Vast diversity of prokaryotic virus genomes encoding double jelly-roll major capsid proteins uncovered by genomic and metagenomic sequence analysis. Virol. J. 2018; 15:67. [DOI] [PMC free article] [PubMed] [Google Scholar]
73. Tisza M.J., Pastrana D.V., Welch N.L., Stewart B., Peretti A., Starrett G.J., Pang Y.Y.S., Krishnamurthy S.R., Pesavento P.A., McDermott D.H.et al.. Discovery of several thousand highly diverse circular DNA viruses. Elife. 2020; 9:e51971. [DOI] [PMC free article] [PubMed] [Google Scholar]
74. Yu H., Leadbetter J.R.. Bacterial chemolithoautotrophy via manganese oxidation. Nature. 2020; 583:453–458. [DOI] [PMC free article] [PubMed] [Google Scholar]
75. Millman A., Bernheim A., Stokar-Avihail A., Fedorenko T., Voichek M., Leavitt A., Oppenheimer-Shaanan Y., Sorek R.. Bacterial retrons function in anti-phage defense. Cell. 2020; 183:1551–1561. [DOI] [PubMed] [Google Scholar]
76. Goldfarb T., Sberro H., Weinstock E., Cohen O., Doron S., Charpak-Amikam Y., Afik S., Ofir G., Sorek R.. BREX is a novel phage resistance system widespread in microbial genomes. EMBO J. 2015; 34:169–183. [DOI] [PMC free article] [PubMed] [Google Scholar]
77. Hertel R., Rodriguez D.P., Hollensteiner J., Dietrich S., Leimbach A., Hoppert M., Liesegang H., Volland S.. Genome-based identification of active prophage regions by next generation sequencing in Bacillus licheniformis DSM13. PLoS One. 2015; 10:e0120759. [DOI] [PMC free article] [PubMed] [Google Scholar]
78. Obeng N., Pratama A.A., Elsas J.D.V.. The significance of mutualistic phages for bacterial ecology and evolution. Trends Microbiol. 2016; 24:440–449. [DOI] [PubMed] [Google Scholar]
79. Ofir G., Sorek R.. Contemporary phage biology: from classic models to new insights. Cell. 2018; 172:1260–1270. [DOI] [PubMed] [Google Scholar]
80. Morris R.M., Cain K.R., Hvorecny K.L., Kollman J.M.. Lysogenic host-virus interactions in SAR11 marine bacteria. Nat Microbiol. 2020; 5:1011–1015. [DOI] [PMC free article] [PubMed] [Google Scholar]
81. Imachi H., Nobu M.K., Nakahara N., Morono Y., Ogawara M., Takaki Y., Takano Y., Uematsu K., Ikuta T., Ito M.et al.. Isolation of an archaeon at the prokaryote–eukaryote interface. Nature. 2020; 577:519–525. [DOI] [PMC free article] [PubMed] [Google Scholar]
82. Koonin E.V., Makarova K.S., Wolf Y.I., Krupovic M.. Evolutionary entanglement of mobile genetic elements and host defence systems: guns for hire. Nat. Rev. Genet. 2020; 21:119–131. [DOI] [PubMed] [Google Scholar]
83. Auslander N., Gussow A.B., Benler S., Wolf Y.I., Koonin E.V.. Seeker: alignment-free identification of bacteriophage genomes by deep learning. Nucleic Acids Res. 2020; 48:e121. [DOI] [PMC free article] [PubMed] [Google Scholar]
84. Ren J., Song K., Deng C., Ahlgren N.A., Fuhrman J.A., Li Y., Xie X., Poplin R., Sun F.. Identifying viruses from metagenomic data using deep learning. Quant. Biol. 2020; 8:64–77. [DOI] [PMC free article] [PubMed] [Google Scholar]
85. Ren J., Ahlgren N.A., Lu Y.Y., Fuhrman J.A., Sun F.. VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data. Microbiome. 2017; 5:69. [DOI] [PMC free article] [PubMed] [Google Scholar]
